Tags: zhhao1/sacrebleu

v2.1.0

Upgrade mecab version (mjpost#196)

v2.0.1

Bugfix in using max_ngram_order (mjpost#174)

* bugfix: corpus_score() was ignoring self.max_ngram_order (fixes mjpost#173)
* added test case for max_ngram_order
* simplified pytest build
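
To illustrate what honoring `max_ngram_order` means, here is a minimal standard-library sketch of clipped n-gram precisions computed only up to a configurable order. The function name `ngram_precisions` and the whitespace tokenization are assumptions made for this example; this is not sacreBLEU's implementation.

```python
from collections import Counter


def ngram_precisions(hyp, ref, max_ngram_order=4):
    """Clipped n-gram precisions for orders 1..max_ngram_order.

    Hypothetical helper for illustration only; tokenizes on whitespace.
    """
    hyp_toks, ref_toks = hyp.split(), ref.split()
    precisions = []
    for n in range(1, max_ngram_order + 1):
        hyp_ngrams = Counter(
            tuple(hyp_toks[i:i + n]) for i in range(len(hyp_toks) - n + 1)
        )
        ref_ngrams = Counter(
            tuple(ref_toks[i:i + n]) for i in range(len(ref_toks) - n + 1)
        )
        total = sum(hyp_ngrams.values())
        # Each hypothesis n-gram is credited at most as often as it occurs
        # in the reference (clipping).
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        precisions.append(matches / total if total else 0.0)
    return precisions


hyp = "the cat sat on the mat"
ref = "the cat is on the mat"
print(ngram_precisions(hyp, ref, max_ngram_order=2))
print(ngram_precisions(hyp, ref))
```

With `max_ngram_order=2` only unigram and bigram precisions are produced; the default of 4 yields four values. A scorer that ignored the setting would always behave like the second call.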

list

Bugfix in using max_ngram_order (mjpost#174)

* bugfix: corpus_score() was ignoring self.max_ngram_order (fixes mjpost#173)
* added test case for max_ngram_order
* simplified pytest build

v2.0.0

Merge changes for 2.0.0 (mjpost#152)

  - Build: Add Windows and OS X testing to github workflow
  - Improve documentation and type annotations.
  - Drop `Python < 3.6` support and migrate to f-strings.
  - Drop input type manipulation through `isinstance` checks. If the user does not obey
    the expected annotations, exceptions will be raised. Robustness attempts led to
    confusion and obfuscated score errors in the past (fixes mjpost#121)
  - Use colored strings in tabular outputs (multi-system evaluation mode) with
    the help of the `colorama` package.
  - tokenizers: Add caching to tokenizers, which seems to speed things up a bit.
  - `intl` tokenizer: Use `regex` module. Speed goes from ~4 seconds to ~0.6 seconds
    for a particular test set evaluation. (fixes mjpost#46)
  - Signature: Formatting changed (mostly to remove '+' separator as it was
    interfering with chrF++). The field separator is now '|' and key values
    are separated with ':' rather than '.'.
  - Metrics: Scale all metrics into the [0, 100] range (fixes mjpost#140)
  - BLEU: In case of no n-gram matches at all, skip smoothing and return 0.0 BLEU (fixes mjpost#141).
  - BLEU: allow modifying max_ngram_order (fixes mjpost#156)
  - CHRF: Added multi-reference support, verified the scores against chrF++.py, added test case.
  - CHRF: Added chrF++ support through `word_order` argument. Added test cases against chrF++.py.
    Exposed it through the CLI (--chrf-word-order) (fixes mjpost#124)
  - CHRF: Add possibility to disable effective order smoothing (pass --chrf-eps-smoothing).
    This way, the scores obtained are exactly the same as chrF++, Moses and NLTK implementations.
    We keep the effective ordering as the default for compatibility, since this only
    affects sentence-level scoring with very short sentences. (fixes mjpost#144)
  - CLI: Allow modifying TER arguments through CLI. We still keep the TERCOM defaults.
  - CLI: Prefix metric-specific arguments with --chrf and --ter. To maintain compatibility, BLEU argument names are kept the same.
  - CLI: Added `--format/-f` flag. The single-system output mode is now `json` by default.
    If you want to keep the old text format persistently, you can export `SACREBLEU_FORMAT=text` into your
    shell.
  - CLI: sacreBLEU now supports evaluating multiple systems for a given test set
    in an efficient way. Through the use of the `tabulate` package, the results are
    nicely rendered into a plain text table, LaTeX, HTML or RST (cf. --format/-f argument).
    The systems can be either given as a list of plain text files to `-i/--input` or
    as a tab-separated single stream redirected into `STDIN`. In the former case,
    the basenames of the files will be automatically used as system names.
  - Statistical tests: sacreBLEU now supports confidence interval estimation
    through bootstrap resampling for single-system evaluation (`--confidence` flag)
    as well as paired bootstrap resampling (`--paired-bs`) and paired approximate
    randomization tests (`--paired-ar`) when evaluating multiple systems (fixes mjpost#40 and fixes mjpost#78).
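
The new signature layout described above ('|' between fields, ':' between key and value) can be parsed with a few lines of standard-library Python. This is a minimal sketch: `parse_signature` is a hypothetical helper, not part of sacreBLEU, and the example string is only illustrative of the format.

```python
def parse_signature(signature):
    """Split a 2.0.0-style signature string into a dict.

    Hypothetical helper for illustration; assumes '|' separates fields and
    the first ':' in each field separates the key from its value.
    """
    fields = {}
    for field in signature.split("|"):
        key, _, value = field.partition(":")
        fields[key] = value
    return fields


# Illustrative signature string; the keys and values are assumptions.
sig = "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0"
info = parse_signature(sig)
print(info["tok"], info["version"])
```

Splitting on the first ':' only keeps values that contain '.' (such as a version number) intact, which is one practical benefit of no longer using '.' as the key-value separator.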

v1.5.1

dataset: fix mTEDx hashes (mjpost#145)

Fix the md5 sums for the newly added mTEDx test/valid sets

v1.5.0

Updated credits

v1.4.14

minor doc updates (mjpost#117)

v1.4.13

Added WMT20 newstest (mjpost#109)

* Added WMT20 newstest (mjpost#103)

* updated CHANGELOG and README

Co-authored-by: Ozan Caglayan

v1.4.12

add tokenizers and metrics packages to PyPI (mjpost#97)

* add tokenizers and metrics packages to PyPI
* bump version to 1.4.12
* Updated CHANGELOG

v1.4.11

Refactoring & Fixes (mjpost#88)

* Added Multi30k multimodal MT test set metadata
* Refactored all tokenizers into respective classes (fixes mjpost#85)
* Refactored all metrics into respective classes
* Moved utility functions into utils.py
* Implemented signatures using BLEUSignature and CHRFSignature classes, expose `Signature().info`
* metrics: Signature().info is now exposed (fixes mjpost#75)
* Simplified checking of Chinese characters (fixes mjpost#5)
* Unified common regexp tokenization codes for tokenizers (fixes mjpost#27)
* Fixed --detail failing when no test sets are provided
* Fixed multi-reference BLEU failing when tab-delimited reference stream is used
* Removed lowercase option for ChrF which was not functional (mjpost#85)
* Simplified ChrF and used the same I/O logic as BLEU to allow for future
   multi-reference reading
* Added score regression tests for chrF using reference chrF++ implementation
* Added multi-reference & tokenizer & signature tests
* Pin mecab version to 0.996.5 as the newer ones are incompatible (fixes mjpost#94)
* bump version to 1.4.11