-
Notifications
You must be signed in to change notification settings - Fork 32
Feature: Configurable word characters
As of v0.0.22, you can configure the characters that check-spelling handles.
Previously, check-spelling would only look at /[A-Za-z']/
and generally with a minimum run length of 3.
Certain escaped characters are converted to decoded characters first. (e.g. '
and '
)
Support for similarly html encoded entities isn't currently supported.
To support Spanish, this needs to be extended to allow some accent characters and ñ.
extra_dictionaries:
cspell:es_ES/src/hunspell/index.dic
ignore-pattern: "[^'a-záéíóúñçüA-ZÁÉÍÓÚÑÇÜ]"
upper-pattern: '[A-ZÁÉÍÓÚÑÇÜ]'
lower-pattern: '[a-záéíóúñçü]'
not-lower-pattern: '[^a-záéíóúñçü]'
not-upper-or-lower-pattern: '[^A-ZÁÉÍÓÚÑÇÜa-záéíóúñçü]'
punctuation-pattern: "'"
- Ll, Lm, Lt, Lu
Perl Unicode: General Category
[\p{Ll}\p{Lm}\p{Lt}\p{Lu}]
The general configuration is:
ignore-pattern: '[^\p{Ll}\p{Lm}\p{Lt}\p{Lu}]'
upper-pattern: '[\p{Lu}\p{Lt}\p{Lm}]'
lower-pattern: '[\p{Ll}\p{Lm}]'
not-lower-pattern: '[^\p{Ll}\p{Lm}]'
not-upper-or-lower-pattern: '[^\p{Lu}\p{Lt}\p{Lm}]'
punctuation-pattern: "'"
With some selection from available dictionaries:
extra_dictionaries:
cspell:ar/src/ayaspell/ar.dic
cspell:bg_BG/bg_BG.dic
cspell:ca/ca.dic
cspell:cs_CZ/Czech.dic
cspell:da_DK/da_DK.dic
cspell:de_CH/src/hunspell/index.dic
cspell:de_DE/src/German_de_DE.dic
cspell:de_DE/src/hunspell/index.dic
cspell:el/src/hunspell/el-GR.dic
cspell:en_GB/src/aoo-mozilla-en-dict/en-GB.dic
cspell:en_GB/src/hunspell/en_GB.dic
cspell:en_US/src/aoo-mozilla-en-dict/en_US.dic
cspell:en_US/src/hunspell/en_US.dic
cspell:eo/eo.dic
cspell:es_ES/src/hunspell/index.dic
cspell:et-EE/src/index.dic
cspell:fa_IR/hunspell/fa-IR.dic
cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-classique.dic
cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-reforme1990.dic
cspell:fr_FR/src/hunspell-french-dictionaries-v7.0/fr-toutesvariantes.dic
cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-classique.dic
cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-reforme1990.dic
cspell:fr_FR_90/src/hunspell-french-dictionaries-v7.0/fr-toutesvariantes.dic
cspell:he/hunspell/he.dic
cspell:hr_HR/src/hr_HR.dic
cspell:it_IT/it_IT.dic
cspell:lt_LT/lt_LT.dic
cspell:nb_NO/src/nb.dic
cspell:nl_NL/src/hunspell/index.dic
cspell:pl_PL/pl_pl.dic
cspell:pt_BR/src/hunspell/index.dic
cspell:pt_PT/Portuguese-European.dic
cspell:ru_RU/src/Russian.dic
cspell:ru_RU/src/hunspell/index.dic
cspell:ru_RU/src/ru_ru.dic
cspell:ru_RU/src/russian-aot.dic
cspell:sl_SI/src/sl_SI.dic
cspell:sv/src/hunspell/index.dic
cspell:sv/src/ooo-swedish-dict-2-42/dictionaries/sv_FI.dic
cspell:sv/src/ooo-swedish-dict-2-42/dictionaries/sv_SE.dic
cspell:sv/src/open-office-2008/Swedish.dic
cspell:tr_TR/Turkish.dic
cspell:uk_UA/uk_ua.dic
cspell:vi_VN/vi.dic
In order for this to work reasonably well, support for hunspell .dic
and .aff
files has been added (in v0.0.22).
Right now, characters that fall outside the recognized set are effectively blanked (replaced with a non-word character, currently =
). I might switch to only parsing characters that match the regex. That'd save me a pass.