The Multilingual Terminology in Humanitarian Language Exchange (abbreviation: “HXLTM”) is a valid HXLated tabular format (strictly, a documented subset of HXL) by HXL-CPLP, with a strong focus on storing community-contributed translations and glossaries while maximizing portability for implementers (for example, by using hxltmcli to export to Excel, XLIFF, TMX, TBX, UTX, JSON, CSV, and more).
Why HXLTM was created?
HXLTM is…
I’m in the middle of something urgent! What to do?!
Okay, if it is your first time here and you are, say, a volunteer software developer in the middle of a disaster response (or someone at a hackathon), you will need to do one of the following:
As a localization implementer:
Use existing public HXLTM datasets by exporting them to a format your software understands. The ones made by HXL-CPLP and Etica.AI are under a public domain dedication.
To convert from HXLTM to other data formats, such as TMX, TBX, UTX, XLIFF, CSV, Excel, and more, check the dedicated page https://hdp.etica.ai/hxltm/archivum/
As a translation collaborator: act as your own translation manager (even if you are a one-person team) and do one of the following:
Simple case: "fork" someone else's work (e.g. copy the Hapi spreadsheets <https://hapi.etica.ai/>) and do everything from your own version; you're welcome to donate your work back!
Complex case, lots of text: export a bilingual XLIFF file from HXLTM for a target language, then use a collaborative tool like https://www.matecat.com/ (check this MateCat tutorial).
Protip A: two native speakers of the language (one as translator, the other as reviewer) is the ideal setup for the work to be accepted by others immediately.
Protip B: the MateCat site does not require an account, in particular for translators; you can just share the private link with one or more collaborators.
"If it’s a good idea, go ahead and do it. It is much easier to apologize than it is to get permission." — Grace Hopper
See Preface: Why HXLTM was created?
Quickstart
Installation
hxltmcli uses Python 3. While it is possible to just copy the hxltmcli file and install its dependencies manually (such as HXLStandard/libhxl-python), you can install it with the hxltm-eticaai package.
# hxltmcli is installed with the hxltm-eticaai, no extras required.
# @see https://pypi.org/project/hxltm-eticaai/
pip install hxltm-eticaai
hxltmcli --help
HXLTM: supported file types
Check the dedicated page: Quickstart documentation of HXLTM - Multilingual Terminology in Humanitarian Language Exchange
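As a rough illustration of how file types can be told apart: the ontology file reproduced further down this page declares a `fontem_archivum_extensionem` table that maps file extensions to format names. The Python sketch below does a lookup over a few of those entries. The helper name `detect_format` and the longest-suffix rule are assumptions made for this example, not the documented behavior of hxltmcli.

```python
from typing import Optional

# A few entries copied from `fontem_archivum_extensionem` in cor.hxltm.yml.
EXTENSIONS = {
    ".tm.hxl.csv": "HXLTM",
    ".xliff.hxl.csv": "CSV-HXL-XLIFF",
    ".hxl.csv": "HXLTM",
    ".csv": "HXLTM",
    ".json": "JSON-kv",
    ".tmx": "TMX",
    ".tbx": "TBX-Basic",
    ".xlf": "XLIFF",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for the longest matching extension, if any.

    Hypothetical helper: the longest-suffix rule is an assumption of this
    sketch, chosen so that '.xliff.hxl.csv' wins over the plain '.csv'.
    """
    lowered = filename.lower()
    # Try longer extensions first so compound suffixes take precedence.
    for extension in sorted(EXTENSIONS, key=len, reverse=True):
        if lowered.endswith(extension):
            return EXTENSIONS[extension]
    return None


if __name__ == "__main__":
    for name in ("glossary.tm.hxl.csv", "export.xliff.hxl.csv", "memory.tmx"):
        print(name, "->", detect_format(name))
```

Checking compound suffixes before short ones matters here because several entries in the table end in `.csv`.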
Preface
Why HXLTM was created?
HXLTM is, by design, focused on the storage and exchange of multilingual content while keeping minimal compatibility with the typical localization file formats used by software (TL;DR: those are monolingual or bilingual).
Apart from a format like TBX (TermBase eXchange), and leaving aside the unsung hero UTX (Universal Terminology eXchange), the current standards for exchanging multilingual content have been stuck in the past for almost 20 years without any update (see TMX and the sad fact that LISA was declared insolvent). It's a long story. And the problem is not whether you prefer JSON over XML-like formats, because even the XML-like standards, as of 2021, still have a poor feature set. Things are so bad, SO BAD, that people use gettext .po or bilingual JSON to handle multilingual translations and glossaries.
It's a long story, and it is not just that HXL-CPLP, as a community user group, wants to promote HXL (by the way, HXLTM is the format used by the HXL-CPLP/Auxilium-Humanitarium-API, if you want to collaborate 😀). Even if you or someone else simply wants to coordinate volunteers to handle translations without always having to enforce a single source language (like English, as is usual in software localization), there is simply no standard (and we mean even among paid services) that lets you do it.
Do not attribute to malice that which is adequately explained by a lack of knowledge of how other languages work and a lack of tooling from previous developers.
On this topic, one easily observable fact is that most maintainers of software primarily written in English are not even native English speakers. But a lack of knowledge of how other natural languages work can play a role in initial decisions. It is also common that, when asked for feedback, native speakers of other languages (who care about the issue) don't know what to propose, and the ones who speak up (often new to software development) may say that this is not necessary.
@TODO this section is still a draft. It needs to be improved later.
The need for optional different source language for translations
While hxltmcli, even in early versions, can already be used as a general-purpose (non-humanitarian) tool to exchange translations stored (or entirely managed) "with Excel" or "Google Sheets", one of what became the initial goals of HXL-CPLP is a data format and a minimal software implementation that allow receiving collaborative suggestions from people who don't know whatever the initial working language is.
In the context of the HXL-CPLP/Auxilium-Humanitarium-API, by initial working language we don't mean English. Good source material could in fact be created by experts (say, in the medical field, in a country like Brazil, on a subject like the Covid pandemic). Especially outside software development, these experts, when they know a language beyond their native one (Portuguese, in Brazil), may well know Spanish, French, or Italian, but not English.
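To make the idea of an optional source language concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory structure (one dictionary per concept, keyed by language code), not the real HXLTM hashtag layout, and it writes the "source, target, comment" column order that the CSV-3 format in the ontology file describes. The function name `bilingual_csv` and the sample terms are illustrative only.

```python
import csv
import io
from typing import Dict, List

# Hypothetical stand-in for a multilingual table: one concept per entry,
# one term per language. Real HXLTM files use HXL hashtags instead.
CONCEPTS: List[Dict[str, str]] = [
    {"la": "verbum", "ar": "كلمة", "pt": "palavra"},
]


def bilingual_csv(concepts: List[Dict[str, str]], source: str, target: str) -> str:
    """Build "source,target,comment" CSV text for one language pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # Header row: the two language codes plus a comment column.
    writer.writerow([source, target, "commentarium"])
    for concept in concepts:
        # Skip concepts that lack either side of the requested pair.
        if concept.get(source) and concept.get(target):
            writer.writerow([concept[source], concept[target], ""])
    return buffer.getvalue()


if __name__ == "__main__":
    # Any stored language can act as the source, not only English.
    print(bilingual_csv(CONCEPTS, "pt", "ar"))
```

Because every language is stored side by side, the choice of source language is made at export time, so a reviewer who reads Portuguese but not English can still be given a usable bilingual file.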
Presentations
Bootstrapping-HXLTM
eng-Latn: Bootstrapping technical translations and multilingual controlled vocabularies with HXLTM
por-Latn: Como criar do zero traduções técnicas e vocabulários controlados multilíngues com HXLTM
Ontologia
Most features that can feasibly avoid being hardcoded (id est, that would otherwise require you to hack the Python code) in the initial reference implementations use an ontology file as the single source of truth.
While the exporting features (like normam.TMX.formatum.initiale, normam.TMX.formatum.corporeum and normam.TMX.formatum.finale) allow great customization just by fine-tuning your ontology file (TIP: the syntax is based on Liquid, https://shopify.github.io/liquid/), importing data from other formats (like converting TMX, TBX or XLIFF) may require changing not only the ontology file but also the Python code.
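Judging from the TMX entry in the ontology file, the three template parts can be read as a header (`initiale`), a block repeated once per concept (`corporeum`), and a footer (`finale`). The Python sketch below mimics that assembly with plain string formatting instead of Liquid. The names `corporeum` and `render` and the sample data are hypothetical, and the header is deliberately reduced, so treat this as an illustration of the idea and not as how hxltmcli renders its templates.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

# Reduced header and footer, loosely following normam.TMX.formatum.
INITIALE = "<?xml version='1.0' encoding='utf-8'?>\n<tmx version=\"1.4\">\n  <body>\n"
FINALE = "  </body>\n</tmx>\n"


def corporeum(code: str, terms: Dict[str, str]) -> str:
    """Render one <tu> element; empty terms are skipped, as in the template."""
    lines = [f"    <tu tuid={quoteattr(code)}>"]
    for lang, term in terms.items():
        if term != "":
            lines.append(f"      <tuv xml:lang={quoteattr(lang)}>")
            lines.append(f"        <seg>{escape(term)}</seg>")
            lines.append("      </tuv>")
    lines.append("    </tu>")
    return "\n".join(lines) + "\n"


def render(concepts: Dict[str, Dict[str, str]]) -> str:
    """Concatenate the header, one body block per concept, and the footer."""
    body = "".join(corporeum(code, terms) for code, terms in concepts.items())
    return INITIALE + body + FINALE


if __name__ == "__main__":
    sample = {"L10N_ego_codicem": {"la": "verbum", "ar": "كلمة", "pt": ""}}
    print(render(sample))
```

Keeping the header, body and footer as separate pieces is what lets an export format be changed by editing only the ontology file, since the loop over concepts stays the same for every format.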
- Ontology, YAML
- Ontology, JSON
# @ARCHIVUM cor.hxltm.215.yml
# @LICENTIAM Dominium publicum
# @DESCRIPTIONEM HXL Trānslātiōnem Memoriam (HXLTM) cor cōnfigūrātiōnem.
# Auxilium de HXLTM: https://hxltm.etica.ai/
__ontologia_cor_versionem__: v0.9.2rc3 EticaAI voluntārium-commūne
### Trivia:
# - "HXL"
# - https://hxlstandard.org/
# - "Trānslātiōnem Memoriam"
# - https://www.wikidata.org/wiki/Q333761
# - "Terminologia Multilinguae"
# - https://github.com/EticaAI/HXL-Data-Science-file-formats#HXLTM
# - "HXL Trānslātiōnem Memoriam" (priore)
# - https://github.com/EticaAI/HXL-Data-Science-file-formats#HXLTM
# - "cor"
# - https://en.wiktionary.org/wiki/cor#Latin
# - "cōnfigūrātiōnem"
# - https://en.wiktionary.org/wiki/configuratio#Latin
# - "ontologia"
# - https://la.wikipedia.org/wiki/Ontologia
# - "glōssārium"
# - https://en.wiktionary.org/wiki/glossarium#Latin
# - "archīvum"
# - https://en.wiktionary.org/wiki/archivum
# - "typum"
# - https://en.wiktionary.org/wiki/typus#Latin
# - "fōrmātum"
# - https://en.wiktionary.org/wiki/formatus#Latin
# - "fontem"
# - https://en.wiktionary.org/wiki/fons#Latin
# - "extēnsiōnem"
# - https://en.wiktionary.org/wiki/extensio#Latin
# - "commūne"
# - https://en.wiktionary.org/wiki/communis#Latin
# - "alternātīvum"
# - https://en.wiktionary.org/wiki/alternativus#Latin
# - "versiōnem"
# - https://en.wiktionary.org/wiki/versio#Latin
# - referēns archīvum cōnfigūrātiōnem
## Referēns archīvum cōnfigūrātiōnem
# ego.hxltm.yml >> venditorem.hxltm.yml > cor.hxltm.yml
#
# - I: ego.hxltm.yml
# - Exemplum: https://hdp.etica.ai/hxlm/data/exemplum/ego.hxltm.yml
# - II: venditorem.hxltm.yml
# - Exemplum: https://hdp.etica.ai/hxlm/data/exemplum/venditorem.hxltm.yml
# - III: cor.hxltm.yml
# - commūne referēns: https://hdp.etica.ai/ontologia/cor.hxltm.yml
# tag::archivum[]
fontem_archivum_extensionem:
  .tm.hxl.csv: HXLTM
  .xliff.hxl.csv: CSV-HXL-XLIFF
  .asa.hxltm.json: HXLTM-ASA-JSON
  .asa.hxltm.yml: HXLTM-ASA-YAML
  .hxl.csv: HXLTM
  # .csv: CSV-3
  .csv: HXLTM
  .json: JSON-kv
  .tmx: TMX
  .utx: UTX
  .tab: TSV-3
  .tsv: TSV-3
  .tbx: TBX-Basic
  .xlf: XLIFF
  .xlf2: XLIFF # XLIFF2
  .xliff: XLIFF # XLIFF2
fontem_archivum_extensionem_regex:
  "[a-z]{2}.csv": CSV-3 # Not implemented yet
# end::archivum[]
# tag::normam[]
# Trivia: normam, https://en.wiktionary.org/wiki/norma#Latin
normam:
#### Ad Hoc template ________________________________________________________
# tag::normam_Ad-Hoc[]
Ad-Hoc:
__meta:
# archivum_extensionem:
archivum:
extensionem:
descriptionem: |
_[eng-Latn]
Ad Hoc template
[eng-Latn]_
normam:
- <https://github.com/HXL-CPLP/Auxilium-Humanitarium-API/wiki/Index>
nomen:
eng-Latn: 'Ad Hoc template'
situs_interretialis:
referens_officinale:
- <https://github.com/HXL-CPLP/Auxilium-Humanitarium-API/wiki/Index>
asa:
modus_operandi: []
formatum:
initiale: false
corporeum: false
finale: False
# end::normam_Ad-Hoc[]
#### CSV-3: Source Target Comment (draft) _______________________________
# tag::normam_CSV-3[]
CSV-3:
__meta:
# archivum_extensionem: .csv
archivum:
extensionem: .csv
descriptionem: |
_[eng-Latn]
The hxltm "CSV-3" export format is a somewhat basic one (but, in the worst
case, accepted by several tools) with the following column order:
> "source-language","target-language",comment
> "verbum", "كلمة","Arabic translationem de latin verbum"
Some references of tools that allow conversions using this format:
- Okapi Framework
- <https://okapiframework.org/wiki/index.php/Table_Filter>
- MateCat
- <https://site.matecat.com/support/managing-language-resources/add-glossary/>
[eng-Latn]_
normam:
- <https://datatracker.ietf.org/doc/html/rfc4180>
nomen:
eng-Latn: 'CSV 3 bilingual Source Objective Comment'
situs_interretialis:
referens_officinale:
- <https://datatracker.ietf.org/doc/html/rfc4180>
vicipaedia:
- <https://en.wikipedia.org/wiki/Comma-separated_values>
# Trivia:
# - ASA
# - (HXLTM) Abstractum Syntaxim Arborem
asa:
# Trivia: modus operandī, https://en.wiktionary.org/wiki/modus_operandi#Latin
modus_operandi:
# - multiplum_linguam
- bilingue
formatum:
initiale: |-
{{ globum.fontem_linguam.bcp47 | default: 'la' | quotum_rem }},{{ globum.objectivum_linguam.bcp47 | default: 'ar' | quotum_rem }},commentarium
corporeum: |-
{{ rem.de_fontem_linguam.rem | quotum_rem }},{{ rem.de_objectivum_linguam.rem | quotum_rem }},""
finale: False
# end::normam_CSV-3[]
#### HXLated bilingual CSV ( up to 5 source alt) for XLIFF___________________
# tag::normam_CSV-HXL-XLIFF[]
# @DEPRECATED: maybe eventually remove this file format. It may still work,
# but already is not fully documented.
CSV-HXL-XLIFF:
__meta:
archivum_extensionem: .xliff.hxl.csv
# archivum_extensionem: .{fontem-linguam}--{objectivum-linguam}.xliff.hxl.csv
normam:
- <https://hdp.etica.ai/hxltm/archivum/>
nomen:
eng-Latn: 'HXLated bilingual CSV ( up to 5 source alt) for XLIFF'
situs_interretialis:
referens_officinale: []
exportandum_hxl_sortem:
- '#x_xliff unit id'
- '#x_xliff source' # ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff target' # ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff segment state' # ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__sourcestatus'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__termtype'
- '#x_xliff group group_0'
- '#x_xliff unit note note_category__altsource1'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__definitionalt1'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__contextalt1'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__altsource2'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__definitionalt2'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__contextalt2'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__altsource3'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__definitionalt3'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__contextalt3'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__altsource4'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__definitionalt4'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__contextalt4'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__altsource5'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__definitionalt5'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__contextalt5'
# ... __linguam__ aut __linguam_de_imperium__
- '#x_xliff unit note note_category__wikidata'
- '#x_xliff unit note note_category__iate'
- '#x_xliff unit note note_category__unterm'
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
# end::normam_CSV-HXL-XLIFF[]
# tag::normam_GSheets[]
#### XLSX, Google Sheets ____________________________________________________
# @see https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
# @see https://support.google.com/drive/answer/37603
GSheets:
__meta:
archivum:
extensionem: # https://docs.google.com/spreadsheets/ (...)
descriptionem: |
_[eng-Latn]
Both Google Sheets URLs and local/remote Microsoft Excel files have
built-in read-only support in the reference CLI implementation
as containers for source data, without an intermediate conversion
to the CSV container of HXLTM. This means humans don't need to edit CSV
files directly.
Support in `hxltmcli` for writing directly to Google Sheets and
Microsoft Excel is unlikely to be implemented.
[eng-Latn]_
normam:
- <https://developers.google.com/sheets/api>
nomen:
# eng-Latn: 'Google Sheets (via CSV import)'
# eng-Latn: 'Google Sheet (native support to read, but not write, data directly from GSheets)'
eng-Latn: 'Google Sheets, HXLTM container (read-only; native support as data source)'
situs_interretialis:
referens_officinale:
- <https://www.google.com/sheets/about/>
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
# end::normam_GSheets[]
# tag::normam_HXL-Proxy[]
#### HXL-Proxy _______________________________________________________________
HXL-Proxy:
__meta:
archivum:
extensionem:
descriptionem: |
_[eng-Latn]
HXL Proxy, a tool for cleaning, transforming, merging, and
validating data tagged using the Humanitarian Exchange Language (HXL)
standard.
In the context of HXLTM, HXL-Proxy is recommended for:
- HXLating (e.g. adding HXL hashtags, like the ones required by HXLTM)
any data input supported by HXL-Proxy (which is a lot)
- Doing more advanced filtering (like removing columns) or merging
different datasets by HXLTM concept value, with a friendly user
interface
When to use the HXL CLI tools (including hxltmcli) and when to use HXL-Proxy?
While the HXL CLI tools (see
<https://github.com/HXLStandard/libhxl-python/wiki/Command-line-tools>
and <https://pypi.org/project/libhxl/>) have almost all the features
of HXL-Proxy (in particular with the use of JSON specs), in the real
world, under urgency, it is still faster to set up a private Docker
instance than to teach everyone to use the CLI tools.
The HXLTM reference tooling will intentionally NOT implement features
that are possible with HXL-Proxy. Some that are viable using the HXL
Standard CLI tools (but would be too complex to explain) may
be added either as hxltmcli / hxltmdexml CLI options or via
`ontologia:normam.HXLTM-TMETA`.
[eng-Latn]_
normam:
- <https://github.com/HXLStandard/hxl-proxy/wiki>
nomen:
eng-Latn: 'HXL-Proxy (read-only; native support as data source)'
situs_interretialis:
referens_officinale:
- <https://github.com/HXLStandard/hxl-proxy>
# For humanitarian-use only (or to learn HXL), the UN OCHA proxy:
- <https://proxy.hxlstandard.org/>
# For intranets, or for large (500.000 rows) or non-humanitarian use,
# please set up your own HXL-proxy using Docker.
- <https://hub.docker.com/r/unocha/hxl-proxy>
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
# end::normam_HXL-Proxy[]
#### HXLTM: Terminologia Multilinguae (Datum ideam) ________________________
# tag::normam_HXLTM[]
HXLTM:
__meta:
# archivum_extensionem: .tm.hxl.csv # .tm.hxl.xlsx, xlsx, ...
archivum:
extensionem:
- .tm.hxl.csv
- .tm.hxl.xlsx
- .hxltm.xml
# - .hxltm.tmx
# - .hxltm.tbx
# - HXL-proxy
# - (...)
descriptionem: |
_[eng-Latn]
`ontologia:normam.HXLTM` is an abstraction over several data containers
of the HXLTM implementation able to store multilingual data without loss.
Some general notes:
- The most feature-complete is the HXLTM implementation using
tabular storage (plain HXLTM in CSV, or Google Sheets, or Excel,
or HXL-proxy, or...), which is able to preserve columns that are
valid HXL but unknown to the documented HXLTM implementation.
- The `ontologia:normam.XML`, while not a tabular implementation,
contains information that allows data to be exported with
`hxltmcli --objectivum-XML` and imported with
`hxltmdexml` with more features than would be possible with the other
data standards closest to what HXLTM is: TBX, TMX and
the tabular UTX.
- Valid HXLated columns (but unknown to HXLTM), even if the
templating engine knows the undocumented HXL tags, are not
intended to be exported. The idea is that the generic XML format is
designed to only export what could be imported back using the
same ontologia.
- If you plan to do VERY long-term data storage, consider saving
the ontologia that generated the data together with it.
- A cor.hxltm.yml with 3000 lines exported to PDF (which could be
printed if your data already is printed) takes around 48 pages
(A4 format).
- It is also possible to change the tags from Latin to your natural
language. While there are still better ways to save a more compact
export, if you plan to keep a backup in some library as a
physical book, then at least customize it.
[eng-Latn]_
normam:
- <https://github.com/HXL-CPLP/forum/issues/58>
nomen:
eng-Latn: 'HXLTM: Terminologia Multilinguae (Datum ideam)'
situs_interretialis:
referens_officinale:
- <https://hdp.etica.ai/hxltm>
# end::normam_HXLTM[]
#### HXLTM: Terminologia Multilinguae Meta __________________________________
# tag::normam_HXLTM-TMETA[]
HXLTM-TMETA:
__meta:
archivum:
extensionem:
- .tmeta.json
- .tmeta.yml
descriptionem: |
_[eng-Latn]
To be documented.
[eng-Latn]_
normam:
- <https://hdp.etica.ai/hxltm/archivum/#HXLTM-TMETA>
nomen:
eng-Latn: 'HXLTM Terminologia Multilinguae Meta'
situs_interretialis:
referens_officinale:
- <https://hdp.etica.ai/hxltm>
- <https://github.com/EticaAI/HXL-Data-Science-file-formats/labels/HXLTM>
- <https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/24>
# end::normam_HXLTM-TMETA[]
#### HXLTM: Terminologia Multilinguae Abstractum Syntaxim Arborem ___________
# tag::normam_HXLTM-ASA[]
HXLTM-ASA:
__meta:
archivum:
extensionem:
- .asa.hxltm.json
- .asa.hxltm.yml
normam:
- <https://hdp.etica.ai/hxltm/archivum/#HXLTM-ASA>
descriptionem: |
_[eng-Latn]
The HXLTM-ASA is a not strictly documented Abstract Syntax Tree
of a data conversion operation.
This format, different from the HXLTM permanent storage, is not
meant to be used by end users. In fact, JSON (or other
formats, like YAML) is more a tool for users debugging the initial
reference implementation hxltmcli OR for developers using JSON
as more advanced input than the end user permanent storage.
Warning: The HXLTM-ASA is not meant to be a strictly documented format
even if HXLTM eventually gets used by the general public. If necessary,
some special format could be created, but this would require feedback
from the community or some work already done by implementers.
[eng-Latn]_
Trivia:
- abstractum, <https://en.wiktionary.org/wiki/abstractus#Latin>
- syntaxim, <https://en.wiktionary.org/wiki/syntaxis#Latin>
- arborem, <https://en.wiktionary.org/wiki/arbor#Latin>
- conceptum de Abstractum Syntaxim Arborem
- <https://www.wikidata.org/wiki/Q127380>
nomen:
eng-Latn: 'HXLTM Abstractum Syntaxim Arborem'
situs_interretialis:
referens_officinale:
- <https://hdp.etica.ai/hxltm>
- <https://github.com/EticaAI/HXL-Data-Science-file-formats/labels/HXLTM>
- <https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/22>
# end::normam_HXLTM-ASA[]
#### JSON-kv: JSON key: val; id/source -> target _____________________________
# tag::normam_JSON-kv[]
# TODO: create at least one different exporter, JSON-2, since explaining
# how to document JSON-kv on HXLTM sheets would be harder than
# creating the exporter
JSON-kv:
__meta:
archivum:
extensionem: .json
descriptionem: |
_[eng-Latn]
This exporter/importer needs to be created. One level is trivial, but for
2 or more nested levels it would be simpler for the end user to just use
**HXLTM Ad Hoc Fōrmulam (HXLTM templated export)** to have full
control.
[eng-Latn]_
normam:
# Not sure where to find some place to 'explain' this format
- <https://angular.io/guide/i18n#change-the-source-language-file-location>
- <https://www.i18next.com/misc/json-format>
- <https://lokalise.com/blog/how-to-internationalize-react-application-using-i18next/>
nomen:
eng-Latn: 'JSON key: val; id/source -> target (draft)'
situs_interretialis:
referens_officinale: []
exemplum:
- <https://github.com/i18next/react-i18next/blob/master/example/react/public/locales/de/translation.json>
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
# end::normam_JSON-kv[]
#### TSV-3: Source Target Comment ________________________________________
# tag::normam_TSV-3[]
TSV-3:
__meta:
archivum:
extensionem: .tab
descriptionem: |
_[eng-Latn]
The hxltm "TSV-3" is the version of "CSV-3" with tabs.
It is exported with the following column order:
> source-language target-language comment
> verbum كلمة Arabic translationem de latin verbum
This format is less common than CSV-3, but may be useful when tab
is never used inside the fields.
[eng-Latn]_
normam:
- <https://datatracker.ietf.org/doc/html/rfc4180>
- <https://www.iana.org/assignments/media-types/text/tab-separated-values>
# - <http://dataprotocols.org/linear-tsv/>
nomen:
eng-Latn: 'TSV-3 bilingual Source Objective Comment'
situs_interretialis:
vicipaedia:
- <https://en.wikipedia.org/wiki/Tab-separated_values>
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
formatum:
initiale: |-
{{ globum.fontem_linguam.bcp47 | default: 'la' }} {{ globum.objectivum_linguam.bcp47 | default: 'ar' }} commentarium
corporeum: |-
{{ rem.de_fontem_linguam.rem | quotum_rem: ' ' }} {{ rem.de_objectivum_linguam.rem | quotum_rem: ' ' }}
finale: False
# end::normam_TSV-3[]
# tag::normam_TBX-Basim[]
#### TBX-Basic: TermBase eXchange (TBX) Basic _______________________________
TBX-Basim:
__meta:
archivum:
extensionem: .tbx
descriptionem: |
_[eng-Latn]
See the links
[eng-Latn]_
- <http://www.terminorgs.net/Terminology-Starter-Guide.html>
- <https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf>
- <http://www.ttt.org/oscarStandards/tbx/tbx-basic.html>
exemplum:
- <http://www.ttt.org/oscarStandards/tbx/TBXBasic.zip>
normam:
- <http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf>
nomen:
eng-Latn: 'TermBase eXchange (TBX) Basic 2.1'
situs_interretialis:
referens_officinale:
- <http://www.terminorgs.net/TBX-Basic.html>
- <http://www.ttt.org/oscarStandards/tbx/TBXBasic.zip>
- <http://www.ttt.org/oscarStandards/tbx/tbx-basic.html>
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
de_xml:
# This is a working draft
# @see https://terminator.readthedocs.io/en/latest/tbx_conformance.html
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: martif
glossarium_titulum:
signum: title
# de_attributum: False
trivium:
# de <martif> ad <title>
- martifHeader
- fileDesc
# II conceptum
conceptum_codicem:
signum: termEntry
de_attributum: id
trivium:
# de <martif> ad <termEntry>
- text
- body
# III linguam
linguam_codicem:
signum: langSet # 'la' ad <langSet xml:lang="la">
de_attributum: lang
trivium: []
# IV terminum
terminum_habendum_accuratum: True # TBX terminum habendum accuratum? Verum
terminum_habendum_multum: True
terminum_habendum_fontem: False # TBX terminum habendum fontem? Falsum
terminum_habendum_objectivum: False # TBX terminum habendum objectivum? Falsum
# @see https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
terminum_accuratum:
# Exemplum: <descrip type="reliabilityCode">1</descrip>
ad: XML-nodum-textum
de_signum: descrip
de_attributum:
type: reliabilityCode
# de_attributum: False
viam_trivium: []
# - termSec # de <langSec> ad <term>
terminum_valorem:
signum: term # 'lat-Latn' ad <langSet xml:lang="la"><tig><term>lat-Latn</term></tig></langSet>
# de_attributum: False
trivium:
- tig # de <langSet> ad <term>
formatum:
initiale: |
<?xml version='1.0'?>
<!DOCTYPE martif SYSTEM "TBXBasiccoreStructV02.dtd">
<martif type="TBX-Basic" xml:lang="{{ globum.fontem_linguam.iso6391a2 | default: globum.fontem_linguam.iso6391a2 | default: 'la' }}">
<martifHeader>
<fileDesc>
<titleStmt>
<title>{{- ____.glossarium_titulum -}}</title>
<note>{{- ____.glossarium_annotationem -}}</note>
</titleStmt>
<sourceDesc>
<p>{{ ____.glossarium_fontem }}</p>
</sourceDesc>
</fileDesc>
</martifHeader>
<text>
<body>
# _[eng-Latn]
# NOTE: some IDs, like
# <termEntry id="I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1 2/3*4_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N">
# will generate errors to validate TBX with TBXBasiccoreStructV02.dtd
# from http://www.ttt.org/oscarStandards/tbx/TBXBasic.zip
# The problematic part is ' 2/3*': '*', ' ', '/'
# One way to replace:
# {{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: "*", "*" | replace: " ", "+" | replace: "/", "/" }}
# [eng-Latn]_
corporeum: |2
<termEntry id="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: '*', '' | replace: ' ', '' | replace: '/', '' }}">
{% for item in rem.de_linguam %}
{% if item[1].rem != '' -%}
<langSet xml:lang="{{ item[1].bcp47 }}">
<tig>
<term>{{- item[1].rem -}}</term>
</tig>
</langSet>
{%- endif -%}
{%- endfor %}
</termEntry>
finale: |2
</body>
</text>
</martif>
#### Term Base eXchange (TBX) 2008 CC-BY License _____________________________
# Term Base eXchange (TBX) (identical to ISO 30042:2008)
TBX-2008:
archivum:
extensionem: .tbx
normam:
- <https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf>
situs_interretialis:
referens_officinale:
- <https://www.gala-global.org/knowledge-center/industry-development/standards/lisa-oscar-standards>
_temporarium:
- <https://www.termbases.eu/>
- <http://www.tbxinfo.net/>
# - <https://github.com/byutrg/TBX-Spec>
- <https://byutrg.github.io/TBX-Implementor/>
- <https://github.com/byutrg/baseterm>
- <https://github.com/LTAC-Global/TBX-Basic_ImplementationGuide>
# - https://www.tbxinfo.net/tbx-dialects/
# - Convert existing spreadsheet based glossaries into TBX format
# (see our tutorial and sample spreadsheets).
# - https://www.tbxinfo.net/wp-content/uploads/2016/06/Spreadsheet-to-TBX-Min-Tutorial.pdf
# - https://www.tbxinfo.net/wp-content/uploads/2016/05/sampleSpreadsheets.zip
# - https://multilingual.com/issues/july-aug-2019/tbx-version-3-published-at-iso/
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
#### TermBase eXchange (TBX) ISO 30042:2019 proprietary forma ________________
TBX-2019:
__meta:
archivum:
extensionem: .tbx
descriptionem: |
- <https://www.tbxinfo.net/about/>
- <https://www.iso.org/standard/62510.html>
## TBX-IATE
Trivia: <https://iate.europa.eu/fields-explained>
### `hxltmdexml` --agendum-linguam
- bul-Latn@bg
- Bulgarian: bg; <https://iso639-3.sil.org/code/bul>
- ces-Latn@cs
- Czech: cs; <https://iso639-3.sil.org/code/ces>
- dan-Latn@da
- Danish: da; <https://iso639-3.sil.org/code/dan>
- dut-Latn@nl
- Dutch: nl; <https://iso639-3.sil.org/code/dut>
- ell-Latn@el
- Greek: el; <https://iso639-3.sil.org/code/ell>
- eng-Latn@en
- English: en; <https://iso639-3.sil.org/code/eng>
- est-Latn@et
- Estonian: et; <https://iso639-3.sil.org/code/est>
- fin-Latn@fi
- Finnish: fi; <https://iso639-3.sil.org/code/fin>
- fra-Latn@fr
- French: fr; <https://iso639-3.sil.org/code/fra>
- ger-Latn@de
- German: de; <https://iso639-3.sil.org/code/ger>
- gle-Latn@ga
- Irish: ga; <https://iso639-3.sil.org/code/gle>
- hun-Latn@hu
- Hungarian: hu; <https://iso639-3.sil.org/code/hun>
- ita-Latn@it
- Italian: it; <https://iso639-3.sil.org/code/ita>
- lav-Latn@lv
- Latvian: lv; <https://iso639-3.sil.org/code/lav>
- lit-Latn@lt
- Lithuanian: lt; <https://iso639-3.sil.org/code/lit>
- mlt-Latn@mt
- Maltese: mt; <https://iso639-3.sil.org/code/mlt>
- pol-Latn@pl
- Polish: pl; <https://iso639-3.sil.org/code/pol>
- por-Latn@pt
- Portuguese: pt; <https://iso639-3.sil.org/code/por>
- ron-Latn@ro
- Romanian: ro; <https://iso639-3.sil.org/code/ron>
- scr-Latn@hr
- Croatian: hr; <https://iso639-3.sil.org/code/scr>
- slk-Latn@sk
- Slovak: sk; <https://iso639-3.sil.org/code/slk>
- slv-Latn@sl
- Slovene: sl; <https://iso639-3.sil.org/code/slv>
- spa-Latn@es
- Spanish: es; <https://iso639-3.sil.org/code/spa>
- swe-Latn@sv
- Swedish: sv; <https://iso639-3.sil.org/code/swe>
normam:
- proprietary format
- ¯\_(ツ)_/¯
nomen:
eng-Latn: 'TermBase eXchange (TBX) ISO 30042:2019 proprietary format'
situs_interretialis:
referens_officinale:
- proprietary format
- ¯\_(ツ)_/¯
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
de_xml:
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: tbx # TBX-Basic: termEntry
# <tbx type="TBX-IATE" style="dca" xml:lang="en" xmlns="urn:iso:std:iso:30042:ed-2">
glossarium_titulum:
signum: title
# de_attributum: False
trivium:
# de <tbx> ad <title>
- tbxHeader
- fileDesc
# II conceptum
conceptum_codicem:
signum: conceptEntry # TBX-Basic: termEntry
de_attributum: id
trivium:
# de <martif> ad <termEntry>
- text
- body
# III linguam
linguam_codicem:
signum: langSec # TBX-Basic: langSet
de_attributum: lang
trivium:
- termSec
# IV terminum
terminum_habendum_accuratum: True
terminum_habendum_multum: True
terminum_habendum_fontem: False
terminum_habendum_objectivum: False
terminum_habendum_typum: True
terminum_accuratum:
# Exemplum: <descrip type="reliabilityCode">1</descrip>
ad: XML-nodum-textum
signum: descrip
de_attributum:
type: reliabilityCode
trivium:
- termSec # de <langSec> ad <term>
# terminum_accuratum:
# # Exemplum: <descrip type="reliabilityCode">1</descrip>
# ad: XML-nodum-textum
# de_signum: descrip
# de_attributum:
# type: reliabilityCode
# # de_attributum: False
# viam_trivium: []
# # - termSec # de <langSec> ad <term>
terminum_fontem: False # TBX terminum habendum fontem? Falsum
terminum_objectivum: False # TBX terminum habendum objectivum? Falsum
terminum_valorem:
signum: term # 'lat-Latn' ad <langSet xml:lang="la"><tig><term>lat-Latn</term></tig></langSet>
# de_attributum: False
trivium:
- tig # de <langSet> ad <term>
terminum_typum:
# Exemplum: <descrip type="reliabilityCode">1</descrip>
ad: XML-nodum-textum
signum: termNote
de_attributum:
type: termType
in_praefixum: 'TBX_'
in_suffixum: ''
trivium: []
formatum:
initiale: False
corporeum: False
finale: False
# end::normam_TBX-Basim[]
#### TMX: Translation Memory eXchange format (TMX) ___________________________
# tag::normam_TMX[]
TMX:
__meta:
archivum:
extensionem: .tmx
normam:
- https://www.gala-global.org/tmx-14b
- https://www.gala-global.org/sites/default/files/migrated-pages/docs/tmx14 (1).dtd
nomen:
eng-Latn: 'Translation Memory eXchange format (TMX)'
situs_interretialis:
referens_officinale:
- https://www.gala-global.org/knowledge-center/industry-development/standards/lisa-oscar-standards
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
de_xml:
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: tmx
# <!DOCTYPE tmx SYSTEM "tmx14.dtd"><tmx version="1.4">...
glossarium_titulum: False
# II conceptum
conceptum_codicem:
signum: tu # <tu tuid="L10N_ego_codicem">
de_attributum: tuid
trivium:
# de <tmx> ad <tu>
- body
# III linguam
linguam_codicem:
signum: tuv
de_attributum: lang
trivium: []
# IV terminum
# terminum_habendum_accuratum: True
terminum_habendum_multum: True
terminum_habendum_fontem: False # TMX terminum habendum fontem? Falsum
terminum_habendum_objectivum: False # TMX terminum habendum objectivum? Falsum
# terminum_accuratum: False # TMX terminum habendum accuratum? Falsum
# terminum_fontem: False # TMX terminum habendum fontem? Falsum
# terminum_objectivum: False # TMX terminum habendum objectivum? Falsum
terminum_valorem:
signum: seg # 'lat-Latn' ad <tu tuid="L10N_ego_codicem"><tuv xml:lang="la"><seg>lat-Latn</seg>
# de_attributum: False
trivium: []
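# Exemplum (eng-Latn): a sketch of the element path that the TMX de_xml rules
# above appear to describe. It assumes each trivium lists the intermediate
# elements between the parent level and the signum; that reading is inferred
# from the 'de <tmx> ad <tu>' comment and is not a specification.
#   <tmx>                        glossarium_radicem.signum
#     <body>                     conceptum_codicem.trivium
#       <tu tuid="...">          conceptum_codicem (signum + de_attributum)
#         <tuv xml:lang="...">   linguam_codicem (signum + de_attributum)
#           <seg>...</seg>       terminum_valorem.signum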
formatum:
initiale: |2
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header
creationtool="hxltmcli.py"
creationtoolversion="{{ globum.instrumentum_versionem }}"
segtype="sentence"
o-tmf="UTF-8"
adminlang="{{ globum.fontem_linguam.iso6391a2 | default: 'la' }}"
srclang="{{ globum.fontem_linguam.iso6391a2 | default: 'la' }}"
datatype="PlainText"
/>
<body>
corporeum: |2
<tu tuid="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' }}">
{%- if conceptum.conceptum_codicem_wikidata %}
<prop type="wikidata">{{ conceptum.conceptum_codicem_wikidata }}</prop>
{% endif -%}
{% for item in rem.de_linguam %}
{% if item[1].rem != '' -%}
<tuv xml:lang="{{ item[1].bcp47 }}">
<seg>{{- item[1].rem -}}</seg>
</tuv>
{%- endif -%}
{%- endfor %}
</tu>
finale: |2
</body>
</tmx>
# end::normam_TMX[]
#### UTX: Universal Terminology eXchange _____________________________________
# tag::normam_UTX[]
UTX:
__meta:
archivum:
extensionem: .utx
situs_interretialis:
referens_officinale:
- <http://www.aamt.info/english/utx/>
vicipaedia:
- <https://en.wikipedia.org/wiki/Universal_Terminology_eXchange>
normam:
- <https://aamt.info/wp-content/uploads/2019/06/utx1.20-specification-e.pdf>
- <https://aamt.info/wp-content/uploads/2019/06/utx1.20-specification-e.docx>
nomen:
eng-Latn: 'Universal Terminology eXchange (UTX) (working draft)'
exemplum:
- <https://aamt.info/english/download/#UTX_Glossaries>
- <https://docs.google.com/spreadsheets/d/13gBCYAd3tbty10W2W6GX9rA-3oT2LnI2u0UG1dpIEkI/pubhtml?widget=true&headers=false>
- <https://docs.google.com/spreadsheets/d/1YONjd_pb7iXJvGlYBeFtZCTCUrDxRfbvKf-OLbCw3LE/pubhtml?widget=true&headers=false>
- <https://aamt.info/wp-content/uploads/2019/06/yakushite-soccer-ej-utx1.20.utx>
asa:
modus_operandi:
- multiplum_linguam
- bilingue
formatum:
# TODO: formatum.initiale should have access to at least the first 10
# data lines. This would also allow some inferences without requiring
# everyone to write them in raw Python code (id est, Liquid filters).
initiale: |-
#UTX 1.20; directionality: {{ glossarium.translationem_directionem | default: 'multi' }};
# @TODO: implement the quotum_lineam filter. At the moment we have
# blank line at the end
# [{{ item[1].statum }}]
corporeum: |-
{%- if tabulam.lineam_indicem contains 1 -%}
#{%- for item in rem.de_linguam -%}
term:{{ item[1].bcp47 | quotum_rem }},
{%- endfor %}
{% endif -%}
{%- for item in rem.de_linguam -%}
{% if item[1].rem != '' -%}
{{- item[1].rem | quotum_rem -}},
{%- endif -%}
{%- endfor -%}
finale: false
# end::normam_UTX[]
#### XML: XML ___________________________
# tag::normam_XML[]
XML:
__meta:
archivum:
extensionem: .hxltm.xml
descriptionem: |
_[eng-Latn]
The .hxltm.xml format, named 'XML Glōssārium', is an example of a
multilingual glossary exported to XML that can be imported back.
With the help of the reference CLI tool hxltmdexml,
ontologia:normam.XML.de_xml describes how to convert the XML file back
to an HXLTM CSV working file, which the reference CLI tool hxltmcli can
then process.
[eng-Latn]_
normam:
- <https://hdp.etica.ai/hxltm/archivum/#XML>
- <https://terminator.readthedocs.io/en/latest/_images/TBX_termEntry_structure.png>
nomen:
eng-Latn: 'XML Glōssārium (generic multilingual XML)'
situs_interretialis:
referens_officinale:
- <https://hdp.etica.ai/hxltm/archivum/#XML>
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
de_xml:
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: glossarium
glossarium_titulum: False
# II conceptum
conceptum_codicem:
signum: conceptum
de_attributum: _
trivium:
# de <glossarium> ad <conceptum>
- datum
# III linguam
linguam_codicem:
signum: linguam
de_attributum: _
trivium: []
linguam_linguam:
ad: 'XML-nodum-attributum:_'
de_signum: linguam
# signum: linguam
# de_attributum:
# type: reliabilityCode
trivium: []
# IV terminum
terminum_habendum_accuratum: True
terminum_habendum_type: True
terminum_habendum_multum: True
terminum_habendum_fontem: False
terminum_habendum_objectivum: False
terminum_accuratum:
ad: XML-nodum-textum
de_signum: accuratum
# de_attributum: False
trivium: []
terminum_valorem:
signum: rem # 'lat-Latn' ad <conceptum _="L10N_ego_codicem"><linguam _="lat-Latn"><terminum><rem>lat-Latn</rem>
# de_attributum: False
trivium:
- terminum
# @TODO: make the Latin terms, like <glossarium>, <caput> and <datum>,
# configurable, so users could bootstrap versions even in
# non-Latin scripts.
# Why allow this? The answer would be: why not?
formatum:
# I glossarium > II conceptum > III linguam > IV terminum
__:
___: Latium
# glōssārium, https://en.wiktionary.org/wiki/glossarium#Latin
glossarium: glossarium
# datum, https://en.wiktionary.org/wiki/datum#Latin
datum: datum
# caput, https://en.wiktionary.org/wiki/caput#Latin
caput: caput
# conceptum, https://en.wiktionary.org/wiki/conceptus#Latin
conceptum: conceptum
# dēfīnītiōnem, https://en.wiktionary.org/wiki/definitio#Latin
definitionem: definitionem
# contextum, https://en.wiktionary.org/wiki/contextus#Latin
contextum: contextum
# titulum, https://en.wiktionary.org/wiki/titulus#Latin
titulum: titulum
# linguam, https://en.wiktionary.org/wiki/lingua#Latin
linguam: linguam
# librārium, https://en.wiktionary.org/wiki/librarium#Latin
librarium: librarium
# partem ōrātiōnis, https://en.wiktionary.org/wiki/pars_orationis#Latin
partem_orationis: partem_orationis
# terminum, https://en.wiktionary.org/wiki/terminus#Latin
terminum: terminum
# fontem, https://en.wiktionary.org/wiki/fons#Latin
fontem: fontem
# objectīvum, https://en.wiktionary.org/wiki/objectivus#Latin
objectivum: objectivum
rem: rem
# de-textum: de-textum
initiale: |2-
<glossarium _="Latium" __="hxltmcli.py" ___="{{ globum.instrumentum_versionem }}">
<caput>
<titulum>{{- globum.glossarium_titulum -}}</titulum>
</caput>
<datum>
<!-- _[eng-Latn]@TODO Use librarium as XLIFF file / Excel worksheets. The "_" is a script-neutral (non-Latin) default ID [eng-Latn]_ -->
<librarium _="_">
corporeum: |2
<conceptum _="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem }}">
{%- for item in rem.de_linguam %}
{% if item[1].rem != '' -%}
<linguam _="{{ item[1].linguam }}">
<definitionem></definitionem>
<terminum>
{% if item[1].accuratum -%}
<accuratum>{{- item[1].accuratum -}}</accuratum>
{%- endif %}
<rem>{{- item[1].rem -}}</rem>
</terminum>
<!-- <terminum-fontem></terminum-fontem> -->
<!-- <terminum-objectivum></terminum-objectivum> -->
</linguam>
{%- endif -%}
{%- endfor %}
</conceptum>
finale: |2
</librarium>
</datum>
</glossarium>
# end::normam_XML[]
#### XLIFF: XML Localization Interchange File Format (XLIFF) v2.1 ____________
# tag::normam_XLIFF[]
# @TODO: JLIFF (XLIFF on JSON) <https://github.com/oasis-tcs/xliff-omos-jliff>
XLIFF:
__meta:
archivum:
extensionem: .xlf
situs_interretialis:
referens_officinale:
- <https://www.oasis-open.org/committees/xliff/>
vicipaedia:
- <https://en.wikipedia.org/wiki/XLIFF>
exemplum:
- <https://github.com/oasis-tcs/xliff-xliff-22>
- <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/allExtensions.xlf>
- <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/everything-core.xlf>
normam:
- <https://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html>
# - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/>
# @see <https://github.com/redhat-developer/vscode-xml/wiki/XMLValidation#XML-catalog-with-XSD>
# @see <https://github.com/redhat-developer/vscode-xml/issues/315>
- <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml>
nomen:
eng-Latn: 'XML Localization Interchange File Format (XLIFF) v2.1'
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
de_xml:
# This is a working draft
# @see https://terminator.readthedocs.io/en/latest/tbx_conformance.html
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: xliff
# Exemplum I: <xliff version="2.0">
# Exemplum II: <xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0">
glossarium_titulum: False
# II conceptum
conceptum_codicem:
signum: unit
de_attributum: id
trivium:
# de <xliff> ad <unit>
- file
# III linguam
linguam_codicem: False # XLIFF est bilingue
linguam_fontem_codicem:
# Exemplum: 'pt' ad '<source xml:lang="pt">por-Latn</source>'
signum: source
de_attributum: lang
trivium: []
linguam_objectivum_codicem:
# Exemplum: 'es' ad '<target xml:lang="es">spa-Latn</target>'
signum: target
de_attributum: lang
trivium: []
# IV terminum
terminum_accuratum: False # XLIFF terminum habendum accuratum? Falsum
terminum_multum: False # XLIFF est bilingue
terminum_habendum_fontem: True
terminum_habendum_objectivum: True
terminum_fontem_valorem:
# Exemplum: 'por-Latn' ad <source xml:lang="pt">por-Latn</source>
signum: source
# de_attributum: False
trivium: []
terminum_objectivum_valorem:
# Exemplum: 'spa-Latn' ad <target xml:lang="es">spa-Latn</target>
signum: target
# de_attributum: False
trivium: []
formatum:
# @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml
# @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/xliff_core_2.0.xsd
initiale: |2
<?xml version="1.0"?>
<xliff version="2.0"
xmlns="urn:oasis:names:tc:xliff:document:2.0"
xmlns:fs="urn:oasis:names:tc:xliff:fs:2.0"
xmlns:val="urn:oasis:names:tc:xliff:validation:2.0"
srcLang="{{ globum.fontem_linguam.bcp47 | default: 'la' }}"
trgLang="{{ globum.objectivum_linguam.bcp47 | default: 'ar' }}">
<file id="f1">
corporeum: |2
{% if rem.de_fontem_linguam -%}
<unit id="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: '*', '' | replace: ' ', '' | replace: '/', '' }}">
{% if rem.de_auxilium_linguam or rem.de_nomen_breve.referens_situs_interretialis.size > 0 -%}
<notes>
{%- for item in rem.de_auxilium_linguam -%}
<note appliesTo="source" priority="3"
category="de_auxilium_linguam">
_[{{- item.linguam -}}]
{{- item.rem -}}
[{{- item.linguam -}}]_
</note>
{%- endfor %}
{% for item in rem.de_nomen_breve.referens_situs_interretialis -%}
<note appliesTo="source" priority="1"
category="referens_situs_interretialis">
{{- item -}}
</note>
{% endfor -%}
</notes>
{% else -%}
<!--
non rem.de_auxilium_linguam aut rem.de_nomen_breve.referens_situs_interretialis
-->
{% endif -%}
<segment state="{{ rem.de_objectivum_linguam.codicem_XLIFF | default: 'initial' }}">
<source>{{- rem.de_fontem_linguam.rem -}}</source>
{%- if rem.de_objectivum_linguam and rem.de_objectivum_linguam.rem != '' %}
<target>{{- rem.de_objectivum_linguam.rem -}}</target>
{%- else %}
<!-- non rem.de_objectivum_linguam -->
{%- endif %}
</segment>
</unit>
{%- else -%}
<!-- non rem.de_fontem_linguam -->
{%- endif %}
# <!-- {{ rem }} -->
finale: |2
</file>
</xliff>
# end::normam_XLIFF[]
#### XLIFF-obsoletum: XML Localization Interchange File Format (XLIFF) v1.2 __
# tag::normam_XLIFF-obsoletum[]
XLIFF-obsoletum:
__meta:
archivum:
extensionem: .xlf
situs_interretialis:
referens_officinale:
- <https://www.oasis-open.org/committees/xliff/>
vicipaedia:
- <https://en.wikipedia.org/wiki/XLIFF>
normam:
- <https://docs.oasis-open.org/xliff/xliff-core/xliff-core.html>
- <https://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd>
- <http://docs.oasis-open.org/xliff/v1.2/cs02/xliff-core-1.2-transitional.xsd>
nomen:
eng-Latn: 'XML Localization Interchange File Format (XLIFF) v1.2'
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
de_xml:
# This is a working draft
# @see https://terminator.readthedocs.io/en/latest/tbx_conformance.html
# ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
glossarium_radicem:
signum: xliff
# Exemplum I: <xliff version="1.2">
# Exemplum II: <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
glossarium_titulum: False
# II conceptum
conceptum_codicem:
signum: trans-unit
de_attributum: id
trivium:
# de <xliff> ad <trans-unit>
- file
# III linguam
linguam_codicem: False # XLIFF-obsoletum est bilingue
linguam_fontem_codicem:
# Exemplum: 'pt' ad '<source xml:lang="pt">por-Latn</source>'
signum: source
de_attributum: lang
trivium: []
linguam_objectivum_codicem:
# Exemplum: 'es' ad '<target xml:lang="es">spa-Latn</target>'
signum: target
de_attributum: lang
trivium: []
# IV terminum
terminum_habendum_accuratum: False # XLIFF terminum habendum accuratum? Falsum
terminum_habendum_multum: False # XLIFF-obsoletum est bilingue
terminum_habendum_fontem: True
terminum_habendum_objectivum: True
terminum_fontem_valorem:
# Exemplum: 'por-Latn' ad <source xml:lang="pt">por-Latn</source>
signum: source
# de_attributum: False
trivium: []
terminum_objectivum_valorem:
# Exemplum: 'spa-Latn' ad <target xml:lang="es">spa-Latn</target>
signum: target
# de_attributum: False
trivium: []
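# Exemplum (eng-Latn): a sketch of what formatum.corporeum below would likely
# render for one row whose rem.de_objectivum_linguam.codicem_XLIFF is 'final'.
# The id, language codes and texts are hypothetical placeholders; whitespace is
# simplified and the note elements and fallback comments are omitted.
#   <trans-unit id="L10N_ego_codicem" translate="no" approved="yes">
#     <source xml:lang="pt">por-Latn</source>
#     <target xml:lang="es" state="final" state-qualifier="id-match">spa-Latn</target>
#   </trans-unit>
# With codicem_XLIFF 'initial' (or empty), the template instead emits
# state="new" and none of the extra attributes.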
formatum:
initiale: |2
<?xml version="1.0"?>
<xliff version="1.2"
xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file
source-language="{{ globum.fontem_linguam.bcp47 | default: 'la' }}"
target-language="{{ globum.objectivum_linguam.bcp47 | default: 'ar' }}"
datatype="plaintext"
original="exemplum.ext">
<body>
corporeum: |2
{% if rem.de_fontem_linguam and rem.de_fontem_linguam.rem != '' -%}
{%- comment -%}
_[eng-Latn]
Since we're targeting XLIFF 2.1 and this is XLIFF 1.2, we hardcode
some variables directly in this Liquid template instead of spending time
doing it in the Python code.
If this template seems ugly, uglier still are the tools that, as of 2021,
still have shitty support for XLIFF 2.X.
state: http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#state
state-qualifier: http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#state-qualifier
[eng-Latn]_
{%- endcomment -%}
{% capture rem_fontem_linguam_attr -%}
xml:lang="{{ rem.de_fontem_linguam.bcp47 }}"
{%- endcapture -%}
{% if rem.de_objectivum_linguam.rem and rem.de_objectivum_linguam.rem != '' %}
{% capture XLIFF_obsoletum_state -%}
{{ rem.de_objectivum_linguam.codicem_XLIFF | default: 'new' }}
{%- endcapture -%}
{% capture rem_objectivum_linguam_attr -%}
xml:lang="{{ rem.de_objectivum_linguam.bcp47 }}"
{%- endcapture -%}
{%- if XLIFF_obsoletum_state == 'initial' -%}
{%- assign XLIFF_obsoletum_state = 'new' -%}
{%- assign XLIFF_obsoletum_approved = 'no' -%}
{%- assign XLIFF_obsoletum_statequalifier_attr = '' -%}
{%- assign XLIFF_obsoletum_approved_attr = '' -%}
{%- endif -%}
{%- if XLIFF_obsoletum_state == 'final' -%}
{%- assign XLIFF_obsoletum_state = 'final' -%}
{%- assign XLIFF_obsoletum_approved = 'yes' -%}
{%- assign XLIFF_obsoletum_translatesource_attr = ' translate="no"' -%}
{%- assign XLIFF_obsoletum_statequalifier_attr = ' state-qualifier="id-match"' -%}
{%- assign XLIFF_obsoletum_approved_attr = ' approved="yes"' -%}
{%- endif -%}
{%- endif -%}
<trans-unit id="{{- conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: '*', '' | replace: ' ', '' | replace: '/', '' -}}"
{{- XLIFF_obsoletum_translatesource_attr -}}
{{- XLIFF_obsoletum_approved_attr -}}>
<source {{ rem_fontem_linguam_attr }}>
{{- rem.de_fontem_linguam.rem -}}
</source>
{%- if rem.de_objectivum_linguam.rem and rem.de_objectivum_linguam.rem != '' %}
<target {{ rem_objectivum_linguam_attr }} state="{{ XLIFF_obsoletum_state }}"{{ XLIFF_obsoletum_statequalifier_attr }}>
{{- rem.de_objectivum_linguam.rem -}}
</target>
{%- else -%}
<!-- non rem.de_objectivum_linguam.rem -->
{%- endif %}
{% if rem.de_auxilium_linguam %}
{%- for item in rem.de_auxilium_linguam -%}
{%- if item.rem -%}
<note annotates="source" priority="2">
_[{{- item.linguam -}}]{{- item.rem -}}[{{- item.linguam -}}]_
</note>
{%- endif -%}
{%- endfor %}
{%- else -%}
<!--
non rem.de_auxilium_linguam
-->
{%- endif %}
{% if rem.de_nomen_breve.referens_situs_interretialis.size > 0 -%}
{% for item in rem.de_nomen_breve.referens_situs_interretialis -%}
<note annotates="source" priority="1"
from="referens_situs_interretialis">
{{- item -}}
</note>
{%- endfor %}
{%- else -%}
<!--
non rem.de_nomen_breve.referens_situs_interretialis
-->
{%- endif %}
</trans-unit>
{%- else -%}
<!-- non rem.de_fontem_linguam -->
{%- endif %}
finale: |2
</body>
</file>
</xliff>
# end::normam_XLIFF-obsoletum[]
# tag::normam_XLSX[]
#### XLSX, Google Sheets ____________________________________________________
# @see https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
# @see https://support.google.com/drive/answer/37603
XLSX:
__meta:
archivum:
extensionem: .xlsx
descriptionem: |
_[eng-Latn]
Both Google Sheets URLs and local or remote Microsoft Excel files have
built-in read-only support in the reference CLI implementation as
data source containers, with no intermediate conversion
to the HXLTM CSV container. This means humans don't need to edit CSV
files directly.
Support in `hxltmcli` for writing directly to Google Sheets and
Microsoft Excel is unlikely to be implemented.
[eng-Latn]_
nomen:
eng-Latn: 'Microsoft Excel, HXLTM container (read-only; native support as data source)'
# eng-Latn: 'Microsoft Excel (native support to read, but not write, data directly with .XLSX)'
asa:
modus_operandi:
- multiplum_linguam
# - bilingue
# end::normam_XLSX[]
#### YAML: (draft) ___________________________________________________________
# tag::normam_YAML[]
YAML:
__meta:
archivum:
extensionem: .yml
normam:
# Not sure where to find some place to 'explain' this format
- <https://guides.rubyonrails.org/i18n.html>
nomen:
eng-Latn: 'YAML (planned, but no draft)'
situs_interretialis:
referens_officinale:
- <https://guides.rubyonrails.org/i18n.html>
exemplum:
- <https://github.com/i18next/react-i18next/blob/master/example/react/public/locales/de/translation.json>
asa:
modus_operandi:
# - multiplum_linguam
- bilingue
# end::normam_YAML[]
# end::normam[]
# tag::normam_excerptum[]
formatum_excerptum:
# Trivia:
# - fōrmātum, https://en.wiktionary.org/wiki/formatus#Latin
# - excerptum, https://en.wiktionary.org/wiki/excerptus#Latin
# - "liquid template"
# - https://shopify.github.io/liquid/
# - https://github.com/jg-rp/liquid#quick-start
exemplum: |
{{ #_1 }},{{ #_2 }}
# end::normam_excerptum[]
# tag::ontologia[]
libellam:
glossarium:
conceptum:
_TBX: entry-level
linguam:
_TBX: language-level
terminum:
_TBX: term-level
# rem:
# _TBX: term-level
# Trivia: ontologia, https://la.wikipedia.org/wiki/Ontologia
ontologia:
# Trivia: commūne, https://en.wiktionary.org/wiki/communis#Latin
commune:
# Trivia: conceptum, https://en.wiktionary.org/wiki/conceptus#Latin
conceptum:
# Trivia:
# - accūrātum, https://en.wiktionary.org/wiki/accuratus
# - reliabilityCode, https://iate.europa.eu/fields-explained
# - reliabilityCode, https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
accuratum:
__HXL: '#status conceptum accuratum'
__nomen_breve: 'accuratum' # __nomen_breve __libellam : 'conceptum.accuratum'
__id: ontologia.commune.conceptum.accuratum
__libellam: conceptum
__valorem_optionem: "ontologia_aliud.accuratum"
__valorem_maximum: 10
__valorem_minimum: 0
__valorem_typum: numerum
_TBX: &referens_ontologia-commune-conceptum-accuratum-_TBX
# @see https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
_nomen: reliabilityCode
_descriptionem: |
A code assigned to a data-category or record indicating accuracy
and/or completeness. The content of the <descrip> element when it
has a type attribute value of 'reliabilityCode' shall be a value
from 1 (least reliable) to 10 (most reliable).
_xml: <descrip type='reliabilityCode'>__valorem__</descrip>
# Trivia: cōdicem, https://en.wiktionary.org/wiki/codex#Latin
codicem:
__HXL: '#item conceptum codicem'
__nomen_breve: 'codicem' # conceptum.codicem
__id: ontologia.commune.conceptum.codicem
__libellam: conceptum
_XLIFF:
__HXL_bilingue: '#x_xliff unit id'
# Trivia: dēprecātum, https://en.wiktionary.org/wiki/deprecatus#Latin
deprecatum:
__HXL: '#meta conceptum codicem deprecatum'
__nomen_breve: 'codicem_deprecatum' # conceptum.codicem_deprecatum
__id: ontologia.commune.conceptum.codicem.deprecatum
__libellam: conceptum
__valorem_typum: compactum_textum
_exemplum:
- exemplum_codicem_a|exemplum_codicem_b|123456
# Trivia: alternātīvum, https://en.wiktionary.org/wiki/alternativus#Latin
alternativum:
__HXL: '#meta conceptum codicem alternativum'
__nomen_breve: 'codicem_alternativum' # conceptum.codicem_alternativum
__id: ontologia.commune.conceptum.codicem.alternativum
__libellam: conceptum
__valorem_typum: compactum_textum
_exemplum:
- Q1065|UNTERM5f40d95f1d17bf8c85256a01000080af|IATE787725
# ontologia.extensionem.conceptum.codicem.iate
# ontologia.extensionem.conceptum.codicem.unterm
# ontologia.extensionem.conceptum.codicem.wikidata
# (...)
dominium:
__HXL: '#item conceptum dominium' # Always a list
__nomen_breve: 'dominium' # conceptum.dominium
__id: ontologia.commune.conceptum.dominium
__libellam: conceptum
# See also
# - https://iate.europa.eu/developers
# - https://iate.europa.eu/em-api/domains/_tree?pretty=true
_TBX:
_id: DC-489
_descriptionem: |
Refers to a location in the corpus, such as a software application
user interface, product packaging, or an industrial process, where
the term frequently occurs. A sample sentence that contains
the term.
_nomen: Subject field
_level: Concept
_xml: <descrip type='subjectField'>
_TMX:
_descriptionem: |
Property - The <prop> element is used to define the various
properties of the parent element (or of the document when
<prop> is used in the <header> element).
These properties are not defined by the standard.
It is the responsibility of each tool provider to publish the
types and values of the properties it uses.
If the tool exports unpublished properties types, their
values should begin with the prefix "x-".
Example:
<prop type='user-defined'>name:domain value:Computer science</prop>
<prop type='x-domain'>Computer science</prop>
_UTX:
_nomen: domain property
_descriptionem: |
The domain property is a text string that indicates the domain of
the glossary. This property is used when you need to group multiple
glossaries into a domain. If you use glossary IDs as domain names,
the domain property is not necessary.
Example
domain: Aerospace
_XLIFF:
# (We will not implement deeper levels than 0 now)
__HXL_bilingue: '#x_xliff group group_0'
compactum_json:
__HXL: '#meta conceptum json'
__id: ontologia.commune.conceptum.compactum_json
# # Better not provide a generic compactum_textum for conceptum
# compactum_textum:
# __HXL: '#meta conceptum textum'
# __id: ontologia.commune.conceptum.compactum_textum
# meta:
# __HXL: '#meta conceptum'
# __id: ontologia.commune.conceptum.meta
# # Note: this field is unlikely to appear as-is in spreadsheets
# # edited by humans, but it is a catch-all for metadata related
# # to a concept that has no specialized field.
# # Recommended data type: JSON
# Trivia: referēns, https://en.wiktionary.org/wiki/referens
referens:
# Situs interretialis
situs_interretialis:
__HXL: '#meta item url list'
__nomen_breve: 'referens_situs_interretialis' # conceptum.referens_situs_interretialis
__id: ontologia.commune.conceptum.referens.situs_interretialis
__libellam: conceptum
# TODO: TBXBasic 6.17 Subject field, <descrip type="subjectField">
# Trivia: "typum", https://en.wiktionary.org/wiki/typus#Latin
typum:
__HXL: '#item conceptum typum'
# @TODO: This may need to move elsewhere, since it is term level,
# not concept level.
__nomen_breve: 'typum' # rem.typum
__id: ontologia.commune.conceptum.typum
__libellam: rem
_TBX: &referens_ontologia-commune-rem-typum-__linguam__-_TBX
_id: DC-2677
_nomen: Term type
_xml: <termNote type='termType'>
_descriptionem: |
Permissible values and their ISOcat PIDs are as follows:
Value ISOcat PID
fullForm www.isocat.org/datcat/DC-321
acronym www.isocat.org/datcat/DC-334
abbreviation www.isocat.org/datcat/DC-331
shortForm www.isocat.org/datcat/DC-332
variant www.isocat.org/datcat/DC-330
phrase www.isocat.org/datcat/DC-339
_XLIFF:
# temporary hashtag. Needs better naming
__HXL_bilingue: '#x_xliff unit note note_category__termtype'
# Trivia: rem, https://en.wiktionary.org/wiki/res#Latin
rem:
# Trivia: linguam, https://en.wiktionary.org/wiki/lingua#Latin
__linguam__:
# Exemplum: '#item rem i_en i_eng is_Latn'
# TODO: invert __HXL with __HXL_deprecatum
__HXL: '#item rem __linguam__'
__HXL_deprecatum: ['#item terminum __linguam__ rem']
__nomen_breve: 'rem__L__' # rem.rem__L__
__id: ontologia.commune.rem.__linguam__
__libellam: rem
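# Exemplum (eng-Latn): a sketch of how the __linguam__ placeholder in __HXL
# might expand, following the 'i_en i_eng is_Latn' pattern of the Exemplum
# above. The por-Latn and spa-Latn rows are assumptions built by analogy
# (ISO 639-1 code, ISO 639-3 code, ISO 15924 script), not tested output.
#   eng-Latn: '#item rem i_en i_eng is_Latn'
#   por-Latn: '#item rem i_pt i_por is_Latn'
#   spa-Latn: '#item rem i_es i_spa is_Latn'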
_TBX: &referens_ontologia-commune-rem-__linguam__-_TBX
_id: DC-1823
_nomen: Term
_descriptionem: |
Refers to a location in the corpus, such as a software application
user interface, product packaging, or an industrial process, where
the term frequently occurs. A sample sentence that contains
the term.
_xml: '<term>'
_UTX: &referens_ontologia-commune-rem-__linguam__-_UTX
_nomen: Term
_descriptionem: |
A term is a headword of either the source or target language(s).
A term in a UTX glossary should be in the basic form of the word
such as a headword in a dictionary.
See also "7. Appendix A: UTX content guidelines."
Note: Term definitions are optional in a UTX glossary.
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff source __linguam__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__altsource1 __linguam__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__altsource2 __linguam__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__altsource3 __linguam__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__altsource4 __linguam__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__altsource5 __linguam__'
__HXL_objectivum: '#x_xliff target __linguam__'
# Trivia:
# - linguam, https://en.wiktionary.org/wiki/lingua#Latin
# - dē, https://en.wiktionary.org/wiki/de#Latin
# - imperium, https://en.wiktionary.org/wiki/imperium#Latin
__linguam_de_imperium__:
# Exemplum: '#item rem i_en i_eng is_Latn TODO_THINK_ABOUT_HXLTAG'
__HXL: '#item rem __linguam_de_imperium__'
__nomen_breve: 'rem__L_I__' # rem.rem__L_I__
__id: ontologia.commune.rem.__linguam_de_imperium__
__libellam: rem
_TBX: *referens_ontologia-commune-rem-__linguam__-_TBX
_UTX: *referens_ontologia-commune-rem-__linguam__-_UTX
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff source __linguam_de_imperium__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__altsource1 __linguam_de_imperium__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__altsource2 __linguam_de_imperium__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__altsource3 __linguam_de_imperium__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__altsource4 __linguam_de_imperium__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__altsource5 __linguam_de_imperium__'
__HXL_objectivum: '#x_xliff target __linguam_de_imperium__'
accuratum:
__linguam__:
__HXL: '#status rem accuratum __linguam__'
__nomen_breve: 'accuratum__L__' # rem.accuratum__L__
__id: ontologia.commune.rem.__linguam__.accuratum
__libellam: rem
__valorem_optionem: "ontologia_aliud.accuratum"
__valorem_maximum: 10
__valorem_minimum: 0
__valorem_typum: numerum
_TBX: *referens_ontologia-commune-conceptum-accuratum-_TBX
__linguam_de_imperium__:
__HXL: '#status rem accuratum __linguam_de_imperium__'
__nomen_breve: 'accuratum__L_I__' # rem.accuratum__L_I__
__id: ontologia.commune.rem.accuratum.__linguam_de_imperium__
__libellam: rem
__valorem_optionem: "ontologia_aliud.accuratum"
__valorem_maximum: 10
__valorem_minimum: 0
__valorem_typum: numerum
_TBX: *referens_ontologia-commune-conceptum-accuratum-_TBX
compactum_json:
__linguam__:
__HXL: '#meta rem json __linguam__'
__nomen_breve: 'json__L__' # rem.json__L__
__id: ontologia.commune.rem.compactum_json.__linguam__
__libellam: rem
__linguam_de_imperium__:
__HXL: '#meta rem json __linguam_de_imperium__'
__nomen_breve: 'json__L_I__' # rem.json__L_I__
__id: ontologia.commune.rem.compactum_json.__linguam_de_imperium__
__libellam: rem
# Note: we will NOT provide a generic #meta rem, but will provide compactum_json
# compactum_textum:
# __linguam__:
# __HXL: '#meta rem textum __linguam__'
# __id: ontologia.commune.rem.compactum_textum.__linguam__
# - status
# - HXL hashtag #status, https://hxlstandard.org/standard/1-1final/dictionary/#tag_status
# - statum, https://en.wiktionary.org/wiki/status#Latin
statum:
compactum_json:
__linguam__:
__HXL: '#status rem json __linguam__'
__id: ontologia.commune.rem.status.compactum_json.__linguam__
__libellam: rem
__nomen_breve: 'statum_rem_json__L__'
__linguam_de_imperium__:
__HXL: '#status rem json __linguam_de_imperium__'
__id: ontologia.commune.rem.status.compactum_json.__linguam_de_imperium__
__libellam: rem
compactum_textum:
__linguam__:
__HXL: '#status rem textum __linguam__'
__id: ontologia.commune.rem.__linguam__.status.compactum_textum
__libellam: rem
__nomen_breve: 'statum_rem_textum__L__'
_XLIFF:
# _[eng-Latn]
# XLIFF assumes that the source text is already 'perfect', which
# is not true for HXLTM use cases that allow any language to be
# used as the source. So the alternative is to record the accuracy
# of the source only as metadata, and to use the target language
# value as the status (since previous content may already be
# translated).
# [eng-Latn]_
#
# __HXL_bilingue: ''
# TODO: better naming for __HXL_fontem
__HXL_fontem: '#x_xliff unit note note_category__sourcestatus __linguam__'
__HXL_objectivum: '#x_xliff segment state __linguam__'
# TODO: document in YAML how a program could 'transform' the text
# into what XLIFF expects. We already have aliases at the
# bottom of this file
__linguam_de_imperium__:
__HXL: '#status rem textum __linguam_de_imperium__'
__id: ontologia.commune.rem.compactum_textum.__linguam_de_imperium__
_XLIFF:
# __HXL_bilingue: ''
# TODO: better naming for __HXL_fontem
__HXL_fontem: '#x_xliff unit note note_category__sourcestatus __linguam_de_imperium__'
__HXL_objectivum: '#x_xliff segment state __linguam_de_imperium__'
# MOVED TO ontologia.glossarium
# # Trivia: contextum, https://en.wiktionary.org/wiki/contextus#Latin
# contextum:
# __linguam__:
# __HXL: '#item rem contextum __linguam__'
# __id: ontologia.commune.rem.contextum.__linguam__
# _TBX: &referens_ontologia-commune-rem-contextum-__linguam__-_TBX
# _id: DC-149
# _descriptionem: A sample sentence that contains the term.
# _xml: <descrip type='context'>
# __linguam_de_imperium__:
# __HXL: '#item rem contextum __linguam_de_imperium__'
# __id: ontologia.commune.rem.contextum.__linguam_de_imperium__
# _TBX: *referens_ontologia-commune-rem-contextum-__linguam__-_TBX
# MOVED TO ontologia.glossarium
# # Trivia: dēfīnītiōnem, https://en.wiktionary.org/wiki/definitio#Latin
# definitionem:
# __linguam__:
# __HXL: '#item rem definitionem __linguam__'
# __id: ontologia.commune.rem.definitionem.__linguam__
# _TBX: &referens_ontologia-commune-rem-definitionem-__linguam__-_TBX
# _id: DC-168
# _descriptionem:
# _nomen: Definition
# _level: Concept, Language
# _xml: <descrip type='definition'>
# __linguam_de_imperium__:
# __HXL: '#item rem definitionem __linguam_de_imperium__'
# __id: ontologia.commune.rem.definitionem.__linguam_de_imperium__
# _TBX: *referens_ontologia-commune-rem-definitionem-__linguam__-_TBX
# Trivia:
# - genus_grammaticum,
# - https://la.wikipedia.org/wiki/Genus_grammaticum
# - https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders
genus_grammaticum:
__linguam__:
__HXL: '#item rem genus_grammaticum __linguam__'
__id: ontologia.commune.rem.genus_grammaticum.__linguam__
_TBX: &referens_ontologia-commune-rem-genus_grammaticum-__linguam__-_TBX
_id: DC-245
_descriptionem: |
Picklist, with permissible values as follows:
• masculine
• feminine
• neuter
• other
_nomen: Gender
_level: Term
_xml: <termNote type='grammaticalGender'>
__linguam_de_imperium__:
__HXL: '#item rem genus_grammaticum __linguam_de_imperium__'
__id: ontologia.commune.rem.genus_grammaticum.__linguam_de_imperium__
_TBX: *referens_ontologia-commune-rem-genus_grammaticum-__linguam__-_TBX
# Trivia: annotātiōnem, https://en.wiktionary.org/wiki/annotatio#Latin
annotationem:
__linguam__:
__HXL: '#meta rem annotationem __linguam__'
__id: ontologia.commune.rem.annotationem.__linguam__
__valorem_typum: textum # annotationem Always free text
_TBX: &referens_ontologia-commune-rem-annotationem-__linguam__-_TBX
_id: DC-382
_descriptionem: |
Any kind of note, such as a usage note, explanation,
or instruction
_nomen: Note
_level: Concept, Language, Term
_xml: <note>
__linguam_de_imperium__:
__HXL: '#meta rem annotationem __linguam_de_imperium__'
__id: ontologia.commune.rem.annotationem.__linguam_de_imperium__
__valorem_typum: textum # annotationem Always free text
_TBX: *referens_ontologia-commune-rem-annotationem-__linguam__-_TBX
# Trivia: partem ōrātiōnis, https://en.wiktionary.org/wiki/pars_orationis#Latin
partem_orationis:
__linguam__:
__HXL: '#item rem partem_orationis __linguam__'
__id: ontologia.commune.rem.partem_orationis.__linguam__
_TBX: &referens_ontologia-commune-rem-partem_orationis-__linguam__-_TBX
_id: DC-396
_nomen: Part of speech
_descriptionem: |
Content type picklist
Permissible values and their ISOcat PIDs are as follows:
Value ISOcat PID
noun www.isocat.org/datcat/DC-1333
verb www.isocat.org/datcat/DC-1424
adjective www.isocat.org/datcat/DC-1230
adverb www.isocat.org/datcat/DC-1232
properNoun www.isocat.org/datcat/DC-384
other www.isocat.org/datcat/DC-4336
In TBX-Default, the data type for part of speech is plainText.
TBX-Basic's use of picklist is in compliance with TBX-Default
because picklist is more constrained than plainText.
The other value can be used for terms of the phrase type.
_xml: <termNote type='partOfSpeech'>
__linguam_de_imperium__:
__HXL: '#item rem partem_orationis __linguam_de_imperium__'
__id: ontologia.commune.rem.partem_orationis.__linguam_de_imperium__
_TBX: *referens_ontologia-commune-rem-partem_orationis-__linguam__-_TBX
# # TODO: TBXBasic 6.17 Subject field, <descrip type="subjectField">
# # Trivia: "typum", https://en.wiktionary.org/wiki/typus#Latin
# typum:
# __linguam__:
# __HXL: '#item rem typum __linguam__'
# __id: ontologia.commune.rem.typum.__linguam__
# _TBX: &referens_ontologia-commune-rem-typum-__linguam__-_TBX
# _id: DC-2677
# _nomen: Term type
# _xml: <termNote type='termType'>
# _descriptionem: |
# Permissible values and their ISOcat PIDs are as follows:
# Value ISOcat PID
# fullForm www.isocat.org/datcat/DC-321
# acronym www.isocat.org/datcat/DC-334
# abbreviation www.isocat.org/datcat/DC-331
# shortForm www.isocat.org/datcat/DC-332
# variant www.isocat.org/datcat/DC-330
# phrase www.isocat.org/datcat/DC-339
# __linguam_de_imperium__:
# __HXL: '#item rem typum __linguam_de_imperium__'
# __id: ontologia.commune.rem.typum.__linguam_de_imperium__
# _TBX: *referens_ontologia-commune-rem-typum-__linguam__-_TBX
# Trivia: glōssārium, https://en.wiktionary.org/wiki/glossarium
glossarium:
# Trivia: rem, https://en.wiktionary.org/wiki/res#Latin
rem:
# Trivia: contextum, https://en.wiktionary.org/wiki/contextus#Latin
contextum:
__linguam__:
__HXL: '#item rem contextum __linguam__'
# __nomen_breve: 'G_rem_contextum__L__'
__nomen_breve: 'contextum__L__' # linguam.contextum__L__
__id: ontologia.glossarium.rem.contextum.__linguam__
__libellam: linguam
_TBX: &referens_ontologia-commune-rem-contextum-__linguam__-_TBX
_id: DC-149
_descriptionem: A sample sentence that contains the term.
_xml: <descrip type='context'>
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff unit note note_category__context __linguam__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__contextalt1 __linguam__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__contextalt2 __linguam__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__contextalt3 __linguam__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__contextalt4 __linguam__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__contextalt5 __linguam__'
# __HXL_objectivum: ''
__linguam_de_imperium__:
__HXL: '#item rem contextum __linguam_de_imperium__'
# __nomen_breve: 'G_rem_contextum__L_I__'
__nomen_breve: 'contextum__L_I__' # linguam.contextum__L_I__
__id: ontologia.commune.rem.contextum.__linguam_de_imperium__
__libellam: linguam
_TBX: *referens_ontologia-commune-rem-contextum-__linguam__-_TBX
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff unit note note_category__context __linguam_de_imperium__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__contextalt1 __linguam_de_imperium__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__contextalt2 __linguam_de_imperium__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__contextalt3 __linguam_de_imperium__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__contextalt4 __linguam_de_imperium__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__contextalt5 __linguam_de_imperium__'
# __HXL_objectivum: ''
# Trivia: dēfīnītiōnem, https://en.wiktionary.org/wiki/definitio#Latin
definitionem:
__linguam__:
__HXL: '#item rem definitionem __linguam__'
# __nomen_breve: 'G_rem_definitionem__L__'
__nomen_breve: 'definitionem__L__' # linguam.definitionem__L__
__id: ontologia.glossarium.rem.definitionem.__linguam__
__libellam: linguam
_TBX: &referens_ontologia-commune-rem-definitionem-__linguam__-_TBX
_id: DC-168
_descriptionem:
_nomen: Definition
_level: Concept, Language
_xml: <descrip type='definition'>
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff unit note note_category__definition __linguam__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__definitionalt1 __linguam__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__definitionalt2 __linguam__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__definitionalt3 __linguam__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__definitionalt4 __linguam__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__definitionalt5 __linguam__'
# __HXL_objectivum: ''
__linguam_de_imperium__:
__HXL: '#item rem definitionem __linguam_de_imperium__'
# __nomen_breve: 'G_rem_definitionem__L_I__'
__nomen_breve: 'definitionem__L_I__' # linguam.definitionem__L_I__
__id: ontologia.commune.rem.definitionem.__linguam_de_imperium__
__libellam: linguam
_TBX: *referens_ontologia-commune-rem-definitionem-__linguam__-_TBX
_XLIFF:
# __HXL_bilingue: ''
__HXL_fontem: '#x_xliff unit note note_category__definition __linguam_de_imperium__'
__HXL_fontem_alternativum_I: '#x_xliff unit note note_category__definitionalt1 __linguam_de_imperium__'
__HXL_fontem_alternativum_II: '#x_xliff unit note note_category__definitionalt2 __linguam_de_imperium__'
__HXL_fontem_alternativum_III: '#x_xliff unit note note_category__definitionalt3 __linguam_de_imperium__'
__HXL_fontem_alternativum_IV: '#x_xliff unit note note_category__definitionalt4 __linguam_de_imperium__'
__HXL_fontem_alternativum_V: '#x_xliff unit note note_category__definitionalt5 __linguam_de_imperium__'
# __HXL_objectivum: ''
# Trivia: extēnsiōnem, https://en.wiktionary.org/wiki/extensio#Latin
extensionem:
conceptum:
codicem:
iate:
__HXL: '#meta conceptum codicem iate'
__nomen_breve: 'codicem_iate' # conceptum.codicem_iate
__id: ontologia.extensionem.conceptum.codicem.iate
__libellam: conceptum
_url: https://iate.europa.eu/fields-explained
# _url_rem: ' ???? '
_compactum_json:
- #meta json conceptum
_id: ontologia.commune.conceptum.compactum_json
_JSONPath: codicem.iate
_exemplum:
{"codicem": {"iate": 787725}}
_compactum_textum:
- #meta conceptum codicem alternativum
_id: ontologia.commune.conceptum.codicem.alternativum
_praefixum: "IATE"
_suffixum: ""
_exemplum:
- IATE787725
_XLIFF:
__HXL_bilingue: '#x_xliff unit note note_category__iate'
unterm:
__HXL: '#meta conceptum codicem unterm'
__nomen_breve: 'codicem_unterm' # conceptum.codicem_unterm
__id: ontologia.extensionem.conceptum.codicem.unterm
__libellam: conceptum
_url: https://unterm.un.org/
_url_rem: 'https://unterm.un.org/unterm/display/record/unhq/na?OriginalId={{valorem}}'
_compactum_json:
- _id: ontologia.commune.conceptum.compactum_json
_JSONPath: codicem.unterm
#meta conceptum json
_exemplum:
{"codicem": {"unterm": "5f40d95f1d17bf8c85256a01000080af"}}
_compactum_textum:
#meta conceptum codicem alternativum
- _id: ontologia.commune.conceptum.codicem.alternativum
_praefixum: "UNTERM"
_suffixum: ""
_exemplum: UNTERM5f40d95f1d17bf8c85256a01000080af
_XLIFF:
__HXL_bilingue: '#x_xliff unit note note_category__unterm'
wikidata:
__HXL: '#meta conceptum codicem wikidata'
__nomen_breve: 'codicem_wikidata' # conceptum.codicem_wikidata
__id: ontologia.extensionem.conceptum.codicem.wikidata
__libellam: conceptum
_url: https://www.wikidata.org/
_url_rem: 'https://www.wikidata.org/wiki/{{valorem}}'
_compactum_json:
- #meta json conceptum
_id: ontologia.commune.conceptum.compactum_json
_JSONPath: codicem.wikidata
_exemplum:
{"codicem": {"wikidata": "Q1065"}}
_compactum_textum:
- #meta conceptum codicem alternativum
_id: ontologia.commune.conceptum.codicem.alternativum
_praefixum: "Q"
_suffixum: ""
_exemplum: Q1065
_XLIFF:
__HXL_bilingue: '#x_xliff unit note note_category__wikidata'
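# Exemplum (illustrative only; assumes {{valorem}} is replaced by the stored
# code exactly as written, which this file does not state explicitly):
# with valorem 'Q1065', the wikidata _url_rem template above would expand to
#   https://www.wikidata.org/wiki/Q1065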
# @see https://docs.google.com/spreadsheets/d/1couRYFuVLnr6CfIMEiXKBamJtmcHinSAy1K1e69rNqw/edit#gid=141644151
normam:
org_hxlstandard:
__HXL: '#meta conceptum normam normam_org_hxlstandard'
__nomen_breve: 'normam_org_hxlstandard'
__id: ontologia.extensionem.conceptum.normam.org_hxlstandard
__libellam: conceptum
_url: https://hxlstandard.org/
_compactum_json:
- #meta json conceptum
_id: ontologia.commune.conceptum.compactum_json
_JSONPath: normam.org_hxlstandard
_exemplum:
{"normam": {"org_hxlstandard": "#meta id"}}
# _compactum_textum:
# - #meta conceptum normam alternativum
# _id: ontologia.commune.conceptum.normam.alternativum
# _praefixum: ""
# _suffixum: ""
# _exemplum:
# -
# _XLIFF:
# __HXL_bilingue: '#x_xliff unit note note_category__iate'
# Trivia: https://www.wikidata.org/wiki/Q333761
tm:
# @see https://multifarious.filkin.com/2018/08/23/xliff-2-x-the-translators-panacea/
# @see https://blog.zingword.com/xliff-2-0-and-how-big-companies-are-preventing-translators-from-improving-their-lives-209e384ed8ea
# @see https://github.com/tingley/interoperability-now/wiki
# Trivia: conceptum, https://en.wiktionary.org/wiki/conceptus#Latin
conceptum:
# Trivia: trānslātiōnem, https://en.wiktionary.org/wiki/translatio#Latin
translationem:
# Trivia: dīrēctiōnem, https://en.wiktionary.org/wiki/directio#Latin
directionem:
__HXL: '#meta conceptum translationem directionem'
# __nomen_breve: 'TM_conceptum_translationem_directionem'
__nomen_breve: 'translationem_directionem' # conceptum.translationem_directionem
__id: ontologia.tm.conceptum.translationem.directionem
__libellam: conceptum
_compactum_json:
#meta json conceptum
_id: ontologia.commune.conceptum.compactum_json
_JSONPath: translationem.directionem
_exemplum:
{"translationem": {"directionem": []}}
_UTX: &referens_ontologia-commune-conceptum-meta-_UTX
_nomen: Translation direction
_descriptionem: |
The direction of translation from one language to another
(translation direction) can be unidirectional, bidirectional, or
multidirectional. This information can be specified with
"3.2.14 directionality property."
A unidirectional glossary is a glossary whose translation
direction is primarily one-way, i.e. from the source language to
the target language.
> Example: unidirectional bilingual Japanese-English UTX glossary
- Source language: Japanese, target language: English
- Primary translation direction: Japanese to English
- Note: Some terms in a unidirectional glossary may be exported
and used in the reverse direction in an ad-hoc manner.
This operation is called reverse-exporting. In this case, the
source language becomes the target language, and vice versa.
A reverse-exported unidirectional glossary may contain problems
because the consequence of the reversal may not be thoroughly
examined when compared with a full bidirectional glossary.
A bidirectional glossary is a glossary that is designed to be
used in two-way translation. Terms in one language can be
translated into another, and vice versa.
> Example: bidirectional bilingual Japanese-English UTX glossary
>
> - Language 1: Japanese, language 2: English
> - Translation direction: Japanese ⇔ English
> Example: bidirectional multilingual English-French-German
> glossary
>
> - Language 1: English, language 2: French, language 3: German
> - Translation direction: English ⇔ French, English ⇔ German
(but not French ⇔ German)
A multidirectional glossary is a type of multilingual glossary
that is designed to be used in any combination of languages in the
glossary.
> Example: multidirectional multilingual English-Japanese-Chinese
> glossary
>
> - Language 1: English, language 2: Japanese, language 3: Chinese
> - Translation direction: any combination of the above
# Trivia: rem, https://en.wiktionary.org/wiki/res#Latin
rem:
# Trivia: trānslātiōnem, https://en.wiktionary.org/wiki/translatio#Latin
translationem:
# Trivia: dīrēctiōnem, https://en.wiktionary.org/wiki/directio#Latin
directionem:
__linguam__:
# Exemplum: '#meta rem translationem directionem i_en i_eng is_Latn'
__HXL: '#meta rem translationem directionem __linguam__'
# __nomen_breve: 'TM_rem_translationem_directionem__L__'
__nomen_breve: 'translationem_directionem__L__' # conceptum.translationem_directionem__L__
__id: ontologia.tm.rem.translationem.directionem.__linguam__
__libellam: conceptum
_compactum_json:
#meta json rem __linguam__
_id: ontologia.commune.rem.__linguam__.compactum_json
_JSONPath: translationem.directionem
_exemplum:
{"translationem": {"directionem": []}}
_UTX: *referens_ontologia-commune-conceptum-meta-_UTX
__linguam_de_imperium__:
__HXL: '#item rem translationem directionem __linguam_de_imperium__'
# __nomen_breve: 'TM_rem_translationem_directionem__L_I__'
__nomen_breve: 'translationem_directionem__L_I__' # conceptum.translationem_directionem__L_I__
__id: ontologia.tm.rem.translationem.directionem.__linguam_de_imperium__
__libellam: conceptum
_compactum_json:
#meta json rem __linguam_de_imperium__
_id: ontologia.commune.rem.__linguam_de_imperium__.compactum_json
_JSONPath: translationem.directionem
_exemplum:
{"translationem": {"directionem": ""}}
_UTX: *referens_ontologia-commune-conceptum-meta-_UTX
# end::ontologia[]
# tag::ontologia_datum_typum[]
# Trivia:
# - ontologia, https://la.wikipedia.org/wiki/Ontologia
# - datum, https://en.wiktionary.org/wiki/datum#Latin
# - typum, https://en.wiktionary.org/wiki/typus#Latin
# - fōrmātum, https://en.wiktionary.org/wiki/formatus#Latin
# - normam, https://en.wiktionary.org/wiki/norma#Latin
# - digitum, https://en.wiktionary.org/wiki/digitus#Latin
# - digit, https://en.wiktionary.org/wiki/digit#English
# - 'Unicode Digit' (Unicode has several classes of numbers; digit is
# what it calls 0 1 2 3 4 5 6 7 8 9)
# - textum, https://en.wiktionary.org/wiki/textus#Latin
ontologia_datum_typum:
formatum:
compactum_json:
_normam: https://www.json.org/
_typum: textum
compactum_textum:
_descriptionem: |
_[eng-Latn] Note: compactum_textum is different from textum. While textum
is free-form text, compactum_textum is meant to be used as a human-editable
compact form of values that could be stored in the equivalent
compactum_json.
Most of the time this means creating controlled constants, documented in
cor.hxltm.yml->ontologia_aliud, to represent other values. So this
means that compactum_textum MUST be both editable by humans and
parseable by computers.
Also, when more than one constant is needed in compactum_textum, the
separator is the character "|".
Whitespace between the start and end of a term should be ignored.
Unknown values should be ignored. Errors should only stop processing
if the user asks for a stricter or debugging mode.
[eng-Latn]_
_typum: textum
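# Exemplum (hypothetical, for illustration only; not part of the upstream
# ontology). Reading the rules above, and assuming the whitespace rule refers
# to whitespace around each term, the compactum_textum value
#   ' lat_nomen_substantivum | TBX_noun | xyz_unknown '
# would be read as the constants 'lat_nomen_substantivum' and 'TBX_noun':
# the "|" separates the constants, the surrounding whitespace is ignored, and
# the unknown 'xyz_unknown' is skipped unless a stricter or debugging mode
# is requested.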
numerum:
_exemplum:
- "123456789"
- "0123456789ABCDEF"
- "0123456789ABCDEF"
- "Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ" # https://en.wikipedia.org/wiki/Numerals_in_Unicode#Roman_numerals
- "零 壹 貳 參 肆 伍 陸 柒 捌 玖" # https://en.wikipedia.org/wiki/Chinese_numerals
- "〇 一 二 三 四 五 六 七 八 九" # https://en.wikipedia.org/wiki/Chinese_numerals
# Several more examples at https://en.wikipedia.org/wiki/Numeral_system
_typum: textum
numerum_digitum:
_descriptionem: |
- https://en.wikipedia.org/wiki/Numerals_in_Unicode
_exemplum:
- "123456789"
_typum: numerum_digitum
textum:
_descriptionem: |
_[eng-Latn] Free text. Even line breaks are allowed.
The only hard requirement is a format that can be represented in CSV
(which means escaping characters).
DO NOT escape non-US-ASCII characters. This is annoying. Fix your
system to accept Unicode and, if necessary, only treat differently
control characters that are writing-system neutral.
[eng-Latn]_
typum:
numerum_digitum: {}
textum: {}
# end::ontologia_datum_typum[]
# tag::ontologia_aliud[]
# Trivia:
# - ontologia, https://la.wikipedia.org/wiki/Ontologia
# - aliud, https://en.wiktionary.org/wiki/alius#Latin, https://en.wiktionary.org/wiki/alias#English
ontologia_aliud:
# Trivia:
# - accūrātum, https://en.wiktionary.org/wiki/accuratus
# - reliabilityCode, https://iate.europa.eu/fields-explained
# - reliabilityCode, https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
#
# Term Base eXchange (TBX) 2008 CC-BY License ................................
# @see https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
#
# reliabilityCode:
# A code assigned to a data-category or record indicating accuracy and/or
# completeness. The content of the <descrip> element when it has a type
# attribute value of 'reliabilityCode' shall be a value from 1
# (least reliable) to 10 (most reliable).
#
# Interactive Terminology for Europe (IATE) ..................................
# @see https://iate.europa.eu/fields-explained
#
# Reliability code
# IATE uses four codes to indicate the reliability of terms:
#
# Nº Code Description Explanation
#
# 1 ★ Reliability Automatically assigned to terms entered by
# not verified non-native speakers. Also, all lookup forms have
# a reliability of one.
#
# 6 ★★ Minimum Automatically assigned to terms entered or updated
# reliability by native speakers.
#
# 9 ★★★ Reliable Manually assigned by a terminologist following a
# reliability assessment. Reliable terms should
# satisfy at least one of the following criteria:
# - having been obtained from a trusted source;
# - having been agreed on by a representative
# body of same-language terminologists;
# - being the common designation of the concept
# in its field.
#
# 10 ★★★★ Very reliable Manually assigned following a reliability
# assessment. Very reliable terms are:
# - well-established and widely accepted by
# experts as the correct designation, or
# - confirmed by a trusted and authoritative
# source, in particular a reliable written
# source.
accuratum:
"?":
# The '?' expresses what to do when the entire column does not exist, so
# it is not a particular value that is missing
_IATE_valorem_codicem: "★★"
_IATE_valorem_descriptionem: |
Automatically assigned to terms entered or updated by native speakers.
_IATE_valorem_nomen: "Minimum reliability"
_IATE_valorem_numerum: 6
"_":
# The '_' expresses what to do when the column does exist, but one particular
# item does not have any value
_IATE_valorem_codicem: "★"
_IATE_valorem_descriptionem: |
Automatically assigned to terms entered by non-native speakers.
Also, all lookup forms have a reliability of one.
_IATE_valorem_nomen: "Reliability not verified"
_IATE_valorem_numerum: 1
"0":
# The '0' expresses what to do when the column does exist, but one particular
# item evaluates to 0
_IATE_valorem_codicem: "★"
_IATE_valorem_descriptionem: |
Automatically assigned to terms entered by non-native speakers.
Also, all lookup forms have a reliability of one.
_IATE_valorem_nomen: "Reliability not verified"
_IATE_valorem_numerum: 1
"1":
_IATE_valorem_codicem: "★"
_IATE_valorem_descriptionem: |
Automatically assigned to terms entered by non-native speakers.
Also, all lookup forms have a reliability of one.
_IATE_valorem_nomen: "Reliability not verified"
_IATE_valorem_numerum: 1
"2": {}
"3": {}
"4": {}
"5": {}
"6":
_IATE_valorem_codicem: "★★"
_IATE_valorem_descriptionem: |
Automatically assigned to terms entered or updated by native speakers.
_IATE_valorem_nomen: "Minimum reliability"
_IATE_valorem_numerum: 6
"7": {}
"8": {}
"9":
_IATE_valorem_codicem: "★★★"
_IATE_valorem_descriptionem: |
Manually assigned by a terminologist following a reliability
assessment. Reliable terms should satisfy at least one of the
following criteria:
- having been obtained from a trusted source;
- having been agreed on by a representative body of same-language
terminologists;
- being the common designation of the concept in its field.
N.B. This code was automatically assigned to many entries, regardless
of their previous validation status, following the merger of existing
databases to create IATE. Therefore some entries marked as ‘reliable’
are not necessarily so.
_IATE_valorem_nomen: "Reliable"
_IATE_valorem_numerum: 9
"10":
_IATE_valorem_codicem: "★★★★"
_IATE_valorem_descriptionem: |
Manually assigned following a reliability assessment.
Very reliable terms are:
- well-established and widely accepted by experts as the correct
designation, or
- confirmed by a trusted and authoritative source, in particular a
reliable written source.
_IATE_valorem_nomen: "Very reliable"
_IATE_valorem_numerum: 10
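# Exemplum (illustrative sketch of how the accuratum keys above could be
# looked up; the lookup procedure itself is an assumption, only the values
# come from this file):
#   column absent              -> "?" -> _IATE_valorem_numerum: 6  (★★)
#   column present, cell empty -> "_" -> _IATE_valorem_numerum: 1  (★)
#   cell evaluates to 0        -> "0" -> _IATE_valorem_numerum: 1  (★)
#   cell value 9               -> "9" -> _IATE_valorem_numerum: 9  (★★★)
#   cell value 4               -> "4" -> {} (no IATE equivalent defined here)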
genus_grammaticum:
# TODO: Several languages have more than 3 genders, but only 6 are added to
# the Latin aliases. This needs improvement.
# @see https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders
### Lingua Latina ------------------------------------------------------------
# > https://la.wikipedia.org/wiki/Genus_grammaticum
# Genus grammaticum
# Genus grammaticum est aliqua proprietas sive nominis sive interdum etiam
# verbi. Genera circiter in quarta parte linguarum mundi distinguuntur.
# Divisiones principales sunt haec:
#
# - masculinum et femininum
# - (e.g. Francogallice, Hispanice, Hindi-Urdu, Arabice, Hebraice)
# - masculinum, femininum, neutrum
# - (e.g. Latine, Graece, Theodisce, Islandice, Sanscritice)
# - animatum, inanimatum
# - (e.g. Ojibwayense et probabiliter lingua Protoindoeuropaea)
# - commune, neutrum (e.g. Danice, Suecice).
lat_commune:
_aliud: 'TBX_other'
# _codicem: lat_commune
_codicem_TBX: TBX_other
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders#Common_and_neuter
codicem_lat: commune
lat_animatum:
_aliud: 'TBX_other'
# _codicem: lat_animatum
_codicem_TBX: TBX_other
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders#Animate_and_inanimate
codicem_lat: animatum
lat_femininum:
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders
codicem_lat: femininum
lat_inanimatum:
_aliud: 'TBX_other'
# _codicem: lat_inanimatum
_codicem_TBX: TBX_other
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders#Animate_and_inanimate
codicem_lat: inanimatum
lat_neutrum:
_aliud: 'TBX_neuter'
# _codicem: lat_neutrum
_codicem_TBX: TBX_neuter
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders#Common_and_neuter
codicem_lat: neutrum
lat_masculinum:
_aliud: 'TBX_masculine'
# _codicem: lat_masculinum
_codicem_TBX: TBX_masculine
_descriptionem: |
- https://la.wikipedia.org/wiki/Genus_grammaticum
- https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders
codicem_lat: masculinum
# TODO: Lingua latina, 'More than three grammatical genders'
# @see https://en.wikipedia.org/wiki/List_of_languages_by_type_of_grammatical_genders#More_than_three_grammatical_genders
# Term Base eXchange (TBX) 2008 CC-BY License ................................
# @see http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf
# @see https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
#
# 6.8 Gender
# Identifier www.isocat.org/datcat/DC-245
# XML representation <termNote type="grammaticalGender">
# Level Term
# Content type Picklist, with permissible values as follows:
# • masculine
# • feminine
# • neuter
# • other
#
# <termNoteSpec name="grammaticalGender" datcatId="ISO12620A-020202">
# <contents datatype="picklist" forTermComp="yes">
# masculine feminine neuter otherGender
# </contents>
# </termNoteSpec>
TBX_masculine:
codicem_TBX: masculine
TBX_feminine:
codicem_TBX: feminine
TBX_neuter:
codicem_TBX: neuter
TBX_other:
codicem_TBX: other
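# Exemplum (illustrative sketch; the resolution order is an assumption, not
# a documented hxltmcli behavior): a value stored as 'lat_masculinum' points,
# via _aliud, to 'TBX_masculine', whose codicem_TBX is 'masculine'. In a TBX
# export this would fill the element declared for genus_grammaticum (_xml):
#   <termNote type='grammaticalGender'>masculine</termNote>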
# Trivia: partem ōrātiōnis, https://en.wiktionary.org/wiki/pars_orationis#Latin
partem_orationis:
### Lingua Latina ---------------------------------------------------------
# @see https://en.wikipedia.org/wiki/Latin_grammar
# @see https://la.wikipedia.org/wiki/Grammatica_Latina
# @see http://www.butte.edu/departments/cas/tipsheets/grammar/parts_of_speech.html
# - Charlton T. Lewis and Charles Short, A new Latin Dictionary,
# New York/Oxford 1891 (1879)
# - https://archive.org/details/LewisAndShortANewLatinDictionary
lat_adverbium:
_aliud: 'TBX_adverb|UTX_adverb'
_codicem: lat_adverbium
_codicem_TBX: TBX_adverb
_codicem_UTX: UTX_adverb
_codicem_wikidata: Q380057 # https://www.wikidata.org/wiki/Q380057
_normam: https://la.wikipedia.org/wiki/Adverbium
codicem_lat: adverbium
lat_nomen_adiectivum:
_aliud: 'TBX_adjective|UTX_adjective'
_codicem: lat_nomen_adiectivum
_codicem_TBX: TBX_adjective
_codicem_UTX: UTX_adjective
_codicem_wikidata: Q34698 # https://www.wikidata.org/wiki/Q34698
_normam: https://la.wikipedia.org/wiki/Adiectivum
codicem_lat: nomen_adiectivum
# substantīvum
lat_nomen_substantivum:
_aliud: 'TBX_noun|UTX_noun'
_codicem: lat_nomen_substantivum
_codicem_TBX: TBX_noun
_codicem_UTX: UTX_noun
_codicem_wikidata: Q1084 # https://www.wikidata.org/wiki/Q1084
_normam: https://la.wikipedia.org/wiki/Nomen_substantivum
codicem_lat: nomen_substantivum
lat_nomen_proprium:
_aliud: 'TBX_properNoun|UTX_properNoun'
_codicem: lat_nomen_proprium
_codicem_TBX: TBX_properNoun
_codicem_UTX: UTX_properNoun
_codicem_wikidata: Q147276 # https://www.wikidata.org/wiki/Q147276
_normam: https://la.wikipedia.org/wiki/Nomen_proprium
codicem_lat: nomen_proprium
# TODO: UTX_prenominal, https://pt.wikipedia.org/wiki/Colocação_pronominal
lat_verbum:
_aliud: 'TBX_verb|UTX_verb'
_codicem: lat_verbum
_codicem_TBX: TBX_verb
_codicem_UTX: UTX_verb
_codicem_wikidata: Q24905 # https://www.wikidata.org/wiki/Q24905
_normam: https://la.wikipedia.org/wiki/Verbum_(temporale)
codicem_lat: verbum
# TODO: UTX_vt, Transitive verb, https://en.wikipedia.org/wiki/Transitive_verb
# TODO: UTX_vi, Intransitive verb, https://en.wikipedia.org/wiki/Intransitive_verb
# sentence, maybe https://en.wiktionary.org/wiki/sententia
# phrasem1, https://en.wiktionary.org/wiki/phrasis#Latin
# TODO: TBX_other
### ------------------------------------------------------------------------
# http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf
TBX_noun:
_codicem: DC-1333
codicem_TBX: noun
TBX_verb:
_codicem: DC-1424
codicem_TBX: verb
TBX_adjective:
_codicem: DC-1230
codicem_TBX: adjective
TBX_adverb:
_codicem: DC-1232
codicem_TBX: adverb
TBX_properNoun:
_codicem: DC-384
codicem_TBX: properNoun
TBX_other:
_codicem: DC-4336
codicem_TBX: other
### ------------------------------------------------------------------------
# https://aamt.info/wp-content/uploads/2019/06/utx1.20-specification-e.pdf
UTX_noun:
_nomen: Noun
codicem_UTX: noun
UTX_properNoun:
_nomen: Proper noun
codicem_UTX: properNoun
UTX_verb:
codicem_UTX: verb
UTX_vt:
_nomen: Transitive verb
codicem_UTX: vt
UTX_vi:
_nomen: Intransitive verb
codicem_UTX: vi
UTX_adjective:
codicem_UTX: adjective
UTX_prenominal:
codicem_UTX: prenominal
UTX_adverb:
codicem_UTX: adverb
UTX_sentence:
codicem_UTX: sentence
# Trivia:
# - rem, https://en.wiktionary.org/wiki/res#Latin
# - trānslātiōnem, https://en.wiktionary.org/wiki/translatio#Latin
# - status
# - HXL hashtag #status, https://hxlstandard.org/standard/1-1final/dictionary/#tag_status
# - statum, https://en.wiktionary.org/wiki/status#Latin
# translationem_status:
rem_statum:
### Lingua Latina ---------------------------------------------------------
# TODO: lingua latina
# Trivia:
# - rem, https://en.wiktionary.org/wiki/res#Latin
# - fīnāle, https://en.wiktionary.org/wiki/finalis#Latin
lat_rem_finale:
_aliud: 'TBX_preferred|UTX_approved|XLIFF_final'
codicem_lat: rem_finale
# Trivia: initiāle, https://en.wiktionary.org/wiki/initialis#Latin
lat_rem_initiale:
# TBX does not have a value for 'initial', but we can use accuratum for this
_aliud: 'UTX_provisional|XLIFF_initial'
codicem_lat: rem_initiale
# Trivia:
# - temporārium, https://en.wiktionary.org/wiki/temporarius#Latin
# - non, https://en.wiktionary.org/wiki/non#Latin
# - nātīvum, https://en.wiktionary.org/wiki/nativus#Latin
lat_rem_temporarium_de_non_nativum:
# TBX does not have a value for 'initial', but we can use accuratum for this
_aliud: 'UTX_provisional|XLIFF_initial'
codicem_lat: rem_temporarium_de_non_nativum
# Trivia:
# - temporārium, https://en.wiktionary.org/wiki/temporarius#Latin
lat_rem_temporarium:
# TBX does not have a value for 'initial', but we can use accuratum for this
_aliud: 'UTX_provisional|XLIFF_initial'
codicem_lat: rem_temporarium
# Trivia: vacuum, https://en.wiktionary.org/wiki/vacuus#Latin
lat_rem_vacuum:
_aliud: ''
codicem_lat: rem_vacuum
### TermBase eXchange (TBX) "TBX-Basic" 3.1 --------------------------------
# 3.1 7 Additional information about data categories
# The term type data category is optional. When a term has no term type
# value, it is assumed to be an ordinary entry term that is not an
# abbreviation or a variant of another term or an abbreviation of
# another full form term.
TBX_preferred:
_codicem: DC-72
_codicem_isocat: preferredTerm‐admn‐sts
_descriptionem: |
The term that, among a set of synonymous terms, is most recommended
for use.
codicem_TBX: preferred
TBX_admitted:
_codicem: DC-73
_codicem_isocat: admittedTerm‐admn‐sts
_descriptionem: The term is acceptable for use
codicem_TBX: admitted
TBX_notRecommended:
_codicem: DC-74
_codicem_isocat: deprecatedTerm‐admn‐sts
_descriptionem: The term should not be used.
codicem_TBX: notRecommended
TBX_obsolete:
_codicem: DC-75
_codicem_isocat: supersededTerm‐admn‐sts
_descriptionem: |
The term is no longer used, usually because a more modern term has
replaced it.
codicem_TBX: obsolete
### Universal Terminology eXchange UTX 1.20 --------------------------------
# The term status field indicates the status of a term.
# There are 7 statuses: blank, provisional, approved, non-standard,
# forbidden, rejected, or obsolete. Only a glossary administrator and a
# delegate can change the value of a term status.
# Note: If a glossary does not have a term status field, all entries
# are considered to be approved.
# (...)
# 5. Advanced concepts
# 5.1 Single term status and per-language term status
# There are two methods of applying term status: single term status and
# per-language term status. A glossary can use either of these to
# indicate the term status.
# 5.1.1 Single term status
#
# (...)
#
# 5.1.2 Per-language term status
# Per-language term status specifies the term status of a term for a
# particular language rather than a pair of two languages. For example,
# if it is used in a bilingual unidirectional glossary, it requires two
# term status columns, one for the source language, and one for the
# target language.
#
# If a language tag is not specified, the term status is treated as
# single term status (UTX 1.11 style).
# Note: Per-language term status is introduced in UTX 1.20 to handle
# bilingual bidirectional glossaries and multilingual glossaries.
#
# 5.1.3 Term status behaviors for an MT dictionary
#
# (...)
#
UTX_provisional:
_descriptionem: |
The term status "provisional" indicates that a target term is
proposed by a contributor but not yet authorized by the glossary
administrator. As provisional status is temporary, the glossary
administrator should promptly decide the term status such as
"approved."
Note: The glossary administrator may also choose to exclude (delete)
the term from the glossary, or move it to another glossary
codicem_UTX: provisional
UTX_approved:
_descriptionem: |
The term status "approved" indicates that an entry has been approved
for the particular glossary (domain) by the glossary administrator.
An approved status indicates that the term must be used with the
highest priority, whenever applicable. If a term has synonyms or
alternative spellings, such as "plug-in" and "plugin," only one of
these should have approved status.
An approved term in one language is paired with another approved term
in another language. If the parts of speech of these multiple entries
are different, then they are different terms. For example, "plot"
can be a noun and a verb, and each can have approved status.
codicem_UTX: approved
# 4.5.3 Blank term status
# If the term status is left blank, it is considered as approved
# (a change from UTX 1.11). The term status of a term paired with a
# non-standard, forbidden, rejected, and obsolete term (explained later)
# can also be blank, which implicitly indicates approved status.
UTX_non-standard:
_descriptionem: |
The term status "non-standard" indicates one or more terms that are
less-preferred within a group of synonyms or alternative spellings.
Note: The glossary administrator decides whether the term is
less-preferred or not for a particular glossary. Therefore, this
status could vary in different glossaries, or with a different
glossary administrator.
codicem_UTX: non-standard
UTX_forbidden:
_descriptionem: |
The term status "forbidden" indicates that a term must not be used.
A term is marked as forbidden not only for being inappropriate as a
translation, but also if it is inappropriate within the context of the
end-result document.
A forbidden term, unlike a non-standard term, should not be provided
as a translation candidate.
Note: A term is “forbidden” because it is inappropriate from
linguistic, social, terminological, branding, or other viewpoints.
Up to UTX 1.11, only a target term could be indicated as forbidden.
UTX 1.20 allows any term (including a source term) to be indicated as
forbidden.
Forbidden terms can be exported from a UTX glossary for terminological
checking. Based on this information, a function of a translation tool
or a dedicated terminological checker can ascertain whether
translation files contain any undesirable terms.
codicem_UTX: forbidden
UTX_rejected:
_descriptionem: |
The term status "rejected" indicates that a term is not appropriate
for inclusion in a glossary. Rejected terms can be kept in the glossary
for record keeping, moved into a separate list, or deleted at a
later time.
codicem_UTX: rejected
UTX_obsolete:
_descriptionem: |
The term status "obsolete" indicates that a term was previously used,
but should no longer be used. Obsolete terms can be kept in the
glossary for record keeping, moved into a separate list, or deleted
at a later time.
codicem_UTX: obsolete
### XML Localization Interchange File Format XLIFF 2.1 ---------------------
# @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state
# @see http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#substate
#
# State - indicates the state of the translation of a segment.
#
# Value description: The value MUST be set to one of the following values:
#
# initial - indicates the segment is in its initial state.
# translated - indicates the segment has been translated.
# reviewed - indicates the segment has been reviewed.
# final - indicates the segment is finalized and ready to be used.
#
# The 4 defined states constitute a simple linear state machine that
# advances in the above given order. No particular workflow or process is
# prescribed, except that the three states more advanced than the default
# initial assume the existence of a Translation within the segment.
# One can further specify the state of the Translation using the subState
# attribute.
#
# Default value: initial
XLIFF_initial:
_descriptionem: indicates the segment is in its initial state
codicem_XLIFF: initial
XLIFF_translated:
_descriptionem: indicates the segment has been translated
codicem_XLIFF: translated
XLIFF_reviewed:
_descriptionem: indicates the segment has been reviewed
codicem_XLIFF: reviewed
XLIFF_final:
_descriptionem: indicates the segment is finalized and ready to be used
codicem_XLIFF: final
# Trivia:
# - terminum, https://en.wiktionary.org/wiki/terminus#Latin
# - typum, https://en.wiktionary.org/wiki/typus#Latin
terminum_typum: &ontologia_aliud_terminum_typum
### TermBase eXchange (TBX) "TBX-Basic" 3.1 IATE ------------------------
# @see http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf
# @see https://iate.europa.eu/fields-explained
# @see https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf
# @TODO: maybe add the ones from tbx_oscar_0
TBX_fullForm:
_codicem: DC-321
codicem_TBX: fullForm
TBX_acronym:
_codicem: DC-334
codicem_TBX: acronym
TBX_abbreviation:
_codicem: DC-331
codicem_TBX: abbreviation
TBX_formula: # Used on IATE
codicem_TBX: formula
TBX_shortForm:
_codicem: DC-332
codicem_TBX: shortForm
TBX_variant:
_codicem: DC-330
codicem_TBX: variant
TBX_phrase:
_codicem: DC-339
codicem_TBX: phrase
### Universal Terminology eXchange UTX 1.20 --------------------------------
# 4.4.2 sentence and special characters
# sentence is a special pos field item that indicates that the "term"
# is a sentence.
# Note: sentence should only be used when necessary. sentence would be
# used for a user interface message in the form of a sentence, for example.
# Entries of pairs of translated sentences should be stored in a translation
# memory format (such as TMX) rather than a glossary. When a UTX glossary is
# exported for an MT system that does not treat sentence as a type of
# part of speech, sentence entries can be treated as nouns.
# @deprecated rem_typum. Use terminum_typum
rem_typum: *ontologia_aliud_terminum_typum
ontologia_aliud_familiam:
lat:
TBX:
XLIFF:
UTX:
# end::ontologia_aliud[]
# Trivia: rēgulam, https://en.wiktionary.org/wiki/regula#Latin
ontologia_regulam:
# @see https://regex101.com/ (online regex tester, multiple engines)
# @see https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
# @see https://docs.python.org/3/howto/regex.html
# @see https://learnbyexample.github.io/py_regular_expressions/groupings-and-backreferences.html
# @see https://developer.mozilla.org/en-US/docs/Web/JavaScript
# /Reference/Global_Objects/RegExp
# @see https://pkg.go.dev/regexp/syntax
# @see https://www.php.net/manual/en/reference.pcre.pattern.syntax.php
# /workspace/git/EticaAI/tico-19-hxltm/scripts/fn/linguacodex.py --de_bcp47_simplex --de_codex pt-Latn-g-port1283 | jq
exemplum:
hxl_caput:
- hxl: '#item conceptum codicem'
divisionem: '#item'
classem: ' conceptum'
# speciem: ' codicem'
- hxl: '#meta linguam i_pt i_por ig_port1283 is_latn'
# BCP47 extended
# bcp47e: pt-Latn-g-port1283
divisionem: '#meta'
classem: ' linguam'
# speciem: i_pt i_por ig_port1283 is_latn
- hxl: '#item linguam i_pt i_por ig_port1283 is_latn ib_t_en_latn rem'
# BCP47 extended
# bcp47e: pt-Latn-g-port1283-t-en-latn
divisionem: '#item'
classem: ' linguam'
# speciem: i_pt i_por ig_port1283 is_latn
- hxl: '#meta linguam i_en i_eng ig_stan1293 ir_076 is_latn it_1_pt_por_latn iu_1_emerrocha iw_1_bing ix_ambiguum ix_periculo v_linguam_maximum'
# BCP47 extended
# bcp47e: pt-Latn-g-port1283-t-en-latn
divisionem: '#meta'
classem: ' linguam'
# speciem: i_pt i_por ig_port1283 is_latn
# /workspace/git/EticaAI/tico-19-hxltm/scripts/fn/linguacodex.py --de_bcp47_simplex --de_codex g-port1283-aaa-bbb | jq
# Trivia: strūctūram, https://en.wiktionary.org/wiki/structura#Latin
structuram:
# basim -> divisionem, classem, speciem
basim:
# https://regex101.com/r/XUOncM/5
# https://regex101.com/r/Ff27ID/4
javascript: >-
(?<divisionem>(\#item|\#meta))(?<classem>(\ conceptum|\ linguam|\ terminum))((?<!conceptum)((?<linguam_implicitum_de>(\ ii_de_linguam[a-z_]*))|(?<linguam_implicitum_est>(\ ii_est_linguam[a-z_]*)))|((?<linguam_iso639_1_a>(\ i_\w\w))?(?<linguam_iso639_3_a>(\ i_\w\w\w))(?<linguam_glotto>(\ ig_[a-z]{4}\d{4}))?((?<linguam_iso3166_2_a>(\ ir_[a-z]{2}))|(?<linguam_iso3166_3_a>(\ ir_[a-z]{3}))|(?<linguam_unm49>(\ ir_[0-9]{3})))?(((?<linguam_iso15924_a>(\ is_[a-z]{4})))|((?<linguam_iso15924_n>(\ is_[0-9]{3}))))((?<linguam_translationem_de_linguam>((\ it_[1-9]{1}_[a-z0-9_]*){1,})))?(((?<linguam_translationem_humanum>((\ iu_[1-9]{1}_[a-z0-9_]*){1,}){1,})))?(((?<linguam_translationem_machinam>((\ iw_[1-9]{1}_[a-z0-9_]*){1,}){1,})))?(((?<linguam_privatum>((\ ix_[a-z0-9]{2,8})){1,})))?))?.*(?<datum_vocabularium>(\ v_[a-z_]*))?
# \#(?<divisionem>(item|meta)). ?(?<classem>(conceptum|linguam|terminum))(?<speciem>.*)
# https://learnbyexample.github.io/py_regular_expressions/groupings-and-backreferences.html
python: |
# I: Abstractum (#meta) aut concretum (#item)
(?P<divisionem>
(\#item|\#meta)
)
# II: Classem: conceptum, linguam aut terminum
(?P<classem>
(\ conceptum|\ linguam|\ terminum)
)
# III: linguam et terminum; quod linguam?
# _[eng-Latn]Known non-enforcement: this will tolerate classem = conceptum[eng-Latn]_
((?<!conceptum)
# III.I: implicitum aut explicitum
## implicitum est
(
(?P<linguam_implicitum_de>(\ ii_de_linguam[a-z_]*))
|
(?P<linguam_implicitum_est>(\ ii_est_linguam[a-z_]*))
)
| # ...aut...
## explicitum est
(
(?P<linguam_iso639_1_a>(\ i_\w\w))?
(?P<linguam_iso639_3_a>(\ i_\w\w\w)) # ISO 639-3 requisitum!
(?P<linguam_glotto>(\ ig_[a-z]{4}\d{4}))?
( # Locum
(?P<linguam_iso3166_2_a>(\ ir_[a-z]{2}))
|
(?P<linguam_iso3166_3_a>(\ ir_[a-z]{3}))
|
(?P<linguam_unm49>(\ ir_[0-9]{3}))
)?
( # Scriptum codicem: requisitum!
((?P<linguam_iso15924_a>(\ is_[a-z]{4})))
|
((?P<linguam_iso15924_n>(\ is_[0-9]{3})))
)
( # BCP 47 Extension T style, de linguam
(?P<linguam_translationem_de_linguam>((\ it_[1-9]{1}_[a-z0-9_]*){1,}))
)?
( # Human translator, list
((?P<linguam_translationem_humanum>((\ iu_[1-9]{1}_[a-z0-9_]*){1,}){1,}))
)?
# iv_ not used
( # Machine translator, list
((?P<linguam_translationem_machinam>((\ iw_[1-9]{1}_[a-z0-9_]*){1,}){1,}))
)?
( # BCP 47 Private attribute
((?P<linguam_privatum>((\ ix_[a-z0-9]{2,8})){1,}))
)?
)
)?
#(?P<etcetera>
# (\ [0-9a-z_]*)
#)?
.* # TODO: remove this
(?P<datum_vocabularium>
(\ v_[a-z_]*)
)?
# subspeciem:
# javascript: >-
# \(?<divisionem>(#item|#meta)). ?(?<classem>(conceptum|linguam|terminum))(?<speciem>.*)
# python: >-
# \(?P<divisionem>(#item|#meta)). ?(?P<classem>(conceptum|linguam|terminum))(?P<speciem>.*)
# https://regex101.com/r/ijNoTe/1
# https://regex101.com/delete/nERE0vlhhSmLY2ircayaduP8
# https://regex101.com/r/Ff27ID/2
# named group:
# (?P<hxltag>\#[a-zA-Z_]*)(?P<hxlattrs>\ \w*){0,20}
# abstractum, https://en.wiktionary.org/wiki/abstractus#Latin
abstractum:
python: "^#meta"
# classem, https://en.wiktionary.org/wiki/classis#Latin
conceptum_classem:
python: '(^#item|^#meta)\ conceptum'
# concrētum, https://en.wiktionary.org/wiki/concretus
concretum:
python: "^#item"
# Glottocode, https://glottolog.org/
glotto:
# 'port1283' ad ' ig_port1283 '
python: '(?<=ig_)([\w]{4}[\d]{4})'
# basim, https://en.wiktionary.org/wiki/basis#Latin
hxltm_basim:
python: '(^#item|^#meta)(\ conceptum|\ linguam|\ terminum)'
# rem: '(^#item|^#meta)(\ conceptum|\ linguam|\ terminum)'
# HXL vocabularies, v_(...)
hxl_v:
# existere: '(\ v_[\w|\d] )'
python: '(?<=\ )(v_\w+)'
iso639_1:
# 'en' in ' i_en'
python: '(?<=i_)([\w]{2})\ '
iso639_3:
# 'eng' in ' i_eng'
python: '(?<=i_)([\w]{3})\ '
iso15924:
# ISO 15924: 'latn' in ' is_latn '
python: 'TODO'
iso15924_a:
# ISO 15924: 'latn' in ' is_latn '
python: '(?<=is_)([\w]{4})'
iso15924_n:
# ISO 15924: '215' in is_215
python: '(?<=is_)([\d]{3})'
linguam_classem:
python: '(^#item|^#meta)\ linguam'
# explicitum, https://en.wiktionary.org/wiki/explicitus#Latin
# implicitum, https://en.wiktionary.org/wiki/implicitus#Latin
# The bare minimum to be considered linguam is ' i_www is_wwww' or ' i_www is_nnn'
linguam_basim_explicitum:
python: '(\ i_([\w]{2}))([\w]{4}[\d]{4})?.*(\ is_[\w]{4}|\ is_[\d]{3}){1}' # needs more test
# existere: '(\ is_([\w]{4})|([\d]{3}))'
# Implicit language: the language is given as a value in another column
linguam_basim_implicitum_de:
python: '(\ de_linguam_fontem|\ de_linguam_objectivum|\ de_linguam)'
# Implicit language: this column holds the values for the equivalent de_linguam*
linguam_basim_implicitum_est:
python: '(\ est_linguam_fontem|\ est_linguam_objectivum|\ est_linguam)'
terminum_classem:
python: '(^#item|^#meta)\ terminum'
### Example test cases
#meta linguam i_en i_eng ig_stan1293 ir_076 is_latn it_1_pt_por_latn iu_1_emerrocha iw_1_bing ix_ambiguum ix_periculo v_linguam_maximum
#meta linguam i_en i_eng ig_stan1293 ir_076 is_latn it_1_pt_por_latn it_2_es_spa_latn iu_1_emerrocha ix_ambiguum ix_periculo
#item terminum ii_de_linguam_fontem
#meta conceptum i_en i_eng is_latn
#item conceptum codicem
#meta linguam i_en i_eng is_latn
#meta linguam i_en i_eng ig_stan1293 is_latn it_1_en_por_latn ix_ambigua
#meta linguam i_en i_eng is_215
#item terminum ii_est_linguam v_linguam_maximum
#item terminum ii_est_linguam v_linguam_a
#item terminum ii_est_linguam v_linguam
#item terminum ii_de_linguam
#item terminum ii_est_linguam_fonte v_lngam
#item terminum ii_est_linguam_objectivum
#item terminum ii_de_linguam_fontem
#item terminum i_en i_eng is_latn rem
## Dummy example I
#item terminum
# Regexes test
# - https://regex101.com/r/2VpoTS/1
# - Delete https://regex101.com/delete/aCOj7ECKu1kLyJOCMRZS2SeL
## TBX Basic 3.1 (http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf)
# 5 Mandatory data categories
# There are only two mandatory data categories in TBX-Basic: term, and language.
# Several of the remaining data categories, including definition, context, part of speech, and subject
# field are very important and should be included in a terminology whenever possible. The most
# important non-mandatory data category is part of speech
#### TO DOs ____________________________________________________________________
# - https://site.matecat.com/support/managing-language-resources/add-glossary/
# - http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/csv2tbx.html
# - https://github.com/lokalise/i18n-ally
# - https://marketplace.visualstudio.com/items?itemName=lokalise.i18n-ally
# - Which characters need to be escaped when using Bash?
# - https://stackoverflow.com/questions/15783701/which-characters-need-to-be-escaped-when-using-bash
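The shorter Python patterns under ontologia_regulam can be tried directly with the standard `re` module. The following is only a minimal sketch: the four patterns are copied from the ontology above (hxltm_basim, glotto, iso639_3, iso15924_a), while the helper name `parse_hashtag` and the shape of the returned dictionary are assumptions made for illustration and are not part of hxltmcli.

```python
import re

# Patterns copied from ontologia_regulam (Python flavour).
HXLTM_BASIM = re.compile(r'(^#item|^#meta)(\ conceptum|\ linguam|\ terminum)')
GLOTTO = re.compile(r'(?<=ig_)([\w]{4}[\d]{4})')
ISO639_3 = re.compile(r'(?<=i_)([\w]{3})\ ')
ISO15924_A = re.compile(r'(?<=is_)([\w]{4})')


def parse_hashtag(hxl: str) -> dict:
    """Hypothetical helper: extract a few parts from one HXLTM hashtag.

    Returns an empty dict when the text does not start with a valid
    divisionem (#item or #meta) followed by a known classem.
    """
    basim = HXLTM_BASIM.search(hxl)
    if basim is None:
        return {}
    result = {
        'divisionem': basim.group(1),
        'classem': basim.group(2).strip(),
    }
    optional = (
        ('glotto', GLOTTO),
        ('iso639_3', ISO639_3),
        ('iso15924_a', ISO15924_A),
    )
    for key, pattern in optional:
        found = pattern.search(hxl)
        if found:
            result[key] = found.group(1)
    return result


if __name__ == '__main__':
    # One of the example test cases listed in the ontology.
    print(parse_hashtag(
        '#meta linguam i_en i_eng ig_stan1293 ir_076 is_latn '
        'it_1_pt_por_latn iu_1_emerrocha iw_1_bing ix_ambiguum '
        'ix_periculo v_linguam_maximum'
    ))
```

Note that the iso639_3 pattern requires a trailing space after the code, so it is the space that distinguishes ` i_eng ` from the two-letter ` i_en `; a hashtag that ends right after the ISO 639-3 code would not match with this pattern.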
License
The EticaAI has dedicated the work to the public domain by waiving all of their rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.