Researcher, scholar, teacher
Profiles
The following are projects I am actively maintaining or contributing to. The list may be incomplete, as more projects may have been added since it was last updated.
Name | Description | |
---|---|---|
MTEB | The Massive Text Embedding Benchmark for evaluating document embeddings, e.g., for RAG systems. | |
Scandinavian Embedding Benchmark | A Scandinavian Benchmark for evaluating document embeddings | |
DaCy | The state-of-the-art Danish NLP pipeline for spaCy | |
tomsup | Theory of Mind Simulation using Python. A package that allows for easy agent-based modeling of recursive Theory of Mind agents | |
Augmenty | A structured augmentation library for augmenting both texts and their annotations | |
TextDescriptives | A Python library for calculating a large variety of metrics from text | |
timeseriesflattener | A package for converting irregularly spaced time series, such as electronic health records, into statically shaped data frames | |
Asent | An educational library for performing transparent sentiment analysis | |
ScandEval | A benchmark for evaluating the natural language understanding and generation of Scandinavian and Germanic language models | |
swift-python-cookiecutter | The cookie-cutter template I actively use for my packages | |
UD_Danish-DDT | The Danish Universal Dependencies Treebank, a high-quality linguistic resource |
A selection of contributions to open-source libraries, besides the ones to which I am actively contributing.
Library | Contribution |
---|---|
Hugging Face libraries: | |
datasets | Fix for a minor compatibility issue with NumPy >=2.0.0 |
transformers | Bug fixes for training masked language models using Flax |
spaCy core libraries: | |
spacy-transformers | Allow passing arguments to the transformer backend to obtain attention weights |
confection | Fixed an issue where a config could not be filled |
spacy-curated-transformers | Added support for ELECTRA tokenizers |
curated-transformers | Added ELECTRA |