mbanon

Organizations

@paracrawl @bitextor @macocu

Hammerfest web game sources

Mathematica 43 10 Updated Sep 26, 2023

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

Python 181 19 Updated Sep 9, 2024
HTML 3 Updated Aug 20, 2024

Targeted language identifier, based on FastText and Hunspell.

Python 27 4 Updated Sep 4, 2024

Data Analytics Tool

JavaScript 8 1 Updated Oct 5, 2024

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

Python 46 13 Updated Sep 7, 2024

The Database Toolkit for Python

Python 9,509 1,418 Updated Oct 5, 2024

Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.

SCSS 17 3 Updated Nov 6, 2023
Python 6 1 Updated May 31, 2023

Pre-filtering step for bicleaner

Python 4 2 Updated Jul 26, 2024

Monocleaner models repository

1 Updated Nov 18, 2021
Python 6 1 Updated Sep 6, 2023

Hunspell dictionaries in UTF-8

JavaScript 1,193 398 Updated Sep 9, 2024

Repository for data models, dictionaries and more resources for Bicleaner

6 Updated Dec 15, 2022

Repository of Bicleaner AI models

5 Updated Mar 28, 2023

Transform TMX to text

Python 1 Updated Aug 3, 2023

Bicleaner fork that uses neural networks

Python 38 4 Updated Jul 26, 2024

Utility that will help you to ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.

Python 9 2 Updated Feb 26, 2024

A Corpus of Quotes

68 16 Updated May 4, 2019

Code for Neural Inverse Knitting: From Images to Manufacturing Instructions

Python 45 14 Updated Nov 14, 2023

A React site that simulates knitting different stitch widths using a skein of variegated yarn.

JavaScript 2 Updated Nov 27, 2018

Tool for manual evaluation of parallel sentences.

PHP 14 4 Updated Oct 19, 2023
Python 67 18 Updated Aug 8, 2024

Program used to split text into segments

Java 2 1 Updated Sep 19, 2022

Program used to split text into segments

Java 25 9 Updated Sep 15, 2023

Print resource usage of processes to stderr with LD_PRELOAD

C 3 Updated Jul 25, 2016

Tool to fix bitexts and tag near-duplicates for removal

Python 29 3 Updated Aug 19, 2024

Transform TMX to text

Python 29 10 Updated Nov 23, 2022

Results of the human evaluation

Rich Text Format 5 3 Updated Dec 9, 2020

Iterative JSON parser with Pythonic interface

Python 615 130 Updated Jan 15, 2020