Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Updated Aug 7, 2024 (Python)
Unsupervised text tokenizer focused on computational efficiency
Fast and customizable text tokenization library with BPE and SentencePiece support
Explains NLP building blocks in a simple manner.
Fast bare-bones BPE for modern tokenizer training
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Machine Learning for Phishing Website Detection
Simple-to-use scoring function for arbitrarily tokenized texts.
GPT-3 encoder and decoder tool written in Swift
High performance unsupervised text tokenization for Ruby
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Sentiment-based classification of stock article titles using PhoBERT
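Several of the projects above build on byte-pair encoding (BPE). The following is a minimal sketch of the classic BPE merge-learning loop: it repeatedly merges the most frequent adjacent symbol pair. It assumes whitespace-pre-tokenized input words, character-level initial symbols, and a hypothetical `</w>` end-of-word marker. It is an illustration only and is not the implementation used by any of the listed repositories. The function name `learn_bpe` is hypothetical.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge operations (illustrative sketch).

    Assumes `words` is a list of pre-tokenized words; each word starts as a
    sequence of characters followed by a hypothetical "</w>" end marker.
    """
    # Word vocabulary: tuple of symbols -> frequency.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of `best` with one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

For example, `learn_bpe(["low", "lower", "newest", "widest"], 10)` returns the learned merges in order. Applying those merges in the same order to a new word segments it into subword units. Production tokenizers add byte-level fallback, special tokens, and much faster pair-count updates.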