Starred repositories
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Rasa UI is a frontend for the Rasa Framework
BERT-based intent and slots detector for chatbots.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
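Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name lexicons, Chinese abbreviation lexicon, character decomposition dictionary, word sentiment scores, stopwords, reactionary-word list, violence/terrorism word list, Traditional/Simplified Chinese conversion, English spellings that imitate Chinese pronunciation, Wang Feng lyrics generator, occupation-name lexicon, synonym lexicon, antonym lexicon, negation-word lexicon, car brand lexicon, car parts lexicon, segmentation of run-together English text, various Chinese word vectors, company-name collection, classical Chinese poetry corpus, IT lexicon, finance lexicon, idiom (chengyu) lexicon, place-name lexicon, …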
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow.
tqchen / xgboost
Forked from dmlc/xgboost (https://github.com/dmlc/xgboost)
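Regularly shares high-quality, interesting, and practical open-source tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.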
LeetCode Solutions: A Record of My Problem-Solving Journey.
Jupyter notebooks for the code samples of the book "Deep Learning with Python"
Implementation of BERT that could load official pre-trained models for feature extraction and prediction
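Hello Algo (《Hello 算法》): an animated, illustrated data structures and algorithms tutorial with one-click runnable code. Code is available in Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.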
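Natural language processing (NLP) tutorials covering word vectors, lexical analysis, pretrained language models, text classification, semantic text matching, information extraction, translation, and dialogue.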
TextGen: implementations of text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, UDA, SongNet, and more, with training and prediction ready to use out of the box.
pytextclassifier is a toolkit for text classification, with ready-to-use implementations of LR, Xgboost, TextCNN, FastText, TextRNN, BERT, and other classification models.
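text2vec, text to vector: a text representation toolkit that converts text into vector matrices, with ready-to-use text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT.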
pycorrector is a toolkit for text error correction, applying Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5, and other models to error correction, ready to use out of the box.
Easy-to-use, modular, and extensible package of deep-learning-based CTR models.
A deep matching model library for recommendations & advertising. It's easy to train models and to export representation vectors which can be used for ANN search.
A collection of algorithms and data structures
A TensorFlow project for sentiment analysis using TextCNN; data is included and it runs as-is.
Code for the CIKM'19 paper "Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach"
Word embedding pre-training with Word2vec, Fasttext, Glove, Elmo, Bert, and Flair
torch-optimizer -- collection of optimizers for Pytorch