Stars
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Docker image for Ubuntu Desktop that supports hardware GPU-accelerated GUI apps. You can access the container over SSH or remote desktop, just like a cloud VM.
Agricultural Knowledge Graph (AgriKG): information retrieval, named entity recognition, relation extraction, intelligent question answering, and decision support for the agricultural domain.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
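A Python module that extracts the province, city, and district from Simplified Chinese strings, with support for mapping, validation, and simple plotting.
Administrative divisions of the People's Republic of China: province level (provinces), prefecture level (cities), county level (districts and counties), township level (townships, towns, and subdistricts), and village level (village committees and residents' committees). Cascading address data for China's provinces, cities, districts, towns, and villages at two, three, four, and five levels.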
Unified, efficient fine-tuning of RAG retrieval models, including Embedding, ColBERT, and Cross Encoder
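The parser for a rule-matching-based question answering (QA) system
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at relatively low cost, including base models, vertical-domain fine-tuned models and applications, datasets, and tutorials.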
Firefly: a large model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Official implementation for the WTW dataset from "Parsing Table Structures in the Wild", covering table detection and table structure recognition.
Example models using DeepSpeed
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
LLM (Large Language Model) fine-tuning
Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines
A simple toy demo of a local voice assistant built with Whisper and a large language model.
Open Language Pre-trained Model Zoo
This repository contains tutorials and examples for Triton Inference Server
Netease Youdao's open-source embedding and reranker models for RAG products.