Stars
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
[ICML 2024] CLLMs: Consistency Large Language Models
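RAGOnMedicalKG: combines LLM retrieval-augmented generation (RAG) with a knowledge graph (KG) to build demo-level question answering, intended to illustrate the basic approach.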
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Generate high-definition short videos with one click using AI LLMs.
Build deep learning applications in a new and easy way.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
PromptCBLUE: a large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain in Chinese
We unify the interfaces for instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for ease of use. We welcome open-source enthusiasts…
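This project collects open-source table-intelligence task datasets (e.g., table question answering, table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the ultimate goal of building a large language model dedicated to table-intelligence tasks.
YAYI information extraction LLM: instruction-tuned on millions of manually constructed, high-quality information extraction samples; developed by the Zhongke Wenge (中科闻歌) algorithm team. (Repo for YAYI Unified Information Extraction Model)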
Robust Speech Recognition via Large-Scale Weak Supervision
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
A curated list of practical guide resources of Medical LLMs (Medical LLMs Tree, Tables, and Papers)
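A collection of industry practice articles on search, recommendation, advertising, and user growth (sources: Zhihu, Datafuntalk, and technical WeChat official accounts).
This project is an LLM application development tutorial for beginner developers. Read it online at: https://datawhalechina.github.io/llm-universe/
A curated list of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at relatively low cost, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.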
Locating and editing factual associations in GPT (NeurIPS 2022)
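Wenda (闻达): an LLM invocation platform. It targets efficient content generation for specific environments while accounting for the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy.
GPT-4 & LangChain chatbot for large PDF documents
Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Chinese document classification with LayoutLMv3 and LayoutXLM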