LLM
What would you do with 1000 H100s...
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
A framework for few-shot evaluation of language models.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
CoreNet: A library for training deep neural networks
Robust recipes to align language models with human and AI preferences
Modeling, training, eval, and inference code for OLMo
[ICML 2024] Selecting High-Quality Data for Training Language Models
Code accompanying the paper "Massive Activations in Large Language Models"
Minimalistic large language model 3D-parallelism training
Easily embed, cluster and semantically label text datasets
The official implementation of Self-Play Fine-Tuning (SPIN)
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
aider is AI pair programming in your terminal
LongWriter: Unleashing 10,000 Word Generation from Long Context LLMs
Scalable toolkit for efficient model alignment
Ongoing research training transformer models at scale