KAIST AI (OSI LAB)
- Seoul, Korea
- namgyu.com
- https://orcid.org/0000-0002-2445-3026
- @itsnamgyu
Stars
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Evaluation of speculative inference over multilingual tasks
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX (minimal pipeline sketch after this list).
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
Doing simple retrieval from LLM models at various context lengths to measure accuracy
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Official repository for EXAONE built by LG AI Research
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
Official implementation of "Perturbed-Attention Guidance"
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Spotlight
Official repository of "HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning", Findings of EMNLP 2023
The official PyTorch implementation of Google's Gemma models
Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object (minimal CLI sketch after this list).
The hub for EleutherAI's work on interpretability and learning dynamics
Modeling, training, eval, and inference code for OLMo
ACL 2024 | LooGLE: Long Context Generic Language Evaluation benchmark for long-context language models
A framework for few-shot evaluation of language models.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
[EMNLP'23, ACL'24] To speed up LLM inference and enhance the model's perception of key information, compress the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
Fast and memory-efficient exact attention (call sketch after this list).
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
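
For reference, a minimal sketch of the 🤗 Transformers pipeline API mentioned above; the model name (`gpt2`), prompt, and generation length are illustrative assumptions, not taken from this profile:

```python
# Minimal 🤗 Transformers pipeline sketch; "gpt2" and the prompt are
# illustrative choices, not taken from this profile.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("KV cache compression is", max_new_tokens=20)
print(result[0]["generated_text"])
```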
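
A minimal sketch of Python Fire's core pattern, turning a plain function into a CLI; the `greet` function and its flags are hypothetical:

```python
# greet.py — hypothetical example; Fire maps function arguments to CLI flags.
import fire

def greet(name: str = "world", shout: bool = False):
    """Return a greeting; Fire prints the return value to stdout."""
    message = f"Hello, {name}!"
    return message.upper() if shout else message

if __name__ == "__main__":
    fire.Fire(greet)
```

Running `python greet.py --name=Namgyu --shout` would print the uppercased greeting.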
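
And a hedged sketch of calling FlashAttention directly via `flash_attn_func`; it assumes a CUDA device, fp16 tensors, and the library's (batch, seqlen, nheads, headdim) layout, with arbitrary illustrative shapes:

```python
# Hedged FlashAttention sketch; requires a CUDA GPU and the flash-attn package.
import torch
from flash_attn import flash_attn_func

# Arbitrary illustrative shapes: batch=2, seqlen=1024, nheads=8, headdim=64.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = flash_attn_func(q, k, v, causal=True)  # exact causal attention
```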