📢 Llama 3.1 is here, and we're actively compressing it! 📢

Meta unveiled their latest Llama series, featuring an impressive 405-billion-parameter model that surpasses OpenAI's GPT-4o. This milestone is a significant boost for open source and the AI community, although the largest model now requires multiple servers to host (810 GB!). Model compression is crucial.

Our (Neural Magic) Llama 3.1 compression project is underway, aiming for cost-effective and sustainable deployments without compromising accuracy. The FP8-quantized Llama 3.1 8B model has already achieved over 99% accuracy recovery, with detailed accuracy metrics and deployment guidelines available. We've also introduced FP8 model support for all Llama versions in vLLM for immediate use.

Explore the latest models here:
- Meta-Llama-3.1-8B-Instruct-FP8: https://lnkd.in/dDAcXAAY
- Meta-Llama-3.1-8B-Instruct-FP8-dynamic: https://lnkd.in/djtw4GMr

For more insights, visit the vLLM Llama 3.1 blog:
- https://lnkd.in/dnZsvKjy

Stay tuned for further updates; I'll be sharing more posts in the days ahead!

#LLMs #vLLM #AI #MachineLearning #Quantization #NeuralMagic
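To give a feel for what "FP8 quantization with over 99% recovery" means, here is a minimal NumPy sketch that simulates per-tensor FP8 E4M3 quantization (the format has 3 mantissa bits and a max representable value of 448) and measures how much of the original tensor survives the round trip. This is an illustrative toy, not Neural Magic's actual compression pipeline: the function name, the simple max-based scaling recipe, and the mantissa-rounding trick are assumptions made for the demo.

```python
import numpy as np

def quantize_fp8_e4m3_sim(x: np.ndarray):
    """Simulate per-tensor FP8 E4M3 quantization (illustrative sketch).

    Returns the simulated-FP8 values and the per-tensor scale.
    """
    FP8_E4M3_MAX = 448.0  # largest finite E4M3 value
    scale = np.abs(x).max() / FP8_E4M3_MAX
    scaled = x / scale
    # E4M3 keeps 3 explicit mantissa bits (plus the implicit leading bit),
    # so within each binade there are 16 steps of the frexp mantissa.
    m, e = np.frexp(scaled)          # scaled == m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0    # round mantissa to E4M3 precision
    q = np.clip(np.ldexp(m, e), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

# Round-trip a random weight tensor and measure the relative error.
rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, scale = quantize_fp8_e4m3_sim(w)
w_hat = q * scale  # dequantize
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.4f}")  # typically a few percent
```

The per-tensor scale maps the weight range onto the FP8 dynamic range; the "dynamic" model variant linked above instead computes activation scales on the fly at inference time. Once quantized, such checkpoints load in vLLM exactly like any other model, e.g. `vllm.LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8")`.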