Preprocessing data before pre-training and SFT

xcxhy/Lumix

Lumix

This is an open-source project for preparing data for large language models. Because everyone is racing to pre-train and fine-tune large models, most public projects rarely mention the details of how they handle and clean their data.

I hope this project helps everyone get through as much of the data cleaning work as possible, so that they can focus more on model training and fine-tuning.

Project structure
