Issues: hpcaitech/ColossalAI
- [FEATURE]: How to skip a custom node from generating strategies in colossal-auto? (enhancement) #5983, opened Aug 8, 2024 by robotsp
- [BUG]: Pytest with a specific config failed after PR #5868 (bug, shardformer) #8639, opened Jul 29, 2024 by GuangyaoZhang
- [FEATURE]: Request updates for pretraining roberta (enhancement) #8638, opened Jul 29, 2024 by jiahuanluo
- [BUG]: Pipeline Parallelism fails when input shape varies (bug, shardformer) #8630, opened Jul 25, 2024 by GuangyaoZhang
- [BUG]: pip install . fails with error: identifier "__hsub" is undefined (bug) #5929, opened Jul 19, 2024 by jtmer
- [BUG]: Shardformer FP8 communication training accuracy degradation (bug) #5920, opened Jul 18, 2024 by GuangyaoZhang
- [BUG]: Low_Level_Zero plugin crashes with LoRA (bug) #5909, opened Jul 15, 2024 by Fallqs
- [PROPOSAL]: Does the LowLevelZero plugin support LoRA? This code is confusing (enhancement) #5908, opened Jul 15, 2024 by YeAnbang
- [BUG]: Running opt inference fails with "No module named 'energonai'" (bug) #5906, opened Jul 13, 2024 by munger1985
- Will training acceleration for the StableDiffusion3 model be supported? (enhancement) #5900, opened Jul 10, 2024 by tensorflowt
- [BUG]: ColossalChat train sft is skipped with opt-1.3b model (bug) #5865, opened Jun 27, 2024 by smash1999
- [BUG]: Colossal AI failed to load ChatGLM2 (bug) #5861, opened Jun 26, 2024 by hiprince
- When running llama2_7b in bf16 with the Gemini plugin (placement policy "static", with shard_param_frac, offload_optim_frac, and offload_param_frac all set to 0.0, which should make Gemini equivalent to ZeRO-2) and with the LowLevelZero plugin at stage 2, Gemini uses more GPU memory than LowLevelZero. Why, when Gemini should in principle save more GPU memory? #5830, opened Jun 18, 2024 by JJGSBGQ
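The two setups compared in #5830 can be sketched roughly as follows. This is an illustrative sketch assuming ColossalAI's booster plugin API; parameter names are taken from the issue text, and it is not a verified reproduction of the reporter's script:

```python
# Sketch of the two plugin configurations compared in issue #5830
# (assumes ColossalAI's plugin API; not a verified reproduction).
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# Gemini with a static placement policy and all sharding/offload
# fractions at 0.0, which the reporter expects to match ZeRO-2.
gemini_plugin = GeminiPlugin(
    placement_policy="static",
    shard_param_frac=0.0,
    offload_optim_frac=0.0,
    offload_param_frac=0.0,
    precision="bf16",
)

# LowLevelZero at stage 2, also in bf16, as the comparison baseline.
zero_plugin = LowLevelZeroPlugin(stage=2, precision="bf16")
```

Each plugin would then be passed to a `Booster` to wrap the model and optimizer before training.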
- [FEATURE]: LoRA with sharded model (enhancement) #5826, opened Jun 17, 2024 by KaiLv69
- [BUG]: Shardformer failure with torch 2.3 (bug) #5757, opened May 27, 2024 by Edenzzzz