Issues: hpcaitech/ColossalAI
- [FEATURE]: How to skip a custom node from generating strategies in colossal-auto? (enhancement) #5983, opened Aug 8, 2024 by robotsp
- [BUG]: Pytest with a specific config failed after PR #5868 (bug, shardformer) #8639, opened Jul 29, 2024 by GuangyaoZhang
- [FEATURE]: Request updates for pretraining roberta (enhancement) #8638, opened Jul 29, 2024 by jiahuanluo
- [BUG]: Pipeline Parallelism fails when input shape varies (bug, shardformer) #8630, opened Jul 25, 2024 by GuangyaoZhang
- [BUG]: pip install . fails with error: identifier "__hsub" is undefined (bug) #5929, opened Jul 19, 2024 by jtmer
- [BUG]: Shardformer FP8 communication training accuracy degradation (bug) #5920, opened Jul 18, 2024 by GuangyaoZhang
- [BUG]: Low_Level_Zero plugin crashes with LoRA (bug) #5909, opened Jul 15, 2024 by Fallqs
- [PROPOSAL]: Does the LowLevelZero plugin support LoRA? This code is confusing (enhancement) #5908, opened Jul 15, 2024 by YeAnbang
- [BUG]: Running opt inference fails with "No module named 'energonai'" (bug) #5906, opened Jul 13, 2024 by munger1985
- Will training acceleration for the StableDiffusion3 model be supported? (enhancement) #5900, opened Jul 10, 2024 by tensorflowt
- [BUG]: ColossalChat train sft is skipped with opt-1.3b model (bug) #5865, opened Jun 27, 2024 by smash1999
- [BUG]: Colossal AI failed to load ChatGLM2 (bug) #5861, opened Jun 26, 2024 by hiprince
- When running llama2_7b in bf16 with the Gemini plugin (placement policy "static", with shard_param_frac, offload_optim_frac, and offload_param_frac all set to 0.0, which should make Gemini equivalent to ZeRO-2) and with the LowLevelZero plugin at stage 2, Gemini uses more GPU memory than LowLevelZero. Why, when Gemini should in principle save more GPU memory? #5830, opened Jun 18, 2024 by JJGSBGQ
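The two setups compared in #5830 can be sketched roughly as follows. This is an illustrative sketch assuming ColossalAI's booster plugin API; parameter names are taken from the issue text, and it is not a verified reproduction of the reporter's script:

```python
# Sketch of the two plugin configurations compared in issue #5830
# (assumes ColossalAI's plugin API; not a verified reproduction).
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# Gemini with a static placement policy and all sharding/offload
# fractions at 0.0, which the reporter expects to match ZeRO-2.
gemini_plugin = GeminiPlugin(
    placement_policy="static",
    shard_param_frac=0.0,
    offload_optim_frac=0.0,
    offload_param_frac=0.0,
    precision="bf16",
)

# LowLevelZero at stage 2, also in bf16, as the comparison baseline.
zero_plugin = LowLevelZeroPlugin(stage=2, precision="bf16")
```

Each plugin would then be passed to a `Booster` to wrap the model and optimizer before training.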
- [FEATURE]: LoRA with sharded model (enhancement) #5826, opened Jun 17, 2024 by KaiLv69
- [BUG]: Shardformer failure with torch 2.3 (bug) #5757, opened May 27, 2024 by Edenzzzz