[Fix] zero optimizer w/ tensor parallel test #167
Conversation
I think you need to invoke
You don't need to move the full model to the GPU.
Unfortunately, it still doesn't work because of these lines:

```
E torch.multiprocessing.spawn.ProcessRaisedException:
E
E -- Process 0 terminated with the following error:
E Traceback (most recent call last):
E   File "/admin/home/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
E     fn(i, *args)
E   File "/admin/home/vit/tests_deprecated/torch/nn/parallel/data_parallel/zero/test_hybrid.py", line 103, in run_dist
E     run(parallel_context)
E   File "/admin/home/vit/tests_deprecated/torch/nn/parallel/data_parallel/zero/test_hybrid.py", line 89, in run
E     hybrid_optimizer.step()
E   File "/admin/home/vit/oslo/torch/nn/parallel/data_parallel/zero/sharded_optim/sharded_optim.py", line 581, in step
E     norm_group = compute_norm(
E   File "/admin/home/vit/oslo/torch/nn/parallel/data_parallel/zero/sharded_optim/_utils.py", line 230, in compute_norm
E     if is_model_parallel_parameter(p) or mp_rank == 0:
E   File "/admin/home/vit/oslo/torch/nn/parallel/data_parallel/zero/sharded_optim/_utils.py", line 39, in is_model_parallel_parameter
E     return ParallelMode.PIPELINE in parallel_mode or any(
E   File "/admin/home/vit/oslo/torch/nn/parallel/data_parallel/zero/sharded_optim/_utils.py", line 40, in <genexpr>
E     key.startswith("tensor") for key in parallel_mode
E AttributeError: 'ParallelMode' object has no attribute 'startswith'
```

The error is easy to reproduce:

```python
>>> from oslo.torch.distributed import ParallelMode
>>> ParallelMode.TENSOR_1D.startswith('tensor')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'ParallelMode' object has no attribute 'startswith'
```
There was a small bug and it was fixed.
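For reference, the failing check calls `startswith` on `ParallelMode` enum members instead of on their string values. The snippet below is only a minimal sketch of that class of fix, assuming `ParallelMode` is a standard Python `Enum` with string values; the member names and helper are illustrative, not oslo's actual patch.

```python
from enum import Enum


class ParallelMode(Enum):
    # Illustrative members only; the real enum lives in oslo.torch.distributed.
    TENSOR_1D = "tensor_1d"
    TENSOR_2D = "tensor_2d"
    PIPELINE = "pipeline"


def uses_tensor_parallelism(parallel_modes) -> bool:
    # Compare against each member's string value rather than the member itself,
    # so nothing calls .startswith on a ParallelMode enum member.
    return any(mode.value.startswith("tensor") for mode in parallel_modes)


print(uses_tensor_parallelism([ParallelMode.TENSOR_1D, ParallelMode.PIPELINE]))  # True
print(uses_tensor_parallelism([ParallelMode.PIPELINE]))                          # False
```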
👍
* import ParallelMode (EleutherAI#166)
  - fix typo on tensor parallel tutorial: `from oslo import ParallelContext, ParallelMode`

* [Fix] zero param check (EleutherAI#164)
  - Description: ZeRO checks the redundancy of parameters to calculate the norm. There is a minor bug in the TP check that needs to be fixed.
  - Linked Issues: N/A

* [Fix] zero optimizer w/ tensor parallel test (EleutherAI#167)
  - Description: ZeRO was not running in tensor parallel mode, so this is fixed by switching to a model from `transformers`.
  - Linked Issues: N/A

* Add restarting model from saved model and fix bug (EleutherAI#171)
  - Description:
    - load a model and start training again from a saved point
    - fix a bug where `training_arg` was not saved because of an NCCL error; it was caused by `parallel_context`, which is now removed before saving `training_arg` and re-attached afterwards
    - test loading and restarting with oslo TP

* Make decoder-only models able to generate with `inputs_embeds` (EleutherAI#172)
  - Description: Synchronize the GPT2 code with Hugging Face transformers so that GPT2 can generate with `inputs_embeds`.

    > Accepting `.generate()` calls with `inputs_embeds` on decoder-only models is a long-standing request (huggingface/transformers#6535) -- see huggingface/transformers#6535 (comment) in particular and its reactions.
    >
    > It has to be added on a per-model basis, and this PR adds the necessary changes for GPT2. Other models will throw an informative exception if the user passes `inputs_embeds`, asking them to check this PR and implement the same pattern on the model they want to use it with 🤗
    >
    > Please note that it is still expected that the user passes `input_ids`, i.e.
    >
    > ```python
    > outputs = model.generate(input_ids, inputs_embeds=inputs_embeds)
    > ```
    >
    > This is because decoder-only models expect the prompt to be present in the output, and this is the only way to preserve it! `input_ids` can also be omitted and, in that case, the output won't contain the prompt. For more details, please check out [this PR](huggingface/transformers#21405).

* Wrong import in zero (EleutherAI#169)
  - Title: Prevent using torch 2.0
  - Description: Some features have changed in torch 2.0, and oslo depends on `torch._six`, which is no longer supported by torch 2.0 (see the compatibility sketch after this list).
    - oslo dependency: https://github.com/EleutherAI/oslo/blob/910c789e7f46d2876b964c221d31984b7924974f/oslo/torch/nn/parallel/data_parallel/zero/sharded_optim/_utils.py#L19
    - other issues: microsoft/DeepSpeed#2845
  - Linked Issues: resolved #00

* [Fix] Support gradient accumulation for DDP (EleutherAI#173)
  - Description: In order to support gradient accumulation, I removed the `free_storage` function, which can cause `CUDA error: an illegal memory access was encountered` in many cases (but this change may lead to an increase in memory consumption). What do you think about this PR? @nijkah @jinwonkim93

* [Fix] minor bug for single output in _DistributedDataParallel (EleutherAI#177)
  - Description: This PR addresses a minor bug in the `_DistributedDataParallel` class when handling single output tensors. The changes include:
    1. Update the `forward` method in `_DistributedDataParallel` to correctly handle single output tensors.
    2. Add new test cases in `tests_deprecated/torch/nn/parallel/data_parallel/data_parallel.py` to ensure the correct behavior for models with various output types (single tensor, multiple tensors, and dictionary of tensors).

    These updates ensure that the `_DistributedDataParallel` class works correctly with various output types, providing a more robust solution for users.
  - Linked Issues: N/A

* [Enhance] Support ViT for TensorParallel (EleutherAI#155)
  - Description: I added support for ViT in TensorParallel by appending a config to `_TensorParallelMapping`. Unlike the `Embedding` layer, the `PatchEmbed` layer in ViT does not have a `weight` parameter, so I replaced the `weight` parameter with a dummy value to prevent an `AttributeError`. Any feedback is welcome.
  - Memory usage:

    | mode | world_size=1 | world_size=2 | world_size=4 | world_size=8 |
    |------|--------------|--------------|--------------|--------------|
    | 1D | 1760MiB | 1126MiB | 789MiB | |
    | 2D | | | 589MiB | |
    | 2.5D (d=1) | | | 589MiB | |
    | 2.5D (d=2) | | | | 586MiB |
    | 3D | | | | |

  - TODO:
    - [ ] Benchmark with `world_size=8`
    - [ ] Refactor slicing patch embedding
    - [ ] Fix slicing logic to return the same value as `TensorParallel1D`

  <details><summary>code for testing</summary>
  <p>

  ```python
  import os

  import torch
  import torch.distributed as dist
  import torch.multiprocessing as mp
  from torch import nn, optim
  from transformers import ViTModel, ViTForImageClassification, ViTConfig

  import oslo
  from oslo.torch.distributed.parallel_context import ParallelContext
  from oslo.torch.distributed.parallel_mode import ParallelMode
  from oslo.torch.nn.parallel import TensorParallel


  def setup(rank, world_size):
      os.environ["MASTER_ADDR"] = "localhost"
      os.environ["MASTER_PORT"] = "12340"
      os.environ["RANK"] = str(rank)
      os.environ["LOCAL_RANK"] = str(rank)
      os.environ["WORLD_SIZE"] = str(world_size)
      os.environ["LOCAL_WORLD_SIZE"] = str(world_size)


  def cleanup():
      dist.destroy_process_group()


  def train(rank, world_size):
      print(f"Running oslo TP example on rank {rank}.")
      setup(rank, world_size)
      parallel_context = ParallelContext.from_torch(
          tensor_parallel_size=world_size,
          tensor_parallel_mode=ParallelMode.TENSOR_1D,  # or TENSOR_2D / TENSOR_2P5D
      )
      model = ViTForImageClassification(ViTConfig(num_labels=1000)).to(rank)
      model = TensorParallel(model, parallel_context)
      optimizer = optim.SGD(model.parameters(), lr=1e-4)
      loss_fn = nn.MSELoss()
      oslo.ready(model, parallel_context)

      for _ in range(100):
          model.zero_grad()
          logits = model(pixel_values=torch.ones(8, 3, 224, 224).to(rank)).logits
          labels = torch.ones(8, 1000).to(rank) * 100
          loss = loss_fn(logits, labels)
          loss.backward()
          optimizer.step()

      print(logits)
      print(torch.cuda.max_memory_allocated() / 1024**2)  # MB
      cleanup()


  def main(world_size):
      mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)


  if __name__ == "__main__":
      main(4)
  ```

  </p>
  </details>

  - Linked Issues: Related to EleutherAI#152

---------

Co-authored-by: Minho Ryu <[email protected]>
Co-authored-by: Hansol Park <60593935 [email protected]>
Co-authored-by: Ingyu Seong <[email protected]>
Co-authored-by: whooray <[email protected]>
Co-authored-by: Junhwa Song <[email protected]>
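Regarding the `torch._six` item above: the usual workaround for the removed module is a guarded import. The sketch below assumes the symbol pulled from `torch._six` in `_utils.py` is `inf`, as is common in gradient-norm utilities; it is not oslo's actual patch.

```python
# Hypothetical compatibility shim for torch 2.0, where torch._six was removed.
# Assumes the only symbol needed from torch._six is `inf`.
try:
    from torch._six import inf  # torch < 2.0
except ImportError:
    from math import inf  # torch >= 2.0: fall back to the standard library
```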
## Title
- [Fix] zero optimizer w/ tensor parallel test

## Description
- ZeRO was not running in tensor parallel mode, so I fixed this by switching to a model from `transformers` (a rough sketch follows below).

## Linked Issues
- N/A
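As a rough sketch of the change's shape: the test builds its tensor-parallel model from a small `transformers` GPT-2, mirroring the ViT snippet earlier in this thread. The helper name `build_tp_model_and_optimizer` and the GPT-2 sizing are hypothetical, and oslo's ZeRO sharded optimizer wrapping is only indicated in a comment because its constructor is not shown in this thread; plain SGD stands in for illustration.

```python
from torch import optim
from transformers import GPT2Config, GPT2LMHeadModel

import oslo
from oslo.torch.nn.parallel import TensorParallel


def build_tp_model_and_optimizer(parallel_context, rank):
    # A deliberately tiny GPT-2 so the test runs quickly; any transformers model
    # covered by oslo's tensor-parallel mappings would do.
    model = GPT2LMHeadModel(
        GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1024)
    ).to(rank)

    model = TensorParallel(model, parallel_context)

    # The actual hybrid test wraps a base optimizer with oslo's ZeRO sharded
    # optimizer (oslo/torch/nn/parallel/data_parallel/zero/sharded_optim);
    # its exact constructor is not reproduced here.
    optimizer = optim.SGD(model.parameters(), lr=1e-4)

    oslo.ready(model, parallel_context)
    return model, optimizer
```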