I'm trying to follow #481 but I'm getting this error:
evaluating HellaSwag: 30/79
evaluating HellaSwag: 40/79
evaluating HellaSwag: 50/79
evaluating HellaSwag: 60/79
evaluating HellaSwag: 70/79
Writing state to log124M/state_00019560_00002.bin
Error: Token out of vocabulary at train_gpt2.cu:675
Error details:
File: train_gpt2.cu
Line: 675
Token: -1149026846
Position: 0
Vocab: 50257
generating:
---
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[20376,1],0]
Exit code: 1
--------------------------------------------------------------------------
Happens at the end of training. I don't end up getting the final model weights.
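For context, the message reads like a range check on the token id: the reported value -1149026846 is negative, so it cannot index a vocabulary of 50257 entries. Below is a minimal sketch of such a check in C. The function name `token_in_vocab` is hypothetical and is not taken from train_gpt2.cu; the actual check at line 675 may be written differently.

```c
#include <stdbool.h>

// Hypothetical helper (not from train_gpt2.cu): returns true when
// `token` is a valid index into a vocabulary of `vocab_size` entries,
// i.e. 0 <= token < vocab_size.
bool token_in_vocab(int token, int vocab_size) {
    return token >= 0 && token < vocab_size;
}
```

With the values from the log, `token_in_vocab(-1149026846, 50257)` is false, while any id from 0 to 50256 passes.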
Running:
You can find the 1500 model checkpoint state here:
https://huggingface.co/aidando73/repro-gpt-2-124M/tree/086c8895ae49f2472bcde14c7866e792b0a330f1/8x_A100_40GB/log124M
Commit hash I checked out: 7ecd890
Note that I didn't run python train_gpt2.py beforehand.
Anyone else getting this error?