Factors inference is slow (3 seconds/token) on A100 GPU #1110
Hi Amit,

That's a good question. I don't know that anyone has tested Sockeye with that many factors. One hypothesis is that the factor code contains cases of switching between Python/C++/GPU execution, and looping over more factors leads to a greater slowdown. @fhieber may have more information about decoding with factors. For profiling, you could take a look at the PyTorch Profiler.

Best,
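As a starting point for the profiling suggestion, here is a minimal sketch of wrapping the PyTorch Profiler around a decode-like loop. The model and inputs below are stand-ins, not Sockeye's actual decoder API:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in for a decoder step: a small linear layer (not Sockeye's real decoder).
model = torch.nn.Linear(512, 512)
x = torch.randn(1, 512)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(10):  # profile several steps to amortize warm-up noise
            y = model(x)

# Sort by CPU time to see where the decode loop spends its time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

Sorting by `cpu_time_total` is a deliberate choice here, since the thread below suggests the CPU side dominates.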
Thanks! One possible improvement I see is, instead of running the factor output layers one by one, to run the multiplications in parallel:

```python
futures = [torch.jit.fork(fol, decoder_out) for fol in self.factor_output_layers]
outputs = [torch.jit.wait(fut) for fut in futures]
```

Also, as a side note, in decoding it seems like target factors are not embedded:
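The fork/wait idea above can be sketched end to end. The `decoder_out` tensor and `factor_output_layers` list below are illustrative stand-ins, not Sockeye's real shapes or attribute names:

```python
import torch

# Stand-ins for a decoder state and per-factor output layers (illustrative,
# not Sockeye's actual modules or dimensions).
decoder_out = torch.randn(1, 512)
factor_output_layers = torch.nn.ModuleList(
    [torch.nn.Linear(512, 100) for _ in range(7)]  # e.g. 7 target factors
)

# Sequential baseline: one Python-level call per factor.
sequential = [fol(decoder_out) for fol in factor_output_layers]

# Forked version: torch.jit.fork launches each call asynchronously so the
# per-factor matmuls can overlap rather than serialize.
futures = [torch.jit.fork(fol, decoder_out) for fol in factor_output_layers]
outputs = [torch.jit.wait(fut) for fut in futures]

# Both orderings produce the same results.
for a, b in zip(sequential, outputs):
    assert torch.allclose(a, b)
```

One caveat: outside of a TorchScript-compiled module, `torch.jit.fork` may execute the callable eagerly, so the parallelism benefit generally requires the surrounding code to be scripted.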
With the
Compared to an A100 GPU:
Since it seems like the CPU time is huge, I list the CPU timing:
Profile output:
Here is a profile file, to be opened in
with torch 2.3.0, on GPU:
on CPU:
Why is Sockeye restricted to torch 1?
The … If you change the line to just … Best,
My use case calls for splitting my input tokens into 5 parts, and my output tokens into 8.
That means that each input token has 4 factors (SignWriting), and each output token has 7 factors (VQ model).
I created factored files for an example sentence:
```
M|c0|r0|p518|p518 S2ff|c0|r0|p482|p483
```
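For reference, the `|`-delimited factored format above can be split back into a surface-token stream plus per-factor streams with a few lines of plain Python (the function name here is just for illustration):

```python
def split_factors(line: str, num_factors: int):
    """Split a factored line 'tok|f1|...|fN tok|f1|...|fN' into parallel streams.

    Returns a list of streams: streams[0] holds the surface tokens,
    streams[1..num_factors] hold the factor sequences.
    """
    streams = [[] for _ in range(num_factors + 1)]
    for piece in line.split():
        parts = piece.split("|")
        assert len(parts) == num_factors + 1, f"expected {num_factors} factors: {piece}"
        for stream, part in zip(streams, parts):
            stream.append(part)
    return streams

streams = split_factors("M|c0|r0|p518|p518 S2ff|c0|r0|p482|p483", num_factors=4)
# streams[0] == ["M", "S2ff"]  (surface tokens)
# streams[1] == ["c0", "c0"]   (first factor stream)
```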
And I attempt to translate with:
And the output is:
Why would translating a single sentence, with an A100 GPU, on a small model, without beam search, be this slow?
Is there a way to profile the decoding step function?
The full output is:
Besides the fact that the output repeats the same token over and over, it is in the expected format.