-
Hello, and thanks! The Flash Attention implementation is adapted from the beta version released by Google itself, and Ring Attention is built on and improved from the "Ring Attention with Blockwise Transformers for Near-Infinite Context" paper. In general, Flash Attention is about 40% more memory efficient and roughly 40-50% faster than the default attention. Modeling in pure JAX is much faster than Flax (about 3x up to 12x faster), but to stay user-friendly and let people actually use EasyDeL, I use Flax. I have no problem releasing a pure JAX version of every model; that doesn't seem hard to do. But since JAX is much harder than PyTorch for newcomers and Flax is kinda GNN-like, I stick to Flax.
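To make the pure-JAX-versus-Flax trade-off above concrete, here is a minimal sketch (not EasyDeL's actual code) of the same dense layer written once as a plain JAX function and once as a Flax `linen` module; the shapes and parameter names are illustrative assumptions only.

```python
# Minimal sketch: plain JAX vs. Flax for the same layer (hypothetical shapes/names).
import jax
import jax.numpy as jnp
import flax.linen as nn


def dense_pure_jax(params, x):
    # Pure JAX: parameters are passed explicitly, nothing is hidden.
    return x @ params["kernel"] + params["bias"]


class DenseFlax(nn.Module):
    features: int

    @nn.compact
    def __call__(self, x):
        # Flax: parameter creation and bookkeeping are handled by the module.
        return nn.Dense(self.features)(x)


key = jax.random.PRNGKey(0)
x = jnp.ones((4, 16))

# Pure JAX usage: build the parameter pytree by hand.
params = {"kernel": jax.random.normal(key, (16, 32)), "bias": jnp.zeros((32,))}
y1 = jax.jit(dense_pure_jax)(params, x)

# Flax usage: init/apply manages the parameter pytree for you.
model = DenseFlax(features=32)
variables = model.init(key, x)
y2 = jax.jit(model.apply)(variables, x)
```

The pure-JAX version has less framework overhead and more room for hand-tuning, while the Flax version is closer to what PyTorch users expect, which is the user-friendliness argument made above.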
-
The work is impressive, but I see that there are no benchmarks. Could you please address some concerns about the speed and performance of this library? Specifically (for TPUs):