
Running Llama 3.1 on macOS with M2 chip has errors #1784

Open
antoninadert opened this issue Aug 5, 2024 · 3 comments

Comments

@antoninadert
Contributor

I tried to run h2ogpt with this command:

python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --use_auth_token=...

and it triggered these errors:

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
thread exception: Traceback (most recent call last):
  File "/Users/.../h2ogpt/src/utils.py", line 524, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/Users/.../h2ogpt/src/gen.py", line 4288, in generate_with_exceptions
    func(*args, **kwargs)
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1727, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 493, in _prepare_attention_mask_for_generation
    raise ValueError(
ValueError: Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device.

It worked fine when I tried to run older models, like Llama 2 for example.

Do you know what could be the source of this issue?
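
For reference, the ValueError comes from transformers refusing to infer an attention mask on the mps device when the pad token equals the EOS token. A minimal sketch of passing the mask explicitly in plain transformers (not the h2ogpt code path; the prompt, dtype and generation settings here are just placeholders):

# Sketch only: pass the attention_mask the tokenizer already produces,
# so transformers does not try to infer it on mps.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("mps")

inputs = tokenizer("Hello", return_tensors="pt").to("mps")  # placeholder prompt
out = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # the mask the error asks for
    pad_token_id=tokenizer.eos_token_id,      # avoids the pad_token_id warning
    max_new_tokens=32,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))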

@pseudotensor
Collaborator

Looks like a transformers bug: huggingface/transformers#31744

But a newer transformers release is needed for Llama 3.1.

Maybe stick to GGUF?
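
If going the GGUF route, a minimal llama-cpp-python sketch for Apple Silicon could look like this (the model path is a hypothetical local file; n_gpu_layers=-1 offloads all layers to Metal):

# Sketch only: load a local Llama 3.1 GGUF with llama-cpp-python on an M-series Mac.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload all layers to Metal
    n_ctx=8192,       # context window; pick what fits in RAM
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])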

@antoninadert
Contributor Author

It worked fine with GGUF, though I had to install a different package version than what is recommended. I will propose a pull request to make it possible to start with Llama 3.1.

@antoninadert
Contributor Author

See #1789, which is how I made Llama 3.1 GGUF work on a Mac M2.
