Running Llama 3.1 on macOS with an M2 chip produces errors #1784
Comments
Looks like a transformers bug: huggingface/transformers#31744. But Llama 3.1 needs a newer transformers release. Maybe stick to GGUF?
It worked fine with GGUF, though I had to install a different package version than the one recommended. I will propose a pull request to make it possible to start with Llama 3.1.
See #1789, which is how I made Llama 3.1 GGUF work on a Mac M2.
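As an aside on the GGUF workaround: a GGUF file is identified by the 4-byte magic header `b"GGUF"` (per the GGUF specification), which makes it easy to sanity-check that a multi-gigabyte download actually completed as a GGUF file. A minimal sketch (the `is_gguf` helper is illustrative, not part of h2ogpt):

```python
# Check whether a file carries the GGUF magic header (b"GGUF").
# `is_gguf` is a hypothetical helper for sanity-checking downloads.
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo with a tiny stand-in file (a real model would be several GB):
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))  # magic + a version field
    path = tmp.name

print(is_gguf(path))  # True
os.remove(path)
```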
I tried to run h2ogpt with this command:

```
python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --use_auth_token=...
```

and it triggered errors:
```
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
thread exception: Traceback (most recent call last):
  File "/Users/.../h2ogpt/src/utils.py", line 524, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/Users/.../h2ogpt/src/gen.py", line 4288, in generate_with_exceptions
    func(*args, **kwargs)
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1727, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/Users/.../miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 493, in _prepare_attention_mask_for_generation
    raise ValueError(
ValueError: Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device.
```
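The logic behind this error can be sketched in a few lines: when no mask is supplied, transformers normally infers one by treating pad tokens as masked-out positions, but Llama 3.1 reuses its eos token (128009) as the pad token, so a trailing eos is indistinguishable from padding and inference must fail. The helper below is a hypothetical illustration of that check, not the library's actual code:

```python
# Hypothetical sketch of why the attention mask cannot be inferred when
# pad_token_id == eos_token_id (mirrors the idea behind transformers'
# _prepare_attention_mask_for_generation, not its actual implementation).

def infer_attention_mask(input_ids, pad_token_id, eos_token_id):
    if pad_token_id == eos_token_id:
        # A trailing eos is indistinguishable from padding, so any
        # inferred mask could silently hide real tokens.
        raise ValueError("Can't infer missing attention mask: pad token equals eos token")
    # Otherwise, pad positions are unambiguous: mask them out.
    return [0 if tok == pad_token_id else 1 for tok in input_ids]

# With distinct pad/eos ids the mask is unambiguous:
print(infer_attention_mask([5, 7, 128009, 0, 0], pad_token_id=0, eos_token_id=128009))
# [1, 1, 1, 0, 0]

# With Llama 3.1's shared id (128009 for both), inference must fail:
try:
    infer_attention_mask([5, 7, 128009], pad_token_id=128009, eos_token_id=128009)
except ValueError as exc:
    print(exc)
```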
It worked fine when I ran other, older models such as Llama 2.
Do you know what the source of this issue could be?
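The traceback itself points at the remedy: tokenize so that an explicit `attention_mask` accompanies `input_ids`, and pass both through to generation. The sketch below shows only that calling pattern with toy stand-ins (`ToyTokenizer` and `toy_generate` are hypothetical; in real transformers code, `tokenizer(text, return_tensors="pt")` already returns both keys, and `model.generate(**inputs, ...)` consumes them):

```python
# Toy stand-ins illustrating the calling pattern the error asks for.
# (`ToyTokenizer` and `toy_generate` are hypothetical, not transformers APIs.)

class ToyTokenizer:
    eos_token_id = 128009  # Llama 3.1's eos, which also serves as pad

    def __call__(self, text):
        ids = [ord(c) for c in text]
        # Returning the mask alongside the ids is the crucial part:
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}

def toy_generate(input_ids, attention_mask=None, pad_token_id=None):
    # Mirrors the failure mode: without an explicit mask, refuse to guess.
    if attention_mask is None:
        raise ValueError("Can't infer missing attention mask on `mps` device.")
    return input_ids + [pad_token_id]  # dummy one-token continuation

tok = ToyTokenizer()
inputs = tok("hi")
# Unpacking the full encoding passes the mask and avoids the error:
out = toy_generate(**inputs, pad_token_id=tok.eos_token_id)
print(out)  # [104, 105, 128009]
```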