Skip to content
This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2 #326

Open
zwx109473 opened this issue Aug 26, 2024 · 2 comments

Comments

@zwx109473
Copy link

An error is reported during Qwen7B model inference when neural-speed 1.0 and intel-extension-for-transformers 1.4.2 are used.

model_quantize_internal: model size = 29454.52 MB
model_quantize_internal: quant size = 5006.17 MB
/root/miniconda3/envs/zl_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:0 AVX512_VNNI:1 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
beam_size: 1, do_sample: 0, top_k: 40, top_p: 0.950, continuous_batching: 0, max_request_num: 1, early_stopping: 0, scratch_size_ratio: 1.000
model.cpp: loading model from runtime_outs/ne_qwen_q_int4_bestla_cint8_g32.bin
Loading the bin file with NE format...
load_ne_hparams 0.hparams.n_vocab = 151936
load_ne_hparams 1.hparams.n_embd = 4096
load_ne_hparams 2.hparams.n_mult = 22016
load_ne_hparams 3.hparams.n_head = 32
load_ne_hparams 4.hparams.n_head_kv = 0
load_ne_hparams 5.hparams.n_layer = 32
load_ne_hparams 6.hparams.n_rot = 128
load_ne_hparams 7.hparams.ftype = 0
load_ne_hparams 8.hparams.max_seq_len = 32768
load_ne_hparams 9.hparams.alibi_bias_max = 0.000
load_ne_hparams 10.hparams.clip_qkv = 0.000
load_ne_hparams 11.hparams.par_res = 0
load_ne_hparams 12.hparams.word_embed_proj_dim = 0
load_ne_hparams 13.hparams.do_layer_norm_before = 0
load_ne_hparams 14.hparams.multi_query_group_num = 0
load_ne_hparams 15.hparams.ffn_hidden_size = 11008
load_ne_hparams 16.hparams.inner_hidden_size = 0
load_ne_hparams 17.hparams.n_experts = 0
load_ne_hparams 18.hparams.n_experts_used = 0
load_ne_hparams 19.hparams.n_embd_head_k = 0
load_ne_hparams 20.hparams.norm_eps = 0.000001
load_ne_hparams 21.hparams.freq_base = 10000.000
load_ne_hparams 22.hparams.freq_scale = 1.000
load_ne_hparams 23.hparams.rope_scaling_factor = 0.000
load_ne_hparams 24.hparams.original_max_position_embeddings = 0
load_ne_hparams 25.hparams.use_yarn = 0
load_ne_vocab 26.vocab.bos_token_id = 151643
load_ne_vocab 27.vocab.eos_token_id = 151643
load_ne_vocab 28.vocab.pad_token_id = -1
load_ne_vocab 29.vocab.sep_token_id = -1
init: n_vocab = 151936
init: n_embd = 4096
init: n_mult = 22016
init: n_head = 32
init: n_head_kv = 0
init: n_layer = 32
init: n_rot = 128
init: ftype = 0
init: max_seq_len= 32768
init: n_ff = 11008
init: n_parts = 1
load: ctx size = 5006.31 MB
load: scratch0 = 4096.00 MB
load: scratch1 = 2048.00 MB
load: scratch2 = 4096.00 MB
load: mem required = 15246.31 MB ( memory per state)
.......................................................................................
model_init_from_file: support_bestla_kv = 0
model_init_from_file: kv self size = 256.00 MB
Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2
Aborted (core dumped)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@zwx109473 and others