This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2
#326 · Open · zwx109473 opened this issue on Aug 26, 2024 · 2 comments
An error is reported during Qwen-7B model inference with neural-speed 1.0 and intel-extension-for-transformers 1.4.2.
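For context, here is a minimal reproduction sketch modeled on the standard intel-extension-for-transformers low-bit inference example; the checkpoint id, prompt, and generation arguments below are assumptions inferred from the log rather than details confirmed in this report.

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Qwen/Qwen-7B"            # assumed checkpoint id
prompt = "Once upon a time, a little"  # taken from the text echoed in the log

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# load_in_4bit routes generation through neural-speed and triggers the
# INT4 weight-only quantization pass seen at the top of the log.
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_4bit=True, trust_remote_code=True
)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```

The console output from the failing run follows.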
model_quantize_internal: model size = 29454.52 MB
model_quantize_internal: quant size = 5006.17 MB
/root/miniconda3/envs/zl_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: huggingface/transformers#31884
  warnings.warn(
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:0 AVX512_VNNI:1 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
beam_size: 1, do_sample: 0, top_k: 40, top_p: 0.950, continuous_batching: 0, max_request_num: 1, early_stopping: 0, scratch_size_ratio: 1.000
model.cpp: loading model from runtime_outs/ne_qwen_q_int4_bestla_cint8_g32.bin
Loading the bin file with NE format...
load_ne_hparams 0.hparams.n_vocab = 151936
load_ne_hparams 1.hparams.n_embd = 4096
load_ne_hparams 2.hparams.n_mult = 22016
load_ne_hparams 3.hparams.n_head = 32
load_ne_hparams 4.hparams.n_head_kv = 0
load_ne_hparams 5.hparams.n_layer = 32
load_ne_hparams 6.hparams.n_rot = 128
load_ne_hparams 7.hparams.ftype = 0
load_ne_hparams 8.hparams.max_seq_len = 32768
load_ne_hparams 9.hparams.alibi_bias_max = 0.000
load_ne_hparams 10.hparams.clip_qkv = 0.000
load_ne_hparams 11.hparams.par_res = 0
load_ne_hparams 12.hparams.word_embed_proj_dim = 0
load_ne_hparams 13.hparams.do_layer_norm_before = 0
load_ne_hparams 14.hparams.multi_query_group_num = 0
load_ne_hparams 15.hparams.ffn_hidden_size = 11008
load_ne_hparams 16.hparams.inner_hidden_size = 0
load_ne_hparams 17.hparams.n_experts = 0
load_ne_hparams 18.hparams.n_experts_used = 0
load_ne_hparams 19.hparams.n_embd_head_k = 0
load_ne_hparams 20.hparams.norm_eps = 0.000001
load_ne_hparams 21.hparams.freq_base = 10000.000
load_ne_hparams 22.hparams.freq_scale = 1.000
load_ne_hparams 23.hparams.rope_scaling_factor = 0.000
load_ne_hparams 24.hparams.original_max_position_embeddings = 0
load_ne_hparams 25.hparams.use_yarn = 0
load_ne_vocab 26.vocab.bos_token_id = 151643
load_ne_vocab 27.vocab.eos_token_id = 151643
load_ne_vocab 28.vocab.pad_token_id = -1
load_ne_vocab 29.vocab.sep_token_id = -1
init: n_vocab = 151936
init: n_embd = 4096
init: n_mult = 22016
init: n_head = 32
init: n_head_kv = 0
init: n_layer = 32
init: n_rot = 128
init: ftype = 0
init: max_seq_len= 32768
init: n_ff = 11008
init: n_parts = 1
load: ctx size = 5006.31 MB
load: scratch0 = 4096.00 MB
load: scratch1 = 2048.00 MB
load: scratch2 = 4096.00 MB
load: mem required = 15246.31 MB ( memory per state)
.......................................................................................
model_init_from_file: support_bestla_kv = 0
model_init_from_file: kv self size = 256.00 MB
Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2
Aborted (core dumped)
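For anyone debugging this, the failed NE_ASSERT is a shape-consistency check: the total element count of tensor `a` must equal ne0 * ne1 * ne2 before a 3-D view of it can be built. A minimal numpy analogue of that invariant is sketched below; the concrete dimensions are made up for illustration and are not taken from the crash.

```python
import numpy as np

# Hypothetical shapes: 4096-wide hidden states for a 7-token prompt.
a = np.zeros(4096 * 7)

# Fine: 128 * 32 * 7 == 4096 * 7, so the 3-D view exists.
ok = a.reshape(128, 32, 7)

# Mismatch: 128 * 32 * 8 != 4096 * 7. numpy raises ValueError here, the
# analogue of the NE_ASSERT that aborts the C runtime above.
try:
    bad = a.reshape(128, 32, 8)
except ValueError as e:
    print("shape mismatch:", e)
```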