
Error trying to export mistralai/Mistral-7B-v0.3 to OpenVINO INT8 #2415

@azhuvath

Description


System Info

python3 -m venv llm_env
source llm_env/bin/activate
pip install huggingface_hub openvino openvino-genai
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install optimum-intel nncf

Who can help?

@echarlaix @IlyasMoutawwakil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

optimum-cli export openvino --model mistralai/Mistral-7B-v0.3 --task text-generation-with-past --weight-format int8 --group-size 64 --ratio 1.0 --sym --all-layers Mistral-7B-v0.3-INT8
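For reference, the INT4 attempt mentioned under "Expected behavior" would presumably be the same command with only the weight format changed (a sketch, not verified to work; per the report it fails with a similar error):

```shell
# Hypothetical INT4 variant of the export above; all other flags unchanged.
optimum-cli export openvino \
  --model mistralai/Mistral-7B-v0.3 \
  --task text-generation-with-past \
  --weight-format int4 \
  --group-size 64 \
  --ratio 1.0 \
  --sym \
  --all-layers \
  Mistral-7B-v0.3-INT4
```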

Expected behavior

The command should export the model to INT8, but it fails with the error below. Exporting to INT4 fails with a similar error.

torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 115.68it/s]
loss_type=None was set in the config but it is unrecognized. Using the default loss: ForCausalLMLoss.
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/cache_utils.py:132: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not self.is_initialized or self.keys.numel() == 0:
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/masking_utils.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (padding_length := kv_length + kv_offset - attention_mask.shape[-1]) > 0:
/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:207: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
torch.tensor(0.0, device=mask.device, dtype=dtype),
/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:208: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py:81: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, "is_causal", True)
Traceback (most recent call last):
File "/dev/shm/prj/llm_env/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 219, in main
service.run()
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/commands/export/openvino.py", line 486, in run
_main_quantize(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/main.py", line 657, in _main_quantize
model = model_cls.from_pretrained(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_base.py", line 625, in from_pretrained
return super().from_pretrained(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/modeling_base.py", line 407, in from_pretrained
return from_pretrained_method(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 869, in _from_pretrained
model = cls.load_model(model_cache_path)
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_base.py", line 370, in load_model
core.read_model(file_name.resolve(), file_name.with_suffix(".bin").resolve())
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/openvino/_ov_api.py", line 603, in read_model
return Model(super().read_model(model, weights, config))
RuntimeError: Exception from src/inference/src/cpp/core.cpp:93:
Exception from src/inference/src/model_reader.cpp:160:
Unable to read the model: /tmp/tmp3n0wlgfj/openvino_model.xml Please check that model format: xml is supported and the model is correct. Available frontends: tflite tf pytorch paddle onnx jax ir
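The error suggests `core.read_model()` received an IR XML it could not parse (note that the `ir` frontend is listed as available). Before retrying, it may help to sanity-check the exported file pair. Below is a minimal stdlib sketch; `check_openvino_ir` is a hypothetical helper (not part of `optimum` or `openvino`), and it assumes the standard `openvino_model.xml` / `openvino_model.bin` layout:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_openvino_ir(xml_path: str) -> list[str]:
    """Sanity-check an exported OpenVINO IR pair before core.read_model().

    Returns a list of human-readable problems; an empty list means the
    pair at least looks structurally valid (well-formed XML with a <net>
    root, plus a non-empty .bin weights file next to it).
    """
    problems = []
    xml_file = Path(xml_path)
    bin_file = xml_file.with_suffix(".bin")

    if not xml_file.is_file():
        problems.append(f"missing XML: {xml_file}")
    elif xml_file.stat().st_size == 0:
        problems.append(f"empty XML: {xml_file}")
    else:
        try:
            root = ET.parse(xml_file).getroot()
            # OpenVINO IR XML uses <net> as its document root.
            if root.tag != "net":
                problems.append(f"unexpected root element <{root.tag}>, expected <net>")
        except ET.ParseError as exc:
            problems.append(f"XML does not parse: {exc}")

    if not bin_file.is_file():
        problems.append(f"missing weights file: {bin_file}")
    elif bin_file.stat().st_size == 0:
        problems.append(f"empty weights file: {bin_file}")

    return problems
```

An empty or truncated XML (for example from a crash or a full `/tmp`) would produce exactly this kind of "Unable to read the model" failure, so checking the pair narrows down whether the export or the reload step is at fault.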

Labels: bug (Something isn't working)