
Error trying to export mistralai/Mistral-7B-v0.3 to OpenVINO INT8 #2415

@azhuvath

Description


System Info

python3 -m venv llm_env
source llm_env/bin/activate
pip install huggingface_hub openvino openvino-genai
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install optimum-intel nncf

Who can help?

@echarlaix @IlyasMoutawwakil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

optimum-cli export openvino --model mistralai/Mistral-7B-v0.3 --task text-generation-with-past --weight-format int8 --group-size 64 --ratio 1.0 --sym --all-layers Mistral-7B-v0.3-INT8
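For reference, the INT4 attempt mentioned under "Expected behavior" would presumably be the same command with only the weight format changed (a sketch, not verified to work; per the report it fails with a similar error):

```shell
# Hypothetical INT4 variant of the export above; all other flags unchanged.
optimum-cli export openvino \
  --model mistralai/Mistral-7B-v0.3 \
  --task text-generation-with-past \
  --weight-format int4 \
  --group-size 64 \
  --ratio 1.0 \
  --sym \
  --all-layers \
  Mistral-7B-v0.3-INT4
```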

Expected behavior

The command should export the model to INT8, but it fails with the error below. Exporting to INT4 fails with a similar error.

torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 115.68it/s]
loss_type=None was set in the config but it is unrecognized. Using the default loss: ForCausalLMLoss.
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/cache_utils.py:132: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not self.is_initialized or self.keys.numel() == 0:
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/masking_utils.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (padding_length := kv_length + kv_offset - attention_mask.shape[-1]) > 0:
/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:207: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
torch.tensor(0.0, device=mask.device, dtype=dtype),
/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/model_patcher.py:208: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),
/dev/shm/prj/llm_env/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py:81: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, "is_causal", True)
Traceback (most recent call last):
File "/dev/shm/prj/llm_env/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 219, in main
service.run()
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/commands/export/openvino.py", line 486, in run
_main_quantize(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/exporters/openvino/main.py", line 657, in _main_quantize
model = model_cls.from_pretrained(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_base.py", line 625, in from_pretrained
return super().from_pretrained(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/modeling_base.py", line 407, in from_pretrained
return from_pretrained_method(
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 869, in _from_pretrained
model = cls.load_model(model_cache_path)
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/optimum/intel/openvino/modeling_base.py", line 370, in load_model
core.read_model(file_name.resolve(), file_name.with_suffix(".bin").resolve())
File "/dev/shm/prj/llm_env/lib/python3.10/site-packages/openvino/_ov_api.py", line 603, in read_model
return Model(super().read_model(model, weights, config))
RuntimeError: Exception from src/inference/src/cpp/core.cpp:93:
Exception from src/inference/src/model_reader.cpp:160:
Unable to read the model: /tmp/tmp3n0wlgfj/openvino_model.xml Please check that model format: xml is supported and the model is correct. Available frontends: tflite tf pytorch paddle onnx jax ir
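The error suggests `core.read_model()` received an IR XML it could not parse (note that the `ir` frontend is listed as available). Before retrying, it may help to sanity-check the exported file pair. Below is a minimal stdlib sketch; `check_openvino_ir` is a hypothetical helper (not part of `optimum` or `openvino`), and it assumes the standard `openvino_model.xml` / `openvino_model.bin` layout:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_openvino_ir(xml_path: str) -> list[str]:
    """Sanity-check an exported OpenVINO IR pair before core.read_model().

    Returns a list of human-readable problems; an empty list means the
    pair at least looks structurally valid (well-formed XML with a <net>
    root, plus a non-empty .bin weights file next to it).
    """
    problems = []
    xml_file = Path(xml_path)
    bin_file = xml_file.with_suffix(".bin")

    if not xml_file.is_file():
        problems.append(f"missing XML: {xml_file}")
    elif xml_file.stat().st_size == 0:
        problems.append(f"empty XML: {xml_file}")
    else:
        try:
            root = ET.parse(xml_file).getroot()
            # OpenVINO IR XML uses <net> as its document root.
            if root.tag != "net":
                problems.append(f"unexpected root element <{root.tag}>, expected <net>")
        except ET.ParseError as exc:
            problems.append(f"XML does not parse: {exc}")

    if not bin_file.is_file():
        problems.append(f"missing weights file: {bin_file}")
    elif bin_file.stat().st_size == 0:
        problems.append(f"empty weights file: {bin_file}")

    return problems
```

An empty or truncated XML (for example from a crash or a full `/tmp`) would produce exactly this kind of "Unable to read the model" failure, so checking the pair narrows down whether the export or the reload step is at fault.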

Labels: bug (Something isn't working)