This repository was archived by the owner on Jan 28, 2026. It is now read-only.

Commit d20a968

[NPU] Fix generate example (#12541)
1 parent 1521994 commit d20a968

File tree

1 file changed (+1, −1)
  • python/llm/example/NPU/HF-Transformers-AutoModels/LLM


python/llm/example/NPU/HF-Transformers-AutoModels/LLM/generate.py

Lines changed: 1 addition & 1 deletion

@@ -52,7 +52,6 @@
         attn_implementation="eager"
     )
     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
-    tokenizer.save_pretrained(args.lowbit_path)
 else:
     model = AutoModelForCausalLM.load_low_bit(
         args.lowbit_path,
@@ -66,6 +65,7 @@
 
 if args.lowbit_path and not os.path.exists(args.lowbit_path):
     model.save_low_bit(args.lowbit_path)
+    tokenizer.save_pretrained(args.lowbit_path)
 
 with torch.inference_mode():
     input_ids = tokenizer.encode(args.prompt, return_tensors="pt")
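The net effect of the commit is that `tokenizer.save_pretrained` now runs inside the same guard as `model.save_low_bit`, so tokenizer files are written only when a fresh low-bit checkpoint is actually being saved to `args.lowbit_path`. A minimal, runnable sketch of that guard, using stub classes in place of the real ipex-llm model and Hugging Face tokenizer (`StubModel`, `StubTokenizer`, and `maybe_save` are illustrative names, not the project's API):

```python
import os
import tempfile

class StubModel:
    """Stand-in for the low-bit model; writes a dummy weights file."""
    def save_low_bit(self, path):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "model.bin"), "w") as f:
            f.write("low-bit weights")

class StubTokenizer:
    """Stand-in for the Hugging Face tokenizer; writes a dummy config file."""
    def save_pretrained(self, path):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "tokenizer.json"), "w") as f:
            f.write("tokenizer config")

def maybe_save(model, tokenizer, lowbit_path):
    # After the fix, the tokenizer is saved in the same guarded block as the
    # model, so both artifacts land in lowbit_path together, and neither is
    # written when the path is empty or already populated.
    if lowbit_path and not os.path.exists(lowbit_path):
        model.save_low_bit(lowbit_path)
        tokenizer.save_pretrained(lowbit_path)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "lowbit")
    maybe_save(StubModel(), StubTokenizer(), target)
    print(sorted(os.listdir(target)))  # ['model.bin', 'tokenizer.json']
```

Keeping both save calls under one existence check also makes the second run of the script a no-op: the checkpoint directory already exists, so nothing is overwritten.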
