[v0.18.0][Refactor] Use forward mapping instead of reverse mapping in AscendMo… #7716
Conversation
…delSlimConfig (vllm-project#7596)

### What this PR does / why we need it?

This PR refactors the `AscendModelSlimConfig` class to use **forward mapping** instead of reverse mapping for quantization config key transformation.

**Changes:**

1. Modified `apply_vllm_mapper()` to directly apply `hf_to_vllm_mapper.apply_dict()` to transform `quant_description` keys from HF format to vLLM format
2. Simplified `quant_prefix_mapper()` to return the prefix directly (no longer needs mapping, since keys are already in vLLM format)
3. Removed the `QUANT_MODEL_PREFIX_MAPPINGS` dictionary (~50 lines) - no longer needed
4. Removed the `get_prefix_mapping()` function - no longer needed
5. Removed the `vllm_to_hf_mapper` attribute - no longer needed

**Why this change is needed:**

The previous implementation used reverse mapping (vLLM → HF), which had several issues:

- Some keys might not be used in the forward direction but would be incorrectly used in reverse
- Empty values in the mapping would cause issues when reversed
- It required maintaining a separate `QUANT_MODEL_PREFIX_MAPPINGS` dict that duplicated information already available in vLLM's model-specific `WeightsMapper`

The new approach:

- Uses the forward mapping (HF → vLLM) directly from vLLM's `WeightsMapper`
- Eliminates the need for duplicate mapping definitions
- Avoids issues with reverse mapping (unused keys, empty values)
- Aligns with how `compressed_tensors_config.py` handles the same scenario

- vLLM version: v0.18.0
- vLLM main: vllm-project/vllm@ed359c4

---------

Signed-off-by: Matrix_K <zhangke144@huawei.com>
Signed-off-by: Feng-xiaosuo <tengchang1@huawei.com>
Co-authored-by: Matrix_K <zhangke144@huawei.com>
Co-authored-by: Wang Kunpeng <1289706727@qq.com>
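The forward-mapping idea described above can be sketched with a minimal standalone example. Note that `SimpleWeightsMapper` and its `apply_dict` below are an illustrative reconstruction of the behavior discussed in this PR, not vLLM's actual `WeightsMapper` implementation:

```python
# Minimal sketch of HF -> vLLM forward prefix mapping applied to a
# quant_description dict. Class and method names are illustrative.

class SimpleWeightsMapper:
    """Maps HF-style weight-name prefixes to vLLM-style prefixes."""

    def __init__(self, orig_to_new_prefix):
        self.orig_to_new_prefix = orig_to_new_prefix

    def _map_name(self, name):
        for old, new in self.orig_to_new_prefix.items():
            if name.startswith(old):
                return new + name[len(old):]
        return name

    def apply_dict(self, d):
        # Rewrite every key once, in the forward (HF -> vLLM) direction.
        return {self._map_name(k): v for k, v in d.items()}


mapper = SimpleWeightsMapper({"transformer.h.": "model.layers."})
quant_description = {
    "transformer.h.0.mlp.weight": "W8A8",
    "lm_head.weight": "FLOAT",
}
print(mapper.apply_dict(quant_description))
# {'model.layers.0.mlp.weight': 'W8A8', 'lm_head.weight': 'FLOAT'}
```

Because the mapping is only ever applied in the direction it was defined for, unused keys and empty values in the mapping never get misinterpreted, which is the reverse-mapping pitfall this PR removes.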
Code Review
Suggested PR Title:
[Ops][Misc] Refactor ModelSlim quantization mapping to apply HF-to-vLLM transformation directly

Suggested PR Summary:
### What this PR does / why we need it?
This PR refactors the `AscendModelSlimConfig` to simplify weight prefix mapping by removing the static `QUANT_MODEL_PREFIX_MAPPINGS` and reverse mapping logic. It now transforms the `quant_description` keys from HF to vLLM format directly when the mapper is applied. A critical bug was identified in `apply_vllm_mapper` where multiple calls with different mappers could lead to state corruption, and a suggestion was provided to ensure the method is idempotent.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with existing tests.

```python
if self._mapper_applied and self.hf_to_vllm_mapper is hf_to_vllm_mapper:
    return
```
The current implementation has a potential state-corruption bug. If `apply_vllm_mapper` is called a second time with a different `hf_to_vllm_mapper` instance, it will attempt to apply the new mapping to `self.quant_description`, which has already been transformed by the first mapper, producing an incorrect quantization configuration.

The method should be made idempotent to prevent this. A safe approach is to prevent any re-application once a mapper has been applied, and to log a warning if a different mapper is provided on a subsequent call.
```diff
-if self._mapper_applied and self.hf_to_vllm_mapper is hf_to_vllm_mapper:
-    return
+if self._mapper_applied:
+    if self.hf_to_vllm_mapper is not hf_to_vllm_mapper:
+        logger.warning(
+            "Attempted to apply a different vLLM mapper. This is not "
+            "supported and will be ignored.")
+    return
```
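To make the suggested guard concrete, here is a runnable sketch of it embedded in a stand-in config class. The mapper API and the surrounding class (`ToyMapper`, `ToyModelSlimConfig`) are illustrative reconstructions from this thread, not vllm-ascend's actual code:

```python
# Sketch of the idempotency guard in apply_vllm_mapper. All names here
# are toy stand-ins for the real AscendModelSlimConfig machinery.
import logging

logger = logging.getLogger(__name__)


class ToyMapper:
    def __init__(self, prefix_map):
        self.prefix_map = prefix_map

    def apply_dict(self, d):
        out = {}
        for k, v in d.items():
            for old, new in self.prefix_map.items():
                if k.startswith(old):
                    k = new + k[len(old):]
                    break
            out[k] = v
        return out


class ToyModelSlimConfig:
    def __init__(self, quant_description):
        self.quant_description = quant_description
        self.hf_to_vllm_mapper = None
        self._mapper_applied = False

    def apply_vllm_mapper(self, hf_to_vllm_mapper):
        # Idempotency guard: never transform quant_description twice,
        # and warn if a different mapper arrives on a later call.
        if self._mapper_applied:
            if self.hf_to_vllm_mapper is not hf_to_vllm_mapper:
                logger.warning(
                    "Attempted to apply a different vLLM mapper. This is "
                    "not supported and will be ignored.")
            return
        self.hf_to_vllm_mapper = hf_to_vllm_mapper
        self.quant_description = hf_to_vllm_mapper.apply_dict(
            self.quant_description)
        self._mapper_applied = True


cfg = ToyModelSlimConfig({"transformer.h.0.attn.weight": "W8A8"})
cfg.apply_vllm_mapper(ToyMapper({"transformer.h.": "model.layers."}))
cfg.apply_vllm_mapper(ToyMapper({"model.layers.": "oops."}))  # ignored
print(cfg.quant_description)
# {'model.layers.0.attn.weight': 'W8A8'}
```

The second call is a no-op: the keys stay in vLLM format and are never double-mapped, which is exactly the state corruption the review comment warns about.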