Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of Changes
Hello @sam-shubham, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request initiates the integration of the Qwen2-VL multimodal model into KerasHub. It introduces the core architectural elements necessary for handling both text and visual inputs, leveraging a sophisticated multimodal rotary position embedding scheme. The changes lay the groundwork for a powerful new model capable of processing and understanding diverse data types within the Keras ecosystem.
Activity
|
Code Review
This pull request introduces the Qwen2-VL model, a significant contribution that includes the backbone, a custom attention mechanism with M-RoPE, and a vision encoder. The code is well-structured and generally follows the repository's style guide for file organization and naming. However, as this is a work-in-progress, there are several critical issues that need to be addressed. The most important ones are the missing vision processing logic in the backbone's forward pass and a non-vectorized loop in the vision encoder that will prevent graph compilation and be highly inefficient. I've also identified a bug in the decoder's cache handling and other areas for improvement regarding code clarity and performance. My detailed comments below provide specific suggestions for these points.
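The review flags a non-vectorized loop in the vision encoder as a blocker for graph compilation. As a minimal sketch of the issue (not the PR's actual code; the per-patch operation and function names are hypothetical, with NumPy standing in for tensor ops):

```python
import numpy as np

def embed_patches_loop(patch_feats):
    # Per-patch Python loop: works eagerly, but a data-dependent Python
    # loop like this breaks graph tracing (tf.function / jax.jit) and
    # is slow on accelerators.
    out = []
    for patch in patch_feats:
        out.append(patch * 2.0)  # stand-in for the real per-patch op
    return np.stack(out)

def embed_patches_vectorized(patch_feats):
    # The same computation as one batched array op: traceable and fast.
    return patch_feats * 2.0
```

Both produce identical results; the vectorized form is the one that compiles cleanly.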
|
Hi @divyashreepathihalli, @laxmareddyp, can I get a review on this? |
|
Unfortunately this is duplicated effort: #2574 has an incorrect title but plans to do the same addition. Unfortunate, but maintainers will have to take a call on which PR to proceed with. Maybe you can contact and collaborate with @jaytiwarihub. |
|
Hi @jaytiwarihub, I'd love to discuss and contribute together. Let me know if I can help. |
|
@sam-shubham I see you uploaded the presets and tokenizer too in #2599. In my PR, I did not add them. Why? Because the model works if someone wants to train it from scratch (random weights). Also in #2574, I see Qwen2-VL reuses the Qwen 2.5 tokenizer (for text). Overall it looks good to me, although I pulled this first and I haven't modified #2574 accordingly because I'm waiting for review. Let the maintainers take a call on which PR to proceed with. cc @samudraneel05, thanks for updating! |
Thanks @jaytiwarihub for the kind review. I will be exposing these APIs as well. Looking forward to the maintainers' call; let's see which PR they go with.
|
hey @sam-shubham, there's been some miscommunication on the assignment of issues. |
|
Thanks @samudraneel05, glad you want to collaborate. But I think the PR is already complete except for exposing some APIs, i.e. one commit away. I'd be glad if you review it and let me know what you are adding. |
|
Sure, happy to do a review and if things pop out we can go along from there! |
Suree!! |
samudraneel05
left a comment
I reviewed about 7 files in my initial pass, please have a look and lmk if we agree or differ on anything.
| """Tests for Qwen2-VL Backbone.""" | ||
|
|
| """Tests for Qwen2-VL Backbone.""" | |
| import pytest |
The test file should `import pytest`, not just contain the module docstring.
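As a sketch of the suggested test-file shape (the backbone constructor and shapes here are placeholders, not the PR's actual test code):

```python
"""Tests for Qwen2-VL Backbone."""

import pytest


def build_tiny_backbone():
    # Stand-in for constructing a small Qwen2VLBackbone; returns a
    # fake (batch, sequence, hidden) output shape for illustration.
    return (2, 16, 8)


@pytest.mark.parametrize("expected_hidden", [8])
def test_backbone_output_shape(expected_hidden):
    shape = build_tiny_backbone()
    assert shape[-1] == expected_hidden
```

With `import pytest` present, markers such as `@pytest.mark.large` (used elsewhere in the repo's test suites) can gate expensive tests.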
|
Hi @jaytiwarihub, I just went through the exports of the other models, and I believe there might be a structural misunderstanding. From what I see, no model in the codebase registers `causal_lm_presets` separately. If I've misunderstood anything, please let me know; happy to revisit this. |
|
@sam-shubham you are absolutely correct, my bad |
|
Hi @samudraneel05, thanks for highlighting those silly lines; I will take care of that. The rest of the fixes are impressive, thanks for pointing them out. |
|
No worries, I have some other comments too. Once you get these in order I can expand more. Feel free to shoot me a DM @ my socials for better collab. |
|
@samudraneel05 sure !! thanks |
|
Thanks @samudraneel05, addressed much of what you pointed out! Let me know if I missed something. |
|
Hi @sachinprasadhs, @divyashreepathihalli, @laxmareddyp, can you please review whether it needs anything more? |
|
Hi @sachinprasadhs, any update on this? |
sachinprasadhs
left a comment
Thanks for your contribution. I have reviewed a few files and made some comments. Please check.
| mrope_section: List of 3 ints specifying how many half-head-dim | ||
| elements are allocated to [temporal, height, width]. | ||
| rope_max_wavelength: float. Max wavelength for RoPE base. | ||
| kernel_initializer: Initializer for the kernel weights. |
| bias_initializer: Initializer for the bias weights. |
|
|
||
| @keras_hub_export("keras_hub.models.Qwen2VLBackbone") | ||
| class Qwen2VLBackbone(Backbone): | ||
| """Qwen2-VL core network with optional vision encoder.""" |
Add:
- A description of the model.
- Arg names, their details, and default values.
- A usage example and a loading-from-preset example.
Refer to other implementations if needed.
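As a sketch of the requested docstring structure (the arg names, defaults, and preset name below are placeholders, not the PR's real signature):

```python
class Qwen2VLBackbone:  # sketch; the real class subclasses keras_hub's Backbone
    """Qwen2-VL core network with optional vision encoder.

    This backbone implements the Qwen2-VL decoder stack with multimodal
    rotary position embeddings (M-RoPE). When a vision encoder is given,
    image patch embeddings are merged into the token sequence.

    Args:
        vocabulary_size: int. The size of the token vocabulary.
        num_layers: int. The number of transformer decoder layers.
        hidden_dim: int. The hidden size of the decoder.
        vision_encoder: Optional vision encoder. If `None`, the model is
            text-only. Defaults to `None`.

    Example:

        # Loading from a preset (preset name is a placeholder):
        # backbone = keras_hub.models.Qwen2VLBackbone.from_preset("...")
    """
```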
| text_only_model = vision_encoder is None | ||
| head_dim = hidden_dim // num_query_heads | ||
|
|
||
| token_embedding = ReversibleEmbedding( |
Add comment on top
# === Layers ===
| name="sequence_output_layernorm", | ||
| ) | ||
|
|
||
| token_id_input = keras.Input( |
Add below comment
# === Functional Model ===
| del vision_embeddings_shape | ||
| del vision_indices_shape |
| mrope_position_ids, head_dim, rope_max_wavelength, mrope_section | ||
| ): | ||
| """Compute M-RoPE cos/sin embeddings from position IDs.""" | ||
| del mrope_section # Sections are applied in attention, not here. |
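The quoted helper computes M-RoPE cos/sin tables from position IDs, with `mrope_section` applied later in attention. A NumPy sketch under that assumption (function name and shapes are illustrative, not the PR's exact code):

```python
import numpy as np

def compute_rope_cos_sin(position_ids, head_dim, max_wavelength=10000.0):
    """Cos/sin tables from 3-axis position IDs.

    position_ids: (3, batch, seq) array, one row per axis
    [temporal, height, width]. Sections are applied later in
    attention, so all three axes are embedded here.
    """
    # Standard RoPE inverse frequencies over half the head dimension.
    inv_freq = 1.0 / (
        max_wavelength ** (np.arange(0, head_dim, 2) / head_dim)
    )
    # Outer product -> (3, batch, seq, head_dim // 2)
    freqs = position_ids[..., None].astype("float64") * inv_freq
    # Duplicate to cover the full head dimension.
    emb = np.concatenate([freqs, freqs], axis=-1)
    return np.cos(emb), np.sin(emb)
```

At position 0 every cosine is 1 and every sine is 0, which is a quick sanity check for implementations of this scheme.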
|
|
||
| from keras_hub.src.models.qwen.qwen_layernorm import QwenLayerNorm | ||
|
|
||
| Qwen2VLLayerNorm = QwenLayerNorm |
You can instead subclass it as `class Qwen2VLLayerNorm(QwenLayerNorm)`, define the description and example inside the class, and just `pass` at the end.
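The suggested pattern in sketch form (a stand-in base class is used here since `keras_hub` is not imported in this snippet):

```python
class QwenLayerNorm:
    """Stand-in for keras_hub.src.models.qwen.qwen_layernorm.QwenLayerNorm."""

    def __init__(self, epsilon=1e-6):
        self.epsilon = epsilon


class Qwen2VLLayerNorm(QwenLayerNorm):
    """RMS-style layer normalization used by Qwen2-VL.

    Behaviorally identical to `QwenLayerNorm`; subclassing rather than
    aliasing gives the layer its own class name, docstring, and a
    distinct identity for serialization and API docs.
    """

    pass
```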
…version and improve model documentation.
|
Hi @sachinprasadhs, addressed the issues you commented on. Let me know if I missed something. |
|
And for that: the `data_format="channels_first"` is intentional here; the Qwen2-VL model internally always processes vision patches in channels-first layout. I tried doing it, but ended up re-transposing the already transposed data (see this commit). Changing the internal pipeline to support both formats would require restructuring how patches are produced throughout the backbone. |
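The wasted round trip the author describes can be illustrated as follows (shapes are illustrative, not taken from the PR):

```python
import numpy as np

# Patches as the vision tower consumes them, channels-first:
# (num_patches, channels, patch_h, patch_w).
patches_cf = np.random.rand(16, 3, 14, 14)

# Accepting channels_last at the boundary would mean transposing...
patches_cl = np.transpose(patches_cf, (0, 2, 3, 1))  # (16, 14, 14, 3)
# ...and immediately transposing back before the patch embedding,
# which is exactly the redundant pair of transposes described.
roundtrip = np.transpose(patches_cl, (0, 3, 1, 2))
assert roundtrip.shape == patches_cf.shape
```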
…3D as Keras now manages it internally.
Description of the change
Qwen2 VL Integration
#2323
Notebooks
Running OOM on Kaggle (not able to use Colab in my region; needs review).
Checklist