GSoC 2026: Discussion Thread on Project#15 #34414
Replies: 7 comments 1 reply
-
Hi @shahkarnav115-beep, I think your approach aligns with our idea. Please share your current integration branch and we can review it and provide feedback. You could also use that feedback when preparing your GSoC proposal.

The goal of this project is to evaluate the feasibility of moving the existing OpenVINO backend from llama.cpp into OpenVINO GenAI and eventually into OpenVINO. The proposed new GGUFReaderV2 should: I think it is better to do it in phases.

cc: @cavusmustafa

Thanks,
-
Hi @ravi9, @cavusmustafa and @adrianboguszewski!

While running tinyllama.gguf through the branch I mentioned earlier, the model gets loaded, the llama.cpp context is created, and the graph is extracted, but I noticed that out of the 798 nodes in tinyllama.gguf, only 9 are named. This suggests two major architectural challenges for the translation pipeline:

Here is the error log file, which contains the details of the nodes, gguf-reader-v2, and the error: I have proposed a solution for it in the proposal below: Does this align with our target, and is there any prerequisite work from your side that I should add to the proposal, or is the prototype enough?

Thanks,
-
Hi @ravi9, @cavusmustafa and @adrianboguszewski,

Thank you for the guidance and feedback during the exploration phase; it directly shaped the proposal I submitted. Link to the proposal: I wanted to let you know that I have submitted my GSoC 2026 proposal for Project#15. The proposal is structured around the phased plan you outlined:

During the prototype work I also identified and documented a few key architectural findings:

The prototype PR (openvinotoolkit/openvino.genai#3449) and submodule PR (ravi9/llama.cpp#55) are both open for review. I will continue improving the implementation while waiting for the GSoC results. Thank you again for your time and support.

Regards,
-
Hi @shahkarnav115-beep! Apologies for the delayed response and feedback!

Thanks,
-
Hi @ravi9,

Thank you so much for the review and the feedback! I'm really glad the proposal and prototype align with the team's vision.

I wanted to share a quick update from the exploration side: I have achieved 100% mathematical parity with the prototype! By mapping the anonymous n_past inputs (e.g., leaf_8), natively structuring a proper -INF causal attention mask, and testing against an unquantized F16 TinyLlama model, the OpenVINO-translated graph now produces the same logits as native llama.cpp execution, down to a strict 0.015f epsilon. This proves the dummy-capture and translation math is structurally sound!

(My college internal exams are currently going on, so I was not able to respond earlier; I've been working on this parity proof in the background to make sure the foundation is solid.)

Looking at the official project scope, I am excited to spend the summer adapting this proof into the generalized architecture: dynamically resolving the cache topologies, aligning the inputs with GenAI's LLMPipeline, and integrating it seamlessly into read_model().

Thanks again for the continuous guidance, and I look forward to the GSoC results!

Best,
-
Sounds great, @shahkarnav115-beep!
-
Hi @ravi9!

Just a quick update: my university exams are finally over, and my local prototype work is nearly finished! I'd love to use this extra time to keep contributing to the project. I noticed the new issues you added to the tracker. Since my local Windows/OpenVINO environment is completely set up, is there a specific issue you think would be best for me to jump into next?

Also, regarding issue ravi9/llama.cpp#116: I ran two of the models locally and they executed successfully on the CPU fallback. Since my machine doesn't have an NPU, is validating them on the CPU enough for now? I have already requested a Core Ultra instance on the Intel Tiber Developer Cloud so I can verify them properly on an NPU, and I am just waiting for the instance to be provisioned.

Thanks,
-
Hey @ravi9, @cavusmustafa, and @adrianboguszewski!
I’m Karnav Shah, a 2nd-year BTech student at VIT-AP (Minoring in AI/ML). I’ve been following OpenVINO for a bit and I'm really interested in Project#15 for GSoC 2026 regarding the GGUF support.
I actually come from the OpenCV community: I've been contributing to their DNN module recently, especially working with ONNX models. Since OpenCV uses OpenVINO as a backend for optimization, I'm already somewhat familiar with how the ecosystem fits together. Since OpenCV isn't in GSoC this year, I'm excited to dive deeper into the OpenVINO side of things, specifically for GenAI.
Regarding the GGUF project: I’ve been studying the LLM and llama.cpp ecosystem and saw @ravi9’s fork. I’ve already set up a local environment where I’ve integrated that fork into openvino.genai.
I’ve also started on a basic skeleton for GGUFReaderV2 (added gguf_reader_v2.cpp/.hpp files) to map out how the tensors should be handled. I’m trying to keep the architecture clean from the start so it's easy to maintain.
I’d love to get some guidance on the specific tasks you have in mind for this project and see if my current approach aligns with the roadmap. Looking forward to hearing from you and hopefully contributing to the team!
Best,
Karnav Shah
https://github.com/shahkarnav115-beep