GSoC 2026: Discussion Thread on Project#15 #34414
Replies: 7 comments 1 reply
-
Hi @shahkarnav115-beep, I think your approach aligns with our idea. Please share your current integration branch and we can review it and provide feedback. You could also use that feedback when preparing your GSoC proposal.

The goal of this project is to evaluate the feasibility of moving the existing OpenVINO backend from llama.cpp into OpenVINO GenAI and eventually into OpenVINO. The proposed new GGUFReaderV2 should: I think it is better to do it in phases.

cc: @cavusmustafa

Thanks,
-
Hi @ravi9, @cavusmustafa and @adrianboguszewski!

While running tinyllama.gguf through the branch I mentioned earlier, the model gets loaded, the llama.cpp context is created, and the graph is extracted, but I noticed that out of the 798 nodes in tinyllama.gguf, only 9 are named. This suggests two major architectural challenges for the translation pipeline:

Here is the error log file, which contains the details of the nodes, gguf-reader-v2, and the error: I have proposed a solution for it in the proposal below: Does this align with our target, and is there any prerequisite work from your side that I should add to the proposal, or is the prototype enough?

Thanks,
-
Hi @ravi9, @cavusmustafa and @adrianboguszewski,

Thank you for the guidance and feedback during the exploration phase; it directly shaped the proposal I submitted. Link to the proposal: I wanted to let you know that I have submitted my GSoC 2026 proposal for Project#15. The proposal is structured around the phased plan you outlined:

During the prototype work I also identified and documented a few key architectural findings:

The prototype PR (openvinotoolkit/openvino.genai#3449) and submodule PR (ravi9/llama.cpp#55) are both open for review. I will continue improving the implementation while waiting for the GSoC results. Thank you again for your time and support.

Regards,
-
Hi @shahkarnav115-beep! Apologies for the delayed response and feedback!

Thanks,
-
Hi @ravi9,

Thank you so much for the review and the feedback! I'm really glad the proposal and prototype align with the team's vision.

I wanted to share a quick update from the exploration side: I have achieved 100% mathematical parity with the prototype! By mapping the anonymous n_past inputs (e.g., leaf_8), natively structuring a proper -INF causal attention mask, and testing against an unquantized F16 TinyLlama model, the OpenVINO-translated graph now produces the same logits as native llama.cpp execution, down to a strict 0.015f epsilon. This proves the dummy-capture and translation math is structurally sound!

(My college internal exams are currently going on, so I was not able to respond earlier; I've been working on this parity proof in the background to make sure the foundation is solid.)

Looking at the official project scope, I am excited to spend the summer adapting this proof into the generalized architecture: dynamically resolving the cache topologies, aligning the inputs with GenAI's LLMPipeline, and integrating it seamlessly into read_model().

Thanks again for the continuous guidance, and I look forward to the GSoC results!

Best,
-
Sounds great, @shahkarnav115-beep!
-
Hi @ravi9!

Just a quick update: my university exams are finally over, and my local prototype work is nearly finished! I'd love to use this extra time to keep contributing to the project. I noticed the new issues you added to the tracker. Since my local Windows/OpenVINO environment is completely set up, is there a specific issue you think would be best for me to jump into next?

Also, regarding issue ravi9/llama.cpp#116: I ran two of the models locally and they executed successfully on the CPU fallback. Since my machine doesn't have an NPU, is validating them on the CPU enough for now? I have already requested a Core Ultra instance on the Intel Tiber Developer Cloud so I can verify them properly on an NPU, and I am just waiting for the instance to be provisioned.

Thanks,
-
Hey @ravi9, @cavusmustafa, and @adrianboguszewski!
I’m Karnav Shah, a 2nd-year BTech student at VIT-AP (Minoring in AI/ML). I’ve been following OpenVINO for a bit and I'm really interested in Project#15 for GSoC 2026 regarding the GGUF support.
I actually come from the OpenCV community: I've been contributing to their DNN module recently, especially working with ONNX models. Since OpenCV uses OpenVINO as a backend for optimization, I'm already somewhat familiar with how the ecosystem fits together. Since OpenCV isn't in GSoC this year, I'm excited to dive deeper into the OpenVINO side of things, specifically for GenAI.
Regarding the GGUF project: I’ve been studying the LLM and llama.cpp ecosystem and saw @ravi9’s fork. I’ve already set up a local environment where I’ve integrated that fork into openvino.genai.
I’ve also started on a basic skeleton for GGUFReaderV2 (added gguf_reader_v2.cpp/.hpp files) to map out how the tensors should be handled. I’m trying to keep the architecture clean from the start so it's easy to maintain.
I’d love to get some guidance on the specific tasks you have in mind for this project and see if my current approach aligns with the roadmap. Looking forward to hearing from you and hopefully contributing to the team!
Best,
Karnav Shah
https://github.com/shahkarnav115-beep