GSoC 2026 Interest: Expanding GenAI Pipelines and Core GGUF Infrastructure #34207
I've been spending the weekend analyzing the openvino.genai codebase (specifically Text2VideoPipeline) and the LTX-Video architecture to make sure my GSoC proposal is technically sound. While looking into the Image-to-Video (I2V) requirements, I realized that because LTX is a Diffusion Transformer (DiT) that uses a highly compressed Video-VAE rather than a traditional 3D U-Net, injecting the image conditioning requires a specific architectural choice. I weighed two potential paths: building a multimodal cross-attention adapter, or using Latent Initialization. Since adding a new fusion layer might require altering the pretrained transformer weights or assuming multimodal support that isn't native to the base DiT, I believe Latent Initialization is the architecturally safer and more performant path for C++. My thought process for the implementation is:
This seems like the best way to ensure strict parity with the Python LTXConditionPipeline while keeping memory overhead low. Before I finalize the milestones in my proposal, I wanted to ask for your feedback: is this VAE-to-Latent initialization approach the correct implementation direction for the C++ pipeline, or is the team envisioning a different architectural path?
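To make the latent initialization idea concrete, here is a minimal sketch in plain C++. It is not the openvino.genai or LTXConditionPipeline implementation. The `LatentShape` struct, the `init_latents_from_image` function, and the flattened `[frames, channels, height, width]` layout are all hypothetical names and assumptions chosen for illustration. The sketch assumes the image has already been encoded by the Video-VAE into a single latent frame, which is then written into the first frame of the noise tensor, with a per-frame mask recording which frames are conditioned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical latent layout: [frames, channels, height, width], row-major.
struct LatentShape {
    std::size_t frames;
    std::size_t channels;
    std::size_t height;
    std::size_t width;

    std::size_t frame_size() const { return channels * height * width; }
    std::size_t total() const { return frames * frame_size(); }
};

// Assumed scheme: latents = mask * image_latent + (1 - mask) * noise,
// where mask is 1 for the first latent frame and 0 for all others.
// `frame_mask` is filled with one value per latent frame so that a denoising
// loop could treat the conditioned frame differently.
std::vector<float> init_latents_from_image(const std::vector<float>& noise,
                                           const std::vector<float>& image_latent,
                                           const LatentShape& shape,
                                           std::vector<float>& frame_mask) {
    assert(noise.size() == shape.total());
    assert(image_latent.size() == shape.frame_size());

    frame_mask.assign(shape.frames, 0.0f);
    if (shape.frames > 0) {
        frame_mask[0] = 1.0f;  // only the first frame carries the image
    }

    std::vector<float> latents(noise.size());
    const std::size_t frame_size = shape.frame_size();
    for (std::size_t f = 0; f < shape.frames; ++f) {
        const float m = frame_mask[f];
        for (std::size_t i = 0; i < frame_size; ++i) {
            const std::size_t idx = f * frame_size + i;
            latents[idx] = m * image_latent[i] + (1.0f - m) * noise[idx];
        }
    }
    return latents;
}
```

Under these assumptions the transformer weights stay untouched: the image only changes the starting latents and the mask, and the only extra memory is one latent frame plus one float per frame.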
Hi @Ashitpatel001! Thanks!
Hi @ravi9! Thank you for the warm welcome. I am actively researching the GGUF Reader v2 architecture (specifically the llama.cpp graph translation) and will share my draft proposal soon. In the meantime, I'm wrapping up this C API PR to familiarize myself with the OpenVINO GenAI coding standards and tensor management. Excited to contribute!
Hi Ashit, Sorry for the late reply, and thank you for your application! Please note that the final proposal must be submitted through the GSoC portal webapp to be considered. Could you also please attach the PR(s) you've worked on in OpenVINO (OpenVINO GenAI)? At the moment, solving good first issues or making a solid contribution is a requirement for applying. As for Project 4 specifically, both planning the architecture of image-to-video support and implementing it are expected to be part of the program itself, rather than something already planned to be added beforehand. Thanks again for your interest. Regards,
Hi @likholat @sgonorov, I have sent the final draft of my GSoC proposal via email. I would really appreciate any high-level feedback or red flags you might have when you get a chance to review it. I also had one quick architectural question regarding the LTX Image-to-Video implementation: Thanks again for your time and guidance!
Hi OpenVINO Team,
I am Ashit Patel, an active contributor across the OpenVINO ecosystem. I am excited to announce that I am preparing two high-impact proposals for GSoC 2026: Project 4 (LTX Image-to-Video) and Project 15 (GGUF Reader v2 for Direct Execution).
My recent contributions directly address the complex technical challenges associated with these "Hard" category projects:
Technical Track Record
Core C++ & Graph Logic: Successfully implemented the 'flip' preprocessing step in the OpenVINO Core repository (openvinotoolkit/openvino#34135). This involved internal operator management, tensor manipulation, and adherence to strict C++ development standards.
GenAI C API: Developed the C API bindings for Text-to-Video in the OpenVINO GenAI repository (openvinotoolkit/openvino.genai#3331). This provided me with deep architectural familiarity with the openvino_genai repository and its optimized execution methods.
Vision & Media Pipelines: Built the C++ Video Style Transfer sample (openvinotoolkit/openvino.genai#3269) and a Live VLM Chat C++ sample (openvinotoolkit/openvino.genai#3308), handling real-time visual data streams via OpenCV and complex inference loops.
Performance Engineering: Conducted rigorous PyTorch vs. OpenVINO INT4 latency benchmarks in the OpenVINO Notebooks repository (openvinotoolkit/openvino_notebooks#3245) to validate quantization and optimization strategies on Intel hardware.
Why These Projects?
Project 4: LTX Image-to-Video Support
I aim to leverage my experience with the Video GenAI C API and diffusion-based pipelines to ensure seamless Image-to-Video (I2V) parity between Python and C++. My focus will be on maintaining minimal memory overhead for latent space denoising on Intel AI PCs.
Project 15: GGUF Reader v2 (Dynamic Execution)
I plan to utilize my understanding of OpenVINO's frontend architecture and GgmlOVDecoder to transition from static model reconstruction to a dynamic, scalable GGML computation graph translation. This will significantly broaden OpenVINO's native support for the GGUF ecosystem.
I am currently planning 350-hour project timelines for both proposals and look forward to discussing the architectural specifics with mentors @likholat, @sgonorov, @cavusmustafa, and @ravi9.
Best regards,
Ashit Patel