Vibe coding some different ideas on vit param sharing/recursion/latents#2656

Draft
rwightman wants to merge 5 commits into main from vibe_rsvit
@rwightman (Collaborator) commented Jan 27, 2026

Some ideas I was exploring over a few days: different ViT recursion / latent approaches. Created some new 'Recursive Supervision' ViT variants and revived the Perceiver arch.

Possibly some ideas worth exploring more...

In this PR there are 3 variants of "Recursive Supervision" ViT: RSViT, RSPViT (latent handling inspired by Perceiver), and RSTViT (update patterns inspired by TinyRecursiveModels).

These models roughly have x (patches), z (latent/hidden state), and y (output representation/proxy, similar to a class token).

There is an inner loop iterating over blocks and updating z, with some input/cross attention from x and the previous y (or initial y), and then an outer supervision loop that usually updates y at the end (sometimes y is updated in the inner loop too). At each supervision step, the y output is passed through the head and returned. The models differ in how these updates are performed: cross attentions, concatenations, etc.
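A minimal, framework-agnostic sketch of the control flow described above, not the code in this PR. The function name, the `blocks` / `update_y` / `head` callables, and the use of plain float lists in place of tensors are all hypothetical, and the sketch covers only the common case where y is updated at the end of each supervision step.

```python
from typing import Callable, List, Sequence

Vec = List[float]  # stand-in for a tensor; purely illustrative


def recursive_supervision_forward(
    x: Vec,
    z: Vec,
    y: Vec,
    blocks: Sequence[Callable[[Vec, Vec, Vec], Vec]],
    update_y: Callable[[Vec, Vec], Vec],
    head: Callable[[Vec], Vec],
    num_supervision_steps: int,
) -> List[Vec]:
    """Return one head output per supervision step (hypothetical control flow)."""
    outputs = []
    for _ in range(num_supervision_steps):
        # Inner loop: each block refines z using the patches x and the current y.
        for block in blocks:
            z = block(z, x, y)
        # Outer loop: y is refreshed from the updated latent z.
        y = update_y(y, z)
        # Every step yields a prediction, so each step can be supervised.
        outputs.append(head(y))
    return outputs


# Toy stand-ins for the real blocks, y update, and head (assumptions only).
def toy_block(z: Vec, x: Vec, y: Vec) -> Vec:
    return [0.5 * zi + 0.25 * xi + 0.25 * yi for zi, xi, yi in zip(z, x, y)]


def toy_update_y(y: Vec, z: Vec) -> Vec:
    return [0.5 * (yi + zi) for yi, zi in zip(y, z)]


outs = recursive_supervision_forward(
    x=[1.0, 2.0], z=[0.0, 0.0], y=[0.0, 0.0],
    blocks=[toy_block, toy_block], update_y=toy_update_y,
    head=lambda y: list(y), num_supervision_steps=3,
)
```

The same block list is reused on every supervision step, which is where the parameter sharing comes from; z and y carry state from one step to the next.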

The RecursiveTask calculates the loss with weighting across each step. There's also a 'halting' signal/loss in an attempt to provide a signal for early iteration stopping, though this part is rather half-baked and untested (and needs a batching solution)...
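A rough sketch of a per-step weighted loss for a single sample, to make the idea concrete. The actual weighting scheme used by RecursiveTask is not described here, so the normalised weighted mean, the function names, and the `step_weights` parameter are assumptions; the halting loss is left out entirely.

```python
import math
from typing import List, Optional, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross-entropy for one sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def recursive_loss(
    step_logits: Sequence[Sequence[float]],
    target: int,
    step_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of the loss at each supervision step (assumed scheme)."""
    n = len(step_logits)
    if step_weights is None:
        step_weights = [1.0] * n  # uniform weighting by default
    if len(step_weights) != n:
        raise ValueError("need exactly one weight per supervision step")
    losses: List[float] = [cross_entropy(l, target) for l in step_logits]
    # Normalise so the scale does not depend on the number of steps.
    return sum(w * l for w, l in zip(step_weights, losses)) / sum(step_weights)
```

With weights such as `[0.0, 1.0]` only the final step is supervised; increasing weights put more emphasis on later steps while still giving earlier steps a gradient signal.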

Also included in this PR is some fiddling with my original Perceiver arch from a few years ago. It's updated with a Fourier embedding closer to the original, and an alternative RoPE that applies to the pixels (or patchified input), and cross inputs to cross attention.
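For reference, a small sketch of Fourier position features in the style described in the Perceiver paper: the raw position plus sine and cosine terms at frequencies spaced linearly from 1 up to half of a maximum frequency. This assumes that scheme is what "closer to the original" refers to; it is not taken from the PR, and the function name and parameters are hypothetical.

```python
import math
from typing import List


def fourier_features(pos: float, num_bands: int, max_freq: float) -> List[float]:
    """Encode one coordinate in [-1, 1] as [pos, sin terms..., cos terms...].

    The result has 2 * num_bands + 1 entries.
    """
    if num_bands < 1:
        raise ValueError("num_bands must be at least 1")
    if num_bands == 1:
        freqs = [1.0]
    else:
        # Frequencies spaced linearly from 1 to max_freq / 2.
        step = (max_freq / 2.0 - 1.0) / (num_bands - 1)
        freqs = [1.0 + i * step for i in range(num_bands)]
    sins = [math.sin(math.pi * f * pos) for f in freqs]
    coss = [math.cos(math.pi * f * pos) for f in freqs]
    return [pos] + sins + coss
```

For 2D inputs the same encoding would be applied per axis and the results concatenated, giving each pixel or patch a fixed position feature vector.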

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rwightman rwightman marked this pull request as draft February 6, 2026 21:04

2 participants