Vibe coding some different ideas on vit param sharing/recursion/latents#2656
Draft
Some ideas I was exploring over a few days: different ViT recursion / latent approaches. Created some new 'Recursive Supervision' ViT variants and revived the Perceiver arch.
Possibly some ideas worth exploring more...
In this PR there are 3 variants of "Recursive Supervision" ViT: RSViT, RSPViT (latent handling inspired by Perceiver), and RSTViT (update patterns inspired by TinyRecursiveModels).
These models roughly have x (patches), z (latent/hidden state), and y (output representation/proxy, similar to a class token).
There is an inner loop that iterates over the blocks and updates z, with some input/cross-attention from x and the previous y (or initial y), and an outer supervision loop that usually updates y at the end (sometimes y is updated in the inner loop too). At each supervision step the y output is passed through the head and returned. The models differ in how these updates are performed: cross attentions, concatenations, etc.
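The inner/outer loop pattern above can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual implementation: the module name, the simple linear blocks, and the additive mixing of x and y (in place of real cross-attention) are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class RecursiveSupervisionSketch(nn.Module):
    """Hypothetical sketch of the x (patches) / z (latent) / y (output proxy) recursion."""

    def __init__(self, dim=64, num_blocks=2, sup_steps=3, num_latents=16, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_blocks))
        self.z_init = nn.Parameter(torch.zeros(1, num_latents, dim))  # learned initial latent
        self.y_init = nn.Parameter(torch.zeros(1, 1, dim))            # learned initial y
        self.y_update = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_classes)
        self.sup_steps = sup_steps

    def forward(self, x):
        # x: (B, num_patches, dim)
        B = x.shape[0]
        z = self.z_init.expand(B, -1, -1)
        y = self.y_init.expand(B, -1, -1)
        outputs = []
        for _ in range(self.sup_steps):       # outer supervision loop
            for blk in self.blocks:           # inner loop over blocks updates z
                # additive mix of x and previous y, standing in for cross-attention
                z = blk(z + x.mean(1, keepdim=True) + y)
            # y updated at the end of each outer step from the latent state
            y = y + self.y_update(z.mean(1, keepdim=True))
            # each supervision step passes y through the head and collects the output
            outputs.append(self.head(y.squeeze(1)))
        return outputs
```

Returning the per-step outputs as a list is what lets a task wrapper apply supervision at every step rather than only on the final one.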
The RecursiveTask calculates the loss with weighting across each step. There's also a 'halting' signal/loss in an attempt to provide a signal for early iteration stopping, though this part is rather half baked and untested (and needs a batching solution)...
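A step-weighted loss of that shape might look like the sketch below. The linear weight ramp (later steps weighted higher) is an assumption for illustration, not the PR's actual weighting scheme, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def recursive_step_loss(step_logits, target, weights=None):
    """Weighted sum of per-step classification losses.

    step_logits: list of (B, num_classes) tensors, one per supervision step.
    weights: optional per-step weights; defaults to a linear ramp (assumed, not
    necessarily what the PR's RecursiveTask uses).
    """
    n = len(step_logits)
    if weights is None:
        weights = [(i + 1) / n for i in range(n)]  # assumed: later steps count more
    total = sum(w * F.cross_entropy(logits, target)
                for w, logits in zip(weights, step_logits))
    return total / sum(weights)  # normalize so the scale doesn't depend on step count
```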
Also included in this PR is some fiddling with my original Perceiver arch from a few years ago. It's updated with a Fourier embedding closer to the original, an alternative RoPE that applies to the pixels (or patchified input), and cross inputs to the cross attention.
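For reference, a Perceiver-style Fourier position embedding concatenates sin/cos features over a bank of frequencies with the raw coordinates. A minimal sketch, assuming linearly spaced frequencies and coordinates normalized to [-1, 1] (the PR's exact parameterization may differ):

```python
import torch


def fourier_position_encoding(coords, num_bands=4, max_freq=10.0):
    """Perceiver-style Fourier features.

    coords: (..., d) positions in [-1, 1].
    Returns (..., d * (2 * num_bands + 1)): raw coords plus sin/cos per band.
    """
    # assumed: linear frequency spacing up to the Nyquist rate, as in Perceiver
    freqs = torch.linspace(1.0, max_freq / 2, num_bands)
    scaled = coords.unsqueeze(-1) * freqs * torch.pi          # (..., d, num_bands)
    enc = torch.cat([scaled.sin(), scaled.cos()], dim=-1)     # (..., d, 2 * num_bands)
    # concat the raw coordinate alongside its Fourier features, then flatten per-dim
    return torch.cat([coords.unsqueeze(-1), enc], dim=-1).flatten(-2)
```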