Hi ncnn folks,
I wanted to share a small edge-language-runtime experiment and ask how people here would think about it relative to the more familiar graph/kernel style of on-device inference.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:
- Host-side benchmark capability:
  - LogiQA = 0.392523
  - IFEval = 0.780037
- Published board proof:
  - LogiQA: 249 / 642 = 0.3878504672897196
  - host_full_match = 642 / 642
  - runtime artifact size = 1,380,771 bytes
Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
So this is not a standard operator/kernel execution path. It is closer to a task-specialized language runtime whose behavior has been pushed into a highly constrained executable form.
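To make the idea concrete, here is a minimal sketch of what a flash-resident, table-driven lookup path could look like. This is illustrative only, not the actual Engram code: the entry layout, the `lookup`/`fnv1a` names, and the choice of FNV-1a as the streaming fold are all assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch, not the Engram implementation: the struct layout
 * and the FNV-1a hash choice are assumptions for this example. */

/* One packed table entry: a hashed probe key mapped to a small payload
 * (e.g. an index into a precompiled answer table). */
typedef struct {
    uint32_t key_hash; /* hash of the compiled probe */
    uint16_t payload;  /* precompiled result index */
} entry_t;

/* 'const' lets the toolchain place the table in flash (.rodata)
 * on ESP32-class targets, keeping RAM use near zero. */
static const entry_t TABLE[] = {
    { 0x811c9dc5u, 42 }, /* placeholder: FNV-1a hash of "" */
};

/* Streaming fold over input bytes (FNV-1a used as the example fold). */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 0x811c9dc5u;           /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 0x01000193u;               /* FNV prime */
    }
    return h;
}

/* Table-driven lookup: hash the probe, scan the packed table. */
static int lookup(const char *probe, uint16_t *out) {
    uint32_t h = fnv1a(probe);
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (TABLE[i].key_hash == h) {
            *out = TABLE[i].payload;
            return 1;
        }
    }
    return 0; /* probe not in the compiled batch */
}
```

The point of the sketch is the execution shape: no graph, no kernels, just a fold over input bytes followed by a lookup into precompiled structures, which is roughly what "fixed compiled probe batches" implies.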
Repo:
https://github.com/Alpha-Guardian/Engram
What I’m curious about is whether people here would see this as:
- completely outside the ncnn-style deployment world
- an extreme endpoint of on-device specialization
- or a sign that some language-task systems may eventually want execution forms very different from standard graph runtimes
Would be interested in any thoughts.