Maru

What

Maru is a programming language. It's a self-hosting, yet tiny lisp dialect: a symbolic expression evaluator that can compile its own implementation to machine code, in about 2000 LoC altogether.

Maru is in particular trying to be malleable at the very lowest levels, so any special interest that cannot be accommodated easily within the common platform would be a strong indicator of a deficiency within the platform that should be addressed rather than disinherited. (Ian Piumarta)

This repo is also a place for exploration in the land of bootstrapping and computing system development. My primary drive with Maru is to clearly and formally express that which is mostly treated as black magic: the bootstrapping of a language on top of other languages (which includes the previous developmental stage of the same language).

Meta

This document aims to present an overview of Maru at its latest developmental stage.

In the earlier developmental stages (i.e. in the other git branches) this file contains notes relevant to that specific stage.

There are various documents in the doc/ directory that discuss some topics in further detail.

The documentation generated by deepwiki is also not useless.

How

Maru's architecture is described in doc/how.md.

Build instructions

To test a bootstrap cycle using one or all of the backends:

make test-bootstrap-x86    # defaults to the libc platform
make PLATFORM=[libc,linux] test-bootstrap[-llvm,-x86]
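The bracket notation above can be spelled out as a list of concrete invocations. This is only a sketch of one reading of it: it assumes the brackets list alternatives and that the suffix on test-bootstrap is optional. The function name list_bootstrap_commands is made up for illustration, and the loop only prints the commands, it does not run them. Not every pairing is guaranteed to work on every machine.

```shell
# Print each PLATFORM/target pairing implied by the bracket notation.
# Assumption: a bare "test-bootstrap" (no backend suffix) is a valid target.
list_bootstrap_commands() {
  for platform in libc linux; do
    for target in test-bootstrap test-bootstrap-llvm test-bootstrap-x86; do
      echo "make PLATFORM=$platform $target"
    done
  done
}

list_bootstrap_commands
```

Copy any printed line into a shell at the root of the repo to run that particular bootstrap test.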

Platform specific instructions

Guix

My primary platform. There's a manifest.scm file in the repo, so you can run guix shell to enter into the same environment that I use when I work on Maru. Guix can also be used as a simple package manager on any Linux distro, no need to install Guix System.

Nix and NixOS

Used to be my primary platform. There's a (potentially bitrotten) default.nix file in the repo, so you can run nix-shell to enter into the same environment that I used on NixOS.

Debian, and derivatives

sudo apt install make time rlwrap

You will need LLVM (any version beyond LLVM 8 should work), and/or a C compiler:

sudo apt install llvm clang

For now the x86 backend only supports 32 bit mode. To use it you will need to have support for compiling and running 32 bit C code. On Debian based x86_64 systems this will install all the necessary libraries:

sudo apt install gcc-multilib
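A quick way to confirm that 32 bit C code can be compiled and run is sketched below. It assumes gcc is the C compiler in use; the function name check_m32 and the probe file are invented for this example and are not part of the Maru build.

```shell
# Compile and run a tiny probe with -m32 (gcc's 32-bit x86 mode).
# The probe exits with 0 only when pointers are 4 bytes wide.
check_m32() {
  tmp=$(mktemp -d)
  printf 'int main(void) { return sizeof(void *) == 4 ? 0 : 1; }\n' > "$tmp/probe.c"
  if gcc -m32 -o "$tmp/probe" "$tmp/probe.c" 2>/dev/null && "$tmp/probe"; then
    echo "32-bit toolchain: ok"
  else
    echo "32-bit toolchain: missing or not runnable"
  fi
  rm -rf "$tmp"
}

check_m32
```

If it reports the toolchain as missing, the x86 backend's bootstrap test will most likely fail at the compile or link step.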

MacOS (darwin)

As of this writing (2026) both the x86_64 and the LLVM backends can bootstrap on an x86_64 MacOS running in kvm on a Linux host. There's even CI set up for testing every commit on an arm64 runner using arch -x86_64 make [...] (but those runs currently fail).

The following instructions worked in the kvm setup as of 2026:

  1. Make sure the Xcode command line tools are installed. In a Terminal:
xcode-select --install
  2. Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
  3. Install LLVM using Homebrew
brew install llvm
echo 'export PATH="$(brew --prefix llvm)/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile

Guix on OpenWRT on an aarch64 hw

I own a BPI-R4 (a powerful router) that is currently running OpenWrt. I installed Guix (an advanced package manager) on it out of curiosity.

In that environment Maru can bootstrap itself using the LLVM backend and through both the libc and the linux platform.

Other platforms

Maru should work wherever a libc and LLVM/clang are available. Alternatively, it should be able to bootstrap on any x86 machine where a libc and the GNU toolchain are available.

Patches are welcome to support other platforms.

Who

Originally written by Ian Piumarta, around 2011. Full commit history is available in the piumarta branch.

The current gardener is attila@lendvai.name.

Where

Bugs and patches: maru github page.

Discussion: maru-dev google group.

Why

  • Programming badly needs better foundations, and Maru is part of this exploration. The foundations should get smaller, simpler, more self-contained, and more approachable by people who set out to learn programming.

  • I'm fascinated by bootstrapping issues. We lose a lot of value by not capturing the history of the growth of a language, including the formal encoding of its build instructions. They are useful both for educational purposes, and also for practical reasons: to have a minimal seed that is very simple to port to a new architecture, and then have a self-contained, formal bootstrap process that can automatically "grow" an entire computing system on top of that freshly laid, tiny foundation.

  • Ian seems to have abandoned Maru, and his published archive couldn't be run as-is. But it's an interesting piece of code that deserves a repo and a maintainer to keep bitrot at bay.

  • This work is full of puzzles that are a whole lot of fun to solve!

Contribution

You are very welcome to contribute, but beware that, for lack of contributors, this repo receives forced pushes every now and then (i.e. git push -f rewriting git history, except on the piumarta branch). This will stop eventually, either when contributors show up or when I settle down with a build setup that nicely facilitates bootstrapping multiple, parallel paths of language development. Please make sure that you open a git branch for your work, and/or that you are ready for some git fetch and git rebase.
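What "some git fetch and git rebase" can look like after a forced push is sketched below as a self-contained demo in a scratch directory. Every name in it (upstream, work, stage, my-feature, old_tip) is hypothetical and not taken from this repo, and git rebase --onto is just one way to do it, not a prescribed workflow.

```shell
# Scratch area and a throwaway identity, so the demo touches no real repo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "upstream" stands in for the published repo, with two commits on "stage".
git init -q "$demo/upstream"
echo base > "$demo/upstream/base.txt"
git -C "$demo/upstream" add base.txt
git -C "$demo/upstream" commit -q -m "base"
git -C "$demo/upstream" branch -m stage
echo more > "$demo/upstream/more.txt"
git -C "$demo/upstream" add more.txt
git -C "$demo/upstream" commit -q -m "upstream work"

# A contributor clones it and commits on a separate branch.
git clone -q "$demo/upstream" "$demo/work"
git -C "$demo/work" checkout -q -b my-feature
echo mine > "$demo/work/mine.txt"
git -C "$demo/work" add mine.txt
git -C "$demo/work" commit -q -m "my change"

# Upstream rewrites its last commit, as a forced push would.
git -C "$demo/upstream" commit -q --amend -m "upstream work, rewritten"

# Remember the old upstream tip, fetch the rewritten one, then replay only
# the contributor's own commits on top of the new tip.
old_tip=$(git -C "$demo/work" rev-parse origin/stage)
git -C "$demo/work" fetch -q origin
git -C "$demo/work" rebase -q --onto origin/stage "$old_tip" my-feature
git -C "$demo/work" log --format=%s my-feature
```

The important step is recording the old upstream tip before fetching: it tells git rebase --onto which commits are yours and which belong to the rewritten history.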

Status

Maru's status

There's the piumarta branch preserving the old state and accumulating some recent fixes.

There's the maru.1 line of development that started from the minimal version of Maru and properly bootstrapped the extra features of the piumarta branch, i.e. without evolving eval.c in parallel. I only use the 2300 LoC of throwaway C code as the initial stepping stone in the bootstrap process, but once the first step is made, the C code is left behind for good. The head of the maru.3 branch should be semantically equivalent to the eval.l that resides in the piumarta branch.

After that I started to further evolve Maru's implementation in the maru.4 branch and beyond. Each branch contains a readme explaining what's relevant for that stage.

Critique of the current codebase

IOW, it's sort of a high-level TODO:

To accommodate the various experiments, I had to cut the lisp codebase into countless files. This tree of small files should/could be simplified, and hopefully will be once I have rewritten the build in Maru.

The accidental complexity in the Makefile is an abomination compared to the rest of the project. I'm looking forward to forgetting it for good.

Many interesting experiments in the piumarta branch are not yet revived.

The compiler accidentally got intertwined with the VM implementation and it cannot currently compile a simple standalone sexp to the target.

I'm not happy about how types are done currently. First, they are just runtime tags really, not types in the type-check sense. A simple static type system should be added, if for nothing else than to show where its place would be in the codebase, and to contrast it with the runtime tagged values.

Notable new features

There are several Maru stages/branches now, introducing non-trivial new features. Some that are worth mentioning:

  • Introduction of platforms, and notably the linux platform that compiles to a statically linked executable that only uses Linux kernel syscalls. From a practical perspective this is almost equivalent to running directly on the bare metal, minus dealing with the hardware drivers (i.e. all dynamically allocated memory needs to be managed by our own GC, all IO behind our own abstractions, etc). Other platforms: libc (functional), and metacircular (only planned).

  • The host and the slave are isolated while bootstrapping which makes it possible to do things like reordering types (changing their type id in the target), or changing their object layout.

  • Relying on this isolation, the code in eval.l now looks pretty much the same as something that is meant to be loaded into the evaluator (i.e. the function implementing car in eval.l is now called car). This paves the way for metacircularity: to be able to "bring alive" the evaluator by loading it verbatim into another instance of itself (as opposed to compiling it to machine code and giving it to a CPU to animate it).

  • The addition of an LLVM backend.

Notable features

  • A statistical profiler scanning the backtrace.

  • A bootstrapped PEG parser.

  • A C groveller that uses a DSL implemented by a PEG grammar.

Future plans

Assorted TODO list

  • Make Maru Scheme compatible, either by forking it, or by some sort of a compatibility layer that is loadable into vanilla Maru. Then consider how that relates to GNU Mes and the bootstrappable.org community.

  • Finish the proof of concept in tests/test-elf.l to compile the Linux platform directly into an ELF binary. This would reduce the list of external dependencies to a single one (GNU Make).

  • Rewrite the build process in Maru; eliminate dependency on GNU Make.

  • Replace the hand-written parser in eval.l with something generated by a parser generator, maybe the PEG compiler. More generally, make the parser extendable.

  • Implement modules and phase separation along with what is outlined in Submodules in Racket - You Want it When, Again?. Part of this is already done and is used in the bootstrap process.

  • Compile to, and bootstrap on the bare metal of some interesting targets. It's already demonstrated by the Linux platform. Another one could be pc-bios, or EFI, because it's easily testable using QEMU. Or port it to an ARM board (like Raspberry Pi)? Or maybe even attempt a C64 port?

  • Revive all the goodies in the piumarta branch, but in a structured way.

  • Investigate Cranelift, QBE, libfirm, Tilde, and consider adding them as backends.

  • Simplify the types-are-objects (as opposed to integers) part and its bootstrap, and maybe even make it optional?

  • Weed out some of the added bloat/complexity (e.g. compile closures instead of <selector>s, and use them to implement streams; write a tree shaker; etc).

  • Fully merge the language and API that the compiler and the evaluator understand; i.e. make the level-shifted code (eval.l & co.) less different from the code understood by the evaluator. This would mean that we can e.g. load/compile source/buffer.l both into the level-shifted code and into the evaluator. This is slowly happening, but it's nowhere near done, and I'm not even sure what being done means here.

  • Maybe add PEG-based tree rewriter to the repo as a branch, and use it as a bootstrap stage. It seems to be an earlier, or different iteration of the same idea.

  • Introduce a simplified language that drops some language features, e.g. remove forms and the expand protocol. Make sure that this language can bootstrap itself off of C99. Then reintroduce forms and expand by using this simplified Maru as the bootstrap host.

  • Understand and incorporate François René Rideau's model of First Class Implementations: Climbing up the Semantic Tower (see this couple-of-pages summary, or see his page on reflection).

  • Maybe rename long to word throughout the project.

History and perspective

Around 2010-2013

Maru was developed as part of Alan Kay's Fundamentals of New Computing project, at the Viewpoints Research Institute. The goal of the project was to implement an entirely new, self-hosting computing system, with GUI, in 20,000 lines of code.

At some point VPRI went quiet and closed down in 2018. Much of their online content disappeared, and the team (probably) also dissolved.

Their annual reports: 2007, 2008, 2009, 2010, 2011, 2012.

This git repo

The piumarta branch of this git repo is a conversion of Ian Piumarta's Mercurial repo that was once available at http://piumarta.com/hg/maru/. To the best of my knowledge this is the latest publicly available state of Ian's work. That Mercurial repo was full of assorted code, probably driving the VPRI demos.

The piumarta branch will be left stale (modulo small fixes and cleanups). My plan is to eventually revive most of the goodies from this branch, but in a more organized and approachable manner, and also paying attention to the bootstrapping issues.

Ian published another Mercurial repo somewhere halfway in the commit history with only a couple of commits from around 2011. I assume that it was meant to hold the minimal/historical version of Maru that can already self-host. I started out my work from this minimal repo (hence the divergence between the piumarta and the maru.x branches in this repo).

Other instances

There are some other copies/versions of Maru. Here are the ones that I know about and contain interesting code:

Related projects

A list of projects that are relevant in this context:

  • Robert van Engelen's tinylisp (99 lines of C with GC and a REPL), his paper, and its big brother, which is 1k LoC of heavily commented C. No compiler, they are not self-hosting.

  • Loko Scheme is a self-hosting Scheme that runs on the bare metal, i.e. some hardware drivers included.

  • sectorlisp (github): LISP with GC in 436 bytes. It doesn't have a compiler, i.e. it cannot self-host. It only has a C implementation, an x86 assembly implementation (in the form of a boot sector), and John McCarthy's Lisp in Lisp evaluator. It would be an interesting project to add a compiler to it and see how the end result compares to Maru. Or to start growing a language as demonstrated in this repo, but starting out from sectorlisp. Note that sectorlisp is not equivalent to the first stage of Maru (the maru.1 git branches), because that can already self-host, i.e. it can bootstrap itself off of the C implementation.

  • Seedling: a ladder of languages, with a minimalistic core language at the bottom called Seed (it's Forth-like). Seed can self-host in less than 1k LoC. The higher level languages above Seed are (going to be) extensions of it, and are implemented on top of Seed. Porting to a new architecture will be trivial. And an interesting tidbit: the initial bootstrap was done not by using another programming language/compiler, but by pen and paper!

  • bootstrappable.org: a community around bootstrapping, and making/keeping projects bootstrappable. It brings together many interesting projects: stage0 (~500 byte self-hosting hex assembler), live-bootstrap, GNU Mes (Scheme + C, mutually self-hosting each other), m2-planet (a tiny C compiler).

  • Kalyn: a subset of Haskell semantics (mostly; not lazy), but with Lisp syntax. Entirely (!) self-hosting over x86-64 in 4-5 kLoC. The project feels of high standard, including its documentation.

  • nanohs: a tiny self-hosting subset of Haskell. It doesn't implement any type checking, but it can parse (and ignore) type annotations. This setup enables very simple self-hosting, while still making it possible to type-check the source using a fully implemented Haskell compiler.

  • PEG-based tree rewriter: runnable code to accompany Ian Piumarta's paper called PEG-based tree rewriter provides front-, middle- and back-end stages in a simple compiler. Ian wrote this before Maru, and there are several similarities between the two. See the mailing list thread.

  • blynn's Haskell compiler: bootstrap a Haskell compiler incrementally from C, with extensive documentation.

  • ichbins is a minimal self-hosting compiler of a Lisp dialect to C in 6 pages of code. elv is a VM built on it for very basic Lisp + Erlang-style processes (abandoned).

  • RefPerSys: a mostly symbolic artificial intelligence long-term project, with ambitious Artificial General Intelligence goals. It contains interesting and relevant ideas, e.g. in refpersys-design.pdf.

  • Project Oberon: a project which encompasses CPU, language, operating system and user interface, and which can be run on a relatively inexpensive FPGA board, and simple enough for one person to understand it all.

  • tort: Inspired by Ian Piumarta's idst, maru and other small runtimes. Core is approx. 5000 lines of C.

  • kernel: "Kernel is a conservative, Scheme-like dialect of Lisp in which everything is a first-class object." (including special forms) You may want to also see this blog.

  • Compiling a Lisp: Overture: Educational article series about constructing a simple Lisp compiler, implemented in C.

  • D. F. Hendry's MINT3: Supposedly something interesting from the past, but I haven't researched it myself yet.

  • Meta II (1964) - a compiler compiler, or a language to write compilers in. See this tutorial.

    “META II is not intended as a standard language which everyone will use to write compilers. Rather, it is an example of a simple working language which can give one a good start in designing a compiler-writing compiler suited to his own needs. Indeed, the META II compiler is written in its own language, thus lending itself to modification.”

  • mu: I haven't played with this yet, but it seems to be a very self-contained and low-level language/VM whose compiler can target the bare Linux kernel, or the bare machine (x86). The readme has a collection of links to docs on low-level stuff. Mu could be another stepping stone to bootstrap Maru.