This is a fun project to implement LISP 1.5 as described in "LISP 1.5 Programmer’s Manual". It’s not a complete LISP 1.5 environment emulator; instead, it is to get a glimpse of how it is constructed.
When LISP was in its infancy, it wasn’t wrapped with so many parentheses which would later become its most memorable characteristic. At least in front of people’s eyes.
An early LISP program looked like this:
member[a;l] = [null[l] → NIL;
eq[a;car[l]] → l;
T → member[a;cdr[l]]]
S-expressions were also used, but only to denote data.
member[C;(A B C D E)]
LISP programmers back then read and wrote programs in this form, on paper. Yes, program sources were meant to be read by humans. In order for computers to understand programs, one must have prepared a deck of cards with holes punched according to BCD character code. The character set didn’t have many punctuations and it was impossible to punch M-expressions directly. Instead, people translated M-expressions into S-expressions, which was already used to describe nested list structures. In other words, people fed the machine the internal representation of syntax tree directly.
Eventually, people got used to write programs in S-expresions directly, and M-expressions were faded into oblivion.
However, when we read papers of that era, we see M-expressions everywhere. Now that we can write M-expressions in a text file, let’s make the computer understand it directly!
Here’s a summary of M-expression sytnax:
-
Identifiers are in all lowercase. Literal symbols are in all uppercase. No
quotein M-expression. Literal list can be written using parentheses, e.g.(A B (C D . E)), without a quote. -
Function call is written as
fn[arg1;arg2;…] -
Conditional is
[test1 -> expr1; test2 -> expr2;…]. It works just likecond. Unicode arrow character→(U+2192) can be used in place of->. -
Symbol
NILis used for list terminator. Hencecons[A;NIL]yields(A). -
Literal function is written as
lambda[[arg;..];expr]. This lambda form itself doesn’t have a value — it must be called with arguments to take effect. -
Local recursive function can be written as
label[name;lambda[[arg;…];expr]], where you can usenamein theexprto refer to itself. -
In toplevel, you can define a function by
name[arg;…] = expr.
An M-expression can be translated into an S-expression as follows:
-
Literal data (symbols and lists) are embedded in the output by
(QUOTE <literal>). -
Function call
fn[arg1;arg2;…]becomes (FN ARG1 ARG2 …)`. We didn’t have lower case characters on computers back then. -
Conditional
[test1 -> expr1; test2 -> expr2;…]becomes(COND (TEST1 EXPR1) (TEST2 EXPR2) …). -
Literal function
lambda[[arg;..];expr]becomes(LAMBDA (ARG …) EXPR). -
Local recursive function
label[name;lambda[[arg;…];expr]]becomes(LABEL NAME (LAMBDA (ARG …) EXPR)).
Module LISP1.5.mexpr implements parsers. A procedure
parse-mexpr takes a string or an input port, and parses one M-expression
and returns an S-expression.
gosh> ,u LISP1.5.mexpr gosh> (parse-mexpr "cons[(A . B);C]") (CONS (QUOTE (A . B)) (QUOTE C)) gosh> (parse-mexpr "lambda[[x];[eq[NIL;x]→T; T→F]]") (LAMBDA (X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
See how liteals (upper-case names and lists) are QUOTE -d
once parsed.
|
💡
|
If you try the above example on your REPL, make sure your terminal character encoding matches your Gauche setting (it’s utf-8 by default). |
Interestingly, there seemed no direct translation of definition
syntax name[arg;…] = expr into an S-expression.
When a user translates a prorgam in M-expressions into S-expressions
to punch the cards,
she gathers all the definitions into a call of DEFINE pseudo function
(see the note below).
For our purpose, we want the parser to yield a single S-expression
from one M-expression. So, we return ($= …) form
as the result of parsing a defintion:
gosh> (parse-mexpr "null[x] = [eq[NIL;x]→T; T→F]]") ($= (NULL X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
(The preceding $ of $= indicates it is an internal or
implementation-dependent feature.)
The parse-mexpr function only parses one M-expression.
If the input may contain more than one M-expression, use parse-mexprs
instead, which reads input up to EOF and returns an lseq of result
S-expressions.
We also add these extensions, for the convenience.
There’s no comment syntax defined in M-expressions. Since
it’s for humans to read, you could freely intermix code
and natural language descriptions. For our purpose,
we make a hash sign # to the end of line is a comment.
(We avoid ;, for it is used as a separator.)
In Appendix B of LISP 1.5 Programmer’s Manual, they use a pseudo extension of conditional expression for concise explanation, in which you can access the result of test expression from the expression in that branch. Such extension wasn’t formalized and the actual code is written in assembly language instead of M-expressions. But for our purpose it’ll be convnient to support such extension.
It is to allow a conditional expression to have the following clause:
test => fun
Here, fun must be a LAMBDA form that takes one argument,
or an expression that yield a function.
First, test is evaluated, and if it yiels a true value
(a value neither NIL nor F), the value is passed
to the function. It’s the same as Scheme’s cond feature with =>.
We’ll explain the actual use case and implementation of this extension when we get to the full toplevel environment support.
With Gauche’s reader directive feature, you can write source in M-expressions, as follows:
;; ;; Scheme comments ;; (use LISP1.5.mexpr) #!m-expr # M-expression function definitions function1[arg;...] = expression1 function2[arg;...] = expression2 ...
For our purpose, we want to treat M-expressions as our source code, and the parser returns a single S-expression as a result. So we introduce our own extension.
($TOPLEVEL <toplevel-form> ...)
<toplevel-form> : ($= <expr> <expr> [<key>])
| <expr>
When read, the entire source is wrapped in $TOPLEVEL form.
Inside it, each toplevel form becomes either
($= <expr> <expr>) (in case of definition) or just an <expr>
in case of toplevel function call. This $TOPLEVEL form is
merely our parser’s way to wrap the result, and its interpretation
depends on the caller of the parser; it doesn’t mean we’ll have
a special form called $TOPLEVEL.
|
ℹ️
|
In the actual use case, all definitions in a program were gathered and translated into the following form to be punched: DEFINE (( (NAME (LAMBDA (ARG ...) EXPR)) (NAME (LAMBDA (ARG ...) EXPR)) ... )) This is actially a special syntax to execute a function call on toplevel.
It takes a form If you want to perform some calculation, you list the call of
the function after the DEFINE (( ... definitions .. (DOSOMETHING (LABMDA (ARG ...) EXPR)) )) DOSOMETHING (ARG ...) Examples are shown in p.15 and pp.48-51 of LISP1.5 Programmer’s Manual. |
The #!m-expr directive translates those M-expressions into
a LISP1.5 DEFINE form:
($TOPLEVEL ($= (FUNCTION1 ARG ...) EXPRESSION1) ($= (FUNCTION2 ARG ...) EXPRESSION2) ...)
Note that you have to have definitions of $TOPLEVEL and other primitive
LISP1.5 forms before loading the source file; The LISP1.5.mexpr module
only handles parsing.
We provide several implementations of those LISP1.5 primitives, which we’ll show you in the following chapters.
Section 1.6 of "LISP 1.5 Programmer’s Manual" is one of the pinnacles of the document. They show how to implement Lisp interpreter on top of Lisp systems. They call it "a Universal LISP function".
We write out their code in mx/eval.mx.
What’s interesting about it is that you only need a handful of
functions and syntaxes to run the interpreter. We define those
minimal set of primitives in LISP1/5/axioms.scm.
It provides the definition of the following primitives:
CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND, as well as
a definition of $TOPLEVEL to handle toplevel forms.
To try the eval function, first use the axioms module, then
load the eval.mx file. Assuming you have
load path set to the top directory of LISP1.5 source,
you can say the following in the gosh REPL:
gosh> ,u LISP1.5.axioms gosh> ,l mx/eval.mx #t
Or, you can start gosh with loading necessary modules (this assumes you’re in the top directory of LISP1.5 source):
$ gosh -I. -u LISP1.5.axioms -l mx/eval.mx
On the gosh prompt, you can call EVAL. The first argument
is the S-expression to evaluate, and the second argument
is the environment (assoc list of symbols and values):
gosh> (EVAL '(CONS (CAR (QUOTE (X . Y))) (QUOTE Z)) 'NIL) (X . Z)
Be aware of the difference of ' (quote) and QUOTE.
The former one is recognized by Gauche. The latter one is recognized by
EVAL.
If you prefer, you can write M-expressions using
read-time constructor #,(m-expr "…"):
gosh> (EVAL '#,(m-expr "cons[car[(X . Y)];Z]") 'NIL) (X . Z)
Following is a bit more convoluted example. It defines append
as a recursive funciton using LABEL, and calls it with
two arguments, (A B C) and (X Y Z):
gosh> (EVAL '#,(m-expr "label[append;lambda[[xs;r];\
[eq[xs;NIL] -> r;\
T -> cons[car[xs];append[cdr[xs];r]]]]]\
[(A B C);(X Y Z)]")
'NIL)
(A B C X Y Z)
This interpreter only knows the minimal 7 primitives:
CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND.
To refer to anything other than that,
you have to pass them in the environment argument.
The following example reverses a list, using the
definition of NULL, APPEND and REVERSE given to the environment:
gosh> (EVAL '#,(m-expr "reverse[(A B C D E F G)]")
'((NULL . #,(m-expr "lambda[[x];[eq[x;NIL] -> T; T -> F]]"))
(APPEND . #,(m-expr "lambda[[xs;r];\
[eq[xs;NIL] -> r;\
T -> cons[car[xs];append[cdr[xs];r]]]]"))
(REVERSE . #,(m-expr "lambda[[xs];\
[null[xs] -> NIL;\
T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))
))
(G F D C B A)
|
ℹ️
|
We need to provide the function |
|
💡
|
When you refer to an identifier that’s neither one of the built-in primitive nor the one given in the environment, you’ll get an error like the following: *** ERROR: pair required, but got NIL
Stack Trace:
_______________________________________
0 (car x)
at "./LISP1/5/axioms.scm":9
1 (CAR X)
[unknown location]
2 (CAAR A)
[unknown location]
3 (EQUAL (CAAR A) X)
[unknown location]
4 (ASSOC E A)
[unknown location]
5 (EVAL FN A)
[unknown location]
...
The code searches the environment alist by |
Since the universal LISP function defined in eval.mx understands
the primitives required to interpret functions in eval.mx, you can use
our EVAL to evaluate eval.mx to run EVAL on top of
EVAL — now you’re running a metacircular interpreter!
You might have noticed though, that axioms.scm provides $TOPLEVELS,
which is missing in eval.mx. In our context of discussing
metacircular interpreter, $TOPLEVELS appears as a result of
parsing M-expression definitions, and should be understood
as a meta-language to direct the set-up, rather than an integrated
part of the language (one way to think of it is that if other primitives
are C built-ins then $TOPLEVELS is #pragma or Makefile — they belong
to a different layer.)
Of course, it is more convenient to have an ability in the core language to add new toplevel definitions, and we’ll deal with it later. For now, let’s stick to the 7 primitives.
In order to run EVAL inside EVAL, we need to prepare the definitions
in eval.mx as an environment alist passed to outer EVAL.
Run the following command in the toplevel source directory:
$ gosh tools/mexpr-env.scm mx/eval.mx
It reads eval.mx and prints the definitions in an alist. Copy the output,
then start gosh again, read axioms and load eval.mx, and evaluate
the EVAL expression, passing the copied alist as the environment
(don’t forget the quote before the alist!):
gosh> ,u LISP1.5.axioms
gosh> ,l mx/eval.mx
#t
gosh> (EVAL '(EVAL (QUOTE (CAR (QUOTE (X . Y)))) (QUOTE NIL))
'...<<here, copy & paste the output of mexpr-env.scm>>)
X
The result X is the result of (CAR (QUOTE (X . Y))), computed
by the EVAL function implemented in LISP1.5, not the underlying Gauche.
If cut&pasting the environment alist is too tedious, mexpr-env.scm can
create a definition of an auxiliary function EVAL*, which calls EVAL
with the environment that has all the definitions in the given source file.
Run mexpr-env.scm with -e option, and save the result in lisp/eval.lisp:
$ gosh tools/mexpr-env.scm -e mx/eval.mx > lisp/eval.lisp
|
💡
|
Instead of manually executing |
We use suffix lisp to indicate it is not a Scheme code (even though
Gauche can understand it after using LISP1.5.axioms).
The created lisp/eval.lisp looks as follows:
($TOPLEVELS ($= (EVAL* X) (EVAL X '...<<environment defined in eval.mx>>... )))
That is, it defines EVAL* which takes one LISP1.5 expression and
evaluates it under the enviornment where all the definitions in eval.mx
is visible.
The created eval.lisp can be loaded to gosh after using LISP1.5.axioms.
Together with mx/eval.mx, you can run EVAL on top of EVAL:
$ gosh -I. -uLISP1.5.axioms -lmx/eval.mx -leval-star.lisp gosh> (EVAL* '#,(m-expr"eval[(CONS (QUOTE X) (QUOTE Y));NIL]")) (X . Y)
This time we used M-expression in the inner call. It’s the same
as writing '(EVAL (QUOTE (CONS (QUOTE X) (QUOTE Y))) (QUOTE NIL)).
Let’s recap what’s happening. The outer EVAL (via EVAL*) is
executed by Gauche, using the initially loaded eval.mx. The
inner EVAL is interpreted by the outer EVAL, using the
enviornment created by mexpr-env.scm.
And the expression (CONS (QUOTE X) (QUOTE Y)) is interpreted by
the inner EVAL:
+----------------------------+
| (CONS (QUOTE X) (QUOTE Y)) |
+----------------------------+
| EVAL | ; inner EVAL
+----------------------------+
| EVAL | ; outer EVAL
+----------------------------+
| Gauche |
+----------------------------+
If it is not obvious, try it with an altered environment.
For example, edit the eval.lisp created above
to change the inner EVAL recognizes KWOTE instead of QUOTE.
There’s only one place to change:
(EVAL
LAMBDA
(E A)
(COND
((ATOM E) (CDR (ASSOC E A)))
((ATOM (CAR E))
(COND ((EQ (CAR E) (QUOTE KWOTE)) (CADR E))
^^^^^
((EQ (CAR E) (QUOTE COND)) (EVCON (CDR E) A))
((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))
((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))
(Leave other QUOTE intact, for they are recognized by the outer EVAL).
Now, try it:
(EVAL* '(EVAL (QUOTE (CONS (KWOTE X) (KWOTE Y))) (QUOTE NIL))) => (X . Y)
The two QUOTE s are recognized by the outer EVAL, and the two
KWOTE s are recognized by the inner EVAL. Furthermore,
the ' (quote) is recognized by Gauche.
(If you know what we’ll talk about from the section title, you can skip this section. Yes, it’s just about that.)
One advantage of having a simple language with a concise interpreter is that we can tweak it easily.
In the universal EVAL, a function is represented as a literal list
whose car is LAMBDA. It is a powerful idea—now you can have
a function as a first-class citizen of the language, that you can
construct it, pass it to another function, and return it from another
funciton. However, it has a flaw.
Let’s try a failure case and see if we can fix it.
Consider MAPCAR function, which takes a function and a list, and
returns a list of results of the function applied to each element of the
given list (that is, Scheme’s map function):
mapcar[fn;x] = [null[x] -> NIL;
T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
It is in mx/mapcar.mx. You can’t load it directly
into Gauche, however. Treating a list starting with LAMBDA as
a function is a feature of EVAL, not Gauche.
We have to make EVAL understand the above definition.
We can use the same technique we used in the metacircular interpreter — that is, translate the definition of MAPCAR above into an enviroment
alist. We also need the definition of NULL, so let’s combine
eval.mx together with mapcar.mx. It can be done with the following
command line:
$ gosh tools/mexpr-env.scm -e mx/eval.mx mx/mapcar.mx > lisp/mapcar.lisp
Alternatively, run ./configure then make in the toplevel source directory.
Once you have lisp/mapcar.lisp, you can load it (after mx/eval.mx)
and you can call MAPCAR inside EVAL*:
$ gosh -I. -uLISP1.5.axioms gosh> ,l mx/eval.mx #t gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (CONS X (QUOTE Y)))) (QUOTE (A B C)))) ((A . Y) (B . Y) (C . Y)) gosh> (EVAL* '#,(m-expr "mapcar[(LAMBDA (X) (CONS X (QUOTE Y)));(A B C)]")) ((A . Y) (B . Y) (C . Y))
So far, so good.
Now, Let’s try nesting MAPCAR. We’ll do equivalent to the following
Scheme code:
(map (lambda (x) (map (lambda (y) (cons x y)) '(p q r))) '(a b c)) => (((a . p) (a . q) (a . r)) ((b . p) (b . q) (b . r)) ((c . p) (c . q) (c . r)))
Here’s LISP1.5 version:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (X)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(QUOTE (A B C))))
=> ((((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)))
Oops, what happened? Let’s examine the details.
Outer MAPCAR receives two actual parameters, (LAMBDA (X) …) and (A B C)
(QUOTE s are stripped when arguments are evaluated
by evlis before calling the function). They are bound to the
local parameters, FN and X, respectively. In other words,
the body of MAPCAR:
[null[x] -> NIL; T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
is evaluated with the following environment:
((FN . (LAMBDA (X)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(X . (A B C)))
Since X is not NIL, evaluation goes to cons[…] branch.
The first argument is fn[car[x]], so first car[x] is evaluated
and yields A, fn evaluated to the outer LAMBDA form
and we call it with A. The body of inner LAMBDA form, which
is the inner MAPCAR call, is evaluated with the following environment
(Keep in mind that the new local bindings are inserted in front of
outer environment):
((X . A)
(FN . (LAMBDA (X)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(X . (A B C)))
Inner MAPCAR gets (LAMBDA (Y) (CONS X Y)) and (P Q R) as two
actual parameters, which are bound to MAPCAR 's formal paramter
FN and X again, and the environment under which innter MAPCAR 's
body is evaluated looks like this:
((FN . (LAMBDA (Y) (CONS X Y)))
(X . (P Q R))
(X . A)
(FN . (LAMBDA (X)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(X . (A B C)))
Finally, innter LAMBDA is called — first, P as the
actual parameter, which is bound to Y. Hence the body
of the inner LAMBDA, which is (CONS X Y), is evaluated
under the following environment:
((Y . P)
(FN . (LAMBDA (Y) (CONS X Y)))
(X . (P Q R)) (1)
(X . A) (2)
(FN . (LAMBDA (X)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(X . (A B C))) (3)
-
Argument for the inner
MAPCAR -
Argument for the outer
LAMBDA -
Argument for the outer
MAPCAR
Now it is clear why it didn’t work. When we write the
initial nested MAPCAR form, we expect that X in the
innermost expression (CONS X Y) refer to the formal parameter of the
outer LAMBDA. But it is shadowed by the formal parameter of the
MAPCAR.
This is a well-known problem, and in lambda calculus it is avoided
by renaming the parameter names to avoid conflict. In our case,
if we rename the formal parameter of inner LAMBDA to something
different from the formal parameter of MAPCAR, it works as expected:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (Z) (1)
(MAPCAR (QUOTE (LAMBDA (Y) (CONS Z Y)))
(QUOTE (P Q R)))))
(QUOTE (A B C))))
=> (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
-
We use
Zto avoid confclit withMAPCAR'sX.
However, we can’t possibly avoid all potential conflict manually, and renaming all formal parameters programatically to unique ones can be costly.
LISP1.5 employed another way to solve this problem. Instead of passing
LAMBDA form quoted, it introduced another form, called FUNCTION.
The rule is that whenever you pass a function as an argument,
you wrap it with FUNCTION instead of QUOTE. With this rule,
our call of nested MAPCAR would look like this:
(EVAL* '(MAPCAR (FUNCTION (LAMBDA (X)
(MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(QUOTE (A B C))))
Now we modify our universal LISP function to deal with FUNCTION.
We only need to change two lines. First, make EVAL understand
(FUNCTION <fn>) form. Whenver it sees the form, it just
returns a list (FUNARG <fn> <env>), where <env> is the evaluation
enviornment:
eval[e;a] =
[atom[e] -> cdr[assoc[e;a]];
atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e];
eq[car[e];FUNCTION] -> cons[FUNARG;cons[cadr[e];cons[a;NIL]]]; (1)
eq[car[e];COND] -> evcon[cdr[e];a];
T -> apply[car[e];evlis[cdr[e];a];a]];
T -> apply[car[e];evlis[cdr[e];a];a]]
-
If we see
(FUNCTION <fn>)form, wrap the function and the current environment inFUNARGform, as(FUNARG <fn> <env>).
Then, in APPLY, we call <fn> with the rememberd <env> instead of
the passed environment:
apply[fn;x;a] =
[atom[fn] -> [eq[fn;CAR] -> caar[x];
eq[fn;CDR] -> cdar[x];
eq[fn;CONS] -> cons[car[x];cadr[x]];
eq[fn;ATOM] -> atom[car[x]];
eq[fn;EQ] -> eq[car[x];cadr[x]];
T -> apply[eval[fn;a];x;a]];
eq[car[fn];FUNARG] -> apply[cadr[fn];x;caddr[fn]]; (1)
eq[car[fn];LAMBDA] -> eval[caddr[fn];pairlis[cadr[fn];x;a]];
eq[car[fn];LABEL] -> apply[caddr[fn];x;cons[cons[cadr[fn];caddr[fn]];a]]]
-
Apply the wrapped function in the rememberd environment
The changed definitions are in mx/funarg.mx. You can load it and see it addresses the issue (which has been called FUNARG problem).
$ gosh -I. -u LISP1.5.axioms -l mx/funarg.mx
gosh> ,l lisp/mapcar.lisp
#t
gosh> (EVAL* '(MAPCAR (FUNCTION (LAMBDA (X)
(MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y)))
(QUOTE (P Q R)))))
(QUOTE (A B C))))
(((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
|
ℹ️
|
Did you notice that you actually did’t need eval[e;a] =
[atom[e] -> cdr[assoc[e;a]];
atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e];
eq[car[e];LAMBDA] -> cons[FUNARG;cons[e;cons[a;NIL]]];
eq[car[e];COND] -> evcon[cdr[e];a];
T -> apply[car[e];evlis[cdr[e];a];a]];
T -> apply[car[e];evlis[cdr[e];a];a]]
The updated definition is in mx/funarg-lambda.mx. Using it,
calling $ gosh -I. -u LISP1.5.axioms -l mx/funarg-lambda.mx
gosh> ,l lisp/mapcar.lisp
#t
gosh> (EVAL* '(MAPCAR (LAMBDA (X)
(MAPCAR (LAMBDA (Y) (CONS X Y))
(QUOTE (P Q R))))
(QUOTE (A B C))))
(((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
This idea was realized by Sussman and Steele in 1975, as a dialect Scheme. The first paper of Scheme stated it at the beginning: SCHEME is essentially a full-funarg LISP. LAMBDA expressions need not be QUOTEd, FUNCTIONed, or *FUNCTIONed when passed as arguments or returned as values; they will evaluate to closures themselves. |
So far, our EVAL requires any bindings to be provided
via the environment argument. Preprocessing the source with mexpr-env.scm
was a remedy, but it’s still troublesome. So our next step is to
add a toplevel environment, that keeps global bindings of symbols.
The easiest way is to keep a global table, and when we search
a variable binding via ASSOC (in the first branch of EVAL),
we also look up the table when we didn’t find any local bindings.
However, LISP1.5 took a bit different approach. Since its symbol had a property list, or plist, which could hold arbitrary key-value pairs, so I suspect it was natural to store the global value of the symbol in its plist. In fact, even the name of a symbol was merely one of its properties. In LISP1.5, a symbol was just another type of list where the car of its head was marked with a special value (-1).
|
ℹ️
|
A property list (plist) associates keys to values, much like
an associative list (alist),
but its structure alternates keys and values. For example, if
key ;; alist ((A . APPLE) (B . BANANA)) ;; plist (A APPLE B BANANA) The number of cons cells used are the same. We’re not sure why LISP1.5
creators used plist for symbol properties, while they used
alist for environment in |
In our minimal infrastructure (LISP1/5/axioms.scm) we just used Gauche symbols for LISP symbols. It might be interesting, though, to reproduce what LISP1.5 did — using a list to implement symbols!
That is, from now on, our LISP symbol is a pair whose car is
a special marker. We use Gauche symbol ATOM. From LISP world,
a LISP symbol is an unbreakable unit (hence it is called atom), so
the marker is never be visible. Under the hood, in Gauche level,
we can break an atom to access its internal structure. It is as
if LISP world deals with chemical reactions and Gauche world deals
with nuclear reactions.
In LISP symbols, its name is stored as a value of the property
PNAME. Since the property list is scanned by LISP function,
we have to use LISP symbols as the property key. For the name itself,
we use a Scheme string; in real LISP1.5, the name is stored
in a special way and treated specially (there wasn’t a string type).
Thus, LISP symbol PNAME has the following structure in Gauche:
(define *PNAME* '#0=(ATOM #0# "PNAME"))The #0= notation is a Scheme way to write a circular structure.
The symbol PNAME has a propoerty list, in which the key PNAME
is associated to the name "PNAME". Note that they LISP symbol
PNAME itself doesn’t have a global value.
The global value of symbols is stored as a propery value with
the key APVAL. So we need the LISP symbol APVAL, which looks
like the following in Gauche. APVAL itself doesn’t have a global
value either:
(define *APVAL* `(ATOM ,*PNAME* "APVAL"))Once we have PNAME and APVAL, we can define NIL, whose name
is "NIL" and value is itself. We can’t use #0= notation this time,
since we have to construct the list using values of *PNAME\* etc.
(define *NIL* (rlet1 nil (list 'ATOM *PNAME* "NIL" *APVAL*)
(set! (cddddr nil) (list nil))))Here’s how *NIL* looks like in Gauche world.
1=(ATOM #1 "PNAME") is LISP symbol PNAME, and
(ATOM 1 "APVAL") is LISP symbol APVAL. Remember we’re looking
at the internal of atoms — from LISP world, this is just a symbol
NIL.
gosh> *NIL* #0=(ATOM #1=(ATOM #1# "PNAME") "NIL" (ATOM #1# "APVAL") #0#)
We can define several symbols in this way. See LISP1/5/runtime.scm for all the predefined symbols.
Let’s start building infrastructure. Our LISP world only have symbols
and cons cells so far (we’ll add numbers later). We can define $atom?
and $cons? as follows (The $ indicates it deals with LISP objects):
(define ($atom? obj) (and (pair? obj) (eq? (car obj) 'ATOM)))
(define ($cons? obj) (and (pair? obj) (not (eq? (car obj) 'ATOM))))Then we can define $lisp->scheme, which converts LISP data structure
into Scheme data structure, handy for debugging.
We map NIL inside the structure into Scheme empty list, so that
list structure can be printed naturally (instead of having . NIL)
at the end.) We also convert non-LISP object into a string #[…].
(define ($lisp->scheme obj)
(define (rec obj)
(cond [(eq? obj *NIL*) '()]
[($atom? obj) (string->symbol (cadr (member *PNAME* (cdr obj))))]
[(pair? obj) (cons (rec (car obj)) (rec (cdr obj)))]
[else (format "#[~s]" obj)]))
(if (eq? obj *NIL*)
'NIL
(rec obj)))It’s also handy to have $scheme->lisp, which converts Scheme
structure into LISP structure. One important point: We want to keep
symbol’s eq -ness, that is, LISP symbols with the same name
can be compared with eq. So we keep a hashtable to map Scheme
symbol to LISP symbols.
(define *obtable* (hash-table-r7 eq-comparator
'NIL *NIL*
'PNAME *PNAME*
'APVAL *APVAL*))
(define ($scheme->lisp obj)
(cond [(null? obj) *NIL*]
[(symbol? obj) (or (hash-table-get *obtable* obj #f)
(rlet1 s (list 'ATOM *PNAME* (symbol->string obj))
(hash-table-put! *obtable* obj s)))]
[(pair? obj) (cons ($scheme->lisp (car obj))
($scheme->lisp (cdr obj)))]
[else (errorf "Cannot convert ~s to LISP" obj)]))Let’s try them. Converting Scheme (A B C D E) into LISP results
somewhat scary structure, but converting it back shows it’s nothing
to be afraid of:
gosh> ($scheme->lisp '(A B C D E)) ((ATOM #0=(ATOM #0# "PNAME") "A") (ATOM #0# "B") (ATOM #0# "C") (ATOM #0# "D") (ATOM #0# "E") . #1=(ATOM #0# "NIL" (ATOM #0# "APVAL") #1#)) gosh> ($lisp->scheme *1) (A B C D E)
Not all global values are stored in APVAL property. LISP1.5 uses
several different keys, depending on the type of the value. APVAL
is used when a symbol is used as a variable, and other keys are
used when a symbol is used in the function position of the function call.
| Key | Value |
|---|---|
|
The value is a LISP object. |
|
The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are evaluated before passed to it. |
|
The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments are not evaluated, and passed as a single list. |
|
The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are evaluated before passed it. |
|
The value is a native function (written in assembly in the acutal LISP1.5, written in Gauche in our case). The arguments are not evaluated, and passed as a single list. |
It is worth to mention that EXPR form receives fixed-number of arguments. If you want to write a function in LISP that takes variable number of arguments, you have to make it FEXPR, and evaluate the given list of arguments by yourself.
|
ℹ️
|
Lisp dialects can be categorized to either Lisp-1 or Lisp-2. They are not versions, but about namespaces. Lisp-1 unifies function and variable namespaces, so in the function call syntax, the function name is looked up the same way as variable look-up. Scheme is Lisp-1. Lisp-2 have separate namespaces for functions and variables.
You can use the argument named This design of having different keys for function call and
variable makes LISP1.5 a Lisp-2. However, interestingly,
to call a function stored in a variable you can place the variable
in the function position, without |
The EVAL procedure that uses symbol’s property lists are
shown in Appending B of "LISP1.5 Programmer’s Manual". However,
it contains some pseudo code which were actually implemented
in the assembly. Although we can rewrite the code in pure
LISP1.5, it would be pretty verbose; instead, we enhance our M-expressions
a bit so that the pseudo code in Appendix B can be written naturally
in our implementation.
Specifically, we allow this clause in the cond form:
test => proc
The test expression is evaluated, and if it yields neither NIL nor F,
the procedure proc is called with the result of test as the sole
argument. It is the same as Scheme’s cond. You can also
use Unicode character ⇒ (U+21d2) in place of ⇒.
We also allow λ (U+03bb) in place of lambda, for conciseness
and similarity to the listing in "LISP1.5 Programmer’s Manual".
Here’s a Scheme’s filter-map written in our M-expression:
filtermap[pred;lis] =
[null[lis] → NIL;
pred[car[lis]] ⇒ λ[[x];cons[x;filtermap[pred;cdr[lis]]]];
T → filtermap[pred;cdr[lis]]]
It works as follows:
filtermap[atom;(A (B) C (D) E)] ⇒ (A C E)
As we did in our first version with axioms.scm and eval.mx, we want to keep Scheme code minimal and write the rest of the system in LISP itself. We also want to write so-called standard libraries in LISP, too.
When you write language X in the language X itself, you have to be epecially careful which world you’re dealing with. Before proceeding, let’s recap the layered structure we saw in the previous sections.
-
In
axioms.scm, we defined minimal operators in Scheme to run LISP 1.5. It is the bottom world, or the Basement. We can see all the mechanics that runs the LISP system from the Basenment. -
Then we loaded
eval.mx, which is written in LISP 1.5 itself. At this time though, the functions ineval.mx, such asNULL,ASSOCorEVAL, are actually Gauche variables, bound to Gauche procedures; TheDEFINEmacro inaxioms.scmtranslates LISP 1.5 definitions into Gauche definitons. The functions ineval.mxdoesn’t know about Gauche, even though they themselves are running as Gauche procedures. We’re in the Ground Floor. -
Then we processed
eval.mxwithmexpr-env.scmto produceeval.lisp. It hasEVAL*, which is still Ground Floor function. It takes a LISP1.5 expression and evaluates it. The expression passed toEVAL*lives in the First Floor, above the Ground Floor. As we’ve seen, the habitants in the First Floor knows nothing about the Ground Floor or the Basement, except the bindings passed as the environment.
Now, in our revised runtime, difference between the Basement and the Ground Floor becomes wider: A LISP symbol is an unbreakable atom in the Ground Floor, but it’s just a pair in the Basement.
We do need a few conduits between the floors, so that the upper floor can access the functionality of lower floor:
-
SUBRandFSUBRare functions implemented in the basement. In the original LISP1.5, they were implemented in assembly language. In our case, they are written in Gauche. In order to invoke those functions from the ground floor, we need an additional primitive,CALLSUBR, which takes the instance ofSUBRorFSUBRand arguments, to call it. -
Atoms in LISP world now has structure---somewhat like that an atom in original sense was the minimal unbreakable building block of the universe, then mankind found it has electrons and nucleus in it. We do want to treat LISP atoms as unbreakable entity for most of the time, except when we want to access its property list.
-
The previous version of
EVALdoesn’t have error handling mechanism. For usability, we need some minimal mechanism to signal an error. Theoretically, a sophisticated error handling mechanism can be implemented fully in LISP layer---e.g. we can define a special "error" value, and whenever something yields an error value, we let all the expressions that got the error value just returns it, so that the error value propagates to the final result. However, it is convenient to stop and examine the evaluator at the moment when error occurs, and such a mechanism needs access to the basement. For now, we provideERRORprocedure as another primitive. -
The previous version of
EVALdidn’t have literal fuction (lambda form) inEVALcode itself---the lambda form is dealt byEVAL, but the Basement that executesEVALdoesn’t need to know that. Now that, for our convenience, we use several lambda forms in the definition ofEVAL, so the Basement needs to deal with them, too.
The revised runtime Basement routines---replacing axioms.scm is
in LISP1/5/runtime.scm:
(define (CAR x) (if (null? (car x)) *NIL* (car x)))
(define (CDR x) (if (null? (cdr x)) *NIL* (cdr x)))
(define (CONS x y) (cons x (if (eq? y *NIL*) '() y)))
(define (ATOM x) (if ($atom? x) *T* *F*))
(define (EQ x y) (if (eq? x y) *T* *F*))
(define (CALLSUBR subr args) (apply subr args))
(define (ERROR obj) (error "Meta*LISP Error:" ($lisp->scheme obj)))
(define T *T*)
(define F *NIL*)
(define NIL *NIL*)
(define-syntax LAMBDA lambda)
(define-syntax QUOTE
(syntax-rules ()
[(_ x) ($scheme->lisp 'x)]))
(define-syntax COND
(syntax-rules (=>)
[(_) *NIL*]
[(_ (test expr) . more)
(let ([t test])
(if (or (eq? t *NIL*) (eq? t *F*))
(COND . more)
expr))]
[(_ (test => proc) . more) ; extension
(let ([t test])
(if (or (eq? t *NIL*) (eq? t *F*))
(COND . more)
(proc t)))]))
(define-syntax $TOPLEVELS
(syntax-rules ($=)
[(_ ($= (name args ...) expr) ...)
(begin (define name
(let ([lsym ($scheme->lisp 'name)]
[lfn ($scheme->lisp '(LAMBDA (args ...) expr))])
(set! (cdr lsym) `(,($scheme->lisp 'EXPR) ,lfn ,@(cdr lsym)))
(lambda (args ...) expr)))
...)]))
When we read the definition of new EVAL in Gauche, the literals
are passed as Gauche’s liteals. We treat it as LISP1.5 literals,
hence our QUOTE form in the basement translates Gauche literals
to LISP1.5 literals by $scheme→lisp.
The COND is also enhanced to handle our extension.
The $TOPLEVELS now not only defines Gauche procedures,
but also registers the defined form to the LISP1.5 symbol’s EXPR
property. That is, if we load this M-expression on top of the
Basement:
caar[x] = car[car[x]]
It defines (1) a Gauche procedure CAAR, and (2) it registers EXPR
property of LISP1.5 symbol CAAR with S-expression
(LAMBDA (X) (CAR (CAR X))).
The runtime.scm also defines SUBR property of
several symbols that are used as primitives:
(define-syntax defglobal
(syntax-rules ()
[(_ var key val)
(let1 lsym ($scheme->lisp 'var)
(set! (cdr lsym) `(,($scheme->lisp key) ,val ,@(cdr lsym))))]))
(defglobal CAR 'SUBR CAR)
(defglobal CDR 'SUBR CDR)
(defglobal CONS 'SUBR CONS)
(defglobal ATOM 'SUBR ATOM)
(defglobal EQ 'SUBR EQ)
(defglobal ERROR 'SUBR ERROR)
(defglobal CALLSUBR 'SUBR CALLSUBR)
Now we write EVAL that understands the global environment.
First we need a couple of auxiliary procedures.
Previously, we used assoc to search local bindings in the
local environment. We didn’t consider an error there, so if you
use undefined variable it yielded Gauche error. The following sassoc
takes an extra thunk (a procedure with no arguments) and invokes it
when the item x isn’t found in an associative list a:
sassoc[x;a;thunk] = [null[a] → thunk[]; equal[caar[a];x] → car[a]; T → sassoc[x;cdr[a];thunk]]
Another function is to get the property value with the key y in
the symbol x. As we explained above, we access symbol’s property list
just using car and cdr:
get[x;y] = [null[x] → NIL; eq[car[x];y] → cadr[x]; T → get[cdr[x];y]]
Note: Since a property list alternates keys and values, it must loop
with cddr---skipping a value. However, LISP1.5 Programmer’s Manual
lists the above code, just looping with cdr. I don’t know if it
was just an overlook or they just reused exisitng get procedure.
One consequence of that choice is that we can’t simply store the
symbol’s global value as APVAL’s value, since if that value happens
to be a symbol such as `EXPR, it’ll confuse get procedure.
So the `APVAL’s value is wrapped with a list.
Now, we can write revised APPLY and EVAL that maintaines
the global environment in symbols' propetry lists:
apply[fn;args;a] =
[null[fn] → NIL;
atom[fn] → [get[fn;EXPR] ⇒ λ[[e];apply[e;args;a]];
get[fn;SUBR] ⇒ λ[[s];callsubr[s;args]];
T → apply[cdr[sassoc[fn;a;λ[[];error[A2]]]];args;a]];
eq[car[fn];LABEL] → apply[caddr[fn];args;cons[cons[cadr[fn];caddr[fn]];a]];
eq[car[fn];FUNARG] → apply[cadr[fn];args;caddr[fn]];
eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];args;a]];
T → apply[eval[fn;a];args;a]]
eval[form;a] =
[null[form] → NIL;
atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]];
T → cdr[sassoc[form;a;λ[[];error[A8]]]]];
eq[car[form];QUOTE] → cadr[form];
eq[car[form];FUNCTION] → cons[FUNARG;cons[cadr[form];cons[a;NIL]]];
eq[car[form];COND] → evcon[cdr[form];a];
atom[car[form]] → [get[car[form];EXPR]
⇒ λ[[e];apply[e;evlis[cdr[form];a];a]];
get[car[form];FEXPR]
⇒ λ[[f];apply[f;cons[cdr[form];cons[a;NIL]];a]];
get[car[form];SUBR]
⇒ λ[[s];callsubr[s;evlis[cdr[form];a]]];
get[car[form];FSUBR]
⇒ λ[[f];callsubr[f;cons[cdr[form];cons[a;NIL]]]];
T → eval[cons[cdr[sassoc[car[form];a;λ[[];error[A9]]]];
cdr[form]];
a]];
T → apply[car[form];evlis[cdr[form];a];a]]
This is pretty close to what is shown in the Appendix B of "LISP1.5 Programmer’s Manual".
Note that We have a couple of calls of ERROR. The argument is an
error code.
-
error[A2]: A symbol is used as a procedure but doesn’t have a binding. -
error[A8]: A symbol is used as a variable but doesn’t have a binding. -
error[A9]: A symbol is used as a procedure or syntax but doesn’t have a binding.
Another curious point. Check the atom branch of eval:
atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]];
T → cdr[sassoc[form;a;λ[[];error[A8]]]]];
It first accesses symbol’s APVAL property, then
searches the environment. That is, symbol’s global value takes
precedence from local values.
You can run this version of EVAL by loading
mx/genv.mx into Gauche.
Note that EVAL now accepts LISP1.5 data, and returns
LISP1.5 data. You have to convert them to Gauche’s data back and forth.
Here, we just evaluate the global variable F, whose value is
NIL:
gosh> ($lisp->scheme (EVAL ($scheme->lisp 'F) ($scheme->lisp '()))) NIL
The following code defines REVERSE function locally and calls it.
Note that null and append are already stored as EXPR property
of those symbols when we loaded genv.mx, so we don’t need
to provide them in the environment:
gosh> ($lisp->scheme
(EVAL ($scheme->lisp '#,(m-expr "reverse[(A B C D E F G)]"))
($scheme->lisp
'((REVERSE . #,(m-expr "lambda[[xs];\
[null[xs] -> NIL;\
T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))))))
(G F E D C B A)