Building an Interpreter
This file contains a description of the tasks you have to perform for assignment 8.
To better view the file, make sure your editor "warps" lines. In Sublime Text this is the "Word Wrap" option in the View menu.
Alternately, you can view this file in a nice form in GitHub.
As in previous assignments, you should NOT specify the types of your functions. Let the system determine them for you.
IMPORTANT: Unlike the prior assignments, this assignment requires you to edit multiple files, contained in the interpreter folder. You will need to make additions to the following files:
types.mltypes.mliparser.mlylexer.mlltests.ml
The steps of this assignment are broken up depending on the concept that they add to the language. Each will require you to make additions to one or more of the files above.
The automatic tests are only meant to test that the desugar and interp methods are working properly. For the changes in the lexer and parser, you should interact with the ./lang interpreter directly.
It will be important in the following to clarify which exceptions you should be raised when. Anything that violates the semantics of the language should be raising an appropriate exception, typically Interp .... For instance if the description that follows says about "addition" that the two summands must be numbers, then your code will need to verify that before trying to perform the addition, and raise an appropriate exception.
You will probably want to create different milestones for each of these types of language extensions.
The first thing we will do is add booleans. We will add new values, boolean primitive, conditionals, as well as logical operators desugared into conditionals.
It is important to distinguish between talking about the core/surface language's booleans, that we are trying to define, and the implementation/OCAML language's booleans.
- Add a new
Boolvariant in thevaluetype in thetypes.mlfile and thetypes.mlifile. It should hold a OCAML boolean. Update thevalToStringfunction to handle this new kind of value. ThevalToStringfunction is not used in any of the grading tests, you can choose to "print" booleans any way you like. - Add a new
BoolCvariant in theexprCtype in thetypes.mlfile and thetypes.mlifile. It should hold a OCAML boolean. - Add a case in the
interpmatch to correctly "interpret/evaluate" aBoolCto the appropriatevalue. - Add some tests in
tests.mlfor the correct interpretation ofBoolC. - Add a new
BoolSvariant in theexprStype in thetypes.mlfile and thetypes.mlifile. It should hold a OCAML boolean. - Add a new case in the
desugarmatch to correctly desugar aBoolSexpression to the appropriateexprC. - Add tests for the appropriate desugaring.
- Add two new tokens
TRUEandFALSEat the top ofparser.mly(where theFLOATtoken is defined). They should not "contain" anything (so no need for<float>). You can put them both in one line, like so:%token TRUE FALSE.
The following add the appropriate parser/lexer instructions for processing booleans.
- Add two new cases to the
exprset of rules near the bottom ofparser.mly, one forTRUEand one forFALSE. The corresponding expressions in braces should simply create the appropriate surface language expressions (involvingBoolS). - In
lexer.mll, add the two regular expression definitionslet true = "true" | "#t"(and an analogous one for false). This would allow us to typetrueor#tand have it be recognized as a boolean in the language. - Add cases to the
rulesection inlexer.mllthat turns thetruematch to theTRUEtoken and similarly forfalse. The order is important, don't put the new cases too far down. The last two rules are catch-alls, and should remain at the end of the options.
To test this part, you should follow the compile steps in the README file. Then start the interpreter via ./lang, and you should be able to type true;; or #t;; (and similarly for false), and have the interpreter reply appropriately.
We will now add conditionals. This does not require a new value, but it does require changes to the core and surface language, and the parser and lexer.
- Add a new variant in the
exprCtype (in bothtypes.mlandtypes.mli) for anIfCthat takes 3exprCs as arguments (a triple). The first of those is meant to be the conditional test, the second is thethenbranch, the third is theelsebranch. - Add a case in
interpfor properly evaluating a conditional: The test expression should be evaluated first (recursively callinginterp), and it should be an error (raiseInterp) if the result is not aBoolvalue. If it is aBoolvalue, then depending on the value either the "then" branch or the "else" branch should be evaluated. Only one of these branches should be evaluated, and only after the conditional test has first been evaluated. It is imperative that you ensure the appropriate evaluation semantics for the conditional. Make sure to parenthesize any inner match that you use. - Add tests for the correct behavior of the conditional. For example the expression
IfC (BoolC true, NumC 2.3, NumC 4.3)should evaluate toNum 2.3. Make sure to allow the possibility where the "test" is not aBoolCdirectly, but something that evaluates to a boolean; for instance it could be anotherIfCexpression. - Add a corresponding expression
IfSat the surface language, that takes as arguments threeexprSexpressions, and it desugars into an appropriateIfCexpression. - Add a case in the
desugarmatch to handleIfSand convert it toIfC.
We will now add "or", "and" and "not" to the surface language. They will all convert to IfCs.
- Add an
OrS, aAndSand aNotSto the surface language, the first two taking two expressions as arguments, the third taking only one expression. - Add
desugarcases for each of them based on the following pseudo-codes: "not eis the same asif e then false else true", "e1 or e2is the same asif e1 then true else if e2 then true else false", and something analogous forand. Note that we did not useif e1 then true else e2for the "or" case, because we want to enforce the behavior that both expressions are booleans. The conditional checks that already for its test, but not for its branch values. - Add tests to verify the correct implementation of these constructs.
Now we will add the correct components in the parser and lexer.
- Add three new tokens in
parser.mly, calledIF,THENandELSE, all in one line. - Add
%nonassoc ELSEbelow the%nonassoc FLOATline. - Add a case in the
exprrule inparser.mlythat expects aIF expr THEN expr ELSE exprand turns it into anIfC. You will need to use$2,$4and so on to refer to the values of thoseexprs. - Add three cases in the rules in
lexer.mllfor the strings"if","then"and"else"and converting them to the corresponding tokensIF,THEN,ELSE. No need to useletassignments for these single cases.
Compiling after these should allow you to write if-then-else cases. We will now add "or" and "else".
- Add three new tokens in
parser.mly, calledOR,ANDandNOT. - We need to specify precedence rules. This would be done by adding
%left OR ANDbelow the%nonassoc ELSEline. This says that both "or" and "and" should compute in a left-first direction, and they should be treated as having equal precedence to each other and stronger precedence to theELSE. This way, an expression likeif true then false else false or truewould be interpreted asif true then false else (false or true)instead of(if true then false else false) or true. - Add in the
exprrule at the bottom ofparser.mlycases forexpr OR exprandexpr AND expr. - Add a precedence rule
%nonassoc NOTbelow the one for OR and AND. - Add an
exprrule at the bottom ofparser.mlyforNOT expr. - Add in
lexer.mllthree rules for"or","and"and"not".
We will now add arithmetic operators, followed by comparison operators and finally, equality.
- Add a new
exprCtype variantArithCthat takes a "string" for the operator symbol, and twoexprCs for the two operands. - Implement a new "helper method" called
arithEvalwith typestring -> value -> value -> valuethat takes the operator and two values. If the two values are not bothNums then it should raise an interpreter exception. If they are then it should perform the appropriate operation and return a "value" of the result. The operators we will allow will be "+", "-", "*" and "/". It should throw an interpreter error if the operator symbol is not one of these. It should also throw an interpreter error about division by zero if the operator is division and the denominator is 0. Remember that in our arithmetic world, all numbers are floating point numbers. - Add a new case in
interpto handleArithCcases by evaluating the two operands in the correct order, then calling thearithEvalhelper method. - Add tests to ensure these cases are handled properly.
- Add a new
exprStype variantArithSthat modelsArithC. - Add a new case in the desugarer to convert
ArithStoArithC. - Add tests for the desugaring.
Now we add parser/lexer operations corresponding to these.
- Add in
parser.mlyfour new tokens,PLUS,MINUS,TIMES,DIVIDE. - We next need to set up precedence rules. Place at the bottom of the current list, below the
%nonassoc NOT, a line%left PLUS MINUS, and below that line add%left TIMES DIVIDE. This will ensure that all operations are performed in a left-associative way, and that multiplication and division will have higher precedence than addition and subtraction. This will agree with normal algebra rules. - Add
exprcases at the bottom ofparser.mlyforexpr PLUS exprand similarly for the other operations. They should all convert to the appropriateArithSexpression. - Add in
lexer.mllnear the bottom rules for"+","-"and so on, to be converted to the corresponding tokensPLUS,MINUSand so on. Remember that the last two rules there abouteofandanyare catchalls and need to be the last cases.
You should now be able to compile and try various arithmetic operations. Also test something like if true then 4 else 2 + 3, and make sure that the addition binds stronger than the conditionals. Also that 2 + 3 * 4 behaves properly.
The comparison operators we will consider are ">", ">=", "<", "<=". We leave equality and non-equality as a separate case, as they make sense for more than just numbers.
- Add a new
exprCtype variantCompCthat takes a string and twoexprCs. - Create a helper method
compEvalwith typestring -> value -> value -> valuethat takes as input a string and two values. If the two values are not bothNums then it should raise an interpreter error. If they are, then it should perform the appropriate operator and return aBoolvalue. It should be an interpreter error if the operator is not one of the four mentioned above. - Add a new case in
interpto handle theCompCconstruct, by evaluating the two expressions in the correct order then passing the work over tocompEval. - Add tests for the comparison semantics.
- Add a new
exprStype variantCompSanalogous toCompC, and the corresponding clause indesugarto convert it intoCompC. - Add tests for the desugaring.
Now to adjust the parser/lexer:
- Add a new token called
COMPOPthat contains a string in it. Model it after theFLOATtoken. - Place
%nonassoc COMPOPbetween the%nonassoc NOTand%left PLUS MINUSlines. This should make arithmetic happen first, but comparison right after that, before logical operators are taken into account. So something like3 < 4 and 1 < 2would behave properly. - Add
| expr COMPOP expr { CompS ($2, $1, $3) }as a case in theexprrule inparser.mly. - In
lexer.mll, add a new bindinglet comp = ">" | ">=" | "<" | "<=", then further down a new rule| comp as s { COMPOP s }. This recognizes the above four operators and stores the match as part of theCOMPOPtoken.
After you recompile everything, you should be ready to test the interpreter directly.
Equality is surprisingly a more difficult concept than most. Programmers expect to be able to compare for equality any two expressions. This means that you must handle all value possibilities, both Num and Bool as well as other values we add in the future. Every time we add a new kind of value (pairs, lists etc) we will need to update our equality function to account for them. We will use == to denote equality.
Testing for non-equality, often denoted by !=, will be implemented simply as desugaring via NotS and equality. So in the core language we will need to only worry about equality.
- Add a new
exprCtype variantEqCthat takes as input two expressions. - Write a new helper function
eqEvalof typevalue -> value -> valuethat takes as input two values and returns if those values are equal as aBoolvalue. It should do so as follows: If they are bothNums, it should compare the corresponding floats stored there. If they are bothBools it should compare the corresponding booleans stored there. Otherwise it should returnBool false. You will need to update this function in the future when we create new kinds of values. - Add a new case in
interpto handle theEqCby evaluating each operand in turn then callingeqEvalto compare them. - Add two
exprCvariants: aEqSto mirrorEqCand aNeqSand adjustdesugaraccordingly. ForNeqSyou should convert it into a combination ofNotSandEqS, then desugar that.
Parser/Lexer changes:
- Add two new tokens
EQandNEQtoparser.mly. - For precedence, add
%nonassoc EQ NEQbetween%left OR ANDand%nonassoc NOT. Make sure you understand the consequences of this choice. - Add two more rules for
exprinparser.mlyto handleexpr EQ exprandexpr NEQ expr. - Add to the rules at the bottom of
lexer.mllcases for turning"=="toEQand"!="toNEQ.
You should be ready to use equality tests now.