Lexing Lisp in Python

Question:

I have become interested in the Lisp language, and I decided to create my own dialect. It is going to be the simplest one ever to exist.

As you know, everything in Lisp is a list (at least in this dialect). A list consists of a command that comes at the start of it, optionally followed by arguments, which are lists themselves. Using this information, I created the following.

class KList:
    def __init__(self, command, args=None):
        self.command = command
        self.args = args

So using this structure, (+ 1 2) should turn into KList('+', [KList('1'), KList('2')]). To do that conversion I need a lexer, and that is my problem: how can I convert it? Two things are important to me.

  1. I just kind of hate downloading a quadrillion packages for a simple project. So a solution without a lexing library.
  2. Lisp is a functional language, and it might seem weird, but I use Python for functional programming, so please avoid statements and mutating variables.
Asked By: KianFakheriAghdam

||

Answers:

A simple Lisp parser is easy to implement: you can basically write a recursive-descent (top-down) parser.
You have individual readers, such as readers for integers, strings, symbols, or whatever you want:

class Reader:

    def read_integer(self, stream):
        pass

    def read_string(self, stream):
        pass

    def read_symbol(self, stream):
        pass

    def read_whitespace(self, stream):
        pass

In particular, a reader for compound forms:

    def read_open_parenthesis(self, stream):
        # read as many forms as possible until
        # you reach ")"
        pass

And you add a main reader that peeks at the stream, checks which character is about to be read, and calls whatever function it needs:

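    def read_form(self, stream):
        # `in` is a reserved word in Python, so the parameter is named `stream`
        char = stream.peek()
        if char.isdigit():
            return self.read_integer(stream)
        if char == '(':
            stream.read_char()  # consume the opening parenthesis
            return self.read_open_parenthesis(stream)
        # etc.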
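
To make the outline above concrete, here is a minimal runnable sketch of it. The Stream class, the skip_whitespace helper, and the choice to return nested Python lists, ints, and strings are assumptions made for illustration; they are not part of the original outline. It is also deliberately stateful, following the stream design rather than the functional style the question asks for.

```python
class Stream:
    """Hypothetical character stream with one character of lookahead."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character without consuming it; '' at end of input.
        return self.text[self.pos:self.pos + 1]

    def read_char(self):
        char = self.peek()
        self.pos += len(char)
        return char


class Reader:
    def skip_whitespace(self, stream):
        while stream.peek().isspace():
            stream.read_char()

    def read_integer(self, stream):
        digits = ""
        while stream.peek().isdigit():
            digits += stream.read_char()
        return int(digits)

    def read_symbol(self, stream):
        # A symbol runs until whitespace, a parenthesis, or end of input.
        name = ""
        while (stream.peek() and not stream.peek().isspace()
               and stream.peek() not in "()"):
            name += stream.read_char()
        return name

    def read_open_parenthesis(self, stream):
        # Read forms until the matching ")" is reached.
        forms = []
        while True:
            self.skip_whitespace(stream)
            if stream.peek() == "":
                raise SyntaxError("unexpected end of input inside a list")
            if stream.peek() == ")":
                stream.read_char()
                return forms
            forms.append(self.read_form(stream))

    def read_form(self, stream):
        self.skip_whitespace(stream)
        char = stream.peek()
        if char == "":
            raise SyntaxError("unexpected end of input")
        if char == ")":
            raise SyntaxError("unexpected ')'")
        if char.isdigit():
            return self.read_integer(stream)
        if char == "(":
            stream.read_char()  # consume the opening parenthesis
            return self.read_open_parenthesis(stream)
        return self.read_symbol(stream)
```

With these assumptions, Reader().read_form(Stream("(+ 1 (* 2 3))")) returns ['+', 1, ['*', 2, 3]]. Turning that nested list into the KList structure from the question would be a separate, purely recursive step.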

In fact, the Common Lisp reader does this, but in a programmable way. There is a readtable that associates characters with reader functions, and you can change this readtable at runtime. For example, all digits in this table are mapped to some read-integer function, but you could hack that to use a different reader.
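
The readtable idea can be imitated in Python with a dictionary from characters to reader functions. The following is only a rough sketch of the concept, not how Common Lisp implements it: the names READTABLE, read_list, and read_quote are made up for this example, and each reader here takes the text and a position and returns the parsed form together with the next position.

```python
def skip_whitespace(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def read_integer(text, pos):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return int(text[pos:end]), end


def read_symbol(text, pos):
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "()":
        end += 1
    return text[pos:end], end


def read_list(text, pos):
    forms = []
    pos += 1  # skip the opening parenthesis
    while True:
        pos = skip_whitespace(text, pos)
        if pos >= len(text):
            raise SyntaxError("unexpected end of input inside a list")
        if text[pos] == ")":
            return forms, pos + 1
        form, pos = read_form(text, pos)
        forms.append(form)


# Hypothetical readtable: character -> reader function.
READTABLE = {**{digit: read_integer for digit in "0123456789"}, "(": read_list}


def read_form(text, pos=0):
    pos = skip_whitespace(text, pos)
    if pos >= len(text):
        raise SyntaxError("unexpected end of input")
    if text[pos] == ")":
        raise SyntaxError("unexpected ')'")
    # Look the character up at call time, so later changes to READTABLE apply.
    reader = READTABLE.get(text[pos], read_symbol)
    return reader(text, pos)


def read_quote(text, pos):
    # Reader for the quote character: wrap the next form in a quote list.
    form, pos = read_form(text, pos + 1)
    return ["quote", form], pos


# Changing the table at runtime changes how the quote character is read.
READTABLE["'"] = read_quote
```

Because read_form consults the table on every call, installing read_quote is enough for read_form("(+ 1 'a)") to produce ['+', 1, ['quote', 'a']] as its form, without touching the dispatcher itself.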

Note also that you are expected to intern symbols in Lisp, meaning that you create a symbol the first time you parse it, then use the same symbol object the next time you parse the same symbol, so that symbols are identical.

(eq 'a 'a)
=> T

Which is not necessarily the case for strings:

(eq "a" "a")
=> NIL

(It could be T with compilers that coalesce duplicate string literals.)
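
In a Python implementation, interning can be sketched with a table from names to symbol objects. The Symbol class, the _symbol_table dictionary, and the intern_symbol function below are hypothetical names chosen for this illustration.

```python
class Symbol:
    """A symbol identified by object identity, not by its name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Symbol({self.name!r})"


# Hypothetical global symbol table: name -> the unique Symbol for that name.
_symbol_table = {}


def intern_symbol(name):
    # Create the symbol the first time the name is seen, then reuse it.
    symbol = _symbol_table.get(name)
    if symbol is None:
        symbol = Symbol(name)
        _symbol_table[name] = symbol
    return symbol
```

With this in place, intern_symbol("a") is intern_symbol("a") holds, which mirrors (eq 'a 'a), whereas two separate calls to Symbol("a") give distinct objects. A symbol reader would call intern_symbol instead of constructing Symbol directly.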

From a certain point of view this is a very simple approach, but the Common Lisp reader is a little more complex than that: there are a lot of subtleties that you can ignore at first. For a more complete explanation, see section 2.2, "Reader Algorithm", of the Common Lisp HyperSpec.

Answered By: coredump