Lexing Lisp in Python
Question:
I have become interested in the Lisp language, and I decided to create my own dialect. It is going to be the simplest one ever to exist.
As you know, everything in Lisp is a list (or at least in this dialect). A list consists of a command at its start, optionally followed by arguments, which are lists themselves. Using this information, I created the following.
class KList:
    def __init__(self, command, args=None):
        self.command = command
        self.args = args
So using this structure, (+ 1 2)
should turn into KList('+', [KList('1'), KList('2')])
and I need help converting it. I need a lexer, and that is where I am stuck. How can I convert it? There are two things that are important to me.
- I just kind of hate downloading a quadrillion packages for a simple project. So a solution without a lexing library.
- Lisp is a functional language, and it might seem weird, but I use Python for functional programming, so please avoid statements and mutating variables.
Answers:
A simple Lisp parser can be easy to implement: you can basically write a recursive-descent, top-down parser.
You have individual readers, like readers for integers, strings, symbols, whatever you want:
class Reader:
    def read_integer(self, stream):
        pass
    def read_string(self, stream):
        pass
    def read_symbol(self, stream):
        pass
    def read_whitespace(self, stream):
        pass
In particular, a reader for compound forms:
    def read_open_parenthesis(self, stream):
        # read as many forms as possible until
        # you reach ")"
        pass
And you add a main reader that peeks at the stream, checks which character is being read, and calls whatever function it needs:
    def read_form(self, stream):
        byte = stream.peek()
        if byte.isdigit():
            return self.read_integer(stream)
        if byte == '(':
            _ = stream.read_char()  # consume the opening parenthesis
            return self.read_open_parenthesis(stream)
        # etc.
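To make the outline above concrete, here is a minimal sketch of such a recursive-descent reader. It is not the Reader class itself: it assumes an index-passing style in which every function takes the source text and a position and returns a (value, next position) pair, so nothing is mutated. The function names, and the choice to produce plain Python lists, ints, and strings rather than KList objects, are assumptions made for illustration.

```python
def skip_whitespace(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    if pos < len(text) and text[pos].isspace():
        return skip_whitespace(text, pos + 1)
    return pos


def atom_end(text, pos):
    """Return the index just past the atom that starts at pos."""
    if pos < len(text) and not text[pos].isspace() and text[pos] not in "()":
        return atom_end(text, pos + 1)
    return pos


def read_atom(text, pos):
    """Read an integer or a symbol; symbols are kept as plain strings here."""
    end = atom_end(text, pos)
    token = text[pos:end]
    return (int(token) if token.isdigit() else token), end


def read_list(text, pos):
    """Read forms until the matching ')'; pos is just past the '('."""
    start = skip_whitespace(text, pos)
    if start >= len(text):
        raise SyntaxError("unexpected end of input: missing ')'")
    if text[start] == ")":
        return [], start + 1
    form, after_form = read_form(text, start)
    rest, after_rest = read_list(text, after_form)
    return [form] + rest, after_rest


def read_form(text, pos=0):
    """Main reader: look at the next character and dispatch on it."""
    start = skip_whitespace(text, pos)
    if start >= len(text):
        raise SyntaxError("unexpected end of input")
    if text[start] == "(":
        return read_list(text, start + 1)
    if text[start] == ")":
        raise SyntaxError("unexpected ')'")
    return read_atom(text, start)
```

With these definitions, read_form("(+ 1 2)") returns (['+', 1, 2], 7). Turning the nested lists into KList objects would be a separate mapping step. Because the sketch recurses once per character, very long inputs can hit Python's recursion limit.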
In fact, the Common Lisp reader does this, but in a programmable way. There is a readtable that associates characters with reader functions, and you can change this readtable at runtime. For example, all digits in this table are mapped to some read-integer function, but you could hack that to use a different reader.
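The readtable idea can be imitated in Python with a dictionary from characters to reader functions. The following is only a rough sketch of that idea, not how Common Lisp implements it; all the names in it (read_while, read_token, DEFAULT_READTABLE, and so on) are made up for illustration.

```python
from itertools import takewhile


def read_while(predicate, text, pos):
    """Return the run of characters from pos that satisfy predicate, and the next index."""
    run = "".join(takewhile(predicate, text[pos:]))
    return run, pos + len(run)


def read_integer(text, pos):
    """Default reader for digits: convert the run of digits to an int."""
    digits, end = read_while(str.isdigit, text, pos)
    return int(digits), end


def read_digits_as_text(text, pos):
    """Alternative reader for digits: keep them as a string."""
    return read_while(str.isdigit, text, pos)


def read_symbol(text, pos):
    """Fallback reader: everything up to whitespace or a parenthesis."""
    return read_while(lambda c: not c.isspace() and c not in "()", text, pos)


# The "readtable": each digit character is associated with a reader function.
DEFAULT_READTABLE = {digit: read_integer for digit in "0123456789"}

# A modified copy in which digits are handled by a different reader.
CUSTOM_READTABLE = {
    **DEFAULT_READTABLE,
    **{digit: read_digits_as_text for digit in "0123456789"},
}


def read_token(readtable, text, pos=0):
    """Look up the reader for the next character and call it."""
    reader = readtable.get(text[pos], read_symbol)
    return reader(text, pos)
```

Here read_token(DEFAULT_READTABLE, "42 foo") gives (42, 2), while the same call with CUSTOM_READTABLE gives ('42', 2). Building a new dictionary instead of updating the old one keeps the sketch free of mutation.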
Note also that you are expected to intern symbols in Lisp, meaning that you create a symbol the first time you parse it, then use the same symbol object the next time you parse the same symbol, so that symbols are identical.
(eq 'a 'a)
=> T
Which is not necessarily the case for strings:
(eq "a" "a")
=> NIL
(It could be T with a compiler that coalesces duplicate string literals.)
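In Python, one way to get the same effect is to memoize symbol creation, so that asking for the same name twice returns the same object. This is a minimal sketch under that assumption; the Symbol class and intern_symbol function are hypothetical names, and functools.lru_cache stands in for the symbol table.

```python
from functools import lru_cache


class Symbol:
    """A named object that is compared by identity, like a Lisp symbol."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


@lru_cache(maxsize=None)
def intern_symbol(name):
    """Return the unique Symbol for name, creating it only on first use."""
    return Symbol(name)


first = intern_symbol("a")
second = intern_symbol("a")
uninterned_one = Symbol("a")
uninterned_two = Symbol("a")
```

After this runs, first is second holds, which mirrors (eq 'a 'a), whereas uninterned_one is uninterned_two does not, which mirrors the string case.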
From a certain point of view this is a very simple approach, but of course the Common Lisp approach is a little more complex than that: there are a lot of subtleties that you can ignore at first. For a more complete explanation, see section 2.2, Reader Algorithm, of the Common Lisp HyperSpec.