Lessons learned making a Lisp in C: Part 1

This note is part of a series.

I've decided to do this series both to document my journey and to motivate myself to actually finish this project. Plus, I've been running into interesting things I just wanna write about.

Currently, I've implemented a solid chunk of part 2. It's not complete yet, and I've skipped a lot of deferrables so far, since I've only been focusing on the main path.

I'd like to stop and take a breather here to retrofit error handling into the repl. Right now there is no error handling or reporting at all; the repl just crashes. I can see how useful proper error reporting will be later on, which is why I'm eager to take it on.

Enough exposition. Now, about what I've been learning.

Memory management via Arenas

I've written about discovering arenas in this note. TL;DR: learning about data-oriented design and memory allocation schemes such as arenas has drastically changed the way I think about programs. Reading about all that was fun, but I hadn't experienced the glory in practice until I took up this project.

Arenas are game changers. Before I learned about them, I tried and failed to make a Lisp in C, for exactly the kinds of reasons people who write about arenas like to use as examples. For instance, my tokenizer worked properly (I verified this by printing the token stream), but my program segfaulted in the parser. When it didn't crash, the strings came out garbled anyway. For the life of me, I couldn't figure out where these bugs were coming from, and it really sucked.

With arenas, the lifetime of each value is explicit, and it's very easy to reason about when a value is freed, because there is only one call to free to think about.

After I read about arenas but before I implemented them, my biggest worry was that constructing an arena, passing it around, and writing a nice interface for it was gonna be cumbersome, tedious work. Instead, I've found it surprisingly manageable, and considering the benefits arenas bring, I might even say they're underpriced.

A custom string type

I've found C strings slightly annoying to work with. Relying on a null terminator feels scary, and calling strlen every time I want to iterate over a string's characters doesn't feel right. I've read here and there that people often define their own string type with a pointer to the first char and a length field, so I went and made my own too.

As with arenas, I expected this interface to be cumbersome to work with, but it has made my life much easier. Along the way, it also fixed a memory bug in my repl: each subsequent identifier was written at previous_str.length instead of one byte past it, which overwrote the null terminator of the previous string. Granted, that was a classic off-by-one error on my part, but with a custom string type I don't have to deal with such C-isms at all. And once I'd written wrappers for strncpy and friends, plus variants that use arenas, the abstraction paid for itself.

All in all

I've been having more fun writing C than I expected. Thinking in terms of memory takes some getting used to, but that's what I started all this for anyway: to get out of my comfort zone.

I'm eager to continue working on the bells and whistles of my Lisp, but I know I'll have to write proper error handling first. This is an area where C is known to be lacking by modern standards, but so far my fears about C haven't materialized, which is a good sign.