I do most of my programming in higher-level languages. For recreational projects and day-to-day scripting needs I use a variety of Lisps and shell scripts, and for weekend projects that fit the bill, the choice is invariably Rust. For the more serious requirements, such as jobs, I use whatever is mandated, or whatever I am most comfortable writing that kind of stuff in, which is usually Rust.
Lately, however, with a little more time on my hands and with motivation to further my knowledge of how computers work, I decided to take up low-level programming. Specifically, I wanted to do more C and, at some point, assembly, but that's a topic for another time.
I took a C++ course in my first year. We wrote a couple of console programs that printed stars. That is the extent of my formal training in low-level languages, and it is also what I wanted to improve about myself as a programmer.
I started this journey because I yearned to be bit by memory bugs and UB. That sounds like the type of experience one needs to ascend in their programming career. I understand, theoretically, how memory works, and I know about malloc and friends, and syscalls and all that jazz. But the best learning happens when you're in the thick of it. You have to do the rigorous work. Sitting around and reading about something and convincing yourself you know all there is to know about said thing doesn't work in practical fields such as programming.
So anyway, I've been doing C for the past couple of months or so. I already had years of experience programming in a bunch of languages, so I wasn't really learning much about the syntax and semantics of the C language itself but rather the nitty-gritty of how we interface with computers and their operating systems. Most of my learning happened in the area of how memory works. Like I mentioned, I had some knowledge about how it worked, and had preconceived notions about how to use malloc and free and such. There were even some things about the C language that I thought of as completely backwards and that had put me off learning it, but having now learned it, I have been convinced otherwise. I would like to reflect on some of the things that have happened on my journey.
Firstly, I would like to reflect on how my intuition about memory changed. With the languages I had been working with so far, memory was either not managed by the programmer (in GC languages) or was managed under a different paradigm (Rust). In C, memory is presented front and center with all its warts and wrinkles, and with no guardrails or instructions on how to hold it besides "Here's malloc. Here's free. Godspeed." Given this power, my initial naive intuitions about writing programs were as follows:
- Program as you would in any other language.
- When writing a function that returns some data structure that is dynamically constructed and of unbounded size, malloc it and return a pointer to it.
- Remember to free your mallocs at some point.
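As a rough sketch of what that pattern tends to look like in practice (the function names collect_squares and sum_of_squares are invented for illustration, not taken from any real codebase):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example: build a result of caller-chosen size on the heap
   and hand ownership of it to the caller. */
int *collect_squares(size_t n) {
    assert(n <= 1000); /* keeps i * i (and their sum) within int range in this toy */
    if (n == 0) return NULL;
    int *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i++) out[i] = (int)(i * i);
    return out; /* the caller must remember to free() this */
}

/* Every call site now carries its own malloc/free pairing. */
int sum_of_squares(size_t n) {
    int *squares = collect_squares(n);
    if (squares == NULL) return 0;
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += squares[i];
    free(squares);
    return sum;
}
```

One function like this is harmless. The trouble starts when dozens of them each hand out a pointer with its own lifetime, and each caller has to track which free belongs to which malloc.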
Of course, anybody who has programmed in C for a while will be able to foresee the amount of spaghetti code this produced. But I was convinced, and for a long time I didn't think to question my intuitions, assuming that this is just the way things are done in C.
After getting bit for the umpteenth time by segfaults, and after having to deal with codebases whose malloc-free correspondences were too complex to comprehend, making the code too scary to even poke with a stick, I had no choice but to look for better ways to deal with memory. Now, up to this point, I was dreading the fact that C doesn't have defer semantics or RAII, or a standard library for dynamic arrays and maps, or foreach constructs and other creature comforts. That dread went away after some study of how programs can be structured better in C.
My first lesson, which came in the form of a video by Casey Muratori, was to stop thinking about data and memory, and their management, in terms of individual elements. This was an aha moment I was embarrassed to have after so much time. It's a simple principle: don't scatter mallocs and frees around. It makes things hard. Allocate up front as much as possible. Think about memory in chunks, lifetimes, and groups. As simple a concept as this is, it took me a while to incorporate it naturally into my coding. I still struggle to come up with such memory schemes when I'm programming, and I often have to resist the urge to just malloc something and deal with the free later, but it should become easier and clearer with time, I suppose. And of course, sometimes it may not be possible to write code in such a manner, and the old ad hoc individual malloc-free may just be the only way to go. It's all about the details.
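A minimal sketch of the "one chunk, one lifetime" idea, under my own assumptions: the Words type and the split_words and free_words functions are hypothetical names made up for this example. Instead of one malloc per word, the pointer table and the text share a single allocation that is sized up front and released with a single free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: all words of one line share a single allocation
   and therefore a single lifetime. */
typedef struct {
    char  **items;  /* pointer table, stored at the start of block */
    size_t  count;
    void   *block;  /* the one allocation backing everything */
} Words;

Words split_words(const char *line) {
    assert(line != NULL);
    Words w = {0};
    size_t len = strlen(line);
    /* Worst case is one-letter words separated by single spaces,
       so len / 2 + 1 pointers are always enough. */
    size_t ptr_bytes = (len / 2 + 1) * sizeof(char *);
    w.block = malloc(ptr_bytes + len + 1);
    if (w.block == NULL) return w;
    w.items = w.block;
    char *text = (char *)w.block + ptr_bytes;
    memcpy(text, line, len + 1);

    char *p = text;
    while (*p != '\0') {
        while (*p == ' ') p++;                 /* skip separators */
        if (*p == '\0') break;
        w.items[w.count++] = p;                /* a word starts here */
        while (*p != '\0' && *p != ' ') p++;
        if (*p == ' ') *p++ = '\0';            /* terminate the word in place */
    }
    return w;
}

/* One free releases the table and every word at once. */
void free_words(Words *w) {
    free(w->block);
    *w = (Words){0};
}
```

The design choice here is that nothing inside the group can outlive the group, so there is exactly one malloc-free pair to reason about no matter how many words there are.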
Of course I had more revelations throughout my journey, but I will mention the more significant ones before I move on to talk about how my thinking has evolved when it comes to programming, and my opinions on C:
- Arena allocation
  - Kind of similar to the first point, but with more concrete examples from this article by Ryan Fleury.
- Data oriented design
  - The infamous Data Oriented Design in C++ video by Mike Acton. This instilled within me guiding principles for writing performant programs that are simpler to reason about.
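For readers who have not met arenas before, here is a minimal bump-allocator sketch. It is not the design from the article mentioned above; the Arena type and the arena_* function names are assumptions made for illustration, and a real arena would usually add growth, scratch scopes, and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity arena: one big block, handed out in order. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing allocation failed. */
int arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bumps the offset forward; align must be a power of two no larger than
   the alignment malloc itself guarantees, since offsets are relative to base. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* "Frees" everything allocated so far without touching the OS. */
void arena_reset(Arena *a) { a->used = 0; }

/* Releases the backing block itself. */
void arena_destroy(Arena *a) {
    free(a->base);
    *a = (Arena){0};
}
```

Individual allocations are never freed. Everything with the same lifetime goes into the same arena and is dropped together with arena_reset or arena_destroy.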
How have I come to see programs differently now that I've gotten my hands dirty with C?
To begin with, my understanding of variables has been challenged a little. Having dealt with my fair share of memory bugs in C, where my enum of 5 variants had the value 214408237 in it, or where valgrind told me that my program had no errors or leaks but my strings were always truncated at 4 characters, I now see structs not as some abstract high-level containers of data but rather as schemas for the underlying block of memory with size = sizeof(MyStruct).
The type system of C made much more sense to me once this idea was cemented in my head, and I no longer find it super annoying when functions that operate on arrays require the length of the array as a separate argument alongside the pointer.
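A small sketch of that "struct as a schema over bytes" view, with made-up types (Color, Item) and function names chosen only for this example. It reads a field through its byte offset instead of its name, and shows why an array function needs the element count passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { RED, GREEN, BLUE } Color;   /* hypothetical types */

typedef struct {
    Color color;
    int   count;
} Item;

/* Reads the `count` field by byte offset rather than by name: the struct
   definition only tells the compiler where in the block each field lives. */
int count_via_offset(const Item *item) {
    assert(offsetof(Item, count) + sizeof(int) <= sizeof(Item));
    const unsigned char *bytes = (const unsigned char *)item;
    int count;
    memcpy(&count, bytes + offsetof(Item, count), sizeof count);
    return count;
}

/* An array argument arrives as a plain pointer. sizeof(values) here would
   be the size of the pointer, so the length has to travel alongside it. */
long sum(const int *values, size_t len) {
    long total = 0;
    for (size_t i = 0; i < len; i++) total += values[i];
    return total;
}
```

Seen this way, an enum holding a nonsense value is not mysterious: those bytes were simply never written, or were overwritten by something that had a different idea of the layout.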
Additionally, having broken my preconceived notions of malloc usage, I see the power of the abstractions that arise. Arenas were a game changer for me, and so were pool allocators and things like free-lists. I no longer view memory allocation as a cost to both runtime and code maintainability, but rather as just another one of the things one does to program a computer. Oh, and realizing that an allocated block of memory can be reused was mind-blowing for me, and another embarrassing oversight to have to come to terms with.
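To illustrate the free-list idea and the point about reuse, here is a minimal fixed-size pool sketch. The Pool and Slot types, the POOL_SLOTS constant, and the pool_* functions are all hypothetical names invented for this example, and the payload type is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4

/* A slot is either on the free list or handed out, never both,
   so the link pointer can live inside the unused payload bytes. */
typedef union Slot {
    union Slot *next_free;   /* meaningful while the slot is unused */
    double      payload[2];  /* placeholder for whatever the pool stores */
} Slot;

typedef struct {
    Slot  slots[POOL_SLOTS];
    Slot *free_list;
} Pool;

/* Threads every slot onto the free list. */
void pool_init(Pool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pops a slot off the free list, or returns NULL when the pool is empty. */
void *pool_alloc(Pool *p) {
    Slot *s = p->free_list;
    if (s == NULL) return NULL;
    p->free_list = s->next_free;
    return s;
}

/* Pushes the slot back; the very next pool_alloc will hand it out again. */
void pool_free(Pool *p, void *ptr) {
    assert(ptr != NULL);
    Slot *s = ptr;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

No call to malloc or free appears anywhere: the memory is obtained once (here, wherever the Pool itself lives) and the same slots are recycled for the life of the program.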
I no longer find C to be an unwieldy, scary old language. In some ways, it is a simple language. I am not brave enough to use it for serious large-scale projects or at a job just yet, since I don't see myself fit for it, but it has been enjoyable to program in. I have had my fair share of hours spent debugging memory bugs even in the most trivial programs, but hey, those are learning moments, and no progress could've been made without having gone through them. I expect more to come too.
There's a lot to enjoy in the freedom that C programming allows for. Though the lack of a comprehensive standard library can be pretty annoying sometimes, I can see why that is the case. I wouldn't know where one would start in building a dynamic array implementation that satisfies the average C programmer's needs with the degree of freedom that is required of such a data structure in such a language. And I also understand why the defer keyword doesn't really exist yet. I suppose it enables an RAII mode of programming. In any case, I don't think it's as bad as a lot of people make it out to be.
There is a lot of learning ahead of me, and I can see it will be both a rewarding and painful journey. The number of things I have yet to learn brings up within me feelings of curiosity and the excitement to program that I was fearing I was losing. I am having a good time.