Making another programming language
This is my third or fourth serious attempt at making a programming language. The reason they keep failing is that I’m making a language because I actually want to use it, not just to learn about compilers. So I came to the conclusion that I just need to make a very quick, very crappy prototype just so I can try it out as quickly as possible.
The idea was:
- Write a prototype in a GC language so I don’t need to worry about memory yet;
- only implement things as I need them;
- don’t worry about performance yet, focus on code simplicity.
I ended up implementing a tree-walk interpreter in Go[a], following the first half of Crafting Interpreters[b]. It’s slow, but fast enough. It has an auto-generated FFI. It can dump variables in the stack trace. The language is complete enough to write a text editor[c] in. I consider the prototype idea a success.
[a] Here’s the source for your perusal.
[b] Crafting Interpreters by Bob Nystrom.
[c] A mostly direct port of another text editor I had written in C.
Let’s go on a quick tour on what’s currently implemented.
Basic syntax
Syntax is bikeshedding yadda yadda you know the story. I mostly just made it be unambiguous, and expression-oriented because that’s useful sometimes. Any syntax is subject to change.
Let me give you an example just to give a basic idea:
import "os"
import "str"
/* functions are private by default */
fib(i int) int {
if i < 2 {
/* return for early returns... */
return 1
} else {
/* ...but the language is expression based */
fib(n-1) + fib(n-2)
}
}
/* pub for public declarations */
pub main() {
/* for-each loop */
for arg in os:args {
/* modules are accessed with a colon */
os:println(arg)
}
/* semicolons are inferred */
os:println("Hello, world!")
/* types are inferred on assignments; like auto in C++ */
let argsdesc = if os:args.size > 1 {
/* array/string length is accessed with .size */
str:from_int(os:args.size)
} else {
"no arguments"
}
os:println(argsdesc)
}
I’m particularly proud of the semicolon inference algorithm: if the last token of the previous line can’t end a statement, or the first token of the current line can’t start one, insert a semicolon between them.
That’s it. It’s very simple, can be done in the lexer, and—unlike Go, which has a similar rule—allows me to break operators at the start of the next line:
let an_unreasonably_long_variable_name_for_the_sake_of_example =
another_unreasonably_long_variable_name_for_sake_of_example
+ a_third_unreasonably_long_variable_name_to_complete_the_example
The type system
The type system is a procedural/functional hybrid. Variables are immutable by default (but more on that later) and values are built from basic types and combinations of them.
I currently have arrays, records and tagged unions implemented:
array []int = []int(1, 2, 3)
record RecordType(a, b, c int)
recordvalue = RecordType(1, 2, 3)
/* or */
recordvalue2 = RecordType(.a = 1, .b = 2, .c = 3)
record UnionType is
| Left(i int)
| Right(s string)
unionvalue UnionType = Left(1)
Defining a union with the “record” keyword is a bit weird and is something I’m considering changing. Also, there’s no way to make a type alias right now; you can only name records and unions.
Each member of a union is a separate type, all of which are subtypes of the union type. This means type switches can be type-safe, even without pattern-matching:
/* using UnionType from above */
let var UnionType = ...
switch var is
| Left l {
/* l is a Left. No flow-typing to see here */
os:println(str:from_int(l.i))
}
| Right r {
/* r is a Right, not a UnionType */
os:println(r.s)
}
This feature is missing from most functional language ADT implementations (for good reasons involving type inference). I took it from OO languages instead. But, because the type hierarchy is closed (like in functional languages), I can do exhaustiveness checking.
The type switch syntax is also the same as the union declarations, which is neat.
Mutable Value Semantics
This is an experimental thing, but seems to be working well so far. Tide doesn’t have first-class references or reference types. Everything is a value type. It’s probably easiest to explain with an example:
foo(arr []int) []int {
/* implicit copy here */
let mut otherarr = arr
mut otherarr[0] = 42
otherarr
}
pub main() {
let arr1 = []int(1, 2, 3)
os:println(str:join_with(", ", ...arr1))
// => a, b, c
let arr2 = foo(arr1)
os:println(str:join_with(", ", ...arr2))
// => 42, b, c
/* original array is unchanged */
os:println(str:join_with(", ", ...arr1))
// => a, b, c
}
This system gives you the guarantee that no “immutable” value can ever change, for any reason. You can’t have a “const” alias to a mutable variable. It gives you many of the same referential transparency properties that pure functional programming can give, but you can still mutate stuff when you need to.
Also, in most cases you don’t even get an implicit copy. Here’s a more representative example:
record State(a, b, c, d int)
/* foo does not change the state */
foo(s State) {
os:println(s.a)
}
/* bar changes the state */
bar(mut s Test) {
mut s.a = 42
}
pub main() {
/* s is mutable */
let mut s = State(1, 2, 3, 4)
/* s is passed as immutable; could be by copy or by reference,
foo won't change s either way
so the compiler can pick whichever is more efficient */
foo(s)
/* s is passed as mutable; must be by reference */
bar(mut s)
/* s is passed as immutable; same as above */
foo(s)
}
A mutable argument is similar to a var argument in Pascal, or an out parameter in C#, or an inout parameter in Ada, or a mutable reference &mut in Rust. It allows you to pass-by-reference without having first-class references.
In practice, most of the times you would want to pass-by-reference for performance, the compiler can do it for you. And because of the no-const-aliasing rule, a lot of that optimization becomes significantly easier to implement in the compiler (the current interpreter doesn’t actually do any optimization, it just implements the semantics).
Another nice consequence is that, in the absence of aliasing, it’s impossible to construct a cycle. This means we don’t need a tracing GC, and reference counts are very easy to optimize away. Actually, even region inference would be reasonably simple to implement. But memory management is for later, when I do the compiler; the current interpreter relies on the Go GC.
So my question now is: how much will I miss references? That requires writing more real software in this language first. For now, I’m convinced it’s a good default, but I might need to add an escape hatch eventually. Even then, all these guarantees would still apply for most types, only not for types that contained references, and that would hopefully not be most of them.
FFI
I wrote a generator to do all the boilerplate necessary to communicate between Go and Tide automatically. It’s my best decision so far, even though much of this code will be thrown away when I write the real compiler, simply because of how much work it has already saved me.
So I can write code like this:
func Str_append(src *[]byte, args ...string) {
for _, arg := range args {
*src = append(*src, arg...)
}
}
And have it automatically generate all the boilerplate to make it available as:
str:append(mut src string, args ...string)
All I have to do is put it in the std package and run go generate on it.
Debugging
A lot of hobby and research languages pay no attention to debug tooling. You are expected to use printf like a caveman or, worse, debug the interpreter itself.
Tide has stack traces with variables, which is often good enough for debugging crashes. It also allows overriding the interpreter panic function:
/* this getter/setter API is not final */
let fatal = debug:get_on_error()
debug:on_error(fn(stack debug:Stack, err debug:Dynamic) {
term:close(mut s)
fatal(stack, err)
})
Which allows it to clean up the screen before printing the stack trace so I can actually read it.
The introspection system is functional enough that you can do stack traces in Tide itself:
let stack = debug:stack()
while true {
switch s is
| debug:StackEnd { break }
| debug:StackFrame f {
mut s = f.next
os:println(f.filename, ":",
str:from_int(f.line), ":",
str:from_int(f.col), ": at ", f.name)
let mut scope = f.scope
while true {
switch scope is
| debug:ScopeEnd { break }
| debug:Var v {
mut scope = v.next
os:println("\t", v.filename, ":",
str:from_int(v.line), ":",
str:from_int(v.col), ": ",
v.name, " = ", debug:str_from_dynamic(v.val))
}
}
}
}
The above code sample also shows how badly this language needs iterators. Linked lists are nasty right now. Each loop iteration also currently causes a full deep copy of the list, which is very bad.
You may also have noticed debug:str_from_dynamic(). Right now, there’s no real introspection system, so the only thing you can do to inspect a variable is to convert it into a string. This also needs to be fixed eventually, though it will take more work.
Other assorted features
Features are implemented on an “as-needed” basis, so I ended up with a bunch of missing features. Even then there are some niceties:
- For-each loops, currently iterating arrays only;
- varargs, implemented as sugar on top of arrays;
- designated initializers, as in C;
- anonymous functions.
Plans?
So, the primary goal with this language is to make something I’d enjoy using. This also includes being a language I’d actually be able to use. Therefore:
- It must (eventually) run on any platform;
- it must be easy to interface with other languages;
- it must have a good toolchain and debugger;
- it must have a good standard library.
The plan is currently to finish porting my text editor to Tide, and polish the core language based on any roadblocks I find there. Then, write a formatter, LSP server, debugger and other tooling already in Tide itself, which should help polish the standard library. Finally, implement a Tide compiler in Tide, and take that opportunity to sand any rough corners.
The final language would have:
- A well-designed standard library (unlike what I have now);
- generic modules, like Oberon+ or Ada;
- some form of function overloading, likely type classes (did you see the str:from_int stuff above? horrible);
- an automatic formatter;
- a real compiler that compiles to native code with my own backend;
- a C backend that can generate header libraries you can use from C;
- a Common Lisp-style in-process debugger, using the debug API;
- Common Lisp- or Clojure-style live recompilation, using the debug API.
Will I manage? I think I can, if I keep at it for long enough. I’ve only worked on the language for about a month total, and can already use it. Once I have real programs (in particular the editor), making them faster can serve as motivation for rewriting the compiler. Or at least that’s the plan.
We’ll see how that goes.