Using Rust's 'Result' Approach For Neopolitan's Parser
It Came To Me In A Dream
Like, literally.
I was drifting in and out of a nap after a few hours working on Neopolitan's parser. The AST was floating around my head. The way I'm using an ok/error container to wrap results kept bubbling to the surface.
The structure looks like this when the parser completes a journey through a valid file:
If something goes wrong, the top level ok turns into error.
This is straight out of Rust's Result playbook. It forces you to check for errors before you can use the data. I love how explicit it is.
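Here's a minimal sketch of that idea in Rust. The `Ast` and `ParseError` types and the `parse` and `describe` functions are made up for illustration. They aren't Neopolitan's real code. The point is only that the caller can't touch the data without dealing with the error case first.

```rust
// Hypothetical types for illustration; not Neopolitan's real AST.
#[derive(Debug, PartialEq)]
struct Ast {
    lines: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct ParseError {
    message: String,
}

// Stand-in parser: succeeds unless the input is empty.
fn parse(source: &str) -> Result<Ast, ParseError> {
    if source.trim().is_empty() {
        return Err(ParseError {
            message: "nothing to parse".to_string(),
        });
    }
    let lines = source.lines().map(|line| line.to_string()).collect();
    Ok(Ast { lines })
}

// The caller has to handle both arms before it can use the data.
fn describe(source: &str) -> String {
    match parse(source) {
        Ok(ast) => format!("ok: {} line(s)", ast.lines.len()),
        Err(error) => format!("error: {}", error.message),
    }
}
```

Calling `describe` on a two-line file takes the ok arm. Calling it on an empty string takes the error arm.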
Catching Round One
My parser sends errors when it hits files it can't parse. For example, two dashes on their own line are invalid.
-- title
This text is fine. It's part of a valid
``-- title`` block. But, these next two
dashes sitting on their own line aren't
allowed.
--
The parser chokes on those. It sends an
error message starting with them as the
``failed_at`` content.
Those dashes are invalid because they don't have any text behind them that identifies what type of block they're starting. The parser ejects as soon as it hits the issue and throws an error back to the AST.
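A rough sketch of that check, using only the standard library instead of the real parser. `BlockStartError` and `check_block_starts` are hypothetical names. The `failed_at` field borrows its name from the error output described above, and I'm assuming it holds the rest of the input starting at the bad line.

```rust
// Hypothetical error type; `failed_at` mirrors the field name in the post.
#[derive(Debug, PartialEq)]
struct BlockStartError<'a> {
    line: usize,        // 1-based line number where parsing stopped
    failed_at: &'a str, // remaining input, starting at the bad line
}

// Returns the block kinds found, or an error at the first bare `--` line.
fn check_block_starts(source: &str) -> Result<Vec<&str>, BlockStartError<'_>> {
    let mut kinds = Vec::new();
    let mut offset = 0; // byte offset of the current line within `source`
    for (index, line) in source.split_inclusive('\n').enumerate() {
        let trimmed = line.trim();
        if trimmed == "--" {
            // Eject right away and report what's left of the input.
            return Err(BlockStartError {
                line: index + 1,
                failed_at: &source[offset..],
            });
        }
        if let Some(kind) = trimmed.strip_prefix("-- ") {
            kinds.push(kind);
        }
        offset += line.len();
    }
    Ok(kinds)
}
```

The function stops at the first bare `--` and never looks at anything after it, which matches the eject-on-first-issue behavior.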
That's great. But, it's not everything.
On The Run
A basic Neopolitan block of content looks like this:
-- title
What's A Parser To Do
You ever dream about error messages?
Yeah, me too...
It starts with two dashes followed by the kind of block the content contains (-- title in this case).
Blocks can also nest inside each other. The process uses opening and closing lines with / characters.
-- title/
-- /title
Other blocks can slot between them.
-- title/
-- div
Splitting title content into
multiple divs feels weird.
-- div
But, it's totally valid. And,
it keeps the examples consistent.
-- /title
This opens the possibility of a new type of error: Opening a block but never closing it.
-- title/
-- div
It doesn't matter what goes
here because there's no
closing ``-- /title``.
The parsing will fail.
Failure To Track
The parser eats the rest of the file looking for the closing -- /title line. It pukes when it hits the end of the file without finding one.
Unfortunately, the error information that's available when this happens doesn't include where things went off the rails.
That means I can't provide a message that points you to what needs to be fixed. You're left to poke around the file on your own trying to figure out what happened.
It sucks.
Enhanced Vision
That brings us back to the dream.
The ok/error from the top of the AST floated down the tree, bounced around a bit, then melted into the blocks.
Cue the realization that I can use the technique from the top of the file for the blocks too. It would let me identify where errors happen beyond what the parser does on its own. The AST transforming into:
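As a rough sketch of that shape, written as Rust types. `Block`, `BlockError`, `NestedAst`, and `summarize` are all hypothetical names, and the real AST may look different. The only idea it's meant to show is an ok/error at the top with another ok/error around each block.

```rust
// Hypothetical shapes; the real Neopolitan AST may differ.
#[derive(Debug, PartialEq)]
struct Block {
    kind: String,
    text: String,
}

#[derive(Debug, PartialEq)]
struct BlockError {
    kind: String,
    message: String,
}

// ok/error at the top of the file, plus ok/error around each block.
type NestedAst = Result<Vec<Result<Block, BlockError>>, String>;

// Counts good and bad blocks, or reports the file-level error.
fn summarize(ast: &NestedAst) -> String {
    match ast {
        Err(message) => format!("file error: {}", message),
        Ok(blocks) => {
            let bad = blocks.iter().filter(|block| block.is_err()).count();
            format!("{} ok, {} error", blocks.len() - bad, bad)
        }
    }
}
```

With this layout a single broken block shows up as one error entry while the good blocks around it stay usable.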
On To The Pondering
Given that I started writing this as soon as I got up, the approach is barely into the exploration stage. I need to think through how it affects the parser, the output templates, etc.
My mind's bubbling on all that stuff. But, this feels like one of those light bulb moments. Not one where I had an original idea. One where I realized something I learned about elsewhere offers a solution that wouldn't have otherwise occurred to me.
-a
Endnotes
Neopolitan has two content types: blocks and spans. The post talks about blocks. I expect to apply the same technique to spans. Effectively wrapping everything in ok/error blocks.
And good eye on you if you noticed that the value of the class attribute is an array of spans that also uses the ok/error.
Of course, this adds complexity. I'm not worried about it from the parser side. I think I can add end-of-file checks to anything that has open/close parts to deal with things.
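Here's a minimal sketch of what an end-of-file check could look like. It's an assumption about one way to do it, not how the parser works today. `UnclosedBlock` and `find_unclosed` are hypothetical names. It remembers the line where each `-- kind/` opener appeared so the error can point back to it.

```rust
// Hypothetical error describing a block that was opened but never closed.
#[derive(Debug, PartialEq)]
struct UnclosedBlock {
    kind: String,     // e.g. "title"
    opened_on: usize, // 1-based line number of the opening line
}

// Tracks `-- kind/` openers until a matching `-- /kind` shows up.
fn find_unclosed(source: &str) -> Result<(), UnclosedBlock> {
    let mut open: Vec<(String, usize)> = Vec::new();
    for (index, line) in source.lines().enumerate() {
        let rest = match line.trim().strip_prefix("-- ") {
            Some(rest) => rest,
            None => continue, // not a block line
        };
        if let Some(kind) = rest.strip_prefix('/') {
            // Closing line: pop the most recent opener if it matches.
            if open.last().map(|(k, _)| k.as_str()) == Some(kind) {
                open.pop();
            }
        } else if let Some(kind) = rest.strip_suffix('/') {
            // Opening line: remember where it started.
            open.push((kind.to_string(), index + 1));
        }
    }
    // End-of-file check: anything still open was never closed.
    match open.pop() {
        Some((kind, opened_on)) => Err(UnclosedBlock { kind, opened_on }),
        None => Ok(()),
    }
}
```

Because the opener's line number travels with the error, a message can say where the block started instead of only reporting that the file ran out.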
The main thing to solve for is working with the data in output templates. It would be a bummer to have to add checks into every little template. One possibility is to pre-process the AST to shift everything up. That's not great, but it would work.
I don't think that's going to be necessary. If my mental model is right, you can drop in two files (one for blocks and one for spans) that act as gateways that everything passes through. They'll do the ok/error check and hand off the results to the proper next step.
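A sketch of the gateway idea for blocks, written in Rust to keep it concrete even though the real thing would live in template files. Everything here is hypothetical: the `TemplateBlock` and `TemplateError` types, the two stand-in render functions, and the HTML they produce.

```rust
// Hypothetical block data; the real AST and templates may differ.
struct TemplateBlock {
    kind: String,
    text: String,
}

struct TemplateError {
    message: String,
}

// Stand-ins for per-kind output templates (the markup is made up).
fn render_title(block: &TemplateBlock) -> String {
    format!("<h1>{}</h1>", block.text)
}

fn render_generic(block: &TemplateBlock) -> String {
    format!("<div class=\"{}\">{}</div>", block.kind, block.text)
}

// The gateway: the one place that checks ok/error for blocks.
fn block_gateway(item: &Result<TemplateBlock, TemplateError>) -> String {
    match item {
        Ok(block) if block.kind == "title" => render_title(block),
        Ok(block) => render_generic(block),
        Err(error) => format!("<!-- block error: {} -->", error.message),
    }
}
```

The individual render functions never see an error. They only get called once the gateway has confirmed the block is ok.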
The other inspiration I'm taking from Rust is how great the error messages are. I wrote an entire post singing their praises.
The TL;DR is that the folks who work on the compiler treat it as a bug if an error message doesn't give you enough information to fix the problem.
It's one of my favorite things about Rust and a goal I've got for Neopolitan.
It's still early thinking, but I'm already kicking around the idea of using numbered error messages like Rust does to provide information about how to solve issues.
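For a sense of what that might look like, here's a small sketch. Rust's compiler uses codes like E0308 that you can look up with `rustc --explain`. The `ErrorCode` enum, the `N0001` style labels, and the help text below are all invented for illustration. Neopolitan doesn't define any of them.

```rust
// Hypothetical error codes; none of these exist in Neopolitan.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorCode {
    BareDashes = 1,
    UnclosedBlock = 2,
}

impl ErrorCode {
    // Formats the code with a fixed width, e.g. "N0001".
    fn label(self) -> String {
        format!("N{:04}", self as u32)
    }

    // A short hint about how to fix the problem.
    fn help(self) -> &'static str {
        match self {
            ErrorCode::BareDashes => "add a block kind after the dashes, like `-- title`",
            ErrorCode::UnclosedBlock => "add a matching closing line, like `-- /title`",
        }
    }
}
```

A stable code gives each kind of mistake a place to hang a longer explanation without stuffing it all into the message itself.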
The more friction I can remove from the app and processes, the more folks will enjoy using them. That's a goal worthy of spending time on.
Footnotes
It's like Markdown on steroids.
The Neopolitan parser reads the files we humans make and turns them into data that apps can use to build websites. The formal name for that type of data is Abstract Syntax Tree.
Yep. I sometimes dream in code. It's weird, but you get used to it.
Rust's learning curve was pretty steep for me. One thing that took a while to get my head around is how Result works. Basically, when you get data back from something, you have to check if it's ok before you try to use it.
It felt like unnecessary overhead when I first started learning the language. Now, I don't like working without it. (Which, of course, is how this entire post came to be.)
Neopolitan has seven primary kinds of content blocks: Basic, Checklist, CSV, JSON, List, Numbered List, and Raw. The kind of block determines how the content inside it is parsed.
Basic sections (like -- title) are just text. JSON blocks get turned into data objects in the AST that are available to use in templates.
Each kind of block gets its own output template. -- title and -- details are both Basic blocks. Using individual templates is what lets each one become a different element in the output.