
Neopolitan Parser Grammar


All this stuff is in pre tags to avoid having to comment out all the pipes.

This is where I'm working on the parser grammar used to build the AST for Neopolitan's LSP and Tree-Sitter parsers. (I built the original parser before learning how to define this grammar. Any differences will be normalized to follow this grammar moving forward.)

Notes

**Primitives**

These are all the things that don't call another item. They're what will be used to assemble the full items. They'll be created in the scanner.cc file.


**SECTION TOKENS**

These tokens are used for basic sections as well as container sections. The assembly of the container section start and end triggers is done in a later step.


**SECTION HEADERS**

There are "SECTION" and "CONTAINER" templates. Some section types have one, some have the other, and some have both.


**Full Items**

These are the things that are made from either primitives, other full items, or both. They'll be assembled in the tree-sitter grammar.js file.


Prior Work

This is an example that someone from chat helped me put together

Code
CMD -> '!bear' ' ' BEAR_COMMAND_LIST
BEAR_COMMAND_LIST -> BEAR_COMMAND | BEAR_COMMAND ',' OPTIONAL_WHITESPACE BEAR_COMMAND_LIST
BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR
OPTIONAL_WHITESPACE -> NOTHING | ' ' OPTIONAL_WHITESPACE
OP -> 'head' | 'body' | 'eyes'
COLOR -> HEXCOLOR | COLORNAME
HEXCOLOR -> '#' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
COLORNAME -> 'red' | 'green' | 'blue' | 'white' | ...

Debugging Stuff

I'm moving stuff around right now. All this below is helping me figure out where to put stuff

-- title

Neopolitan Parser Grammar

-- note

All this stuff is in pre tags to
avoid having to comment out all the pipes.

-- p

This is where I'm working on the parser grammar used to
build the AST for Neopolitan's LSP and Tree-Sitter parsers.
(I built the original parser before learning how to define
this grammar. Any differences will be normalized to follow
this grammar moving forward.)

-- notes

- This is kind of a scratch pad since it's covering a few
different things. It's not a direct mapping right now because
the Tree-Sitter parser involves using regex
and C, and I haven't consolidated those things down
yet

- The current todo check marks are for adding the
specific items to the Tree-Sitter parser

- Things with a 'ts' have been handled in Tree-Sitter

-- p

**Primitives**

These are all the things that don't call
another item. They're what will be used to
assemble the full items. They'll be created
in the scanner.cc file.


-- pre

[] ATTR_BOOL_VALUE -> not([' ' | '\t' | '\n'])

[] ATTR_DASHES -> ['--']

[] ATTR_KV_KEY -> not([':' | ' ' | '\t' | '\n']+)

[] ATTR_KV_SEPARATOR -> [':']

[] ATTR_KV_VALUE -> /[^\n]+/

[in progress] CONTAINER_TOKEN -> ['/']

[] EOF -> eof()

[] HTML_BODY_FOR_BASIC_SECTION -> [any_char]+ + lookahead(not(['\n'] + ['--']))

[x] LINE_ENDING -> [' ' | '\t']* + ['\n']

[] LINE_REMAINDER -> [any_char]+ + not([' '])* + ['\n' | eof]

[] NB_WHITESPACE -> [' ' | '\t']+

[x] SECTION_DASHES -> ['--']

[] SINGLE_CHARACTER_WORD -> [any_char] + lookahead(not([' ' | '\t' | '\n']))

[x] SINGLE_SPACE -> [' ']

[] TODO_BRACKET_END -> [']']

[] TODO_BRACKET_START -> ['[']
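
-- p

A few of the primitives above can be tried out as regular expressions to sanity-check them against sample lines. This is only a rough Python sketch, not the scanner.cc code. The `PRIMITIVES` table and the `match_primitive` helper are hypothetical names, and each regex is an assumed reading of the rule it is named after.

```python
import re

# Assumed regex translations of a few primitives from the list above.
PRIMITIVES = {
    "SECTION_DASHES": re.compile(r"--"),
    "SINGLE_SPACE": re.compile(r" "),
    "NB_WHITESPACE": re.compile(r"[ \t]+"),
    "LINE_ENDING": re.compile(r"[ \t]*\n"),
    "ATTR_KV_KEY": re.compile(r"[^: \t\n]+"),
    "ATTR_KV_SEPARATOR": re.compile(r":"),
    "ATTR_KV_VALUE": re.compile(r"[^\n]+"),
}


def match_primitive(name, text, pos=0):
    """Return the end offset of primitive `name` matched at `pos`, or None."""
    m = PRIMITIVES[name].match(text, pos)
    return m.end() if m else None


# Walk the start of a header line one primitive at a time.
line = "-- title  \n"
after_dashes = match_primitive("SECTION_DASHES", line)
after_space = match_primitive("NB_WHITESPACE", line, after_dashes)
```

Each call returns the offset where the next primitive should start, which is roughly how a hand-written scanner advances through the input.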


-- p

**SECTION TOKENS**

These tokens are used for basic sections as well
as container sections. The assembly of the container
section start and end triggers is done in a later
step.

-- pre

[x] CODE_TOKEN -> ['code']

[x] HTML_TOKEN -> ['html']

[x] LIST_TOKEN -> ['list']

[x] P_TOKEN -> ['p']

[x] TITLE_TOKEN -> ['title']

[x] TODO_TOKEN -> ['todo']

-- p 

**SECTION HEADERS**

There are "SECTION" and "CONTAINER" templates. Some section
types have one, some have the other, and some have both.

-- pre


[] TITLE_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + TITLE_TOKEN + LINE_ENDING

[] HTML_CONTAINER_START_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING

[] HTML_CONTAINER_END_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING

[] HTML_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING
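
-- p

The trigger rules above all share one shape (SECTION_DASHES + NB_WHITESPACE + a token + LINE_ENDING), so they can be sketched with a single pattern. This is a rough Python sketch under a big assumption: that CONTAINER_TOKEN ('/') goes after the token to start a container and before it to end one. The rules above don't spell that placement out yet, so treat it as a guess. `SECTION_TOKENS`, `TRIGGER`, and `classify_trigger` are hypothetical names.

```python
import re

# Token strings taken from the SECTION TOKENS list above.
SECTION_TOKENS = ["html", "list", "p", "title", "todo"]

# SECTION_DASHES + NB_WHITESPACE + <token> + LINE_ENDING
# ASSUMPTION: "-- html/" opens a container and "-- /html" closes it.
TRIGGER = re.compile(
    r"--[ \t]+(?P<close>/)?(?P<token>"
    + "|".join(SECTION_TOKENS)
    + r")(?P<open>/)?[ \t]*\n"
)


def classify_trigger(line):
    """Return (kind, token) for a trigger line, or None if it isn't one."""
    m = TRIGGER.fullmatch(line)
    if m is None or (m.group("open") and m.group("close")):
        return None
    if m.group("open"):
        kind = "container_start"
    elif m.group("close"):
        kind = "container_end"
    else:
        kind = "section"
    return kind, m.group("token")
```

With this reading, `classify_trigger("-- title\n")` gives a plain section and `classify_trigger("-- html/\n")` gives a container start. A line like `--title` with no whitespace after the dashes is rejected.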

-- p 


**Full Items**

These are the things that are made from
either primitives, other full items, or both.
They'll be assembled in the tree-sitter
grammar.js file.


-- pre

[] ATTRIBUTE -> BOOLEAN_ATTRIBUTE | KEY_VALUE_ATTRIBUTE

[] BOOLEAN_ATTRIBUTE -> DASHES + not[':'] + SPACE0 + SINGLE_NEWLINE

[] KEY_VALUE_ATTRIBUTE -> DASHES + not[':'] + ':' + any + SPACE0 + SINGLE_NEWLINE

[] PARAGRAPH -> PARAGRAPH_FIRST_WORD + WORD_BREAK + PARAGRAPH_BODY + EMPTY_LINE

note: this is for multi word paragraphs. single word paragraphs will
be addressed

[] PARAGRAPH_BODY ->  (WORD, sep_by, WORD_BREAK)

note: this is done a little differently in tree-sitter
since there isn't really a sep_by pattern without
regex, which I haven't gotten into yet

[] PARAGRAPH_FIRST_WORD -> INLINE_TAG | WORD_WITHOUT_LEADING_DASH | WORD_WITH_ONE_LEADING_DASH

[] WORD_WITH_ONE_LEADING_DASH -> '-' + NB_WHITESPACE + WORD

[] WORD_BREAK -> [NB_WHITESPACE | SINGLE_NEWLINE]+ + lookahead(not(LINE_ENDING))

[] INLINE_TAG -> LINK | SPAN | etc...

[] LIST_ITEM -> "-" + NB_WHITESPACE + PARAGRAPH_BODY

[] LINE_ENDING_OR_EOF -> [NB_WHITESPACE + NEWLINE] | EOF

[] LINK -> tktktkt

[] SPAN -> tktktktk

[] WORD_WITHOUT_LEADING_DASH -> not['-'] + WORD

[] WORD -> not['<' + lookahead(not['<'])] + not(NB_WHITESPACE | LINE_ENDING)

[deprecated?] INITIAL_WORD_CHARS -> NON_LT_CHAR | LT_WITH_NON_LT_CHAR

[deprecated?] LT_WITH_NON_LT_CHAR -> '<' + NON_LT_CHAR

[deprecated?] NON_LT_CHAR -> none_of['< \n\t']

[deprecated?] FOLLOWING_WORD_CHARS -> none_of[' \n\t']

[] ATTR_KV_PAIR -> ATTR_KV_KEY + ATTR_KV_SEPARATOR + NB_WHITESPACE + ATTR_KV_VALUE

[] ATTR -> ATTR_DASHES + [KEY_VALUE_ATTR | BOOLEAN_ATTR] + NEWLINE
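
-- p

The two attribute shapes above (boolean and key/value) can be told apart with a pair of patterns. This is a rough Python sketch with three assumptions: whitespace is required between the dashes and the key, a boolean attribute can't contain ':', and the trailing newline is optional so the last line of a file still matches. `KEY_VALUE_ATTR`, `BOOLEAN_ATTR`, and `parse_attr` are hypothetical names.

```python
import re

# ATTR_DASHES, whitespace, ATTR_KV_KEY, ATTR_KV_SEPARATOR, whitespace,
# ATTR_KV_VALUE. Trailing spaces are left out of the captured value.
KEY_VALUE_ATTR = re.compile(r"--[ \t]+([^: \t\n]+):[ \t]+([^\n]+?)[ \t]*\n?")

# ATTR_DASHES, whitespace, then a single run of non-whitespace, non-':' chars.
BOOLEAN_ATTR = re.compile(r"--[ \t]+([^: \t\n]+)[ \t]*\n?")


def parse_attr(line):
    """Return ('key_value', key, value), ('boolean', name, None), or None."""
    m = KEY_VALUE_ATTR.fullmatch(line)
    if m:
        return ("key_value", m.group(1), m.group(2))
    m = BOOLEAN_ATTR.fullmatch(line)
    if m:
        return ("boolean", m.group(1), None)
    return None
```

The key/value pattern is tried first because a key/value line would otherwise never get a chance to match. Note that the value can contain ':' (for example a URL), since only the key excludes it.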



-- h3

Prior Work

This is an example that someone from chat helped
me put together

-- code

CMD -> '!bear' ' ' BEAR_COMMAND_LIST
BEAR_COMMAND_LIST -> BEAR_COMMAND | BEAR_COMMAND ',' OPTIONAL_WHITESPACE BEAR_COMMAND_LIST
BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR
OPTIONAL_WHITESPACE -> NOTHING | ' ' OPTIONAL_WHITESPACE
OP -> 'head' | 'body' | 'eyes'
COLOR -> HEXCOLOR | COLORNAME
HEXCOLOR -> '#' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
COLORNAME -> 'red' | 'green' | 'blue' | 'white' | ...
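
-- p

To see how a grammar like this turns into code, here is a rough Python sketch of a parser for the bear command. It follows the rules above but makes two assumptions: HEXDIGIT (not defined in the grammar) means 0-9 and a-f in either case, and COLORNAME is limited to the four names the grammar actually lists, since the rest are elided. `parse_bear` is a hypothetical name.

```python
import re

OPS = ("head", "body", "eyes")
# COLORNAME is open-ended in the grammar ("..."), so this set only holds
# the four names it lists.
COLOR_NAMES = {"red", "green", "blue", "white"}
# ASSUMPTION: HEXDIGIT is 0-9, a-f, A-F.
HEXCOLOR = re.compile(r"#[0-9a-fA-F]{6}")


def parse_bear(cmd):
    """Parse a bear command string into a list of (op, color) pairs.

    Raises ValueError when the input does not match the grammar.
    """
    prefix = "!bear "
    if not cmd.startswith(prefix):
        raise ValueError("expected '!bear ' at the start")
    commands = []
    # BEAR_COMMAND_LIST: commands separated by ',' + OPTIONAL_WHITESPACE
    for i, part in enumerate(cmd[len(prefix):].split(",")):
        if i > 0:
            part = part.lstrip(" ")  # OPTIONAL_WHITESPACE after ','
        op, sep, color = part.partition(":")
        color = color.lstrip(" ")  # OPTIONAL_WHITESPACE after ':'
        if not sep or op not in OPS:
            raise ValueError(f"bad op in {part!r}")
        if not (HEXCOLOR.fullmatch(color) or color in COLOR_NAMES):
            raise ValueError(f"bad color in {part!r}")
        commands.append((op, color))
    return commands
```

For example, `parse_bear("!bear head: red, eyes:#00ff00")` returns `[("head", "red"), ("eyes", "#00ff00")]`, while an unknown op such as `tail` raises `ValueError`.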



-- ref
-- title: Markdown 
-- url: https://github.com/FIXTradingCommunity/md-grammar/blob/main/diagrams/MarkdownLexer.rrd.html


-- categories
-- Neopolitan 
-- Tree-sitter 

-- metadata
-- date: 2023-09-26 13:56:47
-- id: 2vwdjwpp
-- site: aws
-- type: post
-- status: draft