Introduction
Astray is a framework for building type-safe parsing functions from Rust type definitions. It helps you do this with ease and correctness.
This doc book is (or hopes to be) a repository for all Astray features and rules. It is, like the rest of the project, a work in progress.
Please check out the wishlist for a small breakdown of what has not been implemented so far.
Main goals
Besides fulfilling its purpose as a parsing framework, Astray has a few non-functional goals:
- Correctness and type safety over performance (though performance improvements are welcome)
- Extensive and prolific use of the Rust type system
- Thorough and beginner friendly documentation
Contributing
No rules so far: just open an issue and we'll talk about it. Eventually, contributing guidelines will be made available here.
Have fun and let me know if something in the book isn't quite right!
Project structure
Astray has a front-facing crate which combines its two main crates:
- Astray Macro provides a proc-macro that auto generates parsing functions from Rust type definitions.
- Astray Core holds all other functionality besides the proc-macro itself.
This division exists because a proc-macro crate may only export proc-macros, and Astray requires additional resources besides the proc-macro itself in order to work.
Let's go over the directory structure for each of these sub-crates.
Astray Macro
src/
- lib.rs: exposes relevant macro functionality
- node.rs:
A primer on compilers
Let's imagine a programming language called PseudoRust where the only valid statement is:
```
let <i> = <x> <sign> <y>;
```
Where:
```
<i>    := [a-z]([1-9] | [a-z])*
<x>    := [0-9]
<y>    := [0-9]
<sign> := + | *
```
If this notation is confusing, see here.
Our goal is to write a compiler in Rust that takes PseudoRust text and turns it into machine code for a computer to run. A compiler can be divided into (at least) 3 steps:
- Lexing / Tokenization
- Parsing
- Code Generation
Real world compilers include other steps and features like type-checking and optimizations.
1. Lexing / Tokenization
Tokens (a.k.a. lexemes) are the smallest meaningful units in a programming language.
E.g. in PseudoRust, the following would be tokens:
- `let`: the let keyword
- `+`: plus sign
- `123`: an integer literal
- `abc`: an identifier
- `*`: asterisk sign
Tokens can be easily represented as enums, as seen below. Other representations are possible, if you want to store extra information in each token.

Lexing means taking text representing code as input and transforming it into a list of Tokens. Take a look at the sketch below.
For a full tutorial on lexers, check here
Below is an example of how a lexer for the PseudoRust programming language could be typed in Rust:
```rust
enum Token {
    LetKw,
    EqualSign,
    SemiColon,
    Plus,
    Asterisk,
    IntLiteral(u32),
    Identifier(String),
}

/* Example of storing additional data:
struct TokenStruct {
    index_in_source: usize,
    token_len: usize,
    token_type: Token,
}
*/

fn lex(text: &str) -> Vec<Token> {
    /* Loop through the text, find tokens.
       Record additional data if needed */
    todo!()
}
```
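To make this more concrete, here is a minimal sketch of a `lex` implementation. Splitting on whitespace is an illustrative assumption (it requires every token, including `;`, to be surrounded by spaces); a real lexer would scan character by character:

```rust
// Minimal sketch: map whitespace-separated words to Tokens.
// Assumes space-separated input, e.g. "let a = 1 + 3 ;"
fn lex(text: &str) -> Vec<Token> {
    text.split_whitespace()
        .map(|word| match word {
            "let" => Token::LetKw,
            "=" => Token::EqualSign,
            ";" => Token::SemiColon,
            "+" => Token::Plus,
            "*" => Token::Asterisk,
            w => match w.parse::<u32>() {
                // digits become integer literals...
                Ok(n) => Token::IntLiteral(n),
                // ...and everything else becomes an identifier
                Err(_) => Token::Identifier(w.to_owned()),
            },
        })
        .collect()
}
```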
2. Parsing
Lexing gives us a list of meaningful building blocks. Our compiler should now check that these building blocks are arranged in accordance with the language's syntax. A way to do this is by parsing the Tokens into an Abstract Syntax Tree (AST), which asserts meaningful logical relationships between tokens according to syntax rules.
Let's take a look at how a parse function could work:
E.g. the following PseudoRust source file:
```rust
// PseudoRust
let a = 1 + 3;
```
... could be lexed into these tokens:
```rust
// the product of our PseudoRust lexer
vec![
    Token::LetKw,
    Token::Identifier("a".to_owned()),
    Token::EqualSign,
    Token::IntLiteral(1),
    Token::Plus,
    Token::IntLiteral(3),
    Token::SemiColon,
]
```
... and, given the following AST definition:
```rust
struct AST {
    // Token::LetKw
    let_kw: Token,
    // Token::Identifier(_)
    var_ident: String,
    // Token::EqualSign
    equals_sign: Token,
    body: Expr,
    // Token::SemiColon
    semicolon: Token,
}

struct Expr {
    // Token::IntLiteral(_)
    left: Token,
    sign: Sign,
    // Token::IntLiteral(_)
    right: Token,
}

// Token::Plus | Token::Asterisk
enum Sign {
    Add,
    Mult,
}
```
... and parse function:
```rust
fn parse(tokens: &[Token]) -> AST {
    // ...
}
```
... the Tokens could be parsed into:
```rust
AST {
    let_kw: Token::LetKw,
    var_ident: "a".to_owned(),
    equals_sign: Token::EqualSign,
    body: Expr {
        left: Token::IntLiteral(1),
        sign: Sign::Add,
        right: Token::IntLiteral(3),
    },
    semicolon: Token::SemiColon,
}
```
This AST lets us know that this item is an assignment, using the `let` keyword, of the result of `1 + 3` to the identifier `a`.
Note 1: Our PseudoRust syntax is quite simple. For more complex syntaxes, new challenges start to arise. I recommend Nora Sandler's excellent guide on building a compiler, so you can understand these challenges.

Note 2: Some of these fields could perhaps be dropped from the AST. As an example, the equals sign token doesn't have any use here, since we already typed this statement as being an assignment.

Note 3: Sometimes, storing the whole token might not be necessary, and we might just include the data it contains in the AST, as we see in the `var_ident` field of `AST`.
2.5 Error Handling
In practice, any step of our compiler might fail:
When Lexing, maybe some unrecognized tokens are present in the source text:
let a = 1 👍 3;
According to our grammar, this statement is un-lexable and so lexing should fail
Even if lexing succeeds, maybe parsing will fail if there are no syntax rules to explain the tokens that were produced by the lexer:
let a let a = 1 + 3;
Though all tokens are valid, `let a let a` has no meaning according to our syntax, so parsing should fail.
Code generation from an AST is more straightforward than the previous steps and would, in this case, maybe only fail if there was some compatibility issue with the target architecture, or something like that.
So, in practice, all our steps should produce `Result`s rather than plain values.
3. Code Generation
3.1 Generating Assembly
Our computers really only care about machine code, a binary language that represents instructions for our CPU to execute. Machine code is rather unsavory for our simple human minds, so, instead, we'll think about a human-readable version of machine code: Assembly. Turning an AST into Assembly is out of the scope of this project and repository, but feel free to check Nora Sandler's guide.
In the end, our compiler would look something like this:
```rust
fn compiler(text: &str) -> Result<String, CompileTimeError> {
    let tokens: Vec<Token> = lex(text)?;
    let ast: AST = parse(&tokens)?;
    generate_assembly(ast)
}

fn generate_assembly(ast: AST) -> Result<String, CompileTimeError> {
    // ...
}
```
3.2 Assembling Assembly into Machine Code
Assembling is the process of turning Assembly into machine code. It's a relatively straightforward process, where each Assembly instruction is turned into 1 or more machine code instructions. This process is very well studied, highly optimized and, once again, out of the scope of this project.

Important note: very often, compilers will transform an AST directly into machine code, skipping 3.1 entirely. This makes sense, since likely no one will look at whatever the output of this phase is, hence no need for human-readable output.
What is Astray?
In our compiler primer, we saw that compilation of a programming language is usually broken down into at least the following phases:
- Lexing (text -> Tokens)
- Parsing (Tokens -> AST)
- Code generation (AST -> Assembly)
- Assembling (Assembly -> Machine Code)
Astray hijacks development in step 2 of this list: building a parser. More specifically, Astray can generate parsing functions for any AST definition, using just a few derive annotations and some attributes. To get a feel for why this is useful, let's compare a parse function with and without Astray.
Parser implemented by hand
In more detail, let's think about the previous page's example of a parsing function and try to implement it by hand.
```rust
/**
 * Syntax:
 *   let <i> = <x> <sign> <y>;
 * Where:
 *   - <i> := [a-z]([1-9] | [a-z])*
 *   - <x> := [0-9]
 *   - <y> := [0-9]
 *   - <sign> := + | *
 * Check [here](./compiler_primer.md#2-parsing) for the AST definition
 */
fn parse(tokens: &[Token]) -> Result<AST, String> {
    let mut token_ptr = 0;

    /* parse let kw */
    match tokens.get(token_ptr) {
        Some(Token::LetKw) => (),
        _ => return Err(format!("Failed to parse 'let' keyword at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    /* parse variable identifier */
    let var_ident = match tokens.get(token_ptr) {
        Some(Token::Identifier(var_ident)) => var_ident.clone(),
        _ => return Err(format!("Failed to parse identifier of variable at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse equal sign */
    match tokens.get(token_ptr) {
        Some(Token::EqualSign) => (),
        _ => return Err(format!("Failed to parse '=' at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    /* parse left side of expr */
    let left = match tokens.get(token_ptr) {
        Some(left @ Token::IntLiteral(_)) => left.clone(),
        _ => return Err(format!("Failed to parse integer literal at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse sign (+ or *) */
    let sign = match tokens.get(token_ptr) {
        Some(Token::Plus) => Sign::Add,
        Some(Token::Asterisk) => Sign::Mult,
        _ => return Err(format!("Failed to parse + or * at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse right side of expr */
    let right = match tokens.get(token_ptr) {
        Some(right @ Token::IntLiteral(_)) => right.clone(),
        _ => return Err(format!("Failed to parse integer literal at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse semicolon */
    match tokens.get(token_ptr) {
        Some(Token::SemiColon) => (),
        _ => return Err(format!("Failed to parse ';' at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    // if there are any tokens besides these, error
    if token_ptr != tokens.len() {
        return Err("There were too many tokens".to_owned());
    }

    Ok(AST {
        let_kw: Token::LetKw,
        var_ident,
        equals_sign: Token::EqualSign,
        body: Expr { left, sign, right },
        semicolon: Token::SemiColon,
    })
}
```
There are a bunch of obvious problems with this implementation:
- Precise manipulation of a pointer into the tokens invites logic errors, since it gives us a lot of freedom, especially if we ever need to backtrack.
- Very repetitive code. We could make it smaller with some abstractions, but it would still be quite repetitive for larger, more complex syntaxes.
- No mechanism for reusing parsing logic.
Now, imagine a syntax like Rust's. Building parsing functions for it by hand is a tremendously difficult job that quickly grows in complexity.
Astray to the rescue
Luckily, Astray can help us with parsing functions.
Given any set of structs or enums representing an AST, Astray will generate type-safe parsing functions for each of those types. It allows you to compose types to generate complex ASTs without hassle.

So, for our previous AST definition, we only need to add some macro annotations:
```rust
#[derive(SN)]
struct AST {
    #[pat(Token::LetKw)]
    let_kw: Token,
    #[extract(Token::Identifier(var_ident))]
    var_ident: String,
    #[pat(Token::EqualSign)]
    equals_sign: Token,
    body: Expr,
    #[pat(Token::SemiColon)]
    semicolon: Token,
}

#[derive(SN)]
struct Expr {
    #[pat(Token::IntLiteral(_))]
    left: Token,
    sign: Sign,
    #[pat(Token::IntLiteral(_))]
    right: Token,
}

#[derive(SN)]
enum Sign {
    #[pat(Token::Plus)]
    Add,
    #[pat(Token::Asterisk)]
    Mult,
}
```
Now, instead of using comments to denote which Tokens are expected, we use `#[pat(<token pattern>)]`.
We annotate each type with `#[derive(SN)]` to let Astray know to implement parsing functions for that particular type.
Now, it's pretty easy to use our parser:
```rust
fn main() {
    let tokens: Vec<Token> = lex("let a = 1 + 1;");
    // `parse` is now an associated function
    let ast: Result<AST, ParseError<Token>> = AST::parse(tokens.into());
}
```
Feature breakdown
- Parse a sequence of types, represented as a `struct`
- Parse one of many possible types, represented as an `enum`
- Pattern matching on Tokens
- `Vec<T>`: for consuming multiple types or Tokens
- `Option<T>`: for consuming a type if it is there
- `Box<T>`: for consuming and heap allocating a type
- `(T, P)`: for tuples of types (only arity <= 3 implemented for now)
- `Either<T, P>`: from the either crate
- `NonEmpty<T>`: from the nonempty crate; allows you to consume a sequence of at least one type
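To give a feel for how these compose, here is a hedged sketch. The `LParen`/`RParen` tokens are illustrative assumptions (they appear nowhere else in this book), and the way `#[pat]` applies to a container's contents is described later in the Parsable trait chapter:

```rust
// A parenthesized list of one or more identifiers, e.g. "( a b c )"
#[derive(SN)]
struct IdentList {
    #[pat(Token::LParen)]
    l_paren: Token,
    // NonEmpty: at least one identifier must be present
    #[pat(Token::Identifier(_))]
    idents: NonEmpty<Token>,
    #[pat(Token::RParen)]
    r_paren: Token,
}

// Zero or more identifier lists, consumed greedily
#[derive(SN)]
struct Program {
    lists: Vec<IdentList>,
}
```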
For more details, keep reading the book!
Basics
So, Astray is a framework to develop parsing functions from Rust type definitions. It provides two basic components, each in a separate crate: the `Parsable` trait (Astray Core) and the `SN` macro (Astray Macro).
The core of Astray is the `Parsable` trait, which can be automatically derived with the `SN` ("Syntax Node") macro. Just annotating a type with `SN` will auto generate an implementation of `Parsable`. At its heart, `SN` is a derive macro that takes a type definition and builds a parsing function for it. We'll cover some basic examples in the next chapter.
Basic types
Almost any type can derive the `SN` macro. This auto implements the `Parsable<T>` trait, which comes with the `parse` associated function:
```rust
trait Parsable<T> {
    fn parse(token_iter: TokenIterator<T>) -> Result<Self, ParseError<T>>;
    // other stuff we'll discuss later
}
```
A TokenIterator is an abstraction over a Vec of Tokens that can iterate bidirectionally and do a bunch of useful stuff. We'll cover it later.
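For intuition only, here is a minimal sketch of the idea; the real TokenIterator differs, and `save`/`restore` are illustrative names (the actual API exposes `consume`, mentioned in the wishlist):

```rust
// Sketch: a cursor over tokens that can move forward and be reset,
// enabling backtracking when a parse attempt fails
struct TokenIterator<T> {
    tokens: Vec<T>,
    cursor: usize,
}

impl<T: Clone> TokenIterator<T> {
    // Yield the next token and advance the cursor
    fn consume(&mut self) -> Option<T> {
        let token = self.tokens.get(self.cursor).cloned();
        if token.is_some() {
            self.cursor += 1;
        }
        token
    }

    // Save and restore the cursor position: this is what lets a failed
    // parse leave the iterator exactly where it started
    fn save(&self) -> usize {
        self.cursor
    }

    fn restore(&mut self, saved: usize) {
        self.cursor = saved;
    }
}
```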
Before doing anything with Astray, the user must call the `set_token!(<your_token_type>)` macro to let Astray know what type will be considered a token for the parsing functions it will generate.
structs and pattern matching
When a `struct` derives `SN`, `Parsable` is implemented for it:

```rust
set_token!(Token);

#[derive(SN)]
struct Pair {
    l_element: Token,
    comma: Token,
    r_element: Token,
}

fn main() {
    // let tokens = lexer("a b c"); // This would be parsed successfully as well
    let tokens = lexer("a , c");
    assert_eq!(tokens, vec![
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ]);
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
}
```
Struct fields will be parsed top to bottom. If any of the fields cannot be parsed, the struct will fail parsing as well.
The code above will actually parse any set of 3 tokens, since we have not specified which tokens should be consumed. We can do so with pattern matching, which we'll see next.
I'm working on support for tuple structs (TODO: insert issue number here):
```rust
set_token!(Token);

#[derive(SN)]
struct TwoNumbers(Token, Token);

fn main() {
    let tokens = lexer("1 2");
    assert_eq!(tokens, vec![
        Token::IntLiteral(1),
        Token::IntLiteral(2),
    ]);
    let two_numbers: Result<TwoNumbers, ParseError<Token>> = TwoNumbers::parse(tokens.into());
}
```
Enums
Much like structs, enums can be used as consumable types. Astray will try to parse each of the enum's variants, top to bottom. If all variants fail to parse, the enum fails to parse. If at least one variant does parse, the enum parsing succeeds. All enums can derive SN, as long as they follow the guidelines in the notes below.
Enum Note 1: Unit variants must have a #[pat] attribute annotation
```rust
/* valid */
#[derive(SN)]
enum Sign {
    #[pat(Token::Plus)]
    Plus,
    #[pat(Token::Minus)]
    Minus,
}

/* invalid, fails to compile */
#[derive(SN)]
enum Sign {
    Plus,
    Minus,
}
```
Enum Note 2: Single Tuple variants only
Use single element tuple variants that contain a tuple instead of tuple variants with many elements.
```rust
enum Sign {
    // valid
    #[pat(Token::Plus)]
    Plus(Token),
    // valid
    #[pat(Token::Slash)]
    Div((Token,)),
    // invalid, fails to compile
    IntegerDiv(Token, Token),
    // valid, applies the pattern to the tuple
    #[pat((Token::Slash, Token::Slash))]
    IntegerDiv((Token, Token)),
}
```
Struct variants are not supported yet.
TODO: Add support for this and mention the tracking issue here.
Pattern Matching
You can pattern match on each field in a struct (or enum, as we'll see later) to tell Astray that it should parse a specific instance of a Token (or type) for that specific field.
No Pattern matching
If pattern matching is not specified, then any instance of that type may be parsed. This includes Tokens:
```rust
set_token!(Token);

#[derive(SN)]
struct Pair {
    l_element: Token,
    comma: Token,
    r_element: Token,
}

fn main() {
    let tokens = lexer("a b c");
    assert_eq!(tokens, vec![
        Token::Identifier("a".to_owned()),
        Token::Identifier("b".to_owned()),
        Token::Identifier("c".to_owned()),
    ]);
    // Any three tokens will be successfully parsed. This is pretty useless
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
}
```
Of course, parsing Tokens without a specific pattern in mind is rarely useful.
With pattern matching
To parse a specific instance of a type, annotate the desired field with the pattern that field is expected to match.
```rust
set_token!(Token);

#[derive(SN)]
struct Pair {
    #[pat(Token::Identifier(_))]
    l_element: Token,
    #[pat(Token::Comma)]
    comma: Token,
    #[pat(Token::Identifier(_))]
    r_element: Token,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Identifier("b".to_owned()),
        Token::Identifier("c".to_owned()),
    ];
    // result is err
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert!(pair.is_err());

    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: Token::Identifier("a".to_owned()),
        comma: Token::Comma,
        r_element: Token::Identifier("c".to_owned()),
    }));
}
```
Of course this works for all parsable types, not just tokens:
```rust
#[derive(SN)]
struct Expr {
    #[pat(Token::IntLiteral(_))]
    left: Token,
    #[pat(Sign::Add)]
    sign: Sign,
    #[pat(Token::IntLiteral(_))]
    right: Token,
}

#[derive(SN)]
enum Sign {
    #[pat(Token::Plus)]
    Add,
    #[pat(Token::Minus)]
    Sub,
}

fn main() {
    let tokens = [
        Token::IntLiteral(3),
        Token::Plus,
        Token::IntLiteral(2),
    ];
    let expr_result = Expr::parse(tokens.into());
    assert_eq!(expr_result, Ok(Expr {
        left: Token::IntLiteral(3),
        sign: Sign::Add,
        right: Token::IntLiteral(2),
    }));

    let tokens = [
        Token::IntLiteral(3),
        Token::Minus,
        Token::IntLiteral(2),
    ];
    let expr_result = Expr::parse(tokens.into());
    // Does not parse: Expr expects specifically a Sign::Add, which cannot be
    // parsed when Token::Minus is present instead of Token::Plus
    assert!(expr_result.is_err());
}
```
Extract values
Currently a WIP: you can extract specific values from a matched pattern, should you want to keep only the inner values of a struct / enum in your AST.
```rust
set_token!(Token);

#[derive(SN)]
struct Pair {
    #[extract(Token::Identifier(l_element))]
    l_element: String,
    #[pat(Token::Comma)]
    comma: Token,
    #[extract(Token::Identifier(r_element))]
    r_element: String,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::Comma,
        r_element: "c".to_owned(),
    }));
}
```
Either this or that
As you'd expect, it's possible to use patterns with pipes to make Astray parse one of several possible alternatives. This can be a replacement for moving a type to a separate enum, and will very likely be faster (TODO: benchmark this).
```rust
set_token!(Token);

#[derive(SN)]
struct Pair {
    #[extract(Token::Identifier(l_element))]
    l_element: String,
    #[pat(Token::Comma | Token::SemiColon)]
    comma: Token,
    #[extract(Token::Identifier(r_element))]
    r_element: String,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::SemiColon,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::SemiColon,
        r_element: "c".to_owned(),
    }));

    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::Comma,
        r_element: "c".to_owned(),
    }));
}
```
Custom Types
Errors
As a general rule, `<P as Parsable<T>>::parse` produces a `Result<P, ParseError<T>>`. This means that all parsing is fallible. If parsing does fail, a `ParseError<T>` is produced:
```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseErrorType<T>
where
    T: ConsumableToken,
{
    /* Since Tokens are just parsable types, this might be removed in the future */
    UnexpectedToken { expected: T, found: T },
    /* When you run out of tokens mid-parsing a type */
    NoMoreTokens,
    /* When a type can be parsed from the TokenIterator, but it does not
       match the pattern that was applied to it */
    ParsedButUnmatching { err_msg: String },
    /**
     * Failed to parse a branch of a conjunct type.
     * This will happen for:
     *  - fields / elements in a struct / tuple struct
     *  - elements in a tuple
     *  - the first element in a NonEmpty vec
     */
    ConjunctBranchParsingFailure { err_source: Box<ParseError<T>> },
    /**
     * Failed to parse a branch of a disjunct type.
     * This will happen for:
     *  - variants in an enum
     *  - fields in Either
     */
    DisjunctBranchParsingFailure { err_source: Vec<ParseError<T>> },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError<T>
where
    T: ConsumableToken,
{
    type_name: &'static str,
    failed_at: usize,
    pub failure_type: ParseErrorType<T>,
}
```
As you can see, a ParseError can have 5 different causes. Check the comments on each variant for further details.
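As a hedged sketch of what consuming these errors might look like (assuming the API above, where only `failure_type` is public):

```rust
// Illustrative only: inspect why a parse failed
fn report<T: ConsumableToken>(result: Result<AST, ParseError<T>>) {
    if let Err(err) = result {
        match err.failure_type {
            ParseErrorType::NoMoreTokens => println!("ran out of tokens"),
            ParseErrorType::ParsedButUnmatching { err_msg } => {
                println!("parsed a value, but it did not match the pattern: {err_msg}")
            }
            _ => println!("parsing failed"),
        }
    }
}
```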
Astray Universal Rules
This is a complex project, so below are some rules/axioms/definitions that must be upheld throughout. If at any point the code/docs fail to uphold a single one of these, then there's a bug either in the code/docs, or in the rules.
Throughout the code and docs, AUR::N will be the way to reference the Nth rule, as specified below:
- Given any type `T` and a `TokenIterator<T>`, where `T: Parsable<T>`, `T` will be referred to as a `Token`.
- Given any type `P: Parsable<T>`, where `T: Parsable<T>`, `P` may be parsed from a `TokenIterator<T>`. `P` will be called a "parsable type", or just a "parsable".
- Since `T: Parsable<T>` for all `T`, `T` may always be parsed from a `TokenIterator<T>`.
  - This means a Token will always be parsable from a TokenIterator of itself.
- Parsable types are either Tokens, or composed of other parsable types through sum types or product types.
- Parsing may fail, so it always produces a `Result<P, ParseError<T>>`, where `P: Parsable<T>`.
- A failed parsing will always leave the TokenIterator at the position it was at before parsing was attempted. This is true for all structs, enums and default implementations in Astray, and should remain true for any custom types the user implements.
- A successful parsing can either leave the iterator in the same place it was before parsing was attempted, or move the iterator along according to the length of the type it is parsing. Check here if this is confusing.
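The reset rule is worth internalizing. As a hedged sketch (reusing the illustrative cursor-based TokenIterator from the Basic types chapter), it amounts to this property:

```rust
// Property sketch: a failed parse must not move the iterator
fn reset_property(tokens: Vec<Token>) {
    let mut iter = TokenIterator { tokens, cursor: 0 };
    let before = iter.save();
    if AST::parse(&mut iter).is_err() {
        // the iterator is back where it started
        assert_eq!(iter.save(), before);
    }
}
```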
The Parsable trait
At the heart of Astray lies the `Parsable<T>` trait. Check its definition here.
`Parsable<T>` marks a type as a consumable type. This means that, given a `TokenIterator<T>`, any type implementing `Parsable<T>` may be parsed from those tokens.
Its definition:
```rust
pub trait Parsable<T>: std::fmt::Debug
where
    T: Parsable<T>,
    Self: Sized,
    T: ConsumableToken,
{
    type ApplyMatchTo: Parsable<T> = Self;

    fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>>;

    fn parse_if_match<F: Fn(&Self::ApplyMatchTo) -> bool>(
        iter: &mut TokenIter<T>,
        matches: F,
        pattern: Option<&'static str>,
    ) -> Result<Self, ParseError<T>>
    where
        Self: Sized,
    {
        todo!("parse_if_match not yet implemented for {:?}", Self::identifier());
    }

    fn identifier() -> &'static str {
        std::any::type_name::<Self>()
    }
}
```
Let's go over it step by step.
Trait declaration
```rust
// Any type that implements Parsable<T> must implement std::fmt::Debug.
// This is necessary for building nice ParseErrors
pub trait Parsable<T>: std::fmt::Debug
where
    // T: Parsable<T>, meaning T is a Token, as per AUR::1
    T: Parsable<T>,
    // Self: Sized is required, since the parse and parse_if_match
    // associated functions return Self
    Self: Sized,
    // ConsumableToken is just a marker trait that might be removed in the future
    T: ConsumableToken,
```
Associated Type
```rust
type ApplyMatchTo: Parsable<T> = Self;
```
This is the type that patterns will be applied to when `#[pat(<pattern>)]` is used.
Generally, it will be Self. However, for container types, ApplyMatchTo might be the contained type.
ApplyMatchTo may be any type that makes sense for each specific implementor of Parsable.
Check this page on implementing Parsable
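For intuition, a container type might point `ApplyMatchTo` at its contents. The following is a hedged sketch of that idea, not Astray's actual `Option` implementation:

```rust
// Illustrative: a #[pat(...)] on an Option<P> field would be matched
// against the P inside, not against the Option itself
impl<T, P> Parsable<T> for Option<P>
where
    T: Parsable<T> + ConsumableToken,
    P: Parsable<T>,
{
    type ApplyMatchTo = P;

    fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>> {
        // An Option never fails to parse: a failed inner parse yields None,
        // and the iterator is reset per the universal rules
        Ok(P::parse(iter).ok())
    }
}
```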
parse function
```rust
fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>>;
```
`parse` takes a `&mut TokenIter<T>`, which must be mutable since the inner pointer in the TokenIterator will be moved depending on what the parsing function does. `parse` always returns a `Result`, meaning it is always fallible.
parse_if_match function
```rust
fn parse_if_match<F: Fn(&Self::ApplyMatchTo) -> bool>(
    _iter: &mut TokenIter<T>,
    _matches: F,
    _pattern: Option<&'static str>,
) -> Result<Self, ParseError<T>>
where
    Self: Sized,
{
    todo!("parse_if_match not yet implemented for {:?}", Self::identifier());
}
```
The `parse_if_match` function allows an implementor to restrict which types can be parsed, according to a validating function, here named `matches` (TODO: might be renamed in the future).
Ideally, we would be able to pass a pattern directly to this function, but Rust doesn't really have first-class support for patterns, so a `Fn(&Self::ApplyMatchTo) -> bool` does the trick. In practice, the function that is actually passed to `parse_if_match` is `|ty| matches!(ty, <pattern>)`.
`parse_if_match` also requires a `pattern` string: a stringified version of the pattern. Since Rust doesn't really have first-class support for patterns, the `matches` closure is opaque and can't tell us which pattern it checks, so the string stands in for it, which is useful when building error messages.
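Putting the two together, a field annotated with `#[pat(Token::Comma)]` would conceptually boil down to a call like the one below; this is a sketch of the idea, not the macro's exact output:

```rust
// What #[pat(Token::Comma)] conceptually expands to
let comma: Token = Token::parse_if_match(
    &mut iter,
    |ty| matches!(ty, Token::Comma),
    Some("Token::Comma"), // stringified pattern, for error messages
)?;
```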
TODO: A default implementation is on the way.
Given a `token_iterator: TokenIterator<T>` and `P: Parsable<T>`:

- `P` shall be called a parsable type.
- `P` may be parsed from the token iterator with `P::parse(&mut token_iterator)`.
- `P::parse(&mut token_iterator)` always produces:
  - `Ok(P {/*fields*/})` if parsing succeeds. The iterator is left pointing to the token after the last token that was consumed for parsing `P`.
  - `Err(ParseError<T> /*different errors exist*/)` otherwise. In this case, the iterator is reset to the position it was at before parsing was attempted.
- `T: Parsable<T>` (with some caveats).
- Calling `<Token as Parsable<Token>>::parse(&mut token_iterator)` just consumes the next token.
Wishlist

Define a set of universal rules for Astray

Astray is complex and must necessarily follow some rules. These are being defined (WIP) at ./universal_rules.md.
Implement Iterator for TokenIterator

- Allows usage of iterator methods
- Replaces `fn consume(&mut self)` with `fn next(&mut self)`, which is more intuitive in Rust land
- Prevents the iterator from going backwards, of course, so it would not work on sum types
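A hedged sketch of what that impl might look like, reusing the illustrative cursor-based TokenIterator from earlier:

```rust
// Illustrative only: not Astray's actual definition
impl<T: Clone> Iterator for TokenIterator<T> {
    type Item = T;

    // would replace fn consume(&mut self)
    fn next(&mut self) -> Option<T> {
        let token = self.tokens.get(self.cursor).cloned();
        if token.is_some() {
            self.cursor += 1;
        }
        token
    }
}
```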
Functional macro for enum optimization
When an enum implements Parsable, variants that share a common prefix are parsed redundantly. Consider:
```rust
enum MyEnum {
    Variant1(Struct1),
    Variant2(Struct2),
}

struct Struct1 {
    #[pat(Token::Plus)]
    plus: Token,
    #[pat(Token::LiteralInt(_))]
    literal: Token,
}

struct Struct2 {
    #[pat(Token::Plus)]
    plus: Token,
    #[pat(Token::LiteralString(_))]
    literal: Token,
}
```
Given a TokenIterator over `[Token::Plus, Token::LiteralString("something")]`, Astray tries to parse a `Token::Plus` (for Struct1) and succeeds, tries to parse a `Token::LiteralInt(_)` and fails, then tries to parse a `Token::Plus` again (for Struct2) followed by a `Token::LiteralString(_)`, which succeeds.
`Token::Plus` has been parsed twice unnecessarily.
The goal is to optimize this. I already have an idea for execution, which I will expose later as soon as I can.
Parsing visualization for the command line (bonus points if it also works in the browser)
An animation of how parsing is happening in "real" time, showing how the parser works.
Documenting all functions in the code
Although they are (hopefully) rather self-explanatory, great Rusty documentation is missing from each function. I hope to add it in the future.
Updating nomenclature
Consumable types -> Parsable types. Partially done in the docs, WIP in the code.
Generic Errors instead of ParseError
Turning ParseError into a trait would allow users to specify what error type their parsing functions produce.
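One possible shape for this, as a purely illustrative sketch:

```rust
// Illustrative only: a trait users could implement for their own error types
trait ParsingError<T> {
    fn no_more_tokens(failed_at: usize) -> Self;
    fn parsed_but_unmatching(failed_at: usize, err_msg: String) -> Self;
}

// parse could then be generic over the error:
// fn parse<E: ParsingError<T>>(iter: &mut TokenIter<T>) -> Result<Self, E>;
```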
Implement FromIterator for TokenIterator
Makes it easier to create a TokenIterator.
Update parse_if_match function
Parsers should be generic not only over the types they take, but also over their arity! I'm investigating possible solutions for this.
Add more advanced validation functions
```rust
struct A {
    /* Should these get combined into a single validation function?
       Or maybe multiple validation functions... more experimentation is required */
    #[pat(Token::LiteralInt(_))]
    #[len(> 5)]
    field1: Vec<Token>,
}
```
Perhaps renaming `parse_if_match` to `parse_if_valid`.
Remove UnexpectedToken ParseErrorType variant
Since Tokens are to be treated as parsable types, under the specific restriction that `Token: Parsable<Token>`, this variant might be redundant and could be removed.