Introduction

Astray is a framework for building type-safe parsing functions from Rust type definitions. It helps you do this easily and correctly.

This doc book is (or hopes to be) a repository for all Astray features and rules. It is, like the rest of the project, a work in progress.

Please check out the wishlist for a short breakdown of what has not been implemented so far.

Main goals

Besides fulfilling its purpose as a parsing framework, Astray has a few non-functional goals:

  • Correctness and type safety over performance (though performance improvements are welcome)
  • Extensive and prolific use of the Rust type system
  • Thorough and beginner friendly documentation

Contributing

No rules so far: just open an issue and we'll talk about it. Eventually, contributing rules will be made available here.

Have fun and let me know if something in the book isn't quite right!

Project structure

Astray has a front-facing crate which combines its two main sub-crates:

  1. Astray Macro provides a proc-macro that auto generates parsing functions from Rust type definitions.
  2. Astray Core holds all other functionality besides the proc-macro itself.

This division exists because a proc-macro crate may only export proc-macros, and Astray requires additional resources besides the proc-macro itself in order to work.

Let's go over the directory structure for each of these sub-crates.

Astray Macro

src/

  • lib.rs: exposes relevant macro functionality
  • node.rs:

A primer on compilers

Let's imagine a programming language called PseudoRust where the only valid statement is:

let <i> = <x> <sign> <y>;

Where:

<i>    :=  [a-z]([1-9] | [a-z])*
<x>    :=  [0-9]
<y>    :=  [0-9]
<sign> :=  + | *

If this grammar notation is confusing, see here.
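For example, under these rules, these statements are valid PseudoRust:

let abc = 1 + 2;
let x9 = 3 * 4;

...while let 9x = 1 + 2; is invalid (an identifier must start with a letter), and let a = 10 + 2; is invalid (<x> must be a single digit).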

Our goal is to write a compiler in Rust that takes PseudoRust text and turns it into machine code for a computer to run. A compiler can be divided into (at least) 3 steps:

  1. Lexing / Tokenization
  2. Parsing
  3. Code Generation

Real world compilers include other steps and features like type-checking and optimizations.

1. Lexing / Tokenization

Tokens (a.k.a. lexemes) are the smallest meaningful units in a programming language.

E.g. in PseudoRust, the following would be tokens:

  • let: the let keyword
  • +: the plus sign
  • 123: an integer literal
  • abc: an identifier
  • *: the asterisk sign

Tokens can easily be represented as enums, as seen below. Other representations are possible if you want to store extra information in each token.

Lexing means taking text representing code as input and transforming it into a list of Tokens. Take a look at the pseudo-Rust found below.

For a full tutorial on lexers, check here. Below is an example of how a lexer for the PseudoRust programming language could be typed in Rust:

#![allow(unused)]

fn main() {
// EqualSign and SemiColon cover the `=` and `;` in our statement;
// the derives let later examples clone tokens and compare them in asserts
#[derive(Debug, Clone, PartialEq)]
enum Token {
    LetKw,
    EqualSign,
    SemiColon,
    Plus,
    Asterisk,
    IntLiteral(u32),
    Identifier(String),
}

/* Example of storing additional data
struct TokenStruct {
    index_in_source: usize,
    token_len: usize,
    token_type: Token
}*/


fn lex(text: &str) -> Vec<Token> {
    /* Loop through the text, find tokens. Record additional data if needed  */
}

}
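To make this more concrete, here is one way lex could be implemented for PseudoRust. This is just a minimal sketch: it panics on unrecognized characters instead of returning an error, a problem we'll address in section 2.5.

#![allow(unused)]
fn main() {
fn lex(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = text.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            // whitespace separates tokens but produces none
            c if c.is_whitespace() => { chars.next(); }
            '=' => { chars.next(); tokens.push(Token::EqualSign); }
            ';' => { chars.next(); tokens.push(Token::SemiColon); }
            '+' => { chars.next(); tokens.push(Token::Plus); }
            '*' => { chars.next(); tokens.push(Token::Asterisk); }
            // integer literals: consume consecutive digits
            c if c.is_ascii_digit() => {
                let mut n = 0u32;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n * 10 + d;
                    chars.next();
                }
                tokens.push(Token::IntLiteral(n));
            }
            // identifiers and the `let` keyword
            c if c.is_ascii_lowercase() => {
                let mut ident = String::new();
                while let Some(&ch) = chars.peek() {
                    if ch.is_ascii_alphanumeric() {
                        ident.push(ch);
                        chars.next();
                    } else {
                        break;
                    }
                }
                if ident == "let" {
                    tokens.push(Token::LetKw);
                } else {
                    tokens.push(Token::Identifier(ident));
                }
            }
            other => panic!("unrecognized character: {other}"),
        }
    }
    tokens
}
}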

2. Parsing

Lexing gives us a list of meaningful building blocks. Our compiler should now check that these building blocks are arranged in accordance with the language's syntax. One way to do this is by parsing the Tokens into an Abstract Syntax Tree (AST), which asserts meaningful logical relationships between tokens according to syntax rules.

Let's take a look at how a parse function could work. E.g., the following PseudoRust source file:

#![allow(unused)]
fn main() {
    // PseudoRust
    let a = 1 + 3;
}

... could be lexed into these tokens:

#![allow(unused)]
fn main() {
    // the product of our PseudoRust lexer
    vec![
        Token::LetKw, Token::Identifier("a".to_owned()),
        Token::EqualSign, Token::IntLiteral(1),
        Token::Plus, Token::IntLiteral(3),
        Token::SemiColon,
    ]
}

... and, given the following AST definition:

#![allow(unused)]
fn main() {
    struct AST {
        // Token::LetKw
        let_kw: Token,
        var_ident: String,
        // Token::EqualSign
        equals_sign: Token,
        body: Expr,
        // Token::SemiColon
        semicolon: Token,
    }


    struct Expr {
        // Token::IntLiteral(_)
        left: Token,
        sign: Sign,
        // Token::IntLiteral(_)
        right: Token,
    }

    // Token::Plus | Token::Asterisk
    enum Sign {
        Add,
        Mult,
    }
}

... and parse function:

#![allow(unused)]
fn main() {
fn parse(tokens: &[Token]) -> AST {
    // ...
}
}

... the Tokens could be parsed into:

#![allow(unused)]
fn main() {
    AST {
        let_kw: Token::LetKw,
        var_ident: "a".to_owned(),
        equals_sign: Token::EqualSign,
        body: Expr {
            left: Token::IntLiteral(1),
            sign: Sign::Add,
            right: Token::IntLiteral(3),
        },
        semicolon: Token::SemiColon,
    }
}

This AST lets us know that this item is an assignment, using the let keyword, of the result of (1 + 3) to the identifier "a".

Note 1: Our PseudoRust syntax is quite simple. For more complex syntaxes, new challenges start to arise. I recommend Nora Sandler's excellent guide on building a compiler, so you can understand these challenges.

Note 2: Some of these fields could perhaps be dropped from the AST. As an example, the equals sign token doesn't have any use here, since we have already typed this statement as being an assignment.

Note 3: Sometimes, storing the whole token might not be necessary, and we may just include the value it contains in the AST, as we see in the var_ident field of AST.

2.5 Error Handling

In practice, any step of our compiler might fail. When lexing, unrecognized tokens might be present in the source text: let a = 1 👍 3;. According to our grammar, this statement is un-lexable, so lexing should fail.

Even if lexing succeeds, parsing may still fail if there are no syntax rules to explain the tokens that were produced by the lexer: let a let a = 1 + 3;
Though all tokens are valid, let a let a has no meaning according to our syntax, so parsing should fail.

Code generation from an AST is more straightforward than the previous steps and would, in this case, perhaps only fail if there were some compatibility issue with the target architecture, or something like that.

So, in practice, all our steps should produce Results rather than just values.
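Sketching this out for PseudoRust (using plain String errors as placeholders; any richer error type works the same way):

#![allow(unused)]
fn main() {
// each phase can now report failure to the caller
fn lex(text: &str) -> Result<Vec<Token>, String> {
    todo!()
}

fn parse(tokens: &[Token]) -> Result<AST, String> {
    todo!()
}
}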

3. Code Generation

3.1 Generating Assembly

Our computers really only care about machine code, a binary language that represents instructions for our CPU to execute. Machine code is rather unsavory for our simple human minds, so instead we'll think about a human-readable version of machine code: Assembly. Turning an AST into Assembly is out of the scope of this project and repository, but feel free to check Nora Sandler's guide.

In the end, our compiler would look something like this:

#![allow(unused)]
fn main() {
fn compiler(text: &str) -> Result<String, CompileTimeError> {
    // assumes lex and parse have been updated to return Results,
    // with error types convertible to CompileTimeError
    let tokens: Vec<Token> = lex(text)?;
    let ast: AST = parse(&tokens)?;
    generate_assembly(ast)
}

fn generate_assembly(ast: AST) -> Result<String, CompileTimeError> {
    //...
}
}

3.2 Assembling Assembly into Machine Code

Assembling is the process of turning Assembly into machine code. It's a relatively straightforward process, where each Assembly instruction is turned into 1 or more machine code instructions. This process is very well studied, highly optimized and, once again, out of the scope of this project.

Important note: very often, compilers will transform an AST directly into machine code, skipping 3.1 entirely. This makes sense, since likely no one will look at whatever the output of this phase is, hence no need for human-readable output.

What is Astray?

In our compiler primer, we saw that compilation of a programming language is usually broken down into (at least) the following phases:

  1. Lexing (text -> Tokens)
  2. Parsing (Tokens -> AST)
  3. Code generation (AST -> Assembly)
  4. Assembling (Assembly -> Machine Code)

Astray hijacks step 2 of this list: building a parser. More specifically, Astray can generate parsing functions for any AST definition, using just a few derive annotations and some attributes. To get a feel for why this is useful, let's compare a parse function with and without Astray.

Parser implemented by hand

In more detail, let's think about the previous page's example of a parsing function and try to implement it by hand.

#![allow(unused)]
fn main() {
/**
* Syntax:
* let <i> = <x> <sign> <y>;
* Where:
* - <i>    :=  [a-z]([1-9] | [a-z])*
* - <x>    :=  [0-9]
* - <y>    :=  [0-9]
* - <sign> :=  + | *
*
* Check [here](./compiler_primer.md#2-parsing) for the AST definition
*/
fn parse(tokens: &[Token]) -> Result<AST, String> {
    let mut token_ptr = 0;

    /* parse `let` keyword */
    match tokens.get(token_ptr) {
        Some(Token::LetKw) => (),
        _ => return Err(format!("Failed to parse 'let' keyword at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    /* parse variable identifier */
    let var_ident = match tokens.get(token_ptr) {
        Some(Token::Identifier(var_ident)) => var_ident.clone(),
        _ => return Err(format!("Failed to parse identifier of variable at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse equal sign */
    match tokens.get(token_ptr) {
        Some(Token::EqualSign) => (),
        _ => return Err(format!("Failed to parse '=' at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    /* parse left side of expr */
    let left = match tokens.get(token_ptr) {
        Some(left @ Token::IntLiteral(_)) => left.clone(),
        _ => return Err(format!("Failed to parse integer literal at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse sign (+ or *) */
    let sign = match tokens.get(token_ptr) {
        Some(Token::Plus) => Sign::Add,
        Some(Token::Asterisk) => Sign::Mult,
        _ => return Err(format!("Failed to parse + or * at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse right side of expr */
    let right = match tokens.get(token_ptr) {
        Some(right @ Token::IntLiteral(_)) => right.clone(),
        _ => return Err(format!("Failed to parse integer literal at {token_ptr}")),
    };
    // move on to next token
    token_ptr += 1;

    /* parse semicolon */
    match tokens.get(token_ptr) {
        Some(Token::SemiColon) => (),
        _ => return Err(format!("Failed to parse ';' at {token_ptr}")),
    }
    // move on to next token
    token_ptr += 1;

    // if there are any tokens left over, error
    if token_ptr != tokens.len() {
        return Err("There were too many tokens".to_owned());
    }

    Ok(AST {
        let_kw: Token::LetKw,
        var_ident,
        equals_sign: Token::EqualSign,
        body: Expr {
            left,
            sign,
            right,
        },
        semicolon: Token::SemiColon,
    })
}
}

There are a bunch of obvious problems with this implementation:

  1. Precise manipulation of a pointer into the tokens invites logic errors, since it gives us a lot of freedom, especially if we ever need to backtrack
  2. Very repetitive code. We could make it smaller with some abstractions, but it would still be quite repetitive for larger, more complex syntaxes.
  3. No mechanism for reusing parsing logic across types

Now, imagine a syntax like Rust's. Building parsing functions for it by hand is a tremendously laborious job that grows quickly with the complexity of the syntax.

Astray to the rescue

Luckily, Astray can help us with parsing functions.

Given any set of structs or enums representing an AST, Astray will generate type-safe parsing functions for each of those types. It allows you to compose types to generate complex ASTs without hassle.

So, for our previous AST definition, we would just have to add some macro annotations:

#![allow(unused)]
fn main() {
    #[derive(SN)]
    struct AST {
        #[pat(Token::LetKw)]
        let_kw: Token,
        #[extract(Token::Identifier(var_ident))]
        var_ident: String,
        #[pat(Token::EqualSign)]
        equals_sign: Token,
        body: Expr,
        #[pat(Token::SemiColon)]
        semicolon: Token,
    }


    #[derive(SN)]
    struct Expr {
        #[pat(Token::IntLiteral(_))]
        left: Token,
        sign: Sign,
        #[pat(Token::IntLiteral(_))]
        right: Token,
    }

    #[derive(SN)]
    enum Sign {
        #[pat(Token::Plus)]
        Add,
        #[pat(Token::Asterisk)]
        Mult,
    }
}

Now, instead of using comments to denote which Tokens are expected, we use #[pat(<token pattern>)]. We annotate each type with #[derive(SN)] to let Astray know to implement parsing functions for that particular type.

Now it's pretty easy to use our parser:

#![allow(unused)]
fn main() {
fn parser() {
    let tokens: Vec<Token> = lex("let a = 1 + 1;");
    // `parse` is now an associated function of AST
    let ast: Result<AST, ParseError<Token>> = AST::parse(tokens.into());
}

}

Feature breakdown

  • Parse a sequence of types, represented as a struct
  • Parse one of many possible types, represented as an enum
  • Pattern Matching on Tokens
  • Vec: for consuming multiple types or Tokens
  • Option: for consuming a type if it is there
  • Box: for consuming and heap allocating a type
  • (T,P): for tuples of types (only arity <=3 implemented for now)
  • Either<T,P>: from the either crate
  • NonEmpty: from the nonempty crate, allows you to consume a sequence of at least one type

For more details, keep reading the book!

Basics

So, Astray is a framework to develop parsing functions from Rust type definitions. It provides 2 basic components, each a separate crate: Astray Macro and Astray Core (see Project structure).

The core of Astray is the Parsable trait, which can be automatically derived with the SN ("Syntax Node") macro. Just annotating a type with #[derive(SN)] will auto-generate an implementation of Parsable, and with it a parsing function for that type, as we'll see in the next chapter.

Basic types

Almost any type can derive the SN macro. Doing so automatically implements the Parsable<T> trait, which comes with the parse associated function:

#![allow(unused)]
fn main() {
trait Parsable<T> {
    fn parse(token_iter: TokenIterator<T>) -> Result<Self, ParseError<T>>;
    /// other stuff we'll discuss later
}
}

A TokenIterator is an abstraction over a Vec of Tokens that can iterate bidirectionally and do a bunch of other useful things. We'll cover it later.

Before doing anything with Astray, the user must call the set_token!(<your_token_type>) macro to let Astray know which type will be considered a token by the parsing functions it generates.

structs and pattern matching

When a struct derives SN, Parsable is auto-implemented. You can call the parse function like this:

set_token!(Token);

#[derive(SN)]
struct Pair{
    l_element: Token,
    comma: Token,
    r_element: Token,
}

fn main() {
    // let tokens = lexer("a b c"); This will be parsed successfully as well
    let tokens = lexer("a , c");
    assert_eq!(tokens, vec![
        Token::Identifier("a".to_owned())
        Token::Comma
        Token::Identifier("c".to_owned())
    ])
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into())
}

Struct fields will be parsed top to bottom. If any of the fields cannot be parsed, the struct will fail parsing as well.

The code above will actually parse any set of 3 tokens, since we have not specified which tokens should be consumed. We can do so with pattern matching, which we'll see next.

I'm working on support for Tuple Structs: TODO: Insert issue number here

set_token!(Token);

#[derive(SN)]
struct TwoNumbers(Token, Token);

fn main() {
    let tokens = lexer("1 2");
    assert_eq!(tokens, vec![
        Token::IntLiteral(1),
        Token::IntLiteral(2),
    ]);

    let two_numbers: Result<TwoNumbers, ParseError<Token>> = TwoNumbers::parse(tokens.into());
}


Enums

Much like structs, enums can be used as consumable types. Astray will try to parse each of the enum's variants, from top to bottom. If all variants fail to parse, the enum fails to parse. If at least one variant does parse, the enum parsing succeeds. Any enum can derive SN, as long as it follows the guidelines in the notes below:

Enum Note 1: Unit variants must have a #[pat] attribute annotation

#![allow(unused)]
fn main() {
/*valid*/
#[derive(SN)]
enum Sign {
    #[pat(Token::Plus)]
    Plus,
    #[pat(Token::Minus)]
    Minus,
}

/*invalid, fails to compile*/
#[derive(SN)]
enum Sign {
    Plus,
    Minus,
}
}

Enum Note 2: Single Tuple variants only

Use single element tuple variants that contain a tuple instead of tuple variants with many elements.

#![allow(unused)]

fn main() {
#[derive(SN)]
enum Sign {
    // valid
    #[pat(Token::Plus)]
    Plus(Token),
    // valid: a single-element tuple variant
    #[pat(Token::Slash)]
    Div((Token)),
    // invalid, fails to compile: a multi-element tuple variant
    IntegerDiv(Token, Token),
    // valid: a single-element tuple variant containing a tuple; the pattern applies to the whole tuple
    #[pat((Token::Slash, Token::Slash))]
    IntegerDiv((Token, Token)),
}
}

struct variants are not supported yet

TODO: Add support for this and mention tracking issue here

Pattern Matching

You can pattern match on each field in a struct (or variant in an enum) to tell Astray that it should parse a specific instance of a Token (or type) for that specific field.

No Pattern matching

If no pattern is specified, then any instance of that type may be parsed. This includes Tokens:

set_token!(Token);

#[derive(SN)]
struct Pair{
    l_element: Token,
    comma: Token,
    r_element: Token,
}

fn main() {
    let tokens = lexer("a b c");
    assert_eq!(tokens, vec![
        Token::Identifier("a".to_owned())
        Token::Identifier("b".to_owned())
        Token::Identifier("c".to_owned())
    ])
    // Any three tokens will be successfully parsed. This is pretty useless
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into())
}

Of course, parsing Tokens without a specific pattern in mind is rarely useful.

With pattern matching

If you want to parse a specific instance of a type, annotate the desired field with the pattern that field is expected to match.

set_token!(Token);

#[derive(SN)]
struct Pair{
    #[pat(Token::Identifier(_))]
    l_element: Token,
    #[pat(Token::Comma)]
    comma: Token,
    #[pat(Token::Identifier(_))]
    r_element: Token,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Identifier("b".to_owned()),
        Token::Identifier("c".to_owned()),
    ];
    // result is Err: the second token does not match Token::Comma
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert!(pair.is_err());

    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: Token::Identifier("a".to_owned()),
        comma: Token::Comma,
        r_element: Token::Identifier("c".to_owned()),
    }));
}

Of course this works for all parsable types, not just tokens:

#[derive(SN)]
struct Expr {
    #[pat(Token::IntLiteral(_))]
    left: Token,
    #[pat(Sign::Add)]
    sign: Sign,
    #[pat(Token::IntLiteral(_))]
    right: Token,
}

#[derive(SN)]
enum Sign {
    #[pat(Token::Plus)]
    Add,
    #[pat(Token::Minus)]
    Sub,
}


fn main() {
    let tokens = [
        Token::IntLiteral(3),
        Token::Plus,
        Token::IntLiteral(2),
    ];
    let expr_result = Expr::parse(tokens.into());
    assert_eq!(expr_result, Ok(
        Expr {
            left: Token::IntLiteral(3),
            sign: Sign::Add,
            right: Token::IntLiteral(2),
        }
    ));

    let tokens = [
        Token::IntLiteral(3),
        Token::Minus,
        Token::IntLiteral(2),
    ];
    let expr_result = Expr::parse(tokens.into());
    // Does not parse: Expr expects specifically a Sign::Add, which cannot be parsed
    // when Token::Minus is present instead of Token::Plus
    assert!(expr_result.is_err());
}

Extract values

Currently a WIP: you can extract specific values from a matched pattern, should you want to keep only the inner values of a struct / enum in your AST.

set_token!(Token);

#[derive(SN)]
struct Pair {
    #[extract(Token::Identifier(l_element))]
    l_element: String,
    #[pat(Token::Comma)]
    comma: Token,
    #[extract(Token::Identifier(r_element))]
    r_element: String,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::Comma,
        r_element: "c".to_owned(),
    }));
}

Either this or that

As you'd expect, it's possible to use patterns with pipes to make Astray parse one of several possible alternatives for a single field. This can be a replacement for moving a type to a separate enum, and will very likely be faster. TODO: Benchmark this

set_token!(Token);

#[derive(SN)]
struct Pair{
    #[extract(Token::Identifier(l_element))]
    l_element: String,
    #[pat(Token::Comma | Token::SemiColon)]
    comma: Token,
    #[extract(Token::Identifier(r_element))]
    r_element: String,
}

fn main() {
    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::SemiColon,
        Token::Identifier("c".to_owned()),
    ];
    // Token::SemiColon matches the Token::Comma | Token::SemiColon pattern
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::SemiColon,
        r_element: "c".to_owned(),
    }));

    let tokens = [
        Token::Identifier("a".to_owned()),
        Token::Comma,
        Token::Identifier("c".to_owned()),
    ];
    let pair: Result<Pair, ParseError<Token>> = Pair::parse(tokens.into());
    assert_eq!(pair, Ok(Pair {
        l_element: "a".to_owned(),
        comma: Token::Comma,
        r_element: "c".to_owned(),
    }));
}

Custom Types

Errors

As a general rule, <P as Parsable>::parse(...) and <P as Parsable>::parse_if_match(...) both produce a Result<P, ParseError<T>>, where P: Parsable<T>.

This means that all parsing is fallible. If parsing does fail, a ParseError is produced. Check its definition below:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseErrorType<T>
where
    T: ConsumableToken,
{
    /* Since Tokens are just parsable types, this might be removed in the future*/
    UnexpectedToken { expected: T, found: T },
    /* When you run out of tokens mid parsing a type */
    NoMoreTokens,
    /* When a type can be parsed from the TokenIterator but it does not match the pattern that was applied to it */
    ParsedButUnmatching { err_msg: String }, 
    /**
     * Failed to parse a branch from a conjunct type
     *  This will happen for:
     * - fields /elements in a struct / tuple struct
     * - elements in a tuple
     * - the first element in a NonEmpty vec
     */
    ConjunctBranchParsingFailure { err_source: Box<ParseError<T>> },
    /**
     * Failed to parse a branch from a disjunct type
     *  This will happen for:
     * - variants in an enum
     * - the Left / Right alternatives of Either
     */
    DisjunctBranchParsingFailure { err_source: Vec<ParseError<T>> },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError<T>
where
    T: ConsumableToken,
{
    type_name: &'static str,
    failed_at: usize,
    pub failure_type: ParseErrorType<T>,
}
}

As you can see, a ParseError can have 5 different causes. Check the comments on each variant for further details.
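For example, a caller that wants custom error reporting can branch on the public failure_type field. A small sketch (assuming Token is the token type registered with set_token!):

#![allow(unused)]
fn main() {
fn report(err: &ParseError<Token>) {
    match &err.failure_type {
        ParseErrorType::NoMoreTokens =>
            println!("ran out of tokens mid-parse"),
        ParseErrorType::ParsedButUnmatching { err_msg } =>
            println!("parsed a value, but it did not match the pattern: {err_msg}"),
        ParseErrorType::DisjunctBranchParsingFailure { err_source } =>
            println!("all {} alternatives failed to parse", err_source.len()),
        other => println!("parsing failed: {other:?}"),
    }
}
}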

Astray Universal Rules

This is a complex project, so below are some rules/axioms/definitions that must be upheld throughout. If at any point the code/docs fail to uphold a single one of these, then there's a bug either in the code/docs, or in the rules.

Throughout the code and docs, AUR::N will be the way to reference the Nth rule, as specified below:

  1. Given any type T and a TokenIterator<T>, where T: Parsable<T>, T will be referred to as a Token.
  2. Given any type P: Parsable<T>, where T: Parsable<T>, P may be parsed from a TokenIterator. P will be called a "parsable type", or just a "parsable".
  3. Since T: Parsable<T> for all Tokens T, a Token may always be parsed from a TokenIterator.
    • This means a Token will always be parsable from a TokenIterator of itself.
  4. Parsable types are either Tokens, or composed of other parsable types through sum types or product types.
  5. Parsing may fail, so it always produces a Result<P, ParseError>, where P: Parsable.
  6. A failed parsing will always leave the TokenIterator at the position it was in before parsing was attempted (see the sketch after this list). This is true for all structs, enums and default implementations in Astray, and should remain true for any custom types the user implements.
  7. A successful parsing can either leave the iterator in the same place it was before parsing was attempted, or move the iterator along according to the length of the type it parsed. Check here if this is confusing.
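To illustrate rules 6 and 7, here is a conceptual sketch of that backtracking discipline. This is not Astray's actual TokenIterator; the names and methods are purely illustrative:

#![allow(unused)]
fn main() {
struct Cursor<T> {
    tokens: Vec<T>,
    pos: usize,
}

impl<T: Clone> Cursor<T> {
    fn next(&mut self) -> Option<T> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }

    // a parsing attempt remembers the starting position...
    fn attempt<P>(&mut self, parse: impl Fn(&mut Self) -> Option<P>) -> Option<P> {
        let start = self.pos;
        let result = parse(self);
        // ...and resets the cursor if parsing fails (AUR::6);
        // on success, the cursor stays where parsing left it (AUR::7)
        if result.is_none() {
            self.pos = start;
        }
        result
    }
}
}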

The Parsable trait

At the heart of Astray lies the Parsable<T> trait. Check its definition here. Parsable<T> marks a type as a consumable type. This means that, given a TokenIterator<T>, any type implementing Parsable<T> may be parsed from those tokens.

Its definition:

#![allow(unused)]


fn main() {
pub trait Parsable<T>: std::fmt::Debug
where
    T: Parsable<T>,
    Self: Sized,
    T: ConsumableToken

{
    type ApplyMatchTo: Parsable<T> = Self;

    fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>>;

    fn parse_if_match<F: Fn(&Self::ApplyMatchTo) -> bool>(
        iter: &mut TokenIter<T>,
        matches: F,
        pattern: Option<&'static str>
    ) -> Result<Self, ParseError<T>>
    where
        Self: Sized {
            todo!("parse_if_match not yet implemented for {:?}", Self::identifier());
        }
    

    fn identifier() -> &'static str {
        std::any::type_name::<Self>()
    }
}
}

Let's go through it step by step.

Trait declaration

#![allow(unused)]
fn main() {
// Any type that implements Parsable<T> must implement std::fmt::Debug
// This is necessary for building nice ParseErrors
pub trait Parsable<T>: std::fmt::Debug
where
    // T: Parsable<T>, meaning T is a Token as per Astray Rule # 1
    T: Parsable<T>,
    // Self is Sized is required, since parse and parse_if_match associated functions return Self
    Self: Sized,
    // This is just a marker trait, that might be removed in the future
    T: ConsumableToken,
{
    /* ... */
}
}

Associated Type

#![allow(unused)]
fn main() {
    type ApplyMatchTo: Parsable<T> = Self;
}

This is the type that patterns will be applied to when #[pat(<pattern>)] is used. Generally, it will be Self. However, for container types, ApplyMatchTo might be the contained type. ApplyMatchTo may be any type that makes sense for each specific implementor of Parsable. Check this page on implementing Parsable by hand for an example.
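As an illustration (a sketch, not Astray's actual source), a container implementation for Vec<P> could set ApplyMatchTo to the element type, so that a #[pat(...)] on a Vec<P> field constrains each parsed element rather than the Vec itself:

#![allow(unused)]
fn main() {
impl<T, P> Parsable<T> for Vec<P>
where
    T: Parsable<T> + ConsumableToken,
    P: Parsable<T>,
{
    // patterns applied to a Vec<P> field target each parsed element
    type ApplyMatchTo = P;

    fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>> {
        let mut result = vec![];
        // keep parsing Ps until one fails; zero elements is still a success
        while let Ok(p) = P::parse(iter) {
            result.push(p);
        }
        Ok(result)
    }
}
}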

parse function

#![allow(unused)]
fn main() {
    fn parse(iter: &mut TokenIter<T>) -> Result<Self, ParseError<T>>;
}

parse takes a &mut TokenIter<T>, which must be mut since the inner pointer in the iterator is moved depending on what the parsing function does. parse always returns a Result, meaning it is always fallible.
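For example, with the Sign enum from earlier chapters (assuming the From<Vec<Token>> conversion that the tokens.into() calls in previous examples rely on):

#![allow(unused)]
fn main() {
fn example() {
    let mut iter: TokenIter<Token> = vec![Token::Plus].into();
    // the iterator must be passed as &mut: parsing advances its inner pointer
    let sign: Result<Sign, ParseError<Token>> = Sign::parse(&mut iter);
    assert!(sign.is_ok());
}
}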

parse_if_match function

#![allow(unused)]
fn main() {
fn parse_if_match<F: Fn(&Self::ApplyMatchTo) -> bool>(
    _iter: &mut TokenIter<T>,
    _matches: F,
    _pattern: Option<&'static str>
) -> Result<Self, ParseError<T>>
where
    Self: Sized {
        todo!("parse_if_match not yet implemented for {:?}", Self::identifier());
    }
}

The parse_if_match function allows an implementor to restrict which types can be parsed according to a validating function, here named matches (TODO: might be renamed in the future). Ideally, we would be able to pass a pattern directly to this function, but Rust doesn't really have first-class support for patterns, so a Fn(&Self::ApplyMatchTo) -> bool does the trick. In practice, the function that is actually passed to parse_if_match is |ty| matches!(ty, <pattern>).
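Conceptually, a hand-written call would look like this sketch (the SN derive macro normally generates such calls for you):

#![allow(unused)]
fn main() {
fn parse_int_literal(iter: &mut TokenIter<Token>) -> Result<Token, ParseError<Token>> {
    // roughly what #[pat(Token::IntLiteral(_))] expands to
    Token::parse_if_match(
        iter,
        |t| matches!(t, Token::IntLiteral(_)),
        Some("Token::IntLiteral(_)"),
    )
}
}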

parse_if_match also receives an optional pattern string: a stringified version of the pattern. Since Rust doesn't have first-class support for patterns and the matching function itself cannot be printed, this string is what allows building useful error messages.

TODO: A default implementation is on the way.

Given a token_iterator: TokenIterator<T> and P: Parsable<T>:

  1. P shall be called a parsable type
  2. P may be parsed from the token iterator with P::parse(&mut token_iterator)
  3. P::parse(&mut token_iterator) always produces:
    • Ok(P {/*fields*/}) if parsing succeeds. The iterator is left at the position pointing to the token after the last token that was consumed for parsing P
    • Err(ParseError<T> /*different errors exist*/). In this case, the iterator is reset to the position it was before parsing was attempted
  4. T: Parsable<T> (with some caveats)
    • Calling <Token as Parsable<Token>>::parse(&mut token_iterator) just consumes the next token

Define a set of universal rules for Astray

Astray is complex and must necessarily follow some rules. These are being defined (WIP) at ./universal_rules.md

Implement Iterator for TokenIterator

  • Allows usage of iterator methods
  • replaces fn consume(&mut self) with fn next(&mut self), which is more intuitive in Rust land
  • Prevents the iterator from going backwards, of course, so it would not work for sum types

Functional macro for enum optimization

When an enum implements Parsable, its branches can have common parsing paths!

#![allow(unused)]
fn main() {
enum MyEnum {
    Variant1(Struct1),
    Variant2(Struct2),
}

struct Struct1 {
    #[pat(Token::Plus)]
    plus: Token,
    #[pat(Token::LiteralInt(_))]
    int: Token,
}

struct Struct2 {
    #[pat(Token::Plus)]
    plus: Token,
    #[pat(Token::LiteralString(_))]
    string: Token,
}
}
}

Given a TokenIterator over [Token::Plus, Token::LiteralString("something")], Astray tries to parse a Plus (for Struct1) and succeeds, then tries to parse a LiteralInt(_) and fails; it then tries to parse a Plus again (for Struct2) and then a LiteralString, which succeeds. Plus has been tried twice unnecessarily. The goal is to optimize this. I already have an idea for how to do it, which I will share as soon as I can.

Parsing visualization for the command line (bonus points if also works in the browser)

An animation of how parsing is happening in "real" time, showing how the parser works.

Documenting all functions in the code

Although they are (hopefully) rather self-explanatory, great Rusty documentation is missing from each function. I hope to add it in the future.

Updating nomenclature

Consumable types -> Parsable types. Partially done in the docs, WIP in the code.

Generic Errors instead of ParseError

Turning ParseError into a trait would allow users to specify what error type their parsing functions produce.

Implement FromIterator for TokenIterator

Makes it easier to create a TokenIterator, e.g. by collecting an iterator of tokens directly.

Update parse_if_match function

Parsers should be generic not only over the types they take, but also over their arity! I'm investigating possible solutions for this.

Add more advanced validation functions

#![allow(unused)]
fn main() {
struct A {
    /* These should get combined into a single validation function? Or maybe multiple validation functions... more experimentation is required */
    #[pat(Token::LiteralInt(_))]
    #[len(> 5)]
    field1: Vec<Token>, 
}
}

Perhaps renaming parse_if_match to parse_if_valid

Remove UnexpectedToken ParseErrorType variant

Since Tokens are to be treated as parsable types, under the specific restriction that Token: Parsable, the ConjunctBranchParsingFailure / DisjunctBranchParsingFailure variants should be used instead, depending on whether the token is an enum or a struct.

Rename ParseErrorType to ParseErrorReason
