How to write your own derive-macro in Rust

Derive macros are one of the three procedural macro types in Rust. Visit the rust-reference to learn more about their differences and use cases.
Once implemented, they add extra functionality to your code without you having to write it by hand. You simply add an “annotation”, if you will, to the source code, and the compiler generates the code at compile time, in the way the macro's author (here: we) has described it.

One derive macro Rustaceans might be familiar with is #[derive(Debug)], which provides a default implementation of the Debug trait. When added to a struct or enum, a text representation gets implemented for that data structure, which can be used in debug logging, for example.
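
As a quick reminder of what a derive looks like from the user's side, here is a minimal sketch. The Point struct and the debug_text function are made up for illustration.

```rust
// A small struct that opts into the compiler-provided Debug implementation.
// The fields are only read through Debug, hence the allow attribute.
#[allow(dead_code)]
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Formats a Point with `{:?}`, which uses the derived Debug implementation.
fn debug_text() -> String {
    let p = Point { x: 1, y: 2 };
    format!("{:?}", p)
}

fn main() {
    println!("{}", debug_text()); // prints: Point { x: 1, y: 2 }
}
```

Nobody wrote the formatting code for Point by hand; the derive generated it.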

And we are now writing our own derive macro!

The ToUrl macro

Note:
⚠️ This is not a macro for production! ⚠️ It is meant for training purposes only.
The entire code is on GitHub at https://github.com/niilz/to-url

ToUrl is supposed to work as follows:

When #[derive(ToUrl)] is added to a struct (only structs with named fields are supported), a
to_url(&self, base_url: String) -> String method gets implemented for that struct.
When called with a URL string (the base_url), this method first appends a ? (the start of the URL's query section) to the given URL. It then iterates over all fields and appends them in the form field=value to the URL string.
If the struct has more than one field, the pairs are concatenated with an & (ampersand).
Vecs get special treatment (only one-dimensional Vecs are supported):
Their values are all joined with a url-encoded space character (%20).

Usage Example

// This example would be in a different crate than *to-url* (because proc_macros must be defined in their own crates)

// Annotate a struct with ToUrl
#[derive(ToUrl)]
pub struct Request {
    response_type: String,
    client_id: String,
    scope: Vec<String>,
    redirect_uri: String,
    state: String,
    nonce: String,
}
// Create an instance of that struct
let dummy_req = Request {
    response_type: "code".to_string(),
    client_id: "1234andSomeText".to_string(),
    scope: vec!["openid".to_string(),
                "email".to_string(),
                "profile".to_string()],
    redirect_uri: "http://dummy-redirect.com".to_string(),
    state: "security_token0815".to_string(),
    nonce: "4242-3531".to_string(),
};

// Calling the to_url-method on the instance in the following way...
dummy_req.to_url("https://my-dummy-url".to_string());

// ...would create the following string:
/*
"https://my-dummy-url?\
    response_type=code&\
    client_id=1234andSomeText&\
    scope=openid%20email%20profile&\
    redirect_uri=http://dummy-redirect.com&\
    state=security_token0815&\
    nonce=4242-3531"
*/
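
For comparison, this is roughly what one would have to write by hand to get the same result without the macro. It is only a sketch of the intended behavior, not the code the macro actually generates, and dummy_request is a helper made up for this example.

```rust
pub struct Request {
    response_type: String,
    client_id: String,
    scope: Vec<String>,
    redirect_uri: String,
    state: String,
    nonce: String,
}

impl Request {
    // Hand-written stand-in for the method that #[derive(ToUrl)] is meant to provide.
    pub fn to_url(&self, base_url: String) -> String {
        format!(
            "{}?response_type={}&client_id={}&scope={}&redirect_uri={}&state={}&nonce={}",
            base_url,
            self.response_type,
            self.client_id,
            self.scope.join("%20"), // Vec values are joined with a url-encoded space
            self.redirect_uri,
            self.state,
            self.nonce,
        )
    }
}

// Builds the same instance as in the usage example.
fn dummy_request() -> Request {
    Request {
        response_type: "code".to_string(),
        client_id: "1234andSomeText".to_string(),
        scope: vec![
            "openid".to_string(),
            "email".to_string(),
            "profile".to_string(),
        ],
        redirect_uri: "http://dummy-redirect.com".to_string(),
        state: "security_token0815".to_string(),
        nonce: "4242-3531".to_string(),
    }
}

fn main() {
    println!("{}", dummy_request().to_url("https://my-dummy-url".to_string()));
}
```

Every new field would mean touching this method again, which is exactly the repetitive work the derive macro is supposed to take over.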

Implementation of the ToUrl derive macro

Create the library with cargo

// from the command line run
cargo new to-url --lib && cd to-url

configuration (Cargo.toml)

The dependencies and their versions are as follows:

[lib]
proc-macro = true

[dependencies]
syn = { version = "1.0", features = ["full", "extra-traits"] }
quote = "1.0"
proc-macro2 = "1.0"

Note that under [lib] we have specified that we want to create a procedural macro crate. At the time of writing, procedural macros have to live in their own crate.

The function definition

Proc macros are functions, and a derive macro has to be annotated with #[proc_macro_derive(NameOfYourMacro)]. The function receives a TokenStream, which is an abstract token representation of the source code that the macro has been added to. It’s not quite the code that we wrote anymore, but also not actual machine instructions yet.
Procedural macros use that TokenStream to create new code, again in the form of a TokenStream. A derive macro does not replace the item it is attached to: whatever it returns gets added next to the annotated struct.
In our case we call the macro ToUrl.

use proc_macro::TokenStream;

#[proc_macro_derive(ToUrl)]
pub fn to_url(tokens: TokenStream) -> TokenStream {
  /* implementation */
}

Parsing (syn) & Generation (quote)

Inside the function we make use of two amazing crates: syn for parsing and quote for code generation. With parse_macro_input! the TokenStream gets parsed into a DeriveInput, a syntax tree that is easy to walk and that provides additional helpful methods. To retrieve the name of the struct that our macro is derived for, we save the value of the input's ident field.

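use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(ToUrl)]
pub fn to_url(tokens: TokenStream) -> TokenStream {
    // Parse the raw tokens into a syntax tree of the annotated item
    let input = parse_macro_input!(tokens as DeriveInput);
    // The identifier of the annotated struct, e.g. Request
    let name = input.ident;

    /* rest of the implementation */
}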

To get to the fields I have chosen to match on the data field of the input. It looks a bit funky, but if you follow the types through the documentation, starting with the DeriveInput, you might notice that I walk the structure, only mentioning the parts that I am interested in. The part that we don’t care about is ignored with the double dots (..). The fields are of type Punctuated<Field, Comma> and live on an instance of type FieldsNamed.

    /* implementation before is skipped */

    let fields_punct = match input.data {
        Data::Struct(DataStruct {
            fields: Fields::Named(fields),
            ..
        }) => fields.named,
        _ => panic!("Only structs with named fields can be annotated with ToUrl"),
    };

    /* rest of the implementation */
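
This “only mention what you care about” style of matching is plain Rust pattern matching and not specific to syn. Here is a minimal sketch with simplified stand-in types. Data, DataStruct and Fields are made-up miniatures that only mimic the shape of syn's types, and named_fields is a hypothetical helper that returns an error instead of panicking.

```rust
// Simplified stand-ins, NOT syn's real definitions (assumption for illustration).
#[allow(dead_code)]
enum Data {
    Struct(DataStruct),
    Enum,
}

#[allow(dead_code)]
struct DataStruct {
    fields: Fields,
    has_semicolon: bool, // a part we are not interested in
}

#[allow(dead_code)]
enum Fields {
    Named(Vec<String>),
    Unit,
}

// Returns the field names of a struct with named fields, or an error message otherwise.
fn named_fields(data: Data) -> Result<Vec<String>, String> {
    match data {
        // Walk Data -> DataStruct -> Fields in a single pattern; `..` skips the rest.
        Data::Struct(DataStruct {
            fields: Fields::Named(names),
            ..
        }) => Ok(names),
        _ => Err("Only structs with named fields are supported".to_string()),
    }
}

fn main() {
    let data = Data::Struct(DataStruct {
        fields: Fields::Named(vec!["client_id".to_string(), "state".to_string()]),
        has_semicolon: false,
    });
    println!("{:?}", named_fields(data));
    println!("{:?}", named_fields(Data::Enum));
}
```

The nested pattern replaces two or three levels of separate match statements, and everything that does not fit falls through to the catch-all arm.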

Here comes the rest of the implementation of our derive macro function: the part where the code is generated. The generated code gets passed back to the compiler as a TokenStream.
The part that concatenates the fields and their values will be looked at further down. For now, it is abstracted away as a call to query_from_field_and_value(..). Just know that it gives us an Iterator over TokenStreams.

  
  /* implementation before is skipped */

    let query_parts = query_from_field_and_value(&fields_punct);

    let modified = quote! {
        impl #name {
            pub fn to_url(&self, base_url: String) -> String {

                let url = format!("{}?", base_url) #(#query_parts)*;

                url
            }
        }
    };
    TokenStream::from(modified)
}

quote! is a macro that lets us construct full TokenStreams, which the compiler understands, from the text we write inside of it. One must of course adhere to the rules of quote!. But it is much more convenient than constructing parsable trees by hand.
In the impl line, we use the name variable that we defined at the very top of our function. Bindings that are in scope but outside of the quote! call are referenced by prefixing them with the pound symbol (#). At compile time #name contains the name of the struct that our macro gets derived for. In the example that is Request. So the impl line actually gets expanded to:

impl Request {

Note: This version of the macro does not take into account that the struct might have lifetime annotations like <'a>. If we wanted to support structs with and without lifetimes, they would have to be taken into account in the code generation.
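
To see why this matters, here is a hand-written sketch of the impl block that would be needed for a struct with a lifetime. BorrowedRequest is a hypothetical variant of the example struct, and the method body is simplified. In a macro, syn's Generics::split_for_impl is a common way to obtain the pieces that go after impl and after the type name, but that is not part of the implementation shown in this article.

```rust
// Hypothetical variant of the example struct that borrows its data.
struct BorrowedRequest<'a> {
    client_id: &'a str,
    state: &'a str,
}

// A plain `impl BorrowedRequest { .. }` is rejected by the compiler: the lifetime
// must be declared after `impl` and repeated after the type name.
impl<'a> BorrowedRequest<'a> {
    fn to_url(&self, base_url: String) -> String {
        format!("{}?client_id={}&state={}", base_url, self.client_id, self.state)
    }
}

fn main() {
    let req = BorrowedRequest {
        client_id: "1234andSomeText",
        state: "security_token0815",
    };
    println!("{}", req.to_url("https://my-dummy-url".to_string()));
}
```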

Before we move on, let’s dissect the line
let url = format!("{}?", base_url) #(#query_parts)*;
The left-hand side merely defines a binding named url, to which we assign the right-hand side. The first part of the expression on the right, format!("{}?", base_url), evaluates to a String: the base_url extended with a question mark, the beginning of the query section. In the example this part turns into:
https://my-dummy-url?
Now comes the fun part: #(#query_parts)*, which combines the field=value pairs into one String.
In quote! we have the possibility to repeat patterns by using the library’s interpolation syntax. We are using the form #(#var)*, where #var turns into some expression and the pattern is repeated until there are no more elements in that #var. In our case (remember from above: query_from_field_and_value(..) returns an Iterator), the repetition ends when the Iterator is exhausted.

Constructing the Url

How does the URL string come together? Have a look at the beginning of the returned quote! call in the else block of query_from_field_and_value(..).

use quote::quote;
use syn::{punctuated::Punctuated, token::Comma, Field};

fn query_from_field_and_value(
    fields: &Punctuated<Field, Comma>,
) -> impl Iterator<Item = proc_macro2::TokenStream> + '_ {
    fields.iter().enumerate().map(move |(i, field)| {
        // Named fields always have an ident, so unwrap() cannot fail here
        let field_ident = field.ident.as_ref().unwrap();
        // Add an & after every field=value pair except the last one
        let delim = if i < fields.len() - 1 { "&" } else { "" };
        if is_vec(field) {
            join_values(field_ident)
        } else {
            quote! { + &format!("{}={}{}", stringify!(#field_ident), self.#field_ident, #delim) }
        }
    })
}

The implicitly returned quote! section in the else case starts with the “+” operator, followed by a call to the format! macro, which evaluates to nothing more than a String. This means that every repetition of the pattern #(#query_parts)* yields a “+ field=value” piece, which gets concatenated to the existing String by the “+”. No magic. Just adding Strings together.
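
Written out by hand as plain Rust, such a chain looks like the following sketch. It only covers two fields, and the expanded function with its parameters is made up for illustration.

```rust
// Hand-expanded version of the kind of expression the macro produces, for two fields.
fn expanded(base_url: String, response_type: &str, nonce: &str) -> String {
    // String + &str: the left operand is owned, every following operand is borrowed.
    // `&format!(..)` is a &String, which coerces to &str.
    let url = format!("{}?", base_url)
        + &format!("{}={}{}", "response_type", response_type, "&")
        + &format!("{}={}{}", "nonce", nonce, "");
    url
}

fn main() {
    println!(
        "{}",
        expanded("https://my-dummy-url".to_string(), "code", "4242-3531")
    );
}
```

The leading & in front of each format! call is what makes the chain type-check, because String implements Add<&str> but not Add<String>.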

// format!("{}?", base_url) #(#query_parts)* expands to something similar to this:
"https://my-dummy-url?" + "response_type=code&" + "client_id=1234andSomeText&" + ... + "nonce=4242-3531"

Three more things are worth mentioning.

  1. We use the stringify! macro. It turns the given token(s) into literal text, which means our field identifiers get turned into their String representation. If we used #field_ident.to_string() instead, the generated code would read client_id.to_string(), so the compiler would look for a variable with that name to call to_string() on its value. And of course it would complain that there is no such variable.
  2. We use self inside a function without having it as an argument! 😯 Pretty cool, right? This works because quote! only produces tokens: self is resolved where those tokens end up, which is inside the generated to_url(&self, ..) method. It took me quite a while to notice that in some other examples. It sure gives a lot more flexibility to separate some logic into its own concise section.
  3. This example uses the format! macro a lot and is tailored around String concatenation. This means field types have to implement std::fmt::Display, the trait behind {} formatting that also gives a type its to_string() method. This makes ToUrl quite limited. Even worse, there is no error handling around that fact. But like I said in the beginning: this is a training implementation.
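
Points 1 and 3 can be tried out without any macro crate. The following is a small sketch in which ClientId and pair are made up for illustration.

```rust
use std::fmt;

// Hypothetical field type that is not a String but implements Display.
struct ClientId(u32);

impl fmt::Display for ClientId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "client-{}", self.0)
    }
}

// Builds one field=value pair the same way the generated code does.
fn pair() -> String {
    let client_id = ClientId(42);
    // stringify! turns the identifier token into the text "client_id" at compile time;
    // the second argument is formatted through the Display implementation.
    format!("{}={}", stringify!(client_id), client_id)
}

fn main() {
    println!("{}", pair());
}
```

A field whose type does not implement Display would make the generated format! call fail to compile, with an error pointing into the generated code.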

And that’s pretty much it.
There are still the is_vec() and join_values() helper functions. They don’t do anything new regarding the derive macro topic and they are pretty specific to this concrete example (which has questionable applicability in its current form).
Nevertheless, here is what those two look like (I have implemented them outside of the macro function but in the same lib.rs file). As mentioned above, the entire ToUrl macro code is also available on GitHub.

fn is_vec(field: &Field) -> bool {
    match &field.ty {
        Type::Path(TypePath {
            path: Path { segments, .. },
            ..
        }) => {
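            // segments is of type syn::punctuated::Punctuated<PathSegment, _>.
            // Only the first segment is checked, so a fully qualified
            // std::vec::Vec<T> is not recognized as a Vec.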
            if let Some(path_seg) = segments.first() {
                let ident = &path_seg.ident;
                return ident == "Vec";
            }
            false
        }
        _ => false,
    }
}

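use syn::Ident;

// Not shown in the article: URL_SPACE is presumably a constant along these lines
const URL_SPACE: &str = "%20";

fn join_values(field_ident: &Ident) -> proc_macro2::TokenStream {
    let len = quote! { self.#field_ident.len() };
    // Join all values of the Vec with the url-encoded space
    let vec_values = quote! {
        self.#field_ident.iter().enumerate().fold(String::new(), |mut vals, (i, v)| {
            vals.push_str(v);
            if i < #len - 1 {
                vals.push_str(#URL_SPACE);
            }
            // The & is pushed after the last value unconditionally,
            // so a Vec in last position leaves a trailing & in the URL
            if i == #len - 1 {
                vals.push('&');
            }
            vals
        })
    };
    quote! {+ &format!("{}={}", stringify!(#field_ident), #vec_values)}
}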

Summary

Proc macros in Rust, especially derive macros in my opinion, are a fantastic concept that lets developers extend the language in very versatile ways. They are heavily used in many crates and make our lives as library users much easier.
They are also a rather advanced topic and not necessarily geared towards Rust beginners. Nevertheless, there are extremely helpful crates like syn and quote with great documentation.
Another valuable resource is the proc-macro-workshop by David Tolnay on GitHub.
And my favorite, the one that actually took away my fear of proc macros:
Procedural Macros in Rust Part 1 & Part 2 by Jon Gjengset, where he works through some of the exercises of the aforementioned proc-macro-workshop.

That’s it! I hope this was helpful or interesting to you. Thank you for reading.

Späters

niilz