Experimental type-safe easy_parser router

The problem

The express-like router was added to RESTinio in v.0.2. It’s an easy-to-use tool familiar to many developers. But the express-like router has several fundamental drawbacks that lead to various kinds of errors. The worst thing is that those errors can be detected only at run-time.

The propensity to errors

Let’s see a simple code snippet:

router->http_get("/api/v1/books/:id",
   [](const auto & req, auto params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["Id"]);
      ...
   });

There are several problems here and all of them will be detected only at run-time.

There is no explicit requirement on the format of the “id” parameter: it can be a number or a sequence of non-digit symbols. If “id” is not a number, an exception will be thrown by cast_to.

There is also a silly typo in the extraction of the “id” parameter’s value: “Id” instead of “id”. It’s just a typo, but such typos occur very often.

It seems that those errors can be easily fixed. For example:

router->http_get(R"(/api/v1/books/:id(\d+))",
   [](const auto & req, auto params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
      ...
   });

But there is still a more subtle bug: there is no limit on the number of digits in the “id” parameter, so “id” can contain a value that doesn’t fit into std::uint64_t. Because of that, a more accurate fix looks like this:

router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
   [](const auto & req, auto params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
      ...
   });
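The limit matters because std::uint64_t can hold at most 18446744073709551615, a 20-digit number, whereas any value with at most 10 digits always fits. The stand-alone sketch below does not use RESTinio: parse_book_id is a hypothetical helper built on the standard std::from_chars, which reports an out-of-range value as an error instead of silently producing a wrong number.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of RESTinio): converts a route fragment
// into std::uint64_t, returning std::nullopt instead of throwing.
std::optional<std::uint64_t> parse_book_id(std::string_view s) {
   std::uint64_t value{};
   const auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
   // Reject non-numbers (invalid_argument), values that don't fit
   // (result_out_of_range) and trailing non-digit characters.
   if(ec != std::errc{} || ptr != s.data() + s.size())
      return std::nullopt;
   return value;
}
```

Such a check still happens only at run-time, which is exactly the limitation discussed next.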

Unfortunately, we have no help from the compiler in the detection of such problems. It’s a pity.

A C++ compiler could help here, but the fundamental design of the express router prevents such help. This is because the express router was borrowed from a dynamically typed language, where the compiler does no type checking before execution.

The opacity and absence of type-safety

Consider this example:

// Type of the actual request handler.
class api_v1_handler {
   ...
public:
   auto on_get_book(
         const restinio::request_handle_t & req,
         restinio::router::route_params_t params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
      ...
   }

   auto on_get_book_version(
         const restinio::request_handle_t & req,
         restinio::router::route_params_t params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
      const auto ver_id = restinio::cast_to<std::string>(params["version"]);
      ...
   }

   auto on_get_author_books(
         const restinio::request_handle_t & req,
         restinio::router::route_params_t params) {
      const auto author = restinio::cast_to<std::string>(params["author"]);
      ...
   }
   ...
};

// The definition of routes and handlers.
auto handler = std::make_shared<api_v1_handler>(...);
router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
   [handler](const auto & req, auto params) {
      // Wrong method: on_get_book is expected here.
      return handler->on_get_book_version(req, params);
   });
router->http_get(R"(/api/v1/books/:id(\d{1,10})/versions/:version)",
   [handler](const auto & req, auto params) {
      // Wrong method: on_get_book_version is expected here.
      return handler->on_get_author_books(req, params);
   });
router->http_get(R"(/api/v1/:author)",
   [handler](const auto & req, auto params) {
      // Wrong method: on_get_author_books is expected here.
      return handler->on_get_book(req, params);
   });

Nothing prevents calling the wrong handler method for a particular route. Thus the express router allows calling on_get_book where on_get_author_books is expected. This is because all handlers have the same signature, and restinio::router::route_params_t plays the role of an untyped key-value map.

Unfortunately, the compiler can’t help us here because of a problem in the fundamental design of the express router: the parameters from a parsed route are provided as an untyped key-value map. So a map instance can easily be passed to the wrong handler, and that mistake can only be detected at run-time.

Another problem is the opacity of a request handler’s prototype. We see just a route_params_t in the prototype, but we don’t know which parameters the handler actually needs or what their types are. That information can only be obtained from the body of the handler, which makes the maintenance and extension of request handlers harder.
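The difference can be illustrated without RESTinio at all. In the sketch below all handler names are hypothetical: handlers that take an untyped map are interchangeable as far as the compiler is concerned, while handlers with typed parameters are checked at compile time, here via std::is_invocable_v.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical handlers, for illustration only.
// Untyped style: every handler takes the same key-value map.
using untyped_params = std::map<std::string, std::string>;
void on_get_book_untyped(const untyped_params &) {}
void on_get_author_books_untyped(const untyped_params &) {}

// Typed style: the prototype states what each handler needs.
void on_get_book_typed(std::uint64_t /*book_id*/) {}
void on_get_author_books_typed(const std::string & /*author*/) {}

// Any untyped handler accepts any map, so a mix-up compiles fine.
static_assert(std::is_invocable_v<decltype(&on_get_book_untyped), const untyped_params &>);
static_assert(std::is_invocable_v<decltype(&on_get_author_books_untyped), const untyped_params &>);

// A typed handler for a book id can't be called with an author name.
static_assert(!std::is_invocable_v<decltype(&on_get_book_typed), const std::string &>);
static_assert(std::is_invocable_v<decltype(&on_get_author_books_typed), const std::string &>);
```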

A type-safe router as a solution

Since v.0.6.6, RESTinio provides a type-safe router that is similar to the express router but works with typed parameters.

As a very simple example, this express router-based code:

router->http_get(R"(/api/v1/books/:id(\d{1,10}))",
   [](const auto & req, auto params) {
      const auto book_id = restinio::cast_to<std::uint64_t>(params["id"]);
      ...
   });

can be expressed with the new type-safe router this way:

namespace epr = restinio::router::easy_parser_router;

auto book_id_p = epr::non_negative_decimal_number_p<std::uint64_t>();
router->http_get(
   epr::path_to_params("/api/v1/books/", book_id_p),
   [](const auto & req, std::uint64_t book_id) {
      ...
   });

And the api_v1_handler example above can be rewritten this way:

// Type of the actual request handler.
class api_v1_handler {
   ...
public:
   using book_id_type = std::uint64_t;

   auto on_get_book(
         const restinio::request_handle_t & req,
         book_id_type book_id) {
      ...
   }

   auto on_get_book_version(
         const restinio::request_handle_t & req,
         book_id_type book_id,
         const std::string & ver_id) {
      ...
   }

   auto on_get_author_books(
         const restinio::request_handle_t & req,
         const std::string & author) {
      ...
   }
   ...
};

// The definition of routes and handlers.
namespace epr = restinio::router::easy_parser_router;

auto book_id_p = epr::non_negative_decimal_number_p<api_v1_handler::book_id_type>();
auto ver_id_p = epr::path_fragment_p();
auto author_p = epr::path_fragment_p();

auto handler = std::make_shared<api_v1_handler>(...);
router->http_get(
   epr::path_to_params("/api/v1/books/", book_id_p),
   [handler](const auto & req, auto book_id) {
      return handler->on_get_book(req, book_id);
   });
router->http_get(
   epr::path_to_params("/api/v1/books/", book_id_p, "/versions/", ver_id_p),
   [handler](const auto & req, auto book_id, const auto & ver_id) {
      return handler->on_get_book_version(req, book_id, ver_id);
   });
router->http_get(
   epr::path_to_params("/api/v1/", author_p),
   [handler](const auto & req, const auto & author) {
      return handler->on_get_author_books(req, author);
   });

In this variant we can’t call on_get_book for a route where on_get_author_books is expected: the parameter types don’t match, so the compiler rejects such code.

The easy_parser_router

The type-safe router mentioned above is represented by the restinio::router::easy_parser_router_t class and a set of helper functions from the restinio::router::easy_parser_router namespace.

To use easy_parser_router, do the following steps.

Include the restinio/router/easy_parser_router.hpp header file. Please note that this header is not included automatically by restinio/all.hpp, so it is necessary to write something like this:

#include <restinio/all.hpp>
#include <restinio/router/easy_parser_router.hpp>

Then the type restinio::router::easy_parser_router_t should be set as the request_handler_t type in the server’s traits:

struct my_traits : public restinio::default_traits_t {
   using request_handler_t = restinio::router::easy_parser_router_t;
};

// Or:
using my_traits = restinio::traits_t<
      restinio::asio_timer_manager_t,
      restinio::single_threaded_ostream_logger_t,
      restinio::router::easy_parser_router_t >;

Then an instance of easy_parser_router_t should be created and configured:

namespace epr = restinio::router::easy_parser_router;
auto router = std::make_unique<restinio::router::easy_parser_router_t>();
router->http_get(epr::path_to_params(...), ...);
router->http_post(epr::path_to_params(...), ...);
...

And then this instance should be passed to RESTinio server:

restinio::run(
   restinio::on_this_thread<my_traits>()
      .request_handler(std::move(router))
      ...
);

Setting up handlers for routes

The easy_parser_router_t class has a set of methods similar to that of the express_router_t class:

  • http_get for handlers of HTTP GET method;
  • http_head for handlers of HTTP HEAD method;
  • http_post for handlers of HTTP POST method;
  • http_put for handlers of HTTP PUT method;
  • http_delete for handlers of HTTP DELETE method;
  • add_handler for cases when the http_* methods mentioned above can’t be used;
  • non_matched_request_handler for the case when no handler is found for a particular request.

So the definition of route handlers for easy_parser-router looks similar to that for express-router:

auto router = std::make_unique<restinio::router::easy_parser_router_t>();
...
router->http_get(route1, handler1);
router->http_post(route1, handler2);
router->http_delete(route1, handler3);
router->add_handler(restinio::http_method_lock(), route1, handler4);
...
router->http_get(route2, handlerN);
router->http_post(route2, handlerM);
...
router->non_matched_request_handler(non_matched_handler);

The main difference from express-router is the description of routes. Express-router requires a route to be described as a string with a regular expression inside, while easy_parser-router uses a special DSL based on the easy_parser helper.

easy_parser_router DSL

There are two functions in the restinio::router::easy_parser_router namespace that should be used for the description of routes: path_to_params and path_to_tuple. Both accept the same kinds of arguments but require request handlers with different prototypes.

The path_to_params and path_to_tuple functions are variadic template functions that return an implementation-specific type. Each of them accepts a list of arguments where every argument is either a string (a string literal, an object of type std::string, or restinio::string_view_t) or a value producer. Every value producer gives a single parameter extracted from a route. For example:

namespace epr = restinio::router::easy_parser_router;

// Handler for a route without parameters inside.
router->http_get(
   epr::path_to_params("/api/v1/books"),
   [](const auto & req) {...});

// Handler for a route with one parameter inside.
router->http_get(
   epr::path_to_params("/api/v1/books/",
         // Producer for a value of the single parameter.
         epr::non_negative_decimal_number_p<std::uint64_t>()),
   [](const auto & req, std::uint64_t book_id) {...});

// Handler for a route with two parameters inside.
router->http_get(
   epr::path_to_params("/api/v1/books/",
         // Producer for a value of the first parameter.
         epr::non_negative_decimal_number_p<std::uint64_t>(),
         "/title/",
         // Producer for a value of the second parameter.
         epr::path_fragment_p()),
   [](const auto & req, std::uint64_t book_id, const std::string & title) {...});

When path_to_params is used for the description of a route then request-handler will receive every parameter from the route as a separate argument. If there are no parameters in a route then request-handler will receive just one argument: a request handle. Those cases are shown above.

When path_to_tuple is used then a request-handler will receive all parameters from the route as a single argument of type std::tuple<Vs...> where Vs... is a list of parameter types. If there are no parameters in the route then there will be a single argument of type std::tuple<>:

namespace epr = restinio::router::easy_parser_router;

// Handler for a route without parameters inside.
router->http_get(
   epr::path_to_tuple("/api/v1/books"),
   [](const auto & req, std::tuple<>) {...});

// Handler for a route with one parameter inside.
router->http_get(
   epr::path_to_tuple("/api/v1/books/",
         // Producer for a value of the single parameter.
         epr::non_negative_decimal_number_p<std::uint64_t>()),
   [](const auto & req, std::tuple<std::uint64_t> params) {...});

// Handler for a route with two parameters inside.
router->http_get(
   epr::path_to_tuple("/api/v1/books/",
         // Producer for a value of the first parameter.
         epr::non_negative_decimal_number_p<std::uint64_t>(),
         "/title/",
         // Producer for a value of the second parameter.
         epr::path_fragment_p()),
   [](const auto & req, std::tuple<std::uint64_t, std::string> params) {...});

Note: generic lambdas can, of course, be used here, and parameters can be accepted by const reference:

router->http_get(
   epr::path_to_tuple("/api/v1/books/",
         // Producer for a value of the first parameter.
         epr::non_negative_decimal_number_p<std::uint64_t>(),
         "/title/",
         // Producer for a value of the second parameter.
         epr::path_fragment_p()),
   [](const auto & req, const auto & params) {...});
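The two styles differ only in how the parsed values reach the handler. The following sketch is not RESTinio’s implementation: it uses a hypothetical fake_request type and std::apply to show one way a std::tuple of parsed values could be expanded into separate arguments (the path_to_params style) or passed as a single argument (the path_to_tuple style).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for restinio::request_handle_t.
struct fake_request { std::string target; };

// path_to_params style: every parsed value becomes a separate argument.
template<typename Handler, typename... Vs>
auto call_with_params(Handler && handler, const fake_request & req,
      const std::tuple<Vs...> & params) {
   return std::apply(
      [&](const auto &... values) { return handler(req, values...); },
      params);
}

// path_to_tuple style: the whole tuple is passed as a single argument.
template<typename Handler, typename... Vs>
auto call_with_tuple(Handler && handler, const fake_request & req,
      const std::tuple<Vs...> & params) {
   return handler(req, params);
}
```

With an empty std::tuple<>, call_with_params invokes the handler with the request only, which matches the "no parameters" case described above.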

Performance

Here are the results of the cmp_route_bench benchmark (described in the Performance section for express-router) as of 2020.04.10. Percentages are relative to the hardcoded variant:

# of threads | hardcoded  | easy_parser_router  | express-router (std) | express-router (PCRE)
------------ | ---------- | ------------------- | -------------------- | ---------------------
1            | 115,083.86 | 105,205.42 (91.42%) | 88,115.27 (76.57%)   | 102,601.51 (89.15%)
2            | 159,301.80 | 152,842.62 (95.95%) | 131,806.19 (82.74%)  | 143,969.74 (90.38%)
3            | 192,849.04 | 187,092.28 (97.01%) | 161,748.54 (83.87%)  | 177,840.71 (92.22%)
4            | 210,509.90 | 207,102.15 (98.38%) | 176,486.59 (83.84%)  | 193,072.76 (91.72%)

Benchmark environment:

  • CPU: 8x Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz;
  • Memory: 16343MB;
  • Operating System: Ubuntu 16.04.2 LTS;
  • Compiler: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1).

easy_parser

Intro

The easy_parser is an internal part of RESTinio, introduced in 0.6.1 for parsing HTTP fields. It is a template-based implementation of a PEG recursive-descent parser. It supports the main functionality of PEG and can express rather complex grammars in ordinary C++ code.

The easy_parser has been tested inside RESTinio and looks quite usable for real-world tasks. However, it’s not the simplest part of RESTinio, so if you encounter problems with easy_parser or have ideas on how to make it more expressive and easier to use, please let us know.

easy_parser and easy_parser_router namespaces

All easy_parser-related stuff is defined in the restinio::easy_parser namespace, but it is also available via the restinio::router::easy_parser_router namespace. Because this section covers easy_parser features useful for easy_parser_router, we will use a namespace alias epr defined as:

namespace epr = restinio::router::easy_parser_router;

Using easy_parser on its own, without easy_parser_router, is out of the scope of this document. If you want to use easy_parser’s functionality directly in your project and have questions, feel free to ask us.

The main easy_parser principle

The easy_parser is a rather complex tool, but there is one main principle in its design: easy_parser takes a set of rules, tries to apply them to an input string, and produces a single value if parsing succeeds.

It means that the result of successful parsing is always a single value, and that value is produced by a special entity called a producer.

Producers, consumers, transformers and clauses

The easy_parser has several ready-to-use basic producers with the _p suffix in their names, like symbol_p, digit_p, decimal_number_p, path_fragment_p. So for trivial syntax rules, it’s enough to use one of easy_parser’s producers. For example, for a rule like:

version = ['-'|'+'] NUMBER

a call to epr::decimal_number_p<int>() is enough to make the corresponding parser.
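To make the “single value or failure” principle concrete, here is a hand-written function for the version rule above. It only illustrates what a producer does and is not how decimal_number_p is implemented: the hypothetical decimal_number returns std::optional<int>, yielding one value on success or std::nullopt on failure, and advances the input only when it succeeds.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical producer for: version = ['-'|'+'] NUMBER
// On success it returns one value and advances `in`;
// on failure it returns std::nullopt and leaves `in` untouched.
std::optional<int> decimal_number(std::string_view & in) {
   std::string_view s = in;
   int sign = 1;
   if(!s.empty() && (s.front() == '-' || s.front() == '+')) {
      if(s.front() == '-') sign = -1;
      s.remove_prefix(1);
   }
   if(s.empty() || s.front() < '0' || s.front() > '9')
      return std::nullopt; // NUMBER needs at least one digit.
   int value = 0;
   while(!s.empty() && s.front() >= '0' && s.front() <= '9') {
      value = value * 10 + (s.front() - '0'); // Overflow isn't checked here.
      s.remove_prefix(1);
   }
   in = s;
   return sign * value;
}
```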

But the vast majority of practical cases are more complex and require handling several produced values. For example, consider this very simple grammar:

indicators = LETTER DIGIT

This grammar describes two-symbol strings like “c1”, “B2”, “D4”, and so on. After parsing such a string, we want to have a pair of values: the letter and the digit. This pair can be represented by a simple struct:

struct indicators {
   char class_;
   char level_;
};

The easy_parser has any_symbol_p and digit_p producers, so we can extract the first letter and the digit from the input string. But our task is to make an instance of type indicators, so we have to create that instance somehow and store the values produced by any_symbol_p and digit_p into it.

In the easy_parser DSL, this is encoded the following way:

epr::produce<indicators>(
   epr::any_symbol_p() >> &indicators::class_,
   epr::digit_p() >> &indicators::level_
);

We’ll discuss the interpretation of that code a bit later; for now, let’s concentrate on the meaning of a construct like producer >> dest.

Every produced value should be consumed somehow. The expression digit_p() >> &indicators::level_ tells that the value produced by the digit_p producer should be consumed as the value of the indicators::level_ member.

There are several ready-to-use consumers in easy_parser and we’ll see some of them later.
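Conceptually, a consumer such as &indicators::level_ is an operation that stores a produced value into the object being built. The following plain C++ sketch involves no easy_parser code; any_symbol, digit and consume_into are hypothetical names. It spells out the two clauses of the produce<indicators> example using a pointer to member.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

struct indicators {
   char class_;
   char level_;
};

// Hypothetical producers: return a value and advance the input, or fail.
std::optional<char> any_symbol(std::string_view & in) {
   if(in.empty()) return std::nullopt;
   const char c = in.front();
   in.remove_prefix(1);
   return c;
}

std::optional<char> digit(std::string_view & in) {
   if(in.empty() || !std::isdigit(static_cast<unsigned char>(in.front())))
      return std::nullopt;
   const char c = in.front();
   in.remove_prefix(1);
   return c;
}

// Hypothetical consumer: stores a produced value into a member of `to`,
// roughly what `producer >> &T::member` expresses in the DSL.
template<typename T, typename V>
void consume_into(T & to, V T::*member, V value) {
   to.*member = value;
}

// any_symbol_p() >> &indicators::class_, digit_p() >> &indicators::level_
std::optional<indicators> parse_indicators(std::string_view in) {
   indicators result{};
   const auto letter = any_symbol(in);
   if(!letter) return std::nullopt;
   consume_into(result, &indicators::class_, *letter);
   const auto d = digit(in);
   if(!d) return std::nullopt;
   consume_into(result, &indicators::level_, *d);
   return result;
}
```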

Sometimes a produced value has to be transformed somehow before the consumption. This is possible in easy_parser with the help of transformers. A transformer is a function that takes an input value and produces a new value from it, maybe with a different type.

Let’s make our example with the indicators struct a bit harder. In the first version shown above, indicators::class_ can contain both upper and lower case letters. In some circumstances, that’s inconvenient, and it’s better to store only lower case letters in the indicators::class_ member. In easy_parser we can do that by using the to_lower transformer:

epr::produce<indicators>(
   epr::any_symbol_p() >> epr::to_lower() >> &indicators::class_,
   epr::digit_p() >> &indicators::level_
);

In this example, the symbol produced by any_symbol_p is transformed into a new symbol by the to_lower transformer, and that new symbol is stored in the indicators::class_ member.

There can be any number of transformers in expressions like producer >> transformer1 >> transformer2 >> ... >> transformerN >> consumer. But it’s important to note that transformers in such expressions can only be placed between a producer and a consumer.

An expression like producer >> consumer or producer >> transformer >> consumer defines a clause. A clause is a part of the DSL that doesn’t produce a value by itself. Some values can be produced and consumed inside a clause, but a clause by itself is not a value producer.

A producer can be seen as a function with a non-void return type. A clause, in that case, is a void-returning function.

Clauses play a very important role because some parts of real-world grammars can’t be expressed only by producers+consumers, and clauses help in such cases.
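A clause that combines a producer, a transformer and a consumer can be pictured as one function that reports only success or failure, in line with the “void-returning function” view above (the sketch returns bool so that failure can be signaled). This is a hypothetical plain C++ rendering of any_symbol_p() >> to_lower() >> &indicators::class_, not easy_parser code.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

struct indicators {
   char class_;
   char level_;
};

// Hypothetical transformer: maps a produced char to a new char.
char to_lower(char c) {
   return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical clause "any_symbol_p() >> to_lower() >> &T::field":
// produce a value, transform it, consume it. The caller only learns
// whether the clause succeeded; the value itself goes into `to`.
template<typename T>
bool lower_symbol_into(std::string_view & in, T & to, char T::*field) {
   if(in.empty()) return false;       // The producer fails on empty input.
   const char produced = in.front();  // Producer: take any symbol.
   in.remove_prefix(1);
   to.*field = to_lower(produced);    // Transformer, then consumer.
   return true;
}
```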

The skip() consumer and ready-to-use clauses

Sometimes it’s necessary to skip a produced value. Let’s see the following grammar:

indicators = LETTER SPACE DIGIT

As a result of applying this rule, we should get a compound value that consists of a letter and a digit. We don’t need to store the space character between them. But every produced value in easy_parser should be consumed somehow, and produce accepts only a set of clauses. Because of that, we can’t write this simple code:

epr::produce<indicators>(
   epr::any_symbol_p() >> &indicators::class_,
   epr::space_p(),
   epr::digit_p() >> &indicators::level_);

The expression epr::space_p() is a producer here, not a clause. To make it a clause, we have to throw away the value produced by epr::space_p. This can be done by using the special skip consumer:

epr::produce<indicators>(
   epr::any_symbol_p() >> &indicators::class_,
   epr::space_p() >> epr::skip(),
   epr::digit_p() >> &indicators::level_);

The skip consumer is a special consumer that just ignores any previously produced value. It was added to easy_parser specifically to handle cases like this.

The case producer >> skip() is so widely used that there is a set of ready-to-use clauses in easy_parser that just use skip() under the hood. Thus, the expression space_p() >> skip() can be replaced by the space clause:

epr::produce<indicators>(
   epr::any_symbol_p() >> &indicators::class_,
   epr::space(),
   epr::digit_p() >> &indicators::level_);

Where space() is just a predefined shorthand for space_p() >> skip().

There are other ready-to-use clauses like space in easy_parser: symbol, caseless_symbol, digit, hexdigit, exact.
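In plain C++ terms, skip() is a consumer that accepts any value and does nothing with it, so attaching it to a producer leaves only the success or failure signal. A hypothetical sketch of what space_p() >> skip() boils down to (the names mirror the DSL, but this is not easy_parser code):

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical producer: yields the space character it matched.
std::optional<char> space_p(std::string_view & in) {
   if(in.empty() || in.front() != ' ') return std::nullopt;
   in.remove_prefix(1);
   return ' ';
}

// Hypothetical "skip" consumer: accepts any value and ignores it.
template<typename V>
void skip(const V &) {}

// Hypothetical clause "space_p() >> skip()": only reports success or failure.
bool space(std::string_view & in) {
   const auto v = space_p(in);
   if(!v) return false;
   skip(*v);
   return true;
}
```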

just() transformer

Sometimes it’s necessary to replace a value of one type with a specific value of another type. For example, consider the following simple grammar:

speed = ("in" | "out") SPACE NUMBER

This grammar describes strings like “in 4096” and “out 16000”.

We may want to parse those strings into values of the type:

enum class direction { in, out };
struct speed {
   direction dir_;
   unsigned int value_;
};

It means that we have to replace the substring “in” with the value direction::in, and the substring “out” with the value direction::out. One way to do so is to use the convert transformer plus a lambda function:

auto parse = epr::produce<speed>(
   epr::alternatives(
      epr::exact_p("in")
         >> epr::convert([](const auto &) // Ignore actual value.
            { return direction::in; })
         >> &speed::dir_,
      epr::exact_p("out")
         >> epr::convert([](const auto &) // Ignore actual value.
            { return direction::out; })
         >> &speed::dir_
   ),
   epr::space(),
   epr::non_negative_decimal_number_p<unsigned int>() >> &speed::value_);

Writing such simple convert transformers is boring, time-consuming and error-prone, so easy_parser has a special just transformer that allows writing the same thing in a more compact and precise way:

auto parse = epr::produce<speed>(
   epr::alternatives(
      epr::exact_p("in")  >> epr::just(direction::in)  >> &speed::dir_,
      epr::exact_p("out") >> epr::just(direction::out) >> &speed::dir_
   ),
   epr::space(),
   epr::non_negative_decimal_number_p<unsigned int>() >> &speed::value_);
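Stripped of the DSL, just(v) is simply a function that ignores its argument and returns a fixed value. The hypothetical sketch below (exact_p, just and parse_direction are illustrative names, not easy_parser’s implementation) maps a matched keyword to a direction value.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class direction { in, out };

// Hypothetical producer: matches an exact keyword and yields the matched text.
std::optional<std::string_view> exact_p(std::string_view & in, std::string_view kw) {
   if(in.substr(0, kw.size()) != kw) return std::nullopt;
   in.remove_prefix(kw.size());
   return kw;
}

// Hypothetical "just(v)" transformer: ignores its input and returns v.
template<typename V>
auto just(V v) {
   return [v](const auto & /*ignored*/) { return v; };
}

// ("in" | "out") with each keyword replaced by a direction value.
std::optional<direction> parse_direction(std::string_view & in) {
   if(const auto m = exact_p(in, "in")) return just(direction::in)(*m);
   if(const auto m = exact_p(in, "out")) return just(direction::out)(*m);
   return std::nullopt;
}
```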

How epr::produce() works

Let’s see this simple example again:

epr::produce<indicators>(
   epr::any_symbol_p() >> &indicators::class_,
   epr::digit_p() >> &indicators::level_
);

It’s important to understand what happens inside epr::produce. This understanding will help when we discuss the as_result and to_container consumers.

The easy_parser implements a recursive-descent parser. Because of that, the epr::produce shown above expands to something like this:

epr::expected_t<indicators, epr::parse_error_t>
try_produce__indicators__(epr::impl::input_t & source)
{
   indicators result; // An empty value of expected result type is created.

   // Try to handle the first part of expression.
   {
      epr::impl::any_symbol_producer_t producer;
      const auto r = producer.try_parse(source);
      if(!r) // Parsing failed.
         return make_unexpected(r.error());
      epr::impl::field_value_setter_t<&indicators::class_> consumer;
      consumer.consume(*r, result); // Consume the produced value.
   }

   // Try to handle the second part of expression.
   {
      epr::impl::digit_producer_t producer;
      const auto r = producer.try_parse(source);
      if(!r) // Parsing failed.
         return make_unexpected(r.error());
      epr::impl::field_value_setter_t<&indicators::level_> consumer;
      consumer.consume(*r, result); // Consume the produced value.
   }

   // No errors. Actual value can be returned.
   return result;
}

This is just a sketch, not the actual code behind epr::produce<T>, but it shows the whole principle of epr::produce<T>:

  • create an instance of T with the default constructor;
  • handle all nested clauses one by one, passing that instance of type T to every clause so that the clause can modify it. If any clause fails, the whole produce fails;
  • if all nested clauses are handled without errors, return that instance of type T as the result.

The main point is the presence of an instance of type T that is created inside produce() and is passed to every nested clause of that produce().

as_result() consumer

Sometimes we have to parse a string with several terms inside but only one term should be treated as the result value. For example, let’s see a grammar:

limit = "limit" [SPACE] ":" [SPACE] NUMBER SPACE "bytes"

This grammar describes strings like “limit:4096 bytes”, “limit : 4096 bytes”, “limit: 4096 bytes”, and so on.

To parse those strings, we may try to define a parser like this:

auto parser = epr::produce<unsigned int>(
   epr::exact("limit"),
   epr::maybe(epr::space()),
   epr::symbol(':'),
   epr::maybe(epr::space()),
   epr::non_negative_decimal_number_p<unsigned int>(),
   epr::space(),
   epr::exact("bytes"));

But this code won’t compile, because produce expects a set of clauses that don’t produce values, while non_negative_decimal_number_p() is a producer here. So we have to transform it into a clause.

The value returned by non_negative_decimal_number_p() should be used as the return value of the whole produce call. In that case, we can use the special as_result() consumer:

auto parser = epr::produce<unsigned int>(
   epr::exact("limit"),
   epr::maybe(epr::space()),
   epr::symbol(':'),
   epr::maybe(epr::space()),
   epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
   epr::space(),
   epr::exact("bytes"));
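The effect of as_result() can be sketched in plain C++: the result object exists from the start, and the clause marked with as_result() overwrites it as a whole instead of setting one of its members. All helper names below are hypothetical, and this sketch additionally requires the whole input to be consumed.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helpers (not easy_parser code); each advances `in` on success.
bool exact(std::string_view & in, std::string_view fragment) {
   if(in.substr(0, fragment.size()) != fragment) return false;
   in.remove_prefix(fragment.size());
   return true;
}

void maybe_space(std::string_view & in) {
   if(!in.empty() && in.front() == ' ') in.remove_prefix(1);
}

std::optional<unsigned> number_p(std::string_view & in) {
   if(in.empty() || in.front() < '0' || in.front() > '9') return std::nullopt;
   unsigned v = 0;
   while(!in.empty() && in.front() >= '0' && in.front() <= '9') {
      v = v * 10u + static_cast<unsigned>(in.front() - '0');
      in.remove_prefix(1);
   }
   return v;
}

// limit = "limit" [SPACE] ":" [SPACE] NUMBER SPACE "bytes"
std::optional<unsigned> parse_limit(std::string_view in) {
   unsigned result{};                      // The result object exists from the start.
   if(!exact(in, "limit")) return std::nullopt;
   maybe_space(in);
   if(!exact(in, ":")) return std::nullopt;
   maybe_space(in);
   const auto n = number_p(in);
   if(!n) return std::nullopt;
   result = *n;                            // as_result(): replace the whole result.
   if(!exact(in, " ") || !exact(in, "bytes")) return std::nullopt;
   if(!in.empty()) return std::nullopt;    // This sketch rejects trailing input.
   return result;
}
```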

The as_result consumer can be used not only with trivial types like int or char, but also with structs:

// A parser for grammar:
//
// communicator = "port=" ("default" | port_params)
// port_params = '(' NUMBER ':' NUMBER ',' NUMBER ')'
//
struct port_params {
   unsigned short port_index_;
   unsigned int in_speed_;
   unsigned int out_speed_;
};

auto parser = epr::produce<port_params>(
   epr::exact("port="),
   epr::alternatives(
      epr::exact_p("default")
         >> epr::just(port_params{10u, 4096u, 4096u})
         >> epr::as_result(),
      epr::produce<port_params>(
         epr::symbol('('),
         epr::non_negative_decimal_number_p<unsigned short>()
            >> &port_params::port_index_,
         epr::symbol(':'),
         epr::non_negative_decimal_number_p<unsigned int>()
            >> &port_params::in_speed_,
         epr::symbol(','),
         epr::non_negative_decimal_number_p<unsigned int>()
            >> &port_params::out_speed_,
         epr::symbol(')')
      ) >> epr::as_result()
   )
);

The as_result consumer can even be used with containers (we’ll discuss repeat and to_container below):

// A parser for grammar:
//
// communication_ports = "ports=" ("none" | port_params (',' port_params)*)
//
// port_params = '(' NUMBER ':' NUMBER ',' NUMBER ')'
//
struct port_params {
   unsigned short port_index_;
   unsigned int in_speed_;
   unsigned int out_speed_;
};

auto port_params_p = epr::produce<port_params>(
   epr::symbol('('),
   epr::non_negative_decimal_number_p<unsigned short>()
      >> &port_params::port_index_,
   epr::symbol(':'),
   epr::non_negative_decimal_number_p<unsigned int>()
      >> &port_params::in_speed_,
   epr::symbol(','),
   epr::non_negative_decimal_number_p<unsigned int>()
      >> &port_params::out_speed_,
   epr::symbol(')')
);
auto parser = epr::produce<std::vector<port_params>>(
   epr::exact("ports="),
   epr::alternatives(
      epr::exact_p("none")
         >> epr::just(std::vector<port_params>{})
         >> epr::as_result(),
      epr::produce<std::vector<port_params>>(
         port_params_p >> epr::to_container(),
         epr::repeat(0, epr::N,
            epr::symbol(','),
            port_params_p >> epr::to_container())
      ) >> epr::as_result()
   )
);

just_result() consumer

The just_result(v) consumer is just a shorthand for:

epr::just(v) >> epr::as_result()

So just_result() allows writing:

epr::alternatives(
   epr::exact_p("none")
      >> epr::just_result(std::vector<port_params>{}),

instead of:

epr::alternatives(
   epr::exact_p("none")
      >> epr::just(std::vector<port_params>{})
      >> epr::as_result(),

Support of PEG features

Alternatives

Alternatives in PEG grammars are supported via the alternatives clause. So the following grammar:

demo = A | B | C

can be expressed in easy_parser’s DSL as:

epr::produce<SomeType>(
   epr::alternatives(
      parse_A_clause,
      parse_B_clause,
      parse_C_clause)
);

For example:

// Grammar is:
//
// duration = NUMBER ("seconds" | "sec" | "s")
//
epr::produce<int>(
   epr::non_negative_decimal_number_p<int>() >> epr::as_result(),
   epr::alternatives(
      epr::exact("seconds"),
      epr::exact("sec"),
      epr::exact("s"))
);

Please note that alternatives() takes a list of clauses. It means that if some clause contains a value producer, that producer should be connected with a consumer.

// Grammar:
//
// book_id = (NUMBER | STRING)
//
using book_identity = std::variant<int, std::string>;

epr::produce<book_identity>(
   epr::alternatives(
      epr::non_negative_decimal_number_p<int>() >> epr::as_result(),
      epr::path_fragment_p() >> epr::as_result()
   )
);

Sometimes it may be necessary to pass a complex clause as an alternative to alternatives(). In that case, such a complex clause can be expressed via the sequence() helper function:

// A parser for:
//
// rev-id = ("hash-" STRING | "tag/" STRING)
//
epr::produce<std::string>(
   epr::alternatives(
      epr::sequence(
         epr::exact("hash-"),
         epr::path_fragment_p() >> epr::as_result()),
      epr::sequence(
         epr::exact("tag/"),
         epr::path_fragment_p() >> epr::as_result())
   )
);
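Alternatives in PEG are an ordered choice: they are tried in order from the same input position, and the first one that succeeds wins without later reconsideration. That is why the duration example above lists “seconds” before “sec” and “s”. The hypothetical first_of function below is a plain C++ sketch of this rule for keyword alternatives; it assumes, as PEG prescribes, that a failed alternative consumes no input.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <string_view>

// Hypothetical ordered choice over keywords: tries each alternative in turn
// from the same input position and commits to the first one that matches.
std::optional<std::string_view> first_of(
      std::string_view & in, std::initializer_list<std::string_view> alts) {
   for(const auto alt : alts) {
      if(in.substr(0, alt.size()) == alt) {
         in.remove_prefix(alt.size());
         return alt;
      }
   }
   return std::nullopt; // No alternative matched; `in` is unchanged.
}
```

Putting a short alternative first changes the result: with “s” listed before “sec”, the input “sec” matches only “s” and leaves “ec” unparsed.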

Optional clauses

Optional clauses in PEG grammars are supported via the maybe clause. So the following grammar:

demo = A [B] C

can be expressed in easy_parser’s DSL as:

epr::produce<SomeType>(
   parse_A_clause,
   epr::maybe(parse_B_clause),
   parse_C_clause);

For example:

// Grammar is:
//
// limit = "limit" [SPACE] ':' [SPACE] NUMBER [SPACE "bytes"]
//
auto parser = epr::produce<unsigned int>(
   epr::exact("limit"),
   epr::maybe(epr::space()),
   epr::symbol(':'),
   epr::maybe(epr::space()),
   epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
   epr::maybe(epr::space(), epr::exact("bytes"))
);

Please note that maybe() takes a list of clauses. It means that if some clause contains a value producer, that producer should be connected with a consumer.

// Grammar is
//
// duration = NUMBER ['.' NUMBER] ["s"]
//
struct duration {
   unsigned short integer_{0u};
   unsigned short fractional_{0u};
};
auto parser = epr::produce<duration>(
   epr::non_negative_decimal_number_p<unsigned short>()
      >> &duration::integer_,
   epr::maybe(
      epr::symbol('.'),
      epr::non_negative_decimal_number_p<unsigned short>()
         >> &duration::fractional_),
   epr::maybe(epr::symbol('s'))
);
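An optional part in PEG either matches completely or consumes nothing. The plain C++ sketch below (hypothetical names, not easy_parser code) implements the duration grammar above by trying the '.' NUMBER part on a copy of the input and committing only on success; as an extra assumption of this sketch, trailing input is rejected.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct duration {
   unsigned short integer_{0u};
   unsigned short fractional_{0u};
};

// Hypothetical NUMBER producer (overflow is not checked in this sketch).
std::optional<unsigned short> number_p(std::string_view & in) {
   if(in.empty() || in.front() < '0' || in.front() > '9') return std::nullopt;
   unsigned short v = 0;
   while(!in.empty() && in.front() >= '0' && in.front() <= '9') {
      v = static_cast<unsigned short>(v * 10 + (in.front() - '0'));
      in.remove_prefix(1);
   }
   return v;
}

// duration = NUMBER ['.' NUMBER] ["s"]
std::optional<duration> parse_duration(std::string_view in) {
   duration result;
   const auto integer = number_p(in);
   if(!integer) return std::nullopt;
   result.integer_ = *integer;

   // maybe('.' NUMBER): work on a copy and commit only if both parts match.
   std::string_view probe = in;
   if(!probe.empty() && probe.front() == '.') {
      probe.remove_prefix(1);
      if(const auto fractional = number_p(probe)) {
         result.fractional_ = *fractional;
         in = probe;
      }
   }

   // maybe('s')
   if(!in.empty() && in.front() == 's') in.remove_prefix(1);

   if(!in.empty()) return std::nullopt; // Trailing input is rejected here.
   return result;
}
```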

Repetitions

Repetitions are supported via the repeat clause. For example, PEG’s A+ (one or more repetitions) is expressed as epr::repeat(1, epr::N, parse_A_clause) and A* (zero or more repetitions) is expressed as epr::repeat(0, epr::N, parse_A_clause).

// This expression parses sequences like
//
// group-group-group-group
//
// where each group can contain from 2 to 8 hexadecimal digits.
//
epr::sequence(
   // For the first three groups.
   epr::repeat(3u, 3u,
      epr::repeat(2u, 8u, epr::hexdigit()),
      epr::symbol('-')
   ),
   // For the last group.
   epr::repeat(2u, 8u, epr::hexdigit())
);

The value epr::N is a special value that means an unlimited maximum number of occurrences.
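The semantics of repeat(min, max, ...) can be sketched in plain C++ as a greedy loop. In the following hypothetical sketch, N, repeat, hexdigit, symbol and four_groups are illustrative stand-ins, not easy_parser code. Each attempt runs on a copy of the input, so a failed attempt consumes nothing; the loop also stops when a clause succeeds without consuming input, a design choice of this sketch that avoids endless loops.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <string_view>

// Hypothetical stand-in for epr::N: no upper limit on repetitions.
constexpr std::size_t N = std::numeric_limits<std::size_t>::max();

// Applies `clause` greedily up to `max` times; succeeds if it matched at
// least `min` times. The input is advanced only on overall success.
template<typename Clause>
bool repeat(std::string_view & in, std::size_t min, std::size_t max, Clause clause) {
   std::string_view cur = in;
   std::size_t count = 0;
   while(count < max) {
      std::string_view attempt = cur;
      if(!clause(attempt)) break;            // Failed attempt: discard it.
      const bool progressed = attempt.size() != cur.size();
      cur = attempt;
      ++count;
      if(!progressed) break;                 // Avoid looping on empty matches.
   }
   if(count < min) return false;
   in = cur;
   return true;
}

bool hexdigit(std::string_view & in) {
   if(in.empty() || !std::isxdigit(static_cast<unsigned char>(in.front())))
      return false;
   in.remove_prefix(1);
   return true;
}

bool symbol(std::string_view & in, char c) {
   if(in.empty() || in.front() != c) return false;
   in.remove_prefix(1);
   return true;
}

// group-group-group-group, where each group has 2 to 8 hex digits.
bool four_groups(std::string_view in) {
   const auto group = [](std::string_view & s) { return repeat(s, 2, 8, hexdigit); };
   const auto group_and_dash = [&](std::string_view & s) {
      return group(s) && symbol(s, '-');
   };
   return repeat(in, 3, 3, group_and_dash) && group(in) && in.empty();
}
```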

The repeat clause expects a set of subclauses. It means that a producer can’t be used inside the repeat clause if the produced value is not consumed somehow. So this code is invalid and won’t compile:

// This expression parses sequences like
//
// group-group-group-group
//
// where each group can contain from 2 to 8 hexadecimal digits.
//
epr::sequence(
   // For the first three groups.
   epr::repeat(3u, 3u,
      // NOTE: hexdigit_p is a producer!
      epr::repeat(2u, 8u, epr::hexdigit_p()),
      epr::symbol('-')
   ),
   // For the last group.
   // NOTE: hexdigit_p is a producer!
   epr::repeat(2u, 8u, epr::hexdigit_p())
);

So the main trick in using the repeat clause is consuming the produced values. Usually, the special to_container consumer is used inside a repeat clause:

auto parser = epr::produce<std::vector<std::uint32_t>>(
   epr::repeat(3u, 3u,
      epr::hexadecimal_number_p<std::uint32_t>() >> epr::to_container(),
      epr::symbol('-')
   ),
   epr::hexadecimal_number_p<std::uint32_t>() >> epr::to_container()
);

The main difference between the as_result() and to_container() consumers is that as_result() sets the whole result value of the enclosing producer, while to_container() adds another item to the result (so the result is expected to be a container such as std::vector, std::map or std::string).
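This standalone sketch, written in plain C++17 without RESTinio, mirrors the vector parser shown above to illustrate what to_container() means. Each parsed number is appended to the result container, whereas an as_result()-like consumer would overwrite the whole result. The function names are hypothetical, and std::from_chars stands in for hexadecimal_number_p. The sketch also assumes that the whole input must be consumed.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: reads a hexadecimal std::uint32_t starting at pos.
std::optional<std::uint32_t> read_hex(std::string_view in, std::size_t & pos) {
   std::uint32_t value{};
   const auto [ptr, ec] =
         std::from_chars(in.data() + pos, in.data() + in.size(), value, 16);
   if (ec != std::errc{})
      return std::nullopt;
   pos = static_cast<std::size_t>(ptr - in.data());
   return value;
}

// Counterpart of produce<std::vector<std::uint32_t>>(...) above: every
// parsed number is appended to the vector (to_container-like behavior).
std::optional<std::vector<std::uint32_t>> parse_hex_groups(std::string_view in) {
   std::vector<std::uint32_t> result;
   std::size_t pos = 0u;
   for (int i = 0; i < 3; ++i) {
      const auto v = read_hex(in, pos);
      if (!v || pos >= in.size() || in[pos] != '-')
         return std::nullopt;
      result.push_back(*v);  // add another item, don't replace the result
      ++pos;                 // skip '-'
   }
   const auto last = read_hex(in, pos);
   if (!last || pos != in.size())
      return std::nullopt;
   result.push_back(*last);
   return result;
}
```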

and_clause as and-predicate

The PEG’s and-predicate is expressed via and_clause in easy_parser. For example, this simple grammar:

duration = NUMBER &(SPACE "sec")

can be represented as:

auto parser = epr::produce<unsigned int>(
   epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
   epr::and_clause(epr::space(), epr::exact("sec"))
);

Please note that and_clause accepts a list of clauses. If a producer is used inside and_clause, it must be connected to a consumer. But there is little sense in using producers inside and_clause, because and_clause doesn’t consume the matched input.
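The following hand-written C++17 sketch illustrates the and-predicate semantics for the grammar above. After the number is read, the parser only peeks at the following characters, so the reported number of consumed characters stops right after the digits. SPACE is assumed here to be exactly one ' ' character, and parse_result and parse_duration_sec are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <string_view>

struct parse_result {
   unsigned int value;
   std::size_t consumed;  // how many characters were actually consumed
};

// Hypothetical sketch of: duration = NUMBER &(SPACE "sec")
std::optional<parse_result> parse_duration_sec(std::string_view in) {
   std::size_t pos = 0u;
   unsigned int value = 0u;
   while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
      const auto digit = static_cast<unsigned int>(in[pos] - '0');
      if (value > (std::numeric_limits<unsigned int>::max() - digit) / 10u)
         return std::nullopt;  // overflow
      value = value * 10u + digit;
      ++pos;
   }
   if (pos == 0u)
      return std::nullopt;

   // And-predicate: check what follows, but don't move pos.
   if (in.substr(pos).substr(0, 4) != " sec")
      return std::nullopt;

   return parse_result{value, pos};
}
```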

not_clause as not-predicate

The PEG’s not-predicate is expressed via not_clause in easy_parser. For example, this simple grammar:

milliseconds = NUMBER !(SPACE "sec")

can be represented as:

auto parser = epr::produce<unsigned int>(
   epr::non_negative_decimal_number_p<unsigned int>() >> epr::as_result(),
   epr::not_clause(epr::space(), epr::exact("sec"))
);

Please note that not_clause accepts a list of clauses. If a producer is used inside not_clause, it must be connected to a consumer. But there is little sense in using producers inside not_clause, because not_clause doesn’t consume the matched input.
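The not-predicate can be sketched in plain C++17 as a lookahead whose result is inverted. The number is accepted only if it is not followed by " sec", and the lookahead never advances the position. The helpers looks_at and parse_milliseconds are hypothetical, and SPACE is assumed to be a single ' ' character.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <string_view>

// Returns true if `pattern` occurs at `pos`; never changes anything.
bool looks_at(std::string_view in, std::size_t pos, std::string_view pattern) {
   return in.substr(pos).substr(0, pattern.size()) == pattern;
}

// Hypothetical sketch of: milliseconds = NUMBER !(SPACE "sec")
// On success pos is advanced past the digits only; on failure it is restored.
std::optional<unsigned int> parse_milliseconds(std::string_view in, std::size_t & pos) {
   const std::size_t start = pos;
   unsigned int value = 0u;
   while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
      const auto digit = static_cast<unsigned int>(in[pos] - '0');
      if (value > (std::numeric_limits<unsigned int>::max() - digit) / 10u) {
         pos = start;  // overflow
         return std::nullopt;
      }
      value = value * 10u + digit;
      ++pos;
   }
   // Not-predicate: fail if the forbidden continuation is present.
   if (pos == start || looks_at(in, pos, " sec")) {
      pos = start;
      return std::nullopt;
   }
   return value;
}
```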

Where to find information about easy_parser’s ready-to-use stuff?

The easy_parser and easy_parser_router already contain a set of ready-to-use tools that can’t be discussed in this section due to lack of space. Information about those tools can be found in the API Reference Manual: see the contents of the restinio::easy_parser and restinio::easy_parser_router namespaces.

Some more complex examples

A router from long_output example

The long_output example shows how to create big responses by using chunked_output. This example uses a router that handles the following routes:

routes = '/' chunk-size [multiplier] '/' chunk-count
       | '/' chunk-size [multiplier]
       | '/'

chunk-size  = NUMBER
chunk-count = NUMBER

multiplier = ('b'|'B') | ('k'|'K') | ('m'|'M')

So if long_output receives a GET request for the / path, it uses the default number of chunks of the default size. If long_output receives a GET request for the /15k path, it uses the default number of chunks of 15 kilobytes each. If long_output receives a GET request for the /512/200 path, it responds with 200 chunks of 512 bytes each.

With the express router, those routes can be defined this way:

router->http_get("/",
      [&ctx](auto req, auto) {...});

router->http_get(
      R"(/:value(\d+):multiplier([MmKkBb]?))",
      [&ctx](auto req, auto params) {...});

router->http_get(
      R"(/:value(\d+):multiplier([MmKkBb]?)/:count(\d+))",
      [&ctx](auto req, auto params) {...});

With easy_parser_router things will be a bit more interesting…

The first way we can go is to mimic the express router and define three route handlers like this:

// To make things compact and clear.
using namespace restinio::router::easy_parser_router;

// This producer will be reused in several routes.
auto multiplier_p = produce<std::optional<char>>(
      maybe(
         alternatives(
            caseless_symbol_p('b') >> as_result(),
            caseless_symbol_p('k') >> as_result(),
            caseless_symbol_p('m') >> as_result()
         )
      )
   );

router->http_get(
      path_to_params("/"),
      [&ctx](auto req) {...});

router->http_get(
      path_to_params(
         "/",
         non_negative_decimal_number_p<std::size_t>(),
         multiplier_p
      ),
      [&ctx](auto req,
         std::size_t chunk_size,
         std::optional<char> multiplier) {...});

router->http_get(
      path_to_params(
         "/",
         non_negative_decimal_number_p<std::size_t>(),
         multiplier_p,
         "/",
         non_negative_decimal_number_p<std::size_t>()
      ),
      [&ctx](auto req,
         std::size_t chunk_size,
         std::optional<char> multiplier,
         std::size_t chunk_count) {...});

But this approach doesn’t look promising because it leaves some repetitive work to the route handlers. For example, the multiplier has to be handled in two separate handlers, and this handling will be the same in both of them. The default chunk count also has to be supplied in both the first and the second handler.

We can try to write just one route handler that will handle all corner cases.

To do that, we have to define a struct like this:

struct distribution_params
{
   std::size_t chunk_size_{100u*1024u};
   std::size_t count_{10000u};
};

An instance of that struct will be the result of the route parser. So we can write the parser in this form:

router->http_get(
   path_to_params(
      produce<distribution_params>(
         exact("/"),
         maybe(
            ..., // Some code related to "chunk-size [multiplier]"
                 // That code fills distribution_params::chunk_size_ member.
            maybe(
               exact("/"),
               non_negative_decimal_number_p<std::size_t>()
                  >> &distribution_params::count_
            )
         )
      )
   ),
   [&ctx](auto req,
      distribution_params params) {...});

The main problem is what to write in place of the ellipsis.

But before we dive into that problem, a note about the usage of exact("/") inside produce. String literals can be used directly only as parameters of the path_to_params or path_to_tuple functions. This is because path_to_params/path_to_tuple is a part of the easy_parser_router DSL, and that DSL treats string literals in a special way. But clauses inside produce belong to easy_parser’s DSL, and that DSL doesn’t understand string literals. So we have to wrap string literals in exact when using easy_parser’s DSL.

Now let’s go back to the problem of parsing the “chunk-size [multiplier]” part. If there were no multiplier part, we could write just:

produce<distribution_params>(
   exact("/"),
   maybe(
      non_negative_decimal_number_p<std::size_t>()
         >> &distribution_params::chunk_size_,
      maybe(
         exact("/"),
         non_negative_decimal_number_p<std::size_t>()
            >> &distribution_params::count_
      )
   )
)

But we have to handle the multiplier, and chunk_size_ should be set with respect to the multiplier value.

So we can use a trick here: extract two values (chunk-size and multiplier) and then transform this pair into a single value of type std::size_t.

A helper struct is necessary here (just a note: std::pair could be used instead, but a dedicated chunk_size struct makes things cleaner):

struct chunk_size { std::uint32_t c_{1u}, m_{1u}; };

The code that extracts chunk-size and multiplier into an instance of chunk_size could look like this:

produce<chunk_size>(
   non_negative_decimal_number_p<std::uint32_t>()
      >> &chunk_size::c_,
   maybe(
      produce<std::uint32_t>(
         alternatives(
            caseless_symbol_p('b') >> just_result(1u),
            caseless_symbol_p('k') >> just_result(1024u),
            caseless_symbol_p('m') >> just_result(1024u * 1024u)
         )
      ) >> &chunk_size::m_
   )
)
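In plain C++17 terms, the alternatives(...) with just_result(...) above behave roughly like the following switch. The first matching alternative yields a fixed factor, and the matched character itself is discarded. This is an illustrative sketch, not RESTinio code. multiplier_factor is a hypothetical name, and case-insensitive matching is emulated with std::tolower.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>

// Maps a multiplier letter to a fixed factor, like
// caseless_symbol_p('k') >> just_result(1024u) and its siblings.
std::optional<std::uint32_t> multiplier_factor(char ch) {
   switch (std::tolower(static_cast<unsigned char>(ch))) {
      case 'b': return 1u;
      case 'k': return 1024u;
      case 'm': return 1024u * 1024u;
      default:  return std::nullopt;  // no alternative matched
   }
}
```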

Now we have an instance of chunk_size and all we need is the transformation of that instance into a single std::size_t value. We can do that by using easy_parser’s convert transformer:

produce<chunk_size>(
   ...
) >> convert([](auto cs) { return std::size_t{cs.c_} * cs.m_; })
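The lambda passed to convert() is an ordinary function of chunk_size. Below is the same transformation written as a standalone C++17 function (to_bytes is a hypothetical name). Widening c_ to std::size_t before the multiplication matters. On a platform with a 64-bit std::size_t, the product of two std::uint32_t values cannot overflow. A plain std::uint32_t * std::uint32_t multiplication, by contrast, would wrap around for inputs such as 5000M.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct chunk_size { std::uint32_t c_{1u}, m_{1u}; };

// Same body as the lambda given to convert(): c_ is widened first,
// so the multiplication is done in std::size_t, not in std::uint32_t.
std::size_t to_bytes(chunk_size cs) {
   return std::size_t{cs.c_} * cs.m_;
}
```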

Now we have to store the transformed value into the distribution_params::chunk_size_ member:

produce<chunk_size>(
   ...
) >> convert(...)
  >> &distribution_params::chunk_size_

So the whole code for the long_output router will look like this:

router->http_get(
   path_to_params(
      produce<distribution_params>(
         exact("/"),
         maybe(
            produce<chunk_size>(
               non_negative_decimal_number_p<std::uint32_t>()
                  >> &chunk_size::c_,
               maybe(
                  produce<std::uint32_t>(
                     alternatives(
                        caseless_symbol_p('b') >> just_result(1u),
                        caseless_symbol_p('k') >> just_result(1024u),
                        caseless_symbol_p('m') >> just_result(1024u * 1024u)
                     )
                  ) >> &chunk_size::m_
               )
            ) >> convert([](auto cs) { return std::size_t{cs.c_} * cs.m_; })
               >> &distribution_params::chunk_size_,
            maybe(
               exact("/"),
               non_negative_decimal_number_p<std::size_t>()
                  >> &distribution_params::count_
            )
         )
      )
   ),
   [&ctx](auto req,
      distribution_params params) {...});
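To cross-check which paths the combined parser accepts, here is a hand-written C++17 sketch of the same grammar without RESTinio. It makes three assumptions: the whole path must be matched, the defaults come from distribution_params, and the multiplier letters map to 1, 1024 and 1024*1024 as in the parser above. parse_route and read_decimal are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

struct distribution_params {
   std::size_t chunk_size_{100u*1024u};
   std::size_t count_{10000u};
};

// Reads one or more decimal digits; fails if there are none or on overflow.
template <typename T>
std::optional<T> read_decimal(std::string_view in, std::size_t & pos) {
   const std::size_t start = pos;
   T value = 0;
   while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
      const T digit = static_cast<T>(in[pos] - '0');
      if (value > (std::numeric_limits<T>::max() - digit) / 10)
         return std::nullopt;
      value = static_cast<T>(value * 10 + digit);
      ++pos;
   }
   if (pos == start)
      return std::nullopt;
   return value;
}

// routes = '/' [chunk-size [multiplier] ['/' chunk-count]]
std::optional<distribution_params> parse_route(std::string_view path) {
   distribution_params result;
   if (path.empty() || path[0] != '/')
      return std::nullopt;
   std::size_t pos = 1u;

   std::size_t p = pos;  // tentative position for the outer maybe()
   if (const auto c = read_decimal<std::uint32_t>(path, p)) {
      std::uint32_t m = 1u;
      if (p < path.size()) {
         switch (path[p]) {
            case 'b': case 'B': m = 1u; ++p; break;
            case 'k': case 'K': m = 1024u; ++p; break;
            case 'm': case 'M': m = 1024u * 1024u; ++p; break;
            default: break;  // multiplier is optional
         }
      }
      result.chunk_size_ = std::size_t{*c} * m;
      pos = p;
      // Inner maybe('/' chunk-count).
      if (pos < path.size() && path[pos] == '/') {
         std::size_t q = pos + 1u;
         if (const auto n = read_decimal<std::size_t>(path, q)) {
            result.count_ = *n;
            pos = q;
         }
      }
   }
   if (pos != path.size())
      return std::nullopt;
   return result;
}
```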

Parsing of UUIDs in URL

Another interesting example is the parsing of UUID values specified in route paths. With the express router, a UUID in a route can be handled by a rather simple regular expression:

router->http_get(
   "/books/:id([A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12})",
   ...);
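For comparison, this standalone sketch checks the same regular expression with std::regex_match from the C++ standard library, outside of any router. std::regex_match succeeds only if the whole string matches. The function name looks_like_uuid is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern from the express-router route above, applied to a whole string.
bool looks_like_uuid(const std::string & s) {
   static const std::regex uuid_re{
         "[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}"};
   return std::regex_match(s, uuid_re);
}
```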

The easy_parser_router also allows handling such cases, but it may require some more work from the programmer. Here we’ll discuss two ways of extracting UUID values.

Extraction of UUID as a string

Let’s start with the simplest approach: extracting the UUID value as a string. It can look like this:

// To make things compact and clear.
using namespace restinio::router::easy_parser_router;

// The definition of the UUID parser. It will then be used in path_to_params.
auto uuid_p = produce<std::string>(
   repeat(8u, 8u, hexdigit_p() >> to_container()),
   symbol_p('-') >> to_container(),
   repeat(3u, 3u,
      repeat(4u, 4u, hexdigit_p() >> to_container()),
      symbol_p('-') >> to_container()),
   repeat(12u, 12u, hexdigit_p() >> to_container())
);

// The definition of route handler.
router->http_get(
   path_to_params("/books/", uuid_p),
   [](auto & req, const auto & uuid) {...});

The definition of uuid_p is a bit wordy, but it is straightforward. We expect 8 hexadecimal digits, then a hyphen, then three groups each containing 4 hexadecimal digits and a hyphen, and then 12 hexadecimal digits. The only thing worth mentioning is that every extracted symbol is stored in the result container. Even hyphens are stored; otherwise we’d get “12345678000011112222123456789abc” instead of “12345678-0000-1111-2222-123456789abc”.
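The following plain C++17 sketch does by hand what uuid_p is described as doing. It checks the 8-4-4-4-12 layout and appends every matched character, hyphens included, to the result string. parse_uuid_string is a hypothetical name, and the sketch assumes that the whole input must be consumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

std::optional<std::string> parse_uuid_string(std::string_view in) {
   // Group lengths of the textual 8-4-4-4-12 form.
   constexpr std::size_t groups[] = {8u, 4u, 4u, 4u, 12u};
   std::string result;
   std::size_t pos = 0u;
   for (std::size_t g = 0u; g < 5u; ++g) {
      if (g != 0u) {
         if (pos >= in.size() || in[pos] != '-')
            return std::nullopt;
         result.push_back(in[pos++]);  // the hyphen is stored too
      }
      for (std::size_t i = 0u; i < groups[g]; ++i) {
         if (pos >= in.size() || !std::isxdigit(static_cast<unsigned char>(in[pos])))
            return std::nullopt;
         result.push_back(in[pos++]);
      }
   }
   if (pos != in.size())
      return std::nullopt;
   return result;
}
```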

This definition of uuid_p can be slightly improved. In its current form, uuid_p stores UUID values exactly as they appear in the input. So if there is a mix of lower and upper case letters (like “abcd1234-bbbb-CCCC-dddd-12345678ABCD”), this mix will be kept as is. If this is not appropriate, we can tell uuid_p to automatically convert values to lower case:

auto uuid_p = produce<std::string>(
   repeat(8u, 8u, hexdigit_p() >> to_container()),
   symbol_p('-') >> to_container(),
   repeat(3u, 3u,
      repeat(4u, 4u, hexdigit_p() >> to_container()),
      symbol_p('-') >> to_container()),
   repeat(12u, 12u, hexdigit_p() >> to_container())
) >> to_lower(); // Now the extracted value will be converted
                 // to lower case.
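The lower-casing step can be sketched with standard algorithms only. to_lower_copy is a hypothetical helper, not easy_parser’s to_lower() transformer. Converting each character through unsigned char avoids undefined behavior in std::tolower for negative char values.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Converts every character to lower case; digits and hyphens are unchanged.
std::string to_lower_copy(std::string s) {
   std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
      return static_cast<char>(std::tolower(ch));
   });
   return s;
}
```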

The main drawback of this solution is the use of std::string to hold a small, fixed-size UUID value (the length of that value, 36 characters, can exceed the internal std::string buffer used for SSO). A dynamic allocation just to store 36 bytes is not a good idea. Can we avoid it?

Yes, we can use std::array<char, 36> instead of std::string. Let’s look at how the definition of uuid_p changes for std::array:

const auto uuid_p = produce< std::array<char, 36> >(
   repeat(8u, 8u, hexdigit_p() >> to_container()),
   symbol_p('-') >> to_container(),
   repeat(3u, 3u,
      repeat(4u, 4u, hexdigit_p() >> to_container()),
      symbol_p('-') >> to_container()),
   repeat(12u, 12u, hexdigit_p() >> to_container())
) >> to_lower();

The only change required is the replacement of std::string with std::array.
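Here is a hand-written C++17 sketch of the std::array variant. The textual UUID is validated and copied, lower-cased, into a std::array<char, 36> that lives inside the returned object, so no heap allocation is involved. uuid_chars and parse_uuid_array are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Fixed-size storage for the 36-character textual form of a UUID.
using uuid_chars = std::array<char, 36>;

std::optional<uuid_chars> parse_uuid_array(std::string_view in) {
   if (in.size() != 36u)
      return std::nullopt;
   uuid_chars result{};
   for (std::size_t i = 0u; i < 36u; ++i) {
      const auto ch = static_cast<unsigned char>(in[i]);
      // In the 8-4-4-4-12 layout hyphens are at indexes 8, 13, 18 and 23.
      const bool hyphen_pos = (i == 8u || i == 13u || i == 18u || i == 23u);
      if (hyphen_pos ? ch != '-' : !std::isxdigit(ch))
         return std::nullopt;
      result[i] = static_cast<char>(std::tolower(ch));  // store lower-cased
   }
   return result;
}
```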

Extraction of UUID as a struct with integers inside

If storing a UUID in the form of a string is not appropriate for some reason, we can make a parser that extracts the UUID value as a struct with integers inside. Such a struct can be defined this way:

struct uuid_t
{
   std::uint32_t time_low_;
   std::uint16_t time_mid_;
   std::uint16_t time_hi_and_version_;
   std::uint8_t clock_seq_hi_and_res_;
   std::uint8_t clock_seq_low_;
   std::array<std::uint8_t, 6> node_;
};

The extraction of a UUID value into such a struct can look like this:

// To make things compact and clear.
using namespace restinio::router::easy_parser_router;

// Helpers to be used in the uuid_p below.
const auto x_uint32_p =
      hexadecimal_number_p<std::uint32_t>(expected_digits(8));
const auto x_uint16_p =
      hexadecimal_number_p<std::uint16_t>(expected_digits(4));
const auto x_uint8_p =
      hexadecimal_number_p<std::uint8_t>(expected_digits(2));

// The parser for UUID.
const auto uuid_p = produce<uuid_t>(
   x_uint32_p >> &uuid_t::time_low_,
   symbol('-'),
   x_uint16_p >> &uuid_t::time_mid_,
   symbol('-'),
   x_uint16_p >> &uuid_t::time_hi_and_version_,
   symbol('-'),
   x_uint8_p >> &uuid_t::clock_seq_hi_and_res_,
   x_uint8_p >> &uuid_t::clock_seq_low_,
   symbol('-'),
   produce< std::array<std::uint8_t, 6> >(
      repeat( 6, 6, x_uint8_p >> to_container() )
   ) >> &uuid_t::node_
);

In this version, we store only numeric values and ignore all hyphens.
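The following standalone C++17 sketch shows the integer-based extraction. It uses std::from_chars with base 16 as a stand-in for hexadecimal_number_p with expected_digits. Each field must consist of exactly the expected number of hex digits, and hyphens are checked but not stored. The sketch renames the struct to uuid_fields to avoid clashing with a uuid_t that some platforms already define. read_hex_exact and parse_uuid_fields are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Same layout as uuid_t above.
struct uuid_fields {
   std::uint32_t time_low_;
   std::uint16_t time_mid_;
   std::uint16_t time_hi_and_version_;
   std::uint8_t clock_seq_hi_and_res_;
   std::uint8_t clock_seq_low_;
   std::array<std::uint8_t, 6> node_;
};

// Reads exactly `digits` hex digits at pos into out (no more, no fewer).
template <typename T>
bool read_hex_exact(std::string_view in, std::size_t & pos, std::size_t digits, T & out) {
   if (in.size() - pos < digits)
      return false;
   const char * first = in.data() + pos;
   const char * last = first + digits;
   const auto [ptr, ec] = std::from_chars(first, last, out, 16);
   if (ec != std::errc{} || ptr != last)
      return false;
   pos += digits;
   return true;
}

std::optional<uuid_fields> parse_uuid_fields(std::string_view in) {
   uuid_fields r{};
   std::size_t pos = 0u;
   auto hyphen = [&]() {  // checks and skips '-', nothing is stored
      if (pos >= in.size() || in[pos] != '-')
         return false;
      ++pos;
      return true;
   };
   if (!read_hex_exact(in, pos, 8u, r.time_low_) || !hyphen()
         || !read_hex_exact(in, pos, 4u, r.time_mid_) || !hyphen()
         || !read_hex_exact(in, pos, 4u, r.time_hi_and_version_) || !hyphen()
         || !read_hex_exact(in, pos, 2u, r.clock_seq_hi_and_res_)
         || !read_hex_exact(in, pos, 2u, r.clock_seq_low_) || !hyphen())
      return std::nullopt;
   for (auto & octet : r.node_)
      if (!read_hex_exact(in, pos, 2u, octet))
         return std::nullopt;
   if (pos != in.size())
      return std::nullopt;
   return r;
}
```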

Strictly speaking, there is no need to define helpers like x_uint32_p and x_uint16_p, but they make the definition of uuid_p much more readable.