Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

The utree data structure

utree is a dynamically-typed hierarchical data structure that can represent abstract syntax trees. It's well integrated with Spirit.Qi and Spirit.Karma. utree can be passed as an attribute to almost any grammar. utree's type system is implemented through the use of a discriminated union and type punning.

utree has a minimal memory footprint. The data structure size is 16 bytes on a 32-bit platform, and 32 bytes on 64-bit a platform (4*sizeof(void*)). Being a container of itself, it can represent tree structures.

Each instance of an utree data structure can store exactly one of the following data types at a time:

struct utree_type
{
    enum info
    {
        invalid_type,       // the utree has not been initialized (it's 
                            // default constructed)
        nil_type,           // nil is the sentinel (empty) utree type.
        list_type,          // A doubly linked list of utrees.
        range_type,         // A range of list::iterators. 
        reference_type,     // A reference to another utree.
        any_type,           // A pointer or reference to any C++ type. 
        function_type,      // A utree holding a stored_function<F> object,
                            // where F is an unary function object taking a 
                            // utree as it's parameter and returning a
                            // utree.

        // numeric atoms
        bool_type,          // An utree holding a boolean value
        int_type,           // An utree holding a integer (int) value
        double_type,        // An utree holding a floating point (double) value

        // text atoms (utf8)
        string_type,        // An UTF-8 string 
        string_range_type,  // A pair of iterators into an UTF-8 string
        symbol_type,        // An UTF-8 symbol name

        binary_type         // Arbitrary binary data
    };
    typedef boost::uint_t<sizeof(info)*8>::exact exact_integral_type;
    typedef boost::uint_t<sizeof(info)*8>::fast fast_integral_type;
};

The UTF-8 string, UTF-8 symbol, and binary data types are internally stored either directly as the node data (small string optimization applied), or they are allocated from the heap, storing the pointer to the allocated data in the utree. The maximum possible length of the data to be stored in the node data depends on the platform the utree is compiled for. It is 14 bytes for a 32-bit platform and 30 bytes for a 64-bit platform.

Class Reference

The utree data structure is very versatile and can be used as an attribute for all possible Spirit.Qi parsers and Spirit.Karma generators. For this reason, it exposes a set of typedef's making it compatible with STL containers:

typedef utree value_type;
typedef utree& reference;
typedef utree const& const_reference;
typedef std::ptrdiff_t difference_type;
typedef std::size_t size_type;

typedef detail::list::node_iterator<utree> iterator;
typedef detail::list::node_iterator<utree const> const_iterator;

The utree data type exposes the functional interface of a bidirectional STL container. The iterators returned from begin() et.al. conform to the Standard requirements of a bidirectional iterator.

// STL Container interface

// insertion 
template <class T>
void push_back(T const&);
template <class T>
void push_front(T const&);
template <class T>
iterator insert(iterator, T const&);
template <class T>
void insert(iterator, std::size_t, T const&);
template <class Iterator>
void insert(iterator, Iterator, Iterator);

// erasure
void pop_front();
void pop_back();
iterator erase(iterator);
iterator erase(iterator, iterator);

// front access
reference front();
const_reference front() const;
iterator begin();
const_iterator begin() const;
ref_iterator ref_begin();

// back access
reference back();
const_reference back() const;
iterator end();
const_iterator end() const;
ref_iterator ref_end();

The exposed container interface makes the utree usable with all Spirit.Qi parser and Spirit.Karma generator components, which are compatible with an STL container attribute type.

A utree can be constructed or initialized from a wide range of data types, allowing to create utree instances for every possible node type (see the description of utree_type::info above). For this reason it exposes a constructor and an assignment operator for each of the allowed node types as shown below. All constructors are non-explicit on purpose, allowing to use an utree instance as the attribute to almost any Qi parser.

// This constructs an `invalid_type` node. When used in places
// where a boost::optional is expected (i.e. as an attribute for the 
// optional component), this represents the 'empty' state.
utree(invalid_type = invalid_type());

// This initializes a `nil_type` node, which represents a valid,
// 'initialized empty' utree (different from invalid_type!).
utree(nil_type);
reference operator=(nil_type);

// This initializes a `boolean_type` node, which can hold 'true' or
// 'false' only.
explicit utree(bool);
reference operator=(bool);

// This initializes an `integer_type` node, which can hold arbitrary 
// integers. For convenience these functions are overloaded for signed
// and unsigned integer types.
utree(unsigned int);
utree(int);
reference operator=(unsigned int);
reference operator=(int);

// This initializes a `double_type` node, which can hold arbitrary 
// floating point (double) values.
utree(double);
reference operator=(double);

// This initializes a `string_type` node, which can hold a narrow 
// character sequence (usually an UTF-8 string).
utree(char);
utree(char const*);
utree(char const*, std::size_t);
utree(std::string const&);
reference operator=(char);
reference operator=(char const*);
reference operator=(std::string const&);

// This constructs a `string_range_type` node, which does not copy the 
// data but stores the iterator range to the character sequence the 
// range has been initialized from.
utree(utf8_string_range_type const&, shallow_tag);

// This initializes a `reference_type` node, which holds a reference to 
// another utree node. All operations on such a node are automatically
// forwarded to the referenced utree instance.
utree(boost::reference_wrapper<utree>);
reference operator=(boost::reference_wrapper<utree>);

// This initializes an `any_type` node, which can hold a pointer to an
// instance of any type together with the typeid of that type. When 
// accessing that pointer the typeid will be checked, causing a 
// std::bad_cast to be thrown if the typeids do not match.
utree(any_ptr const&);
reference operator=(any_ptr const&);

// This initializes a `range_type` node, which holds an utree list node
// the elements of which are copy constructed (assigned) from the 
// elements referenced by the given range of iterators.
template <class Iterator>
utree(boost::iterator_range<Iterator>);
template <class Iterator>
reference operator=(boost::iterator_range<Iterator>);

// This initializes a `function_type` node from a polymorphic function
// object pointer (takes ownership) or reference. 
utree(function_base const&);
reference operator=(function_base const&);
utree(function_base*);
reference operator=(function_base*);

// This initializes either a `string_type`, a `symbol_type`, or a 
// `binary_type` node (depending on the template parameter `type_`), 
// which will hold the corresponding narrow character sequence (usually 
// an UTF-8 string).
template <class Base, utree_type::info type_>
utree(basic_string<Base, type_> const&);
template <class Base, utree_type::info type_>
reference operator=(basic_string<Base, type_> const&);

The utree data type exposes the functional interface compatible to Boost.Variant as well. Its very nature is to hold different data types, one at each point in time, making it functionally very similar to Boost.Variant.

// return the data type (`utree_type::info`) of the currently stored 
// data item
utree_type::info which() const;

// access the currently stored data in a type safe manner, this will 
// throw a `std::bad_cast()` if the currently stored data item is not 
// default convertible to `T`.
template <class T>
T get() const;

The exposed variant-like interface makes the utree usable with all Spirit.Qi parser and Spirit.Karma generator components, which are compatible with an Boost.Variant attribute type.

String Types

The utree string types described below are used by the utree API only. These are not used to store information in the utree itself. Their purpose is to refer to different internal utree node types only. For instance, creating a utree from a binary data type will create a binary_type utree node (see above).

The binary data type can be represented either verbatim as a sequence of bytes or as a pair of iterators into some other stored binary data sequence. Use this string type to access/create a binary_type utree.

typedef basic_string<
    boost::iterator_range<char const*>, utree_type::binary_type
> binary_range_type;
typedef basic_string<
    std::string, utree_type::binary_type
> binary_string_type;

The UTF-8 string can be represented either verbatim as a sequence of characters or as a pair of iterators into some other stored binary data sequence. Use this string type to access/create a string_type utree.

typedef basic_string<
    boost::iterator_range<char const*>, utree_type::string_type
> utf8_string_range_type;
typedef basic_string<
    std::string, utree_type::string_type
> utf8_string_type;

The UTF-8 symbol can be represented either verbatim as a sequence of characters or as a pair of iterators into some other stored binary data sequence. Use this string type to access/create a symbol_type utree.

typedef basic_string<
    boost::iterator_range<char const*>, utree_type::symbol_type
> utf8_symbol_range_type;
typedef basic_string<
    std::string, utree_type::symbol_type
> utf8_symbol_type;

Function Object Interface

The stored_function template class can to store a unary function objects with a signature of utree(scope const&) as a utree node.

struct function_base
{
    virtual ~function_base() {}
    virtual utree operator()(utree const& env) const = 0;
    virtual utree operator()(utree& env) const = 0;

    // Calling f.clone() must return a newly allocated function_base 
    // instance that is equal to f.
    virtual function_base* clone() const = 0;
};

template <typename F>
struct stored_function : function_base
{
    F f;
    stored_function(F f = F());
    virtual ~stored_function();
    virtual utree operator()(utree const& env) const;
    virtual utree operator()(utree& env) const;
    virtual function_base* clone() const;
};

template <typename F>
struct referenced_function : function_base
{
    F& f;
    referenced_function(F& f);
    virtual ~referenced_function();
    virtual utree operator()(utree const& env) const;
    virtual utree operator()(utree& env) const;
    virtual function_base* clone() const;
};

Exceptions

All exceptions thrown by utree are derived from utree_exception.

struct utree_exception : std::exception {};

The bad_type_exception is thrown whenever somebody calls a member function, which applies to certain stored utree_type's only, but this precondition is violated as the utree instance holds some other type.

struct bad_type_exception /*: utree_exception*/;

The empty_exception is thrown whenever a precondition of a list or range utree method is violated due to the list or range being empty.

struct empty_exception /*: utree_exception*/;

Example: Sexpr Parser

Our first example demonstrates how to use utree to write a parser for symbolic expressions. While utree is capable of representing just about any AST, utree's design is based on the simple yet powerful nature of symbolic expressions. This example introduces a number of basic and intermediate utree development techniques: using Spirit.Qi and Spirit.Karma integration, tracking source code locations and taking advantage of UTF8 support.

The source for this example can be found here: ../../example/support/utree.

namespace sexpr
{

template <typename Iterator, typename ErrorHandler = error_handler<Iterator> >
struct parser : qi::grammar<Iterator, utree(), whitespace<Iterator> >
{
    qi::rule<Iterator, utree(), whitespace<Iterator> >
        start, element, list;

    qi::rule<Iterator, utree()>
        atom;

    qi::rule<Iterator, int()>
        integer;

    qi::rule<Iterator, utf8_symbol_type()>
        symbol;

    qi::rule<Iterator, utree::nil_type()>
        nil_;

    qi::rule<Iterator, binary_string_type()>
        binary;

    utf8::parser<Iterator>
        string;

    px::function<ErrorHandler> const
        error;

    tagger<Iterator, save_line_pos>
        pos;

    parser(std::string const& source_file = "<string>"):
        parser::base_type(start), error(ErrorHandler(source_file))
    {
        using standard::char_;
        using qi::unused_type;
        using qi::lexeme;
        using qi::hex;
        using qi::oct;
        using qi::no_case;
        using qi::real_parser;
        using qi::strict_real_policies;
        using qi::uint_parser;
        using qi::bool_parser;
        using qi::on_error;
        using qi::fail;
        using qi::int_;
        using qi::lit;
        using qi::_val;
        using qi::_1;
        using qi::_2;
        using qi::_3;
        using qi::_4;

        real_parser<double, strict_real_policies<double> > strict_double;
        uint_parser<unsigned char, 16, 2, 2> hex2;
        bool_parser<bool, sexpr::bool_input_policies> boolean;

        start = element.alias();

        element = atom | list;

        list = pos(_val, '(') > *element > ')';

        atom = nil_
             | strict_double
             | integer
             | boolean
             | string
             | symbol
             | binary;

        nil_ = qi::attr_cast(lit("nil"));

        integer = lexeme[ no_case["#x"] >  hex]
                | lexeme[ no_case["#o"] >> oct]
                | lexeme[-no_case["#d"] >> int_];

        std::string exclude = std::string(" ();\"\x01-\x1f\x7f") + '\0';
        symbol = lexeme[+(~char_(exclude))];

        binary = lexeme['#' > *hex2 > '#'];

        start.name("sexpr");
        element.name("element");
        list.name("list");
        atom.name("atom");
        nil_.name("nil");
        integer.name("integer");
        symbol.name("symbol");
        binary.name("binary");

        on_error<fail>(start, error(_1, _2, _3, _4));
    }
};

} // sexpr


PrevUpHomeNext