boost.png (6897 bytes)Boost.MultiIndex Tutorial: Key extraction



Contents

Introduction

STL associative containers have a notion of key, albeit in a somewhat incipient form. So, the keys of such containers are identified by a nested type key_type; for std::sets and std::multisets, key_type coincides with value_type, i.e. the key is the element itself. std::map and std::multimap manage elements of type std::pair<const Key,T>, where the first member is the key. In either case, the process of obtaining the key from a given element is implicitly fixed and cannot be customized by the user.

Fixed key extraction mechanisms like those performed by STL associative containers do not scale well in the context of Boost.MultiIndex, where several indices share their value_type definition but might feature completely different lookup semantics. For this reason, Boost.MultiIndex formalizes the concept of a Key Extractor in order to make it explicit and controllable in the definition of key-based indices.

Intuitively speaking, a key extractor is a function object that accepts a reference to an element and returns its associated key. The formal concept also imposes some reasonable constraints about the stability of the process, in the sense that extractors are assumed to return the same key when passed the same element: this is in consonance with the informal understanding that keys are actually some "part" of the element and do not depend on external data.

Read/write key extractors

A key extractor is called read/write if it returns a non-constant reference to the key when passed a non-constant element, and it is called read-only otherwise. Boost.MultiIndex requires that the key extractor be read/write when using the modify_key member function of ordered and hashed indices. In all other situations, read-only extractors suffice. The section on advanced features of Boost.MultiIndex key extractors details which of the predefined key extractors are read/write.

Predefined key extractors

identity

The identity key extractor returns the entire base object as the associated key:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>

multi_index_container<
  int,
  indexed_by<
    ordered_unique<
      identity<int> // the key is the entire element
    >
  >
> cont;

member

member key extractors return a reference to a specified data field of the base object. For instance, in the following version of our familiar employee container:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/member.hpp>

typedef multi_index_container<
  employee,
  indexed_by<
    ordered_unique<identity<employee> >,
    ordered_non_unique<member<employee,std::string,&employee::name> >,
    ordered_unique<member<employee,int,&employee::ssnumber> >
  >
> employee_set;

the second and third indices use member extractors on employee::name and employee::ssnumber, respectively. The specification of an instantiation of member is simple yet a little contrived:

member<(base type),(key type),(pointer to member)>

It might seem that the first and second parameters are superfluous, since the type of the base object and of the associated data field are already implicit in the pointer to member argument: unfortunately, it is not possible to extract this information with current C++ mechanisms, which makes the syntax of member a little too verbose.

const_mem_fun and mem_fun

Sometimes, the key of an index is not a concrete data member of the element, but rather it is a value returned by a particular member function. This resembles the notion of calculated indices supported by some relational databases. Boost.MultiIndex supports this kind of key extraction through const_mem_fun. Consider the following container where sorting on the third index is based upon the length of the name field:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/mem_fun.hpp>

struct employee
{
  int         id;
  std::string name;

  employee(int id,const std::string& name):id(id),name(name){}

  bool operator<(const employee& e)const{return id<e.id;}

  // returns the length of the name field
  std::size_t name_length()const{return name.size();}
};

typedef multi_index_container<
  employee,
  indexed_by<
    // sort by employee::operator<
    ordered_unique<identity<employee> >,
    
    // sort by less<string> on name
    ordered_non_unique<member<employee,std::string,&employee::name> >,
    
    // sort by less<int> on name_length()
    ordered_non_unique<
      const_mem_fun<employee,std::size_t,&employee::name_length>
    >
  >
> employee_set;

const_mem_fun usage syntax is similar to that of member:

const_mem_fun<(base type),(key type),(pointer to member function)>

The member function referred to must be const, take no arguments and return a value of the specified key type. Almost always you will want to use a const member function, since elements in a multi_index_container are treated as constant, much as elements of an std::set. However, a mem_fun counterpart is provided for use with non-constant member functions, whose applicability is discussed on the paragraph on advanced features of Boost.MultiIndex key extractors.

Example 2 in the examples section provides a complete program showing how to use const_mem_fun.

global_fun

Whereas const_mem_fun and mem_fun are based on a given member function of the base type from where the key is extracted, global_fun takes a global function (or static member function) accepting the base type as its parameter and returning the key:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/global_fun.hpp>

struct rectangle
{
  int x0,y0;
  int x1,y1;
};

unsigned long area(const rectangle& r)
{
  return (unsigned long)(r.x1-r.x0)*(r.x1-r.x0)+
         (unsigned long)(r.y1-r.y0)*(r.y1-r.y0);
}

typedef multi_index_container<
  rectangle,
  indexed_by<
    // sort by increasing area
    ordered_non_unique<global_fun<const rectangle&,unsigned long,&area> >
  >
> rectangle_container;

The specification of global_fun obeys the following syntax:

global_fun<(argument type),(key type),(pointer to function)>

where the argument type and key type must match exactly those in the signature of the function used; for instance, in the example above the argument type is const rectangle&, without omitting the "const" and "&" parts. So, although most of the time the base type will be accepted by constant reference, global_fun is also prepared to take functions accepting their argument by value or by non-constant reference: this latter case cannot generally be used directly in the specification of multi_index_containers as their elements are treated as constant, but the section on advanced features of Boost.MultiIndex key extractors describes valid use cases of key extraction based on such functions with a non-constant reference argument.

Example 2 in the examples section uses gobal_fun.

User-defined key extractors

Although the predefined key extractors provided by Boost.MultiIndex are intended to serve most cases, the user can also provide her own key extractors in more exotic situations, as long as these conform to the Key Extractor concept.

// some record class
struct record
{
  boost::gregorian::date d;
  std::string            str;    
};

// extracts a record's year
struct record_year
{
  // result_type typedef required by Key Extractor concept
  typedef boost::gregorian::greg_year result_type; 
  
  result_type operator()(const record& r)const // operator() must be const
  {
    return r.d.year();
  }
};

// example of use of the previous key extractor
typedef multi_index_container<
  record,
  indexed_by<
    ordered_non_unique<record_year> // sorted by record's year
  >
> record_log;

Example 6 in the examples section applies some user-defined key extractors in a complex scenario where keys are accessed via pointers.

Composite keys

In relational databases, composite keys depend on two or more fields of a given table. The analogous concept in Boost.MultiIndex is modeled by means of composite_key, as shown in the example:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/composite_key.hpp>

struct phonebook_entry
{
  std::string family_name;
  std::string given_name;
  std::string phone_number;

  phonebook_entry(
    std::string family_name,
    std::string given_name,
    std::string phone_number):
    family_name(family_name),given_name(given_name),phone_number(phone_number)
  {}
};

// define a multi_index_container with a composite key on
// (family_name,given_name)
typedef multi_index_container<
  phonebook_entry,
  indexed_by<
    //non-unique as some subscribers might have more than one number
    ordered_non_unique< 
      composite_key<
        phonebook_entry,
        member<phonebook_entry,std::string,&phonebook_entry::family_name>,
        member<phonebook_entry,std::string,&phonebook_entry::given_name>
      >
    >,
    ordered_unique< // unique as numbers belong to only one subscriber
      member<phonebook_entry,std::string,&phonebook_entry::phone_number>
    >
  >
> phonebook;

composite_key accepts two or more key extractors on the same value (here, phonebook_entry). Lookup operations on a composite key are accomplished by passing tuples with the values searched:

phonebook pb;
...
// search for Dorothea White's number
phonebook::iterator it=pb.find(boost::make_tuple("White","Dorothea"));
std::string number=it->phone_number;

Composite keys are sorted by lexicographical order, i.e. sorting is performed by the first key, then the second key if the first one is equal, etc. This order allows for partial searches where only the first keys are specified:

phonebook pb;
...
// look for all Whites
std::pair<phonebook::iterator,phonebook::iterator> p=
  pb.equal_range(boost::make_tuple("White"));

As a notational convenience, when only the first key is specified it is possible to pass the argument directly without including it into a tuple:

phonebook pb;
...
// look for all Whites
std::pair<phonebook::iterator,phonebook::iterator> p=pb.equal_range("White");

On the other hand, partial searches without specifying the first keys are not allowed.

By default, the corresponding std::less predicate is used for each subkey of a composite key. Alternate comparison predicates can be specified with composite_key_compare:

// phonebook with given names in reverse order

typedef multi_index_container<
  phonebook_entry,
  indexed_by<
    ordered_non_unique<
      composite_key<
        phonebook_entry,
        member<phonebook_entry,std::string,&phonebook_entry::family_name>,
        member<phonebook_entry,std::string,&phonebook_entry::given_name>
      >,
      composite_key_compare<
        std::less<std::string>,   // family names sorted as by default
        std::greater<std::string> // given names reversed
      >
    >,
    ordered_unique<
      member<phonebook_entry,std::string,&phonebook_entry::phone_number>
    >
  >
> phonebook;

See example 7 in the examples section for an application of composite_key.

Composite keys and hashed indices

Composite keys can also be used with hashed indices in a straightforward manner:

struct street_entry
{
  // quadrant coordinates
  int x;
  int y;

  std::string name;

  street_entry(int x,int y,const std::string& name):x(x),y(y),name(name){}
};

typedef multi_index_container<
  street_entry,
  indexed_by<
    hashed_non_unique< // indexed by quadrant coordinates
      composite_key<
        street_entry,
        member<street_entry,int,&street_entry::x>,
        member<street_entry,int,&street_entry::y>
      >
    >,
    hashed_non_unique< // indexed by street name
      member<street_entry,std::string,&street_entry::name>
    >
  >
> street_locator;

street_locator sl;
...
void streets_in_quadrant(int x,int y)
{
  std::pair<street_locator::iterator,street_locator::iterator> p=
    sl.equal_range(boost::make_tuple(x,y));

  while(p.first!=p.second){
    std::cout<<p.first->name<<std::endl;
    ++p.first;
  }
}

Note that hashing is automatically taken care of: boost::hash is specialized to hash a composite key as a function of the boost::hash values of its elements. Should we need to specify different hash functions for the elements of a composite key, we can explicitly do so by using the composite_key_hash utility:

struct tuned_int_hash
{
  int operator()(int x)const
  {
    // specially tuned hash for this application
  }
};

typedef multi_index_container<
  street_entry,
  indexed_by<
    hashed_non_unique< // indexed by quadrant coordinates
      composite_key<
        street_entry,
        member<street_entry,int,&street_entry::x>,
        member<street_entry,int,&street_entry::y>
      >,
      composite_key_hash<
        tuned_int_hash,
        tuned_int_hash
      >
    >,
    hashed_non_unique< // indexed by street name
      member<street_entry,std::string,&street_entry::name>
    >
  >
> street_locator;

Also, equality of composite keys can be tuned with composite_key_equal_to, though in most cases the default equality predicate (relying on the std::equal_to instantiations for the element types) will be the right choice.

Unlike with ordered indices, we cannot perform partial searches specifying only the first elements of a composite key:

// try to locate streets in quadrants with x==0
// compile-time error: hashed indices do not allow such operations
std::pair<street_locator::iterator,street_locator::iterator> p=
  sl.equal_range(boost::make_tuple(0));

The reason for this limitation is quite logical: as the hash value of a composite key depends on all of its elements, it is impossible to calculate it from partial information.

Advanced features of Boost.MultiIndex key extractors

The Key Extractor concept allows the same object to extract keys from several different types, possibly through suitably defined overloads of operator():

// example of a name extractor from employee and employee *
struct name_extractor
{
  typedef std::string result_type;

  const result_type& operator()(const employee& e)const{return e.name;}
  result_type&       operator()(employee* e)const{return e->name;}
};

// name_extractor can handle elements of type employee...
typedef multi_index_container<
  employee,
  indexed_by<
    ordered_unique<name_extractor>
  >
> employee_set;

// ...as well as elements of type employee *
typedef multi_index_container<
  employee*,
  indexed_by<
    ordered_unique<name_extractor>
  >
> employee_ptr_set;

This possibility is fully exploited by predefined key extractors provided by Boost.MultiIndex, making it simpler to define multi_index_containers where elements are pointers or references to the actual objects. The following specifies a multi_index_container of pointers to employees sorted by their names.

typedef multi_index_container<
  employee *,
  indexed_by<
    ordered_non_unique<member<employee,std::string,&employee::name> > >
> employee_set;

Note that this is specified in exactly the same manner as a multi_index_container of actual employee objects: member takes care of the extra dereferencing needed to gain access to employee::name. A similar functionality is provided for interoperability with reference wrappers from Boost.Ref:

typedef multi_index_container<
  boost::reference_wrapper<const employee>,
  indexed_by<
    ordered_non_unique<member<employee,std::string,&employee::name> > >
> employee_set;

In fact, support for pointers is further extended to accept what we call chained pointers. Such a chained pointer is defined by induction as a raw or smart pointer or iterator to the actual element, to a reference wrapper of the element or to another chained pointer; that is, chained pointers are arbitrary compositions of pointer-like types ultimately dereferencing to the element from where the key is to be extracted. Examples of chained pointers to employee are:

In general, chained pointers with dereferencing distance greater than 1 are not likely to be used in a normal program, but they can arise in frameworks which construct "views" as multi_index_containers from preexisting multi_index_containers.

In order to present a short summary of the different usages of Boost.MultiIndex key extractors in the presence of reference wrappers and pointers, consider the following final type:

struct T
{
  int        i;
  const int  j;
  int        f()const;
  int        g();
  static int gf(const T&);
  static int gg(T&);
};

The table below lists the appropriate key extractors to be used for different pointer and reference wrapper types based on T, for each of its members.

Use cases for Boost.MultiIndex key extractors.
element type  key  key extractor applicable to
const elements?
read/write?
T i member<T,int,&T::i> yes yes
j member<T,const int,&T::j> yes no
f() const_mem_fun<T,int,&T::f> yes no
g() mem_fun<T,int,&T::g> no no
gf() global_fun<const T&,int,&T::gf> yes no
gg() global_fun<T&,int,&T::gg> no no
reference_wrapper<T> i member<T,int,&T::i> yes yes
j member<T,const int,&T::j> yes no
f() const_mem_fun<T,int,&T::f> yes no
g() mem_fun<T,int,&T::g> yes no
gf() global_fun<const T&,int,&T::gf> yes no
gg() global_fun<T&,int,&T::gg> yes no
reference_wrapper<const T> i member<T,const int,&T::i> yes no
j member<T,const int,&T::j> yes no
f() const_mem_fun<T,int,&T::f> yes no
g()  
gf() global_fun<const T&,int,&T::gf> yes no
gg()  
chained pointer to T
or to reference_wrapper<T>
i member<T,int,&T::i> yes yes
j member<T,const int,&T::j> yes no
f() const_mem_fun<T,int,&T::f> yes no
g() mem_fun<T,int,&T::g> yes no
gf() global_fun<const T&,int,&T::gf> yes no
gg() global_fun<T&,int,&T::gg> yes no
chained pointer to const T
or to reference_wrapper<const T>
i member<T,const int,&T::i> yes no
j member<T,const int,&T::j> yes no
f() const_mem_fun<T,int,&T::f> yes no
g()  
gf() global_fun<const T&,int,&T::gf> yes no
gg()  

The column "applicable to const elements?" states whether the corresponding key extractor can be used when passed constant elements (this relates to the elements specified in the first column, not the referenced T objects). The only negative cases are for T::g and T:gg when the elements are raw T objects, which make sense as we are dealing with a non-constant member function (T::g) and a function taking T by non-constant reference: this also implies that multi_index_containers of elements of T cannot be sorted by T::g or T::gg, because elements contained within a multi_index_container are treated as constant.

The column "read/write?" shows which combinations yield read/write key extractors.

Some care has to be taken to preserve const-correctness in the specification of member key extractors: in some sense, the const qualifier is carried along to the member part, even if that particular member is not defined as const. For instance, if the elements are of type const T *, sorting by T::i is not specified as member<const T,int,&T::i>, but rather as member<T,const int,&T::i>.

For practical demonstrations of use of these key extractors, refer to example 2 and example 6 in the examples section.




Revised June 11th 2007

© Copyright 2003-2007 Joaquín M López Muñoz. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)