The XML format is an industry standard for storing information in textual form. Unfortunately, there is no XML parser in Boost as of the time of this writing. The library therefore contains the fast and tiny RapidXML parser (currently in version 1.13) to provide XML parsing support. RapidXML does not fully support the XML standard; it is not capable of parsing DTDs and therefore cannot do full entity substitution.
By default, the parser preserves most whitespace but removes element content that consists only of whitespace. Encoded whitespace (e.g. &#32;) does not count as whitespace in this regard. Pass the trim_whitespace flag if you want all leading and trailing whitespace trimmed and every run of whitespace collapsed into a single space.
Please note that RapidXML does not understand the encoding specification. If you pass it a character buffer, it assumes the data is already correctly encoded; if you pass it a filename, it will read the file using the character conversion of the locale you give it (or the global locale if you give it none). This means that, in order to parse a UTF-8-encoded XML file into a wptree, you have to supply an alternate locale, either directly or by replacing the global one.
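XML / property tree conversion schema (read_xml and write_xml):
Each XML element corresponds to a property tree node. The child elements correspond to the children of the node.
The attributes of an XML element are stored in the subkey <xmlattr>. There is one child node per attribute in the attribute node. Existence of the <xmlattr> node is not guaranteed or necessary when there are no attributes.
XML comments are stored in nodes named <xmlcomment>, unless comment ignoring is enabled via the flags.
Text content is stored as the data of the element node by default. Depending on the flags, each piece of text can instead be stored in a separate child node named <xmltext>.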
The XML storage encoding does not round-trip perfectly. A read-write cycle loses trimmed whitespace, low-level formatting information, and the distinction between normal data and CDATA nodes. Comments are only preserved when enabled. A write-read cycle loses trimmed whitespace; that is, if the original tree has string data that starts or ends with whitespace, that whitespace is lost.
The JSON format is a data interchange format derived from the object literal notation of JavaScript. (JSON stands for JavaScript Object Notation.) JSON is a simple, compact format for loosely structured node trees of any depth, very similar to the property tree dataset. It is less structured than XML and has no schema support, but has the advantage of being simpler, smaller and typed without the need for a complex schema.
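The property tree dataset is not typed and does not support arrays as such. Thus, the following JSON / property tree mapping is used:
JSON objects are mapped to nodes. Each property is a child node.
JSON arrays are mapped to nodes. Each element is a child node with an empty name.
JSON values are mapped to nodes containing the value. All type information is lost: numbers and the literals true, false and null are stored in their string form.
Property tree nodes containing both child nodes and data cannot be mapped.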
JSON round-trips, except for the type information loss.
For example, this JSON:
{
   "menu": {
      "foo": true,
      "bar": "true",
      "value": 102.3E+06,
      "popup": [
         {"value": "New", "onclick": "CreateNewDoc()"},
         {"value": "Open", "onclick": "OpenDoc()"}
      ]
   }
}
will be translated into the following property tree:
menu
{
   foo true
   bar true
   value 102.3E+06
   popup
   {
      ""
      {
         value New
         onclick CreateNewDoc()
      }
      ""
      {
         value Open
         onclick OpenDoc()
      }
   }
}
The INI format was once widely used in the world of Windows. It is now deprecated, but is still used by a surprisingly large number of applications. The reason is probably its simplicity, together with the fact that Microsoft recommends the registry as a replacement, which not all developers want to use.
INI is a simple key-value format with a single level of sectioning. It is thus less rich than the property tree dataset, which means that not all property trees can be serialized as INI files.
The INI parser creates a tree node for every section, and a child node for every property in that section. All properties not in a section are directly added to the root node. Empty sections are ignored. (They don't round-trip, as described below.)
The INI serializer reverses this process. It first writes out every child of the root that contains data, but no child nodes, as properties. Then it creates a section for every child that contains child nodes, but no data. The children of the sections must only contain data. It is an error if the root node contains data, if any child of the root contains both data and child nodes, or if there are more than three levels of hierarchy. There must also not be any duplicate keys.
An empty tree node is assumed to be an empty property. There is no way to create empty sections.
Since the Windows INI parser discards trailing spaces and does not support quoting, the property tree parser follows this example. This means that property values containing trailing spaces do not round-trip.
The INFO format was created specifically for the property tree library. It provides a simple, efficient format that can be used to serialize property trees that are otherwise only stored in memory. It can also be used for any other purpose, although the lack of widespread existing use may prove to be an impediment.
INFO provides several features that make it familiar to C++ programmers and efficient for medium-sized datasets, especially those used for test input. It supports C-style character escapes, nesting via curly braces, and file inclusion via #include.
INFO is also used for visualization of property trees in this documentation.
A typical INFO file might look like this:
key1 value1
key2
{
   key3 value3
   {
      key4 "value4 with spaces"
   }
   key5 value5
}
Here's a more complicated file demonstrating all of INFO's features:
; A comment
key1 value1   ; Another comment
key2 "value with special characters in it {};#\n\t\"\0"
{
   subkey "value split "\
          "over three"\
          "lines"
   {
      a_key_without_value ""
      "a key with special characters in it {};#\n\t\"\0" ""
      "" value    ; Empty key with a value
      "" ""       ; Empty key with empty value!
   }
}
#include "file.info"    ; included file
INFO round-trips except for the loss of comments and include directives.