Simbody  3.5
SimTK::Xml Class Reference

This class provides a minimalist capability for reading and writing XML documents, as files or strings.

Classes

class  Attribute
 Elements can have attributes, which are name="value" pairs that appear within the element start tag in an XML document; this class is the in-memory representation of one of those attributes and can be used to examine or modify the name or value.

class  attribute_iterator
 This is a bidirectional iterator suitable for moving forward or backward within a list of Attributes within an Element, for writable access.

class  Comment
 A comment contains only uninterpreted text.

class  Element
 An element has (1) a tag word, (2) a map of (name,value) pairs called attributes, and (3) a list of child nodes.

class  element_iterator
 This is a bidirectional iterator suitable for moving forward or backward within a list of Element nodes, for writable access.

class  Node
 Abstract handle for holding any kind of node in an XML tree.

class  node_iterator
 This is a bidirectional iterator suitable for moving forward or backward within a list of Nodes, for writable access.

class  Text
 This is the "leaf" content of an element.

class  Unknown
 This is something we don't understand but can carry around.
 

Public Types

enum  NodeType {
  NoNode = 0x00,
  ElementNode = 0x01,
  TextNode = 0x02,
  CommentNode = 0x04,
  UnknownNode = 0x08,
  NoJunkNodes = ElementNode|TextNode,
  JunkNodes = CommentNode|UnknownNode,
  AnyNodes = NoJunkNodes|JunkNodes
}
 The NodeType enum serves as the actual type of a node and as a filter for allowable node types during an iteration over nodes.

typedef Xml Document
 This typedef allows Xml::Document to be used as the type of the document which is more conventional than using just Xml, and provides future compatibility should we decide to upgrade Xml::Document to a class.
 

Public Member Functions

Construction

You can start with an empty Xml::Document or initialize it from a file.

 Xml ()
 Create an empty XML Document with default declaration and default root element with tag "_Root". (You should invoke this as Xml::Document() instead of just Xml().)

 Xml (const String &pathname)
 Create a new XML document and initialize it from the contents of the given file name. (You should invoke this as Xml::Document(pathname) instead of just Xml(pathname).)

 Xml (const Xml::Document &source)
 Copy constructor makes a deep copy of the entire source document; nothing is shared between the source and the copy.

Xml::Document & operator= (const Xml::Document &source)
 Copy assignment frees all heap space associated with the current Xml::Document and then makes a deep copy of the source document; nothing is shared between the source and the copy.

 ~Xml ()
 The destructor cleans up all heap space associated with this document.

void clear ()
 Restore this document to its default-constructed state.
 
Top-level node manipulation

These methods provide access to the top-level nodes, that is, those that are directly owned by the Xml::Document.

Comment and Unknown nodes are allowed anywhere at the top level, but Text nodes are not allowed and there is just one distinguished Element node, the root element. If you want to add Text or Element nodes, add them to the root element rather than at the document level.

Element getRootElement ()
 Return an Element handle referencing the top-level element in this Xml::Document, known as the "root element".

const String & getRootTag () const
 Shortcut for getting the tag word of the root element which is usually the document type.

void setRootTag (const String &tag)
 Shortcut for changing the tag word of the root element which is usually the document type.

void insertTopLevelNodeAfter (const node_iterator &afterThis, Node insertThis)
 Insert a top-level Comment or Unknown node just after the location indicated by the node_iterator, or at the end of the list if the iterator is node_end().

void insertTopLevelNodeBefore (const node_iterator &beforeThis, Node insertThis)
 Insert a top-level Comment or Unknown node just before the location indicated by the node_iterator.

void eraseTopLevelNode (const node_iterator &deleteThis)
 Delete the indicated top-level node, which must not be the root element, and must not be node_end().

Node removeTopLevelNode (const node_iterator &removeThis)
 Remove the indicated top-level node from the document, returning it as an orphan rather than erasing it.
 
Iteration through top-level nodes (rarely used)

If you want to run through this document's top-level nodes (of which the "root element" is one), these methods provide begin and end iterators.

By default you'll see all the nodes (types Comment, Unknown, and the lone top-level Element) but you can restrict the node types that you'll see via the NodeType mask. Iteration is rarely used at this top level since you almost never care about the Comment and Unknown nodes here, and you can get to the root element directly using getRootElement().

See also
getRootElement()
node_iterator node_begin (NodeType allowed=AnyNodes)
 Obtain an iterator to all the top-level nodes or a subset restricted via the allowed NodeType mask.

node_iterator node_end () const
 This node_end() iterator indicates the end of a sequence of nodes regardless of the NodeType restriction on the iterator being used.
 
XML Declaration attributes (rarely used)

These methods deal with the mysterious XML "declaration" line that comes at the beginning of every XML document; that is the line that begins with "<?xml" and ends with "?>".

There are at most three of these attributes and they have well-defined names that are always the same (default values shown):

  • version = "1.0": to what version of the XML standard does this document adhere?
  • encoding = "UTF-8": what Unicode encoding is used to represent the characters in this document? Typically this is UTF-8, an 8-bit encoding in which the first 128 codes match standard ASCII but other characters are represented by variable-length multibyte sequences.
  • standalone = "yes": can this document be correctly parsed without consulting other documents?

You can examine and change these attributes with the methods in this section; however, unless you really know what you're doing, you should just leave the declaration alone and you'll get reasonable behavior automatically.

String getXmlVersion () const
 Returns the Xml "version" attribute as a string (from the declaration line at the beginning of the document).

String getXmlEncoding () const
 Returns the Xml "encoding" attribute as a string (from the declaration line at the beginning of the document).

bool getXmlIsStandalone () const
 Returns the Xml "standalone" attribute as a bool (from the declaration line at the beginning of the document); default is true ("yes" in a file), meaning that the document can be parsed correctly without any other documents.

void setXmlVersion (const String &version)
 Set the Xml "version" attribute; this will be written to the "declaration" line which is first in any Xml document.

void setXmlEncoding (const String &encoding)
 Set the Xml "encoding" attribute; this doesn't affect the in-memory representation but can affect how the document gets written out.

void setXmlIsStandalone (bool isStandalone)
 Set the Xml "standalone" attribute; this is normally true (corresponding to standalone="yes") and won't appear in the declaration line in that case when we write it out.
 

Static Public Member Functions

static String getNodeTypeAsString (NodeType type)
 Translate a NodeType to a human-readable string.


Friends

class Node


Related Functions

(Note that these are not member functions.)

std::ostream & operator<< (std::ostream &o, const Xml::Document &doc)
 Output a "pretty printed" textual representation of the given Xml::Document to an std::ostream, using the document's current indent string for formatting.


Serializing and I/O

These methods convert between the in-memory representation of the XML document and its serialized form in files and strings.

static void setXmlCondenseWhiteSpace (bool shouldCondense)
 Set global mode to control whether white space is preserved or condensed down to a single space (affects all subsequent document reads; not document specific).

static bool isXmlWhiteSpaceCondensed ()
 Return the current setting of the global "condense white space" option.

void readFromFile (const String &pathname)
 Read the contents of this Xml::Document from the file whose pathname is supplied.

void writeToFile (const String &pathname) const
 Write the contents of this in-memory Xml::Document to the file whose pathname is supplied.

void readFromString (const String &xmlDocument)
 Read the contents of this Xml::Document from the supplied string.

void readFromString (const char *xmlDocument)
 Alternate form that reads from a null-terminated C string (char*) rather than a C++ string object.

void writeToString (String &xmlDocument, bool compact=false) const
 Write the contents of this in-memory Xml::Document to the supplied string.

void setIndentString (const String &indent)
 Set the string to be used for indentation when we produce a "pretty-printed" serialized form of this document. The default is to use four spaces for each level of indentation.

const String & getIndentString () const
 Return the current value of the indent string. The default is four spaces.
 

Detailed Description

This class provides a minimalist capability for reading and writing XML documents, as files or strings.

This is based with gratitude on the excellent open source XML parser TinyXML (http://www.grinninglizard.com/tinyxml/). Note that this is a non-validating parser, meaning it deals only with the XML file itself and not with a Document Type Definition (DTD), XML Schema, or any other description of the XML file's expected contents. Instead, the structure of your code that uses this class encodes the expected structure and contents of the XML document.

Our in-memory model of an XML document is simplified even further than TinyXML's. There is a lot to know about XML; you could start here: http://en.wikipedia.org/wiki/XML. However, everything you need to know in order to read and write XML documents with the SimTK::Xml class is described below.

Much of the detailed documentation is in the class Xml::Element; be sure to look there as well as at this overview.

Our in-memory model of an XML document

We consider an XML document to be a tree of "Nodes". There are only four types of nodes: Comments, Unknowns, Text, and Elements. Only Elements can contain other nodes: Text, Comments, Unknowns, and, recursively, child Elements. Elements can also have "Attributes", which are name="value" pairs (not nodes).

The XML document as a whole is represented by an object of class Xml::Document. The Xml::Document object directly contains a short list of nodes, consisting only of Comments, Unknowns, and a single Element called the "root element". The tag word associated with the root element is called the "root tag" and conventionally identifies the kind of document this is. For example, XML files produced by VTK begin with a root tag "<VTKFile>".

We take some pains to make sure every Xml::Document fits the above model so that you don't have to think about anything else. For example, if the file as read in has multiple root-level elements, or has document-level text, we will enclose all the element and text nodes within the document start tag "<_Root>" and end tag "</_Root>", thus making it fit the description above. We call this "canonicalizing" the document.

Value Elements

Element nodes can be classified into "value elements" and "compound elements". A value element is a "leaf" element (no child elements) that contains at most one Text node. For example, a document might contain value elements like these:

<name>John Doe</name>
<rating>7.2</rating>
<winnings currency="euro">3429</winnings>
<preferences/>
<vector>1.2 -4 2e-3</vector>

All of these have a unique value so it makes sense to talk about "the" value of these elements (the empty "preferences" element has a null value). These are very common in XML documents, and the Xml::Element class makes them very easy to work with. For example, if Element elt is the "<vector>" element from the example, you could retrieve its value as a Vec3 like this:

Vec3 v = elt.getValueAs<Vec3>();

This would automatically throw an error if the element wasn't a value element or if its value didn't have the right format to convert to a Vec3.
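The conversion can be pictured with the standalone sketch below, which does not use Simbody. The helper parseVec3() is hypothetical: it reads exactly three white-space-separated numbers and throws std::runtime_error otherwise. The real getValueAs<Vec3>() throws a SimTK exception instead, and the full set of text formats it accepts is not described on this page.

```cpp
#include <array>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Simbody): convert the text of a value
// element such as <vector>1.2 -4 2e-3</vector> into three doubles.
std::array<double, 3> parseVec3(const std::string& text) {
    std::istringstream in(text);
    std::array<double, 3> v{};
    for (double& x : v) {
        if (!(in >> x))  // missing or non-numeric entry
            throw std::runtime_error("expected 3 numbers in \"" + text + "\"");
    }
    std::string extra;
    if (in >> extra)  // leftover non-white-space text means the format is wrong
        throw std::runtime_error("too many values in \"" + text + "\"");
    return v;
}
```

With the "<vector>" example above, parseVec3("1.2 -4 2e-3") yields {1.2, -4, 0.002}, while a two-number value such as "1.2 -4" throws.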

Note that it is okay for a value element to have attributes; those are ignored in determining the element's value. Any element that is not a value element is a "compound element", meaning it has either child elements and/or more than one Text node.
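The value/compound distinction reduces to a simple count over an element's children. The sketch below uses a hypothetical, simplified node model rather than Simbody's classes. It assumes that Comment and Unknown children do not affect the classification, because the definition above mentions only child elements and Text nodes. Attributes are not modeled at all since they never affect the result.

```cpp
#include <vector>

// Hypothetical, simplified node model used only for this sketch.
enum class Kind { Element, Text, Comment, Unknown };

struct Node {
    Kind kind;
    std::vector<Node> children;  // only meaningful for Kind::Element
};

// A value element has no child elements and at most one Text node.
// Comments and Unknowns are ignored here (an assumption of this sketch).
bool isValueElement(const Node& elt) {
    int textCount = 0;
    for (const Node& child : elt.children) {
        if (child.kind == Kind::Element) return false;  // compound element
        if (child.kind == Kind::Text) ++textCount;
    }
    return textCount <= 1;
}
```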

Reading an XML document

To read an XML document, you create an Xml::Document object and tell it to read in the document from a file or from a string. The document will be parsed and canonicalized into the in-memory model described above. Then to rummage around in the document, you ask the Xml::Document object for its root element, and check the root tag to see that it is the type of document you are expecting. You can check the root element's attributes, and then process its contents (child nodes). Iterators are provided for running through all the attributes, all the child nodes contained in the element, or all the child nodes of a particular type. For a child node that is an element, you check the tag and then pass the element to some piece of code that knows how to deal with that kind of element and its children recursively.

Here is a complete example of reading in an Xml file "example.xml", printing the root tag and then the types of all the document-level nodes, in STL iterator style:

Xml::Document doc("example.xml");
cout << "Root tag: " << doc.getRootTag() << endl;
for (Xml::node_iterator p=doc.node_begin(); p != doc.node_end(); ++p)
    cout << "Node type: " << p->getNodeTypeAsString() << endl;

Exactly one of the above nodes will have type "ElementNode"; that is the root element. To print out the types of nodes contained in the root element, you could write:

Xml::Element root = doc.getRootElement();
for (Xml::node_iterator p=root.node_begin(); p != root.node_end(); ++p)
    cout << "Node type: " << p->getNodeTypeAsString() << endl;

(Some confessions: despite appearances, "Xml" is not a namespace; it is a class, and the other Xml classes are nested classes of Xml. An object of type Xml is an XML document; the name Xml::Document is a typedef synonymous with Xml.)

Writing an XML document

You can insert, remove, and modify nodes and attributes in a document, or create a document from scratch. Then you can write the results in a "pretty-printed" or compact format to a file or a string; for pretty-printing you can override the default indentation string (four spaces). Whenever we write an XML document, we write it in canonical format, regardless of how it looked when we found it.

At the document level, you can only insert Comment and Unknown nodes. Text and Element nodes can be inserted only at the root element level and below.

Details about XML

This section provides detailed information about the syntax of XML files as we accept and produce them. You won't have to know these details to read and write XML files using the SimTK::Xml class, but you may find them helpful when you have to look at an XML file in a text editor.

Lexical elements

(Ignore the quote characters below; those are present so I can get this text through Doxygen.)

  • An XML document is a string of Unicode characters; all metadata is case sensitive.
  • The file begins with a "declaration" tag beginning with "<?xml" and ending with "?>"
  • Comments look like this: "<!--" anything "-->"
  • The characters in an XML file represent markup and content
  • Markup consists of "tags" delimited by "<" and ">", attributes denoted by name="value", and character escapes delimited by "&" and ";".
  • Tags come in three flavors: start tags like "<word>", end tags like "</word>" and empty element tags like "<word/>". Tag words must begin with a letter or an underscore and are case sensitive; "xml" is reserved; don't use it.
  • Attributes are recognized only in start tags, empty element tags, and declaration tags. In standard XML the value must be quoted with single or double quotes, but we'll supply missing quotes if there are none. Attribute names are case sensitive and must be unique within a tag; but if we see duplicates we'll just ignore all but the last.
  • There are five pre-defined escapes: "&lt;" and "&gt;" representing "<" and ">", "&amp;" for ampersand, "&apos;" for apostrophe (single quote) and "&quot;" for double quote.
  • There are also "numeric character reference" escapes of the form "&#nnnnn;" (decimal) or "&#xnnnn;" (hex), with only as many digits as needed.
  • Text set off by "<![CDATA[" and "]]>" is interpreted as a raw byte stream.
  • Tags that begin "<x" where x is not a letter or underscore and isn't one of the above recognized patterns will be passed through uninterpreted.
  • Anything else is Unicode text.
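The escape rules above can be illustrated with a small decoder. This is a hypothetical standalone helper, not the Simbody or TinyXML implementation: it replaces the five predefined escapes and decimal or hex numeric character references (emitting UTF-8), and, as an assumption of this sketch, copies any other "&" sequence through unchanged. CDATA sections are not handled.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Append the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode predefined escapes and &#nnn; / &#xhhh; references.
std::string decodeEscapes(const std::string& s) {
    static const struct { const char* text; char ch; } named[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&apos;", '\''}, {"&quot;", '"'}};
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '&') { out += s[i++]; continue; }
        bool done = false;
        for (const auto& e : named) {  // try the five predefined escapes
            const std::string t(e.text);
            if (s.compare(i, t.size(), t) == 0) {
                out += e.ch; i += t.size(); done = true; break;
            }
        }
        if (!done && s.compare(i, 2, "&#") == 0) {  // numeric reference
            const std::size_t semi = s.find(';', i);
            const bool hex = i + 2 < s.size() && s[i + 2] == 'x';
            const std::size_t start = i + (hex ? 3 : 2);
            if (semi != std::string::npos && semi > start) {
                const std::string digits = s.substr(start, semi - start);
                char* end = nullptr;
                const unsigned long cp = std::strtoul(digits.c_str(), &end, hex ? 16 : 10);
                if (*end == '\0' && cp <= 0x10FFFF) {  // all digits consumed
                    appendUtf8(out, static_cast<std::uint32_t>(cp));
                    i = semi + 1; done = true;
                }
            }
        }
        if (!done) out += s[i++];  // unrecognized: copy '&' literally
    }
    return out;
}
```

For example, "&#65;&#x42;" decodes to "AB", and "&#xE9;" decodes to the two UTF-8 bytes of "é".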

File structure

An XML file contains a single document which consists at the top level of

  • a declaration
  • comments and unknowns
  • a root element
  • more comments and unknowns

Elements can be containers of other nodes and are thus the basis for the tree structure of XML files. Elements can contain:

  • comments
  • unknowns
  • text
  • child elements, recursively
  • attributes

A declaration (see below) also has attributes, but there are only three: version, encoding, and standalone ('yes' or 'no'). Unknowns are constructs found in the file that are not recognized; they might be errors but they are likely to be more sophisticated uses of XML that our feeble parser doesn't understand. Unknowns are tags where the tag word doesn't begin with a letter or underscore and isn't one of the very few other tags we recognize, like comments. As an example, a DTD tag like this would come through as an Unknown node here:

<!DOCTYPE note SYSTEM "Note.dtd">
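How a piece of markup ends up as a Comment, an Element, or an Unknown can be sketched by looking only at how it begins. The classifier below is hypothetical and follows the lexical rules listed earlier; it is not the parser's actual code, and it only recognizes ASCII letters as tag-word starters.

```cpp
#include <cctype>
#include <string>

// Hypothetical classification of markup by its leading characters.
enum class Markup { Declaration, Comment, CData, EndTag, Element, Unknown };

Markup classify(const std::string& tag) {
    auto startsWith = [&](const char* prefix) {
        return tag.rfind(prefix, 0) == 0;  // match only at position 0
    };
    if (startsWith("<?xml")) return Markup::Declaration;
    if (startsWith("<!--")) return Markup::Comment;
    if (startsWith("<![CDATA[")) return Markup::CData;
    if (startsWith("</")) return Markup::EndTag;
    if (tag.size() > 1 && tag[0] == '<') {
        const unsigned char c = static_cast<unsigned char>(tag[1]);
        if (std::isalpha(c) || c == '_') return Markup::Element;  // tag word start
    }
    return Markup::Unknown;  // e.g. <!DOCTYPE ...> passes through uninterpreted
}
```

Under these rules the DOCTYPE line above is classified as Unknown, because "!" is neither a letter nor an underscore and the tag matches none of the recognized patterns.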

Here is the top-level structure we expect of a well-formed XML document, and we will impose this structure on XML documents that don't have it. This allows us to simplify the in-memory model as discussed above.

<?xml version="1.0" encoding="UTF-8"?>
<!-- maybe comments and unknowns -->
<roottag attr="value" ... >
... contents ...
</roottag>
<!-- maybe comments and unknowns -->

That is, the first line should be a declaration, most commonly exactly the characters shown above, without the "standalone" attribute which will default to "yes". If we don't see a declaration when reading an XML document, we'll assume we read the one above. Then the document should contain exactly one root element representing the type of document and document-level attributes. The tag for the root element is not literally "roottag" but some name that makes sense for the given document. Note that the root element is an ordinary element so "contents" can contain text and child elements (as well as comments and unknowns).

When reading an XML document, if it has exactly one document-level element and no document-level text, we'll take the document as-is. If there is more than one document-level element, or we find some document-level text, we'll assume that the root element is missing and act as though we had seen a root element "<_Root>" at the beginning and "</_Root>" at the end so the root tag will be "_Root". Note that this means that we will interpret even a plain text file as a well-formed XML document:

A file consisting
of just text
like this.

  ==>

<?xml version="1.0" encoding="UTF-8" ?>
<_Root>
A file consisting of just text like this.
</_Root>

The above XML document has a single document-level element and that element contains one Text node whose value is the original text.
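The canonicalization decision can be expressed as a count over the document-level items found while reading. The sketch below uses a hypothetical flat Item model, not Simbody's internals. Treating a document with no element at all as needing "_Root" is an assumption, since the text above only discusses the one-element and many-element cases.

```cpp
#include <string>
#include <vector>

// Hypothetical flat model of the document-level items found while reading.
struct Item {
    enum Kind { Element, Text, Comment, Unknown } kind;
    std::string content;  // tag word for elements, raw text otherwise
};

// True when the reader must wrap the element and text items in <_Root>:
// any document-level text, or a number of elements other than one.
bool needsSyntheticRoot(const std::vector<Item>& topLevel) {
    int elements = 0;
    for (const Item& it : topLevel) {
        if (it.kind == Item::Text) return true;
        if (it.kind == Item::Element) ++elements;
    }
    return elements != 1;  // zero elements also wrapped (assumption)
}

// The root tag the document would report after canonicalizing.
std::string rootTagAfterCanonicalizing(const std::vector<Item>& topLevel) {
    if (needsSyntheticRoot(topLevel)) return "_Root";
    for (const Item& it : topLevel)
        if (it.kind == Item::Element) return it.content;
    return "_Root";  // not reached: exactly one element exists here
}
```

For the plain-text file above, the single Text item triggers wrapping, so the root tag becomes "_Root"; a VTK file with one "VTKFile" element and some comments keeps "VTKFile".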

See also
Xml::Element, Xml::Node

Member Typedef Documentation

This typedef allows Xml::Document to be used as the type of the document which is more conventional than using just Xml, and provides future compatibility should we decide to upgrade Xml::Document to a class.

Member Enumeration Documentation

The NodeType enum serves as the actual type of a node and as a filter for allowable node types during an iteration over nodes.

We consider Element and Text nodes to be meaningful, while Comment and Unknown nodes are meaningless junk. However, you are free to extract some meaning from them if you know how. In particular, DTD nodes end up as Unknown.

Enumerator
NoNode 

Type of empty Node handle, or null filter.

ElementNode 

Element node type and only-Elements filter.

TextNode 

Text node type and only-Text nodes filter.

CommentNode 

Comment node type and only-Comments filter.

UnknownNode 

Unknown node type and only-Unknowns filter.

NoJunkNodes 

Filter out meaningless nodes.

JunkNodes 

Filter out meaningful nodes.

AnyNodes 

Allow all nodes.
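Because each node type occupies its own bit, a filter is just a bitwise AND. The standalone sketch below re-declares the enum with the values listed at the top of this page so that it compiles without Simbody; the helpers passes() and countVisited() are hypothetical, not Simbody API. In real code you would simply pass the mask to node_begin(), for example doc.node_begin(Xml::NoJunkNodes).

```cpp
#include <vector>

// Values copied from the NodeType enum documented above; re-declared here
// only so the sketch compiles on its own.
enum NodeType {
    NoNode      = 0x00,
    ElementNode = 0x01,
    TextNode    = 0x02,
    CommentNode = 0x04,
    UnknownNode = 0x08,
    NoJunkNodes = ElementNode | TextNode,
    JunkNodes   = CommentNode | UnknownNode,
    AnyNodes    = NoJunkNodes | JunkNodes
};

// A node passes a filter when its type bit is set in the mask.
bool passes(NodeType type, NodeType allowed) {
    return (type & allowed) != 0;
}

// Count how many nodes a filtered iteration over this sequence would visit.
int countVisited(const std::vector<NodeType>& nodes, NodeType allowed) {
    int n = 0;
    for (NodeType t : nodes)
        if (passes(t, allowed)) ++n;
    return n;
}
```

For a typical top level of {CommentNode, ElementNode, UnknownNode}, AnyNodes visits all three nodes, NoJunkNodes visits only the root element, and JunkNodes visits the other two.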

Constructor & Destructor Documentation

SimTK::Xml::Xml ( )

Create an empty XML Document with default declaration and default root element with tag "_Root". (You should invoke this as Xml::Document() instead of just Xml().)

If you were to print out this document now you would see:

<?xml version="1.0" encoding="UTF-8"?>
<_Root />
SimTK::Xml::Xml ( const String & pathname)
explicit

Create a new XML document and initialize it from the contents of the given file name. (You should invoke this as Xml::Document(pathname) instead of just Xml(pathname).)

An exception will be thrown if the file doesn't exist or can't be parsed.

See also
readFromFile(), readFromString()
SimTK::Xml::Xml ( const Xml::Document & source)

Copy constructor makes a deep copy of the entire source document; nothing is shared between the source and the copy.

SimTK::Xml::~Xml ( )

The destructor cleans up all heap space associated with this document.

Member Function Documentation

static String SimTK::Xml::getNodeTypeAsString ( NodeType  type)
static

Translate a NodeType to a human-readable string.

Xml::Document& SimTK::Xml::operator= ( const Xml::Document & source)

Copy assignment frees all heap space associated with the current Xml::Document and then makes a deep copy of the source document; nothing is shared between the source and the copy.

void SimTK::Xml::clear ( )

Restore this document to its default-constructed state.

void SimTK::Xml::readFromFile ( const String & pathname)

Read the contents of this Xml::Document from the file whose pathname is supplied.

This first clears the current document so the new one completely replaces the old one.

See also
readFromString()
void SimTK::Xml::writeToFile ( const String & pathname) const

Write the contents of this in-memory Xml::Document to the file whose pathname is supplied.

The file will be created if it doesn't exist, overwritten if it does exist. The file will be "pretty-printed" using the current indent string.

See also
setIndentString(), writeToString()
void SimTK::Xml::readFromString ( const String & xmlDocument)

Read the contents of this Xml::Document from the supplied string.

This first clears the current document so the new one completely replaces the old one.

See also
readFromFile()
void SimTK::Xml::readFromString ( const char *  xmlDocument)

Alternate form that reads from a null-terminated C string (char*) rather than a C++ string object.

This would otherwise be implicitly converted to string first which would require copying.

void SimTK::Xml::writeToString ( String & xmlDocument,
bool  compact = false 
) const

Write the contents of this in-memory Xml::Document to the supplied string.

The string is cleared first, so it will be completely overwritten. Normally the output is "pretty-printed" as it is for a file, but if you set compact to true the indentation and newlines will be suppressed to make a more compact representation.

See also
setIndentString(), writeToFile()
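The difference between pretty-printed and compact output can be sketched with a tiny writer over a hypothetical element model (a tag, optional text, child elements). This is an illustration, not Simbody's serializer: it ignores attributes, comments, and escaping, and it drops an element's text when the element also has children.

```cpp
#include <string>
#include <vector>

// Hypothetical mini element model used only for this sketch.
struct Elt {
    std::string tag;
    std::string text;           // ignored when children are present
    std::vector<Elt> children;
};

// Append e to out, indenting by 'indent' per level unless compact is set.
void writeElement(const Elt& e, const std::string& indent, bool compact,
                  int depth, std::string& out) {
    std::string pad;
    if (!compact)
        for (int i = 0; i < depth; ++i) pad += indent;
    const std::string nl = compact ? "" : "\n";
    if (e.text.empty() && e.children.empty()) {
        out += pad + "<" + e.tag + " />" + nl;                        // empty element tag
    } else if (e.children.empty()) {
        out += pad + "<" + e.tag + ">" + e.text + "</" + e.tag + ">" + nl;  // value element
    } else {
        out += pad + "<" + e.tag + ">" + nl;
        for (const Elt& c : e.children)
            writeElement(c, indent, compact, depth + 1, out);
        out += pad + "</" + e.tag + ">" + nl;
    }
}
```

With the default four-space indent string, a "doc" element holding "<name>John Doe</name>" and an empty "preferences" element prints each child on its own indented line, while the compact form puts everything on one line. In Simbody itself you would get these forms from writeToString(str) and writeToString(str, true).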
void SimTK::Xml::setIndentString ( const String & indent)

Set the string to be used for indentation when we produce a "pretty-printed" serialized form of this document. The default is to use four spaces for each level of indentation.

See also
writeToFile(), writeToString(), getIndentString()
const String& SimTK::Xml::getIndentString ( ) const

Return the current value of the indent string. The default is four spaces.

See also
setIndentString()
static void SimTK::Xml::setXmlCondenseWhiteSpace ( bool  shouldCondense)
static

Set global mode to control whether white space is preserved or condensed down to a single space (affects all subsequent document reads; not document specific).

The default is to condense.
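Condensing can be pictured as collapsing every run of white-space characters into a single space. This is a hypothetical standalone sketch; whether the real reader also trims leading and trailing white space is not stated on this page, so the sketch deliberately does not trim.

```cpp
#include <cctype>
#include <string>

// Collapse each run of white space (spaces, tabs, newlines) to one space.
std::string condenseWhiteSpace(const std::string& text) {
    std::string out;
    bool inRun = false;  // true while inside a run of white space
    for (char ch : text) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            if (!inRun) out += ' ';
            inRun = true;
        } else {
            out += ch;
            inRun = false;
        }
    }
    return out;
}
```

Applied to the three-line plain-text file used in the canonicalization example, this produces the single line "A file consisting of just text like this.".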

static bool SimTK::Xml::isXmlWhiteSpaceCondensed ( )
static

Return the current setting of the global "condense white space" option.

Note that this option affects all Xml reads; it is not document specific.

Element SimTK::Xml::getRootElement ( )

Return an Element handle referencing the top-level element in this Xml::Document, known as the "root element".

The tag word of this element is usually the type of document. This is the only top-level element; all others are its children and descendants. Once you have the root Element handle, you can also use any of the Element methods to manipulate it. If you need a node_iterator that refers to the root element (perhaps to use one of the top-level insert methods), call node_begin() on your document object (here called doc) with a NodeType filter:

Xml::node_iterator rootp = doc.node_begin(Xml::ElementNode);

That works since there is only one element at this level.

const String& SimTK::Xml::getRootTag ( ) const

Shortcut for getting the tag word of the root element which is usually the document type.

This is the same as getRootElement().getElementTag().

void SimTK::Xml::setRootTag ( const String & tag)

Shortcut for changing the tag word of the root element which is usually the document type.

This is the same as getRootElement().setElementTag(tag).

void SimTK::Xml::insertTopLevelNodeAfter ( const node_iterator & afterThis,
Node  insertThis 
)

Insert a top-level Comment or Unknown node just after the location indicated by the node_iterator, or at the end of the list if the iterator is node_end().

The iterator must refer to a top-level node. The Xml::Document takes over ownership of the Node, which must be a Comment or Unknown node and must have been an orphan. The supplied Node handle will retain a reference to the node within the document and can still be used to make changes, but will no longer be an orphan.

void SimTK::Xml::insertTopLevelNodeBefore ( const node_iterator & beforeThis,
Node  insertThis 
)

Insert a top-level Comment or Unknown node just before the location indicated by the node_iterator.

See insertTopLevelNodeAfter() for details.

void SimTK::Xml::eraseTopLevelNode ( const node_iterator & deleteThis)

Delete the indicated top-level node, which must not be the root element, and must not be node_end().

That is, it must be a top-level Comment or Unknown node which will be removed from the Xml::Document and deleted. The iterator is invalid after this call; be sure not to use it again. Also, there must not be any handles referencing the now-deleted node.

Node SimTK::Xml::removeTopLevelNode ( const node_iterator & removeThis)

Remove the indicated top-level node from the document, returning it as an orphan rather than erasing it.

The node must not be the root element, and must not be node_end(). That is, it must be a top-level Comment or Unknown node which will be removed from the Xml::Document and returned as an orphan Node. The iterator is invalid after this call; be sure not to use it again.
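The rules for top-level insertion and erasure can be modeled with an ordinary std::list. The sketch below is hypothetical and is not how Simbody stores nodes: it tracks only node kinds, ignores orphan ownership, and uses exceptions to stand in for whatever error reporting the library actually performs.

```cpp
#include <iterator>
#include <list>
#include <stdexcept>

// Hypothetical model of the top-level node list (not Simbody's code).
enum class TopKind { Comment, Unknown, RootElement };
using TopList = std::list<TopKind>;

void checkInsertable(TopKind k) {
    if (k == TopKind::RootElement)
        throw std::invalid_argument("only Comment or Unknown nodes may be inserted at top level");
}

// Insert after pos, or at the end if pos == end() (like node_end()).
void insertAfter(TopList& nodes, TopList::iterator pos, TopKind k) {
    checkInsertable(k);
    nodes.insert(pos == nodes.end() ? pos : std::next(pos), k);
}

void insertBefore(TopList& nodes, TopList::iterator pos, TopKind k) {
    checkInsertable(k);
    nodes.insert(pos, k);
}

// Erase a Comment or Unknown; the root element and end() are rejected.
void eraseNode(TopList& nodes, TopList::iterator pos) {
    if (pos == nodes.end() || *pos == TopKind::RootElement)
        throw std::invalid_argument("cannot erase the root element or node_end()");
    nodes.erase(pos);  // pos is now invalid, as with eraseTopLevelNode()
}
```

Starting from a list holding only the root element, inserting a Comment before it and an Unknown at the end gives Comment, RootElement, Unknown; attempting to erase the middle entry fails, just as eraseTopLevelNode() forbids deleting the root element.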

node_iterator SimTK::Xml::node_begin ( NodeType  allowed = AnyNodes)

Obtain an iterator to all the top-level nodes or a subset restricted via the allowed NodeType mask.

node_iterator SimTK::Xml::node_end ( ) const

This node_end() iterator indicates the end of a sequence of nodes regardless of the NodeType restriction on the iterator being used.

String SimTK::Xml::getXmlVersion ( ) const

Returns the Xml "version" attribute as a string (from the declaration line at the beginning of the document).

String SimTK::Xml::getXmlEncoding ( ) const

Returns the Xml "encoding" attribute as a string (from the declaration line at the beginning of the document).

bool SimTK::Xml::getXmlIsStandalone ( ) const

Returns the Xml "standalone" attribute as a bool (from the declaration line at the beginning of the document); default is true ("yes" in a file), meaning that the document can be parsed correctly without any other documents.

We won't include "standalone" in the declaration line for any Xml documents we generate unless the value is false ("no" in a file).

void SimTK::Xml::setXmlVersion ( const String & version)

Set the Xml "version" attribute; this will be written to the "declaration" line which is first in any Xml document.

void SimTK::Xml::setXmlEncoding ( const String & encoding)

Set the Xml "encoding" attribute; this doesn't affect the in-memory representation but can affect how the document gets written out.

void SimTK::Xml::setXmlIsStandalone ( bool  isStandalone)

Set the Xml "standalone" attribute; this is normally true (corresponding to standalone="yes") and won't appear in the declaration line in that case when we write it out.

If you set this to false then standalone="no" will appear in the declaration line when it is written.
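The rule for the "standalone" attribute can be illustrated by a small function that builds the declaration line. The helper declarationLine() is hypothetical; it mirrors only the documented behavior (standalone="yes" is the default and is omitted), and the exact spacing and encoding handling of the real writer are assumptions. With Simbody you would instead call doc.setXmlIsStandalone(false) before writing.

```cpp
#include <string>

// Hypothetical declaration-line builder; not Simbody's writer.
// standalone="yes" is the default and is therefore left out.
std::string declarationLine(const std::string& version = "1.0",
                            const std::string& encoding = "UTF-8",
                            bool standalone = true) {
    std::string line = "<?xml version=\"" + version + "\" encoding=\"" + encoding + "\"";
    if (!standalone)
        line += " standalone=\"no\"";  // only written when false
    return line + "?>";
}
```

Called with the defaults, it returns the same declaration shown for a default-constructed document: <?xml version="1.0" encoding="UTF-8"?>.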

Friends And Related Function Documentation

friend class Node
friend
std::ostream & operator<< ( std::ostream &  o,
const Xml::Document & doc 
)
related

Output a "pretty printed" textual representation of the given Xml::Document to an std::ostream, using the document's current indent string for formatting.

See also
Xml::setIndentString()
