Getting started with Libxml2 and Python (Part 2)
by Mike on Feb.21, 2007, under Knowledge Base, Technology, Tutorials
After I published the first part of this tutorial, John Dennis gave me some
feedback on the xml@gnome.org mailing list (http://mail.gnome.org/archives/xml/2007-February/thread.html).
He posed a couple of interesting questions:
1. How do I build complex Python objects by parsing an XML doc?
2. How can I serialize Python objects into XML?
Normally I would use pickling and unpickling to serialise Python objects, but
I can see some cases in which this might come in useful. Having a bit of a play
with dynamically creating objects in Python made me realise that this is a
non-trivial challenge as well and so is probably an ideal exercise for this
tutorial: creating arbitrary objects from an XML source document.
Simple example
Let’s imagine that we have the following XML document (userdata.xml):
<?xml version='1.0'?>
<user>
  <name>Mike Kneller</name>
  <homepage>http://www.mikekneller.com</homepage>
</user>
Dynamically populating an object is fairly straightforward in Python as it
allows the dynamic creation of object attributes. We just need to loop through
the document creating the properties. This is simple when we realise that
setattr(obj, 'foo', 123) is equivalent to obj.foo = 123.
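To see that equivalence in isolation, here is a minimal sketch. The class name `Blank` and the attribute name are made up for illustration, and the code is written to run under either Python 2 or Python 3.

```python
class Blank(object):
    """Empty placeholder class (hypothetical name, for illustration only)."""
    pass

a = Blank()
b = Blank()

# Ordinary attribute assignment: the name is fixed in the source code.
a.foo = 123

# setattr does the same job, but the name is a string chosen at run time,
# which is what lets us use element names read from an XML document.
attribute_name = 'foo'
setattr(b, attribute_name, 123)

print(a.foo == b.foo)
print(a.__dict__ == b.__dict__)
```

The same call, applied to each child element of the document, gives the loop that follows.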
import libxml2

class DynamicObject:
    pass

user = DynamicObject()
doc = libxml2.parseFile('userdata.xml')
child = doc.getRootElement().children
while child is not None:
    # skip text nodes; copy each element's text content onto the object
    if child.type == "element":
        setattr(user, child.name, child.content)
    child = child.next
doc.freeDoc()
That was almost too easy! Examining user shows that we have a Python object
populated with the contents of the XML document as expected.
>>> print user.__dict__
{'homepage': 'http://www.mikekneller.com', 'name': 'Mike Kneller'}
>>> print user.homepage
http://www.mikekneller.com
Although this is a useful routine, simply filling an object with string values
doesn’t really count as a ‘complex’ object (although it illustrates the point).
Its main use is the generation of arbitrary data objects where the order and
naming of the data is not known in advance.
Walking the tree
A ‘complex’ object would be one containing a mixture of data types, possibly
holding other objects and maybe some code.
This XML document (people.xml) contains a group of people that we would like to
load.
<?xml version='1.0'?>
<people>
  <person>
    <name>Mike</name>
    <age>34</age>
    <friends>
      <friend>Steve</friend>
      <friend>Mark</friend>
      <friend>Dave</friend>
    </friends>
  </person>
  <person>
    <name>Steve</name>
    <friends>
      <friend>Mike</friend>
      <friend>Mark</friend>
      <friend>Dave</friend>
    </friends>
    <hobbies>
      <hobby>Stamp collecting</hobby>
      <hobby>Train spotting</hobby>
    </hobbies>
  </person>
  <person>
    <name>Mark</name>
    <age>28</age>
    <friends>
      <friend>Mike</friend>
      <friend>Steve</friend>
    </friends>
  </person>
  <person>
    <name>Dave</name>
    <age>30</age>
    <friends>
      <friend>Mike</friend>
      <friend>Steve</friend>
    </friends>
  </person>
</people>
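As a rough picture of what a ‘complex’ object for this data could look like, here are the first two people built by hand. The `Person` class, its constructor arguments and the `friend_count` method are all hypothetical, chosen only to show a mixture of types (a string, an integer, lists) plus a little code; they are not taken from libxml2 or from the rest of this tutorial.

```python
class Person(object):
    """Hypothetical target class; one possible shape, not the only one."""

    def __init__(self, name, age=None, friends=None, hobbies=None):
        self.name = name                      # string
        self.age = age                        # integer, or None if absent
        self.friends = friends if friends is not None else []
        self.hobbies = hobbies if hobbies is not None else []

    def friend_count(self):
        """A small piece of behaviour attached to the data."""
        return len(self.friends)

# Values copied from the first two <person> elements of people.xml.
mike = Person('Mike', age=34, friends=['Steve', 'Mark', 'Dave'])
steve = Person('Steve',
               friends=['Mike', 'Mark', 'Dave'],
               hobbies=['Stamp collecting', 'Train spotting'])

print(mike.friend_count())
print(steve.age is None)
```

Note that Steve has no age element, so any loader has to cope with missing fields as well as repeated ones.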
To construct arbitrary Python objects from a document like this, we will need
to walk the tree. This example recurses through an XML document, printing each
node's name and content and indenting as it goes.
import libxml2

def walkTree(xmlnode):
    child = xmlnode.children
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                childCount = int(child.xpathEval('count(*)'))
                # a count of the ancestor nodes tells us how deep in the
                # tree we are - let's just use it to indent our printed
                # output
                depth = int(child.xpathEval('count(ancestor::*)')) - 1
                if childCount == 0:
                    # If the count of child elements is 0 then we
                    # have a node only containing text
                    print depth * '\t' + child.name + ' : ' + child.content
                else:
                    # If the node contains other child elements then
                    # we can recurse down the tree
                    print depth * '\t' + child.name
                    walkTree(child)
        child = child.next

doc = libxml2.parseFile('people.xml')
root = doc.getRootElement()
walkTree(root)
doc.freeDoc()