Getting started with Libxml2 and Python (Part 2)

Posted on Feb. 21, 2007, under Knowledge Base, Technology, Tutorials

After I published the first part of this tutorial, John Dennis gave me some
feedback on the xml@gnome.org mailing list (http://mail.gnome.org/archives/xml/2007-February/thread.html).
He posed a couple of interesting questions:

  1. How do I build complex Python objects by parsing an XML doc?
  2. How can I serialize Python objects into XML?

Normally I would use pickling and unpickling to serialise Python objects, but
I can see some cases in which an XML representation might come in useful.
Having a bit of a play with dynamically creating objects in Python made me
realise that the first question is a non-trivial challenge as well, and so
probably an ideal exercise for this tutorial: creating arbitrary objects from
an XML source document.

Simple example

Let’s imagine that we have the following XML document:

	<?xml version='1.0'?>
	<user>
		<name>Mike Kneller</name>
		<homepage>http://www.mikekneller.com</homepage>
	</user>

Dynamically populating an object is fairly straightforward in Python, as it
allows object attributes to be created dynamically. We just need to loop
through the document, creating the attributes as we go. This is simple once we
realise that setattr(obj, 'foo', 123) is equivalent to obj.foo = 123.

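	import libxml2

	class DynamicObject:
		pass

	user = DynamicObject()

	doc = libxml2.parseFile('userdata.xml')
	child = doc.getRootElement().children

	# Copy each child element of the root onto the object as an attribute
	while child is not None:
		if child.type == "element":
			setattr(user, child.name, child.content)
		child = child.next

	# libxml2 documents are not garbage collected, so free them explicitly
	doc.freeDoc()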

That was almost too easy! Examining user shows that we have a Python object
populated with the contents of the XML document as expected.

	>>> print(user.__dict__)
	{'homepage': 'http://www.mikekneller.com', 'name': 'Mike Kneller'}
	>>> print(user.homepage)
	http://www.mikekneller.com

Although this is a useful routine, simply filling an object with string values
doesn’t really count as a ‘complex’ object (although it illustrates the point).
Its main use is the generation of arbitrary data objects where the order and
naming of the data is not known in advance.
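
Because the attribute names are not known in advance, whatever consumes such
an object has to discover them at run time. The sketch below is a minimal
illustration of that, and of the setattr equivalence mentioned earlier. It
does not touch libxml2 at all: the (name, content) pairs are hypothetical
stand-ins for parsed child elements, and the 'home-page' element name is made
up to show what happens when a name is not a valid Python identifier.

```python
class DynamicObject(object):
    """Empty container; attributes are attached at run time."""
    pass


user = DynamicObject()

# The two statements below are equivalent: each binds 123 to user.foo.
setattr(user, 'foo', 123)
user.foo = 123

# Hypothetical (name, content) pairs standing in for parsed child elements.
# 'home-page' is not a valid Python identifier, yet setattr() accepts it.
pairs = [('name', 'Mike Kneller'),
         ('home-page', 'http://www.mikekneller.com')]
for name, content in pairs:
    setattr(user, name, content)

# vars() reveals whatever attribute names ended up on the object.
for name in sorted(vars(user)):
    print(name + ' : ' + str(getattr(user, name)))

# getattr() with a default copes with data that was absent from the source.
age = getattr(user, 'age', None)
```

An attribute such as 'home-page' can only be read back with getattr, since
user.home-page would be parsed as a subtraction. Repeated element names are
another thing to watch: each setattr call simply overwrites the previous
value.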

Walking the tree

A ‘complex’ object would be one containing a mixture of data types, possibly
holding other objects and maybe some code.
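
As a rough picture of what that means, here is a hand-built object of that
kind. Nothing in it is generated from XML, and the Person class and its
describe method are hypothetical, invented purely for illustration.

```python
class Person(object):
    """Hypothetical class, used only to illustrate a 'complex' object."""

    def __init__(self, name, age=None):
        self.name = name      # a string
        self.age = age        # an integer, or None when unknown
        self.friends = []     # a list holding other Person objects

    def describe(self):
        # "Some code": behaviour that travels with the data.
        names = ', '.join(friend.name for friend in self.friends)
        return self.name + ' is friends with ' + names


mike = Person('Mike', 34)
steve = Person('Steve')
mark = Person('Mark', 28)
mike.friends.extend([steve, mark])

print(mike.describe())
```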

This XML document (people.xml) contains a group of people that we would like to
load:

	<?xml version='1.0'?>
	<people>
		<person>
			<name>Mike</name>
			<age>34</age>
			<friends>
				<friend>Steve</friend>
				<friend>Mark</friend>
				<friend>Dave</friend>
			</friends>
		</person>

		<person>
			<name>Steve</name>
			<friends>
				<friend>Mike</friend>
				<friend>Mark</friend>
				<friend>Dave</friend>
			</friends>
			<hobbies>
				<hobby>Stamp collecting</hobby>
				<hobby>Train spotting</hobby>
			</hobbies>
		</person>

		<person>
			<name>Mark</name>
			<age>28</age>
			<friends>
				<friend>Mike</friend>
				<friend>Steve</friend>
			</friends>
		</person>

		<person>
			<name>Dave</name>
			<age>30</age>
			<friends>
				<friend>Mike</friend>
				<friend>Steve</friend>
			</friends>
		</person>
	</people>

To construct arbitrary Python objects from a document like this, we will need
to walk the tree. This example recurses through an XML document, printing the
node names and content and indenting as it goes.

	import libxml2

	def walkTree(xmlnode):
		child = xmlnode.children
		while child is not None:
			if not child.isBlankNode():
				if child.type == "element":
					childCount = int(child.xpathEval('count(*)'))

					# A count of the ancestor nodes tells us how deep in the
					# tree we are - let's just use it to indent our printed
					# output
					depth = int(child.xpathEval('count(ancestor::*)')) - 1
					if childCount == 0:
						# If the count of child elements is 0 then we
						# have a node only containing text
						print(depth * '\t' + child.name + ' : ' + child.content)
					else:
						# If the node contains other child elements then
						# we can recurse down the tree
						print(depth * '\t' + child.name)
						walkTree(child)

			child = child.next

	doc = libxml2.parseFile('people.xml')
	root = doc.getRootElement()

	walkTree(root)

	doc.freeDoc()
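
The same traversal can collect data instead of printing it. The function
below is only a sketch of one possible approach, and nodeToData is a name I
have made up for it. It relies on nothing beyond the node attributes already
used above (children, next, type, name and content), and it returns a list of
(name, data) pairs rather than a dictionary so that repeated elements such as
friend are not lost.

```python
def nodeToData(xmlnode):
    """Return the node's text if it has no child elements, otherwise a
    list of (name, data) pairs, one per child element, in document order."""
    pairs = []
    child = xmlnode.children
    while child is not None:
        # Skip text, comments and whitespace; only elements carry structure
        if child.type == "element":
            pairs.append((child.name, nodeToData(child)))
        child = child.next
    if not pairs:
        # No child elements were found: treat the node as plain text
        return xmlnode.content
    return pairs
```

With the document loaded as in the listing above, this would be called as
nodeToData(doc.getRootElement()) before doc.freeDoc(). Every leaf comes back
as a string, so a value such as age would still need converting to an integer
afterwards.
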