MSc Bioinformatics

DOM: Reading and writing XML from Python

DOM (the Document Object Model) is a standard API for reading and writing XML, available in many languages including Python, Perl and JavaScript. A DOM parser reads the whole XML document into memory, which makes it easy to access information and jump around the document. The disadvantage is that it is CPU and memory intensive, particularly for large files. An alternative is SAX, which handles XML as a stream of events.
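To make the contrast concrete, here is a minimal sketch using Python's standard library modules xml.dom.minidom and xml.sax. The sample XML, its element and attribute names (mutants, mutant, native, subst, pdb, resnum) and the function names are all invented for illustration; they are not the format of the file used in this practical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document: element and attribute names are invented
# for this sketch and do not reflect the practical's real data file.
XML_TEXT = """<mutants>
  <mutant pdb="1abc" resnum="42"><native>ALA</native><subst>GLY</subst></mutant>
  <mutant pdb="2xyz" resnum="7"><native>LEU</native><subst>PRO</subst></mutant>
</mutants>"""


def dom_mutations(xml_text):
    """DOM: build the whole tree in memory, then navigate it freely."""
    doc = minidom.parseString(xml_text)
    results = []
    for mutant in doc.getElementsByTagName("mutant"):
        # Each child element holds a single text node with the residue name.
        native = mutant.getElementsByTagName("native")[0].firstChild.data
        subst = mutant.getElementsByTagName("subst")[0].firstChild.data
        results.append((mutant.getAttribute("pdb"), native, subst))
    return results


class MutantCounter(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag; no tree is ever built.
        if name == "mutant":
            self.count += 1


def sax_count(xml_text):
    handler = MutantCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


if __name__ == "__main__":
    for pdb, native, subst in dom_mutations(XML_TEXT):
        print(pdb, native, "->", subst)
    print("mutants seen by SAX:", sax_count(XML_TEXT))
```

With DOM you can revisit any part of the tree after parsing; with SAX you only see each element once as it streams past, so any state you need has to be kept in the handler.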

Download the lecture slides.

In this practical you will write a Python script to read an XML file and extract some data from it.

You will write two scripts to extract sets of results and print them. If you are familiar with HTML (which will be taught in Biocomputing II) and you have time, you can then extend this to print out some data as an HTML table.
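For the optional HTML extension, one possible approach is sketched below. The function name html_table and the example headers and rows are hypothetical; in your own script the rows would come from whatever data you extract from the XML file.

```python
from html import escape


def html_table(headers, rows):
    """Return an HTML table as a string, one <tr> per row.

    Values are escaped so characters such as < and & display correctly.
    """
    lines = ["<table>"]
    header_cells = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    lines.append(f"  <tr>{header_cells}</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(c))}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows, standing in for data extracted from the XML file.
    print(html_table(["PDB", "Native", "Mutant"], [["1abc", "ALA", "GLY"]]))
```

Building the table as a string keeps the extraction code and the output formatting separate, so the same extracted rows can be printed as plain text or as HTML.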

We have provided an XML file containing information on single-site mutations in the Protein Data Bank. You will use this file to extract data.