BearPanther.com

Converting XML to CSV with Python

Introduction

user

bear


LATEST POSTS

Using jsformat with Sublime Text 26th July, 2014

There are many d3 demos like this but this one is mine 08th June, 2014

Blog

Converting XML to CSV with Python

Posted on .

XML. The word “yay” doesn’t necessarily spring to mind. I recently found myself staring at a URL that returned a bunch of XML. It crossed my mind to stop there, but I really wanted this data so I pressed on.

Luckily, like so many things, Python, by way of the lxml toolkit, makes dealing with XML pretty close to painless. What was the data? Why it was all Trader Joe’s store locations. I decided that I needed to be able to put all their store locations on any map of my choosing.

I end up coming up with script:

import csv
from lxml import etree

# in:  xml with trader joe's locations
# out:  csv with trader joe's locations

out = raw_input("Name for output file:  ")
if out.strip() is "":
  out = "trader-joes-all-locations.csv"

out_data = []

# use recover=True to ignore errors in the XML
# examples of errors in this XML:
#   missing "<" in opening tag:
#   fax></fax>
# missing "</" in closing tag:
#   <uid>1429860810uid>
# 
# also ignore blank text
parser = etree.XMLParser(recover=True, remove_blank_text=True)

# xml on disk...could also pass etree.parse a URL
file_name = "trader-joes-all-locations.xml"

# use lxml to read and parse xml
root = etree.parse(file_name, parser)

# element names with data to keep
tag_list = [ "name", "address1", "address2", "beer", "city", "comingsoon", "hours", "latitude", "longitude", "phone", "postalcode", "spirits", "state", "wine" ]

# add field names by copying tag_list
out_data.append(tag_list[:])

def missing_location(p):
  lat = p.find("latitude")
  lon = p.find("longitude")
  if lat is None or lon is None:
    return True
  else:
    return False

# pull info out of each poi node
def get_poi_info(p):
  # if latitude or longitude doesn't exist, skip
  if missing_location(p):
    print "tMissing location for %s" % p.find("name").text
    return None
  info = []
  for tag in tag_list:
    # if tag == "name":
    #   print "%s" % p.find(tag).text
    node = p.find(tag)
    if node is not None and node.text:
      if tag == "latitude" or tag == "longitude":
        info.append(round(float(node.text), 5))
      else:
        info.append(node.text.encode("utf-8"))
        # info.append(node.text.encode("ascii", "ignore"))
    else:
      info.append("")
  return info

print "nreading xml..."

# get all <poi> elements
pois = root.findall(".//poi")
for p in pois:
  poi_info = get_poi_info(p)
  # print "%s" % (poiInfo)
  if poi_info:
    out_data.append(poi_info)

print "finished xml, writing file..."

out_file  = open(out, "wb")
csv_writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
for row in out_data:
  csv_writer.writerow(row)
 
out_file.close()

print "wrote %sn" % out

As with most Python code, it’s pretty readable and therefore self-explanatory. There’s not anything ground-breaking in there but if you’re curious about something, leave a comment.

Once I had a CSV, I put it in a map on arcgis.com and patted myself on the back.

Comments
user

Author Tim

Posted at 3:06 am May 5, 2014.

Do you have a sample of the XML file you were converting from?

Reply
user

Author John

Posted at 6:25 am October 15, 2014.

Hello and good Morning.

How do I test and run this? I am currently trying to find a Python script that converts XML to CSV and came across this blog. I am also receiving a Syntax Error when I run this in PyCharm. Here is the error that I am getting:

line 47 print “tMissing location for %s” % p.find(“name”).text SyntaxError: Invalid Syntax, process finished with error code 1

Reply

Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

View Comments (3) ...