Saturday, December 12, 2009

OGDI Python tutorial, or why OGDI is awesome.

The Open Government Data Initiative (OGDI) is a program started by Obama. The idea is that the US government takes data the public already has the right to see and makes it easily available in a way that can be accessed programmatically. It is sponsored by Microsoft, and at first that sounds bad: like too much information, and Microsoft has the data. But I gotta say this thing is shaping up to be pretty spiffy. And though it seems more geared toward Microsoft's .NET, it has JSON and XML plugged into it as well, and grabbing the data with Python is easy as pi.

Ok, so below is an example using Juvenile Arrest Charges in DC. If you just want to start with the unfiltered data, use this url:
instead of the one ending in $filter=gender%20eq%20'F'

What I have done is add a query of gender eq 'F' to the filter, which automagically filters the results. So now you only see arrested juveniles who are female.
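To make the filter part less magic, here is a minimal sketch of building that fragment in code. The helper name build_filter is something I made up for illustration, and how the fragment attaches to the service url (with ? or &) depends on the url itself, which is not reproduced here.

```python
from urllib.parse import quote

def build_filter(field, value):
    """Build a $filter fragment such as $filter=gender%20eq%20'F'."""
    expression = "%s eq '%s'" % (field, value)
    # Percent-encode the spaces but leave the single quotes literal,
    # matching the form of the query string used in this post.
    return "$filter=" + quote(expression, safe="'")

print(build_filter("gender", "F"))
```

Swapping in another field or value should give you a different slice of the same data set, assuming the service accepts that field in a filter.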

import urllib.request
from xml.dom import minidom

# Elements to print from each <content> entry, in output order.
FIELDS = [
    "d:PartitionKey", "d:RowKey", "d:TimeStamp", "d:entityid", "d:name",
    "d:address", "d:weburl", "d:gis_id", "d:gender", "d:offensedescription",
]

def GetData():
    # Query string as shown above; the service url that goes in front of it
    # is not reproduced here.
    url = "$filter=gender%20eq%20'F'"
    xmldoc = minidom.parse(urllib.request.urlopen(url))
    for contentNode in xmldoc.getElementsByTagName("content"):
        for field in FIELDS:
            for node in contentNode.getElementsByTagName(field):
                # Empty elements have no text child, so skip them.
                if node.childNodes:
                    print(node.childNodes[0].nodeValue)

if __name__ == "__main__":
    GetData()

Your output will look something like this:


If you just want the output you can actually load a link up in Excel. The website shows you how to do this, with code samples in various .NET languages, Ruby, Python, PHP, etc.
The video and sample code on the site are very useful.
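If you'd rather poke at the parsing logic from the script above without calling the service, here is a rough sketch that runs the same minidom calls on a hand-written sample. The XML below is made up for illustration: the values are invented, the namespace URIs are placeholders, and the real feed will carry more fields. The extract function is a hypothetical helper, not something from the OGDI samples.

```python
from xml.dom import minidom

# Hypothetical sample shaped like one feed entry; every value is invented.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://example.invalid/d" xmlns:m="http://example.invalid/m">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:gender>F</d:gender>
        <d:offensedescription>EXAMPLE OFFENSE</d:offensedescription>
      </m:properties>
    </content>
  </entry>
</feed>"""

def extract(xml_text, fields):
    """Return one dict per <content> element, keyed by field name."""
    doc = minidom.parseString(xml_text)
    rows = []
    for content in doc.getElementsByTagName("content"):
        row = {}
        for field in fields:
            # minidom matches on the prefixed tag name, e.g. "d:gender".
            for node in content.getElementsByTagName("d:" + field):
                if node.childNodes:
                    row[field] = node.childNodes[0].nodeValue
        rows.append(row)
    return rows

rows = extract(SAMPLE, ["gender", "offensedescription"])
print(rows)
```

Collecting each entry into a dict instead of printing as you go makes it easier to count, sort, or dump the rows somewhere else later.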

Information only seems to be available for DC right now. Though while I was doing research on this I saw a map showing Michigan had finished, as well as some other states, and I don't know how to get to that data. Hmmm.

For more information there is a Microsoft site here:

Why is this the bee's knees? Why am I so hyped about this? Everyone has heard those horror stories of how someone didn't get a job because of drunk photos on Facebook or something. I honestly believe that with public information sources like these, there will be so much crap on everybody, so much of the bad stuff people do, usually in the comfort of their own home, that people are going to forget this silly nonsense of "you did something I don't approve of, so I'm not going to hire you." I think with these really open sources of information a day will come when you don't have to be Jesus to get a job. I write a lot of shit down on the intarwebs, and post pictures, etc. I know what posting my opinion or general malfeasentry can cost me. But in the end I think this big ol' goofy world is just going to have to accept imperfect big ol' goofy people.

This does make a bold assumption that things like people's personal arrest records will be available online.

Plus the world needs an app that pairs dates based on similar criminal interests, lulz.
