Parsing JSON with Python

Hi everyone. This week we're going to round out our work with Python to include JSON. Last week we used Python to process a comma-separated value file that was much larger than we could conveniently handle with our existing tools, like OpenRefine. We know that we can use OpenRefine to open JSON files, which is handy, and we know that we can use OpenRefine to open multiple files at once. What we're going to do this week is look at how we can use Python to parse JSON files.

It's going to be very similar in many ways to what we did with the CSV data: we have a recipe book called json, we're going to import it, and then we're going to use methods from it to access parts of our JSON. The thing I hope you'll take away from this is that once you get past that initial question of which recipe book to use, the process of loading and parsing this data is very similar.

The other thing we're going to look at is a nice feature that works whether we're looking at CSV or JSON data: we don't have to manually enter the name of every single file that we want to look at. If I have a collection of CSV or JSON data that I've collected over the course of time, I don't have to write an individual script for every one of those files. Python has a way to say, look at every file in this folder, and for each file do the following things. That's going to help when you're trying to look at data over time, which is very often the case.

Finally, as a nice-to-have for you to take with you, we're going to show you something called a shell script. It runs in the terminal, and you can run it directly in the terminal the same way we have been in Aptana. What this shell script is going to do is let us automate the downloading of data from a URL.

So, to get started here, you can see we're right back on our Citi Bike data page, but rather than going to the CSV files, this time I'm going to come to the station feed. We've looked at this multiple times before, so there's nothing really new to see here, although, as we've noted in the past, one of the things this shows us is what time I loaded the data on the web page. What I'm going to do is download a copy of this. To do that, I'm going to do a Ctrl-A (or an Apple-A) to select everything and an Apple-C to copy it, and I'm going to open up a TextEdit file.

This is all stuff that we've done before. I'm going to paste it into my TextEdit file, use a shortcut or the Format menu to convert it to plain text, and then save it into the same Python work folder that I've been working in. I'm going to call this citibike_data.json. It's going to ask me if I want to use .txt, and I say no, thank you very much. Now I have a copy of this file in my Python folder. You can see I also have some of the other stuff in here that I was working on last week, which is no big deal.

The next thing I'm going to do is open Aptana Studio and navigate to the right location. It happens that it already has my folder here, so I'm just going to click OK, because I already have the Python work folder set up there. Now, just as before, I'm going to say New From Template, Python. You can see that I won't have to go through that configuration process again, because I've already done it once on this computer. I'm going to again do a Save As and make sure that I save my file in my work folder. So I'll just hop to the desktop here and replace the file name. I'm going to call this one json_processing.py, and here I go.

Just as we started last time by importing csv, this time I'm going to start by importing json. Simple. It's not always going to be exactly this way, but both CSV and JSON are very common data types, so they have readily available, quote unquote, recipe books (libraries is the actual programming term for them) for dealing with these data types.

The first thing I want to do is open my file, and I do this in the same way as before. I'm going to have source_file equals, and then, exactly as we did before, I'm going to say open and give it my file name, which in this case is citibike_data.json, and I'm going to pass it my "rU".

Now, in this case I'm not going to create a DictReader or use my csv library, because of course the data that I'm targeting is JSON, not CSV. Instead I'm going to use one of the recipes in my json recipe book to make sure that the computer understands that this is a JSON file. To do that, I'm going to create something called json_data and set it to json.load (that's json, the recipe book, dot load), and then I'm going to pass it source_file. This is very similar to what we did before: instead of saying csv.DictReader, we're saying json.load, which happens to be the name of the recipe in the json cookbook that will handle this. And just to get a sense that this is working, I'm going to say print json_data. So, very straightforward to start.
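
Typed out, the steps so far might look something like the following minimal sketch. It assumes Python 3, so print is written as a function and the file is opened with plain "r" (the "U" flag was removed in Python 3.11). The sample data and the temporary folder are stand-ins added only so the sketch runs on its own, and the key names executionTime and stationBeanList are assumed spellings of the fields in the feed.

```python
import json
import os
import tempfile

# Made-up stand-in for the downloaded station feed (the real file is much larger).
sample_feed = {
    "executionTime": "2015-01-01 12:00:00 PM",
    "stationBeanList": [
        {"id": 1, "stationName": "Example St & 1 Ave", "availableBikes": 5},
    ],
}

# Save the stand-in data so there is a file to open, as in the video.
work_folder = tempfile.mkdtemp()
file_path = os.path.join(work_folder, "citibike_data.json")
with open(file_path, "w") as sample_file:
    json.dump(sample_feed, sample_file)

# The steps from the video: open the file, then let json.load() parse it.
source_file = open(file_path, "r")
json_data = json.load(source_file)
source_file.close()

# Quick check that the data loaded.
print(json_data)
```

With a real download, only the last few lines would be needed, with file_path pointing at the saved citibike_data.json.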

Down in my terminal window, I'm going to get into the appropriate folder and run my file to make sure that everything is working, so json_processing. It looks kind of OK, but you can see that everything coming out has this u ahead of it. That's an indicator that the text is Unicode, and UTF-8 is the format of the data file. You may not have noticed that when we saved this from TextEdit, under the format option it said to save as Unicode (UTF-8).

I'll just demonstrate this again. If I open this with TextEdit and do File, Duplicate (because that's the way it will show me this Save As window again), and then File, Save, you'll notice that the plain text encoding is coming out as Unicode (UTF-8). That is what Python needs in order to process this JSON data. It's not something you need to worry about if you're doing this process this way, because if you converted the file to plain text it will generally default to UTF-8. But if you ever run into errors, that's something to look for: maybe the file is saved in a different, quote unquote, encoding.
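
If encoding errors do come up, one option (not shown in the video) is to name the encoding when opening the file. This is only a sketch, assuming Python 3, where open() accepts an encoding argument. The station name and the temporary folder are made up for illustration.

```python
import json
import os
import tempfile

work_folder = tempfile.mkdtemp()
file_path = os.path.join(work_folder, "citibike_data.json")

# Write a tiny stand-in file as UTF-8, including one non-ASCII character.
with open(file_path, "w", encoding="utf-8") as sample_file:
    sample_file.write('{"stationName": "Caf\u00e9 St & 1 Ave"}')

# Naming the encoding means Python does not fall back on the system default.
with open(file_path, "r", encoding="utf-8") as source_file:
    json_data = json.load(source_file)

# json.dumps escapes non-ASCII characters, so this prints safely in any terminal.
print(json.dumps(json_data))
```

If the file had been saved in some other encoding, that name would go in place of "utf-8" when reading.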

But plain text is what we're looking for here. What we're going to do now is basically just what we did the other day. Let's recall the structure of our data. I could scroll back to the top of this output, but that's certainly not the most efficient way to do it (well, that was not very useful), so once again I'm going to open the file with TextEdit just to remind myself of what its features are.

If you look at the top here, you may remember that we have two features. We have executionTime, which shows when the data was downloaded, and then we have this list called stationBeanList. stationBeanList is the name of the list inside the JSON document. We know it's a list because it starts with a square bracket, and it has each station as a JSON object. What we want to do is loop through each of these objects, loop through the stations, and then pull the values out, and eventually we'll write those to a CSV file. You'll see that this starts to look very familiar.

Just the way that I printed json_data, I can test this by saying I want to print json_data but, actually, I just want to look at the executionTime property, so I put its name in square brackets with some double quotes around it. executionTime is the one attribute that tells us when I grabbed that file, so it's something that we probably want to include as we're writing our CSV, because otherwise we lose track of that piece of information. That's going to be very important, especially when we're trying to collect a bunch of data over time, because this is data that updates all the time. Now you can see it's printing it, no problem.

So how am I going to loop through all of my stations? Well, it's actually pretty straightforward.
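
Before the loop itself, the timestamp check described a moment ago can be sketched like this. The sketch assumes Python 3 and uses json.loads on a short made-up string in place of the downloaded file. The key names executionTime and stationBeanList are assumed spellings of the two top-level features.

```python
import json

# Made-up stand-in for the file contents, with the two top-level features.
sample_text = '{"executionTime": "2015-01-01 12:00:00 PM", "stationBeanList": []}'
json_data = json.loads(sample_text)

# Look up a single property by putting its name, in quotes, inside square brackets.
execution_time = json_data["executionTime"]
print(execution_time)

# The station list comes back as an ordinary Python list.
print(type(json_data["stationBeanList"]))
```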

I'm going to say for station in json_data stationBeanList, exactly the same structure that we looked at before. We know that stationBeanList is a list (we checked that in the data), and we know that this for-in construction can be used to loop through the items in a list. In this case, each item happens to represent a row.

If we want a row for every station, then we're going to do basically the same thing that we did with our CSV, which is to create a row_array. It's going to be empty, except that in this case the first thing I'm going to add to it is the execution time of the file, which I have right here, so I can just copy it in. Remember that the execution time doesn't change. There's only one per file, so each row is going to have the same value here. This is going to make more sense when we're trying to process multiple files at once with a single Python file, say three or four copies of this JSON file.

Then what do we want to do? Well, the station itself has a list of attributes, so for every attribute in station, I can go ahead and say row_array.append and give it the station's value for that attribute.
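
As a rough sketch of the loop just described, assuming Python 3 and a made-up two-station sample in place of the real feed (the field names inside each station are illustrative only), the rows are collected in a list here so they can be inspected before any file is written:

```python
import json

# Made-up sample with two stations; the real feed has many more fields.
sample_text = """
{
  "executionTime": "2015-01-01 12:00:00 PM",
  "stationBeanList": [
    {"id": 1, "stationName": "Example St & 1 Ave", "availableBikes": 5},
    {"id": 2, "stationName": "Sample Ave & 2 St", "availableBikes": 0}
  ]
}
"""
json_data = json.loads(sample_text)

rows = []
for station in json_data["stationBeanList"]:
    # Every row starts with the same timestamp, since there is one per file.
    row_array = [json_data["executionTime"]]
    # Looping over a dictionary gives its keys; use each key to fetch the value.
    for attribute in station:
        row_array.append(station[attribute])
    rows.append(row_array)

print(rows)
```

On Python 3.7 and later the attributes come out in the order they appear in the file, so the columns line up from row to row as long as every station lists its fields in the same order.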

Now, of course, we haven't yet created an output file to write to, so we want to be sure to do that. We're going to do this in exactly the same way that we did last time, which is to open a file, which I'm going to call converted_json.csv, and specify that this is a writable file. In this case I do want to go back and import my csv recipe book, because I want to output this as a CSV. I didn't need it to process the JSON coming in, because that's JSON data and I use the json recipe book for that, but to output it I want to say output_writer equals csv.writer, and I'm going to pass output_file to that. You see this is very similar to what we did last time. And now down here, once I've created my row_array, all I have to do is say output_writer.writerow and pass it row_array.
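
Putting the pieces together, the conversion might look something like this sketch. It assumes Python 3 (hence newline="" when opening the CSV, as the csv documentation recommends), a made-up two-station sample instead of the downloaded file, and a temporary folder for the output. The file name converted_json.csv is a guess at the spelling used in the video.

```python
import csv
import json
import os
import tempfile

# Made-up sample standing in for the downloaded feed.
sample_text = """
{
  "executionTime": "2015-01-01 12:00:00 PM",
  "stationBeanList": [
    {"id": 1, "stationName": "Example St & 1 Ave", "availableBikes": 5},
    {"id": 2, "stationName": "Sample Ave & 2 St", "availableBikes": 0}
  ]
}
"""
json_data = json.loads(sample_text)

# Open a writable output file and hand it to a csv writer.
work_folder = tempfile.mkdtemp()
output_path = os.path.join(work_folder, "converted_json.csv")
output_file = open(output_path, "w", newline="")
output_writer = csv.writer(output_file)

for station in json_data["stationBeanList"]:
    # Timestamp first, then every value from the station.
    row_array = [json_data["executionTime"]]
    for attribute in station:
        row_array.append(station[attribute])
    output_writer.writerow(row_array)

output_file.close()

# Read the file back to confirm what was written.
with open(output_path, newline="") as check_file:
    written_rows = list(csv.reader(check_file))
print(written_rows)
```

Note that everything read back from a CSV is text, so the numbers in written_rows come back as strings.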

So again, very similar. I'm going to come back here and close my files: source_file.close and output_file.close. Again, that's not strictly necessary, but it's useful. Then we give it a go, and you see that I still had my print statement in here, so it did that. When we come back, we're going to take a look at the file to see if it worked as we expect, and then we're going to move on to seeing how we would handle running this script against all of the files that are in a given folder. Again, if I have multiple files, how do I process them without having to specifically write the file name every single time? I'll see you in just a minute.

A tutorial on using the json library to parse JSON files with Python and write them to a csv

Contributor: Susan McGregor

Video 2 of 3