Parsing All Files in Folder Using Python

okay folks so the next thing we're gonna
do is double check that we got something
that seemed useful out of our
reading and writing over here and so the
simple way to do that is I'm just going
to go check on my CSV file
so here's converting JSON to CSV it does
seem like things got written to it which
is good
usually what we are looking for now you
will notice to be sure there is no header
row here right I actually haven't
written the names of the fields at the
same time that I put them into this file
so of course how might I deal with this
one thing I could do is I could loop
through I'm not gonna bother to save
this because obviously it's already
there you know I could do something like
I could loop through the attributes
right in station I can do that once
ahead of time right before I start
going through every single one in the
list and just write it to the output
file so I'm not gonna worry about that
just yet because what I want to do is I
want to look at how I might deal with
multiple files so to do that
you know if you're gonna be dealing with
multiple data files and these again
remember could be CSV or JSON files it
doesn't matter and we'll see exactly how
and why it doesn't matter you
know chances are you're not gonna want
to have them kind of here cluttering up
your folder that has your actual Python
scripts in it you're gonna want to put
them into a folder all together so I'm
gonna call this JSON data or JSON
downloads let's call it that okay I'm
gonna put my Citi Bike data JSON in
there and now I'm also going to put in
just another copy of this right so a few
minutes have passed that means that the
data values will have changed here so
again imagining that you were doing an
analysis over time you might want to be
able to load multiple copies of this
data so I'm gonna go right back to I'm
gonna just return in my browser to this
you'll see that if I reload this page
right there's been an update
in the execution time the data has
changed I'm going to go ahead and copy
this same process as before create a new
TextEdit file nothing new to see here
oops so I'm gonna say new paste make
sure I'm in plain text save it and
I'm just gonna call this Citi Bike data 1
JSON once again it's gonna ask me about
saving it as .json and all is well okay so
the question is how can I get Python to
rather than requiring that I say open
this specific file that it will actually
just go through all of the files there
we are
so how can I make sure that it goes
through all of the files so I'm gonna
comment out my print
statement here right so we're going back
all kind of all the way back up here now
the one thing that I'm gonna stipulate
is that I only want to write to one file
but I want to read in from multiple
files right so I'm only gonna create my
output file and writer once but
now there's a question of everything
below this right I need to
somehow make it more generic right so what
I'm going to use is I'm going to use
another recipe book and this one is kind
of funny
it's called glob I think it might stand
for global I'm not sure I think it's
funny that it's called glob so I sort of
prefer not to look into it too deeply what I
can do is glob can actually look at a
folder and it will basically
create a list of all the files in the
folder right and hopefully you're
seeing the parallels here I can just use
my for in loop to say okay loop through all
of those files so the way that I access
glob is gonna be
pretty simple here the first thing that
I'm gonna do is I'm gonna
create a variable called folder name
okay because remember I just moved all
of my data into a folder because I don't
want it cluttering up the place where
all of my scripts are right so I have a
folder called JSON downloads so I'm
gonna say hey by the way the folder's
name is JSON downloads right this
is just to kind of keep it out of the way
then the next thing that I'm gonna do is
I'm gonna say okay for file name in this
is where I use glob
so glob is the name of yeah glob global
maybe glob is the name of the recipe
book and then the
specific name of the recipe is also glob
okay glob.glob and then what I do is I
say okay so I'm gonna
give it the folder to look at
and then I'm also going to give it some
general parameters of the type of files
that it should be looking for so it's
gonna be folder name right which is a
variable but then on top of that I'm
gonna say look you're gonna go to the
JSON downloads folder forward slash
means go into it right you're familiar
with that from looking at your you know
your computer and also probably moving
around with the command line at this
point and now what I can do is I can use
asterisk asterisk is a wildcard okay so
it's kind of the equivalent it's
actually more than the equivalent of the
dot in our regular expressions this wildcard
is like a combination of
dot and asterisk right so
basically it's saying any collection of
characters that ends in .json so
essentially you would interpret this as
go to this folder and then look for any
file name that ends in .json okay and
that's because I might have other stuff
in this folder right I mean we could be
very clean and tidy about it make sure
nothing else gets in there but we want
to be specific and just say by the way
you know if there's something that isn't a
JSON file I don't want you to try to
deal with it and now again we have a for
loop
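A minimal sketch of the glob call just described, assuming the folder is literally named json_downloads and using made-up file names; the helper list_json_files is hypothetical and not from the video. It builds a throwaway folder so it can run anywhere, then shows that the "*.json" pattern picks up the two JSON files and skips the text file.

```python
import glob
import os
import tempfile


def list_json_files(folder_name):
    """Return the paths of every file in folder_name whose name ends in .json."""
    # "/" steps into the folder; "*" matches any run of characters before ".json".
    # sorted() is only here to make the order predictable.
    return sorted(glob.glob(folder_name + "/*.json"))


# Build a scratch folder with hypothetical file names so the sketch is runnable.
with tempfile.TemporaryDirectory() as scratch:
    folder_name = os.path.join(scratch, "json_downloads")
    os.mkdir(folder_name)
    for name in ["citibike_data.json", "citibike_data1.json", "notes.txt"]:
        open(os.path.join(folder_name, name), "w").close()

    # The same for-in loop as before, now over file names instead of stations.
    for file_name in list_json_files(folder_name):
        print(os.path.basename(file_name))
```

Run as written, this should print citibike_data.json and citibike_data1.json and leave out notes.txt, because that name does not end in .json.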
so all the code that we have will still
work right we just need to make sure
that it's clear that it belongs to this
for loop right and so the way that I'm
going to do that is just I actually
should see if there's a way to shortcut
tabbing I'm gonna go look for a shortcut
for tabbing a whole chunk of code and
I'll be right back okay well that was
quick so it turns out that if I want to
get all of this under my for loop right
so I need to have all of this
indented if I actually just highlight
all of it and hit the tab button it just
bumps it all in one tab right if I
wanted to un-tab things if I hold down
the shift key and hit tab it moves them
back so that's a nice little shortcut
so now I need to make obviously a few
minor adjustments here but they really
are quite minor because now instead of
saying I'm specifying this particular
one we know that file name is the
variable that we're looking at right so
I can just say open file name right and
then apply the rest of my code so
hopefully if this is written correctly
right I'm already writing it to a
new file if this is working correctly
this is actually just now going to apply
the code that I wrote for one file to
all of the files that are in the folder
JSON downloads which at this point is
only two files right I just made a
couple of copies oh no that's the wrong
one this one right I just have you know
Citi Bike data dot JSON and
Citi Bike data 1 dot JSON notice that it
doesn't matter or it shouldn't matter
what the actual name of a file is that's
what that asterisk gets me so even if
they weren't named in a regular way like
I saved one of them with a different
file name and the rest of them came from
whatever they had been named you know at
the web site or something like that
not a problem here so let's go ahead and
run
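The script itself never appears in this transcript, so the following is only a rough sketch of what is being run at this point. It assumes each JSON file has a top-level "executionTime" value and a list of station records under "stationBeanList"; those key names, the function name convert_folder, and the file names in the usage note are all assumptions. It opens the output file and writer once, loops over every .json file that glob finds, and writes the field names a single time ahead of the data rows, as suggested earlier.

```python
import csv
import glob
import json


def convert_folder(folder_name, output_path):
    """Write the station records from every .json file in folder_name to one CSV."""
    # The output file and its writer are created once, outside the file loop.
    with open(output_path, "w", newline="") as output_file:
        writer = csv.writer(output_file)
        header_written = False

        # Everything that used to handle one named file now sits under this loop.
        for file_name in sorted(glob.glob(folder_name + "/*.json")):
            with open(file_name) as json_file:
                data = json.load(json_file)

            # Assumed layout: a timestamp plus a list of station dictionaries.
            for station in data["stationBeanList"]:
                if not header_written:
                    # Write the field names once, before the first data row.
                    writer.writerow(["executionTime"] + list(station.keys()))
                    header_written = True
                writer.writerow([data["executionTime"]] + list(station.values()))
```

Calling something like convert_folder("json_downloads", "converting_json_csv.csv") would then stand in for the run shown here. The sketch also assumes every station record lists its fields in the same order, which is what lets a single header row describe all of the files.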
okay so didn't get any errors returned to
the command prompt no news is good news
time to go look at my file so now I'm
gonna come back here I'm looking at
converting JSON to CSV I open this up now
again this would
of course be easier to scan if I was
looking at it in OpenRefine but the
thing I'm gonna look at to test right is
to see the start time right so I have
the start time will change every time
the file changes obviously so I know
that I had the 02:44 before so let's
see if that value changes now I think if
we recall there are around 600-something
individual ones so down here
yep so it looks like I have captured
both of my files so this is really handy
right when we're dealing with two files
maybe not such a big deal doesn't seem
so great what if we had to deal with you
know 10 20 30 files right the process of
creating a new script every time and
pointing it to specifically that file
name and of course we all know how
troublesome typos can be you know we're
really saving a lot of time here
so now what I want to do is I want to
just introduce you all to what we're going
to talk about in the next video which is
something called shell scripts okay so a
shell script is a kind of script that
can be run from your terminal window so
the terminal window that looks
oops just like this one right sorry
I'm gonna open a terminal window right here
okay so a shell script is not Python it
is kind of its own thing that exists on
the operating system running these shell
scripts is going to be very
straightforward on a Mac computer on an
Apple computer not straightforward at
all on a Windows machine so for those of
you who are working on Windows machines I
apologize this is one of those things
that just really works much better on
Macs
if you're interested in the future in
doing this kind of thing on a PC what
you probably need to look into is a dual
boot which means that you actually run
Linux on part of your machine or more
realistically
if you were using this in a work context
these shell scripts you would actually
be running them on
a server right so you would actually
like get a little server at Amazon for
example and you run your shell scripts
there in any case I just want to
demonstrate this as a handy way to
download the same file or set of files
at intervals at timed intervals without
having to sit there and do what we've
been doing which is go to the website
reload the page copy paste etc etc so
I'll be back in just a minute we're
going to take a look at shell scripts

A tutorial on using Python to parse all of the data files in a given folder, without having to explicitly pass the file names

Contributor: Susan McGregor

Video 3 of 3