Intro to Aptana and Python

hi everyone so this week we are going to
dive headlong into Python we started to
introduce some of the basic concepts of
Python and just sort of programming in
general last week and this week we're
going to actually start writing some
code of our own it's going to help us
primarily because it will let us deal
with vastly more data much more quickly
than we've been able to do thus far with
something like OpenRefine so whereas in
OpenRefine we've been able to handle
maybe a hundred thousand or a few
more rows at a time when we work with
Python we're gonna be able to deal with
several million rows at least at once
so to give you a sort of idea of
what the trajectory is here you
know we've been constantly scaling up
the volume of data that we're working
with Python also because it is a
programming language is incredibly
versatile it can be used for any number
of tasks so think of this also as an
introduction to Python 3 which is
something that should you choose to
pursue it will continue to be of use to
you in doing other types of data
journalism going forward so to work in
Python we're going to be
using a program called Aptana Studio 3
Aptana Studio 3 is what's known as an
integrated development environment or
IDE basically what it means is it's a
text editor that is specifically
designed to work with code so it's going
to give us the kind of helpful hints
that you get for example when you're
typing in Microsoft Word or Google Docs
you know you get grammar checking and
spell checking things like that Aptana
Studio 3 is going to do that but it's
going to do that for us for code it's
also going to give us a way to actually
execute the code which we have to do in
sort of a separate little window but
that's partly why it's an
integrated development environment
because it lets us do both of those
things together so when you first
launch Aptana you're
gonna get a window that looks like this
it has this sort of default thing where
it says what's your workspace a
workspace is just a folder and so I'm
gonna go ahead and create a new folder
here and
just call it Python scripts okay and I'm
just gonna click OK and start up the
program here now you might notice that I
actually have my Citi Bike trip data on my
desktop already of course the whole
purpose of creating the script is to be
able to deal with that don't
worry about these errors there are a
bunch of errors that are gonna come up
we're just gonna close all this stuff
I'm obviously gonna want to be
working on my Citi Bike data so I'm
gonna move that into my folder in just a
minute but before that to get
started I'm gonna just come to the local
file system here and under Desktop and
then Python scripts I'm gonna just say
new file now it's important when I'm
naming this file that I give it the
extension dot py py is the extension for
Python as we've discussed in class a
couple of times the file extension is
just a clue for the computer to
understand what program it should use to
run or open a given file so we can name
our file here whatever we want
I'm gonna call mine parsing csv dot py
but you do need to make sure that
you end it with that dot py and when it
comes up it's just gonna look like an
empty file nothing going on here now if
you notice I get this auto config
option you may or may not get this your
first time running Aptana because there
are other students who've been using it
in the labs already just to note that
we're going to be using Python 3 there
are versions of Python because over time it
gets updated and upgraded so we're gonna
be using the latest version which is
Python 3 here I just
basically say do your thing and
hopefully this will take not more than a
couple of minutes or seconds okay so the
first thing I want to do is test that
actually my installation of Python is
working right because if that's not
working everything else is gonna fall
apart and so what I need to do is I need
to first write some code in my
Python file so I'm gonna do this by
using a very common
mechanism here called the
hello world program this is a
classic from computer science and I'm
using the print command now I want to
point out several things here one
you'll notice that when I started typing
that when I typed print it turned blue
now it's turned blue because this is
Aptana Studio 3 knowing that we're
working in Python it's letting us know
that print is a special built-in word
okay it has a particular meaning most
importantly we shouldn't use the word
print as a variable name or it will get
all confused and we know we've done
it correctly because if
we expect to use that command and we see it
in blue then it means it recognizes the
command you'll also notice that hello
world surrounded in quotes which is of
course a string is green so
this is sort of the reverse of what we
saw in OpenRefine our strings
here are turned green other text will
just be black mostly our code will just
be black except for where we use
special or reserved words next I want to point out
that there's this asterisk now next to
the file name what that means is that I
have unsaved changes this is important
because sometimes we can forget to save
changes so you'll be adding to your code
testing it out and saying you know it's
not working the update isn't taking
often it's just because we've forgotten
to hit you know Control S or Apple S in
between so you always want to make sure
that you save your file so that you have
no asterisk when you're testing it out
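
For reference, the one-line test script described above could look something like the sketch below. The exact wording of the string and the variable name greeting are assumptions added for illustration; the video only shows a print command with hello world in quotes.

```python
# A string is text wrapped in quotes (the editor colors it green).
greeting = "hello world"

# print() is built into Python 3; it sends its argument to the terminal.
print(greeting)
```

Remember to save the file before running it, so the asterisk next to the file name disappears.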
the final step that we need to do here
we're going to come to the terminal and
the terminal is something that's available
you can access it on a Mac
outside of Aptana Studio 3 again
this is just a way to give
you access to that without having to
manage multiple windows again integrated
development environment and we are going
to be interacting with the terminal via
what we call the command line okay and
so the command line basically lets us do
all of the things that we do you know
with a mouse and keyboard well with a
mouse I should say you know
moving and clicking but we can
actually do it through text commands so
the first command that I'm going to use
is ls which stands for list and what
that's going to do is it's going to show
me basically from
where I am in my computer what
folders and files I can see
from that point in the terminal so at
the moment I'm actually in my user
folder and that's why I can see desktop
documents downloads etc etc so I'm going
to move into the Desktop
because of course that's where I created
my Python scripts folder containing this
Python file and to do that I'm gonna
type cd which stands for change
directory and Desktop now this is a nice
trick if I start typing Desktop for
example as long as I have put in
enough characters to make it unique note
that it is case sensitive so I do have
to use a capital D I don't have to type
out the entire thing perfectly once I've
put in several letters typically I can
hit tab and it's going to fill in the
rest for me right it's gonna say oh I
see you know what it is so it's actually
kind of nice because when we're running
our files we don't have to type in the
entire name every time so now I hit
enter and you'll notice that now it has
changed my position it's added this
blue Desktop entry to the line so it's
saying okay you're now in the position of
being this user on the Desktop now if
I do an ls to list I can see my Python
scripts folder and again I can cd into
that folder and let me do another ls now
I can actually see my Python file right
I can see parsing csv dot py
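
The same list-and-change-directory steps can also be mirrored from inside Python, which may help make clear what ls and cd are doing. This is only a sketch under stated assumptions: it builds a throwaway folder layout so it runs anywhere, and the names python_scripts and parsing_csv.py are hypothetical stand-ins for the folder and file created in the video.

```python
import os
import tempfile

start = os.getcwd()  # remember where we began

with tempfile.TemporaryDirectory() as home:
    # Build a pretend user folder: Desktop/python_scripts/parsing_csv.py
    # (all three names are illustrative assumptions).
    scripts = os.path.join(home, "Desktop", "python_scripts")
    os.makedirs(scripts)
    open(os.path.join(scripts, "parsing_csv.py"), "w").close()

    os.chdir(home)                # start in the pretend user folder
    top_level = os.listdir(".")   # like ls
    print(top_level)              # ['Desktop']

    os.chdir("Desktop")           # like cd Desktop
    print(os.listdir("."))        # ['python_scripts']

    os.chdir("python_scripts")    # like cd into the scripts folder
    files_seen = os.listdir(".")  # like ls: the Python file is now visible
    print(files_seen)             # ['parsing_csv.py']

    os.chdir(start)               # step back out before the folder is deleted
```

In the terminal itself the equivalent commands are simply ls and cd, as typed in the video.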
you'll note that it doesn't show the file
extension at the top here but it is the
same file so now that I can
quote-unquote see this file with an ls I
am actually able to run it and to run it
I just type python3 which is
saying use the interpreter literally use
the translator or interpreter of the
Python 3 language to interpret the contents of
this file and so again I can just type a
few characters and hit enter and this is
what I expected print will output
something to my terminal window as
opposed to for example writing it to a
separate file which is ultimately our
objective today is that we're going to
actually use Python to pull in our Citi
Bike data to filter for a particular
start station ID and to
output all
of the rows that have that start station
ID to a new file so the print command is
something I use kind of as a
debugging tool right you just make sure
things are working and see what's
happening at different points in my code
and at different points in time
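
As a small illustration of print as a debugging aid, the sketch below prints a message at several points so you can watch the script's progress in the terminal. The list of rows is made-up stand-in data, not the real Citi Bike file.

```python
# Stand-in data for illustration only; the real script reads rows from a file.
rows = ["row one", "row two", "row three"]

rows_checked = 0
for row in rows:
    rows_checked += 1
    # Print at this point to confirm the loop is reaching every row.
    print("checking row", rows_checked, ":", row)

# Print once more at the end to confirm the script finished.
print("finished, rows checked:", rows_checked)
```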
and it's handy because it shows up right
here now
why do I not always want to use the
Print command two reasons one it's
actually quite a bit slower than
outputting to a file so if I try to
output all of my results to terminal
when I'm filtering for example it would
take maybe five or ten minutes to go
through all the rows and print them all
to the terminal second of course I don't
really have a way to save things from
the terminal the whole point of writing
to a file right just saving to a file is
that then of course I have a fixed
record of it just the same way you all
did when you text faceted in
OpenRefine and then exported a CSV in this
case we're going to construct a CSV
from the original after applying our
filtering method and then we get the
same thing except we're doing it on our
entire dataset all at once as opposed to
just a hundred thousand rows so I'm
gonna take a second right now just to
pump up my font size and then we're
going to get started with the kind of
basics of this okay so now that I have
updated my font size and it's a little
bit more readable for all of you
maybe I can even go ahead and minimize
this a little bit just so it's not
taking so much space okay so what is it
that I actually want to do with this
script I don't just want to print hello
world what I want to do is I want to
pull up my data file I want to go
through each line checking to see if it
has a start station ID of 72 I'm gonna
use 72 in this case and if it does then
I want to make a copy of that row of the
whole entire row and write it to an
output file okay so actually the first
step in doing any
program is to do exactly that which is
to write out what it is we're trying to
accomplish so my outline is going to be
one
open my source data file and it's handy
here to actually put in the name you
know just put in some of the specific
details so in this case first of all I'm
going to copy this file name if I can
just so silly okay so I'm gonna copy
this into here CSV okay the other thing
I'm gonna do is I'm going to again move
my data file into my Python scripts
folder this just makes it easier to deal
with it's not that everything always has
to be in the same folder for you to use
it in Python but it is going to make it
easier for us okay so first I want to
open my source data file hmm
I'm gonna say I want to use a CSV
library to make it easier to
handle I'm going to create a reader and
a writer for the output file then I'm
going to go through the source file one
row at a time I'm going to check to see
if the start station ID is 72 if it is
make a copy of the row and write it to
my output and finally once I've
looked through all the rows okay so
another thing to point out here you may
note that I have begun each of these
lines with a hashtag okay what that
means is that what follows on that line
is a comment in other words the computer
should ignore it and not try to treat it
as Python code these comments are
extremely important because they are the
way that we as the programmer explain in
English what we're trying to accomplish
what our code is doing and I can
guarantee you that no matter how expert
you become at Python or anyone is at
Python one of the most frustrating
things in dealing with code is either
looking at someone else's code or coming
to your own code and not being able to
remember quite what you did and having
to figure it out from reading the code
itself so one of our biggest focuses in
this course as we work
with Python is going to be looking at
the quality of our comments and making
sure that we're explaining clearly what
each line of code is doing as well as
what our overall program is doing so
that's what we've done right here at the
top of the file we've said look this is what our
program is going to attempt to do this
is also going to provide a really
valuable reference for us going forward
as we start to write our code because if
we run into hiccups we'll kind of get
caught up doing one thing and it'll be
like wait what was I supposed to be
doing in general so this is a really
important part of the process so once
you have your outline written out we're
gonna come back in just a minute and we
are going to actually start doing some
code so I will see you all back here in
a moment
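
To make the outline concrete, here is one way the commented plan could eventually be filled in. It is a rough sketch and not the code written later in the video series: the file names, the column name "start station id", and the three sample rows are all assumptions made up so the example runs on its own.

```python
import csv
import os
import tempfile

# Assumptions for illustration only: file names, the column name
# "start station id", and the sample rows below are invented.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "sample_trips.csv")
output_path = os.path.join(work_dir, "station_72.csv")

# Create a tiny stand-in for the real data file.
with open(source_path, "w", newline="") as sample:
    sample.write("tripduration,start station id\n")
    sample.write("300,72\n")
    sample.write("450,79\n")
    sample.write("620,72\n")

# open my source data file
# use the csv library to create a reader and a writer for the output file
with open(source_path, newline="") as source_file, \
        open(output_path, "w", newline="") as output_file:
    reader = csv.DictReader(source_file)
    writer = csv.DictWriter(output_file, fieldnames=reader.fieldnames)
    writer.writeheader()

    rows_written = 0
    # go through the source file one row at a time
    for row in reader:
        # check to see if the start station id is 72
        # (values read from a CSV are strings, so compare with "72")
        if row["start station id"] == "72":
            # if it is, make a copy of the row and write it to my output
            writer.writerow(row)
            rows_written += 1

# A single print at the end is a cheap check that the filter ran.
print("rows written:", rows_written)
```

Each comment line starts with a hash sign, so Python ignores it, and the outline stays in the file as a plain-English guide to what each block of code is for.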

An introduction to Aptana Studio 3, Python 3 and the command line. Navigating, testing and running Python code

Contributor: Susan McGregor

Video 1 of 3