Region,Gender,Year,Name,HowMany

csv processing reading in file and writing out
As an instructor, I have lots and lots of csv files full of student data (like the toy file we just played with).  For this class, it would be useful to use some of those files to provide examples for scripting (for example computing average exam scores or total weighted score for grading. etc., real tasks, however, I do not want to share real student data (as that is a violation of student privacy).  This gives us a chance to play with some super useful computation with csv files,
reading in some type of data and writing it out (perhaps modified).
In this case, we are going to practice writing scripts that use csv files by generating a big pile of ‘fake’ student data from some real csv input data.  This also illustrates how easy it is to generate fake data in general.
Goal: we want to generate files in the same format as the prior example:
name,email,SID,exam1,exam2,exam3
However, we want to generate a lot of data.
We want our fake data to seem as real as possible, so lets use real names.  A good source of real names is the US social security administration: https://www.ssa.gov/oact/babynames/limits.html
From there I downloaded some data about baby names in Puerto Rico over the year: PR.csv  Download PR.csv
This csv file is formatted as:
Region,Gender,Year,Name,HowMany
(where how many represents how many babies were given that name).
Use your prior code to make sure you can read in this file (watch out it is a big file).
Working with this data, instead of just using index offsets to get to specific data fields, it would be useful to be able to access data with the column header names.
To accomplish this, we are going to use a DictReader and DictWriter, that allow us to refer to column name headers.  Like this:
import csv
#read in the source data with baby names
babyNamesDataFile = open(‘PR.csv’)
babyNameDictReader = csv.DictReader(babyNamesDataFile)

print(“Popular baby names in 2000″)
for row in babyNameDictReader:
if int(row[‘HowMany’]) > 300 and row[‘Year’] == ‘2000’:
print(row[‘Name’])
Which should generate the following.  Stop and test to make sure that this code works for you (once you’ve downloaded the file).  Make sure your output matches the below:
Popular baby names in 2000
Paola
Alondra
Gabriela
Genesis
Andrea
Nicole
Adriana
Luis
Jose
Angel
Kevin
Carlos
Bryan
Christian
Jean
Juan
Gabriel
Michael
Jonathan
Christopher
Jorge
Now if we wanted to write out a file using these popular names as our ‘fake students’ we could do something like:
import csv
#read in the source data with baby names
babyNamesDataFile = open(‘PR.csv’)
babyNameDictReader = csv.DictReader(babyNamesDataFile)

Get Your Custom Essay Written From Scratch
Are You Overwhelmed With Writing Assignments?
Give yourself a break and turn to our top writers. They’ll follow all the requirements to compose a premium-quality piece for you.
Order Now

#make the output file for ‘fake student’ data
outputFile = open(‘FakeStudents.csv’, ‘w’, newline=”)
outputDictWriter = csv.DictWriter(outputFile, [‘Name’, ’email’, ‘SID’, ‘exam1’, ‘exam2’, ‘exam3’])
outputDictWriter.writeheader()

count = 0;
for row in babyNameDictReader:
if int(row[‘HowMany’]) > 300 and row[‘Year’] == ‘2000’:
outputDictWriter.writerow({‘Name’: row[‘Name’], ’email’: ‘fixme’,
‘SID’: 100+count, ‘exam1’: 100, ‘exam2’: 100, ‘exam3’: 100})
Using this base code, once you run it, you should have created a new file called “FakeStudents.csv”
Stop and test this code.  You will augment it in the next section.