TDM 10200: Project 7 — 2024
Motivation: Pandas is a powerful tool for manipulating and analyzing data in Python. We can use Pandas to extract information about one or more variables in a data frame. Sometimes we can also group the data according to one variable and summarize another variable within each of those groups.
Context: Understanding how to use Pandas and being able to develop functions allows for a systematic approach to analyzing data.
Scope: Pandas and functions
Reading and Resources
Datasets
/anvil/projects/tdm/data/noaa
Just as we did in Project 3, please remember to use `header=None` when you read in the data set, and remember to use:
`names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]`
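As a minimal sketch of the reading step described above: the column names are the ones given in the project text, while the three sample rows and the name `sample_csv` are made up for illustration, so that the snippet runs without access to the files under /anvil/projects/tdm/data/noaa.

```python
import io

import pandas as pd

# Column names as listed in the project text.
NOAA_COLUMNS = ["id", "date", "element_code", "value",
                "mflag", "qflag", "sflag", "obstime"]

# Made-up rows standing in for a real data file (an assumption for
# illustration only; these are not actual NOAA records).
sample_csv = io.StringIO(
    "USC00000001,18800101,SNOW,25,,,6,\n"
    "CA000000001,18800101,TMAX,-50,,,C,\n"
    "USC00000002,18800102,PRCP,0,,,6,0700\n"
)

# header=None tells pandas that the file has no header row;
# names= supplies the column labels instead.
df = pd.read_csv(sample_csv, header=None, names=NOAA_COLUMNS)

print(df.shape)  # (3, 8)
print(df.head())
```

With a real file, the `io.StringIO` object would be replaced by the path to that file.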
We added six new videos to help you with Project 7.
Questions
For all questions in this project, consider only the rows from the NOAA data that are for US records, namely, in which the first field starts with the letters `US`.
Dr Ward used this code in the videos, which might help you during your work:
Question 1 (2 points)
a. For the year 1880, find how many rows are for US records. Then do this again (in separate cells) for the years 1881, 1882, 1883.
b. Now make a dictionary that has these years as the four keys, and has the counts (that you discovered in question 1a) as the values.
c. Now that you know how to do the work from questions 1a and 1b, wrap your work into a function that accepts a list of years as input, and returns a dictionary as the output.
d. Run the function for the years in the range from 1880 to 1883 (inclusive), and make sure that the results agree with your results from questions 1a and 1b.
The following sample code can be used to get US records
The shape of a data frame contains the number of rows and the number of columns. The range of years to test is 1880 through 1883 (inclusive).
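The pattern in Question 1 (filter for US records, count rows with `shape`, collect the counts in a dictionary, then wrap it all in a function) can be sketched roughly as follows. This is an illustration on made-up data, not the code from the videos: the `toy_ids_by_year` dictionary, the `load_year` helper, and the function name `count_us_rows` are all hypothetical, and the toy frames keep only the `id` column.

```python
import pandas as pd

# Hypothetical toy data: a few made-up ids per year. Only the "id"
# column is kept because the US filter needs nothing else.
toy_ids_by_year = {
    1880: ["USC00000001", "CA000000001", "USC00000002"],
    1881: ["USC00000001", "MX000000001"],
    1882: ["USC00000003", "USC00000004", "USC00000005", "CA000000002"],
    1883: ["CA000000003"],
}


def load_year(year):
    """Stand-in loader (an assumption): return a toy frame for `year`.

    With the real data, this is where pd.read_csv would be called.
    """
    return pd.DataFrame({"id": toy_ids_by_year[year]})


def count_us_rows(years):
    """Map each year to the number of rows whose id starts with "US"."""
    counts = {}
    for year in years:
        df = load_year(year)
        us_rows = df[df["id"].str.startswith("US")]
        counts[year] = us_rows.shape[0]  # shape is (rows, columns)
    return counts


print(count_us_rows([1880, 1881, 1882, 1883]))
# {1880: 2, 1881: 1, 1882: 3, 1883: 0}
```

Keeping the loading step in its own helper means the counting logic can be checked on small frames first, before it is pointed at the full data.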
Question 2 (2 points)
a. Revise your work from question 1b (before you built your function!), so that the dictionary is in reverse order. The code from the tip might help. Test your work on the years 1880 through 1883, and make sure that the resulting dictionary is in descending order.
b. Now that you know how to do the work from question 2a, wrap your work from question 2a into a function that accepts a list of years as input, and returns a dictionary in descending order as the output.
c. Run the function for the years in the range from 1880 to 1883 (inclusive), and make sure that the results agree with your results from question 2a.
This code takes a dictionary called
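One possible way to put a dictionary into descending order is sketched below. The text here does not say whether the order is by year or by count, so the sketch shows both; the `counts` values are made up, and the variable names are hypothetical rather than taken from the original tip.

```python
# Made-up counts, for illustration only.
counts = {1880: 30, 1881: 10, 1882: 40, 1883: 20}

# Descending by key (year): sorted() orders the (key, value) pairs,
# and dict() keeps that insertion order (Python 3.7+).
by_year_desc = dict(sorted(counts.items(), reverse=True))

# Descending by value (count): sort on the second element of each pair.
by_count_desc = dict(
    sorted(counts.items(), key=lambda item: item[1], reverse=True)
)

print(by_year_desc)   # {1883: 20, 1882: 40, 1881: 10, 1880: 30}
print(by_count_desc)  # {1882: 40, 1880: 30, 1883: 20, 1881: 10}
```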
Question 3 (2 points)
a. For the year 1880, find how many rows (which are for US records) are `SNOW` days with a positive amount of snowfall. In other words, look for rows with three conditions: the first field starts with `US`, the `element_code` is `SNOW`, and the `value` is strictly positive. Then do this again (in separate cells) for the years 1881, 1882, 1883.
b. Now make a dictionary that has these years as the four keys, and has the counts (that you discovered in question 3a) as the values.
c. Now that you know how to do the work from questions 3a and 3b, wrap your work into a function that accepts a list of years as input, and returns a dictionary as the output.
d. Run the function for the years in the range from 1880 to 1883 (inclusive), and make sure that the results agree with your results from questions 3a and 3b.
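The three conditions in Question 3 can be combined as boolean masks joined with `&`. The following is a minimal sketch on a made-up frame that keeps only the three columns involved; the variable names are hypothetical.

```python
import pandas as pd

# Made-up rows with only the three columns that the filter uses.
df = pd.DataFrame({
    "id": ["USC00000001", "USC00000001", "CA000000001",
           "USC00000002", "USC00000002"],
    "element_code": ["SNOW", "SNOW", "SNOW", "PRCP", "SNOW"],
    "value": [25, 0, 30, 12, 51],
})

is_us = df["id"].str.startswith("US")    # first field starts with US
is_snow = df["element_code"] == "SNOW"   # element_code is SNOW
is_positive = df["value"] > 0            # value is strictly positive

# A row is kept only when all three masks are True for it.
snow_days = df[is_us & is_snow & is_positive]

print(snow_days.shape[0])  # 2
```

Naming each mask separately makes it easy to check one condition at a time, for example with `is_snow.sum()`.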
Question 4 (2 points)
a. For the year 1880, consider only the types of rows from question 3a (which are for US records, with `element_code` as `SNOW`, and with `value` as strictly positive). Group those rows according to the `id`, and determine which `id` has the largest number of snowfall days. Then do this again (in separate cells) for the years 1881, 1882, 1883.
b. Now make a dictionary that has these years as the four keys, and has the `id` values with the largest number of snowfall days in each of these individual years.
c. Now that you know how to do the work from questions 4a and 4b, wrap your work into a function that accepts a list of years as input, and returns a dictionary as the output.
d. Run the function for the years in the range from 1880 to 1883 (inclusive), and make sure that the results agree with your results from questions 4a and 4b.
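Grouping by `id` and counting the rows in each group, as Question 4 describes, might look like the sketch below. The frame is made up and is assumed to contain only rows that already passed the three filters from question 3a; `groupby(...).size()` and `idxmax()` are one reasonable choice here, not necessarily the approach shown in the videos.

```python
import pandas as pd

# Made-up rows, assumed to be already filtered to US records with
# element_code SNOW and a strictly positive value.
snow_days = pd.DataFrame({
    "id": ["USC00000001", "USC00000002", "USC00000001",
           "USC00000003", "USC00000001", "USC00000002"],
    "value": [25, 300, 13, 51, 8, 76],
})

# Number of rows (snowfall days) for each id.
days_per_id = snow_days.groupby("id").size()

# idxmax returns the index label (here, the id) of the largest count.
# If several ids tie, only the first of them is returned.
top_id = days_per_id.idxmax()

print(days_per_id)
print(top_id)  # USC00000001
```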
Question 5 (2 points)
a. For the year 1880, consider only the types of rows from question 3a/4a (again, which are for US records, with `element_code` as `SNOW`, and with `value` as strictly positive). Group those rows according to the `id`, and determine which `id` has the largest amount of snowfall (in other words, `sum` the snowfall amounts for each `id`). Then do this again (in separate cells) for the years 1881, 1882, 1883.
b. Now make a dictionary that has these years as the four keys, and has the `id` values with the largest amount of snowfall in each of these individual years.
c. Now that you know how to do the work from questions 5a and 5b, wrap your work into a function that accepts a list of years as input, and returns a dictionary as the output.
d. Run the function for the years in the range from 1880 to 1883 (inclusive), and make sure that the results agree with your results from questions 5a and 5b.
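For Question 5, the grouping is the same as in Question 4, but the `value` column is summed within each group instead of counting rows. The sketch below wraps that step in a function over several years. The `toy_snow_by_year` data and the function name `top_snowfall_id` are hypothetical, and the toy frames are assumed to be already filtered as in question 3a.

```python
import pandas as pd

# Hypothetical toy data: made-up, already-filtered snowfall rows per year.
toy_snow_by_year = {
    1880: pd.DataFrame({
        "id": ["USC00000001", "USC00000002", "USC00000001"],
        "value": [25, 300, 13],
    }),
    1881: pd.DataFrame({
        "id": ["USC00000001", "USC00000002", "USC00000001"],
        "value": [100, 50, 100],
    }),
}


def top_snowfall_id(years):
    """Map each year to the id with the largest total snowfall value."""
    result = {}
    for year in years:
        snow_days = toy_snow_by_year[year]
        # Total snowfall per id, then the id whose total is largest.
        totals = snow_days.groupby("id")["value"].sum()
        result[year] = totals.idxmax()
    return result


print(top_snowfall_id([1880, 1881]))
# {1880: 'USC00000002', 1881: 'USC00000001'}
```

In the 1880 toy frame, `USC00000001` has more snowfall days but `USC00000002` has the larger total, which shows that the answers to Questions 4 and 5 need not agree.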
Project 07 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - `firstname-lastname-project07.ipynb`
- Python file with code and comments for the assignment
  - `firstname-lastname-project07.py`
- Submit files through Gradescope
Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.