Part 1: Code + Module (11 pts)
In this section, you’ll write the functions needed to carry out the analysis.
Q1: function: read_data (3 pts)
Define a function read_data with a single parameter file that accomplishes the following, using pandas functions/methods:
- uses pandas to read the file in
- extracts only two of the columns: 'Gender' and '11'
- renames these two columns to have the column names 'gender' and 'last_minute'
- returns the resulting DataFrame from the function.
Notes:
- column '11' in the original dataset contains each respondent's response on the questionnaire to the statement 'You tend to leave things to the last minute'
- to test out your function here, you'll need to import pandas as pd first (outside your function).
Suggested smoke test: executing read_data(file = 'testfile.csv') should return a pandas DataFrame with two columns and 10 rows.
In [ ]: YOUR CODE HERE
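The steps above can be sketched as follows. This is a minimal sketch, not an official solution, and it assumes the CSV has a header row containing the column names 'Gender' and '11'. Because testfile.csv isn't available here, the usage lines read a small hypothetical in-memory CSV (sample_csv) instead; pd.read_csv accepts a file-like object as well as a path.

```python
import io

import pandas as pd


def read_data(file):
    """Read the survey CSV and keep only the gender and procrastination columns."""
    # Read the raw survey responses; `file` may be a path or a file-like object.
    df = pd.read_csv(file)
    # Keep only the two columns needed for the analysis.
    df = df[['Gender', '11']]
    # Give the columns descriptive names (rename returns a new DataFrame).
    df = df.rename(columns={'Gender': 'gender', '11': 'last_minute'})
    return df


# Hypothetical three-row sample standing in for testfile.csv; column '10' is an
# extra column that read_data should drop.
sample_csv = io.StringIO(
    "Gender,10,11\n"
    "M,Agree,Neither\n"
    "F,Disagree,Agree\n"
    "M,Agree,Agree\n"
)
result = read_data(sample_csv)
print(result.columns.tolist())  # ['gender', 'last_minute']
print(result.shape)             # (3, 2)
```

Header names read by pd.read_csv are strings, so the numeric-looking column is selected as '11', not 11.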
Q2: function: calculate_stats (3 pts)
Now, define a function calculate_stats that takes two parameters: df (the DataFrame it will operate on) and label (the value in the 'gender' column that we want to filter on, which will be either 'M' or 'F' when the function is called).
This function should:
- filters df to include only the rows whose value in the 'gender' column is exactly equal to label
- calculates value_counts() on the 'last_minute' column of the DataFrame generated in step 1 above, using the normalize=True parameter of the value_counts() method
- returns the results from step 2 from the function
Suggested smoke tests: where df is the output of running read_data() on 'testfile.csv', calculate_stats(df, 'M') should return:
Neither 0.6
Agree 0.4
Name: last_minute, dtype: float64
and calculate_stats(df, 'F') should return:
Strong Agree 0.4
Disagree 0.4
Agree 0.2
Name: last_minute, dtype: float64
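One possible shape for calculate_stats, sketched under the assumption that df already has the 'gender' and 'last_minute' columns produced by read_data. The DataFrame built here is hypothetical example data, not the contents of testfile.csv.

```python
import pandas as pd


def calculate_stats(df, label):
    """Return the share of each 'last_minute' response among rows with gender == label."""
    # Step 1: keep only rows whose 'gender' value is exactly equal to label.
    subset = df[df['gender'] == label]
    # Step 2: proportion of each response (normalize=True gives fractions, not counts).
    return subset['last_minute'].value_counts(normalize=True)


# Hypothetical data, not the contents of testfile.csv.
df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'M', 'F', 'M'],
    'last_minute': ['Agree', 'Agree', 'Disagree', 'Neither', 'Agree', 'Agree'],
})
print(calculate_stats(df, 'M'))  # Agree 0.75, Neither 0.25
```

The boolean mask df['gender'] == label performs the exact-match filter from step 1. In pandas 2.0 and later, value_counts(normalize=True) names the returned Series 'proportion' (and the index takes the name 'last_minute'), so the printed footer reads Name: proportion rather than Name: last_minute as in the expected outputs above.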

