Proposal
Your main task for the proposal is to find a dataset to analyze for your project and to describe at least one question you can address with data visualizations.
It is important that you choose a readily accessible dataset that is large enough for multiple relationships to be explored, but not so complex that you get lost. I suggest that your dataset have at least 50 observations and about 10 variables. If you find a bigger dataset, you can take a subset of it to work with for your project. The dataset should include both categorical and quantitative variables. Some places to look for data:
- TidyTuesday
- Kaggle
- OpenIntro
- Awesome public datasets
- Bikeshare data portal
- Harvard Dataverse
- Statistics Canada
- Other sources listed in the Data sources section of these notes
- Data you find on your own may be suitable too.
For your proposal, describe a dataset and a question you can address with it, and outline a plan for five visualizations (e.g., a data overview plot, a dplyr/table summary, small multiples, smoothing/regression, k-means/PCA, or a map).
The repository contains a template for your proposal called proposal.rmd. Write your proposal by revising this file.
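As a rough illustration of two of these ideas, the sketch below builds a dplyr summary table and a small-multiples plot with facet_wrap(). It uses the mpg data that ships with ggplot2 (234 cars, 11 variables, a mix of categorical and quantitative) purely as a stand-in for your own dataset; the variables chosen are only examples.

```r
library(dplyr)
library(ggplot2)

# Summary table: number of cars and mean highway mileage for each vehicle class
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(n = n(), mean_hwy = mean(hwy), .groups = "drop") %>%
  arrange(desc(mean_hwy))
class_summary

# Small multiples: the same scatterplot repeated in one panel per drive type
p_multiples <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv) +
  labs(x = "Engine displacement (L)", y = "Highway mileage (mpg)")
p_multiples
```

Small multiples keep the axes identical across panels, which makes differences between groups easier to compare than overlaying everything in one plot.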
- Questions: The introduction should present your research questions.
- Data: Describe the data (where it came from, how it was collected, what the cases are, what the variables are, etc.). Place your data in the /data folder. Show that you can read the data, and include the output of dplyr::glimpse() or skimr::skim() on your data in the proposal.
- Visualization plan:
  - The outcome (response, Y) and predictor (explanatory, X) variables you will use to answer your question.
  - Ideas for at least two possible visualizations for exploratory data analysis, including some summary statistics, along with an explanation of how they help you learn more about your data.
  - An idea of how at least one statistical method described in the course (smoothing, PCA, k-means) could be useful in analyzing your data.
- Team planning: Briefly describe how the members of your team will divide the tasks to be performed.
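For the Data item, reading the file and printing a glimpse takes only a few lines. The sketch below assumes the readr and dplyr packages. Because your own file name is not known here, it first writes ggplot2's mpg data to a temporary data folder as a stand-in (the file name mpg.csv is purely illustrative); in your project you would skip that step and read directly from your data/ folder.

```r
library(readr)
library(dplyr)

# Stand-in for your own file: save ggplot2's mpg data (234 rows, 11 columns)
# in a data/ folder inside a temporary directory. In your project, skip this
# step and point read_csv() at your real file, e.g. "data/<your-file>.csv"
# (a hypothetical placeholder name).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write_csv(ggplot2::mpg, file.path(data_dir, "mpg.csv"))

# Read the data and show its structure: one line per variable with its type
cars <- read_csv(file.path(data_dir, "mpg.csv"), show_col_types = FALSE)
glimpse(cars)

# Quick check against the suggested size (at least 50 rows, about 10 variables)
c(observations = nrow(cars), variables = ncol(cars))

# skimr::skim(cars) gives a fuller summary (missing values, small histograms)
# if the skimr package is installed
```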
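For the statistical-method item, here is a hedged sketch of the three methods named in the course, using ggplot2's mpg data as a stand-in for your own. The variables, the choice of three clusters, and the seed are arbitrary assumptions for illustration. A loess smoother shows a trend, PCA summarizes the quantitative variables in two dimensions, and k-means groups the cars.

```r
library(ggplot2)

# Smoothing: loess trend of highway mileage against engine size
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
p_smooth

# PCA on the quantitative variables, standardized so no single unit dominates
num_vars <- mpg[, c("displ", "cyl", "cty", "hwy")]
pca <- prcomp(num_vars, scale. = TRUE)
summary(pca)  # proportion of variance explained by each component

# k-means on the same standardized variables; nstart = 20 tries several
# random starting centres and keeps the best, and set.seed makes it repeatable
set.seed(2024)
km <- kmeans(scale(num_vars), centers = 3, nstart = 20)

# Plot the first two principal components, coloured by k-means cluster
scores <- data.frame(pca$x[, 1:2], cluster = factor(km$cluster))
p_pca <- ggplot(scores, aes(x = PC1, y = PC2, colour = cluster)) +
  geom_point()
p_pca
```

Standardizing before both PCA and k-means matters because both methods are driven by distances or variances, so a variable measured on a larger scale would otherwise dominate.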
Assessment. See the file grade-proposal.rmd for the assessment guidelines and rubric.
Oral presentation
The oral presentation should be about 5 minutes long. The goal is to present the highlights of your project and allow for feedback which can be incorporated as you revise your written report.
You should have a small number of slides to accompany your presentation. I have provided a template for you to use, presentation.rpres. I suggest a format such as the following:
- A title with team members’ names
- A description of the data you are analyzing
- At least one question you can investigate with your data visualization
- At least two data visualizations
- A conclusion
For suggestions on making slide presentations, see the lesson on slides and the recorded video.
Don’t show your R code; the focus should be on your results and visualizations, not your computing. Set echo = FALSE to hide R code (this is already done in the template).
Your presentation should not just be an account of everything you tried (“then we did this, then we did this, etc.”); instead, it should convey what choices you made, why you made them, and what you found.
Presentation schedule: Presentations will take place during the last two synchronous sessions of the course. You can choose to do your presentation live or pre-record it. You will watch presentations from other teams and, each day, provide feedback on one of them in the form of a peer evaluation. The presentation schedule will be generated randomly.
Assessment. See the file grade-presentation.rmd for the assessment guidelines.
Practice your presentation as a team, using the course collaborate room or another videoconferencing tool!
Written report
Follow the template provided for your written report (report.rmd) to present your visualizations and insights about your data. Review the marking guidelines in grade-report.rmd and ask questions if any of the expectations are unclear.
Style and format do count for this assignment, so please take the time to make sure everything looks good and your tables and visualizations are properly formatted.
You and your teammate will be using the same repository, so merge conflicts will happen, issues will arise, and that’s fine! Pull from GitHub before you start, commit your changes, and push often. Ask questions when you are stuck, and see the lesson on collaboration for help.
In your knitted report, your R code should be hidden (echo = FALSE) so that your document is neat and easy to read. However, your document should include all your code, so that if I re-knit your R Markdown file I will obtain the results you presented. If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
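In R Markdown, code is usually hidden for the whole document once, in a setup chunk, with knitr::opts_chunk$set(). The message and warning settings below are optional choices, not requirements of the course templates. A single chunk can then opt back in with echo = TRUE in its own header.

```r
# Put this in the first chunk of report.rmd, with the chunk header
# {r setup, include = FALSE} so that this chunk itself is not shown.
knitr::opts_chunk$set(
  echo = FALSE,     # hide code in the knitted output; the code still runs
  message = FALSE,  # optional: hide package start-up messages
  warning = FALSE   # optional: hide warnings
)

# To show one piece of code you want to highlight, override the default
# in that chunk's header instead, e.g. {r model-fit, echo = TRUE}
# ("model-fit" is a hypothetical chunk label).
```

Because echo = FALSE only hides code and does not skip it, re-knitting the file still reproduces every result.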
General criteria for evaluation:
- Content – What is the quality of the research questions, and how relevant are the data to those questions?
- Correctness – Are visualization procedures carried out and explained correctly?
- Writing and Presentation – What is the quality of the visualizations, writing, and explanations?
- Creativity and Critical Thought – Is the project carefully thought out? Does it appear that time and effort went into the planning and implementation of the project?