EFIN 401 Advanced Data Analytics in Eco Project

Daniel Kevins

### Instructions

1. Complete your assignment using R Markdown. Your repo should include two files:

a. An **R Markdown** script (.Rmd extension)

b. A “knitted” (output) **HTML** document (.html extension)

2. The output document should show your **code**, the relevant **R output**, and the **answer** to each question in the assignment. Clearly label each answer with the number/letter of its question so we know what we’re grading. Interpret your output (explain what it shows) whenever its meaning is not immediately obvious.

3. Your script should contain **comments** that meaningfully describe what it does. Comments clearly communicate to others what’s happening and why, every step of the way. Your audience very much includes future you, who will remember surprisingly little of the code you write today. You should have at least one comment every 5-10 lines if not more. Write comments in present tense and omit unnecessary words.

4. Write the names of any **other students you worked with** at the top of your assignment. As a reminder, you can work together and discuss strategies for solving problems, but we don’t want to see identical code.

You’ll learn about R Markdown this week. Some further resources:

– [Using R Markdown for Class Assignments](https://ntaback.github.io/UofT_STA130/Rmarkdownforclassreports.html)

– [R Markdown Quick Tour](https://rmarkdown.rstudio.com/authoring_quick_tour.html)

– [R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)

– [R for Data Science: R Markdown](https://r4ds.had.co.nz/r-markdown.html)

This document was itself created using R Markdown. This repo includes both the original .Rmd source script and an HTML file that you can open in any web browser.

**Grading:** We’ll be looking at completeness, correctness, and style. By style we mean: Does the submission follow the GitHub & R Markdown instructions? Does it include informative comments? Is the code itself readable?

***

### Part 1

We wish to use R to simulate and examine the finite-sample properties of sample means, using data generated from the Student's t distribution with 4 degrees of freedom.

1. Use vector/matrix methods in R to generate 2000 samples of each size 5, 10, 15, 20, 30, and 50 from the Student's t distribution, using a seed value of 1001. Compute and record the 2000 sample means for each sample size.

2. For each sample size, use the 2000 estimated sample means to construct a qqnorm-qqline plot, arranging the six plots in a panel with three “rows” and two “columns”. Label each plot with its sample size. Also compute (and store) the p-value of the Cramér-von Mises normality test for each sample size.

3. At what sample size do the sample means appear to become approximately normally distributed?
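
The steps in Part 1 can be sketched roughly as follows. This is a minimal illustration, not a model answer: the object names (`sizes`, `n_rep`, `sample_means`, `cvm_p`) are made up for the example, and it assumes the `nortest` package is used for the Cramér-von Mises test (other packages provide similar tests). If `nortest` is not installed, the p-values are left as `NA`.

```r
# Sketch only: all object names here are illustrative assumptions.
set.seed(1001)

sizes <- c(5, 10, 15, 20, 30, 50)  # sample sizes from the assignment
n_rep <- 2000                      # number of samples per size
t_df  <- 4                         # degrees of freedom

# For each size n, fill an n x n_rep matrix with t draws
# (one sample per column) and take the column means.
sample_means <- sapply(sizes, function(n) {
  colMeans(matrix(rt(n * n_rep, df = t_df), nrow = n, ncol = n_rep))
})
colnames(sample_means) <- paste0("n = ", sizes)

# Panel of six normal Q-Q plots: three rows, two columns.
old_par <- par(mfrow = c(3, 2))
for (j in seq_along(sizes)) {
  qqnorm(sample_means[, j], main = paste("Sample size", sizes[j]))
  qqline(sample_means[, j])
}
par(old_par)

# Store one Cramér-von Mises p-value per sample size.
# Assumes the nortest package; falls back to NA if it is missing.
cvm_p <- if (requireNamespace("nortest", quietly = TRUE)) {
  apply(sample_means, 2, function(x) nortest::cvm.test(x)$p.value)
} else {
  setNames(rep(NA_real_, length(sizes)), colnames(sample_means))
}
```

Here `sample_means` is a 2000 x 6 matrix with one column per sample size, so both the plots and the tests can loop over columns. The order in which random numbers are drawn affects the results for a given seed, so a different but equally valid layout of the draws will give slightly different numbers.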

### Part 2

Assume the following: An animal GM had a genetic disease that is passed to its offspring with probability 0.5. GM had one offspring, M. M had three offspring: C1, C2, and C3. C1-C3 each had three offspring: GC11, GC12, GC13, GC21, GC22, GC23, GC31, GC32, and GC33. The disease does not express symptoms until the animal reaches older age, and an animal without the disease does not express symptoms at any age.

M died before the age of symptomatic onset. Animals C1, C2, and C3 have reached an age where 75%, 70%, and 65%, respectively, of the population with the disease have expressed symptoms. Animals GC11 and GC31 have reached an age where 10% of the population with the disease have expressed symptoms. None of M’s descendants C1, …, GC33 have expressed symptoms at this time.

4. Use vectorized code in R to simulate and estimate the probability that M had the disease.

5. Estimate the probability that C1 has the disease.

6. Estimate the probability that C3 has the disease. Why does this differ from the estimate for C1?
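
One possible way to approach questions 4-6 is a vectorized rejection-sampling simulation: simulate many families, keep only those in which nobody shows symptoms, and take proportions among the kept families. The sketch below is illustrative rather than a model answer. It assumes that each offspring inherits the disease independently, and that the grandchildren other than GC11 and GC31 are too young to show symptoms, so they carry no information and are not simulated. The seed, the number of simulated families, and all object names are arbitrary choices for the example.

```r
# Sketch only: seed, n_sim, and object names are illustrative assumptions.
set.seed(2024)
n_sim <- 500000   # number of simulated families

# M inherits from GM (who had the disease) with probability 0.5.
m <- runif(n_sim) < 0.5

# Columns are C1, C2, C3: each inherits with probability 0.5 if M has it.
# m is recycled down each column, so row i always uses m[i].
c_dis <- matrix(runif(n_sim * 3) < 0.5, n_sim, 3) & m

# A diseased C shows symptoms with probability 0.75, 0.70, or 0.65.
p_sym <- rep(c(0.75, 0.70, 0.65), each = n_sim)
c_sym <- c_dis & (matrix(runif(n_sim * 3), n_sim, 3) < p_sym)

# GC11 and GC31 inherit from C1 and C3; a diseased one shows
# symptoms with probability 0.10.
gc11_sym <- c_dis[, 1] & (runif(n_sim) < 0.5) & (runif(n_sim) < 0.10)
gc31_sym <- c_dis[, 3] & (runif(n_sim) < 0.5) & (runif(n_sim) < 0.10)

# Keep only families that match the observation: no symptoms anywhere.
keep <- rowSums(c_sym) == 0 & !gc11_sym & !gc31_sym

p_m  <- mean(m[keep])         # estimate of P(M had the disease | data)
p_c1 <- mean(c_dis[keep, 1])  # estimate of P(C1 has the disease | data)
p_c3 <- mean(c_dis[keep, 3])  # estimate of P(C3 has the disease | data)
```

In this setup C1 and C3 differ only in the fraction of diseased animals that would have shown symptoms by their age (75% versus 65%), which is the quantity to focus on when explaining any difference between `p_c1` and `p_c3`.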
