General guidelines:

  • The assignment is due on May 11 (Friday)

  • It must be submitted in both .Rmd and .html formats

  • Use code chunks and subtitles for each exercise

  • The code must be thoroughly commented and must follow the code organization guidelines learned in class


This assignment digs into alternative statistical methods, comparing their robustness, power, and performance to those of conventional approaches.

First, we simulate 4 data sets that will be used for the exercises below. The simulated data sets represent the value of a trait in different populations. Each population is composed of 1000 individuals. We will extract samples from these populations to run the statistical analyses.

The first 2 data sets follow a normal distribution, but with different means (use set.seed to make sure we all simulate the same data):

set.seed(15)
pop.nrm.1 <- rnorm(n = 1000, mean = 20, sd = 3)

set.seed(25)
pop.nrm.2 <- rnorm(n = 1000, mean = 30, sd = 3)

 

The other 2 data sets share the same trait value range as the previous ones, but in this case they follow a uniform distribution:

set.seed(5)
pop.unf.3 <- runif(n = 1000, min = min(pop.nrm.1), max = max(pop.nrm.1))

set.seed(51)
pop.unf.4 <- runif(n = 1000, min = min(pop.nrm.2), max = max(pop.nrm.2))
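
As a quick sanity check, we can confirm that each uniform data set spans (approximately) the same trait value range as its normal counterpart:

# compare the range of each uniform data set
# to that of its normal counterpart
range(pop.nrm.1)
range(pop.unf.3)

range(pop.nrm.2)
range(pop.unf.4)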

 


Exercise 1


1.1 Create ggplot2 histograms to explore the distribution of each data set. Use a multipanel plot with each data set in a separate panel (2 rows and 2 columns)


1.2 Create a single ggplot2 graph with all 4 histograms on the same graph. Use transparency to improve the visualization of all histograms


 

Alternative statistical methods

There are several statistical tools to examine whether a single continuous trait differs between 2 populations. Probably the most common one is the t-test. In class we also learned about powerful alternatives that help to relax some of its assumptions.

But first, let’s pretend we measured 20 individuals from each population. This can be simulated as follows:

# sample 1
set.seed(120)
smp1 <- sample(pop.nrm.1, 20)

# sample 2
set.seed(14)
smp2 <- sample(pop.nrm.2, 20)

 

Now we can compare the trait values between populations. We can use the t-test for this:

t.test(smp1, smp2)
## 
##  Welch Two Sample t-test
## 
## data:  smp1 and smp2
## t = -10, df = 30, p-value = 1e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.71  -7.71
## sample estimates:
## mean of x mean of y 
##      20.5      30.2

 

In the example above we can see that there is a statistically significant difference between the 2 populations (the confidence interval doesn’t include 0 and the p-value is lower than 0.05). Of course, this is not surprising at all: we simulated the data using different mean values.
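
Note that t.test() returns a list-like object, so its elements can be extracted programmatically. This will come in handy later on, when running many tests inside a loop (e.g. in exercises 3 and 4):

# store the test output in an object
tt <- t.test(smp1, smp2)

# extract the p-value
tt$p.value

# extract the confidence interval
tt$conf.int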


Exercise 2


2.1 Use a non-parametric statistical test to compare mean trait values between samples smp1 and smp2 (analogous to the t-test above)


2.2 Build a randomization test equivalent to the t-test to compare the mean values of the 2 samples (as in 2.1). The test must be “packed” in a function. The function must contain an argument to control the number of iterations and should return a p-value (a generic skeleton is sketched after this exercise list)


2.3 Run the function created in 2.2 to compare samples smp1 and smp2
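
As a hint for exercise 2.2: a randomization test for a difference in means typically works by pooling all observations, repeatedly shuffling them between the 2 groups, and recomputing the statistic on each shuffled version. The skeleton below sketches one possible structure (the function and argument names, e.g. rndm_test and iterations, are just suggestions):

# one possible skeleton for a randomization test on a
# difference in means (names are suggestions only)
rndm_test <- function(x, y, iterations = 1000) {
  
  # observed absolute difference in means
  obs_diff <- abs(mean(x) - mean(y))
  
  # pool all observations and keep track of the size of x
  pooled <- c(x, y)
  nx <- length(x)
  
  # recompute the difference on randomly shuffled groups
  null_diffs <- replicate(iterations, {
    shuffled <- sample(pooled)
    abs(mean(shuffled[1:nx]) - mean(shuffled[-(1:nx)]))
  })
  
  # p-value: proportion of shuffled differences at least
  # as extreme as the observed one
  mean(null_diffs >= obs_diff)
}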



False positives: type I error

We expect our statistical tools not to find differences between randomly selected samples belonging to the same population. For instance:

# sample 1
set.seed(120)
m3 <- sample(pop.nrm.1, 20)

# sample 2
set.seed(14)
m4 <- sample(pop.nrm.1, 20)


#  t-test
t.test(m3, m4)
## 
##  Welch Two Sample t-test
## 
## data:  m3 and m4
## t = 0.1, df = 40, p-value = 0.9
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.94  2.16
## sample estimates:
## mean of x mean of y 
##      20.5      20.4

 

However, there is some margin for error, defined by alpha, which determines the proportion of false positives that we are willing to accept. A false positive can occur just by chance when the samples are not truly representative of the whole population. For instance, the following code shows 2 randomly selected samples from the same population that show a statistically significant difference between their means:

# sample 1
set.seed(69)
m5 <- sample(pop.nrm.1, 20)

# sample 2
set.seed(89)
m6 <- sample(pop.nrm.1, 20)

# t-test
t.test(m5, m6)
## 
##  Welch Two Sample t-test
## 
## data:  m5 and m6
## t = 3, df = 40, p-value = 0.01
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.535 4.080
## sample estimates:
## mean of x mean of y 
##      21.4      19.1

 

An alpha of 0.05 is typically used in the natural sciences. This means that we are OK with having 5% of false positives in our analyses. As you probably know, in statistical jargon this is called “Type I error”.
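
To illustrate this idea without giving away exercise 3, we can estimate the false positive rate of the t-test on data simulated directly with rnorm() (rather than sampled from the populations above). Since there is no real difference between the groups, roughly 5% of the p-values should fall below 0.05:

# proportion of significant t-tests when both samples come
# from the same distribution (expected to be close to 0.05)
set.seed(7)
p.vals <- replicate(1000, t.test(rnorm(20), rnorm(20))$p.value)
mean(p.vals < 0.05)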


Exercise 3


3.1 Calculate the proportion of false positives produced when running 1000 t-tests each time using 2 randomly (and independently) generated data sets of 20 individuals each, taken from pop.nrm.1 (hint: use a loop).


3.2 Do the same calculation as in 3.1 but this time on the non-parametric test used in 2.1



Statistical power

Statistical power refers to the probability of finding differences when they exist (true positives). In other words, the probability of not making a “type II error”. This value is considered acceptable when it is equal to or higher than 0.8 (i.e. when differences are correctly detected in at least 80% of the cases). Statistical power is determined largely by sample size. Hence, the analysis of statistical power is typically used to determine the sample size needed to detect an a priori defined effect size. The common graph for evaluating statistical power looks like this:

[Figure: statistical power curve, showing the proportion of significant tests (Y axis) as a function of sample size (X axis)]

A powerful test will rapidly reach an asymptote above 0.8. Note that each point represents the proportion of tests that were significant for a given sample size.
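
For reference, base R can also compute the power of a t-test analytically with power.t.test() (in the stats package). For instance, using the effect we simulated above (a difference in means of 10 with a standard deviation of 3):

# analytical power of a two-sample t-test with 5 individuals
# per group, a true difference of 10 and a sd of 3
power.t.test(n = 5, delta = 10, sd = 3)

With such a large simulated effect, power is essentially 1 even for very small samples, so expect the power curves for the normally distributed data in exercise 4 to rise quickly.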


Exercise 4


Make a “statistical power” graph (like the one above) for sample sizes between 3 and 20 observations (X axis), using 100 iterations for each sample size, for the following combinations of data sets and statistical tests:


4.1 t-test on samples from pop.nrm.1 and pop.nrm.2 (normally distributed populations)


4.2 t-test on samples from pop.unf.3 and pop.unf.4 (uniformly distributed populations)


4.3 Non-parametric test on samples from pop.nrm.1 and pop.nrm.2 (normally distributed populations)


4.4 Non-parametric test on samples from pop.unf.3 and pop.unf.4 (uniformly distributed populations)


4.5 Randomization test on samples from pop.nrm.1 and pop.nrm.2 (normally distributed populations)


4.6 Randomization test on samples from pop.unf.3 and pop.unf.4 (uniformly distributed populations)


4.7 Which test behaves better when comparing samples from normally distributed populations? Which one does better when comparing samples from uniformly distributed populations?


4.8 Pool all data together and make a multipanel plot (2 columns, 3 rows) with each of the graphs above in its own panel.


4.9 Estimate which of the 3 tests used above has the best “behavior” when working with log-normally distributed data (hint: rlnorm())
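
As an illustration of the hint only (the exact parameters for 4.9 are up to you), rlnorm() simulates log-normally distributed values, with meanlog and sdlog given on the log scale:

# simulate a log-normal population of 1000 individuals
# (parameter values here are arbitrary examples)
set.seed(10)
pop.lnorm <- rlnorm(n = 1000, meanlog = 3, sdlog = 0.3)
hist(pop.lnorm)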