This project is a large-scale simulation study of how robust the simple linear regression model is to deviations from its assumptions.
The basic simple linear regression model states that for \(i=1, \ldots, n\)
\[ Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]
where the \(\epsilon_i\) are a random sample from a \(N(0, \sigma^2)\) distribution. The goal of this project is to assess how robust our inference about \(\beta_1\) is to deviations from these assumptions.
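For reference, these are the standard least-squares results under the assumptions above: the slope estimator, its mean and standard deviation (so its theoretical bias is zero), and the usual \(100(1-\alpha)\%\) confidence interval for \(\beta_1\), where \(t_{n-2,\,1-\alpha/2}\) denotes the \(1-\alpha/2\) quantile of the \(t\) distribution with \(n-2\) degrees of freedom.
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
\[ E(\hat{\beta}_1) = \beta_1, \qquad \mathrm{SD}(\hat{\beta}_1) = \frac{\sigma}{\sqrt{S_{xx}}} \]
\[ \hat{\beta}_1 \pm t_{n-2,\,1-\alpha/2} \, \frac{s}{\sqrt{S_{xx}}}, \qquad s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \bigl(Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2 \]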
Provide a thorough examination of how good \(\hat{\beta}_1\) is as an estimator of \(\beta_1\) when some of the assumptions of the simple linear regression model do not hold. Under the modeling assumptions above, we have derived the bias and the standard deviation of \(\hat{\beta}_1\). In this project, you will repeatedly generate data from models that deviate from these assumptions in one or more ways. For a particular simulation setting, if you run \(B\) simulations, you can save the \(B\) resulting values of \(\hat{\beta}_1\) and calculate the empirical bias and empirical standard error of \(\hat{\beta}_1\). You can also count how many of the \(B\) confidence intervals, computed with the standard formula, contain the true value of \(\beta_1\). By comparing the empirical bias and standard error with their theoretical counterparts, and the observed confidence interval coverage with the nominal coverage, you can evaluate how well the purported properties of \(\hat{\beta}_1\) hold up when different assumptions are violated. Some possible deviations to consider:
You will want to evaluate each deviation at several sample sizes, including at least one small and one large sample size. Present your results in both graphical and tabular form, and describe what you observe. Which assumptions can be relaxed in practice, and which cannot?
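The simulation procedure described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference solution: the equally spaced design on [0, 1], the true parameters, \(B\), the sample sizes 10 and 200, and the two deviation models (heavy-tailed \(t_3\) errors and an error standard deviation that grows with \(x\)) are all hypothetical choices, as are the helper names `t_quantile`, `monte_carlo`, `normal_errors`, `t3_errors`, and `growing_sd_errors`. The \(t\) critical value is computed numerically (Simpson's rule plus bisection) only to avoid a SciPy dependency.

```python
import math
import random
import statistics


def t_quantile(p, df, steps=2000):
    """Upper quantile (p > 0.5) of Student's t: bisection on a Simpson-rule CDF."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    def cdf(t):  # valid for t >= 0, using symmetry of the density about 0
        h = t / steps
        s = density(0.0) + density(t)
        s += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


# Hypothetical error models: each returns eps_i given x_i, sigma and an RNG,
# scaled so that Var(eps_i) is sigma^2 (exactly, or on average over the design).
def normal_errors(x, sigma, rng):
    return rng.gauss(0.0, sigma)


def t3_errors(x, sigma, rng):
    # Heavy tails: Student t with 3 df (variance 3), rescaled to variance sigma^2.
    chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 df
    return sigma * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3) / math.sqrt(3)


def growing_sd_errors(x, sigma, rng):
    # Heteroscedastic: Var = 3 sigma^2 x^2, roughly sigma^2 on average over [0, 1].
    return rng.gauss(0.0, sigma * math.sqrt(3) * x)


def monte_carlo(n, B, errors, beta0=1.0, beta1=2.0, sigma=1.0, level=0.95, seed=1):
    """Empirical bias, SD ratio (empirical / theoretical) and CI coverage of beta1-hat."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # assumed fixed, equally spaced design
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    t_crit = t_quantile(1 - (1 - level) / 2, n - 2)
    estimates, covered = [], 0
    for _ in range(B):
        y = [beta0 + beta1 * xi + errors(xi, sigma, rng) for xi in x]
        y_bar = statistics.fmean(y)
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        half_width = t_crit * math.sqrt(s2 / sxx)  # standard t interval for beta1
        estimates.append(b1)
        covered += (b1 - half_width <= beta1 <= b1 + half_width)
    bias = statistics.fmean(estimates) - beta1  # theoretical bias is 0
    sd_ratio = statistics.stdev(estimates) / (sigma / math.sqrt(sxx))
    return bias, sd_ratio, covered / B


if __name__ == "__main__":
    models = {"normal": normal_errors, "t, 3 df": t3_errors, "growing SD": growing_sd_errors}
    print(f"{'errors':<12}{'n':>5}{'bias':>9}{'SD ratio':>10}{'coverage':>10}")
    for name, errors in models.items():
        for n in (10, 200):
            bias, ratio, cover = monte_carlo(n, B=1000, errors=errors)
            print(f"{name:<12}{n:>5}{bias:>9.3f}{ratio:>10.3f}{cover:>10.3f}")
```

Each printed row gives the empirical bias, the ratio of the empirical standard deviation of \(\hat{\beta}_1\) to its theoretical standard deviation, and the observed coverage of the nominal 95% interval; a ratio near 1 and coverage near 0.95 suggest that the textbook properties hold in that setting. Other deviations can be tried by writing another function with the same `(x, sigma, rng)` signature.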