Project 2B:

Overview

The data set consists of expression levels for 8,993 genes, measured at 6 time points in two strains of mice, with 3 microarrays per strain at each time point (36 arrays in total).

Details

Dr. Julie Wilder (Lovelace Respiratory Research Institute) studies the response of the lung to the introduction of pathogens. In particular, she has examined the genetic characteristics of immune response to infection with the pathogen Cryptococcus neoformans in mice. A preliminary study indicated that differences in the ability of two strains of mice, C57BL/6 and C.B-17, to clear the pathogen from the lung were likely due to genetic causes.

One component of Dr. Wilder’s overall study was the use of microarray data to determine which genes show differential expression between the two strains of mice as a function of time following infection with C. neoformans. For each strain of mouse, she used three microarrays at each of six time points: 0, 6 hours, 24 hours, 72 hours, 7 days, and 14 days post-infection. This gives a total of \(2 \times 3 \times 6 = 36\) arrays.

In this assignment, you will try to determine which genes show differential expression between the two groups of mice and at which time points.

Data Description

Variable   Description
Name       List of gene names
b6ijs      Gene expression value for C57BL/6 mouse \(j\) at time point \(i\)
cbijs      Gene expression value for C.B-17 mouse \(j\) at time point \(i\)

mouse_data = read.table("data/MouseGeneExpression2B.txt", header = TRUE)
head(mouse_data)

Data Files

Objectives

Within each time point, a two-sample t-test can be used to compare the expression level of each gene in the C57BL/6 vs. C.B-17 mice. However, this amounts to carrying out \(8{,}993 \times 6 = 53{,}958\) separate hypothesis tests. This project explores the feasibility of searching for differentially expressed genes in this way.
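The per-gene, per-time-point testing described above can be sketched as follows. This is only an illustration on simulated data, not the real file: the toy data frame, the number of genes (200), and the exact column names (`b611`, `b612`, ..., `cb63`, following the b6ij / cbij pattern with \(i\) the time point and \(j\) the mouse) are assumptions, so check them against `names(mouse_data)` before adapting it. Note that `t.test` uses the Welch (unequal variance) test by default.

```r
set.seed(1)
n_genes <- 200   # small toy size; the real data set has 8,993 genes
n_time  <- 6
n_mice  <- 3

# Hypothetical column names following the b6ij / cbij pattern
# (i = time point, j = mouse): b611, b612, b613, b621, ...
col_names <- function(prefix) {
  paste0(prefix,
         rep(seq_len(n_time), each = n_mice),
         rep(seq_len(n_mice), times = n_time))
}

# Simulated expression values with no true strain differences
sim <- matrix(rnorm(n_genes * 2 * n_time * n_mice), nrow = n_genes)
colnames(sim) <- c(col_names("b6"), col_names("cb"))
toy <- data.frame(Name = paste0("gene", seq_len(n_genes)), sim)

# One two-sample t-test per gene and time point;
# the result is a genes-by-time-points matrix of p-values
p_values <- sapply(seq_len(n_time), function(i) {
  b6_cols <- paste0("b6", i, seq_len(n_mice))
  cb_cols <- paste0("cb", i, seq_len(n_mice))
  apply(toy[, c(b6_cols, cb_cols)], 1, function(x) {
    t.test(x[1:n_mice], x[(n_mice + 1):(2 * n_mice)])$p.value
  })
})

dim(p_values)
```

With only three arrays per strain at each time point, every individual test has very few degrees of freedom, which is part of why the feasibility of this approach is in question.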

  1. When carrying out \(8{,}993 \times 6\) (independent) tests at a fixed significance level \(\alpha\) (for example, \(\alpha = 0.05\)), what proportion of the tests would be expected to be significant if no genes were actually differentially expressed at any time point?
  2. For how many tests did you observe significant differential expression at that significance level?
  3. Read (online) about methods for correcting for multiple testing. Define the following terms: familywise error rate and false discovery rate.
  4. Apply at least one multiple testing correction to this data set. A commonly used correction for microarray data is the Benjamini-Hochberg correction, but you can use (with justification) any correction you wish.
  5. How many genes are significant after applying the correction you selected in part (4)?
  6. Describe how you would report your results to Dr. Wilder. Which genes would you suggest that she pursue for future work in this problem?