Overview

The data set consists of observations of the number of colony-forming units (CFUs) for 35 mice, as well as genotype information for each mouse at 13 different locations in the genome (called loci – singular: locus).

Details

Dr. Julie Wilder (Lovelace Respiratory Research Institute) studies the response of the lung to the introduction of pathogens. In particular, she has examined the genetic characteristics of immune response to infection with the pathogen Cryptococcus neoformans in mice. A preliminary study indicated that differences in the ability of two strains of mice, C57BL/6 and C.B-17, to clear the pathogen from the lung were likely due to genetic causes.

One component of Dr. Wilder’s overall study was to measure the ability of mice with either the C57BL/6 background genotype or the C.B-17 background genotype to clear the pathogen from their lungs. This was measured in each mouse by examining the average number of colony-forming units (CFUs) found in the lung. The data set gives these measurements, as well as the genotype of the mouse (A or B) at each of 13 loci. The goal is to determine whether any of these loci are associated with the average number of CFUs.

Data Description

Variable	Description
Column 1:	List of locus names
Columns 2-26:	Data for each mouse
Row 1:	Mouse ID
Row 2:	CFU values for each mouse
Rows 3-15:	Genotype (A or B) for each mouse at each locus

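# The file has no header row: column 1 holds the row labels and each remaining column is one mouse.
mouse_cfus <- read.csv("data/qtl2C.csv", header = FALSE)
head(mouse_cfus)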


	V1 <fctr>	V2 <fctr>	V3 <fctr>	V4 <fctr>	V5 <fctr>	V6 <fctr>	V7 <fctr>	V8 <fctr>	V9 <fctr>
1	Mouse	M1324	M1357	M1391	M1323	M1309	M1306	M1363	M1388
2	Trait	2.89	3.08	3.94	6.32	6.33	6.57	3.65	4.22
3	D6Mit8	A	A	B	A	A	A	B	B
4	D6Mit15	B	B	B	B	B	B	A	B
5	D6Mit138	B	B	B	B	A	A	B	B
6	D6Mit149	A	B	B	B	A	A	B	B
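
Because the file stores one mouse per column, most R modelling functions will be easier to use after transposing it to one row per mouse. The following is a minimal sketch of that reshaping, using only the eight-mouse excerpt shown in the preview above rather than the full file. The names `raw_txt`, `raw`, and `wide` are illustrative, and the sketch assumes the full file follows the same layout.

```r
# Eight-mouse excerpt copied from the preview above (not the full data file).
raw_txt <- "Mouse,M1324,M1357,M1391,M1323,M1309,M1306,M1363,M1388
Trait,2.89,3.08,3.94,6.32,6.33,6.57,3.65,4.22
D6Mit8,A,A,B,A,A,A,B,B
D6Mit15,B,B,B,B,B,B,A,B
D6Mit138,B,B,B,B,A,A,B,B
D6Mit149,A,B,B,B,A,A,B,B"
raw <- read.csv(text = raw_txt, header = FALSE, stringsAsFactors = FALSE)

# Drop the label column, transpose so each row is one mouse,
# then reuse the labels from column 1 as the new column names.
wide <- as.data.frame(t(raw[, -1]), stringsAsFactors = FALSE)
names(wide) <- raw[[1]]
rownames(wide) <- NULL

# Every value was read as text because each column mixes IDs, numbers, and letters.
wide$Trait <- as.numeric(wide$Trait)

str(wide)
```

With the real file, the same steps could be applied to `mouse_cfus` directly. How missing genotypes are coded in the file is not stated here, so it is worth checking the distinct values in each locus column before analysis.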

Data Files

qtl2C.csv

Objectives

Within each locus, two-sample t-tests can be used to compare the CFU values for genotypes A and B. This will amount to carrying out 13 different hypothesis tests. Another complication is that some of the data are missing for some loci. This project will explore these issues.
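
One way to organize the per-locus tests is a small helper that is applied to every locus column. The sketch below uses only the eight mice and four loci visible in the data preview, so its p-values are illustrative and are not the project's results. The helper `locus_p` and the data frame `mice` are hypothetical names, and the rule of returning `NA` when a genotype group has fewer than two mice is an assumption made here so that the test is always computable.

```r
# Eight-mouse excerpt from the data preview, already arranged one row per mouse.
mice <- data.frame(
  Trait    = c(2.89, 3.08, 3.94, 6.32, 6.33, 6.57, 3.65, 4.22),
  D6Mit8   = c("A", "A", "B", "A", "A", "A", "B", "B"),
  D6Mit15  = c("B", "B", "B", "B", "B", "B", "A", "B"),
  D6Mit138 = c("B", "B", "B", "B", "A", "A", "B", "B"),
  D6Mit149 = c("A", "B", "B", "B", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)
loci <- setdiff(names(mice), "Trait")

# Hypothetical helper: p-value of a two-sample t-test of Trait, genotype A vs. B.
locus_p <- function(locus) {
  geno <- mice[[locus]]
  keep <- geno %in% c("A", "B")          # drops missing or unexpected codes
  a <- mice$Trait[keep & geno == "A"]
  b <- mice$Trait[keep & geno == "B"]
  # Assumption: skip loci where a group is too small to estimate a variance.
  if (length(a) < 2 || length(b) < 2) return(NA_real_)
  t.test(a, b)$p.value                   # Welch (unequal-variance) test by default
}

p_values <- sapply(loci, locus_p)
p_values
```

In this excerpt only one mouse has genotype A at D6Mit15, so that locus returns `NA`. Passing `var.equal = TRUE` to `t.test` would give the pooled-variance version of the test instead.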

1. When carrying out 13 tests, what proportion of the tests would be expected to be significant if no loci were actually associated with CFU values?
2. For how many tests did you observe significant differences in CFU values for the two genotypes?
3. Read (online) about methods for correcting for multiple testing. Define the following terms: familywise error rate and false discovery rate.
4. Apply at least one multiple testing correction to this data set. You can use (with justification) any correction you wish.
5. How many genes are significant after applying the correction you selected in part (4)?
6. Imputation is a common approach to handle missing data. Read about imputation online. Can you suggest an approach for imputation in this data set? Is missing data likely to cause problems in this data set?
7. Describe how you would report your results to Dr. Wilder. Which genes would you suggest that she pursue for future work in this problem?
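
As a rough illustration of how a multiple testing correction is applied in R, the sketch below runs `p.adjust` on 13 simulated p-values. The p-values are hypothetical: they are drawn uniformly on (0, 1), which is how p-values behave when no locus is truly associated with the trait, and they are not taken from the mouse data. The seed and the placeholder locus names are arbitrary.

```r
set.seed(3202)   # arbitrary seed, used only so the sketch is reproducible
m <- 13          # number of tests, one per locus
alpha <- 0.05    # assumed significance level

# Hypothetical p-values simulated under the global null (no true associations).
p_raw <- runif(m)
names(p_raw) <- paste0("locus", seq_len(m))

# Bonferroni and Holm control the familywise error rate;
# Benjamini-Hochberg (BH) controls the false discovery rate.
adjusted <- data.frame(
  raw        = p_raw,
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  holm       = p.adjust(p_raw, method = "holm"),
  BH         = p.adjust(p_raw, method = "BH")
)

# Number of tests declared significant at level alpha under each approach.
colSums(adjusted < alpha)
```

With real results, the vector of 13 per-locus p-values would take the place of `p_raw`. Each adjusted p-value is at least as large as the raw one, so a correction can only reduce the number of loci declared significant.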

Project 2C: Genetic Quantitative Trait Locus (QTL) Study

STAT 3202: Group Project I

Autumn 2018, OSU
