Overview

The data consist of gene expression measurements from high grade and low grade prostate tumors.

Details

The Gleason score ranges from 2 to 10 and measures tumor dedifferentiation in prostate cancer; higher scores are associated with worse prognosis. The data consist of 198 tumors from Swedish prostate cancer patients who were diagnosed by transurethral resection of the prostate and followed expectantly. Of these tumors, 89 are high grade (Gleason \(\ge\) 8) and 109 are low grade (Gleason \(\le\) 6). The data were used in a study by Dr. Kathryn Penney et al. (http://jco.ascopubs.org/content/29/17/2391.full.pdf+html), which provides more information about these samples. The data may be read in using code such as:

prostate_data = read.csv("data/Swedish_Gleason_High_Vs_Low_for_3202_2N.csv")
head(prostate_data)

Data Description

Variable              Description
gleason_hi            1 for tumors with Gleason \(\ge\) 8 and 0 for tumors with Gleason \(\le\) 6
GM2A through TKTL1    gene expression levels
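
Given this layout, it can help to separate the group indicator from the expression columns before testing. The following is a minimal sketch, not part of the original handout. It assumes that every column other than `gleason_hi` is a gene, which may not hold if the CSV carries an extra ID or row-name column. The helper `split_expression` and the small data frame `toy` are hypothetical, and the values in `toy` are made up so that the snippet runs without the CSV file.

```r
# Toy stand-in with the same layout as the table above (values are made up).
toy <- data.frame(
  gleason_hi = c(1, 0, 1, 0),
  GM2A       = c(7.1, 6.8, 7.4, 6.9),
  TKTL1      = c(5.2, 5.9, 5.0, 6.1)
)

# Split a data frame into the group indicator and a numeric expression matrix.
# Assumption: every column other than `label` is a gene expression column.
split_expression <- function(dat, label = "gleason_hi") {
  gene_cols <- setdiff(names(dat), label)
  list(
    group = dat[[label]],
    expr  = as.matrix(dat[, gene_cols, drop = FALSE])
  )
}

parts <- split_expression(toy)
table(parts$group)   # number of low grade (0) and high grade (1) tumors
dim(parts$expr)      # samples x genes
```

Applied to `prostate_data`, `table(parts$group)` should report 109 zeros and 89 ones if the file matches the description in Details.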

Objectives

The overall objective is to identify genes that differ between high and low Gleason score tumors. One way a gene could differ is that it could be up-regulated or down-regulated in high Gleason tumors compared with low Gleason tumors; this sort of difference could be detected by a two-sample t-test. Another way a gene could differ is that it could be more variable. For example, perhaps a pathway of genes maintains fairly stable expression in low grade disease, but in high grade disease the pathway is dysregulated, and this is manifested by genes in that pathway showing a lot of variability across patients. To explore this:

  1. First, read (online) about methods for correcting for multiple testing, such as the simple Bonferroni method. Describe why such a correction is needed, and use the correction in what follows.
  2. Perform the two-sample t-test to identify genes with different mean expression comparing high and low Gleason tumors.
  3. Perform the F-test for equality of variances (Section 10.9) for each gene to identify genes with different variability comparing high and low Gleason tumors.
  4. Learn about the two-sample Kolmogorov-Smirnov test, a nonparametric test of whether two samples are drawn from the same distribution. Perform the test for each gene to identify genes with different distributions comparing high and low Gleason tumors.
  5. Discuss the results. Are the results similar or different across the three tests? Look more closely at the top few genes declared statistically significant by each test. What are the test assumptions, and do you feel comfortable making them? Which genes would you consider most interesting for further investigation by Dr. Penney?
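
The steps above could be organized along the following lines. This is only a sketch under stated assumptions, not a prescribed solution. It runs on simulated stand-in data (not the prostate data) with hypothetical gene names `gene1` to `gene5`, so that it executes without the CSV file. The helper `run_gene_tests` is hypothetical. It uses `t.test`, whose default is the Welch (unequal variance) version, so pass `var.equal = TRUE` yourself if the pooled test is intended. It uses `var.test` on the assumption that this matches the F-test of Section 10.9, and `ks.test` for the Kolmogorov-Smirnov test. The Bonferroni correction is applied separately within each family of tests via `p.adjust`.

```r
set.seed(1)

# Simulated stand-in data (NOT the prostate data): 40 samples, 5 hypothetical genes.
n_hi <- 20
n_lo <- 20
sim <- data.frame(gleason_hi = rep(c(1, 0), c(n_hi, n_lo)))
for (g in paste0("gene", 1:5)) sim[[g]] <- rnorm(n_hi + n_lo)
is_hi <- sim$gleason_hi == 1
sim$gene1[is_hi] <- sim$gene1[is_hi] + 3   # built-in mean shift in the high group
sim$gene2[is_hi] <- sim$gene2[is_hi] * 4   # built-in extra variability in the high group

# Run the three tests for every gene and Bonferroni-adjust each set of p-values.
# Assumption: every column other than `label` is a gene expression column.
run_gene_tests <- function(dat, label = "gleason_hi") {
  genes <- setdiff(names(dat), label)
  hi <- dat[[label]] == 1

  # Apply one two-sample test to each gene and collect the p-values.
  p_for <- function(test) {
    vapply(genes,
           function(g) test(dat[[g]][hi], dat[[g]][!hi])$p.value,
           numeric(1))
  }

  res <- data.frame(
    gene = genes,
    p_t  = p_for(t.test),    # Welch two-sample t-test (R's default)
    p_f  = p_for(var.test),  # F-test for equality of variances
    p_ks = p_for(ks.test),   # two-sample Kolmogorov-Smirnov test
    row.names = NULL
  )

  # Bonferroni: multiply each p-value by the number of genes tested, capped at 1.
  for (col in c("p_t", "p_f", "p_ks")) {
    res[[paste0(col, "_bonf")]] <- p.adjust(res[[col]], method = "bonferroni")
  }
  res
}

res <- run_gene_tests(sim)
res[order(res$p_t_bonf), ]   # genes ranked by adjusted t-test p-value
```

For the real data, the call would be `run_gene_tests(prostate_data)`, again assuming the file contains only `gleason_hi` and gene columns. Sorting by `p_f_bonf` or `p_ks_bonf` instead gives the top genes for the other two tests, which makes the comparison in step 5 easier. Note that `ks.test` may warn about ties if expression values are repeated.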