The data consist of gene expression measurements from high-grade and low-grade prostate tumors.
The Gleason score ranges from 2 to 10 and measures tumor dedifferentiation in prostate cancer; higher scores are associated with worse prognosis. The data consist of 198 tumors from Swedish prostate cancer patients who were diagnosed by transurethral resection of the prostate and followed expectantly; 89 of the tumors are high grade (Gleason \(\ge\) 8) and 109 are low grade (Gleason \(\le\) 6). The data were used in a study by Dr. Kathryn Penney et al. (http://jco.ascopubs.org/content/29/17/2391.full.pdf+html), which provides more information about these samples. The data may be read in using code such as:
prostate_data = read.csv("data/Swedish_Gleason_High_Vs_Low_for_3202_2N.csv")
head(prostate_data)
Variable | Description |
---|---|
gleason_hi | 1 for tumors with Gleason \(\ge\) 8 and 0 for tumors with Gleason \(\le\) 6 |
GM2A - TKTL1 | gene expression levels |
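To make this layout concrete, here is a minimal sketch of separating the outcome column from the gene columns and checking the group sizes. It runs on a small simulated data frame (`toy_data`, with made-up values and only two gene columns), not on the real file. It also assumes that every column other than `gleason_hi` is a gene, which should be checked against `names(prostate_data)` before applying the same steps to the real data.

```r
# Simulated stand-in with the same layout as the real file (values are made up)
set.seed(1)
toy_data <- data.frame(
  gleason_hi = rep(c(1, 0), times = c(4, 6)),
  GM2A  = rnorm(10),
  TKTL1 = rnorm(10)
)

# Outcome: 1 = high grade (Gleason >= 8), 0 = low grade (Gleason <= 6)
grade <- toy_data$gleason_hi

# Assumption: every column other than gleason_hi holds gene expression levels
gene_cols <- setdiff(names(toy_data), "gleason_hi")
expr <- as.matrix(toy_data[, gene_cols, drop = FALSE])

# Group sizes; on the real data these should be 109 (low) and 89 (high)
group_sizes <- table(grade)
print(group_sizes)
print(dim(expr))
```

Keeping the expression values in a numeric matrix with one column per gene makes it straightforward to apply the same computation to each gene in turn.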
As stated above, the overall objective is to identify genes that differ between tumors with high and low Gleason scores. One way a gene could differ is that it is up-regulated or down-regulated in high Gleason tumors compared with low; this sort of difference could be detected by a two-sample t-test. Another way is that it could be more variable: for example, perhaps a pathway of genes maintains fairly stable expression in low grade disease, but in high grade disease the pathway is dysregulated, and this manifests as genes in that pathway showing a lot of variability across patients. To explore this: