Abstract
The multiplicity problem is evident in the simplest form of statistical analysis of gene expression data: the identification of differentially expressed genes. In more complex analyses, the problem is compounded by the multiplicity of hypotheses per gene. Thus, in some cases, it may be necessary to consider testing millions of hypotheses. We present three general approaches for addressing multiplicity in large research problems: (a) use the scalability of false discovery rate (FDR) controlling procedures; (b) apply FDR-controlling procedures to a selected subset of hypotheses; (c) apply hierarchical FDR-controlling procedures. We also offer a general framework for ensuring reproducible results in complex research, where a researcher faces more than just one large research problem. We demonstrate these approaches by analyzing the results of a complex experiment involving the study of gene expression levels in different brain regions across multiple mouse strains.
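The abstract does not say which FDR-controlling procedure the paper uses. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, a widely used FDR-controlling method. The function name `benjamini_hochberg` and the level parameter `q` are hypothetical names chosen for this example, not taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Illustrative Benjamini-Hochberg step-up procedure (assumed, not from the paper).

    Returns a list of booleans, in the input order, that is True where the
    corresponding hypothesis is rejected at FDR level q.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, purely for demonstration.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Because the rejection threshold for the k-th smallest p-value is `k * q / m`, the same rule applies unchanged whether `m` is ten or several million, which is one way to read the scalability referred to in approach (a). Approach (b) would correspond to calling such a function on a pre-selected subset of the p-values; how that subset may be chosen without invalidating FDR control is a matter for the paper itself.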
Original language | English |
---|---|
Pages (from-to) | 414-437 |
Number of pages | 24 |
Journal | Statistica Neerlandica |
Volume | 60 |
Issue number | 4 |
State | Published - 1 Nov 2006 |
Externally published | Yes |
Keywords
- False discovery rate
- Hierarchical testing
- High throughput analysis
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty