Course Description

410.671 - Gene Expression Data Analysis and Visualization

This course introduces students to methods for analyzing and interpreting transcriptomics data generated by technologies such as oligonucleotide or two-channel microarrays, qRT-PCR, and RNA sequencing. Topics include scaling and normalization, outlier analysis, and missing value imputation. Students will learn how to identify differentially expressed genes and use appropriate statistical tests to correlate their expression with clinical outcomes such as disease activity or survival; methods to control for multiple testing will also be presented. The course provides an introduction to linear and nonlinear dimensionality reduction methods, unsupervised clustering, and supervised classification approaches. Open-source tools and databases for biological interpretation of results will be introduced. Assignments and concepts will draw on publicly available datasets, and students will compute and visualize results using the statistical software R. Prerequisites: 410.601 Biochemistry, 410.602 Molecular Biology, 410.645 Biostatistics, 410.634 Practical Computer Concepts for Bioinformatics, or an undergraduate computer programming course.