University of Wisconsin–Madison

Bioinformatics Resource Center

RNA-Seq Analysis


Introduction:

mRNA-Seq is a method for sequencing fragments of cDNA that are reverse-transcribed from messenger RNA (mRNA). mRNA is transcribed from genes by RNA polymerase and serves as the template the ribosome uses to produce the protein encoded by the transcript. As such, the level of a specific mRNA is viewed as the "expression" of a gene.

A general goal of most mRNA-seq projects is to determine the expression of genes across the genome in a specific cell/tissue/organism type. Integrating gene expression information from many genes with known biochemical and genetic interactions allows generation and testing of molecular hypotheses.

One parameter essential to success in mRNA-seq is the use of adequate numbers of control and test samples. Descriptions of adequate numbers of biological and technical replicates are provided in the guidelines section. 

Analysis Methods:

QC:
Initially, each FASTQ file is assessed by a BRC member for quality control issues and trimmed to remove any adapter sequences that were not removed during the initial demultiplexing.

Analysis:
After trimming and QC, we align the reads to the genome of your choice using the splice-junction-aware read aligner STAR.

We use RSEM to generate normalized read counts for each gene and its potential isoforms. Additionally, we filter out genes with very low expression, which would otherwise reduce statistical power. edgeR is then employed to perform differential gene expression analysis.
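As an illustration only (not the exact filter used in our pipeline), low-expression filtering can be sketched as a counts-per-million (CPM) cutoff: a gene is kept if it reaches a minimum CPM in a minimum number of samples. The function names, the toy counts, and the thresholds `min_cpm` and `min_samples` below are assumptions made for this example.

```python
def counts_per_million(counts):
    """Convert one sample's raw counts (dict: gene -> count) to CPM."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def filter_low_expression(samples, min_cpm=1.0, min_samples=3):
    """Return genes with CPM >= min_cpm in at least min_samples samples.

    samples: list of dicts (gene -> raw count), one dict per sample.
    The thresholds are illustrative assumptions, not pipeline settings.
    """
    cpms = [counts_per_million(sample) for sample in samples]
    kept = []
    for gene in samples[0]:
        n_passing = sum(1 for cpm in cpms if cpm[gene] >= min_cpm)
        if n_passing >= min_samples:
            kept.append(gene)
    return kept


# Toy data (hypothetical): geneC is detected in only one of four samples.
toy_samples = [
    {"geneA": 600000, "geneB": 400000, "geneC": 0},
    {"geneA": 550000, "geneB": 449998, "geneC": 2},
    {"geneA": 700000, "geneB": 300000, "geneC": 0},
    {"geneA": 500000, "geneB": 500000, "geneC": 0},
]
kept_genes = filter_low_expression(toy_samples, min_cpm=1.0, min_samples=2)
print(kept_genes)
```

In this toy set geneC is dropped because it passes the cutoff in a single sample. In practice the cutoff is usually tied to the size of the smallest experimental group, so that a gene expressed in only one condition is not lost.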

Two fundamental tasks are required of all differential gene expression (DGE) analyses. First, the magnitude of differential expression between two or more conditions must be estimated from the read counts of biologically replicated samples. This requires calculating the fold-change of read counts while taking into account differences in sequencing depth and the variation across samples and groups. Second, the significance of the expression difference must be estimated and corrected for multiple testing.
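The two tasks can be sketched in simplified form as follows. This is a minimal illustration, not the edgeR procedure itself: the fold-change here is a plain log2 ratio of mean counts per million (edgeR uses its own normalization and a negative binomial model), and the multiple-testing step is a Benjamini-Hochberg adjustment applied to p-values that are assumed to come from some statistical test. All function names and numbers are hypothetical.

```python
import math


def log2_fold_change(treat_counts, ctrl_counts, treat_depths, ctrl_depths, pseudo=0.5):
    """log2 ratio of mean CPM (treatment over control) for a single gene.

    Counts are divided by each sample's sequencing depth so that deeper
    libraries do not inflate the ratio; `pseudo` avoids division by zero.
    """
    treat_cpm = [c / d * 1e6 for c, d in zip(treat_counts, treat_depths)]
    ctrl_cpm = [c / d * 1e6 for c, d in zip(ctrl_counts, ctrl_depths)]
    mean_treat = sum(treat_cpm) / len(treat_cpm)
    mean_ctrl = sum(ctrl_cpm) / len(ctrl_cpm)
    return math.log2((mean_treat + pseudo) / (mean_ctrl + pseudo))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One gene, two replicates per group, unequal sequencing depths (toy values).
lfc = log2_fold_change(
    treat_counts=[200, 400], ctrl_counts=[100, 50],
    treat_depths=[1_000_000, 2_000_000], ctrl_depths=[1_000_000, 500_000],
    pseudo=0.0,
)
print(lfc)  # close to 1.0, i.e. a two-fold increase after depth correction

# Hypothetical raw p-values for four genes.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(fdr)
```

Note that the raw counts in the example (200 and 400 versus 100 and 50) would suggest a different ratio if sequencing depth were ignored, which is why normalization precedes the fold-change estimate.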

Differentially expressed genes (DEGs) are provided to you as an Excel document (xlsx format) upon completion of the analysis.

Post Analysis Inspection:
Before DEGs are assessed, we examine sample-to-sample differences by unsupervised clustering, using a multidimensional scaling (MDS) plot built from the top 500 expressed genes. This is an excellent opportunity to gauge the sample-to-sample variation and see whether it matches your expectations of sample grouping. It also provides an opportunity to recognize processing errors (sample switches and batch effects). Ideally, the samples in the MDS plot will cluster tightly within the primary condition of interest and be separated from the other conditions. This indicates that differences between groups (effect size) are larger than differences within groups.
We also review the results of a complementary unsupervised clustering approach, based on Pearson correlation between samples, to confirm the results of the MDS.
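The correlation-based check can be sketched as below: pick the most highly expressed genes, then compute the Pearson correlation between every pair of samples over those genes. This is a minimal sketch under stated assumptions; the function names, the use of mean expression to rank genes, and the toy expression values are all hypothetical and not taken from our pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def top_expressed_genes(expr, n=500):
    """Return the n genes with the highest mean expression across samples."""
    ranked = sorted(expr, key=lambda g: sum(expr[g]) / len(expr[g]), reverse=True)
    return ranked[:n]


def sample_correlation_matrix(expr, genes):
    """Pairwise Pearson correlation between samples over the given genes.

    expr: dict mapping gene -> list of expression values, one per sample.
    """
    n_samples = len(next(iter(expr.values())))
    columns = [[expr[g][j] for g in genes] for j in range(n_samples)]
    return [[pearson(columns[i], columns[j]) for j in range(n_samples)]
            for i in range(n_samples)]


# Toy log-expression values: samples 0-1 are controls, 2-3 are treated.
toy_expr = {
    "geneA": [10.0, 10.2, 5.0, 5.1],
    "geneB": [2.0, 2.1, 8.0, 7.9],
    "geneC": [6.0, 6.1, 6.0, 6.2],
    "geneD": [0.1, 0.0, 0.2, 0.1],
}
top_genes = top_expressed_genes(toy_expr, n=3)
corr = sample_correlation_matrix(toy_expr, top_genes)
for row in corr:
    print([round(value, 3) for value in row])
```

In the toy data the two control samples correlate strongly with each other and negatively with the treated samples, which is the pattern expected when the between-group difference dominates. A swapped sample would show up as a row that correlates best with the wrong group.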

Example Output Report: 

Click the link to be directed to our: Example mRNA-Seq Report

Sequencing and Analysis Guidelines:

Biological and Technical Replicates (per comparison):

  • Biological Replicates
    • Almost always make it easier to distinguish the effects of your treatment
    • minimum: 3 treatment, 3 control per comparison
    • recommended: >= 4 treatment, >= 4 control per comparison
  • Technical Replicates
    • Only account for technical variation and, for this reason, are not generally recommended
    • Consult our group directly if you are considering this approach because of sample limitations or similar constraints

Sequencing read number, assuming paired-end Illumina 2x150 (per sample):

  • For prokaryotic organisms, 5 x 10^6 reads per sample
  • For eukaryotic organisms, 30 x 10^6 reads per sample is usually adequate to cover the dynamic range of expression for most organisms
    • Polyploid plants with very large genomes may need the read number scaled by ploidy
    • A general document on mRNA-Seq guidelines can be found on the ENCODE site.

 

 

The Bioinformatics Resource Center operates on a 100% cost-recovery basis. We are able to provide free consultations because the other core facilities hire us to catch design problems that could lead to mistakes or wasted time. Please be respectful of this time so we can keep all of our service rates low.