Gene types are based on the Ensembl classification.
75th percentile of normalized expression by chromosomes.
The Pre-Process tab prepares expression data for downstream analysis. The workflow (1) filters genes with negligible expression, (2) converts uploaded identifiers to Ensembl or STRING IDs, and (3) applies appropriate transformations so comparisons across samples are meaningful. Each option in the sidebar updates the preview plots and summaries in real time.
RNA-seq counts are normalized to counts per million (CPM) using edgeR. By default a gene must exceed 0.5 CPM in at least one sample (adjustable via minCPM and n libraries). Genes failing this requirement are discarded, which commonly removes 30–50% of genes, namely those whose expression is undetectable in the chosen tissue.
Increase the CPM threshold when sequencing depth is high to keep only robust signals, or lower it (e.g., 0.2 CPM for 50M-read libraries) to retain low-abundance but potentially important transcripts. The settings also determine whether iDEP warns about a very small number of retained genes (< 1,000 by default), because that affects enrichment background calculations.
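The filtering rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, CPM is computed from the raw library total (edgeR can also use normalized library sizes), and iDEP itself performs this step in R.

```python
def cpm(counts_by_sample):
    """Convert raw counts to counts per million, one library at a time."""
    result = {}
    for sample, counts in counts_by_sample.items():
        library_size = sum(counts.values())  # assumption: raw total, no normalization factor
        result[sample] = {gene: c * 1e6 / library_size for gene, c in counts.items()}
    return result


def filter_genes(counts_by_sample, min_cpm=0.5, n_libraries=1):
    """Keep genes whose CPM exceeds min_cpm in at least n_libraries samples."""
    cpm_values = cpm(counts_by_sample)
    genes = list(next(iter(counts_by_sample.values())))
    kept = []
    for gene in genes:
        n_pass = sum(1 for sample in cpm_values if cpm_values[sample][gene] > min_cpm)
        if n_pass >= n_libraries:
            kept.append(gene)
    return kept


# Toy data: two libraries of 4 million reads each (hypothetical values).
toy_counts = {
    "S1": {"GeneA": 2_500_000, "GeneB": 1_499_999, "GeneC": 1},
    "S2": {"GeneA": 2_000_000, "GeneB": 1_999_998, "GeneC": 2},
}
default_kept = filter_genes(toy_counts)               # GeneC reaches at most 0.5 CPM
relaxed_kept = filter_genes(toy_counts, min_cpm=0.2)  # GeneC now passes
```

In this toy example GeneC has at most 2 reads in 4 million (0.5 CPM), so it is dropped at the default threshold and retained when the threshold is lowered to 0.2 CPM.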
After filtering, select the transformation that best matches the study design. The transformed matrix powers clustering, PCA, t-SNE, and optional limma-trend analysis.
Inspect the transformation effects using the technical replicate plots and density views. Aggressive transformations can shrink biological differences, whereas minimal transformation leaves heteroscedastic noise in place.
The figure above compares technical replicates before and after applying VST, rlog, or a simple log transform. Notice how VST compresses the range most aggressively, while rlog and log2(x + c) retain slightly more spread. Use it as a guide: if your replicates look similar to the middle panel you are removing unwanted noise without flattening real biology.
For already normalized matrices, filtering relies on absolute expression thresholds rather than CPM. The default keeps genes expressed at level ≥ 1 in at least one sample; adjust the value to reflect the measurement scale. For microarray log-ratio data, set the cutoff to a large negative number to disable filtering entirely.
iDEP automatically evaluates kurtosis across samples and recommends log2 transformation when extremely skewed values are detected. This safeguard reduces the impact of single outlier arrays or normalization artifacts.
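As a rough illustration of a kurtosis check, the sketch below computes the excess kurtosis of each sample and flags the data set when any sample exceeds a cutoff. The function names are hypothetical, and the cutoff is left as a required argument because the value iDEP uses is not stated here.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def recommend_log2(samples, cutoff):
    """Return True when any sample's excess kurtosis is above the cutoff.

    `cutoff` is an assumption supplied by the caller, not iDEP's own setting.
    """
    return any(excess_kurtosis(values) > cutoff for values in samples.values())


# Toy data: one well-behaved sample and one dominated by a single extreme value.
toy_samples = {
    "array1": [1, 2, 3, 4, 5],
    "array2": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1000],
}
```

With these toy values, array2 has a much heavier tail than array1, so a modest cutoff flags the data set for log2 transformation.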
This bar chart highlights an experiment where certain groups received far fewer reads. When the tallest bar is more than three times the height of the shortest one, downstream methods like limma-trend can become unreliable. In those cases consider alternative differential expression workflows or revisit library preparation and sequencing balance.
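The threefold rule of thumb is easy to check from the total read counts. The sketch below is a minimal example with hypothetical sample names and read totals; only the factor of three comes from the text above.

```python
def library_size_ratio(total_reads):
    """Ratio of the largest to the smallest library size."""
    return max(total_reads.values()) / min(total_reads.values())


def is_unbalanced(total_reads, max_ratio=3.0):
    """True when the largest library is more than max_ratio times the smallest."""
    return library_size_ratio(total_reads) > max_ratio


# Hypothetical total read counts per sample.
toy_totals = {"Ctrl_1": 30_000_000, "Ctrl_2": 28_000_000, "Treat_1": 8_000_000}
ratio = library_size_ratio(toy_totals)  # 30M / 8M = 3.75
```

Here the ratio is 3.75, so this toy experiment would fall into the range where limma-trend deserves extra caution.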
Once filtering and transformation options are set, iDEP maps uploaded gene identifiers to Ensembl or STRING IDs using the selected species database. A summary of matched and unmatched IDs appears in the data preview so you can confirm coverage before moving on to downstream tabs.
Note: In the gene type column, "C" indicates protein-coding genes, and "P" means pseudogenes.
The Clustering tab offers several tools to explore overall expression patterns. Genes are ranked by their across-sample variability, and the top genes enter a heatmap that supports both hierarchical clustering and k-means clustering. Sample-level dendrograms, pathway word clouds, and gene standard deviation plots help interpret the dominant trends in the dataset.
The interactive heatmap displays the standardized expression matrix for the selected number of high-variance genes. Switch between hierarchical and k-means clustering to reorganize the rows and columns. When hierarchical clustering is active, the sidebar controls let you tune several parts of the algorithm:
For k-means clustering choose the desired number of clusters and, if the results are unstable, re-run the algorithm with a new random seed. Use the brush tool to zoom into a portion of the heatmap, or click on rows to view gene-level expression details beneath the heatmap. Optional enrichment analysis summarizes the functional themes within the selected cluster.
Sample annotations imported during preprocessing appear as colored bars on the heatmap. Choose a single factor or display all available annotations, and change the color palette to match your figures. Adjusting the maximum Z-score truncates extreme values so that subtle expression differences are more visible.
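To make the heatmap preparation concrete, the sketch below ranks genes by standard deviation, standardizes each selected gene to Z-scores, and truncates values at a maximum Z-score. It is an illustration only: the function names are hypothetical, and the exact standardization iDEP applies may differ.

```python
from statistics import mean, stdev


def top_variable_genes(expr, n_top):
    """Return the n_top gene names with the largest across-sample standard deviation."""
    ranked = sorted(expr, key=lambda gene: stdev(expr[gene]), reverse=True)
    return ranked[:n_top]


def standardize_and_clip(values, max_z):
    """Convert one gene's values to Z-scores, then truncate to [-max_z, max_z]."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant gene carries no pattern
    z_scores = [(v - mu) / sd for v in values]
    return [max(-max_z, min(max_z, z)) for z in z_scores]


# Toy expression matrix: gene -> values across four samples (hypothetical).
toy_expr = {"G1": [1, 1, 1, 1], "G2": [0, 0, 10, 10], "G3": [4, 5, 6, 5]}
selected = top_variable_genes(toy_expr, n_top=2)
heatmap_rows = {gene: standardize_and_clip(toy_expr[gene], max_z=2) for gene in selected}
```

Truncation matters most when a single sample has an extreme value: without it, that one cell would stretch the color scale and flatten every other difference in the row.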
After generating the heatmap, open the Word Cloud sub-tab to visualize the pathways associated with a selected cluster. Words are sized by frequency across pathway descriptions, highlighting enriched biological themes.
The Gene SD Distribution panel plots the distribution of gene-level standard deviations. This helps assess whether the current variance filter is too strict or too permissive, and provides a quick check for highly variable genes that might influence clustering results.
The Sample Tree tab clusters samples using genes with expression above the 75th percentile. It uses the same distance metric and transformation options selected for the heatmap, making it a complementary view of how samples segregate under the current preprocessing choices.
The PCA tab is a quick way to see which samples behave alike and which ones stand out. It condenses the thousands of genes measured in your experiment into easy-to-read plots, so you can spot patterns without any coding or statistics.
Principal component analysis (PCA) works by re-arranging the data so that the biggest differences between samples appear on the horizontal and vertical axes. When two samples sit close together on the plot they share similar gene-expression profiles. Samples that land far apart are reacting differently to the conditions in your study. Each axis label includes the percentage of "variation" it captures—higher percentages mean that axis is explaining more of the overall differences between samples.
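For readers who want to see the idea in code, here is a minimal sketch of PCA using power iteration on a small samples-by-genes matrix. It uses only the Python standard library, the function names are hypothetical, and it is not how iDEP computes PCA; it simply shows where the sample coordinates and the "percent variation" on each axis come from.

```python
import math


def center_columns(rows):
    """Subtract each gene's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def top_components(rows, n_components=2, n_iter=500):
    """Return [(sample_scores, percent_variance), ...] for the leading components."""
    x = center_columns(rows)
    n = len(x)
    # Sample-by-sample Gram matrix; its eigenvalues measure variance per component.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    results = []
    for k in range(n_components):
        vec = [math.sin(i + 1 + k) for i in range(n)]  # deterministic starting vector
        for _ in range(n_iter):
            new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in new))
            if norm == 0:
                break
            vec = [v / norm for v in new]
        eigenvalue = sum(
            vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
        scores = [v * math.sqrt(max(eigenvalue, 0.0)) for v in vec]
        results.append((scores, 100.0 * eigenvalue / total))
        # Deflation: remove the component just found before searching for the next.
        gram = [[gram[i][j] - eigenvalue * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return results


# Toy data: two control-like and two treated-like samples, three genes (hypothetical).
toy_rows = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.1],
    [4.0, 1.0, 2.0],
    [4.1, 0.9, 2.2],
]
(pc1_scores, pc1_percent), (pc2_scores, pc2_percent) = top_components(toy_rows)
```

In this toy data almost all of the variation separates the first two samples from the last two, so PC1 captures nearly everything and the two groups land on opposite sides of the horizontal axis.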
The first panel shows an interactive two-dimensional scatter plot. Use the drop-down menus in the sidebar to choose which axes (principal components) you want to view. You can color or shape the points by any sample annotation, such as treatment group, time point, or tissue type, to check whether your experimental design explains the separation you see.
How to interpret the plot:
The 3D tab shows a rotating three-dimensional version of the PCA plot. Pick any three components and drag the plot to view it from different angles. The color and shape options you chose in the sidebar carry over automatically.
Additional tabs powered by the PCAtools package provide deeper diagnostics:
The MDS and t-SNE tabs offer alternative views of the same processed data. MDS (multidimensional scaling) focuses on overall distances, while t-SNE is a nonlinear method that can uncover tight clusters.
Buttons in the sidebar generate an HTML summary of the PCA results and let you download the PCA scores table. The table is useful for custom plots or for cross-checking how individual samples load onto each component.
The graph above is an UpSet plot, an alternative to a Venn diagram. The combination matrix (bottom) shows which gene lists form each intersection, and the bars above it show how many genes are in each intersection.
The Stats tab is where you define statistical comparisons (contrasts) and run differential expression analysis. After Stats finishes, the DEG tab lets you explore each contrast in detail using tables, plots, and enrichment tools. Think of Stats as the engine that creates the comparison results, and DEG as the dashboard that helps you interpret them.
Begin on the Experiment Design sub-tab. If you uploaded only a gene expression matrix, iDEP infers sample groups directly from the column names (for example, Control_1, Treat_1). If you also provided a design file, the additional columns appear as selectable factors. Use the drop-down lists to choose which annotations define your contrasts and, when needed, specify reference levels or block factors (paired samples, batches, donors). The model formula shown on this page is exactly what will be sent to DESeq2 or limma, so you can confirm it matches your experimental plan.
Contrasts define the specific comparisons you want to test. Once you choose a factor, iDEP automatically creates pairwise contrasts between its levels. For example, if Treatment includes Control and Drug, Stats will compute “Drug vs Control”. When only two groups exist, that is the sole contrast and Stats models the fold change of Drug relative to Control. If you include two factors and enable their interaction, DESeq2/limma can also test whether the effect of one factor depends on the other.
Example: genotype × treatment (2×2). Suppose you measure gene expression in wild-type and mutant cells, each with and without drug exposure. Select both Genotype and Treatment as factors. Stats generates contrasts such as “Mutant vs WT” (averaged across treatments), “Drug vs Control” (averaged across genotypes), and—if you turn on the interaction— “I:Genotype:Treatment”. A significant interaction indicates the drug response in mutants differs from wild type, a common question in pharmacogenomics studies.
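For a single gene, the interaction in a 2×2 design can be read as a "difference of differences". The sketch below works through that arithmetic on group means of log2 expression. It is a simplified illustration with hypothetical values and a hypothetical function name; DESeq2 and limma estimate the same kind of quantity by fitting a linear model across all samples.

```python
from statistics import mean


def interaction_effect(log2_expr):
    """(Mutant drug - Mutant control) - (WT drug - WT control) for one gene."""
    cell_means = {group: mean(values) for group, values in log2_expr.items()}
    drug_effect_wt = cell_means[("WT", "Drug")] - cell_means[("WT", "Control")]
    drug_effect_mutant = cell_means[("Mutant", "Drug")] - cell_means[("Mutant", "Control")]
    return drug_effect_mutant - drug_effect_wt


# Hypothetical log2 expression of one gene, two replicates per group.
toy_gene = {
    ("WT", "Control"): [5.0, 5.0],
    ("WT", "Drug"): [6.0, 6.0],
    ("Mutant", "Control"): [5.0, 5.0],
    ("Mutant", "Drug"): [8.0, 8.0],
}
effect = interaction_effect(toy_gene)  # (8 - 5) - (6 - 5) = 2.0
```

Here the drug raises expression by 1 log2 unit in wild type but by 3 in the mutant, so the interaction is 2: the mutant responds four times more strongly on the linear scale.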
For simpler studies with only a few sample groups, use the Model comparisons list on the Experiment Design page to pick exactly which pairs you want. For instance, with three groups (Control, DrugA, DrugB) you can uncheck unwanted contrasts and keep only “DrugA vs Control”. Remember that “DrugA vs Control” reports fold change relative to Control, so positive values indicate higher expression in DrugA.
Controlling batch effects. Add columns such as Batch, Patient, or SequencingRun to your design file and select them as block factors. The model (~ Batch + Treatment) accounts for these systematic differences without producing separate contrasts for them, reducing false positives caused by technical variation.
Other real-world scenarios. Time-course experiments can treat Time and Treatment as factors, enabling contrasts such as “12h vs 0h” and interaction terms that reveal time-dependent treatment effects. For paired clinical samples (tumor vs matched normal), include the patient ID as a block factor so inter-patient variability is removed before testing the tumor vs normal contrast.
The Method menu appears when you uploaded raw counts. Select:
If you only have normalized values (FPKM, TPM, log ratios), Stats can still run using limma-trend, but you lose some sensitivity compared with modelling raw counts. Whenever possible, upload the original count matrix and analyze it with DESeq2.
Additional switches appear based on the method:
After configuring options, click Submit to run Stats. A notification appears when results are ready. If you make changes, submit again to refresh the analysis.
Once Stats finishes, switch to the Results sub-tab to review the number of up- and down-regulated genes for every contrast. Download the table or the summary graphic for reports. These contrasts feed directly into DEG.
The Venn Diagram sub-tab visualizes gene list overlaps across contrasts. Use the check box to split results into upregulated and downregulated sets before exporting figures with the download buttons.
The Venn Diagram view is best when you have two to four contrasts and want to see shared genes quickly. If you selected more contrasts, the UpSet plot (displayed underneath) summarizes the same overlaps in a bar-chart layout that stays readable for larger numbers of comparisons.
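The overlaps behind a Venn diagram or UpSet plot can be computed with ordinary set operations. The sketch below counts genes that belong to exactly one combination of contrasts, which is an assumption about how the bars are defined; the contrast names, gene names, and function name are hypothetical.

```python
from itertools import combinations


def exclusive_intersections(gene_sets):
    """Map each combination of contrasts to the genes found in exactly those contrasts."""
    names = list(gene_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(gene_sets[name] for name in combo))
            others = [gene_sets[name] for name in names if name not in combo]
            outside = set().union(*others)
            genes = inside - outside
            if genes:
                result[combo] = genes
    return result


# Hypothetical DEG lists for three contrasts.
toy_degs = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
overlaps = exclusive_intersections(toy_degs)
```

With three contrasts there are only seven possible combinations, which a Venn diagram handles well. The number of combinations roughly doubles with each added contrast, which is why the bar-chart layout scales better.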
The R Code sub-tab reveals the exact DESeq2/limma commands used so you can reproduce the analysis offline.
Copy the script from R Code or use the download button to save it as an .R file. Running this script locally lets you tweak advanced settings beyond the graphical interface while keeping your analysis fully documented.
Note: In the gene type column, "C" indicates protein-coding genes, and "P" means pseudogenes.
The DEG tab helps you inspect each contrast produced in Stats. Select a comparison to review significant genes, visualize fold changes, and run enrichment analyses. Stats defines the comparisons; DEG explains what each comparison means biologically.
Use the menu on the left to pick any contrast from Stats. Labels follow the “B-A” convention (for example, “Drug-Control”), meaning fold changes are calculated relative to the baseline group (Control in this example). Interaction contrasts start with “I:” and test whether the effect of one factor depends on another.
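The direction of a "B-A" contrast can be illustrated with a simple log2 ratio of group means. This is a sketch under stated assumptions: the function name and the pseudocount of 1 are hypothetical, group names are assumed to contain no hyphen, and DESeq2 and limma estimate fold changes from a fitted model rather than from raw means.

```python
import math


def log2_fold_change(label, group_means, pseudocount=1.0):
    """Return log2(B / A) for a contrast labelled 'B-A' (A is the baseline)."""
    numerator, baseline = label.split("-", 1)  # assumes no hyphen inside group names
    return math.log2(
        (group_means[numerator] + pseudocount) / (group_means[baseline] + pseudocount)
    )


# Hypothetical mean normalized expression of one gene in each group.
toy_means = {"Drug": 7.0, "Control": 3.0}
lfc = log2_fold_change("Drug-Control", toy_means)  # log2(8 / 4) = 1.0
```

A positive value means higher expression in the first-named group (Drug), and swapping the label to "Control-Drug" flips the sign.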
The default view is a clustered heatmap of differentially expressed genes. Choose how many genes to display and whether to sort them by fold change or FDR. Download buttons provide both the image and the underlying expression matrix for custom plots or reporting.
Each plot offers download options so you can save high-resolution figures or export the data points behind them.
The Enrichment tab connects directly to the pathway analysis module. For the selected contrast, you can run GO, pathway, and TF/miRNA target enrichment. This helps translate lists of DEGs into biological processes and regulatory themes. Adjust settings exactly as you would elsewhere in iDEP.
DEG also provides the R script used to generate plots and tables. Download it if you want to customize visualizations or rerun the analysis offline.
Adjusting the width of the browser window can render the figure differently and resolve the "Figure margin too wide" error.
Connected gene sets share more genes. Node colors correspond to adjusted P-values.
The Pathway tab reuses the fold-change values generated in Stats to score every pathway or gene set in the selected database. Because it analyses the full set of genes (not just the significant list), it can reveal coordinated but modest expression shifts that DEG enrichment might miss.
This approach differs from the enrichment tools under DEG, which only examine genes classified as up- or down-regulated. Pathway analysis here considers the entire ranked list of genes, so pathways may surface even when individual genes do not pass DEG thresholds.
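One way to score a gene set from all fold changes is the parametric Z-score described in the PAGE paper (Kim and Volsky, 2005, listed in the references below): the mean fold change of the set is compared with the mean of all genes and scaled by the set size. The sketch below is a minimal illustration, not the code iDEP runs; the function name is hypothetical, and using the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def page_z_score(fold_changes, gene_set):
    """Z = (mean of set - mean of all genes) * sqrt(set size) / SD of all genes."""
    all_values = list(fold_changes.values())
    mu = mean(all_values)
    sigma = pstdev(all_values)  # assumption: population SD over all genes
    members = [fold_changes[gene] for gene in gene_set if gene in fold_changes]
    if not members:
        raise ValueError("gene set has no genes with a fold change")
    return (mean(members) - mu) * math.sqrt(len(members)) / sigma


# Hypothetical log2 fold changes for six genes and one two-gene pathway.
toy_fold_changes = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0, "g5": -2.0, "g6": -2.0}
z_up = page_z_score(toy_fold_changes, {"g1", "g2"})
```

Because the score grows with the square root of the set size, a large pathway whose genes all shift modestly in the same direction can score higher than a small pathway with a few strong changes.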
Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., et al. (2016). The Reactome pathway Knowledgebase. Nucleic Acids Res 44, D481-487.
Furge, K., and Dykema, K. (2012). PGSEA: Parametric Gene Set Enrichment Analysis. R package version 1480.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., and Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45, D353-D361.
Kim, S.Y., and Volsky, D.J. (2005). PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics 6, 144.
Luo, W., and Brouwer, C. (2013). Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830-1831.
Sergushichev, A. (2016). An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv http://biorxiv.org/content/early/2016/06/20/060012.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550.
Yu, G., and He, Q.Y. (2016). ReactomePA: an R/Bioconductor package for Reactome pathway analysis and visualization. Mol Biosyst 12, 477-479.
Select a region to zoom in. Mouse over the points to see more information on the gene. Enriched regions are highlighted by blue or red line segments parallel to the chromosomes.
Mouse over the figure to see more options on the top right, including download.
The Genome tab shows the fold changes of genes along each chromosome. It also scans for genomic regions where many neighbouring genes change together, highlighting potential co-regulation or shared copy-number events. The controls on the left let you choose the comparison, gene filters, and window settings used in these plots.
Beneath the basic filters you can search for chromosomal regions where many genes change together.
Enriched pathways in the selected cluster:
Note: In the gene type column, "C" indicates protein-coding genes, and "P" means pseudogenes.
The Bicluster tab searches for groups of genes that behave similarly across only a subset of samples. This can uncover condition-specific co-expression patterns that standard clustering (which uses all samples) may miss. Biclustering generally works best when you have at least 15 samples and more than two experimental groups.
The methods implemented here come from the biclust and QUBIC R packages.
Displays the expression values for genes and samples within the selected bicluster. Rows are clustered to highlight internal structure; colour scales come from the Pre-Process settings. Use the heatmap to check whether the bicluster isolates a distinct pattern.
Runs functional enrichment on the genes inside the bicluster, using the same enrichment module as other parts of iDEP. This helps you interpret whether the bicluster represents a particular pathway, transcription factor target set, or biological process.
Lists the gene members of the bicluster along with any available annotations. Use the download button to export the table for downstream analysis or validation.
Need background on biclustering? Visit the iDEP biclustering guide for additional examples and references.
Enriched pathways in the selected module
Note: In the gene type column, "C" indicates protein-coding genes, and "P" means pseudogenes.
The Network tab identifies co-expression modules using Weighted Gene Co-expression Network Analysis (WGCNA). Genes that track together across samples are grouped into modules, which may correspond to biological pathways or cell-type signatures. Because WGCNA relies on correlation patterns, having at least 15 samples and multiple conditions yields the most reliable results.
This implementation is based on the original WGCNA publication.
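The core idea of a weighted co-expression network can be sketched as follows: compute the correlation between every pair of genes, then raise its absolute value to a power so that strong correlations dominate. This is a simplified illustration of an unsigned network. The function names are hypothetical, and the power of 6 is an assumption chosen for the example; in practice the power is selected from the data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, power=6):
    """Adjacency |cor|**power for every gene pair (unsigned network)."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Hypothetical expression of three genes across four samples.
toy_expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [1, -1, 1, -1]}
adjacency = soft_threshold_adjacency(toy_expr)
```

In this toy example G1 and G2 are perfectly correlated and keep an adjacency of 1, while the weak correlation between G1 and G3 (about -0.45) shrinks to 0.008. Correlations estimated from few samples are noisy, which is one reason the sample-size recommendation above matters.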
Visualises the selected module as a gene co-expression network.
Plots module eigengenes (the first principal component of each module) across samples, helping you relate modules to experimental factors. Use this to spot modules that track with treatments, time points, or other covariates.
Shows the expression of genes within the selected module. The heatmap follows the colour palette chosen in Pre-Process, and the “Heatmap Data” download provides the underlying matrix.
Runs enrichment analysis for the genes in the module, pointing to pathways, GO terms, or transcription factor targets that may explain the co-expression pattern.
Diagnostic plots from WGCNA:
For background and tutorials, visit the iDEP network analysis guide.
If you state your general research area and how iDEP makes you more productive, we can use it as a support letter when we apply for the next round of funding. Hundreds of strong, enthusiastic letters sent to us in 2019 were essential when we applied for the current grant from NIH/NHGRI (R01HG010805), which expires in 20 months. Your letters will help sustain and improve this service.
iDEP is developed and maintained by a small team at South Dakota State University (SDSU). Our team consists of Xijin Ge (PI), Jianli Qi (research staff), and several talented students. None of us is trained as a software engineer, but we share a passion for developing a user-friendly tool for all biologists, especially those who do not have access to bioinformaticians.
Graduate students who contributed to this project include Eun Wo Son, Runan Yao, Roberto Villegas-Diaz, Eric Tulowetzke, Emma Spors, Chris Trettel, and Ben Derenge. Undergraduate students include Jenna Thorstenson and Jakob Fossen. Research staff include Jianli Qi and Gavin Doering. Much of the new version of iDEP was rewritten by Gavin Doering. The iDEP logo was designed by Emma Spors. Technical support is kindly provided by the Office of Information Technology (OIT) at SDSU. The mirror site is enabled by a JetStream2 allocation award (BIO210175), which is supported by NSF.
If you use iDEP, even just for preliminary analysis, please cite: Ge, Son & Yao, iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data, BMC Bioinformatics 19:1-24, 2018. Merely mentioning iDEP with a URL is insufficient because it is difficult to track. Consider citing other tools that form the foundation of iDEP, such as Ensembl, STRING-db, DESeq2, limma and many others. If you use the KEGG diagram, please also cite Pathview and KEGG.
According to Google Scholar, more than 900 papers cited iDEP, as of April 19, 2024. Our website has been accessed over 600,000 times by 120,000 users, spending 15 minutes each time. For every 1000 users, only 6 cited the iDEP paper, which is disappointingly low.
Source code is available on GitHub, which also includes instructions to install iDEP on your local machine using our database.
Download the reports from the individual tabs, which contain the parameters and results. Also download and try the R code on the Stats tab.
User-uploaded data files are saved in a temporary folder during your session and automatically deleted. Our group does not keep a copy of the uploaded data. We monitor web traffic using Google Analytics, which tells us your IP address (approximate location down to the city level) and how long you are on this site. Error messages are recorded by the Shiny server. By visiting this site, you agree to provide web activity data.
Please email Jenny gelabinfo@gmail.com (recommended) or Dr. Ge xijin.ge@sdstate.edu (unreliable). Follow us on Twitter for recent updates. File bug reports on GitHub.
iDEP is developed by a small team with limited resources. We have not thoroughly tested it. So please verify all findings using other tools or R scripts. We tried our best to ensure our analysis is correct, but there is no guarantee.
By offering so many combinations of methods to analyze a data set, iDEP enables users to rationalize. It is human nature to focus on results that we like to see (confirmation bias). Unfortunately, you can find support for almost any theory in the massive but noisy literature. We encourage users to be critical of the results obtained using iDEP. Try to focus on robust results, rather than those that only show up with a certain parameter using a particular method.
11/6/2025: iDEP 2.3.5. Fix bugs. Improve plots for big datasets. Make Prep report reproducible.
11/5/2025: iDEP 2.3.4. Fix bugs on labeling k-Means clusters.
11/4/2025: iDEP 2.3.2. Improve plots for large sample sets. Cap # of sample groups. Label pathway on k-Means heatmap.
10/29/2025: iDEP 2.3.0. Add Marker gene plots. Update reports for each tab. New video.
9/28/2025: iDEP 2.20. UI improvements. Add documentation for each tab.
9/25/2025: iDEP 2.11. UI improvements for species selection. Remove submit buttons from Clustering, Bicluster, and Network tabs.
9/5/2025: iDEP 2.10. UI improvements. Database update to Ensembl 133.
4/20/2024: Fix bug in network tab related to module download. Enable download of network as a CSV file.
4/19/2024: iDEP 2.01. Minor upgrade. Fixed a bug related to an insufficient # of colors in palettes. Optimized UI for load data. Reverted to basic Shiny theme due to an issue with the new version of the Shiny package.