5/1/2023: ShinyGO 0.80 released in testing mode. Thanks to Jenny's hard work, we updated to Ensembl release 107, which includes 620 species: 215 main, 177 metazoa, 124 plants, 33 protists and 1 bacteria. We also included 14,094 species from STRING-db 11.5.
Thank you to the 1% of users who sent us support letters! To support us going forward, cite our paper.
Jan. 19, 2023: Thanks to a user's feedback, we found a serious bug in ShinyGO 0.76. Because some genes are represented by multiple gene IDs in Ensembl, they were counted more than once when calculating enrichment. We believe this is fixed. If you pasted Ensembl gene IDs into ShinyGO 0.76 between April 4, 2022 and Jan. 19, 2023, please rerun your analysis. ShinyGO has not been thoroughly tested. Please always double-check your results with other tools such as g:Profiler, Enrichr, STRING-db, and DAVID.
If this server is busy, please use the mirror server http://ge-lab.org/go/ hosted by the NSF-funded JetStream2.
Email Jenny (gelabinfo@gmail.com) for questions, suggestions or data contributions. Follow Dr Ge on Twitter for updates.
Feb. 11, 2022: Like ShinyGO but your genome is not covered? Customized ShinyGO is now available. Its database includes several custom genomes requested by users. To request to add a new species/genome, fill in this Form.
A graphical tool for gene enrichment analysis
Just paste your gene list to get enriched GO terms and other pathways for over 420 plant and animal species, based on annotation from Ensembl, Ensembl Plants and Ensembl Metazoa. An additional 5000 genomes (including bacteria and fungi) are annotated based on STRING-db (v.11). In addition, it produces KEGG pathway diagrams with your genes highlighted, hierarchical clustering trees and networks summarizing overlapping terms/pathways, protein-protein interaction networks, gene characteristics plots, and enriched promoter motifs. See example outputs below:
All query genes are first converted to ENSEMBL gene IDs or STRING-db protein IDs. Our gene ID mapping and pathway data are mostly derived from these two sources. For the 20 most studied species, we also manually collected a large number of pathways from various public databases.
FDR is calculated based on the nominal P-value from the hypergeometric test. Fold Enrichment is defined as the percentage of genes in your list belonging to a pathway, divided by the corresponding percentage in the background. FDR tells us how likely the enrichment is by chance. Due to increased statistical power, large pathways tend to have smaller FDRs. As a measure of effect size, Fold Enrichment indicates how strongly the genes of a certain pathway are overrepresented. This is an important metric, even though it is often ignored.
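The two metrics can be illustrated with a minimal sketch in plain Python. It assumes the P-value is the upper tail of the hypergeometric distribution and that the FDR adjustment is Benjamini-Hochberg; the text above does not name the adjustment method, so that choice, like all function names here, is an assumption made for illustration and not ShinyGO's actual code.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes in a list of n genes,
    when K of the N background genes belong to the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def fold_enrichment(k, n, K, N):
    """Fraction of the list in the pathway divided by the fraction in the background."""
    return (k / n) / (K / N)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed FDR method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        fdr[i] = running_min
    return fdr

# Two hypothetical pathways with the same Fold Enrichment but different sizes:
# 200 query genes, 20,000 background genes.
p_small = hypergeom_pvalue(5, 200, 100, 20000)    # pathway of 100 genes, 5 hits
p_large = hypergeom_pvalue(50, 200, 1000, 20000)  # pathway of 1000 genes, 50 hits
print(fold_enrichment(5, 200, 100, 20000), fold_enrichment(50, 200, 1000, 20000))
print(p_small, p_large)
```

Both pathways have the same Fold Enrichment, yet the larger one gets a much smaller P-value, which is why the two metrics are best read together.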
We highly recommend that users upload a list of genes as the background. These could be all the genes that passed a low-expression filter in RNA-seq. If background genes are not uploaded, the default is to use all protein-coding genes. Alternatively, you can check the box next to 'Use pathway database for gene counts', which sets the background size to the total number of unique genes in the pathway database that you chose. As some pathway databases can be huge and contain genes that are not properly converted, we limit this total to between 5000 and 30,000. When this option is used, any genes in the user's original gene list that are not in the pathway database are also ignored.
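The background options described above could be sketched as follows. This is a hedged illustration, not ShinyGO's implementation: the function name and arguments are hypothetical, "limit the total number" is interpreted as clamping the count into the 5000 to 30,000 range, and the checkbox is assumed to take precedence over an uploaded background.

```python
def prepare_counts(query, pathways, protein_coding, user_background=None,
                   use_pathway_db=False, lower=5000, upper=30000):
    """Return (query genes used, background size) for the enrichment test.

    pathways: dict mapping pathway name -> iterable of gene IDs.
    """
    query = set(query)
    if use_pathway_db:
        # Background = unique genes in the chosen pathway database,
        # clamped to [lower, upper] (assumed interpretation of the limit).
        db_genes = set().union(*pathways.values())
        total = max(lower, min(upper, len(db_genes)))
        # Query genes absent from the pathway database are ignored.
        return query & db_genes, total
    # Otherwise use the uploaded background, or all protein-coding genes.
    background = set(user_background) if user_background else set(protein_coding)
    return query, len(background)

# Hypothetical toy data
toy_pathways = {"p1": {"a", "b"}, "p2": {"b", "c"}}
used, total = prepare_counts(["a", "x"], toy_pathways, protein_coding=["a", "b", "c", "x"],
                             use_pathway_db=True, lower=2, upper=10)
print(used, total)
```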
Only pathways that are within the specified size limits are used for enrichment analysis. After the analysis is done, pathways are first filtered based on a user-specified FDR cutoff. Then the significant pathways are sorted by FDR, Fold Enrichment, or other metrics. When 'Sort by average ranks(FDR & Fold)' is selected, significant pathways are sorted by the average of their ranks by FDR and by Fold Enrichment. With 'Select by FDR, sort by Fold Enrichment', the top pathways are first selected by FDR and then sorted by Fold Enrichment. When 'Remove redundancy' is selected, similar pathways sharing 95% of their genes are represented by the most significant pathway. Redundant pathways also need to share 50% of the words in their names. When 'Remove redundancy' is selected, long pathway names are also truncated to the first 80 characters.
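The sorting and redundancy rules above can be sketched in a few lines of Python. The record layout (dicts with `name`, `fdr`, `fold`, `genes`) and the function names are hypothetical. The text does not say how "sharing 95% of genes" or "50% of the words" is measured, so this sketch assumes the overlap is divided by the size of the smaller set.

```python
def sort_by_average_rank(results):
    """Sort by the mean of the FDR rank (ascending) and Fold Enrichment rank (descending)."""
    by_fdr = sorted(results, key=lambda r: r["fdr"])
    by_fold = sorted(results, key=lambda r: r["fold"], reverse=True)
    fdr_rank = {r["name"]: i for i, r in enumerate(by_fdr)}
    fold_rank = {r["name"]: i for i, r in enumerate(by_fold)}
    return sorted(results, key=lambda r: (fdr_rank[r["name"]] + fold_rank[r["name"]]) / 2)

def select_by_fdr_sort_by_fold(results, top_n):
    """Keep the top_n pathways by FDR, then order them by Fold Enrichment."""
    top = sorted(results, key=lambda r: r["fdr"])[:top_n]
    return sorted(top, key=lambda r: r["fold"], reverse=True)

def _share(a, b):
    """Overlap relative to the smaller set (assumed definition of 'sharing')."""
    return len(a & b) / max(1, min(len(a), len(b)))

def remove_redundant(results, gene_cut=0.95, word_cut=0.5):
    """Keep the most significant pathway among those that overlap in genes and name."""
    kept = []
    for r in sorted(results, key=lambda r: r["fdr"]):
        genes, words = set(r["genes"]), set(r["name"].lower().split())
        redundant = any(
            _share(genes, set(k["genes"])) >= gene_cut
            and _share(words, set(k["name"].lower().split())) >= word_cut
            for k in kept
        )
        if not redundant:
            kept.append(r)
    return kept

# Hypothetical enrichment results
demo = [
    {"name": "cell cycle process", "fdr": 1e-6, "fold": 4.0, "genes": range(1, 20)},
    {"name": "cell cycle", "fdr": 1e-8, "fold": 3.0, "genes": range(1, 21)},
    {"name": "DNA repair", "fdr": 1e-4, "fold": 6.0, "genes": range(100, 130)},
]
print([r["name"] for r in remove_redundant(demo)])
```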
A hierarchical clustering tree summarizes the correlation among significant pathways listed in the Enrichment tab. Pathways with many shared genes are clustered together. Bigger dots indicate more significant P-values. The width of the plot can be changed by adjusting the width of your browser window.
Edge cutoff:
Similar to the Tree tab, this interactive plot also shows the relationship between enriched pathways. Two pathways (nodes) are connected if they share 20% (default) or more genes. You can move the nodes by dragging them, zoom in and out by scrolling, and shift the entire network by clicking on an empty point and dragging. Darker nodes are more significantly enriched gene sets. Bigger nodes represent larger gene sets. Thicker edges represent more overlapped genes.
Please select KEGG from the pathway databases to conduct enrichment analysis first. Then you can visualize your genes on any of the significant pathways. KEGG diagrams are available only for some species.
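As a rough illustration of the edge rule, the sketch below connects two pathways when the fraction of shared genes reaches the cutoff. The text does not state the denominator, so this example assumes the overlap is measured against the smaller of the two gene sets; the function name and toy data are hypothetical.

```python
from itertools import combinations

def overlap_edges(pathways, cutoff=0.2):
    """Return (pathway_a, pathway_b, shared_fraction) for pairs at or above the cutoff.

    pathways: dict mapping pathway name -> set of gene IDs.
    """
    edges = []
    for (a, genes_a), (b, genes_b) in combinations(pathways.items(), 2):
        genes_a, genes_b = set(genes_a), set(genes_b)
        smaller = min(len(genes_a), len(genes_b))
        if smaller == 0:
            continue  # an empty gene set cannot share genes
        fraction = len(genes_a & genes_b) / smaller
        if fraction >= cutoff:
            edges.append((a, b, fraction))
    return edges

# Hypothetical enriched pathways
toy = {
    "A": {1, 2, 3, 4, 5},
    "B": {4, 5, 6, 7, 8, 9, 10, 11, 12, 13},
    "C": {20, 21},
}
print(overlap_edges(toy))
```

The shared fraction returned for each edge could then drive the edge thickness in a plot.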
Your genes are highlighted in red. Downloading pathway diagram from KEGG can take 3 minutes.
Your genes are grouped by functional categories defined by high-level GO terms.
The characteristics of your genes are compared with the rest in the genome. Chi-squared and Student's t-tests are run to see if your genes have special characteristics when compared with all the other genes or, if uploaded, a customized background.
The genes are represented by red dots. The purple lines indicate regions where these genes are statistically enriched, compared to the density of genes in the background. We scanned the genome with a sliding window. Each window is further divided into several equal-sized steps for sliding. Within each window we used the hypergeometric test to determine if your genes are significantly overrepresented. Essentially, the genes in each window define a gene set/pathway, and we carried out enrichment analysis. The chromosomes may be only partly shown, as we use the last gene's location to draw the line. Mouse over to see gene symbols. Zoom in on regions of interest.
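A minimal sketch of such a sliding-window scan on one chromosome is shown below. Window size, number of steps, and the use of the chromosome's own gene totals as the default population are assumptions for illustration; the function names are hypothetical and the real implementation may differ.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n query genes are drawn from N genes, K of which lie in the window."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)

def scan_windows(genes, window_size, steps, total_genes=None, total_query=None):
    """Slide a window along one chromosome and test each window for enrichment.

    genes: list of (position, is_query) tuples for background genes on the chromosome.
    Returns (start, end, query_in_window, genes_in_window, p_value) for windows
    containing at least one query gene.
    """
    if not genes:
        return []
    N = total_genes if total_genes is not None else len(genes)
    n = total_query if total_query is not None else sum(1 for _, q in genes if q)
    step = max(1, window_size // steps)  # each window is divided into equal steps
    last = max(pos for pos, _ in genes)  # scan up to the last gene's location
    results = []
    start = 0
    while start <= last:
        end = start + window_size
        in_window = [q for pos, q in genes if start <= pos < end]
        K, k = len(in_window), sum(1 for q in in_window if q)
        if k > 0:
            results.append((start, end, k, K, hypergeom_tail(k, n, K, N)))
        start += step
    return results

# Hypothetical chromosome: 100 genes spaced 1 kb apart, 5 clustered query genes.
toy_genes = [(i * 1000, 10 <= i <= 14) for i in range(100)]
hits = scan_windows(toy_genes, window_size=10000, steps=2)
for hit in hits:
    print(hit)
```

In practice the per-window P-values would still need multiple-testing correction before regions are called significant.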
The promoter sequences of your genes are compared with those of the other genes in the genome in terms of transcription factor (TF) binding motifs. "*Query gene" indicates a transcription factor coded by a gene included in your list.
Your genes are sent to the STRING-db website for enrichment analysis and retrieval of a protein-protein network. We try to match your species with the archaeal, bacterial, and eukaryotic species in the STRING server and send the genes. If it is running, please wait until it finishes. This can take 5 minutes, especially the first time, when ShinyGO downloads large annotation files.
ShinyGO is developed and maintained by a small team at South Dakota State University (SDSU). Our team consists of Xijin Ge (PI), Jianli Qi (research associate), and two talented graduate students (Emma Spors and Ben Derenge). None of us are trained as software engineers, but we share a passion for developing a user-friendly tool for all biologists, especially those who do not have access to bioinformaticians.
For feedback, email us, or file a bug report or feature request on our GitHub repository, where you can also find the source code. For details, please see our paper and a detailed demo. ShinyGO shares many functionalities and databases with iDEP.
Citation (just including the URL is not enough!):
Ge SX, Jung D & Yao R, Bioinformatics 36:2628–2629, 2020. If you use the KEGG diagram, please also cite the papers for pathview, and KEGG.
Previous versions are still functional:
ShinyGO V0.76, based on Ensembl Release 104 with revision, archived on September 2, 2022
ShinyGO V0.75, based on Ensembl Release 104 with revision, archived on April 4, 2022
ShinyGO V0.74, based on Ensembl Release 104, archived on Feb. 8, 2022
ShinyGO V0.65, based on Ensembl Release 103, archived on Oct. 15, 2021
ShinyGO V0.61, based on Ensembl Release 96, archived on May 23, 2020
ShinyGO V0.60, based on Ensembl Release version 96, archived on Nov 6, 2019
ShinyGO V0.51, based on Ensembl Release version 95, archived on May 20, 2019
ShinyGO V0.50, based on Ensembl Release version 92, archived on March 29, 2019
ShinyGO V0.41, based on Ensembl Release version 91, archived on July 11, 2018
Genomes based on STRING-db are marked as STRING-db. If the same genome is included in both Ensembl and STRING-db, users should use the Ensembl annotation, as it is more up to date and is supported in more functional modules.
Input:
A list of gene IDs, separated by tab, space, comma or newline characters. Ensembl gene IDs are used internally to identify genes. Other types of IDs will be mapped to Ensembl gene IDs using ID mapping information available in Ensembl BioMart.
Output:
Enriched GO terms and pathways: In addition to the enrichment table, a set of plots is produced. If the KEGG database is chosen, then enriched pathway diagrams are shown, with the user's genes highlighted, like this one below:
Many GO terms are related. Some are even redundant, like "cell cycle" and "cell cycle process". To visualize such relatedness in enrichment results, we use a hierarchical clustering tree and network. In this hierarchical clustering tree, related GO terms are grouped together based on how many genes they share. The size of the solid circle corresponds to the enrichment FDR.
In the network below, each node represents an enriched GO term. Related GO terms are connected by a line, whose thickness reflects the percentage of overlapping genes. The size of the node corresponds to the number of genes.
Through API access to STRING-db, we also retrieve a protein-protein interaction (PPI) network. In addition to a static network image, users can also get access to interactive graphics at the www.string-db.org web server.
ShinyGO also detects transcription factor (TF) binding motifs enriched in the promoters of user's genes.
Sources for human pathway databases:
| Type | Subtype/Database name | #GeneSets | Source |
| Gene Ontology | Biological Process (BP) | 15796 | Ensembl 92 |
|  | Cellular Component (CC) | 1916 | Ensembl 92 |
|  | Molecular Function (MF) | 4605 | Ensembl 92 |
| KEGG | KEGG | 327 | Release 86.1 |
| Curated | Biocarta | 249 | Whichgenes 1.5 |
|  | GeneSetDB.EHMN | 55 | GeneSetDB |
|  | Panther | 168 | 1.0.4 |
|  | HumanCyc | 240 | pathway Commons V9 |
|  | INOH | 576 | pathway Commons V9 |
|  | NetPath | 27 | pathway Commons V9 |
|  | PID | 223 | pathway Commons V9 |
|  | PSP | 327 | pathway Commons V9 |
|  | Recon X | 2339 | pathway Commons V9 |
|  | Reactome | 2010 | V64 |
|  | Wiki | 457 | 20180610 |
| TF.Target | CircuitsDB.TF | 829 | V2012 |
|  | ENCODE | 181 | V70.0 |
|  | Marbach2016 | 628 | regulatorycircuits Release 1.0 |
|  | RegNetwork.TF | 1400 | 7/1/2017 |
|  | TFacts | 428 | Feb. 2012 |
|  | tftargets.ITFP | 1926 | tftargets May,2017 |
|  | tftargets.Neph2012 | 16476 | tftargets May,2017 |
|  | tftargets.TRED | 131 | tftargets May,2017 |
|  | TRRUST | 793 | V2 |
| miRNA.Targets | CircuitsDB.miRNA | 140 | V. 2012 |
|  | GeneSetDB.MicroCosm | 44 | GeneSetDB |
|  | miRDB | 2588 | V 5.0 |
|  | miRTarBase | 2599 | V 7.0 |
|  | RegNetwork.miRNA | 618 | V. 2015 |
|  | TargetScan | 219 | V7.2 |
| MSigDB.Computational | Computational gene sets | 858 | MSigDB 6.1 |
| MSigDB.Curated | Literature | 3465 | MSigDB 6.1 |
| MSigDB.Hallmark | hallmark | 50 | MSigDB 6.1 |
| MSigDB.Immune | Immune system | 4872 | MSigDB 6.1 |
| MSigDB.Location | Cytogenetic band | 326 | MSigDB 6.1 |
| MSigDB.Motif | TF and miRNA Motifs | 836 | MSigDB 6.1 |
| MSigDB.Oncogenic | Oncogenic signatures | 189 | MSigDB 6.1 |
| PPI | BioGRID | 15542 | 3.4.160 |
|  | CORUM | 2178 | 02.07.2017 |
|  | BIND | 3807 | pathway Commons V9 |
|  | DIP | 2630 | pathway Commons V9 |
|  | HPRD | 7141 | pathway Commons V9 |
|  | IntAct | 11991 | pathway Commons V9 |
| Drug | GeneSetDB.MATADOR | 266 | GeneSetDB |
|  | GeneSetDB.SIDER | 473 | GeneSetDB |
|  | GeneSetDB.STITCH | 4616 | GeneSetDB |
|  | GeneSetDB.T3DB | 846 | GeneSetDB |
|  | SMPDB | 699 | pathway Commons V9 |
|  | CTD | 8758 | pathway Commons V9 |
|  | Drugbank | 2563 | pathway Commons V9 |
| Other | GeneSetDB.CancerGenes | 23 | GeneSetDB |
|  | GeneSetDB.MethCancerDB | 21 | GeneSetDB |
|  | GeneSetDB.MethyCancer | 54 | GeneSetDB |
|  | GeneSetDB.MPO | 3134 | GeneSetDB |
|  | HPO | 6785 | May,2018 |
| Total: |  | 140,438 |  |
Sources for mouse pathway databases:
| Type | Source | #Sets | Note |
| Co-expression | Literature | 8,742 | Differentially expressed genes from 2526 studies |
|  | MSigDB | 3,964 | Molecular Signature Database, v.6.0 |
|  | L2L | 248 | List of lists, v.2006.2 |
|  | CancerGenes* | 23 | Cancer gene lists |
|  | GeneSigDB | 494 | Gene Signature Database, R.4 |
| Gene Ontology | GO_BP | 11,943 | V2017.5 |
|  | GO_MF | 2,932 |  |
|  | GO_CC | 1,475 |  |
| Curated pathways | Biocarta* | 176 | Metabolic and signaling pathways |
|  | PANTHER | 151 | Ontology-based pathway database, v3.4.1 |
|  | WikiPathways* | 146 | Open platform for pathway curation |
|  | INOH* | 73 | Integrating network objects with hierarchies |
|  | NetPath* | 25 | Signal transduction pathways |
| Metabolic pathways | KEGG | 314 | Metabolic pathways, R.82.0 |
|  | EHMN* | 53 | Edinburgh human metabolic network |
|  | MouseCyc | 321 | Mouse Biochemical Pathways, v2013.7 |
| Drug related | CTD* | 910 | The Comparative Toxicogenomics Database |
|  | SIDER* | 460 | Side Effect Resource |
|  | MATADOR* | 248 | Manually Annotated Targets and Drugs Online Resource |
|  | DrugBank* | 136 | Open data drug and target database |
|  | SMPDB* | 74 | Small Molecule Pathway Database |
| miRNA Target Genes | miRDB | 1,912 | miRNA target prediction and annotations, v 5.0 |
|  | microRNA.org | 314 | Predicted miRNA targets, v.R2010 |
|  | Grimson et al. | 179 | Predicted miRNA targets. v.6.2 |
|  | TarBase | 84 | Experimentally validated miRNA targets, v.6.0 |
|  | miRTarBase | 775 | Experimentally validated miRNA targets, V6.1 |
|  | MicroCosm | 464 | Predicted targets |
|  | PicTar | 35 | Predicted miRNA sites, v. 2007.3 |
| TF Target Genes | TFactS* | 101 | Predicted TF targets |
|  | TRED | 99 | Confirmed TF target genes, v.2013.7 |
|  | CircuitsDB | 94 | Mixed miRNA/TF regulation, v. 2012 |
|  | TRANSFAC | 78 | Confirmed TF binding sites, v7.0 |
| Others | Location | 341 | Genomic location on chromosomes, v.2017 |
|  | HPO* | 1,518 | The human phenotype ontology |
|  | STITCH* | 3,929 | Interaction networks of chemicals and proteins |
|  | MPO* | 2,943 | Mammalian Phenotype Ontology |
|  | T3DB* | 722 | Database of common toxins and their targets |
|  | PID* | 193 | Pathway Interaction Database |
|  | MethyCancer* | 50 | Human DNA methylation and cancer |
|  | MethCancerDB* | 19 | Aberrant DNA methylation in human cancer |
|  | Total | 46,758 |  |
*Secondary data from GeneSetDB
Changes:
Oct 26, 2022: V. 0.76.3 Add hover text. Change plot styles. When users select "Sort by Fold Enrichment", the minimum pathway size is raised to 10 to filter out noise from tiny gene sets.
Sept 28, 2022: In ShinyGO 0.76.2, KEGG is now the default pathway database. More importantly, we reverted to 0.76 for default gene counting method, namely all protein-coding genes are used as the background by default. The new feature introduced in 0.76.1, which uses the pathway database to determine total number of genes in the background, can be turned on as an option ('Use pathway database for gene counts'). This is based on feedback from some users that when using smaller pathway databases, such as KEGG, the new method changes the P values substantially.
Sept 3, 2022: ShinyGO 0.76.1. In this small improvement, we improved how we count the number of genes for calculating P value. A gene must match at least one pathway in the selected pathway database. Otherwise this gene is ignored in the calculation of P values based on hypergeometric distribution. This applies to both query and background genes.
April 19, 2022: ShinyGO 0.76 released. Improved pathway filtering, pathway sorting, figure downloading. Version 0.75 is available here.
April 17, 2022: Added more flexibility for downloading figures in PDF, SVG and high-res PNG.
April 8, 2022: Added features to remove redundant pathways. Added a filter to remove extremely large or small pathways. Changed the interface to always show the KEGG tab.
Mar. 7, 2022: Fixed an R library issue that affected KEGG diagrams for some organisms.
Feb. 26, 2022: Fixed a bug regarding the Plot tab when background genes are used. Background genes were not correctly used to calculate the distributions of various gene characteristics. If these plots are important in your study, please re-analyze your genes.
Feb. 19, 2022: R was upgraded from 4.0.5 to 4.1.2. This solved the STRING API issues. Some Bioconductor packages were also upgraded.
Feb. 8, 2022: ShinyGO v0.75 officially released. Old versions are still available. See the last tab.
Nov. 15, 2021: Database update. ShinyGO v0.75 available in testing mode. It includes Ensembl database update, new species from Ensembl Fungi and Ensembl Protists, and STRINGdb (5090 species) update to 11.5.
Oct. 25, 2021: Interactive genome plot. Identification of genomic regions significantly enriched with user genes.
Oct. 23, 2021: Version 0.741. A fully customizable enrichment chart! Switch between bar, dot or lollipop plots. Detailed gene information with links on the Genes tab.
Oct. 15, 2021: Version 0.74. Database updated to Ensembl Release 104 and STRING v11. We now recommend the use of background genes in enrichment analysis. V.0.74 is much faster, even with a large set of background genes.