TY - JOUR
T1 - PANDORA
T2 - Analysis of protein and peptide sets through the hierarchical integration of annotations
AU - Rappoport, Nadav
AU - Fromer, Menachem
AU - Schweiger, Regev
AU - Linial, Michal
N1 - Funding Information: Prospects consortium (EU framework VII) and the BSF (grant number 2007219); Sudarsky Center for Computational Biology (to N.R., M.F. and R.S.). Funding for open access charge: Prospects consortium (EU framework VII) and the BSF (grant number 2007219).
PY - 2010/5/5
Y1 - 2010/5/5
N2 - Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise ~200 000 different annotation terms associated with ~3.2 million sequences from UniProtKB. Statistical enrichment based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis tests is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.
UR - http://www.scopus.com/inward/record.url?scp=77954267418&partnerID=8YFLogxK
U2 - 10.1093/nar/gkq320
DO - 10.1093/nar/gkq320
M3 - Article
C2 - 20444873
AN - SCOPUS:77954267418
SN - 0305-1048
VL - 38
SP - W84-W89
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - SUPPL. 2
M1 - gkq320
ER -