Abstract

The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics on data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update on the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the ongoing quality control efforts and outline our future plans.

INTRODUCTION

Mass spectrometry (MS)-based proteomics approaches are widely used in the life sciences. There are three main workflows, with bottom-up proteomics the most extensively used technique (also known as shot-gun proteomics) (1). In this experimental set up, the proteins to be analysed are enzymatically digested by a protease (most often trypsin) into potentially highly complex peptide mixtures, which are then subjected to fractionation by multidimensional liquid chromatography steps before they are measured in the mass spectrometer. Other main approaches are top-down, where intact proteins are measured (2), and targeted proteomics (such as Selected Reaction Monitoring, SRM), where the researcher tries to detect specific proteins in a given sample (3).
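
As a minimal illustration of the digestion step described above, the following sketch performs an in silico tryptic digest. It assumes the commonly cited rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P). The function name, the missed_cleavages parameter and the example sequence are hypothetical and are not part of PRIDE or its tools.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico tryptic digest.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]

    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R-P bond is not cleaved.
    print(tryptic_digest("MKTAYRPLLKGG"))
```

Real search engines apply further constraints (peptide length, mass range, modifications), so this should be read only as a sketch of why a single protein yields a mixture of peptides.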

The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database was originally set up in 2004 (4–8) to enable public data deposition in the MS proteomics field, and to support the experimental data described in publications during the manuscript review process. The main data types stored in PRIDE are protein and peptide identifications (IDs), including post-translational modifications (PTMs), quantitative values, the analysed mass spectra and the related technical and biological metadata. PRIDE supports bottom-up proteomics approaches, mainly tandem MS (MS/MS) data, but also Peptide Mass Fingerprinting datasets, and presents the data as originally analysed by the researchers, with several popular search engines/analysis workflows fully supported. Unlike other MS proteomics resources, such as PeptideAtlas (9) and the Global Proteome Machine Database (GPMDB) (10), no reprocessing of the data is performed because PRIDE aims to reflect the authors' own analysis of the experimental data. In fact, PRIDE has been the only generic resource of this kind since the National Center for Biotechnology Information (NCBI) repository Peptidome (11), its sibling resource in the USA, was discontinued in April 2011. Other MS data repositories, such as MaxQB (12), are more specialized [for an extensive review, see (13)] or are restricted to one particular analysis workflow.

For SRM data, the new PeptideAtlas SRM Experiment Library (PASSEL) (14) is the main available resource. At present, there is no widely used resource devoted to top-down proteomics approaches. In addition to the 'pure' MS proteomics resources, there are other databases that can present an extra layer of information on top of the MS experiments without storing the underlying mass spectra. Some recently developed databases of this kind are the Model Organism Protein Expression Database (MOPED) (15), PaxDB (16) (both of them focused on protein expression information) and neXtProt (17).

Several services have been developed by the PRIDE team, which are heavily used by external users but also by PRIDE itself, especially the 'Protein Identifier Cross-Reference' (PICR) service (a protein identifier mapping resource) (18) and the 'Ontology Lookup Service' (OLS) (to query, browse and navigate biomedical ontologies) (19). In addition, 'Database on Demand' is a service to generate tailored databases for performing proteomics searches (20). Importantly, to make the data submission process easier, several tools have also been made available to the proteomics community, such as the popular PRIDE Converter (21), PRIDE Inspector (22) and the new PRIDE Converter 2 (23). All of this software, including the PRIDE core and web modules (http://code.google.com/p/ebi-pride/), is developed in Java and is open source. PRIDE is a recommended submission site of key journals such as Proteomics, Molecular and Cellular Proteomics and Nature Biotechnology. Scientific journals and funding agencies alike are increasingly mandating public deposition of MS data to support the publication of related proteomics manuscripts.

In this manuscript, we summarize developments in the PRIDE database and associated tools since the previous Nucleic Acids Research (NAR) database update (8). We will also outline the PRIDE data deposition process, introduce the ProteomeXchange (PX) consortium and quality control (QC) efforts and highlight future developments.

DATA CONTENT IN PRIDE AND HOW TO ACCESS IT

There has been a substantial increase in the amount of data stored in PRIDE during the past years. By September 2012, PRIDE contained 25 853 MS-based proteomics experiments (compared with 9908 when the last NAR manuscript was submitted, in September 2009), around 11.1 million identified proteins (2.5 million in September 2009), 61.9 million identified peptides (11.5 million in September 2009) and 324 million spectra (50.3 million in September 2009). Note that these data holdings are absolute figures that do not distinguish between public and pre-publication data. At the time of writing, 66.7% (17 219) of the experiments were publicly available.
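
To put the growth figures above into perspective, the short sketch below computes the fold increase between September 2009 and September 2012 for each data type. The numbers are those quoted in the text; the dictionary and function names are purely illustrative.

```python
# Holdings quoted in the text: (September 2009, September 2012).
HOLDINGS = {
    "experiments": (9_908, 25_853),
    "identified proteins": (2_500_000, 11_100_000),
    "identified peptides": (11_500_000, 61_900_000),
    "spectra": (50_300_000, 324_000_000),
}


def fold_increase(before, after):
    """Ratio of the later count to the earlier count."""
    return after / before


if __name__ == "__main__":
    for name, (count_2009, count_2012) in HOLDINGS.items():
        print(f"{name}: {fold_increase(count_2009, count_2012):.1f}-fold")
```

With these inputs the number of experiments grew roughly 2.6-fold, whereas proteins, peptides and spectra grew between roughly 4.4-fold and 6.4-fold, which suggests that the average experiment in PRIDE has become larger.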

The complete set of data in PRIDE comprises 323 taxonomy identifiers (compared with 60 in September 2009), including human and many model organisms (Table 1). As was the case 3 years ago, animal species still provide the majority of the data. A total of 89 animal species are represented, contributing 62.2% and 61.9% of all protein and peptide IDs in PRIDE, respectively. Human continues to be the most represented species (28.1% and 37.2% of protein and peptide IDs, respectively). In fact, human and mouse alone account for almost as many IDs as all other species together: 44.2% and 49.4% of the protein and peptide IDs, respectively.

Table 1.

Data content in PRIDE split by taxonomic divisions

Group of organisms (number of species)    % Protein IDs    % Peptide IDs
Animals (89)                              62.2             61.9
Plants (46)                               19.3             14.8
Fungi (22)                                2.7              3.0
Bacteria (122)                            12.7             17.4
Others (44)                               3.1              2.9
Species
    Homo sapiens                          28.1             37.2
    Mus musculus                          16.1             12.2
    Zea mays                              9.5              2.2
    Arabidopsis thaliana                  6.8              10.1
    B. subtilis                           5.5              4.1
    Sus scrofa                            5.5              1.7
    Rattus norvegicus                     3.7              2.8
    Drosophila melanogaster               3.3              2.3
    Danio rerio                           1.7              0.9
    Puniceispirillum marinum              1.5              1.5
    E. coli                               1.2              1.7
    S. cerevisiae                         1.2              1.8

Only the top 12 species in terms of protein and peptide identifications are shown.
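
The percentages in Table 1 can be checked with a few lines of arithmetic. The sketch below, with illustrative variable and function names, confirms that the five groups of organisms add up to 100% and reproduces the combined human and mouse share quoted in the text.

```python
# (% protein IDs, % peptide IDs), copied from Table 1.
GROUPS = {
    "Animals": (62.2, 61.9),
    "Plants": (19.3, 14.8),
    "Fungi": (2.7, 3.0),
    "Bacteria": (12.7, 17.4),
    "Others": (3.1, 2.9),
}
SPECIES = {
    "Homo sapiens": (28.1, 37.2),
    "Mus musculus": (16.1, 12.2),
}


def combined_share(rows):
    """Sum protein and peptide percentages, rounded to one decimal place."""
    protein = sum(values[0] for values in rows.values())
    peptide = sum(values[1] for values in rows.values())
    return round(protein, 1), round(peptide, 1)


if __name__ == "__main__":
    print("All groups:", combined_share(GROUPS))
    print("Human + mouse:", combined_share(SPECIES))
```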

However, the relative proportion of other groups of organisms has increased, especially in the case of plants (46 taxonomy identifiers; 19.3% and 14.8% of protein and peptide IDs, respectively) and bacteria (12.7% and 17.4%, respectively). Bacteria are again by far the group of organisms with the highest number of taxonomy identifiers (122). Fungi are also represented (22 taxonomy identifiers; 2.7% and 3.0%, respectively). Apart from human, the most represented organisms in PRIDE are (in this order) mouse, maize, Arabidopsis, Bacillus subtilis, pig, rat, Drosophila, zebrafish, Puniceispirillum marinum, Escherichia coli and Saccharomyces cerevisiae (Table 1).

Table 2 includes the most abundant PTMs present in the database. Not surprisingly, the most often found PTM is oxidation (5.7 million modified sites), mainly due to the high amount of methionine oxidation, a modification that can be biologically relevant (24) but that, in most cases for MS proteomics experiments, is an artifact. Formylation is the second most abundant PTM (around 1.3 million sites), mainly owing to the data present from just one organism (maize). Phosphorylation comes in third place (around 1.1 million sites), with the highest proportion of data coming from human experiments. There is also a considerable amount of other PTMs such as dioxidation, deamidation, acetylation or dehydration, among others (Table 2).
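
The ranking described above can be reproduced from the 'Total' column of Table 2. The following sketch sorts the modification types by total number of modified sites; the dictionary and function names are illustrative only.

```python
# Total modified sites per modification type ('Total' column of Table 2).
PTM_TOTALS = {
    "Oxidation": 5_707_426,
    "Deamidation": 663_884,
    "Phosphorylation": 1_143_766,
    "Acetylation": 626_510,
    "Dioxidation": 1_123_206,
    "Deamination": 132_145,
    "Dehydration": 202_436,
    "Methylthio": 110_727,
    "Formylation": 1_297_285,
    "Monomethylation": 299_246,
}


def top_modifications(totals, n=3):
    """Return the n modification types with the most modified sites."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


if __name__ == "__main__":
    print(top_modifications(PTM_TOTALS))
```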

Table 2.

Protein modification content in PRIDE as a whole, and split by species (human and main model organisms represented in PRIDE)

Modification type    Total    Human    Mouse    Maize    Arabidopsis    Drosophila    E. coli    S. cerevisiae
Oxidation5 707 4262 291 925883 59925 907449 530240058 92197 803
Deamidation663 884333 09148 1203069648317 33510 859
Phosphorylation1 143 766741 619112 910171 53935 521239033 670
Acetylation626 510380 12422 662141910 614290135618 762
Dioxidation1 123 20631 7881625928 6140000
Deamination132 14587 01629 37446202622720
Dehydration202 43615 49232 863148 68603000
Methylthio110 72774 1770024250820
Formylation1 297 28552866291 289 6600000
Monomethylation299 246261 941480277000

This wealth of data can be accessed in different ways (Figure 1):

  • PRIDE web interface (http://www.ebi.ac.uk/pride). The home page was updated earlier in 2012, but no other major changes have been made to the current web interface. However, it is possible to access all experiments by launching the PRIDE Inspector tool using Java Web Start (see below).

  • PRIDE BioMart (http://www.ebi.ac.uk/pride/prideMart.do). The BioMart interface is useful for batch data retrieval (25). In the current version of the PRIDE BioMart (running on BioMart version 0.7), data integration with Reactome (26) has been extended (27), by enabling the link between phosphorylated proteins present in Reactome pathways and phosphorylated proteins detected by MS approaches stored in PRIDE. The PRIDE BioMart data can also be accessed using a Representational State Transfer web service, which is heavily used. In addition, it is also possible to access the PRIDE BioMart at www.biomart.org, together with many others. In that case, apart from Reactome, it is possible to perform common data searches involving PRIDE and other resources such as UniProt (28), InterPro (29), Ensembl (30) and the Catalogue of Somatic Mutations in Cancer (COSMIC) (31).

  • PRIDE FTP file server (ftp.ebi.ac.uk/pub/databases/pride/). At present, XML files corresponding to all public experiments in PRIDE can be downloaded in the mzData and PRIDE XML formats.

  • PRIDE Distributed Annotation System (DAS) server (http://www.ebi.ac.uk/pride-das/das/PrideDataSource/). A PRIDE DAS server (32) was set up following the new specification 1.6. This service is publicly available and can be accessed through DAS clients such as Dasty (33). The PRIDE DAS server has been designed for visualizing protein sequence and annotation, to display the identified peptides for the protein specified in the DAS request, together with the associated PTMs, and total peptide coverage.

  • A public PRIDE MySQL instance is now available and is used, for instance, by the PRIDE Inspector, making this tool the ideal way to access PRIDE data for most use cases (see next section).
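
Two of the access routes listed above lend themselves to scripted use. The sketch below is a minimal example that assumes the FTP host and directory and the DAS base URL quoted in the list are still reachable, which may no longer be the case. The 'features' command with a 'segment' parameter follows the general DAS convention; whether the PRIDE DAS server accepts a given accession is an assumption, and the function names are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlencode

# Locations quoted in the text; they may have changed since publication.
FTP_HOST = "ftp.ebi.ac.uk"
FTP_DIR = "pub/databases/pride/"
DAS_BASE = "http://www.ebi.ac.uk/pride-das/das/PrideDataSource/"


def das_features_url(accession):
    """Build a DAS 'features' request URL for one protein accession."""
    return DAS_BASE + "features?" + urlencode({"segment": accession})


def list_public_files(limit=10):
    """List the first `limit` entries of the public PRIDE FTP directory.

    Performs an anonymous FTP login, so it needs network access.
    """
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(FTP_DIR)
        return ftp.nlst()[:limit]


if __name__ == "__main__":
    # P12345 is a placeholder accession used only for illustration.
    print(das_features_url("P12345"))
```

Calling list_public_files() would return file names that can then be downloaded with the same FTP connection; it is left out of the demonstration because it requires a live server.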

Among the new datasets present in PRIDE, it is important to highlight that by July 2012, all data originally stored in the discontinued NCBI Peptidome (11) had been reannotated, converted into a PRIDE compatible format and made publicly available under experiment accessions 17900-18271.

Figure 1.

Summary of the ways the user can access and retrieve data from PRIDE. The web links to the existing PRIDE tools are also highlighted, including the PRIDE Converter 2 (needed for data submission).

PRIDE SUBMISSION PROCESS AND RELATED TOOLS

Submission tools

At the time of writing, submissions to PRIDE are performed using a publicly available XML data format called PRIDE XML, which is derived from mzData. The first step in a submission to PRIDE is therefore to generate PRIDE XML files. Several tools are available to make that process as straightforward as possible. The new PRIDE Converter 2 (23) is now the recommended submission tool. It can convert a variety of popular proteomics data formats (e.g. Mascot .dat, X!Tandem .xml, OMSSA .csv, Proteome Discoverer .msf, plus all the mass spectral formats, among others) into well-annotated PRIDE XML files.

PRIDE Converter 2 can be used in two modes: (i) a Graphical User Interface mode, suited for most users; and (ii) a Command Line interface mode that makes possible the integration of the conversion process in external pipelines. Batch conversion of files is supported in both modes. Importantly, quantification results for the most popular techniques and 2D gel spot information can now be integrated in PRIDE XML files with PRIDE Converter 2, by providing that information in a new Proteomics Standards Initiative (PSI) tab-delimited standard format called mzTab (http://code.google.com/p/mztab/). Detailed documentation for general users and developers is available at http://code.google.com/p/pride-converter-2/. The PRIDE Converter 2 framework also includes the PRIDE mzTab Generator, PRIDE XML Merger and PRIDE XML Filter (23). A mechanism to make a combined submission to PRIDE and IntAct (34), the molecular interactions resource at the European Bioinformatics Institute, is also present in the PRIDE Converter 2.
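
To give a feel for the tab-delimited mzTab format mentioned above, the sketch below groups the lines of a small mzTab-like text by their line prefix. It assumes the three-letter prefixes used by mzTab (for example MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for the peptide header and rows). The example content, column names and function name are made up for illustration, and the format was still under development at the time, so this is not a substitute for the official specification or the PRIDE mzTab Generator.

```python
from collections import defaultdict

# Hypothetical, heavily simplified mzTab-like content (tab-separated).
EXAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["MTD", "title", "Example experiment"],
        ["PRH", "accession", "description"],
        ["PRT", "P12345", "Example protein"],
        ["PEH", "sequence", "accession"],
        ["PEP", "AAAK", "P12345"],
    ]
)


def split_mztab_sections(text):
    """Group tab-delimited lines by their leading three-letter prefix."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines between sections
        prefix, *values = line.split("\t")
        sections[prefix].append(values)
    return dict(sections)


if __name__ == "__main__":
    for prefix, rows in split_mztab_sections(EXAMPLE).items():
        print(prefix, rows)
```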

The original PRIDE Converter tool (21), one of the main reasons behind the large increase in data contents in PRIDE, is no longer recommended as the main submission tool and will not be maintained any more. Apart from the tools provided by the PRIDE team, there are several existing third-party pipelines/tools that produce PRIDE XML files, such as ProteinLynx Global Server (Waters), hEIDI (http://biodev.extra.cea.fr/docs/heidi), OmicsHub Proteomics (Integromics), PeptideShaker (http://peptide-shaker.googlecode.com) and Proteios (35).

At the moment, we are implementing full support in PRIDE for the new PSI standard formats mzML v1.1 (for MS data) (36) and mzIdentML v1.1 (for protein and peptide IDs) (37), based on the Java libraries jmzML (38), jmzIdentML (39) and jmzReader (40). When this work is complete, data submissions in these formats will be natively supported, rather than having to be run through the PRIDE Converter 2. This will enable us to support in a much better way the reporting of protein inference (41) and ambiguity in modification position.

PRIDE Inspector

PRIDE Inspector is a popular tool introduced in 2011 (>4500 downloads by September 2012), which can be used to visualize and perform an initial quality assessment of the data submitted to PRIDE (22) (http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector). PRIDE Inspector provides different views on the available data, each focusing on a different aspect: experimental details (biological and technical metadata), protein, peptide, quantification values (if available) and summary charts. This last view is one of the major strengths of the tool because it makes it possible to perform an initial assessment of data quality, using a variety of simple charts that are generated automatically (22). Using PRIDE Inspector, proteomics researchers can examine their own data sets before the actual submission to PRIDE is performed, and journal editors and reviewers can perform a thorough review of submitted and private data at the pre-publication stage.

In addition, through the ‘Search PRIDE Database’ option, PRIDE Inspector can access data already in PRIDE for data mining purposes using the PRIDE public MySQL instance, which is updated regularly. Apart from being a stand-alone tool, as mentioned before, PRIDE Inspector can also be accessed using Java Web Start at the PRIDE web page.

PRIDE and the ProteomeXchange consortium

PRIDE is a founding member of the PX consortium (http://www.proteomexchange.org) (42). The members of the consortium, led by PRIDE and PeptideAtlas (9), are implementing a system to enable the automated and standardized sharing of MS-based proteomics data between the main existing MS proteomics repositories. In the first implementation of the data workflow, PRIDE acts as the initial submission point for MS/MS data. At the moment of writing, around 50 PX datasets have been submitted (see updated list of publicly available datasets at http://proteomecentral.proteomexchange.org). As a result of the PX efforts, PRIDE has started to accept raw data (in either binary or an open XML format) because it is a mandatory component of a PX submission. The files are accessible through FTP and are stored at the EBI raw file repository (43).

QUALITY CONTROL IN PRIDE

A major focus of PRIDE development in the past 2 years was to ensure at least a minimal annotation of experiments and to perform basic quality checks of the data submitted to PRIDE. The development of PRIDE Inspector was the key step forward in that direction, and many of its components have been used in the internal PRIDE submission pipeline. The automated pipeline allows the detection of clear errors in the submitted data, which are reported to the submitter and can then be corrected (44). Finally, the development of the PRIDE Converter 2 as the new submission tool has improved consistency at the level of experimental annotation.

In 2013, we will finalize and release a new resource called PRIDE-Q, as a quality-controlled subset of PRIDE, which can fulfil both a minimum level of annotation and Peptide Spectrum Match quality standards.

DISCUSSION

In the past 3 years, PRIDE has worked on two main tasks: (i) development of robust data submission pipelines (such as the development of the PRIDE Converter 2), including the initial implementation of the PX consortium data workflow, and the possibility to capture quantification information in a standardized way; and (ii) establishment of QC checks, including the development of PRIDE Inspector and an internal data submission pipeline, able to flag obvious errors that can be then communicated to the submitters.

However, many more future efforts will be needed in both directions. We are working on the development of the new PRIDE system (including a new database schema and web interface) that will fully support the PSI standards mzML and mzIdentML. This will be a gradual process, and support for these formats will be added sequentially to the PRIDE system (also as submission formats) and tools, while we will also keep supporting PRIDE XML.

On the other hand, the release of PRIDE-Q, envisioned to help non-expert proteomics biologists to ‘digest’ the potentially complex information coming from MS data, will happen in 2013. However, the quality requirements will need to be refined and will evolve dynamically over time.

It is worth highlighting that PRIDE has been used for research purposes in several recent studies involving the meta-analysis of combined data coming from very different proteomics experimental setups (45–47), the improvement and assessment of existing protein sequence databases (48,49) or the support of genomics-related findings (50,51). Some research has also been performed to demonstrate the usefulness of data in PRIDE to perform a posteriori QC of the stored data (52). We expect that this trend will continue to grow in the near future, and that PRIDE will continue on a trajectory from a publication-centric repository to an integrative resource for MS-based protein expression data. PRIDE will keep playing an important role for the community, also in the context of the nascent Human Proteome Project (53). We invite parties interested in PRIDE developments (including the associated software and tools) to follow the PRIDE Twitter account (@pride_ebi).

FUNDING

The PRIDE team is funded by the Wellcome Trust [WT085949MA]; EU FP7 grants ‘Sling’ [226073]; ‘ProteomeXchange’ [260558]; ‘PRIME-XS’ [262067]; ‘LipidomicNet’ [202272]; BBSRC grant ‘PRIDE Converter’ [reference BB/I024204/1] and EMBL core funding. Funding for open access charge: Wellcome Trust [WT085949MA].

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The PRIDE team would like to thank all data submitters for their contributions.

REFERENCES

1. Mallick P, Kuster B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 2010, vol. 28 (pg. 695-709).
2. Cui W, Rohrs HW, Gross ML. Top-down mass spectrometry: recent developments, applications and perspectives. Analyst 2011, vol. 136 (pg. 3854-3864).
3. Picotti P, Aebersold R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Methods 2012, vol. 9 (pg. 555-566).
4. Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R. PRIDE: the proteomics identifications database. Proteomics 2005, vol. 5 (pg. 3537-3545).
5. Jones P, Cote RG, Martens L, Quinn AF, Taylor CF, Derache W, Hermjakob H, Apweiler R. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 2006, vol. 34 (pg. D659-D663).
6. Jones P, Cote RG, Cho SY, Klie S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H. PRIDE: new developments and new datasets. Nucleic Acids Res. 2008, vol. 36 (pg. D878-D883).
7. Vizcaino JA, Côté R, Reisinger F, Foster J, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics 2009, vol. 9 (pg. 4276-4283).
8. Vizcaino JA, Foster JM, Martens L. Proteomics data repositories: providing a safe haven for your data and acting as a springboard for further research. J Proteomics 2010, vol. 73 (pg. 2136-2146).
9. Deutsch EW, Lam H, Aebersold R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008, vol. 9 (pg. 429-434).
10. Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 2004, vol. 3 (pg. 1234-1242).
11. Slotta DJ, Barrett T, Edgar R. NCBI Peptidome: a new public repository for mass spectrometry peptide identifications. Nat. Biotechnol. 2009, vol. 27 (pg. 600-601).
12. Schaab C, Geiger T, Stoehr G, Cox J, Mann M. Analysis of high accuracy, quantitative proteomics data in the MaxQB database. Mol. Cell Proteomics 2012, vol. 11 pg. M111.014068.
13. Mead JA, Bianco L, Bessant C. Recent developments in public proteomic MS repositories and pipelines. Proteomics 2009, vol. 9 (pg. 861-881).
14. Farrah T, Deutsch EW, Kreisberg R, Sun Z, Campbell DS, Mendoza L, Kusebauch U, Brusniak MY, Huttenhain R, Schiess R, et al. PASSEL: the PeptideAtlas SRMexperiment library. Proteomics 2012, vol. 12 (pg. 1170-1175).
15. Kolker E, Higdon R, Haynes W, Welch D, Broomall W, Lancet D, Stanberry L, Kolker N. MOPED: model organism protein expression database. Nucleic Acids Res. 2012, vol. 40 (pg. D1093-D1099).
16. Wang M, Weiss M, Simonovic M, Haertinger G, Schrimpf SP, Hengartner MO, von Mering C. PaxDb, a database of protein abundance averages across all three domains of life. Mol. Cell Proteomics 2012, vol. 11 (pg. 492-500).
17. Lane L, Argoud-Puy G, Britan A, Cusin I, Duek PD, Evalet O, Gateau A, Gaudet P, Gleizes A, Masselot A, et al. neXtProt: a knowledge platform for human proteins. Nucleic Acids Res. 2012, vol. 40 (pg. D76-D83).
18. Wein SP, Cote RG, Dumousseau M, Reisinger F, Hermjakob H, Vizcaino JA. Improvements in the protein identifier cross-reference service. Nucleic Acids Res. 2012, vol. 40 (pg. W276-W280).
19. Cote R, Reisinger F, Martens L, Barsnes H, Vizcaino JA, Hermjakob H. The Ontology Lookup Service: bigger and better. Nucleic Acids Res. 2010, vol. 38 (pg. W155-W160).
20. Reisinger F, Martens L. Database on demand - an online tool for the custom generation of FASTA formatted sequence databases. Proteomics 2009, vol. 9 (pg. 4421-4424).
21. Barsnes H, Vizcaino JA, Eidhammer I, Martens L. PRIDE Converter: making proteomics data-sharing easy. Nat. Biotechnol. 2009, vol. 27 (pg. 598-599).
22. Wang R, Fabregat A, Rios D, Ovelleiro D, Foster JM, Cote RG, Griss J, Csordas A, Perez-Riverol Y, Reisinger F, et al. PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat. Biotechnol. 2012, vol. 30 (pg. 135-137).
23. Cote RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HW, van Breukelen B, Heck AJ, Hulstaert N, Martens L, et al. The PRIDE Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium. Mol. Cell Proteomics 2012, vol. 11 (pg. 1682-1689).
24. Stadtman ER, Van Remmen H, Richardson A, Wehr NB, Levine RL. Methionine oxidation and aging. Biochim. Biophys. Acta 2005, vol. 1703 (pg. 135-140).
25. Zhang J, Haider S, Baran J, Cros A, Guberman JM, Hsu J, Liang Y, Yao L, Kasprzyk A. BioMart: a data federation framework for large collaborative projects. Database 2011, vol. 2011 pg. bar038.
26. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011, vol. 39 (pg. D691-D697).
27. Ndegwa N, Cote RG, Ovelleiro D, D'Eustachio P, Hermjakob H, Vizcaino JA, Croft D. Critical amino acid residues in proteins: a BioMart integration of Reactome protein annotations with PRIDE mass spectrometry data and COSMIC somatic mutations. Database (Oxford) 2011, vol. 2011 pg. bar047.
28. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, vol. 40 (pg. D71-D75).
29. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2012, vol. 40 (pg. D306-D312).
30. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2012. Nucleic Acids Res. 2012, vol. 40 (pg. D84-D90).
31. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2011, vol. 39 (pg. D945-D950).
32. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics 2001, vol. 2 pg. 7.
33. Villaveces JM, Jimenez RC, Garcia LJ, Salazar GA, Gel B, Mulder N, Martin M, Garcia A, Hermjakob H. Dasty3, a WEB framework for DAS. Bioinformatics 2011, vol. 27 (pg. 2616-2617).
34. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012, vol. 40 (pg. D841-D846).
35. Hakkinen J, Vincic G, Mansson O, Warell K, Levander F. The proteios software environment: an extensible multiuser platform for management and analysis of proteomics data. J. Proteome Res. 2009, vol. 8 (pg. 3037-3043).
36. Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, et al. mzML - a community standard for mass spectrometry data. Mol. Cell Proteomics 2011, vol. 10 pg. R110.000133.
37. Jones AR, Eisenacher M, Mayer G, Kohlbacher O, Siepen J, Hubbard SJ, Selley JN, Searle BC, Shofstahl J, Seymour SL, et al. The mzIdentML data standard for mass spectrometry-based proteomics results. Mol. Cell Proteomics 2012, vol. 11 pg. M111.014381.
38. Cote RG, Reisinger F, Martens L. jmzML, an open-source Java API for mzML, the PSI standard for MS data. Proteomics 2010, vol. 10 (pg. 1332-1335).
39. Reisinger F, Krishna R, Ghali F, Rios D, Hermjakob H, Vizcaino JA, Jones AR. jmzIdentML API: a Java interface to the mzIdentML standard for peptide and protein identification data. Proteomics 2012, vol. 12 (pg. 790-794).
40. Griss J, Reisinger F, Hermjakob H, Vizcaino JA. jmzReader: a Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats. Proteomics 2012, vol. 12 (pg. 795-798).
41. Nesvizhskii AI, Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell Proteomics 2005, vol. 4 (pg. 1419-1440).
42. Hermjakob H, Apweiler R. The Proteomics Identifications Database (PRIDE) and the ProteomExchange Consortium: making proteomics data accessible. Expert Rev. Proteomics 2006, vol. 3 (pg. 1-3).
43. Editorial. A home for raw proteomics data. Nat. Methods 2012, vol. 9 pg. 419.
44. Csordas A, Ovelleiro D, Wang R, Foster JM, Rios D, Vizcaino JA, Hermjakob H. PRIDE: quality control in a proteomics data repository. Database (Oxford) 2012, vol. 2012 pg. bas004.
45. Mueller M, Vizcaino JA, Jones P, Cote R, Thorneycroft D, Apweiler R, Hermjakob H, Martens L. Analysis of the experimental detection of central nervous system-related genes in human brain and cerebrospinal fluid datasets. Proteomics 2008, vol. 8 (pg. 1138-1148).
46. Klie S, Martens L, Vizcaino JA, Cote R, Jones P, Apweiler R, Hinneburg A, Hermjakob H. Analyzing large-scale proteomics projects with latent semantic indexing. J. Proteome Res. 2008, vol. 7 (pg. 182-191).
47. Gonnelli G, Hulstaert N, Degroeve S, Martens L. Towards a human proteomics atlas. Anal Bioanal Chem 2012, vol. 404 (pg. 1069-1077).
48. Griss J, Cote RG, Gerner C, Hermjakob H, Vizcaino JA. Published and perished? The influence of the searched protein database on the long-term storage of proteomics data. Mol. Cell Proteomics 2011, vol. 10 pg. M111.008490.
49. Griss J, Martin M, O'Donovan C, Apweiler R, Hermjakob H, Vizcaino JA. Consequences of the discontinuation of the International Protein Index (IPI) database and its substitution by the UniProtKB “complete proteome” sets. Proteomics 2011, vol. 11 (pg. 4434-4438).
50. Knowles DG, McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009, vol. 19 (pg. 1752-1759).
51. Panchin AY, Gelfand MS, Ramensky VE, Artamonova II. Asymmetric and non-uniform evolution of recently duplicated human genes. Biol. Direct. 2010, vol. 5 pg. 54.
52. Foster JM, Degroeve S, Gatto L, Visser M, Wang R, Griss J, Apweiler R, Martens L. A posteriori quality control for the curation and reuse of public proteomics data. Proteomics 2011, vol. 11 (pg. 2182-2194).
53. Paik YK, Omenn GS, Uhlen M, Hanash S, Marko-Varga G, Aebersold R, Bairoch A, Yamamoto T, Legrain P, Lee HJ, et al. Standard guidelines for the chromosome-centric human proteome project. J. Proteome Res. 2012, vol. 11 (pg. 2005-2013).

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
