This project is hosted at
We describe a robust and accurate statistical approach, based on the expectation maximization algorithm, for validating peptide identifications made by tandem mass spectrometry (MS/MS) and database searching. By employing database search scores, the number of tryptic termini, the number of missed cleavages, and other information, the method learns to distinguish correctly assigned from incorrectly assigned peptides in the data set and computes, for each peptide assignment to an MS/MS spectrum, a probability of being correct. We show that, using the probabilities computed from the model, one can achieve much higher sensitivity at any given error rate than with conventional filtering criteria. The method enables high-throughput analysis of proteomics data by eliminating the need to manually validate database search results. In addition, it can facilitate the benchmarking of various experimental procedures and serve as a common standard by which the results of different experimental groups can be compared. The software implementing this analysis, named PeptideProphet™, is available open source and free of charge. Currently the programs work with database search results obtained using SEQUEST or Mascot, but they could be adapted to work with other software tools, e.g., COMET or Sonar.
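The idea of learning correct and incorrect score distributions with expectation maximization can be illustrated with a minimal sketch. It is not the PeptideProphet implementation: it assumes a single one-dimensional score per peptide assignment and models both populations as Gaussians, whereas the method described above also uses the number of tryptic termini, missed cleavages, and other information. The names `fit_score_mixture` and `probability_correct` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation, with floors to avoid division by zero."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, max(math.sqrt(var), 1e-6)


def probability_correct(score, params):
    """Posterior probability (Bayes' rule) that an assignment with this score is correct."""
    prior = params["prior_correct"]
    c = prior * normal_pdf(score, *params["correct"])
    i = (1.0 - prior) * normal_pdf(score, *params["incorrect"])
    return c / (c + i) if c + i > 0.0 else 0.5


def fit_score_mixture(scores, n_iter=200):
    """Fit a two-Gaussian mixture (assumed shapes) to the scores with EM."""
    lo, hi = min(scores), max(scores)
    spread = max(hi - lo, 1e-6)
    # Crude starting point: incorrect assignments low, correct assignments high.
    params = {
        "prior_correct": 0.5,
        "correct": (lo + 0.75 * spread, spread / 4.0),
        "incorrect": (lo + 0.25 * spread, spread / 4.0),
    }
    for _ in range(n_iter):
        # E-step: posterior probability of being correct for every score.
        post = [probability_correct(s, params) for s in scores]
        # M-step: re-estimate the mixing proportion and both distributions.
        params = {
            "prior_correct": sum(post) / len(scores),
            "correct": weighted_mean_sd(scores, post),
            "incorrect": weighted_mean_sd(scores, [1.0 - p for p in post]),
        }
    return params


if __name__ == "__main__":
    # Synthetic scores only; these numbers are not real search results.
    random.seed(0)
    scores = [random.gauss(0.0, 1.0) for _ in range(700)]
    scores += [random.gauss(5.0, 1.0) for _ in range(300)]
    fitted = fit_score_mixture(scores)
    print(fitted["prior_correct"], probability_correct(6.0, fitted))
```

Because the model is fitted to the data set itself, no labeled training examples are needed; the mixing proportion and both distributions are estimated from the observed scores.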
Please see the PeptideProphet page on the SPC Tools wiki for more information, including download instructions.
Reference: "Empirical Statistical Model to Estimate the Accuracy of Peptide Identifications made by MS/MS and database search", by A. Keller, A.I. Nesvizhskii, E. Kolker, and R. Aebersold, Analytical Chemistry, 74, 5383-5392 (2002)
Since MS/MS spectra are produced by peptides, not proteins, an additional statistical model is needed to validate identifications at the protein level. We developed a model that takes as input the list of peptides assigned to MS/MS spectra and the corresponding probabilities that those peptide assignments are correct. Different peptide identifications corresponding to the same protein are combined to estimate the probability that the protein is present in the sample. This protein grouping information is then employed to adjust the individual peptide probabilities, making the approach more discriminative. We also address the problem that we call degeneracy, which occurs when one peptide corresponds to several different proteins. The software implementing the protein analysis is named ProteinProphet™.
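A minimal sketch of how peptide probabilities might be combined into protein probabilities follows. It is not the ProteinProphet implementation. It assumes that peptide identifications are independent, so a protein is present unless all of its peptide assignments are wrong, and it assumes a simple iterative scheme in which a degenerate peptide is apportioned among its candidate proteins in proportion to their current probabilities. The adjustment of peptide probabilities from grouping information is not shown. The function names and the example data are hypothetical.

```python
def protein_probability(peptide_probs, weights=None):
    """Probability that at least one (weighted) peptide assignment is correct."""
    if weights is None:
        weights = [1.0] * len(peptide_probs)
    p_all_wrong = 1.0
    for p, w in zip(peptide_probs, weights):
        p_all_wrong *= 1.0 - w * p
    return 1.0 - p_all_wrong


def resolve_degeneracy(peptide_to_proteins, peptide_probs, n_iter=50):
    """Apportion shared peptides among proteins (assumed scheme, see lead-in)."""
    proteins = sorted({q for qs in peptide_to_proteins.values() for q in qs})
    # Start by splitting every peptide equally among its candidate proteins.
    weights = {
        (pep, q): 1.0 / len(qs)
        for pep, qs in peptide_to_proteins.items()
        for q in qs
    }
    prot_prob = {}
    for _ in range(n_iter):
        # Recompute protein probabilities from the current weights.
        for q in proteins:
            peps = [pep for pep, qs in peptide_to_proteins.items() if q in qs]
            prot_prob[q] = protein_probability(
                [peptide_probs[pep] for pep in peps],
                [weights[(pep, q)] for pep in peps],
            )
        # Re-apportion each peptide in proportion to protein probabilities.
        for pep, qs in peptide_to_proteins.items():
            total = sum(prot_prob[q] for q in qs)
            for q in qs:
                weights[(pep, q)] = prot_prob[q] / total if total > 0 else 1.0 / len(qs)
    return prot_prob, weights


if __name__ == "__main__":
    # Hypothetical data: peptide "B" matches both proteins, "A" only P1.
    mapping = {"A": ["P1"], "B": ["P1", "P2"]}
    probs = {"A": 0.9, "B": 0.8}
    print(resolve_degeneracy(mapping, probs))
```

In this toy example the shared peptide ends up assigned almost entirely to the protein that also has independent evidence, which is the kind of behavior a treatment of degeneracy aims for.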
Please see the ProteinProphet page on the SPC Tools wiki for more information, including download instructions.
Reference: "A Statistical Model for Identifying Proteins by Tandem Mass Spectrometry", by A.I. Nesvizhskii, A. Keller, E. Kolker, and R. Aebersold, Analytical Chemistry, 75, 2279-2287 (2003)