Database searching in mass spectrometry based proteomics

Kertesz-Farkas A., Myers M. P.

Current Bioinformatics. 2012. Vol. 7. No. 2. P. 221–230.

Bottom-up proteomics, the mass spectrometric analysis of peptides obtained by proteolysis and separated by liquid chromatography (LC-MS/MS), is one of the most frequently used techniques for identifying and characterizing proteins in biological samples. A key element of the analysis is database searching, in which the mass spectra of the peptides are compared with a database of theoretically computed (or experimental) peptide spectra. Here we discuss the main computational approaches to spectrum database searching and the statistical analysis of the results.

Language: English

Keywords: mass spectrometry, proteomics, database search, protein identification

Blood Plasma Lipid Alterations Differentiating Psychotic and Affective Disorder Patients

Petrova D., Biomolecules 2025

Added: January 18, 2026

Fast and Memory-Efficient Searching of Large-Scale Mass Spectrometry Data Using Tide

Kertesz-Farkas A., Acquaye F. L., Ostapenko V. et al., Journal of Proteome Research 2025 Vol. 24 No. 9 P. 4831–4837

Over the past 30 years, software for searching tandem mass spectrometry data against a protein database has improved dramatically in speed and statistical power. However, existing tools can still struggle to analyze truly massive data sets when either the number of spectra or the number of proteins being analyzed grows too large. Here, we describe ...

Added: September 11, 2025

The Sphingolipid Asset Is Altered in the Nigrostriatal System of Mice Models of Parkinson’s Disease

Blokhin V., Shupik M., Gutner U. et al., Biomolecules 2022 Vol. 12 No. 1 Article 93

Parkinson’s disease (PD) is a neurodegenerative disease incurable due to late diagnosis and treatment. Therefore, one of the priorities of neurology is to study the mechanisms of PD pathogenesis at the preclinical and early clinical stages. Given the important role of sphingolipids in the pathogenesis of neurodegenerative diseases, we aimed to analyze the gene expression ...

Added: March 4, 2024

Proteomics-based scoring of cellular response to stimuli for improved characterization of signaling pathway activity

Kazakova E., Solovyeva E., Levitsky L. et al., Proteomics 2023 Vol. 53 No. 5 Article 2200275

Omics technologies focus on uncovering the complex nature of molecular mechanisms in cells and organisms, including biomarkers and drug targets discovery. Aiming at these tasks, we see that information extracted from omics data is still underused. In particular, characteristics of differentially regulated molecules can be combined in a single score to quantify the signaling pathway ...

Added: September 14, 2023

The Crux toolkit for analysis of bottom-up tandem mass spectrometry proteomics data

Kertesz-Farkas A., Acquaye F. L., Bhimani K. et al., Journal of Proteome Research 2023 Vol. 22 No. 2 P. 561–569

The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments ...

Added: December 2, 2022

An integrative proteomics method identifies a regulator of translation during stem cell maintenance and differentiation

Sabatier P., Beusch C., Maltseva D. et al., Nature Communications 2021 No. 12 Article 6558

Detailed characterization of cell type transitions is essential for cell biology in general and particularly for the development of stem cell-based therapies in regenerative medicine. To systematically study such transitions, we introduce a method that simultaneously measures protein expression and thermal stability changes in cells and provide the web-based visualization tool ProteoTracker. We apply our ...

Added: November 12, 2021

Bias in False Discovery Rate Estimation in Mass-Spectrometry-Based Peptide Identification

Danilova Y., Voronkova A., Sulimov P. et al., Journal of Proteome Research 2019 Vol. 18 No. 5 P. 2354–2358

Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given ...

Added: October 6, 2021

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics

Sulimov P., Kertesz-Farkas A., Journal of Proteome Research 2020 Vol. 19 No. 4 P. 1481–1490

Peptide-spectrum-match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods utilize exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally exhaustive. Here, ...

Added: June 29, 2020

ColocML: machine learning quantifies co-localization between mass spectrometry images

Ovchinnikova K., Lachlan S., Rakhlin A. et al., Bioinformatics 2020 P. 1–10

Motivation: Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development. Results: We ...

Added: March 15, 2020

A novel trityl/acridine derivatization agent for analysis of thiols by (matrix-assisted)(nanowire-assisted)laser desorption/ionization and electrospray ionization mass spectrometry

Korshun V. A., Analytical Methods 2017 Vol. 9 No. 45 P. 6335–6340

The derivatization reagent was prepared in situ by the reaction of tris(2,6-dimethoxyphenyl)methylium hexafluorophosphate with N-(2-aminoethyl)maleimide and used for the modification of a number of low molecular weight thiols. The adducts were analyzed by (MA)(NA) LDI MS and ESI MS. All registered mass spectra ((MA)(NA)LDI, ESI) revealed intense peaks of the cations of the derivatization products. The increment of the derivatization agent ...

Added: November 8, 2019

200+ Protein Concentrations in Healthy Human Blood Plasma: Targeted Quantitative SRM SIS Screening of Chromosomes 18, 13, Y, and the Mitochondrial Chromosome Encoded Proteome

Kopylov A., Ponomarenko E., Ilgisonis E. et al., Journal of Proteome Research 2019 Vol. 18 No. 1 P. 120–129

This work continues the series of the quantitative measurements of the proteins encoded by different chromosomes in the blood plasma of a healthy person. Selected Reaction Monitoring with Stable Isotope-labeled peptide Standards (SRM SIS) and a gene-centric approach, which is the basis for the implementation of the international Chromosome-centric Human Proteome Project (C-HPP), were applied ...

Added: October 7, 2019

Brain Proteome of Drosophila melanogaster Is Enriched with Nuclear Proteins

Kuznetsova K., Ivanov M., Pyatnitskiy M. et al., Biochemistry. Biokhimiia 2019 Vol. 84 No. 1 P. 71–78

The brain proteome of Drosophila melanogaster was characterized by liquid chromatography/high-resolution mass spectrometry and compared to the earlier characterized Drosophila whole-body and head proteomes. Raw data for all the proteomes were processed in a similar manner. Approximately 4000 proteins were identified in the brain proteome that represented, as expected, the subsets of the head and ...

Added: October 7, 2019

Post-translational modifications of FDA-approved plasma biomarkers in glioblastoma samples

Petushkova N., Zgoda V., Pyatnitskiy M. et al., PLOS ONE 2017 Vol. 12 No. 5 P. 0177427-1–0177427-21

Liquid chromatography-tandem mass spectrometry was used to analyze plasma proteins of volunteers (control) and patients with glioblastoma multiforme (GBM). A database search was pre-set with a variable post-translational modification (PTM): phosphorylation, acetylation or ubiquitination. There were no significant differences between the control and the GBM groups regarding the number of protein identifications, sequence coverage or number of PTMs. However, in GBM plasma, we unambiguously observed a decreased fraction in post-translationally modified peptides ...

Added: March 14, 2018

The Size of the Human Proteome: The Width and Depth

Ponomarenko E., Poverennaya E., Ilgisonis E. et al., International Journal of Analytical Chemistry 2016 P. 1–6

This work discusses bioinformatics and experimental approaches to explore the human proteome, a constellation of proteins expressed in different tissues and organs. As the human proteome is not a static entity, it seems necessary to estimate the number of different protein species (proteoforms) and measure the number of copies of the same protein in a ...

Added: March 14, 2018

Human aqueous humor proteome in cataract, glaucoma and pseudoexfoliation syndrome

Kliuchnikova A., Samokhina N., Ilina I. et al., Proteomics 2016 Vol. 16 No. 13 P. 1938–1946

Twenty-nine human aqueous humor samples from patients with eye diseases such as cataract and glaucoma with and without pseudoexfoliation syndrome were characterized by LC-high resolution MS analysis. In total, 269 protein groups were identified with 1% false discovery rate including 32 groups that were not reported previously for this biological fluid. Since the samples were analyzed individually, but not pooled, 36 proteins were identified ...

Added: March 14, 2018

Threonine versus isothreonine in synthetic peptides analyzed by high-resolution liquid chromatography/tandem mass spectrometry

Kuznetsova K., Trufanov P., Moysa A. et al., Rapid Communications in Mass Spectrometry 2016 Vol. 30 No. 11 P. 1323–1331

One of the problems in proteogenomic research aimed at identification of variant peptides is the presence of peptides with amino acid isomers of different origin in the analyzed samples. Among the most challenging examples are peptides with threonine and isothreonine (homoserine) in their sequences. Indeed, the latter residue may appear in vitro as a methionine substitution during sample preparation for shotgun proteome analysis. Yet, this substitution of ...

Added: March 14, 2018

FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry

Palmer A., Phapale P., Chernyavsky I. et al., Nature Methods 2017 No. 14 P. 57–60

High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for ...

Added: February 7, 2017

Crux: rapid open source protein tandem mass spectrometry analysis

Kertesz-Farkas A., Grant C. E., Howbert J. J. et al., Journal of Proteome Research 2014 Vol. 13 No. 10 P. 4488–4491

Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. ...

Added: November 18, 2015

Precursor mass dependent filtering of mass spectra for proteomics analysis

Kertesz-Farkas A., Myers M. P., Protein and peptide letters 2014 Vol. 21 No. 8 P. 858–863

Identification and elimination of noise peaks in mass spectra from large proteomics data streams simultaneously improves the accuracy of peptide identification and significantly decreases the size of the data. There are a number of peak filtering strategies that can achieve this goal. Here we present a simple algorithm wherein the number of highest intensity peaks ...

Added: November 18, 2015

Data preprocessing and filtering in mass spectrometry based proteomics

Kertesz-Farkas A., Myers M. P., Current Bioinformatics 2012 Vol. 7 No. 2 P. 212–220

Mass spectrometry based proteomics analysis can produce many thousands of spectra in a single experiment, and much of this data, frequently greater than 50%, cannot be properly evaluated computationally. Therefore a number of strategies have been developed to aid the processing of mass spectra and typically focus on the identification and elimination of noise, which ...

Added: November 18, 2015