Data preprocessing and filtering in mass spectrometry based proteomics

Kertesz-Farkas A., Myers M. P.



Current Bioinformatics. 2012. Vol. 7. No. 2. P. 212–220.

Mass spectrometry based proteomics analysis can produce many thousands of spectra in a single experiment, and much of this data, frequently more than 50%, cannot be properly evaluated computationally. A number of strategies have therefore been developed to aid the processing of mass spectra. They typically focus on identifying and eliminating noise, which can provide an immediate improvement in the analysis of large data streams. This processing is mostly carried out with proprietary software. Here we review the main principles currently underlying the preprocessing of mass spectrometry data and give an overview of the publicly available tools.
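One widely used noise-filtering heuristic is to keep only the N most intense peaks inside each fixed-width m/z window. This illustrates the kind of strategy the review surveys. The sketch below is a generic example, not the authors' method. The helper name `filter_top_n_per_window` and the defaults (100 Th windows, 6 peaks per window) are hypothetical illustrative choices.

```python
from collections import defaultdict
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_top_n_per_window(
    peaks: List[Peak], window: float = 100.0, n: int = 6
) -> List[Peak]:
    """Keep the n most intense peaks in each m/z window of the given width.

    Hypothetical helper: window and n are illustrative defaults only.
    """
    if window <= 0 or n < 0:
        raise ValueError("window must be positive and n non-negative")
    bins = defaultdict(list)
    for mz, intensity in peaks:
        # Assign each peak to a window by integer division of its m/z.
        bins[int(mz // window)].append((mz, intensity))
    kept: List[Peak] = []
    for bucket in bins.values():
        # Sort by intensity (descending) and keep the top n in this window.
        bucket.sort(key=lambda p: p[1], reverse=True)
        kept.extend(bucket[:n])
    # Return the surviving peaks in ascending m/z order.
    return sorted(kept)


if __name__ == "__main__":
    spectrum = [(110.0, 5.0), (120.0, 50.0), (130.0, 1.0), (250.0, 30.0)]
    print(filter_top_n_per_window(spectrum, window=100.0, n=2))
    # [(110.0, 5.0), (120.0, 50.0), (250.0, 30.0)]
```

Using windows rather than one global top-N cut keeps weak but informative peaks in sparse m/z regions. A single global cut would let a few very intense peaks dominate.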

Language: English


Keywords: mass spectrometry; proteomics; data filtering

Blood Plasma Lipid Alterations Differentiating Psychotic and Affective Disorder Patients

Petrova D., Biomolecules 2025

Added: January 18, 2026

The Sphingolipid Asset Is Altered in the Nigrostriatal System of Mice Models of Parkinson’s Disease

Blokhin V., Shupik M., Gutner U. et al., Biomolecules 2022 Vol. 12 No. 1 Article 93

Parkinson’s disease (PD) is a neurodegenerative disease that remains incurable because of late diagnosis and treatment. Therefore, one of the priorities of neurology is to study the mechanisms of PD pathogenesis at the preclinical and early clinical stages. Given the important role of sphingolipids in the pathogenesis of neurodegenerative diseases, we aimed to analyze the gene expression ...

Added: March 4, 2024

Proteomics-based scoring of cellular response to stimuli for improved characterization of signaling pathway activity

Kazakova E., Solovyeva E., Levitsky L. et al., Proteomics 2023 Vol. 53 No. 5 Article 2200275

Omics technologies focus on uncovering the complex nature of molecular mechanisms in cells and organisms, including biomarker and drug target discovery. For these tasks, information extracted from omics data is still underused. In particular, characteristics of differentially regulated molecules can be combined in a single score to quantify the signaling pathway ...

Added: September 14, 2023

An integrative proteomics method identifies a regulator of translation during stem cell maintenance and differentiation

Sabatier P., Beusch C., Maltseva D. et al., Nature Communications 2021 Vol. 12 Article 6558

Detailed characterization of cell type transitions is essential for cell biology in general and particularly for the development of stem cell-based therapies in regenerative medicine. To systematically study such transitions, we introduce a method that simultaneously measures protein expression and thermal stability changes in cells and provide the web-based visualization tool ProteoTracker. We apply our ...

Added: November 12, 2021

Bias in False Discovery Rate Estimation in Mass-Spectrometry-Based Peptide Identification

Danilova Y., Voronkova A., Sulimov P. et al., Journal of Proteome Research 2019 Vol. 18 No. 5 P. 2354–2358

Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given ...
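The target-decoy estimate discussed in this abstract works as follows. At a score threshold t, the FDR among accepted target matches is estimated as the number of decoy matches scoring at least t divided by the number of target matches scoring at least t. That ratio is only meaningful under the equal-likelihood assumption the authors question.

The sketch below implements that standard estimator together with monotone q-values. It is not the authors' code. The function name `target_decoy_qvalues` is hypothetical, and tied scores are not given any special treatment.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    matches: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) for each match, best score first.

    Estimator: FDR(t) = (#decoys scoring >= t) / (#targets scoring >= t),
    capped at 1. Hypothetical helper; ties in score are not handled specially.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    for score, is_decoy, q in target_decoy_qvalues(psms):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

If incorrect matches are more likely to land on targets than on decoys, the decoy count underestimates the number of false targets. Every FDR and q-value above is then biased downward. This is the failure mode the abstract describes.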

Added: October 6, 2021

Multidecadal and 6-year variations of LOD

Zotov L., Bizouard C., Sidorenkov N. et al., in: Journal of Physics: Conference Series (JPCS), Proceedings of the FAPM 2019 conference. IOP Publishing, 2020. Ch. 5 P. 1–17.

The subject of our study is 6-, 20- and 60-year oscillations in the length of day (LOD). Spectral and wavelet analyses of LOD time series have been performed, multidecadal harmonics have been adjusted and simple prediction has been made. Input from the variations of the angular momentum of the ocean and the atmosphere into LOD ...

Added: November 30, 2020

ColocML: machine learning quantifies co-localization between mass spectrometry images

Ovchinnikova K., Lachlan S., Rakhlin A. et al., Bioinformatics 2020 P. 1–10

Motivation: Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development. Results: We ...
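A simple baseline co-localization measure treats two ion images as intensity vectors and takes their cosine similarity. The sketch below shows this generic baseline only. It is not ColocML, and it is not a measure the paper is claimed to recommend. The function name `cosine_colocalization` is hypothetical, and images are assumed to be equally sized 2-D intensity grids.

```python
import math
from typing import Sequence


def cosine_colocalization(
    img_a: Sequence[Sequence[float]], img_b: Sequence[Sequence[float]]
) -> float:
    """Cosine similarity between two equally sized ion images (2-D grids).

    Hypothetical baseline measure; returns 0.0 if either image is all zeros.
    """
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("ion images must have the same shape")
    # Flatten both images pixel by pixel in the same row-major order.
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


if __name__ == "__main__":
    ion_1 = [[1.0, 0.0], [0.0, 2.0]]
    ion_2 = [[0.0, 3.0], [0.0, 0.0]]
    print(cosine_colocalization(ion_1, ion_1))  # close to 1.0
    print(cosine_colocalization(ion_1, ion_2))  # 0.0, no shared pixels
```

Cosine similarity ignores overall intensity scale, which suits ion images with very different abundances. It is still sensitive to noise and hotspots, which is one reason a systematic comparison of measures is useful.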

Added: March 15, 2020

A novel trityl/acridine derivatization agent for analysis of thiols by (matrix-assisted)(nanowire-assisted)laser desorption/ionization and electrospray ionization mass spectrometry

Vladimir A. Korshun, Analytical Methods 2017 Vol. 9 No. 45 P. 6335–6340

The derivatization reagent was prepared in situ by the reaction of tris(2,6-dimethoxyphenyl)methylium hexafluorophosphate with N-(2-aminoethyl)maleimide and used for the modification of a number of low molecular weight thiols. The adducts were analyzed by (MA)(NA)LDI MS and ESI MS. All registered mass spectra ((MA)(NA)LDI, ESI) revealed intense peaks of the cations of the derivatization products. The increment of the derivatization agent ...

Added: November 8, 2019

200+ Protein Concentrations in Healthy Human Blood Plasma: Targeted Quantitative SRM SIS Screening of Chromosomes 18, 13, Y, and the Mitochondrial Chromosome Encoded Proteome

Kopylov A., Ponomarenko E., Ilgisonis E. et al., Journal of Proteome Research 2019 Vol. 18 No. 1 P. 120–129

This work continues the series of the quantitative measurements of the proteins encoded by different chromosomes in the blood plasma of a healthy person. Selected Reaction Monitoring with Stable Isotope-labeled peptide Standards (SRM SIS) and a gene-centric approach, which is the basis for the implementation of the international Chromosome-centric Human Proteome Project (C-HPP), were applied ...

Added: October 7, 2019

Brain Proteome of Drosophila melanogaster Is Enriched with Nuclear Proteins

Kuznetsova K., Ivanov M., Pyatnitskiy M. et al., Biochemistry. Biokhimiia 2019 Vol. 84 No. 1 P. 71–78

The brain proteome of Drosophila melanogaster was characterized by liquid chromatography/high-resolution mass spectrometry and compared to the earlier characterized Drosophila whole-body and head proteomes. Raw data for all the proteomes were processed in a similar manner. Approximately 4000 proteins were identified in the brain proteome that represented, as expected, the subsets of the head and ...

Added: October 7, 2019

Post-translational modifications of FDA-approved plasma biomarkers in glioblastoma samples

Petushkova N., Zgoda V., Pyatnitskiy M. et al., Plos One 2017 Vol. 12 No. 5 P. 0177427-1–0177427-21

Liquid chromatography-tandem mass spectrometry was used to analyze plasma proteins of volunteers (control) and patients with glioblastoma multiforme (GBM). A database search was pre-set with a variable post-translational modification (PTM): phosphorylation, acetylation or ubiquitination. There were no significant differences between the control and the GBM groups regarding the number of protein identifications, sequence coverage or number of PTMs. However, in GBM plasma, we unambiguously observed a decreased fraction of post-translationally modified peptides ...

Added: March 14, 2018

The Size of the Human Proteome: The Width and Depth

Ponomarenko E., Poverennaya E., Ilgisonis E. et al., International Journal of Analytical Chemistry 2016 P. 1–6

This work discusses bioinformatics and experimental approaches to explore the human proteome, a constellation of proteins expressed in different tissues and organs. As the human proteome is not a static entity, it seems necessary to estimate the number of different protein species (proteoforms) and measure the number of copies of the same protein in a ...

Added: March 14, 2018

Human aqueous humor proteome in cataract, glaucoma and pseudoexfoliation syndrome

Kliuchnikova A., Samokhina N., Ilina I. et al., Proteomics 2016 Vol. 16 No. 13 P. 1938–1946

Twenty-nine human aqueous humor samples from patients with eye diseases such as cataract and glaucoma with and without pseudoexfoliation syndrome were characterized by LC-high resolution MS analysis. In total, 269 protein groups were identified at 1% false discovery rate, including 32 groups that had not been reported previously for this biological fluid. Since the samples were analyzed individually rather than pooled, 36 proteins were identified ...

Added: March 14, 2018

Threonine versus isothreonine in synthetic peptides analyzed by high-resolution liquid chromatography/tandem mass spectrometry

Kuznetsova K., Trufanov P., Moysa A. et al., Rapid Communications in Mass Spectrometry 2016 Vol. 30 No. 11 P. 1323–1331

One of the problems in proteogenomic research aimed at identification of variant peptides is the presence of peptides with amino acid isomers of different origin in the analyzed samples. Among the most challenging examples are peptides with threonine and isothreonine (homoserine) in their sequences. Indeed, the latter residue may appear in vitro as a methionine substitution during sample preparation for shotgun proteome analysis. Yet, this substitution of ...

Added: March 14, 2018

Inconsistencies in bond market quotes: is it the wrong model or the wrong data?

Lapshin V. A., Journal of Computational Science 2018 Vol. 24 P. 255–265

We use the linear programming approach to quantify quote inconsistencies in risk-free bond markets. We present an algorithm to identify whether an inconsistency is probably due to the insufficient framework flexibility, the insufficient data quality, or the non-homogeneity of the dataset. In the latter case we study the problem of filtering out some instruments so ...

Added: May 26, 2017

FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry

Palmer A., Phapale P., Chernyavsky I. et al., Nature Methods 2017 Vol. 14 P. 57–60

High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for ...

Added: February 7, 2017

Database searching in mass spectrometry based proteomics

Kertesz-Farkas A., Myers M. P., Current Bioinformatics 2012 Vol. 7 No. 2 P. 221–230

Bottom-up proteomics (mass spectrometry analysis of peptides obtained by proteolysis and separated by liquid chromatography, LC-MS/MS) is one of the most frequently used techniques for identifying and characterizing proteins in biological samples. A key element of the analysis is database searching, in which the mass spectra of the peptides are compared with a database of theoretically ...
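In database searching, experimental spectra are compared against theoretical fragment ions computed from candidate peptide sequences. The sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It assumes an unmodified peptide, no neutral losses and charge 1+ only. The function name `by_ions` is hypothetical. Real search engines use richer fragmentation and scoring models than this.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def by_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged (b_ions, y_ions) m/z lists for an unmodified peptide.

    b_i covers the first i residues; y_i covers the last i residues.
    Hypothetical helper: no modifications, neutral losses or higher charges.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions: List[float] = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)
    y_ions: List[float] = []
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = by_ions("PEPTIDE")
    print([round(x, 4) for x in b])
    print([round(x, 4) for x in y])
```

A search engine would then match these theoretical m/z values against the (filtered) experimental peaks within a mass tolerance and score the overlap.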

Added: November 18, 2015

Crux: rapid open source protein tandem mass spectrometry analysis

Kertesz-Farkas A., Grant C. E., Howbert J. J. et al., Journal of Proteome Research 2014 Vol. 13 No. 10 P. 4488–4491

Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. ...

Added: November 18, 2015

Precursor mass dependent filtering of mass spectra for proteomics analysis

Kertesz-Farkas A., Myers M. P., Protein and Peptide Letters 2014 Vol. 21 No. 8 P. 858–863

Identification and elimination of noise peaks in mass spectra from large proteomics data streams simultaneously improves the accuracy of peptide identification and significantly decreases the size of the data. There are a number of peak filtering strategies that can achieve this goal. Here we present a simple algorithm wherein the number of highest intensity peaks ...

Added: November 18, 2015

Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics

Kertesz-Farkas A., Keich U., Noble W., Journal of Proteome Research 2015 Vol. 14 No. 8 P. 3148–3161

Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to the identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation ...

Added: November 18, 2015