Single amino acid variation identification in high-resolution tandem mass spectrometry data in bottom-up proteomics
Database search-based precursor ion identification in tandem mass spectrometry data analysis is limited by its search space. If a single amino acid variation (SAAV) or a modification is not included in the search space, the corresponding observed spectra will not be annotated correctly. Several methods have been developed to identify and localize post-translational modifications (PTMs); however, few methods have been introduced to identify peptide sequences with amino acid mutations. Here, we present our approach to detecting SAAVs, called SeVa (short for Sequence Variation). SeVa is based on the High-Resolution Exact P-Value (HR-XPV) method (doi:10.1002/pmic.202300145), which builds an exact empirical null distribution by implicitly scoring each spectrum against all possible amino acid sequences in high-resolution fragmentation settings. From HR-XPV, SeVa extracts the amino acid sequence that produces the highest score. The peptides identified by SeVa are then subjected to a homology search against a proteome database that also contains shuffled decoy protein sequences. This step increases the sensitivity of the results, and the decoy identifications can be used to estimate the false discovery rate (FDR). We tested SeVa on two experimental datasets, one related to immunopeptidomics (PXD017407) and one to cancer (PDC000224), and our method identified 781 and 15,764 peptide sequences with mutations at FDRs of 1.68% and 0.52%, respectively.
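The homology search and decoy-based FDR estimate described above can be illustrated with a minimal sketch. It is not the SeVa implementation: the function names `find_single_substitution` and `estimate_fdr`, the exact-one-mismatch window scan, the rule that a peptide matching a target protein is never counted as a decoy hit, and the ratio decoy hits / target hits are all assumptions made for illustration, and the sequences are toy data.

```python
from typing import Iterable, Optional, Tuple


def find_single_substitution(peptide: str, protein: str) -> Optional[Tuple[int, int]]:
    """Return (protein_offset, peptide_position) for the first window of
    `protein` that differs from `peptide` at exactly one residue, else None.
    Exact matches (zero mismatches) are deliberately not reported."""
    n = len(peptide)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        mismatches = [i for i in range(n) if window[i] != peptide[i]]
        if len(mismatches) == 1:
            return start, mismatches[0]
    return None


def estimate_fdr(
    peptides: Iterable[str],
    targets: Iterable[str],
    decoys: Iterable[str],
) -> Tuple[int, int, float]:
    """Count peptides with a single-substitution hit in target or decoy
    proteins and return (target_hits, decoy_hits, decoy_hits / target_hits).
    Assumption: target matches take precedence over decoy matches."""
    targets = list(targets)
    decoys = list(decoys)
    target_hits = 0
    decoy_hits = 0
    for pep in peptides:
        if any(find_single_substitution(pep, t) is not None for t in targets):
            target_hits += 1
        elif any(find_single_substitution(pep, d) is not None for d in decoys):
            decoy_hits += 1
    fdr = decoy_hits / target_hits if target_hits else 0.0
    return target_hits, decoy_hits, fdr


# Toy data (hypothetical): one target protein and a reversed-style decoy.
target_db = ["MKTAYIAKQRQISFVK"]
decoy_db = ["KVFSIQRQKAIYATKM"]
candidates = ["TAYVAKQR", "TAYIAKQK", "SIQRQRAI"]

hits = estimate_fdr(candidates, target_db, decoy_db)
```

In this toy run, the first two candidates each differ from the target protein at one residue, while the third matches only the decoy, so the estimate is one decoy hit over two target hits. A real pipeline would typically use a dedicated homology search tool and would need to treat isobaric residues such as I and L separately.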