Apart from a big bunch of bug-fixes and tweaks to the functions, there are a couple of new functions related to psms and spectra information.
MS2 fragmentation spectra are sort of hidden in a separate raw column in the MassSpectrumItems table in .pdResult files. It turned out that the data is actually xml data in a compressed (zip) format.
The following functions deal with this type raw data:
Name | Remark |
---|---|
transformSpectrumRaw | takes raw spectrum data and ‘translates’ it into an xml object for further processing. All other functions work with the XML object coming from transformSpectrumRaw |
spectrum.header | |
spectrum.scanEvent | |
spectrum.centroid | The only format which seems to be present in these spectra is centroided |
spectrum.profile | As I have not encountered any profile data yet, this function always returns NA |
spectrum.precursor.header | |
spectrum.precursor.scanEvent | |
spectrum.precursor.info | |
spectrum.precursor.centroid | The only format which seems to be present in these spectra is centroided |
spectrum.precursor.profile | As I have not encountered any profile data yet, this function always returns NA |
spectrum.precursor.additionalInfo |
Please note that the transformSpectrumRaw function writes a temporary file and then deletes it again due to having to decompress the data.
library(proteinDiscover)
library(dplyr)
library(stringr)
db <- dbOpen(fileName = "hela001.pdResult")
Get the 5 proteins with the highest (summed) Sequest score:
best5 <- dbGetProteinTable(db,
columns = c("Accession", "Description",
"ProteinGroupIDs", "ScoreSequestHT")) %>%
dfTransformRaws() %>%
arrange(desc(ScoreSequestHT_1)) %>%
slice(1:5) %>%
dplyr::mutate(
Description = substr(
str_replace_all(Description, pattern = "\\[.*\\]", replacement = ""), 1, 30))
best5 %>%
tibble()
## # A tibble: 5 × 4
## Accession Description ProteinGroupIDs ScoreSequestHT_1
## <chr> <chr> <chr> <dbl>
## 1 P60709 "Actin, cytoplasmic 1 OS=Homo s" 2218 1278.
## 2 P06733 "Alpha-enolase OS=Homo sapiens " 3518 1200.
## 3 P21333 "Filamin-A OS=Homo sapiens OX=9" 1179 983.
## 4 O75369 "Filamin-B OS=Homo sapiens OX=9" 3093 932.
## 5 Q09666 "Neuroblast differentiation-ass" 2062 834.
Then get the best peptides for the first protein
peptide5 <- dbGetPeptideTable(db = db, proteinGroupIDs = 2218) %>%
dfTransformRaws() %>%
dplyr::select(PeptideGroupID, Sequence, Modifications, PsmCount, XCorrbySearchEngine_1) %>%
dplyr::arrange(desc(XCorrbySearchEngine_1)) %>%
slice(1:5)
peptide5 %>%
tibble()
## # A tibble: 5 × 5
## PeptideGroupID Sequence Modifications PsmCount XCorrbySearchEngine_1
## <int> <chr> <chr> <int> <dbl>
## 1 68421 TTGIVMDSGDGVTHTVP… "" 24 8.39
## 2 7425 CPEALFQPSFLGMESCG… "Carbamidome… 70 8.19
## 3 10695 DLYANTVLSGGTTMYPG… "" 69 7.29
## 4 10697 DLYANTVLSGGTTMYPG… "1×Oxidation… 30 7.23
## 5 7427 CPEALFQPSFLGMESCG… "Carbamidome… 33 6.27
Now retrieve the PSM’s for the first peptide
psms5 <- dbGetPsmTable(db = db, peptideGroupIDs = 10695) %>%
dplyr::select(PeptideID, Charge, MassOverCharge, XCorr) %>%
dplyr::arrange(desc(XCorr)) %>%
slice(1:5)
psms5
## PeptideID Charge MassOverCharge XCorr
## 1 616594 2 1108.0387 7.286522
## 2 610318 2 1108.0387 6.880840
## 3 613318 2 1108.0392 6.832521
## 4 617110 3 739.0286 6.653973
## 5 619826 2 1108.0388 6.643941
with the PeptideID we can now retrieve information on the psms spectra
MSnInfo <- dbGetMSnSpectrumInfo(db = db, peptideID = psms5$PeptideID)
MSnInfo %>%
dplyr::select(RetentionTime, MasterScanNumbers, ScanNumbers, PercentIsolationInterference)
## RetentionTime MasterScanNumbers ScanNumbers PercentIsolationInterference
## 1 75.61137 43595 43645 43.630718
## 2 75.93290 43852 43890 3.204555
## 3 76.27213 44110 44147 0.000000
## 4 76.32849 44146 44187 3.603348
## 5 76.60362 44352 44394 15.745718
MSnInfo %>%
dplyr::select(IonInjectTime, MassOverCharge, Charge, OriginalPrecursorMass, Intensity)
## IonInjectTime MassOverCharge Charge OriginalPrecursorMass Intensity
## 1 35.000 1108.0387 2 1108.0387 1323099
## 2 1.543 1108.0392 2 1108.0392 339010432
## 3 30.875 1108.0387 2 1108.0387 20638660
## 4 35.000 739.0286 3 739.0286 5674726
## 5 35.000 1108.0388 2 1108.0388 8470223
…and we can get the Mass Spectrum Info:
spectrumInfo <- dbGetMassSpectrumItems(db = db, peptideID = psms5$PeptideID)
spectrumInfo %>%
dplyr::select(RetentionTime, ResolutionAtMass200, IsolationWidth, Spectrum)
## RetentionTime ResolutionAtMass200 IsolationWidth Spectrum
## 1 75.61137 30000 1.6 blob[12.26 kB]
## 2 75.93290 30000 1.6 blob[9.64 kB]
## 3 76.27213 30000 1.6 blob[11.81 kB]
## 4 76.32849 30000 1.6 blob[11.77 kB]
## 5 76.60362 30000 1.6 blob[10.07 kB]
Most interesting in this last table is obviously the spectrum object which is in a raw (blob) format. With the transformSpectrumRaw function we can translate this into useful information:
theSpectrum <- transformSpectrumRaw(spectrumInfo$Spectrum[[2]])
class(theSpectrum)
## [1] "list"
names(theSpectrum)
## [1] "Header" "ScanEvent" "PrecursorInfo" "PeakCentroids"
## [5] "ProfilePoints"
And with the helper functions we can get the info out of this somewhat cumbersome list object
spectrum.header(theSpectrum) %>%
dplyr::select(DataType, RetentionTime, BasePeakPosition, BasePeakIntensity, TotalIntensity)
## DataType RetentionTime BasePeakPosition BasePeakIntensity TotalIntensity
## 1 Centroid 75.93290145335 628.341003417969 31201228 464254629.15625
And of course the spectrum (used for identification) too:
centroided <- spectrum.centroid(theSpectrum)
centroided %>%
dplyr::slice(1:10)
## mz intensity charge resolution signalToNoise
## 1 156.1659 84623.61 0 26800 2.2
## 2 157.1081 113200.70 0 27800 2.9
## 3 158.0924 259970.20 0 37600 6.6
## 4 158.9321 75278.96 0 28300 1.9
## 5 163.4241 66772.50 0 23200 1.7
## 6 173.1283 203509.00 3 26904 4.8
## 7 173.4685 1646527.00 0 41700 38.9
## 8 175.1187 5782023.00 0 37900 135.5
## 9 176.1223 183542.80 0 27400 4.3
## 10 183.1125 219083.90 0 33900 4.9
Which we can then plot!
(the plotSpectrum function comes from the MS-Analysis repo)
plotSpectrum(centroided, centroidPlot = T, labels = NA)
With the aid of the MassSpectrometryR package we can calculate the y- and b-series of ions (single charged)
library(massSpectrometryR)
peptide$new(sequence = peptide5$Sequence[2])$fragments()
## # A tibble: 28 × 13
## aions aionsH2O aionsNH3 bions bionsH2O bionsNH3 bionsH2Oadd cions xions
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 76.0 74.0 59.0 104. 86.0 87.0 122. 121. 173.
## 2 173. 155. 156. 201. 183. 184. 219. 218. 304.
## 3 302. 284. 285. 330. 312. 313. 348. 347. 417.
## 4 373. 355. 356. 401. 383. 384. 419. 418. 504.
## 5 486. 468. 469. 514. 496. 497. 532. 531. 618.
## 6 633. 615. 616. 661. 643. 644. 679. 678. 765.
## 7 761. 743. 744. 789. 771. 772. 807. 806. 866.
## 8 858. 840. 841. 886. 868. 869. 904. 903. 967.
## 9 945. 927. 928. 973. 955. 956. 991. 990. 1096.
## 10 1093. 1075. 1075. 1121. 1103. 1103. 1139. 1138. 1234.
## # ℹ 18 more rows
## # ℹ 4 more variables: yions <dbl>, yionsH2O <dbl>, yionsNH3 <dbl>, zions <dbl>
yFragments <- data.frame(
yion = createIonSeries(theSeries = 1:21),
mzy = peptide$new(sequence = peptide5$Sequence[3])$fragments()$yions,
bion = createIonSeries(whichSeries = "b",theSeries = 1:21, whichCharge = 1),
mzb = peptide$new(sequence = peptide5$Sequence[3])$fragments(chargeState = 1)$bions)
yFragments %>%
dplyr::slice(5:0)
## yion mzy bion mzb
## 1 y+_5 531.2885 b+_5 577.2617
## 2 y+_4 474.2671 b+_4 463.2187
## 3 y+_3 361.1830 b+_3 392.1816
## 4 y+_2 290.1459 b+_2 229.1183
## 5 y+_1 175.1190 b+_1 116.0342
Using the spectrumAnnotation routines we can overlay (existing) ions as a series, similair to how most
theOverlay <- list()
theOverlay[[1]] <- spectrumAnnotation$new(annotationName = "y 1+",
mzs = yFragments$mzy,
labels = yFragments$yion)
theOverlay[[2]] <- spectrumAnnotation$new(annotationName = "b 1+",
mzs = yFragments$mzb,
labels = yFragments$bion)
theOverlay[[1]]$check(spectrum = centroided)
theOverlay[[2]]$check(spectrum = centroided)
theOverlay[[2]]$color <- "blue"
# give the second label series a 'ramp'
theOverlay[[2]]$labelWhere <- 0.7 + (0.005 * 1:(length(theOverlay[[2]]$mz)))
plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay[1])
plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay)
With the aid of the plotly library you can make the plots interactive to be zoom in on ions/masses, though at the loss of some functionality
# turning of the 'ramp': doesn't work when ggplotly is used
theOverlay[[2]]$labelWhere <- 0.8
library(plotly)
ggplotly(plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay))
dbClose(db)