Apart from a big bunch of bug-fixes and tweaks to the functions, there are a couple of new functions related to psms and spectra information.
MS2 fragmentation spectra are sort of hidden in a separate raw column in the MassSpectrumItems table in .pdResult files. It turned out that the data is actually xml data in a compressed (zip) format.
The following functions deal with this type raw data:
| Name | Remark | 
|---|---|
| transformSpectrumRaw | takes raw spectrum data and ‘translates’ it into an xml object for further processing. All other functions work with the XML object coming from transformSpectrumRaw | 
| spectrum.header | |
| spectrum.scanEvent | |
| spectrum.centroid | The only format which seems to be present in these spectra is centroided | 
| spectrum.profile | As I have not encountered any profile data yet, this function always returns NA | 
| spectrum.precursor.header | |
| spectrum.precursor.scanEvent | |
| spectrum.precursor.info | |
| spectrum.precursor.centroid | The only format which seems to be present in these spectra is centroided | 
| spectrum.precursor.profile | As I have not encountered any profile data yet, this function always returns NA | 
| spectrum.precursor.additionalInfo | 
Please note that the transformSpectrumRaw function writes a temporary file and then deletes it again due to having to decompress the data.
library(proteinDiscover)
library(dplyr)
library(stringr)
db <- dbOpen(fileName = "hela001.pdResult")
Get the 5 proteins with the highest (summed) Sequest score:
best5 <- dbGetProteinTable(db,
                           columns = c("Accession", "Description",
                                       "ProteinGroupIDs", "ScoreSequestHT")) %>%
  dfTransformRaws() %>%
  arrange(desc(ScoreSequestHT_1)) %>%
  slice(1:5) %>%
  dplyr::mutate(
    Description = substr(
      str_replace_all(Description, pattern = "\\[.*\\]", replacement = ""), 1, 30))
best5 %>%
  tibble()
## # A tibble: 5 × 4
##   Accession Description                      ProteinGroupIDs ScoreSequestHT_1
##   <chr>     <chr>                            <chr>                      <dbl>
## 1 P60709    "Actin, cytoplasmic 1 OS=Homo s" 2218                       1278.
## 2 P06733    "Alpha-enolase OS=Homo sapiens " 3518                       1200.
## 3 P21333    "Filamin-A OS=Homo sapiens OX=9" 1179                        983.
## 4 O75369    "Filamin-B OS=Homo sapiens OX=9" 3093                        932.
## 5 Q09666    "Neuroblast differentiation-ass" 2062                        834.
Then get the best peptides for the first protein
peptide5 <- dbGetPeptideTable(db = db, proteinGroupIDs = 2218) %>%
  dfTransformRaws() %>%
  dplyr::select(PeptideGroupID, Sequence, Modifications, PsmCount, XCorrbySearchEngine_1) %>%
  dplyr::arrange(desc(XCorrbySearchEngine_1)) %>%
  slice(1:5)
peptide5 %>%
  tibble()
## # A tibble: 5 × 5
##   PeptideGroupID Sequence           Modifications PsmCount XCorrbySearchEngine_1
##            <int> <chr>              <chr>            <int>                 <dbl>
## 1          68421 TTGIVMDSGDGVTHTVP… ""                  24                  8.39
## 2           7425 CPEALFQPSFLGMESCG… "Carbamidome…       70                  8.19
## 3          10695 DLYANTVLSGGTTMYPG… ""                  69                  7.29
## 4          10697 DLYANTVLSGGTTMYPG… "1×Oxidation…       30                  7.23
## 5           7427 CPEALFQPSFLGMESCG… "Carbamidome…       33                  6.27
Now retrieve the PSM’s for the first peptide
psms5 <- dbGetPsmTable(db = db, peptideGroupIDs = 10695) %>%
  dplyr::select(PeptideID, Charge, MassOverCharge, XCorr) %>%
  dplyr::arrange(desc(XCorr)) %>%
  slice(1:5)
psms5
##   PeptideID Charge MassOverCharge    XCorr
## 1    616594      2      1108.0387 7.286522
## 2    610318      2      1108.0387 6.880840
## 3    613318      2      1108.0392 6.832521
## 4    617110      3       739.0286 6.653973
## 5    619826      2      1108.0388 6.643941
with the PeptideID we can now retrieve information on the psms spectra
MSnInfo <- dbGetMSnSpectrumInfo(db = db, peptideID = psms5$PeptideID)
MSnInfo %>%
  dplyr::select(RetentionTime, MasterScanNumbers, ScanNumbers, PercentIsolationInterference)
##   RetentionTime MasterScanNumbers ScanNumbers PercentIsolationInterference
## 1      75.61137             43595       43645                    43.630718
## 2      75.93290             43852       43890                     3.204555
## 3      76.27213             44110       44147                     0.000000
## 4      76.32849             44146       44187                     3.603348
## 5      76.60362             44352       44394                    15.745718
MSnInfo %>%
  dplyr::select(IonInjectTime, MassOverCharge, Charge, OriginalPrecursorMass, Intensity)
##   IonInjectTime MassOverCharge Charge OriginalPrecursorMass Intensity
## 1        35.000      1108.0387      2             1108.0387   1323099
## 2         1.543      1108.0392      2             1108.0392 339010432
## 3        30.875      1108.0387      2             1108.0387  20638660
## 4        35.000       739.0286      3              739.0286   5674726
## 5        35.000      1108.0388      2             1108.0388   8470223
…and we can get the Mass Spectrum Info:
spectrumInfo <-  dbGetMassSpectrumItems(db = db, peptideID = psms5$PeptideID)
spectrumInfo %>% 
  dplyr::select(RetentionTime, ResolutionAtMass200, IsolationWidth, Spectrum)
##   RetentionTime ResolutionAtMass200 IsolationWidth       Spectrum
## 1      75.61137               30000            1.6 blob[12.26 kB]
## 2      75.93290               30000            1.6  blob[9.64 kB]
## 3      76.27213               30000            1.6 blob[11.81 kB]
## 4      76.32849               30000            1.6 blob[11.77 kB]
## 5      76.60362               30000            1.6 blob[10.07 kB]
Most interesting in this last table is obviously the spectrum object which is in a raw (blob) format. With the transformSpectrumRaw function we can translate this into useful information:
theSpectrum <- transformSpectrumRaw(spectrumInfo$Spectrum[[2]])
class(theSpectrum)
## [1] "list"
names(theSpectrum)
## [1] "Header"        "ScanEvent"     "PrecursorInfo" "PeakCentroids"
## [5] "ProfilePoints"
And with the helper functions we can get the info out of this somewhat cumbersome list object
spectrum.header(theSpectrum) %>%
  dplyr::select(DataType, RetentionTime, BasePeakPosition, BasePeakIntensity, TotalIntensity)
##   DataType  RetentionTime BasePeakPosition BasePeakIntensity  TotalIntensity
## 1 Centroid 75.93290145335 628.341003417969          31201228 464254629.15625
And of course the spectrum (used for identification) too:
centroided <- spectrum.centroid(theSpectrum)
centroided %>%
  dplyr::slice(1:10)
##          mz  intensity charge resolution signalToNoise
## 1  156.1659   84623.61      0      26800           2.2
## 2  157.1081  113200.70      0      27800           2.9
## 3  158.0924  259970.20      0      37600           6.6
## 4  158.9321   75278.96      0      28300           1.9
## 5  163.4241   66772.50      0      23200           1.7
## 6  173.1283  203509.00      3      26904           4.8
## 7  173.4685 1646527.00      0      41700          38.9
## 8  175.1187 5782023.00      0      37900         135.5
## 9  176.1223  183542.80      0      27400           4.3
## 10 183.1125  219083.90      0      33900           4.9
Which we can then plot!
(the plotSpectrum function comes from the MS-Analysis repo)
plotSpectrum(centroided, centroidPlot = T, labels = NA)
With the aid of the MassSpectrometryR package we can calculate the y- and b-series of ions (single charged)
library(massSpectrometryR)
peptide$new(sequence = peptide5$Sequence[2])$fragments()
## # A tibble: 28 × 13
##     aions aionsH2O aionsNH3 bions bionsH2O bionsNH3 bionsH2Oadd cions xions
##     <dbl>    <dbl>    <dbl> <dbl>    <dbl>    <dbl>       <dbl> <dbl> <dbl>
##  1   76.0     74.0     59.0  104.     86.0     87.0        122.  121.  173.
##  2  173.     155.     156.   201.    183.     184.         219.  218.  304.
##  3  302.     284.     285.   330.    312.     313.         348.  347.  417.
##  4  373.     355.     356.   401.    383.     384.         419.  418.  504.
##  5  486.     468.     469.   514.    496.     497.         532.  531.  618.
##  6  633.     615.     616.   661.    643.     644.         679.  678.  765.
##  7  761.     743.     744.   789.    771.     772.         807.  806.  866.
##  8  858.     840.     841.   886.    868.     869.         904.  903.  967.
##  9  945.     927.     928.   973.    955.     956.         991.  990. 1096.
## 10 1093.    1075.    1075.  1121.   1103.    1103.        1139. 1138. 1234.
## # ℹ 18 more rows
## # ℹ 4 more variables: yions <dbl>, yionsH2O <dbl>, yionsNH3 <dbl>, zions <dbl>
yFragments <- data.frame(
  yion = createIonSeries(theSeries = 1:21),
  mzy = peptide$new(sequence = peptide5$Sequence[3])$fragments()$yions,
  bion = createIonSeries(whichSeries = "b",theSeries = 1:21, whichCharge = 1),
  mzb = peptide$new(sequence = peptide5$Sequence[3])$fragments(chargeState = 1)$bions)
yFragments %>%
  dplyr::slice(5:0)
##   yion      mzy bion      mzb
## 1 y+_5 531.2885 b+_5 577.2617
## 2 y+_4 474.2671 b+_4 463.2187
## 3 y+_3 361.1830 b+_3 392.1816
## 4 y+_2 290.1459 b+_2 229.1183
## 5 y+_1 175.1190 b+_1 116.0342
Using the spectrumAnnotation routines we can overlay (existing) ions as a series, similair to how most
theOverlay <- list()
theOverlay[[1]] <- spectrumAnnotation$new(annotationName = "y 1+",
                                          mzs = yFragments$mzy,
                                          labels = yFragments$yion)
theOverlay[[2]] <- spectrumAnnotation$new(annotationName = "b 1+",
                                          mzs = yFragments$mzb,
                                          labels = yFragments$bion)
theOverlay[[1]]$check(spectrum = centroided)
theOverlay[[2]]$check(spectrum = centroided)
theOverlay[[2]]$color <- "blue"
# give the second label series a 'ramp'
theOverlay[[2]]$labelWhere <- 0.7 + (0.005 * 1:(length(theOverlay[[2]]$mz)))
plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay[1])
plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay)
With the aid of the plotly library you can make the plots interactive to be zoom in on ions/masses, though at the loss of some functionality
# turning of the 'ramp': doesn't work when ggplotly is used
theOverlay[[2]]$labelWhere <- 0.8
library(plotly)
ggplotly(plotSpectrum(centroided, centroidPlot = T, annotateMz = theOverlay))
dbClose(db)