massSpectrometryR package

Introduction

Aim of the package

The aim for this package is to make doing calculations and making graphs with/for mass spectrometry results more easy. This first iteration (version 0.1.3) lays the foundation by providing a framework and is mainly focussed on mass & m/z calculations.

Most of the calculations have been thoroughly checked, but errors/bugs are still possible. Most of the element/formula calculations have been compared to:

commercially available software
tools on websites
- Protein Prospector
- enviPat Web
manual calculations

Why another possibility to calculate masses and predict (peptide fragement) formulas?

I often end up having to use 2-3 different tools side by side because none of them really do everything
online calculators are somewhat frowned upon when working in industry for various reasons
- external/online tools should be used with caution when it comes to proprietary data
- it will be difficult to evaluate these tools in an environment where reproducibility is needed
software tools of mass spectrometer firms are usually vendor specific, so usually you cannot work with data from one firm in the software of another. Of course some calculators can be used (manually) between vendors, but there are usually some smaller or bigger differences which make
It’s not really possible to use the calculators from within other applications, such as R (but also Excel, Word, etc). This makes it that there is always a some copy/paste actions going on, which don’t make for reproducible workflows.
Also automation of calculations on and data manipulation of mass spectrometry data is usually limited and nearly without exception vendor specific
Many of the graphs in vendor specific data are difficult to manipulate (eg zooming in/out precisely, alignment of different graphs, etc etc)

Note: I already created a ton of functions/procedures that deal with a lot of these issues, but they are not yet in this package. Over time I will put more and more of them in.

Installation instructions

Since the current version is still at the development stage, the package can only be installed via github:

devtools::install_github("BenBruyneel/massSpectrometryR")

Another possibility is to download it from github and then install it as a local package.

General use

Basics

As per usual the package first needs to be loaded

library(massSpectrometryR)

A lot of the calculations are centered around formulas which are (essentially) a groups of elements bound together. Glucose for example is C₆H₁₂O₆. The package defines every formula as a named numeric vector.

glucose <- c(C=6, H=12, O=6)
glucose

##  C  H  O 
##  6 12  6

You can also define one of the elements as being zero or even negative. While a negative number of atoms of a certain kind may seem strange, it is not meant for actual compounds, but for calculations where formulas are added or subtracted from one another

Alanine:

alanine <- c(C=3, H=7, N=1, O=2, S=0)
alanine

## C H N O S 
## 3 7 1 2 0

You can also get alanine by retrieving it the formula from the function aminoAcidResidues and then adding H₂O to it. The aminoacidResidues function generates a R6 chemicals object that contains info on all ‘natural’ amino acids in the form of residues (as that is how they are present in peptides & proteins)

aminoAcidResidues()

## # A tibble: 20 × 4
##    letter name          short formula  
##    <chr>  <chr>         <chr> <list>   
##  1 A      Alanine       Ala   <dbl [5]>
##  2 R      Arginine      Arg   <dbl [5]>
##  3 N      Asparagine    Asn   <dbl [5]>
##  4 D      Aspartate     Asp   <dbl [5]>
##  5 E      Glutamate     Glu   <dbl [5]>
##  6 Q      Glutamine     Gln   <dbl [5]>
##  7 G      Glycine       Gly   <dbl [5]>
##  8 H      Histidine     His   <dbl [5]>
##  9 I      Isoleucine    Ile   <dbl [5]>
## 10 L      Leucine       Leu   <dbl [5]>
## 11 K      Lysine        Lys   <dbl [5]>
## 12 M      Methionine    Met   <dbl [5]>
## 13 F      Phenylalanine Phe   <dbl [5]>
## 14 P      Proline       Pro   <dbl [5]>
## 15 S      Serine        Ser   <dbl [5]>
## 16 T      Threonine     Thr   <dbl [5]>
## 17 W      Tryptophane   Trp   <dbl [5]>
## 18 Y      Tyrosine      Tyr   <dbl [5]>
## 19 V      Valine        Val   <dbl [5]>
## 20 C      Cysteine      Cys   <dbl [5]>

aminoAcidResidues()$getFormula("alanine")

## C H N O S 
## 3 5 1 1 0

addFormulas(aminoAcidResidues()$getFormula("alanine"), c(H=2, O=1))

## C H N O S 
## 3 7 1 2 0

Another way to do addition of molecules is the operator %f+%

aminoAcidResidues()$getFormula("alanine") %f+% waterFormula()

## C H N O S 
## 3 7 1 2 0

Note: the waterFormula function simply generates the formula c(H=2, O=1)

Of course there is also an subtractFormulas (and matching operator: %f-%)

identical(aminoAcidResidues()$getFormula("alanine"), (alanine %f-% waterFormula()))

## [1] TRUE

The purpose of the R6 class chemicals is to create as a ‘storage space’ for info on amino acid residues (available via the function aminoAcidResidues), but can be used for other (classes of) compounds too, eg sugars, steroids, etc etc

All mass and mass/charge calculations are done using R6 class elements objects. Two are predefined and accessible via the elementsMonoisotpic and elementsAverage functions.

Please note that neither contains all elements, I just put in the ones I regularly use myself. If needed simply create a new elements object.

elementsMonoisotopic()

##          name short        mass
## 1      Carbon     C 12.00000000
## 2    Hydrogen     H  1.00782503
## 3    Nitrogen     N 14.00307401
## 4      Oxygen     O 15.99491462
## 5     Sulphur     S 31.97207073
## 6  Phosphorus     P 30.97376149
## 7     Bromine    Br 78.91833790
## 8    Chlorine    Cl 34.96885271
## 9    Fluorine     F 18.99840320
## 10    Silicon    Si 27.97692649
## 11   electron     e  0.00054858
## 12     Sodium    Na 22.98976966
## 13  Potassium     K 38.96370690
## 14   Carbon13   13C 13.00335484
## 15 Nitrogen15   15N 15.00010897
## 16   Oxygen18   18O 17.99916040
## 17  Deuterium    2H  2.01410178

newElements <- elementsMonoisotopic()$clone()$addElement(shorts = "33S",
                                                         name = "Suplhur33",
                                                         mass = 32.97145854)
newElements

##          name short        mass
## 1      Carbon     C 12.00000000
## 2    Hydrogen     H  1.00782503
## 3    Nitrogen     N 14.00307401
## 4      Oxygen     O 15.99491462
## 5     Sulphur     S 31.97207073
## 6  Phosphorus     P 30.97376149
## 7     Bromine    Br 78.91833790
## 8    Chlorine    Cl 34.96885271
## 9    Fluorine     F 18.99840320
## 10    Silicon    Si 27.97692649
## 11   electron     e  0.00054858
## 12     Sodium    Na 22.98976966
## 13  Potassium     K 38.96370690
## 14   Carbon13   13C 13.00335484
## 15 Nitrogen15   15N 15.00010897
## 16   Oxygen18   18O 17.99916040
## 17  Deuterium    2H  2.01410178
## 18  Suplhur33   33S 32.97145854

newElements$getMass("33S")

## [1] 32.97146

Calculations of mass & m/z

To calculate the neutral mass of a molecule, use formulaToMass

caffeine <- c(C=8, H=10, N=4, O=2)
caffeine

##  C  H  N  O 
##  8 10  4  2

# mono isotopic mass
formulaToMass(caffeine)

## [1] 194.0804

# to see more digits than R will print by default
formulaToMass(caffeine) |> format(digits = 10)

## [1] "194.0803756"

# average mass
formulaToMass(caffeine, enviPat = TRUE, exact = FALSE)

## [1] 194.1898

For the protonated m/z (mass over charge), use massToMzH

formulaToMass(caffeine) |> massToMzH(charge = 1)

## [1] 195.0877

formulaToMass(caffeine) |> massToMzH(charge = 2)

## [1] 98.04746

To calculate the m/z when you have another adduct, use massToMz

# protonated form
formulaToMass(glucose) |> massToMzH(charge = 1)

## [1] 181.0707

# sodtum adducts
formulaToMass(glucose) |> massToMz(adducts = 1, adductFormula = c(Na=1), adductCharge = 1)

## [1] 203.0526

formulaToMass(glucose) |> massToMz(adducts = 2, adductFormula = c(Na=1), adductCharge = 1)

## [1] 113.0209

Isotope clusters/patterns

For some more complex calculations, such as relative istope patterns, please refer to the enviPat package. Here is an example of what can be done:

chloroform <- c(C=1, H = 1, Cl = 3)
# monoisotopic mass
chloroform |> formulaToMass() |> format(digits = 10, nsmall = 6)

## [1] "117.9143832"

# enviPat
library(enviPat)
data(isotopes)
ChloroformIsotopePattern <- isopattern(
  isotopes = isotopes,
  chemforms = chloroform |> formulaString(),
  threshold=0.01,
  plotit=FALSE, verbose = FALSE)
ChloroformIsotopePattern

## $C1H1Cl3
##            m/z    abundance 12C 13C 1H 2H 35Cl 37Cl
##  [1,] 117.9144 100.00000000   1   0  1  0    3    0
##  [2,] 118.9177   1.08157283   0   1  1  0    3    0
##  [3,] 118.9207   0.01150132   1   0  0  1    3    0
##  [4,] 119.9114  95.98732841   1   0  1  0    2    1
##  [5,] 120.9148   1.03817286   0   1  1  0    2    1
##  [6,] 120.9177   0.01103981   1   0  0  1    2    1
##  [7,] 121.9085  30.71189071   1   0  1  0    1    2
##  [8,] 122.9118   0.33217147   0   1  1  0    1    2
##  [9,] 123.9055   3.27550260   1   0  1  0    0    3
## [10,] 124.9089   0.03542695   0   1  1  0    0    3

# as a picture
ChloroformEnvelope <- envelope(
  pattern = ChloroformIsotopePattern,
  resolution = 1E3,
  plotit = FALSE,
  verbose = FALSE)
plot(ChloroformEnvelope[[1]], type = "l")

# same for glucose
glucoseIsotopePattern <- isopattern(
  isotopes = isotopes,
  chemforms = glucose |> formulaString(),
  threshold=0.01,
  plotit=FALSE, verbose = FALSE)
glucoseIsotopePattern

## $C6H12O6
##              m/z       abundance 12C 13C 1H 2H 16O 18O 17O
## [1,] 180.0633881 100.00000000000   6   0 12  0   6   0   0
## [2,] 181.0667429   6.48943697564   5   1 12  0   6   0   0
## [3,] 181.0676050   0.22855538960   6   0 12  0   5   0   1
## [4,] 181.0696648   0.13801587183   6   0 11  1   6   0   0
## [5,] 182.0676339   1.23299618072   6   0 12  0   5   1   0
## [6,] 182.0700978   0.17546996775   4   2 12  0   6   0   0
## [7,] 182.0709598   0.01483195796   5   1 12  0   5   0   1
## [8,] 183.0709887   0.08001451006   5   1 12  0   5   1   0

# as a picture
glucoseEnvelope <- envelope(
  pattern = glucoseIsotopePattern,
  resolution = 1E3,
  plotit = FALSE,
  verbose = FALSE)
plot(glucoseEnvelope[[1]], type = "l")

Peptides

Intro

Peptides are a ‘string’ of amino acid residues and essentially a ‘class’ of chemicals. However one has to take into account that some of the amino acid residues may be modified. These modifications can be almost anything it sometimes seems… there are naturally occuring ones such as phosporylation, methylation, glycosylation, but also chemically induced ones such as alkylation or conjugation to all kinds of chemicals. Apart from those, there are also so-called artefacts such as oxidation which sometimes occurs during isolation, storage or preparation of proteins.

General functions

If no modifications are present, there are 2 easy fast functions to quickly figure out the formula and m/z of a peptide

# to figure out the chemical formula of the neutral peptide
peptideFormula("SAMPLER")

##  C  H  N  O  S 
## 33 58 10 11  1

# to calculate the m/z of a protonated peptide
peptideMzH("SAMPLER", charge = 1)

## [1] 803.408

peptideMzH("SAMPLER", charge = 2)

## [1] 402.2076382

Modifications

No matter what the cause, every modification comes from a ‘chemical’ reaction where some atoms leave the peptide and some atoms are attached to it. Sometimes certain amino acids (eg alkylated cysteine) are all modified and sometimes some of these amino acids are modified (eg oxidated Methionine). The first type is called fixed while the other named variable. To calculate mass or (m/z) of peptide with only fixed modifications is relatively straightforward: simply subtract x times what is lost and add x times what is gained. Variable modifications are a bit more tricky: we have to calculate all variants of the peptide with/without modifications in some of the positions.

In the package there’s a separate R6 class (not surprisingly called modifications) for ‘storage’ of these changes that can be applied to peptides/amino acids. Creation of such an object with two fixed modifications and 1 variable modification.

aaModifications <- modifications$new()
aaModifications$addTable(
  tibble::tibble(
    name = c("Carbamidomethyl (C)",
             "Oxidation (M)"),
    position = c("C","M"),
    fixed = c(TRUE,FALSE),
    gain = list(c(C = 2, H = 4, N = 1, O = 1),
                c(C = 0, H = 0, N = 0, O = 1)),
    loss = list(c(protonFormula()),
                c(emptyFormula())),
    category = c("Cys-state","Preparation Artefact")
  )
)
aaModifications

## 
## Fixed Modifications
## # A tibble: 1 × 6
##   name                position fixed gain      loss      category 
##   <chr>               <chr>    <lgl> <list>    <list>    <chr>    
## 1 Carbamidomethyl (C) C        TRUE  <dbl [4]> <dbl [1]> Cys-state
## 
## Variable Modifications
## # A tibble: 1 × 6
##   name          position fixed gain      loss      category            
##   <chr>         <chr>    <lgl> <list>    <list>    <chr>               
## 1 Oxidation (M) M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact

The first one is a fixed modification which will cause the modification to be applied always. The other one is a variable modification and will only be applied when specifically asked for. The gain & loss columns are formulas in the named numeric vector format as before.

Adding modifications to the table is possible one modification at a time or via a table of modifications

aaModifications$add(name = "Deamidation",
                    position = "NQ",
                    fixed = FALSE,
                    gain = c(O = 1, H = 1),
                    loss = c(N = 1, H =2),
                    category = "Preparation Artefact")
aaModifications

## 
## Fixed Modifications
## # A tibble: 1 × 6
##   name                position fixed gain      loss      category 
##   <chr>               <chr>    <lgl> <list>    <list>    <chr>    
## 1 Carbamidomethyl (C) C        TRUE  <dbl [4]> <dbl [1]> Cys-state
## 
## Variable Modifications
## # A tibble: 2 × 6
##   name          position fixed gain      loss      category            
##   <chr>         <chr>    <lgl> <list>    <list>    <chr>               
## 1 Oxidation (M) M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidation   NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact

Note: in calculations the variable modifications are used via the rownumber of the unordered table. In the following example Oxidation (M) is 1 and Deamidation (NQ) is 2.

library(dplyr)
aaModifications$table %>% dplyr::filter(!fixed)

## # A tibble: 2 × 6
##   name          position fixed gain      loss      category            
##   <chr>         <chr>    <lgl> <list>    <list>    <chr>               
## 1 Oxidation (M) M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidation   NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact

Note: if two fixed modifications apply to the same amino acid(s), then they will both be applied! With variable modifications this is less likely to occur (as will be shown later) as they are applied per position in the sequence in stead of per amino acid.

There is an example modifications object which contains a number of often used modifications. It should be noted however that for cysteine there are two fixed modifications. To be able to use this table, remove one (or both).

aminoAcidModifications()$table

## # A tibble: 6 × 6
##   name                 position fixed gain      loss      category            
##   <chr>                <chr>    <lgl> <list>    <list>    <chr>               
## 1 Carbamidomethyl (C)  C        TRUE  <dbl [4]> <dbl [1]> Cys-state           
## 2 Carboxymethyl (C)    C        TRUE  <dbl [4]> <dbl [1]> Cys-state           
## 3 Oxidation (M)        M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 4 Deamidated (NQ)      NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 5 Oxidation (W)        W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 6 Oxidation Double (W) W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact

# remove one of the fixed modifications for cysteine
aminoAcidModifications()$table[-2,]

## # A tibble: 5 × 6
##   name                 position fixed gain      loss      category            
##   <chr>                <chr>    <lgl> <list>    <list>    <chr>               
## 1 Carbamidomethyl (C)  C        TRUE  <dbl [4]> <dbl [1]> Cys-state           
## 2 Oxidation (M)        M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ)      NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W)        W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact

# to create a new modification table without the second fixed modification
newModifications <- modifications$new()$addTable(aminoAcidModifications()$table[-2,])
newModifications

## 
## Fixed Modifications
## # A tibble: 1 × 6
##   name                position fixed gain      loss      category 
##   <chr>               <chr>    <lgl> <list>    <list>    <chr>    
## 1 Carbamidomethyl (C) C        TRUE  <dbl [4]> <dbl [1]> Cys-state
## 
## Variable Modifications
## # A tibble: 4 × 6
##   name                 position fixed gain      loss      category            
##   <chr>                <chr>    <lgl> <list>    <list>    <chr>               
## 1 Oxidation (M)        M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidated (NQ)      NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 3 Oxidation (W)        W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 4 Oxidation Double (W) W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact

It doesn’t matter that there two different modifications for tryptophan: they’re variable and can only be used one at a time on an amino acid position (by using variable modification 2 or 3). This is done via a so called modification string which is a character vector of the same length as the peptide sequence.

# no modifications
testPeptide1 <- peptide$new(sequence = "SAMPLER")
testPeptide1

## Sequence               : SAMPLER
## Variable modifications : NA
## Modifications Table    : NA

# with modifications, but none are applied to the peptide
testPeptide2 <- peptide$new(sequence = "SAMPLER",
                            modificationTable = newModifications$table,
                            variableModifications = "0000000")
testPeptide2

## Sequence               : SAMPLER
## Variable modifications : 0000000
## Modifications Table    :
## # A tibble: 5 × 6
##   name                 position fixed gain      loss      category            
##   <chr>                <chr>    <lgl> <list>    <list>    <chr>               
## 1 Carbamidomethyl (C)  C        TRUE  <dbl [4]> <dbl [1]> Cys-state           
## 2 Oxidation (M)        M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ)      NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W)        W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact

# with modifications, oxidation present on third amino acid
testPeptide3 <- peptide$new(sequence = "SAMPLER",
                            modificationTable = newModifications$table,
                            variableModifications = "0010000")
testPeptide3

## Sequence               : SAMPLER
## Variable modifications : 0010000
## Modifications Table    :
## # A tibble: 5 × 6
##   name                 position fixed gain      loss      category            
##   <chr>                <chr>    <lgl> <list>    <list>    <chr>               
## 1 Carbamidomethyl (C)  C        TRUE  <dbl [4]> <dbl [1]> Cys-state           
## 2 Oxidation (M)        M        FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ)      NQ       FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W)        W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W        FALSE <dbl [1]> <dbl [5]> Preparation Artefact

Peptides are objects and as such have a set of functions that can be called:

# formula of peptide
testPeptide1$formula()

##  C  H  N  O  S 
## 33 58 10 11  1

# mass of peptide
testPeptide1$mass()

## [1] 802.4007235

# m/z of peptide sodium adduct
testPeptide1$mz(adducts = 1, adductFormula = c(Na=1))

## [1] 825.3899446

# m/z of the double-charged (protonated) peptide
testPeptide1$mzH(charge = 2)

## [1] 402.2076382

Fragments

An important function available for every object of class peptide is the fragments: it allows the automatic calculation of expected fragments when performing some form of peptide fragmentation. An online tool that does a similair thing is protein prospector’s MS-Product

testPeptide3$fragments()

## # A tibble: 7 × 13
##   aions aionsH2O aions…¹ bions bions…² bions…³ bions…⁴ cions xions yions yions…⁵
##   <dbl>    <dbl>   <dbl> <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1  60.0     42.0    43.0  88.0    70.0    71.0    106.  105.  201.  175.    157.
## 2 131.     113.    114.  159.    141.    142.     177.  176.  330.  304.    286.
## 3 278.     260.    261.  306.    288.    289.     324.  323.  443.  417.    399.
## 4 375.     357.    358.  403.    385.    386.     421.  420.  540.  514.    496.
## 5 488.     470.    471.  516.    498.    499.     534.  533.  687.  661.    643.
## 6 617.     599.    600.  645.    627.    628.     663.  662.  758.  732.    714.
## 7 773.     755.    756.  801.    783.    784.     819.  818.  845.  819.    801.
## # … with 2 more variables: yionsNH3 <dbl>, zions <dbl>, and abbreviated
## #   variable names ¹aionsNH3, ²bionsH2O, ³bionsNH3, ⁴bionsH2Oadd, ⁵yionsH2O

# only b-ions, doubly charged
testPeptide3$fragments(chargeState = 2)$bions

## [1]  44.52329066  80.04184755 153.55954719 202.08592911 258.62796110
## [6] 323.14925765 401.19981317

# to get the formulas of the doubly charged y-ions (first 3 displayed)
testPeptide3$fragments(chargeState = 2, returnFormulas = T)$yions[1:3]

## [[1]]
##  C  H  N  O  S 
##  6 14  4  2  0 
## 
## [[2]]
##  C  H  N  O  S 
## 11 21  5  5  0 
## 
## [[3]]
##  C  H  N  O  S 
## 17 32  6  6  0

# to get the expected immonium ions (modifications are NOT taken into account)
testPeptide3$fragments.immoniumIons()

## [1]  60.0444  70.0651  86.0964  87.0917 100.0869 102.0550 104.0528 112.0869
## [9] 126.0550

Please note that these are essentially dumb calculations: the code doesn’t look if a fragment is possible at all! For example b+H₂O ions only appear when certain sequence ‘conditions’ are met.

Other packages

Obviously this package is not unique in attempting to work with chemical formulas. Other notable packages include:

Please note that some results from calculations in these packages can differ slightly from each other (usually after the n^th digit)

An example of how to work with enviPat was presented here.

For the rcdk package there is one specific function which helps ‘translating’ an cdkFormula object to a named numeric vector, ie a formula as this package can work with.

Closing remarks

Obviously this package is a work in progress. A lot more testing is still required. Serious errors and bugs are corrected as soon as possible.

Contact info: benbruyneel@gmail.com