The aim for this package is to make doing calculations and making graphs with/for mass spectrometry results more easy. This first iteration (version 0.1.3) lays the foundation by providing a framework and is mainly focussed on mass & m/z calculations.
Most of the calculations have been thoroughly checked, but errors/bugs are still possible. Most of the element/formula calculations have been compared to:
commercially available software
tools on websites
manual calculations
Why another possibility to calculate masses and predict (peptide fragement) formulas?
I often end up having to use 2-3 different tools side by side because none of them really do everything
online calculators are somewhat frowned upon when working in industry for various reasons
software tools of mass spectrometer firms are usually vendor specific, so usually you cannot work with data from one firm in the software of another. Of course some calculators can be used (manually) between vendors, but there are usually some smaller or bigger differences which make
It’s not really possible to use the calculators from within other applications, such as R (but also Excel, Word, etc). This makes it that there is always a some copy/paste actions going on, which don’t make for reproducible workflows.
Also automation of calculations on and data manipulation of mass spectrometry data is usually limited and nearly without exception vendor specific
Many of the graphs in vendor specific data are difficult to manipulate (eg zooming in/out precisely, alignment of different graphs, etc etc)
Note: I already created a ton of functions/procedures that deal with a lot of these issues, but they are not yet in this package. Over time I will put more and more of them in.
Since the current version is still at the development stage, the package can only be installed via github:
devtools::install_github("BenBruyneel/massSpectrometryR")
Another possibility is to download it from github and then install it as a local package.
As per usual the package first needs to be loaded
library(massSpectrometryR)
A lot of the calculations are centered around formulas which are (essentially) a groups of elements bound together. Glucose for example is C6H12O6. The package defines every formula as a named numeric vector.
glucose <- c(C=6, H=12, O=6)
glucose
## C H O
## 6 12 6
You can also define one of the elements as being zero or even negative. While a negative number of atoms of a certain kind may seem strange, it is not meant for actual compounds, but for calculations where formulas are added or subtracted from one another
Alanine:
alanine <- c(C=3, H=7, N=1, O=2, S=0)
alanine
## C H N O S
## 3 7 1 2 0
You can also get alanine by retrieving it the formula from the function aminoAcidResidues and then adding H2O to it. The aminoacidResidues function generates a R6 chemicals object that contains info on all ‘natural’ amino acids in the form of residues (as that is how they are present in peptides & proteins)
aminoAcidResidues()
## # A tibble: 20 × 4
## letter name short formula
## <chr> <chr> <chr> <list>
## 1 A Alanine Ala <dbl [5]>
## 2 R Arginine Arg <dbl [5]>
## 3 N Asparagine Asn <dbl [5]>
## 4 D Aspartate Asp <dbl [5]>
## 5 E Glutamate Glu <dbl [5]>
## 6 Q Glutamine Gln <dbl [5]>
## 7 G Glycine Gly <dbl [5]>
## 8 H Histidine His <dbl [5]>
## 9 I Isoleucine Ile <dbl [5]>
## 10 L Leucine Leu <dbl [5]>
## 11 K Lysine Lys <dbl [5]>
## 12 M Methionine Met <dbl [5]>
## 13 F Phenylalanine Phe <dbl [5]>
## 14 P Proline Pro <dbl [5]>
## 15 S Serine Ser <dbl [5]>
## 16 T Threonine Thr <dbl [5]>
## 17 W Tryptophane Trp <dbl [5]>
## 18 Y Tyrosine Tyr <dbl [5]>
## 19 V Valine Val <dbl [5]>
## 20 C Cysteine Cys <dbl [5]>
aminoAcidResidues()$getFormula("alanine")
## C H N O S
## 3 5 1 1 0
addFormulas(aminoAcidResidues()$getFormula("alanine"), c(H=2, O=1))
## C H N O S
## 3 7 1 2 0
Another way to do addition of molecules is the operator %f+%
aminoAcidResidues()$getFormula("alanine") %f+% waterFormula()
## C H N O S
## 3 7 1 2 0
Note: the waterFormula function simply generates the formula c(H=2, O=1)
Of course there is also an subtractFormulas (and matching operator: %f-%)
identical(aminoAcidResidues()$getFormula("alanine"), (alanine %f-% waterFormula()))
## [1] TRUE
The purpose of the R6 class chemicals is to create as a ‘storage space’ for info on amino acid residues (available via the function aminoAcidResidues), but can be used for other (classes of) compounds too, eg sugars, steroids, etc etc
All mass and mass/charge calculations are done using R6 class elements objects. Two are predefined and accessible via the elementsMonoisotpic and elementsAverage functions.
Please note that neither contains all elements, I just put in the ones I regularly use myself. If needed simply create a new elements object.
elementsMonoisotopic()
## name short mass
## 1 Carbon C 12.00000000
## 2 Hydrogen H 1.00782503
## 3 Nitrogen N 14.00307401
## 4 Oxygen O 15.99491462
## 5 Sulphur S 31.97207073
## 6 Phosphorus P 30.97376149
## 7 Bromine Br 78.91833790
## 8 Chlorine Cl 34.96885271
## 9 Fluorine F 18.99840320
## 10 Silicon Si 27.97692649
## 11 electron e 0.00054858
## 12 Sodium Na 22.98976966
## 13 Potassium K 38.96370690
## 14 Carbon13 13C 13.00335484
## 15 Nitrogen15 15N 15.00010897
## 16 Oxygen18 18O 17.99916040
## 17 Deuterium 2H 2.01410178
newElements <- elementsMonoisotopic()$clone()$addElement(shorts = "33S",
name = "Suplhur33",
mass = 32.97145854)
newElements
## name short mass
## 1 Carbon C 12.00000000
## 2 Hydrogen H 1.00782503
## 3 Nitrogen N 14.00307401
## 4 Oxygen O 15.99491462
## 5 Sulphur S 31.97207073
## 6 Phosphorus P 30.97376149
## 7 Bromine Br 78.91833790
## 8 Chlorine Cl 34.96885271
## 9 Fluorine F 18.99840320
## 10 Silicon Si 27.97692649
## 11 electron e 0.00054858
## 12 Sodium Na 22.98976966
## 13 Potassium K 38.96370690
## 14 Carbon13 13C 13.00335484
## 15 Nitrogen15 15N 15.00010897
## 16 Oxygen18 18O 17.99916040
## 17 Deuterium 2H 2.01410178
## 18 Suplhur33 33S 32.97145854
newElements$getMass("33S")
## [1] 32.97146
To calculate the neutral mass of a molecule, use formulaToMass
caffeine <- c(C=8, H=10, N=4, O=2)
caffeine
## C H N O
## 8 10 4 2
# mono isotopic mass
formulaToMass(caffeine)
## [1] 194.0804
# to see more digits than R will print by default
formulaToMass(caffeine) |> format(digits = 10)
## [1] "194.0803756"
# average mass
formulaToMass(caffeine, enviPat = TRUE, exact = FALSE)
## [1] 194.1898
For the protonated m/z (mass over charge), use massToMzH
formulaToMass(caffeine) |> massToMzH(charge = 1)
## [1] 195.0877
formulaToMass(caffeine) |> massToMzH(charge = 2)
## [1] 98.04746
To calculate the m/z when you have another adduct, use massToMz
# protonated form
formulaToMass(glucose) |> massToMzH(charge = 1)
## [1] 181.0707
# sodtum adducts
formulaToMass(glucose) |> massToMz(adducts = 1, adductFormula = c(Na=1), adductCharge = 1)
## [1] 203.0526
formulaToMass(glucose) |> massToMz(adducts = 2, adductFormula = c(Na=1), adductCharge = 1)
## [1] 113.0209
For some more complex calculations, such as relative istope patterns, please refer to the enviPat package. Here is an example of what can be done:
chloroform <- c(C=1, H = 1, Cl = 3)
# monoisotopic mass
chloroform |> formulaToMass() |> format(digits = 10, nsmall = 6)
## [1] "117.9143832"
# enviPat
library(enviPat)
data(isotopes)
ChloroformIsotopePattern <- isopattern(
isotopes = isotopes,
chemforms = chloroform |> formulaString(),
threshold=0.01,
plotit=FALSE, verbose = FALSE)
ChloroformIsotopePattern
## $C1H1Cl3
## m/z abundance 12C 13C 1H 2H 35Cl 37Cl
## [1,] 117.9144 100.00000000 1 0 1 0 3 0
## [2,] 118.9177 1.08157283 0 1 1 0 3 0
## [3,] 118.9207 0.01150132 1 0 0 1 3 0
## [4,] 119.9114 95.98732841 1 0 1 0 2 1
## [5,] 120.9148 1.03817286 0 1 1 0 2 1
## [6,] 120.9177 0.01103981 1 0 0 1 2 1
## [7,] 121.9085 30.71189071 1 0 1 0 1 2
## [8,] 122.9118 0.33217147 0 1 1 0 1 2
## [9,] 123.9055 3.27550260 1 0 1 0 0 3
## [10,] 124.9089 0.03542695 0 1 1 0 0 3
# as a picture
ChloroformEnvelope <- envelope(
pattern = ChloroformIsotopePattern,
resolution = 1E3,
plotit = FALSE,
verbose = FALSE)
plot(ChloroformEnvelope[[1]], type = "l")
# same for glucose
glucoseIsotopePattern <- isopattern(
isotopes = isotopes,
chemforms = glucose |> formulaString(),
threshold=0.01,
plotit=FALSE, verbose = FALSE)
glucoseIsotopePattern
## $C6H12O6
## m/z abundance 12C 13C 1H 2H 16O 18O 17O
## [1,] 180.0633881 100.00000000000 6 0 12 0 6 0 0
## [2,] 181.0667429 6.48943697564 5 1 12 0 6 0 0
## [3,] 181.0676050 0.22855538960 6 0 12 0 5 0 1
## [4,] 181.0696648 0.13801587183 6 0 11 1 6 0 0
## [5,] 182.0676339 1.23299618072 6 0 12 0 5 1 0
## [6,] 182.0700978 0.17546996775 4 2 12 0 6 0 0
## [7,] 182.0709598 0.01483195796 5 1 12 0 5 0 1
## [8,] 183.0709887 0.08001451006 5 1 12 0 5 1 0
# as a picture
glucoseEnvelope <- envelope(
pattern = glucoseIsotopePattern,
resolution = 1E3,
plotit = FALSE,
verbose = FALSE)
plot(glucoseEnvelope[[1]], type = "l")
Peptides are a ‘string’ of amino acid residues and essentially a ‘class’ of chemicals. However one has to take into account that some of the amino acid residues may be modified. These modifications can be almost anything it sometimes seems… there are naturally occuring ones such as phosporylation, methylation, glycosylation, but also chemically induced ones such as alkylation or conjugation to all kinds of chemicals. Apart from those, there are also so-called artefacts such as oxidation which sometimes occurs during isolation, storage or preparation of proteins.
If no modifications are present, there are 2 easy fast functions to quickly figure out the formula and m/z of a peptide
# to figure out the chemical formula of the neutral peptide
peptideFormula("SAMPLER")
## C H N O S
## 33 58 10 11 1
# to calculate the m/z of a protonated peptide
peptideMzH("SAMPLER", charge = 1)
## [1] 803.408
peptideMzH("SAMPLER", charge = 2)
## [1] 402.2076382
No matter what the cause, every modification comes from a ‘chemical’ reaction where some atoms leave the peptide and some atoms are attached to it. Sometimes certain amino acids (eg alkylated cysteine) are all modified and sometimes some of these amino acids are modified (eg oxidated Methionine). The first type is called fixed while the other named variable. To calculate mass or (m/z) of peptide with only fixed modifications is relatively straightforward: simply subtract x times what is lost and add x times what is gained. Variable modifications are a bit more tricky: we have to calculate all variants of the peptide with/without modifications in some of the positions.
In the package there’s a separate R6 class (not surprisingly called modifications) for ‘storage’ of these changes that can be applied to peptides/amino acids. Creation of such an object with two fixed modifications and 1 variable modification.
aaModifications <- modifications$new()
aaModifications$addTable(
tibble::tibble(
name = c("Carbamidomethyl (C)",
"Oxidation (M)"),
position = c("C","M"),
fixed = c(TRUE,FALSE),
gain = list(c(C = 2, H = 4, N = 1, O = 1),
c(C = 0, H = 0, N = 0, O = 1)),
loss = list(c(protonFormula()),
c(emptyFormula())),
category = c("Cys-state","Preparation Artefact")
)
)
aaModifications
##
## Fixed Modifications
## # A tibble: 1 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
##
## Variable Modifications
## # A tibble: 1 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
The first one is a fixed modification which will cause the modification to be applied always. The other one is a variable modification and will only be applied when specifically asked for. The gain & loss columns are formulas in the named numeric vector format as before.
Adding modifications to the table is possible one modification at a time or via a table of modifications
aaModifications$add(name = "Deamidation",
position = "NQ",
fixed = FALSE,
gain = c(O = 1, H = 1),
loss = c(N = 1, H =2),
category = "Preparation Artefact")
aaModifications
##
## Fixed Modifications
## # A tibble: 1 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
##
## Variable Modifications
## # A tibble: 2 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidation NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
Note: in calculations the variable modifications are used via the rownumber of the unordered table. In the following example Oxidation (M) is 1 and Deamidation (NQ) is 2.
library(dplyr)
aaModifications$table %>% dplyr::filter(!fixed)
## # A tibble: 2 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidation NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
Note: if two fixed modifications apply to the same amino acid(s), then they will both be applied! With variable modifications this is less likely to occur (as will be shown later) as they are applied per position in the sequence in stead of per amino acid.
There is an example modifications object which contains a number of often used modifications. It should be noted however that for cysteine there are two fixed modifications. To be able to use this table, remove one (or both).
aminoAcidModifications()$table
## # A tibble: 6 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
## 2 Carboxymethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
## 3 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 4 Deamidated (NQ) NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 5 Oxidation (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 6 Oxidation Double (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
# remove one of the fixed modifications for cysteine
aminoAcidModifications()$table[-2,]
## # A tibble: 5 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
## 2 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ) NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
# to create a new modification table without the second fixed modification
newModifications <- modifications$new()$addTable(aminoAcidModifications()$table[-2,])
newModifications
##
## Fixed Modifications
## # A tibble: 1 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
##
## Variable Modifications
## # A tibble: 4 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 2 Deamidated (NQ) NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 3 Oxidation (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 4 Oxidation Double (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
It doesn’t matter that there two different modifications for tryptophan: they’re variable and can only be used one at a time on an amino acid position (by using variable modification 2 or 3). This is done via a so called modification string which is a character vector of the same length as the peptide sequence.
# no modifications
testPeptide1 <- peptide$new(sequence = "SAMPLER")
testPeptide1
## Sequence : SAMPLER
## Variable modifications : NA
## Modifications Table : NA
# with modifications, but none are applied to the peptide
testPeptide2 <- peptide$new(sequence = "SAMPLER",
modificationTable = newModifications$table,
variableModifications = "0000000")
testPeptide2
## Sequence : SAMPLER
## Variable modifications : 0000000
## Modifications Table :
## # A tibble: 5 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
## 2 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ) NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
# with modifications, oxidation present on third amino acid
testPeptide3 <- peptide$new(sequence = "SAMPLER",
modificationTable = newModifications$table,
variableModifications = "0010000")
testPeptide3
## Sequence : SAMPLER
## Variable modifications : 0010000
## Modifications Table :
## # A tibble: 5 × 6
## name position fixed gain loss category
## <chr> <chr> <lgl> <list> <list> <chr>
## 1 Carbamidomethyl (C) C TRUE <dbl [4]> <dbl [1]> Cys-state
## 2 Oxidation (M) M FALSE <dbl [4]> <dbl [5]> Preparation Artefact
## 3 Deamidated (NQ) NQ FALSE <dbl [2]> <dbl [2]> Preparation Artefact
## 4 Oxidation (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
## 5 Oxidation Double (W) W FALSE <dbl [1]> <dbl [5]> Preparation Artefact
Peptides are objects and as such have a set of functions that can be called:
# formula of peptide
testPeptide1$formula()
## C H N O S
## 33 58 10 11 1
# mass of peptide
testPeptide1$mass()
## [1] 802.4007235
# m/z of peptide sodium adduct
testPeptide1$mz(adducts = 1, adductFormula = c(Na=1))
## [1] 825.3899446
# m/z of the double-charged (protonated) peptide
testPeptide1$mzH(charge = 2)
## [1] 402.2076382
An important function available for every object of class peptide is the fragments: it allows the automatic calculation of expected fragments when performing some form of peptide fragmentation. An online tool that does a similair thing is protein prospector’s MS-Product
testPeptide3$fragments()
## # A tibble: 7 × 13
## aions aionsH2O aions…¹ bions bions…² bions…³ bions…⁴ cions xions yions yions…⁵
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 60.0 42.0 43.0 88.0 70.0 71.0 106. 105. 201. 175. 157.
## 2 131. 113. 114. 159. 141. 142. 177. 176. 330. 304. 286.
## 3 278. 260. 261. 306. 288. 289. 324. 323. 443. 417. 399.
## 4 375. 357. 358. 403. 385. 386. 421. 420. 540. 514. 496.
## 5 488. 470. 471. 516. 498. 499. 534. 533. 687. 661. 643.
## 6 617. 599. 600. 645. 627. 628. 663. 662. 758. 732. 714.
## 7 773. 755. 756. 801. 783. 784. 819. 818. 845. 819. 801.
## # … with 2 more variables: yionsNH3 <dbl>, zions <dbl>, and abbreviated
## # variable names ¹aionsNH3, ²bionsH2O, ³bionsNH3, ⁴bionsH2Oadd, ⁵yionsH2O
# only b-ions, doubly charged
testPeptide3$fragments(chargeState = 2)$bions
## [1] 44.52329066 80.04184755 153.55954719 202.08592911 258.62796110
## [6] 323.14925765 401.19981317
# to get the formulas of the doubly charged y-ions (first 3 displayed)
testPeptide3$fragments(chargeState = 2, returnFormulas = T)$yions[1:3]
## [[1]]
## C H N O S
## 6 14 4 2 0
##
## [[2]]
## C H N O S
## 11 21 5 5 0
##
## [[3]]
## C H N O S
## 17 32 6 6 0
# to get the expected immonium ions (modifications are NOT taken into account)
testPeptide3$fragments.immoniumIons()
## [1] 60.0444 70.0651 86.0964 87.0917 100.0869 102.0550 104.0528 112.0869
## [9] 126.0550
Please note that these are essentially dumb calculations: the code doesn’t look if a fragment is possible at all! For example b+H2O ions only appear when certain sequence ‘conditions’ are met.
Obviously this package is not unique in attempting to work with chemical formulas. Other notable packages include:
Please note that some results from calculations in these packages can differ slightly from each other (usually after the nth digit)
An example of how to work with enviPat was presented here.
For the rcdk package there is one specific function which helps ‘translating’ an cdkFormula object to a named numeric vector, ie a formula as this package can work with.
Obviously this package is a work in progress. A lot more testing is still required. Serious errors and bugs are corrected as soon as possible.
Contact info: benbruyneel@gmail.com