Protein Annotation

An R-package for the display of protein information

Author

Ben Bruyneel

Published

March 4, 2024

Introduction

Aim of this package

The main reason for writing this package stems from my attempts to mine and visualize experimental protein data. It draws some inspiration from the drawProteins package, though the package here is more geared towards working with the data generated from protein digest experiments.

Since I work mainly with proteomics data coming from the Proteome Discoverer software of ThermoFisher Scientific, a lot of the package’s functions are geared to preparing/wrangling data/results coming from that particular software. It should be relatively easy to create similar functions for other data formats.

Installation instructions

Since the current version is still at the development stage, the package can only be installed via github:

devtools::install_github("BenBruyneel/proteinAnnotation")

General use

library(proteinAnnotation)

Test data

First we need to get the protein sequence data into a data.frame. The package contains the a function standardProtein which returns the amino acid sequence for either Ovalbumin (OVA) or Bovine Serum Albumin (BSA). This function is only a helper function for examples & testing.

# Amino acid sequence of Ovalbumin
standardProtein()

[1] "MGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAMGITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRADHPFLFCIKHIATNAVLFFGRCVSP"

Another helper function with some test data is the OVATable function which generates a protein table or a peptide table with some (fake) experimental data. The format of the these data.frames is the same as what comes out of Proteome Discoverer. The names of the columns may differ a bit depending on how you get the data out of that program. What is generated here is what you get when you use the proteinDiscover package. The protein/peptide data is all related to Ovalbumin.

library(dplyr)
library(knitr)
# Protein Table
OVATable() |> tibble()

# A tibble: 1 × 3
  Sequence                                            Accession SequenceCoverage
  <chr>                                               <chr>     <chr>           
1 GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTR… P01012    385;0-380

# Peptide table
OVATable("peptide") |> head() |> kable()

Sequence	Modifications	PositionsinProteins	XCorrbySearchEngine_1	Abundances_1	Abundances_2
ISQAVHAAHAEINEAGR		P00000 [323-339]; P01012 [323-339]	7.746809	6922540038	7817180355
ADHPFLFCIK	1×Carbamidomethyl [C8]	P00000 [360-369]; P01012 [360-369]	4.090933	6459800097	11641298738
AFKDEDTQAMPFR		P00000 [187-199]; P01012 [187-199]	5.751740	5561482511	8982076379
LTEWTSSNVMEER		P00000 [264-276]; P01012 [264-276]	5.085020	4562266198	5516495806
GGLEPINFQTAADQAR		P00000 [127-142]; P01012 [127-142]	5.391547	4475159720	5717628078
YPILPEYLQCVK	1×Carbamidomethyl [C10]	P00000 [111-122]; P01012 [111-122]	5.270951	2387118310	5403322654

Protein data

Create protein data

createProteinData is an important function in this package because it takes a protein sequence and creates a data.frame which can then be used for the rest of available functions.

createProteinData(sequence = standardProtein(), start = 2, nterm = 1) |> head()

  position sequence
1        1         
2        2        G
3        3        S
4        4        I
5        5        G
6        6        A

createProteinData(sequence = standardProtein("bsa"), start = 25) |> head()

  position sequence
1       25        D
2       26        T
3       27        H
4       28        K
5       29        S
6       30        E

Protein Data manipulation

These data.frame’s can be modified to add data. This can be both numerical and non-numerical data.

An example of adding all positions in the protein where the amino acid lysine (K) is present:

ovaProtein <- createProteinData(sequence = OVATable("protein")$Sequence)
ovaProtein$K <- ifelse(ovaProtein$sequence == "K", "Lysine", " ")
ovaProtein |> slice(15:25)

   position sequence      K
1        15        F       
2        16        K Lysine
3        17        E       
4        18        L       
5        19        K Lysine
6        20        V       
7        21        H       
8        22        H       
9        23        A       
10       24        N       
11       25        E

Another way of adding data to a protein data data.frame is using the addDataToProtein function. It takes a data.frame with a set of positions and the data to put ‘onto’ those positions.

# find all positions of K or R in the protein
trypticSites <- data.frame(position = ovaProtein$position[
 ovaProtein$sequence == "K" | ovaProtein$sequence == "R"],
 data = "tryptic")
trypticSites |> head()

  position    data
1       16 tryptic
2       19 tryptic
3       46 tryptic
4       50 tryptic
5       55 tryptic
6       58 tryptic

# add the new data to the protein data
ovaProtein <- addDataToProtein(proteinDF = ovaProtein, dataframe = trypticSites,
 dataColumn = "data", dataName = "trypticSite", NAValue = "")
ovaProtein[276:285,]

    position sequence      K trypticSite
276      276        R            tryptic
277      277        K Lysine     tryptic
278      278        I                   
279      279        K Lysine     tryptic
280      280        V                   
281      281        Y                   
282      282        L                   
283      283        P                   
284      284        R            tryptic
285      285        M

Obviously the tryptic sites as determined by the above code leave out that if Lysine (K) or Arginine (R) follows a Proline (P) it will not be digested (by Trypsin). Of course you can code that in a somewhat more elaborate piece of R code, but the function getPeptideStart can help as it allows to find patterns in a protein sequence and regular expressions can be used!

Note: the data to be added can be of any class!

# find all tryptic sites w/o a preceding Proline (P)
trypticSites <- getPeptideStart(proteinSequence = OVATable("protein")$Sequence,
                                peptideSequence = "(?<!P)R|(?<!P)K")
trypticSites |> head()

[1] 16 19 46 50 55 58

ovaProtein <- addDataToProtein(proteinDF = ovaProtein,
                               dataframe = data.frame(position = trypticSites,
                                                      data = "tryptic"),
                               dataColumn = "data",
                               dataName = "realTrypticSite", NAValue = "non tryptic")
ovaProtein[276:285,]

    position sequence      K trypticSite realTrypticSite
276      276        R            tryptic         tryptic
277      277        K Lysine     tryptic         tryptic
278      278        I                        non tryptic
279      279        K Lysine     tryptic         tryptic
280      280        V                        non tryptic
281      281        Y                        non tryptic
282      282        L                        non tryptic
283      283        P                        non tryptic
284      284        R            tryptic     non tryptic
285      285        M                        non tryptic

Adding peptide information

Data can also be added via the mapPeptidesToProtein function. For this function you need the information to be added in a data.frame with three required columns: Sequence, Peptide Position and the data to be added. The Sequence and Peptide Position are needed to figure out the positions of the amino acids in the protein sequence. The peptide position coming from Proteome Discoverer has a certain format: <Accession> [start-end] ; .... This is used to extract the start position. An example of this info:

# info needed to map information to protein position
OVATable("peptide") %>%
  select(Sequence, PositionsinProteins) %>%
  slice(1:5) %>% tibble()

# A tibble: 5 × 2
  Sequence          PositionsinProteins               
  <chr>             <chr>                             
1 ISQAVHAAHAEINEAGR P00000 [323-339]; P01012 [323-339]
2 ADHPFLFCIK        P00000 [360-369]; P01012 [360-369]
3 AFKDEDTQAMPFR     P00000 [187-199]; P01012 [187-199]
4 LTEWTSSNVMEER     P00000 [264-276]; P01012 [264-276]
5 GGLEPINFQTAADQAR  P00000 [127-142]; P01012 [127-142]

# peptide information in the data.frame()
OVATable("peptide") %>%
  select(-c(Sequence, PositionsinProteins)) %>%
  slice(1:5) %>% tibble()

# A tibble: 5 × 4
  Modifications            XCorrbySearchEngine_1 Abundances_1 Abundances_2
  <chr>                                    <dbl>        <dbl>        <dbl>
1 ""                                        7.75   6922540038   7817180355
2 "1×Carbamidomethyl [C8]"                  4.09   6459800097  11641298738
3 ""                                        5.75   5561482511   8982076379
4 ""                                        5.09   4562266198   5516495806
5 ""                                        5.39   4475159720   5717628078

To map the info in the data.frame to the protein data, we can use:

# mapping Abundance information
ovaProtein <- mapPeptidesToProtein(proteinDF = ovaProtein, peptideTable = OVATable("peptide"),
 Accession = "P01012", positionColumn = "PositionsinProteins", variable = "Abundances_1",
 dataName = "Abundance", combineFunction = sum)

# mapping Score information
ovaProtein <- mapPeptidesToProtein(proteinDF = ovaProtein, peptideTable = OVATable("peptide"),
 Accession = "P01012", positionColumn = "PositionsinProteins", variable = "XCorrbySearchEngine_1",
 dataName = "Score", combineFunction = max)

ovaProtein[35:45,] |> tibble()

# A tibble: 11 × 7
   position sequence K     trypticSite realTrypticSite Abundance Score
      <int> <chr>    <chr> <chr>       <chr>               <dbl> <dbl>
 1       35 M        " "   ""          non tryptic             0  0   
 2       36 S        " "   ""          non tryptic             0  0   
 3       37 A        " "   ""          non tryptic             0  0   
 4       38 L        " "   ""          non tryptic             0  0   
 5       39 A        " "   ""          non tryptic      45376732  2.06
 6       40 M        " "   ""          non tryptic      45376732  2.06
 7       41 V        " "   ""          non tryptic      45376732  2.06
 8       42 Y        " "   ""          non tryptic      45376732  2.06
 9       43 L        " "   ""          non tryptic      45376732  2.06
10       44 G        " "   ""          non tryptic      45376732  2.06
11       45 A        " "   ""          non tryptic      45376732  2.06

As can be seen, position 35-38 was not present in any of the peptides in the peptide information, so both Score and Abundance are both zero. Position 39-45 were present, so they get the score and abundances from the peptides that contain them. They way the function in the above code-block was called:

Score at a position = maximum XCorr score of all peptides containing the position
Abundance at a position = sum of all peptides containing the position

Displaying protein data

Plotting protein data

Obviously base R can be used to get some insights into the peptide info across protein positions

par(mfrow = c(2,1))
plot(ovaProtein$position, ovaProtein$Abundance, type = "l",
     main = "Peptide Abundance across the protein",
     xlab = "Position", ylab = "Abundance")
plot(ovaProtein$position, ovaProtein$Score, type = "l",
     main = "Peptide Score across the protein",
     xlab = "Position", ylab = "Score")

par(mfrow = c(1,1))

To make a bit prettier graphs, the function plotProteinVariable can be used.

plotProteinVariable(proteinDF = ovaProtein, column = "Score", yLabel = "Score",
                    title = "Peptide Score across the protein")

plotProteinVariable(proteinDF = ovaProtein, column = "Abundance",
                    yLabel = "Abundance (log10)", yLog = TRUE,
                    ylogSetValue = 1E4, yLimits = c(1E3, 2E10),
                    drawYzero = TRUE,  zeroType = "dotted", zeroAlpha = 0.5,
                    title = "Peptide Abundance across the protein",
                    caption = "Due to log10 transformation all 'zero' values were set to 10000")

Protein coverage

proteinCoverage is a special function which can map peptide (or protein) information on to protein data. It takes a similar data.frame as mapPeptidesToProtein, but uses only the Sequence and peptide position information to setup a so called coverage column which specifies positions in the protein are found in the peptide information data.frame. Example

proteinCoverage(sequence = standardProtein(),
                peptideTable = OVATable("peptide"),
                start = 2, nterm = 1,
                Accession = "P01012", positionColumn = "PositionsinProteins") %>%
  slice(56:65)

   position sequence coverage
1        56        K        0
2        57        V        0
3        58        V        0
4        59        R        1
5        60        F        1
6        61        D        1
7        62        K        1
8        63        L        1
9        64        P        1
10       65        G        1

As can be seen, position 56 to 58 were ‘tagged’ with a 0 indicating none of the peptides in the peptide table contained these protein positions. The positions marked 1 were found in at least one of the peptides.

ovaCoverage <- proteinCoverage(sequence = OVATable("protein")$Sequence,
                               peptideTable = OVATable("peptide"),
                               Accession = "P01012", positionColumn = "PositionsinProteins")
ovaProtein <- bind_cols(ovaProtein,
                        ovaCoverage %>% select(coverage))
ovaProtein[56:65,] %>% tibble()

# A tibble: 10 × 8
   position sequence K      trypticSite realTrypticSite Abundance Score coverage
      <int> <chr>    <chr>  <chr>       <chr>               <dbl> <dbl> <chr>   
 1       56 V        " "    ""          non tryptic             0   0   0       
 2       57 V        " "    ""          non tryptic             0   0   0       
 3       58 R        " "    "tryptic"   tryptic                 0   0   0       
 4       59 F        " "    ""          non tryptic      78966057  10.7 1       
 5       60 D        " "    ""          non tryptic      78966057  10.7 1       
 6       61 K        "Lysi… "tryptic"   tryptic          78966057  10.7 1       
 7       62 L        " "    ""          non tryptic     135071761  11.0 1       
 8       63 P        " "    ""          non tryptic     135071761  11.0 1       
 9       64 G        " "    ""          non tryptic     135071761  11.0 1       
10       65 F        " "    ""          non tryptic     135071761  11.0 1

Displaying protein Info

There’s two main functions for this:

displayProtein for graphical display (via ggplot2’s geom_tile function)
displayProteinText for displaying the protein information as colored text in the console or in HTML format

Examples of displaying protein information in different ways…

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"))

For Quarto/R Markdown you can’t really use displayProteinText except when using html-output with for example ‘html’ tables as with the gt library

displayProteinText & gt tables

coverageProtein <- displayProteinText(ovaProtein, columns = 50, 
                                      textColorColumn = "coverage", textColors = c("black","red"),
                                      HTMLoutput = TRUE)
trypticProtein <- ovaProtein %>%
  mutate(realTrypticSite = as.integer(realTrypticSite == "tryptic")) %>%
  displayProteinText(columns = 50,
                     textColorColumn = "realTrypticSite", textColors = c("black","red"),
                     HTMLoutput = TRUE)
combinedProtein <- ovaProtein %>%
  mutate(realTrypticSite = as.integer(realTrypticSite == "tryptic")) %>%
  displayProteinText(columns = 50,
                     textColorColumn = "coverage", textColors = c("black","red"),
                     textBackgroundColorColumn = "realTrypticSite",
                     textBackgroundColors = c("white","steelblue"),
                     HTMLoutput = TRUE)
library(gt)
data.frame(Description = c("Coverage",
                           "Tryptic sites",
                           "Coverage & Tryptic sites"),
           Sequence = c(coverageProtein,
                        trypticProtein,
                        combinedProtein)) %>%
  gt::gt() %>%
  gt::fmt_markdown(columns = Sequence) %>%
  gt::tab_style(
    style = list(
      gt::cell_text(font = "courier")
    ),
    locations = gt::cells_body(
      columns = "Sequence"
    )
  )

Description	Sequence
Coverage	GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTR TQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFS LASRLYAEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVE SQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKDEDTQAMPFRV TEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVS GLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVD AASVSEEFRADHPFLFCIKHIATNAVLFFGRCVSP
Tryptic sites	GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTR TQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFS LASRLYAEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVE SQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKDEDTQAMPFRV TEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVS GLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVD AASVSEEFRADHPFLFCIKHIATNAVLFFGRCVSP
Coverage & Tryptic sites	GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTR TQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFS LASRLYAEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVE SQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKDEDTQAMPFRV TEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVS GLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVD AASVSEEFRADHPFLFCIKHIATNAVLFFGRCVSP

To display the coverage and tryptic sites with displayProtein

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"),
               fillColorColumn = "realTrypticSite",
               fillColors = c("white", "steelblue"), replaceFillNA = "non tryptic",
               fillColorLevels = c("non tryptic","tryptic"), fillLegendTitle = "Tryptic sites")

Originally the background color of the letters (tiles) was meant to be of a continous type (like a gradient)

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"),
               fillColorColumn = "Score",
               fillColors = c("white", "blue"), alpha = 0.55,
               fillLegendTitle = "Score")

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"),
               fillColorColumn = "Abundance",
               fillColors = c("white", "blue"), alpha = 0.55,
               fillLegendTitle = "Abundance")

Since the range of Abundance is rather big, the ‘fillColor’ can be made logarithmic to show lower ‘levels’ of Abundance a bit better.

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"),
               fillColorColumn = "Abundance",
               fillColors = c("white", "blue"), alpha = 0.55,
               fillTransformation = "log10",
               fillLegendTitle = "Abundance")

Modifications

To display modifications, there are a number of options. A few are displayed below.

This code shows, based on the example peptide table, where there are ‘Carbamidomethyl’ (C) or “Phospho” (S/T/Y) modifications. The column clear shows if the position is known:

means exact position known
exact position not known, the actual location may be another position (amino acid) usually close by
combination of 1 & 2: there are peptide hits with exact known positions and there are those where it’s not clear where the modifications is located precisely

getAllModPositionInProtein(peptideTable = OVATable("peptide"),
                           Accession = "P01012",
                           whichModification = c("Carbamidomethyl","Phospho"),
                           positionColumn = "PositionsinProteins",
                           sumPositions = TRUE)

$Carbamidomethyl
  position clear
1       73     1
2      120     1
3      367     1

$Phospho
  position clear
1       68     3
2       75     2
3       76     2
4       81     2
5       82     2
6      344     1

To use the info we must map the info onto the protein data

newData <- getAllModPositionInProtein(peptideTable = OVATable("peptide"),
                           Accession = "P01012",
                           whichModification = "Phospho",
                           positionColumn = "PositionsinProteins",
                           sumPositions = TRUE)
ovaProtein <- addDataToProtein(proteinDF = ovaProtein,
                               dataframe = newData$Phospho,
                               dataColumn = "clear", dataName = "Phospho", NAValue = "0")

ovaProtein %>%
  select(position, sequence, coverage, Phospho) %>%
  slice(66:75) %>%
  tibble()

# A tibble: 10 × 4
   position sequence coverage Phospho
      <int> <chr>    <chr>    <chr>  
 1       66 G        1        0      
 2       67 D        1        0      
 3       68 S        1        3      
 4       69 I        1        0      
 5       70 E        1        0      
 6       71 A        1        0      
 7       72 Q        1        0      
 8       73 C        1        0      
 9       74 G        1        0      
10       75 T        1        2

And to display it:

displayProtein(ovaProtein, columns = 50, 
               textColorColumn = "coverage", textColors = c("black","red"),
               fillColorColumn = "Phospho",
               fillColors = c("white", "lightblue","steelblue","blue"),
               fillColorLevels = c("0","1","2","3"), replaceFillNA = "0",
               fillColorLabels = c("No label","Clear", "Unclear", "Clear & Unclear"),
               fillLegendTitle = "Phosphorylation sites")

Another way to show abundance and the modification positions at the same time:

ovaProtein$Phospho <- ifelse(ovaProtein$Phospho == "0", "", "*")

plotProteinVariable(proteinDF = ovaProtein, column = "Abundance",
                    yLabel = "Abundance (log10)", yLog = TRUE,
                    ylogSetValue = 1E4, yLimits = c(1E3, 2E10),
                    drawYzero = TRUE,  zeroType = "dotted", zeroAlpha = 0.5,
                    title = "Peptide Abundance & Phosphorylation across the protein",
                    addSequence = TRUE, sequenceColumn = "Phospho",
                    textSize = 4, textColor = "blue",
                    caption = "Due to log10 transformation all 'zero' values were set to 10000",
                    )

3-D visualization

It’s also possible to map the information onto 3D displays of protein structures via the r3dmol library. Main difference here is that since ggplot2 is not used the colors need to be calculated beforehand and then mapped onto the protein structure. The proteinAnnotation package provides a set of color ‘translation’ functions.

library(r3dmol)
# get pdb file from AlphaFold: https://alphafold.ebi.ac.uk/entry/P01012
downloadedFile <- tempfile(fileext = ".pdb")
download.file("https://alphafold.ebi.ac.uk/files/AF-P01012-F1-model_v4.pdb", destfile = downloadedFile)
# generate a 3d object
p3d <- r3dmol(viewer_spec = m_viewer_spec(
  cartoonQuality = 20,
  lowerZoomLimit = 50,
  upperZoomLimit = 350
  )) %>%
  m_add_model(downloadedFile) %>%
  m_set_style(style = m_style_cartoon(arrows = F)) %>%
  m_zoom_to()
# display w/o modifications
p3d

Since the protein data we’ve worked with so far is actually missing the Methionine residue at the n-terminus (since what you work with in the lab usually is the mature version of a protein), we need to add it to the protein data

ovaProtein <- rbind(data.frame(position = 0, sequence = "M", K = " ", trypticSite = "",
                                realTrypticSite = "non tryptic", Abundance = 0,
                                Score = 0, coverage = 0, Phospho = "") , ovaProtein)
ovaProtein$position <- 1:386

To map the tryptic sites on the 3D structure:

colorFunction <- translateRangeToColors(rangeValues = "tryptic", insideColor = "red", outsideColor = "white", inbetween = FALSE)

ovaProtein$realTrypticSiteColor <- colorFunction(ovaProtein$realTrypticSite)

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "realTrypticSiteColor", arrows = FALSE)

Of course we can still add styles/items to the 3D display

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "realTrypticSiteColor", arrows = FALSE) %>%
  m_add_style(
    style = c(
      m_style_stick(colorScheme = "default")
    ),
    sel = m_sel(resi = ovaProtein$position[ovaProtein$realTrypticSite == "tryptic"])
  )

To map, as an example, the peptide abundance information we can use the translateToGradientColors function

linearColors <- translateToGradientColors(minValue = min(ovaProtein$Abundance),
                                          maxValue = max(ovaProtein$Abundance),
                                          colors = c("white", "darkblue"))
ovaProtein$AbundanceColors <- linearColors(ovaProtein$Abundance)

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "AbundanceColors", arrows = FALSE)

To use a ‘logarithmic’ color scale:

logColors <- translateToGradientColors(minValue = min(ovaProtein$Abundance),
                                       maxValue = max(ovaProtein$Abundance),
                                       colors = c("white", "darkblue"),
                                       transformFunction = log10p)
ovaProtein$AbundanceLogColors <- logColors(ovaProtein$Abundance)

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "AbundanceLogColors", arrows = FALSE)

Of course we can also map other data, such as the b-values which are present in the pdb file itself. The b-values are a measure for the model confidence of AlphaFold, see also: Ovalbumin AlphaFold entry. This code reproduces what is seen in the 3D viewer on the AlphaFold site:

pdbInfo <- bio3d::read.pdb(downloadedFile)

atoms <- as.data.frame(pdbInfo$atom) %>%
  dplyr::distinct(resno, .keep_all = T)

ovaProtein$b <- atoms$b

dfColors <- data.frame(rangeValues = NA,
                       insideColors = c("#0053D6", "#65CBF3", "#FFDB13", "#FF7D45"),
                       columnName = "b", inbetween = T)
dfColors$rangeValues <- list(c(90,Inf), c(70,90), c(50, 70), c(-Inf, 50))

ovaProtein$bColor <- translateRangesToColors(ranges = dfColors,
                                             proteinData = ovaProtein, outsideColor = "purple")

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "bColor", arrows = FALSE)

And of course we can make the whole thing spin around:

addStyleElement(p3d, proteinDF = ovaProtein, colorColumn = "bColor", arrows = FALSE) %>%
  m_rotate(angle = 90, axis = "y") %>%
  m_spin(speed = 1)

Closing remarks

This package is a work in progress. One of the things that needs adjustment is the naming conventions of especially the function arguments.

I noticed myself that all functions are very dependent on using the right sequence, especially when creating the protein data. If care is not taken it may easily result in shifting of data. I intend to build some safeguards for this into the functions, but for now it’s recommended to double-check everything that comes out.

Contact Info: benbruyneel@gmail.com