Introduction

Since the initial release of the package, several of the original functions have been debugged and changed, and a set of new functions has been developed. Most of the data structures/tables in the .pdResult databases used by this package have been ‘reverse engineered’, i.e. I looked at what Proteome Discoverer produces and tried to figure out how this relates to what is in the file at the database level. Some functions tend to be slow because they make repeated database calls, especially when the database is accessed over a low-bandwidth connection.

This package remains a work in progress. My main focus at the moment is debugging and streamlining the code. Some code is not really ready to be included in the package yet, or should perhaps stay out altogether because it is very specific (not general enough). I have put some of this code in the repository proteinDiscoverExtra.

Another important thing that still needs work is some form of package testing, but because of the size of .pdResult (test) files, I’m still figuring out exactly how to do this.


Changes to existing functions

There is only a single major change; most other changes are updated/corrected manual pages, etc.

  • The way the so-called special columns are ‘translated’ from the raw data format () has changed. They used to be translated into separate logical/boolean columns; now they are simply transformed into strings of 0s and 1s, where 0 = FALSE and 1 = TRUE (e.g. “0110000110001”). The reason for this change is that the previous method gave rise to a lot of extra columns that were hard to work with. This applies to the columns:

    • AspectBiologicalProcess
    • AspectCellularComponent
    • AspectMolecularFunction
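As a minimal sketch of what these 0/1 strings encode (not the package’s own code), the Python snippet below converts a row of logical values into such a string and back. The helper names encode_flags, decode_flags and flag_set are hypothetical, and the sketch assumes that each character position always stands for the same category, just as each separate logical column did in the old format.

```python
# Hypothetical helpers illustrating the 0/1 string representation.
# Assumption: character i always refers to the same category (formerly column i).

def encode_flags(flags):
    """Turn a sequence of booleans into a string such as '0110'."""
    return "".join("1" if flag else "0" for flag in flags)


def decode_flags(code):
    """Turn a '0'/'1' string back into a list of booleans."""
    if set(code) - {"0", "1"}:
        raise ValueError(f"not a 0/1 string: {code!r}")
    return [char == "1" for char in code]


def flag_set(code, position):
    """Check a single (0-based) position, i.e. one former logical column."""
    return code[position] == "1"


if __name__ == "__main__":
    code = encode_flags([False, True, True, False])
    print(code)                # 0110
    print(decode_flags(code))  # [False, True, True, False]
    print(flag_set(code, 1))   # True
```

Keeping one string per row means a single column per aspect instead of one column per category, at the cost of needing a position-to-category lookup when a specific flag is wanted.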

New functions

Protein Grouping

Name Description
dbGetProteinGroups get Protein Group information
dbGetProteins get protein table/info based on their UniqueSequenceID
dbGetProteinUniqueSequenceIDs get protein UniqueSequenceIDs via their Accessions
dbGetProteinGroupIDs retrieve the ProteinGroupID of proteins via their UniqueSequenceID
dbGetProteinIDs obtain the UniqueSequenceIDs of all proteins belonging to a Protein Group
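The lookups in this table form a chain: Accession → UniqueSequenceID → ProteinGroupID → all UniqueSequenceIDs in that group. The Python sketch below illustrates that chain against a tiny in-memory SQLite database. It is not the package’s implementation: the table and column names (Proteins, ProteinGroupMembers) and the helper functions are hypothetical, and the real .pdResult schema is not reproduced here.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (NOT the .pdResult layout).
SCHEMA = """
CREATE TABLE Proteins (UniqueSequenceID INTEGER PRIMARY KEY, Accession TEXT);
CREATE TABLE ProteinGroupMembers (ProteinGroupID INTEGER, UniqueSequenceID INTEGER);
"""


def _in_query(con, sql, values):
    """Run a query whose IN (...) clause is filled with one placeholder per value."""
    if not values:
        return []
    marks = ",".join(["?"] * len(values))
    return [row[0] for row in con.execute(sql.format(marks=marks), list(values))]


def unique_sequence_ids(con, accessions):
    # Accession -> UniqueSequenceID (similar idea to dbGetProteinUniqueSequenceIDs)
    return _in_query(con, "SELECT UniqueSequenceID FROM Proteins "
                          "WHERE Accession IN ({marks}) ORDER BY UniqueSequenceID",
                     accessions)


def protein_group_ids(con, useq_ids):
    # UniqueSequenceID -> ProteinGroupID (similar idea to dbGetProteinGroupIDs)
    return _in_query(con, "SELECT DISTINCT ProteinGroupID FROM ProteinGroupMembers "
                          "WHERE UniqueSequenceID IN ({marks}) ORDER BY ProteinGroupID",
                     useq_ids)


def group_member_ids(con, group_ids):
    # ProteinGroupID -> all member UniqueSequenceIDs (similar idea to dbGetProteinIDs)
    return _in_query(con, "SELECT DISTINCT UniqueSequenceID FROM ProteinGroupMembers "
                          "WHERE ProteinGroupID IN ({marks}) ORDER BY UniqueSequenceID",
                     group_ids)


def demo_connection():
    """Build a made-up example database: group 10 holds proteins 1 and 2."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO Proteins VALUES (?, ?)",
                    [(1, "P01"), (2, "P02"), (3, "P03")])
    con.executemany("INSERT INTO ProteinGroupMembers VALUES (?, ?)",
                    [(10, 1), (10, 2), (11, 3)])
    return con


if __name__ == "__main__":
    con = demo_connection()
    ids = unique_sequence_ids(con, ["P01"])
    groups = protein_group_ids(con, ids)
    print(ids, groups, group_member_ids(con, groups))  # [1] [10] [1, 2]
```

Each helper issues its own query, so chaining them costs several round trips to the database; over a slow connection, combining the steps into a single JOIN would be one way to reduce that.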

Protein Annotation

(work in progress)

Name Description
dbGetProteinAnnotationGroupIDs get the Protein Annotations based on the UniqueSequenceID of proteins
dbGetAnnotatedProteins retrieve UniqueSequenceID of proteins having an Annotation
dbGetAnnotationGroups get Protein Annotation info table for the Annotation IDs
dbGetAnnotationGroupsFiltered obtain Protein Annotation info table via the GroupAnnotationAccession or description

Modifications

Note that these functions require the modification table to be present in the .pdResult file (use the Modifications node in Proteome Discoverer).

Name Description
dbGetModificationsSitesIDs get the modification site IDs from (a set of) protein IDs
dbGetModificationsTable get data from the ModificationSites table using the modification site IDs
dbGetModificationPeptideIDs obtain the peptide IDs ‘belonging’ to a modification site

Workflows

Name Description
allNodesTable put the parameters from all nodes in a workflow into a single (large) ordered table
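As a rough illustration of what ‘a single ordered table’ of workflow parameters means, the Python sketch below flattens per-node parameter dictionaries into (node number, node name, parameter, value) rows. The Node class, the ordering (by node number, keeping each node’s own parameter order) and the example node and parameter names are all assumptions made for illustration, not the actual output format of allNodesTable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one workflow node and its parameters."""
    number: int   # assumed position of the node in the workflow
    name: str
    parameters: dict = field(default_factory=dict)


def all_nodes_table(nodes):
    """Flatten every node's parameters into one list of rows.

    Rows are ordered by node number; within a node the parameters keep
    the order in which they were defined (an assumption for this sketch).
    """
    rows = []
    for node in sorted(nodes, key=lambda n: n.number):
        for parameter, value in node.parameters.items():
            rows.append((node.number, node.name, parameter, value))
    return rows


if __name__ == "__main__":
    # Made-up workflow: node names, parameters and values are illustrative only
    nodes = [
        Node(2, "Search", {"Tolerance": "10 ppm", "Enzyme": "Trypsin"}),
        Node(1, "Input", {"File": "sample.raw"}),
    ]
    for row in all_nodes_table(nodes):
        print(row)
```

A long (one row per parameter) layout like this keeps nodes with very different parameter sets in the same table without creating a column for every possible parameter.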

TMT quantitation

Work in progress: some of these functions may still change substantially or even disappear in the future. Functions marked with status R will probably no longer be exported in later versions because they have no real use outside the package. Functions marked M may also be removed, but this is not certain. Everything works, but may still change significantly.

Name Description Status
calcData helper function to apply a row-wise function (e.g. mean, median) R
knockOutProteins helper function to generate an (example) data.frame of protein info for other functions M
tmt11Channels helper function to generate a data.frame with info on a TMT11 triple knockout digest sample M
tmt10Channels same as tmt11Channels, but for a TMT10 triple knockout digest sample M
getProteinInfoRaw get protein information based on their Accessions
getProteinInfo same as getProteinInfoRaw, but does (automated) translation of raw columns
getPeptideInfoRaw get peptide information for proteins based on their Accessions
getPeptideInfo same as getPeptideInfoRaw, but does (automated) translation of raw columns
calcIFIs calculate the IFI (interference-free index) of a protein
calcAllIFIs calculate the IFI for a set of (knock out) protein channels
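The idea behind a row-wise helper such as calcData can be sketched in Python as follows: apply one summary function (mean, median, max, etc.) across selected columns of each row. This is a hypothetical analogue, not the package’s code; the function name calc_data and the channel column names are made up, and skipping missing (None) values is an assumption about how gaps should be handled.

```python
import statistics


def calc_data(rows, columns, func=statistics.mean):
    """Apply `func` row-wise over `columns`, returning one value per row.

    Assumption: missing values (None or absent keys) are skipped; a row
    with no usable values yields None instead of raising an error.
    """
    results = []
    for row in rows:
        values = [row[c] for c in columns if row.get(c) is not None]
        results.append(func(values) if values else None)
    return results


if __name__ == "__main__":
    # Hypothetical TMT channel columns for two proteins
    rows = [
        {"Protein": "A", "ch1": 1.0, "ch2": 3.0, "ch3": 5.0},
        {"Protein": "B", "ch1": 2.0, "ch2": None, "ch3": 4.0},
    ]
    channels = ["ch1", "ch2", "ch3"]
    print(calc_data(rows, channels))                     # [3.0, 3.0]
    print(calc_data(rows, channels, statistics.median))  # [3.0, 3.0]
    print(calc_data(rows, channels, max))                # [5.0, 4.0]
```

Passing the summary function as a parameter keeps one helper for every statistic instead of separate row-mean, row-median, etc. functions.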

Examples

A few examples of how to use these (new) functions can be found in the repository proteinDiscoverExtra, along with a few examples of ‘extensions’ of the package functions.
