proteinDiscover: Changes & New functions

Introduction

Since the initial release of the package, the original functions have been debugged and changed, and a set of new functions has been developed. Most of the data structures/tables in the .pdResult databases used by this package have been ‘reverse engineered’, i.e. I looked at what Proteome Discoverer produces and attempted to figure out how this relates to what is in the file at the database level. Some functions tend to be slow because they make repeated database calls, especially when the database is accessed over a low-bandwidth connection.
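One general way to cut down on repeated database calls is to look up a whole set of keys with a single query instead of one query per key. The Python sketch below illustrates the difference using the standard `sqlite3` module and a small in-memory database, assuming the .pdResult file is an SQLite database. The table name `TargetProteins`, its columns and both helper functions are assumptions for illustration only; they are not part of the package.

```python
import sqlite3


def lookup_one_by_one(con, accessions):
    # One query per accession: every iteration is a separate database call.
    ids = []
    for acc in accessions:
        row = con.execute(
            "SELECT UniqueSequenceID FROM TargetProteins WHERE Accession = ?",
            (acc,),
        ).fetchone()
        ids.append(row[0] if row else None)
    return ids


def lookup_batched(con, accessions):
    # A single query with an IN (...) clause fetches all matches at once.
    if not accessions:
        return []
    placeholders = ",".join("?" for _ in accessions)
    rows = con.execute(
        "SELECT Accession, UniqueSequenceID FROM TargetProteins "
        f"WHERE Accession IN ({placeholders})",
        list(accessions),
    ).fetchall()
    found = dict(rows)  # Accession -> UniqueSequenceID
    # Keep the input order; accessions without a match map to None.
    return [found.get(acc) for acc in accessions]


# Hypothetical in-memory stand-in for a .pdResult file
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TargetProteins (UniqueSequenceID INTEGER, Accession TEXT)")
con.executemany(
    "INSERT INTO TargetProteins VALUES (?, ?)",
    [(101, "P12345"), (102, "Q67890"), (103, "O11111")],
)
accs = ["Q67890", "P12345", "X00000"]
print(lookup_one_by_one(con, accs))  # [102, 101, None]
print(lookup_batched(con, accs))     # [102, 101, None]
```

Both functions return the same result; the batched version needs one database call regardless of the number of accessions. SQLite limits the number of `?` parameters per statement, so very long lists would have to be split into chunks.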

This package remains a work in progress. My current focus is on debugging and streamlining the code. Some code is not yet ready to be included in the package, or should perhaps stay out altogether because it is very specific (not general enough). I have put some of this code in the repository proteinDiscoverExtra.

Another important thing that needs work is some form of package testing, but due to the size of .pdResult (test) files, I’m still working out how exactly to do this.

Changes to existing functions

There is only a single major change. Most other changes are updated or corrected manual pages, etc.

The way so-called special columns are ‘translated’ from the raw data format () has changed. Previously they were translated into separate logical (boolean) columns; now they are simply transformed into strings of 0s and 1s, where 0 = FALSE and 1 = TRUE (e.g. “0110000110001”). The reason for this change is that the previous method produced a lot of extra columns that were hard to work with. This applies to the columns:
- AspectBiologicalProcess
- AspectCellularComponent
- AspectMolecularFunction
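To illustrate the new encoding, the Python sketch below turns such a string back into booleans and converts it back again. The helper names are hypothetical, and since which position corresponds to which category is not described here, the optional `labels` argument is only a placeholder for that mapping.

```python
def decode_flags(bits, labels=None):
    """Turn a '0'/'1' string into a list of booleans (or a label -> bool dict)."""
    if set(bits) - {"0", "1"}:
        raise ValueError(f"unexpected character in flag string: {bits!r}")
    flags = [ch == "1" for ch in bits]  # '1' -> True, '0' -> False
    if labels is None:
        return flags
    # Hypothetical: pair each position with a caller-supplied category name
    if len(labels) != len(flags):
        raise ValueError("number of labels must match the string length")
    return dict(zip(labels, flags))


def encode_flags(flags):
    """Inverse of decode_flags: booleans back into a '0'/'1' string."""
    return "".join("1" if f else "0" for f in flags)


flags = decode_flags("0110000110001")
print(flags[:4])           # [False, True, True, False]
print(sum(flags))          # 5 (number of TRUE positions)
print(encode_flags(flags)) # 0110000110001
```

Keeping one string per row means the table keeps a single column per aspect, and the flags only need to be expanded when a particular analysis requires them.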

New functions

Protein Grouping

Name	Description
dbGetProteinGroups	get Protein Group information
dbGetProteins	get protein table/info based on their UniqueSequenceID
dbGetProteinUniqueSequenceIDs	get protein UniqueSequenceIDs via the Accession
dbGetProteinGroupIDs	retrieve the ProteinGroupID of proteins via their UniqueSequenceID
dbGetProteinIDs	obtain the UniqueSequenceID for all proteins belonging to a Protein Group
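The grouping functions move between proteins (UniqueSequenceID) and protein groups (ProteinGroupID) in both directions. The Python sketch below models that link with hypothetical in-memory rows, assuming a protein can belong to more than one group; `group_ids_for` and `protein_ids_for` are illustrative analogues of dbGetProteinGroupIDs and dbGetProteinIDs, not the package’s functions or the actual .pdResult schema.

```python
from collections import defaultdict

# Hypothetical link rows: (ProteinGroupID, UniqueSequenceID)
links = [(1, 101), (1, 102), (2, 102), (2, 103), (3, 104)]

# Build lookups in both directions from the same link rows
group_to_proteins = defaultdict(set)
protein_to_groups = defaultdict(set)
for group_id, seq_id in links:
    group_to_proteins[group_id].add(seq_id)
    protein_to_groups[seq_id].add(group_id)


def group_ids_for(seq_ids):
    """Groups containing any of the given proteins (cf. dbGetProteinGroupIDs)."""
    return sorted(set().union(*(protein_to_groups.get(s, set()) for s in seq_ids)))


def protein_ids_for(group_id):
    """All proteins belonging to one group (cf. dbGetProteinIDs)."""
    return sorted(group_to_proteins.get(group_id, set()))


print(group_ids_for([102]))  # [1, 2]
print(protein_ids_for(2))    # [102, 103]
```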

Protein Annotation

(work in progress)

Name	Description
dbGetProteinAnnotationGroupIDs	get the Protein Annotations based on the UniqueSequenceID of proteins
dbGetAnnotatedProteins	retrieve UniqueSequenceID of proteins having an Annotation
dbGetAnnotationGroups	get the Protein Annotation info table for the Annotation IDs
dbGetAnnotationGroupsFiltered	obtain the Protein Annotation info table via the GroupAnnotationAccession or description

Modifications

Note that these functions require the modifications table to be present in the .pdResult file (use the Modifications node in Proteome Discoverer).

Name	Description
dbGetModificationsSitesIDs	get the modificationSite IDs from (a set of) protein IDs
dbGetModificationsTable	get data from the ModificationSites table using the modificationSiteIDs
dbGetModificationPeptideIDs	obtain the peptide IDs ‘belonging’ to a modification site

Workflows

Name	Description
allNodesTable	put the parameters of all nodes in a workflow into a single (large) ordered table
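Conceptually, allNodesTable flattens the per-node parameter lists of a workflow into one long table. The Python sketch below shows that flattening on a hypothetical workflow; the node positions, parameter names and values are made up for illustration, and the sort order (node position, then parameter name) is an assumption rather than the package’s documented ordering.

```python
# Hypothetical workflow: each node has a position and its own parameters
workflow = [
    {"node": "Spectrum Files", "position": 0,
     "parameters": {"File Names": "run1.raw"}},
    {"node": "Sequest HT", "position": 2,
     "parameters": {"Enzyme Name": "Trypsin (Full)",
                    "Max. Missed Cleavage Sites": "2"}},
    {"node": "Spectrum Selector", "position": 1,
     "parameters": {"Lowest Charge State": "2"}},
]


def all_nodes_table(nodes):
    """Flatten per-node parameters into (position, node, parameter, value) rows."""
    rows = [
        (n["position"], n["node"], name, value)
        for n in nodes
        for name, value in n["parameters"].items()
    ]
    # Assumed ordering: node position first, then parameter name
    return sorted(rows, key=lambda r: (r[0], r[2]))


for row in all_nodes_table(workflow):
    print(row)
```

A single long table of this shape is easy to filter by node or parameter, or to compare between two workflows.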

TMT quantitation

Work in progress: some of these functions may still change substantially or even disappear in the future. Functions marked with status R will probably no longer be exported in later versions because they have no real use outside the package. Functions marked M may be removed, but this is not certain. Everything works, but may still change significantly.

Name	Description	Status
calcData	helper function to calculate a row-wise function (e.g. mean, median)	R
knockOutProteins	helper function to generate an (example) data.frame of protein info for other functions	M
tmt11Channels	helper function to generate a data.frame with info on the TMT11 triple knockout digest sample	M
tmt10Channels	same as tmt11Channels but for the TMT10 triple knockout digest sample	M
getProteinInfoRaw	get protein information based on their Accessions
getProteinInfo	same as getProteinInfoRaw but does (automated) translation of raw columns
getPeptideInfoRaw	get peptide information for proteins based on their Accessions
getPeptideInfo	same as getPeptideInfoRaw but does (automated) translation of raw columns
calcIFIs	calculate the IFI (interference-free index) of a protein
calcAllIFIs	calculate the IFI for a set of (knock out) protein channels
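calcData is described as a helper that applies a row-wise function such as mean or median. A minimal Python sketch of that idea follows; `calc_rows`, the channel names and the abundance values are hypothetical, and skipping missing values is an assumption, not necessarily what calcData does.

```python
import statistics


def calc_rows(rows, columns, func=statistics.median):
    """Apply func to the values of `columns` in each row, ignoring missing (None) values."""
    results = []
    for row in rows:
        values = [row[c] for c in columns if row.get(c) is not None]
        # Rows without any usable value get None instead of an error
        results.append(func(values) if values else None)
    return results


# Hypothetical TMT channel abundances for three proteins
abundances = [
    {"126": 100.0, "127N": 110.0, "127C": 90.0},
    {"126": 50.0, "127N": None, "127C": 70.0},
    {"126": None, "127N": None, "127C": None},
]
channels = ["126", "127N", "127C"]
print(calc_rows(abundances, channels))                   # [100.0, 60.0, None]
print(calc_rows(abundances, channels, statistics.mean))  # [100.0, 60.0, None]
```

Passing the summary function as an argument keeps one helper for mean, median or any other reduction.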

Examples

A few examples of how to use these (new) functions can be found in the repository proteinDiscoverExtra. It also contains a few examples of ‘extensions’ of the functions in the package.