Staurosporine 2021 dataset (protein-level)
1 Introduction
In this file, we discuss the parameters for the Staurosporine 2021 dataset (Zinn et al. 2021).
The goal of this page is only to specify the choice of the parameters.txt file; the results are not discussed here (see Le Sueur, Rattray, and Savitski (2024) for a discussion of the results).
2 Data loading and preprocessing.
The dataset can be downloaded from Zenodo.
The preprocessed dataset can be found in Staurosporine2021/data/Dataset_TMT11_2Rep2Cond_PearsonCorrelationAbove04.rds.
Note: The code for preprocessing the data for this analysis has not yet been added to the GitLab repository. The TMT11 dataset was used, keeping only proteins with 2 replicates in each condition and a Pearson correlation above 0.4 between replicates within a condition.
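The filtering rule described in the note can be sketched as follows. This is only a minimal illustration, not the preprocessing code used for the analysis. The data layout (a dictionary mapping each protein ID to its replicate curves per condition), the function names `pearson` and `keep_protein`, the treatment label "Staurosporine", and the toy values are all hypothetical. The sketch also assumes that "2 replicates" means exactly two curves per condition.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of fold changes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a flat curve
    return cov / sqrt(var_x * var_y)


def keep_protein(curves, conditions, n_replicates=2, min_corr=0.4):
    """Return True if every condition has n_replicates curves that all
    correlate above min_corr with each other (hypothetical helper)."""
    for condition in conditions:
        replicates = curves.get(condition, [])
        if len(replicates) != n_replicates:
            return False
        for rep_a, rep_b in combinations(replicates, 2):
            # A NaN correlation fails this comparison, so the protein is dropped.
            if not pearson(rep_a, rep_b) > min_corr:
                return False
    return True


# Toy fold-change curves (one value per temperature); values are made up.
toy = {
    "P1": {  # two well-correlated replicates in both conditions
        "Control": [[1.0, 0.8, 0.4, 0.1], [1.0, 0.9, 0.5, 0.2]],
        "Staurosporine": [[1.0, 0.9, 0.6, 0.2], [1.0, 0.85, 0.55, 0.25]],
    },
    "P2": {  # control replicates are poorly correlated
        "Control": [[1.0, 0.8, 0.4, 0.1], [1.0, 0.2, 0.9, 0.6]],
        "Staurosporine": [[1.0, 0.9, 0.6, 0.2], [1.0, 0.85, 0.55, 0.25]],
    },
    "P3": {  # one control replicate is missing
        "Control": [[1.0, 0.8, 0.4, 0.1]],
        "Staurosporine": [[1.0, 0.9, 0.6, 0.2], [1.0, 0.85, 0.55, 0.25]],
    },
}

kept = [pid for pid, curves in toy.items()
        if keep_protein(curves, ("Control", "Staurosporine"))]
print(kept)
```

In this toy example only the first protein passes both criteria. The thresholds are exposed as arguments so that the same helper could be reused with a different replicate count or correlation cutoff.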
3 The complete parameters.txt file for the three-level HGP model.
"Scaling" : None,  # (1)
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2_FreeLevels3",  # (2)
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "min",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Control",
"training_iterations": 300,  # (3)
"LearningRate" : 0.1,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"  # (4)
(1) The original Fold Change values (found in the downloaded data) were kept, thus Scaling was set to None.
(2) We use only one lengthscale, because the data do not contain enough information (only two replicates and two conditions) to robustly fit a second lengthscale. We allow each replicate to have a different output-scale to capture the larger variations observed in some noisier replicates.
(3) Because the data are relatively simple, we observed that 300 iterations were sufficient here. Using more iterations would not have been computationally prohibitive either.
(4) A dataset-wise approximation has been used, with \(S_p=10\) samples drawn for each ID of each dataset.
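As a quick sanity check of the values listed above, the entries can be loaded into a Python dictionary. The sketch below assumes that wrapping the entries in braces yields a valid Python dict literal; this page does not describe how GPMelt itself reads parameters.txt, so `parse_parameters` is a hypothetical helper and not part of the package.

```python
import ast

# The entries of parameters.txt as listed on this page (annotation markers removed).
PARAMETERS_TEXT = '''
"Scaling" : None,
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2_FreeLevels3",
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "min",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Control",
"training_iterations": 300,
"LearningRate" : 0.1,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"
'''


def parse_parameters(text):
    """Parse the key/value entries as a dict literal (assumed format).

    ast.literal_eval only accepts literals (strings, numbers, None, booleans),
    so values such as "gpytorch.means.ZeroMean()" stay plain strings here.
    """
    return ast.literal_eval("{" + text + "}")


params = parse_parameters(PARAMETERS_TEXT)

# Checks mirroring the choices documented in the annotations.
assert params["Scaling"] is None                     # original fold changes kept
assert "OneLengthscale" in params["modelType"]       # single lengthscale
assert params["training_iterations"] == 300
assert params["GPMelt_statistic"] == "dataset-wise"

for key, value in params.items():
    print(f"{key:28s} {value!r}")
```

Using `ast.literal_eval` rather than `eval` keeps the check safe, since no code contained in the file is executed.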
References
Le Sueur, Cecile, Magnus Rattray, and Mikhail Savitski. 2024. “GPMelt: A Hierarchical Gaussian Process Framework to Explore the Dark Meltome of Thermal Proteome Profiling Experiments.” PLOS Computational Biology 20 (9): e1011632.
Zinn, Nico, Thilo Werner, Carola Doce, Toby Mathieson, Christine Boecker, Gavain Sweetman, Christian Fufezan, and Marcus Bantscheff. 2021. “Improved Proteomics-Based Drug Mechanism-of-Action Studies Using 16-Plex Isobaric Mass Tags.” Journal of Proteome Research 20 (3): 1792–1801.