Four datasets used to benchmark GPMelt against other methods (protein-level)
1 Introduction
In this file, we discuss the parameters for the following datasets:
- Staurosporine 2014 (Savitski et al. 2014)
- Dasatinib (Savitski et al. 2014)
- ATP 2015 (Reinhard et al. 2015)
- Panobinostat (Franken et al. 2015)
This page only documents the choices made in the parameters.txt file. The results are not discussed here (see (Le Sueur, Rattray, and Savitski 2024) for a discussion of the results).
2 Data loading and preprocessing.
The pre-processed data were downloaded directly from the supplementary material of (Childs et al. 2019).
These pre-processed data, with names changed to match the GPMelt nomenclature, can be downloaded from Zenodo as GPMelt_results/GPMelt_inputDataset.rds
3 The complete parameters.txt file for the three-level HGP model.
"Scaling" : None,  # (1)
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2_FreeLevels3",  # (2)
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "min",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Conc_0",  # (3)
"training_iterations": 300,  # (4)
"LearningRate" : 0.1,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"  # (5)
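For a quick sanity check of the values listed above, the listing can be read into a Python dictionary. The following is only a minimal sketch: it assumes the listing is a comma-separated series of "key" : value pairs whose values are all Python literals, and the helper parse_parameters is a hypothetical name, not part of GPMelt. It does not show how GPMelt itself reads parameters.txt.

```python
import ast

# Body of the parameters listing shown above (annotation markers removed).
PARAMETERS_BODY = '''
"Scaling" : None,
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2_FreeLevels3",
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "min",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Conc_0",
"training_iterations": 300,
"LearningRate" : 0.1,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"
'''


def parse_parameters(body: str) -> dict:
    """Hypothetical helper: wrap the key/value pairs in braces and parse
    them as a Python dict literal. Only literals are evaluated, so the
    "mean" entry stays a plain string and is never executed."""
    return ast.literal_eval("{" + body + "}")


params = parse_parameters(PARAMETERS_BODY)
print(params["modelType"])
print(params["training_iterations"], params["LearningRate"])
```

Using ast.literal_eval rather than eval is a deliberate choice in this sketch: it accepts None, True, False, numbers and strings, and rejects anything that would run code.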
- 1 (Scaling): The original Fold Change values (found in the pre-processed data) were kept, thus Scaling was chosen to be None.
- 2 (modelType): We use only one lengthscale, because the data do not contain enough information to robustly fit a second lengthscale (only two replicates and two conditions for all datasets except Dasatinib, which has two to four replicates and two to three conditions). We allow each replicate to have a different output-scale to capture the larger variations observed in some noisier replicates (see for example Fig 6A and B from (Le Sueur, Rattray, and Savitski 2024)).
- 3 (control_condition): For all datasets, the name Conc_0 was used for the control condition.
- 4 (training_iterations): The data being relatively simple, we observed that 300 iterations were enough here. Using more iterations would not have been computationally expensive either.
- 5 (GPMelt_statistic): A dataset-wise approximation has been used, with \(S_p=10\) samples drawn for each ID of each dataset.
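The model choice in annotation 2 (one lengthscale shared by all levels, one output-scale per replicate at the third level) can be illustrated with a small covariance function. This is a sketch under stated assumptions, not the GPMelt implementation: it assumes a squared-exponential (RBF) kernel and an additive three-level structure (protein, condition, replicate), and the names rbf, three_level_cov and all numerical values are hypothetical.

```python
import math


def rbf(t1, t2, lengthscale):
    """Squared-exponential correlation between two temperatures."""
    return math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def three_level_cov(x1, x2, lengthscale, var_level1, var_level2, var_level3):
    """Covariance between two observations x = (temperature, condition, replicate).

    Assumed structure (illustrative only):
      level 1: shared by every observation of the protein,
      level 2: shared by observations of the same condition,
      level 3: shared by observations of the same replicate, with one
               output-scale per replicate (dict keyed by (condition, replicate)).
    The same lengthscale is used at all three levels.
    """
    t1, cond1, rep1 = x1
    t2, cond2, rep2 = x2
    shared = rbf(t1, t2, lengthscale)
    cov = var_level1 * shared
    if cond1 == cond2:
        cov += var_level2 * shared
        if rep1 == rep2:
            cov += var_level3[(cond1, rep1)] * shared
    return cov


# Illustrative values only: replicate 2 is noisier than replicate 1.
var_level3 = {("Conc_0", 1): 0.05, ("Conc_0", 2): 0.20}
a = (40.0, "Conc_0", 1)
b = (40.0, "Conc_0", 2)

var_rep1 = three_level_cov(a, a, 5.0, 1.0, 0.5, var_level3)
var_rep2 = three_level_cov(b, b, 5.0, 1.0, 0.5, var_level3)
cov_across = three_level_cov(a, b, 5.0, 1.0, 0.5, var_level3)
print(var_rep1, var_rep2, cov_across)
```

With a free third-level output-scale, the noisier replicate gets a larger variance of its own, while the covariance between replicates of the same condition is unaffected.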
References
Childs, Dorothee, Karsten Bach, Holger Franken, Simon Anders, Nils Kurzawa, Marcus Bantscheff, Mikhail M Savitski, and Wolfgang Huber. 2019. “Nonparametric Analysis of Thermal Proteome Profiles Reveals Novel Drug-Binding Proteins.” Molecular & Cellular Proteomics 18 (12): 2506–15.
Franken, Holger, Toby Mathieson, Dorothee Childs, Gavain MA Sweetman, Thilo Werner, Ina Tögel, Carola Doce, et al. 2015. “Thermal Proteome Profiling for Unbiased Identification of Direct and Indirect Drug Targets Using Multiplexed Quantitative Mass Spectrometry.” Nature Protocols 10 (10): 1567–93.
Le Sueur, Cecile, Magnus Rattray, and Mikhail Savitski. 2024. “GPMelt: A Hierarchical Gaussian Process Framework to Explore the Dark Meltome of Thermal Proteome Profiling Experiments.” PLOS Computational Biology 20 (9): e1011632.
Reinhard, Friedrich BM, Dirk Eberhard, Thilo Werner, Holger Franken, Dorothee Childs, Carola Doce, Maria Fälth Savitski, et al. 2015. “Thermal Proteome Profiling Monitors Ligand Interactions with Cellular Membrane Proteins.” Nature Methods 12 (12): 1129–31.
Savitski, Mikhail M, Friedrich BM Reinhard, Holger Franken, Thilo Werner, Maria Fälth Savitski, Dirk Eberhard, Daniel Martinez Molina, et al. 2014. “Tracking Cancer Drugs in Living Cells by Thermal Profiling of the Proteome.” Science 346 (6205): 1255784.