Phospho-TPP: a peptide-level dataset
1 Introduction
In this file, we discuss the analyses of the peptide-level dataset, called the Phospho-TPP dataset in (Le Sueur, Rattray, and Savitski 2024). This dataset comes from Potel et al. (2021).
In (Le Sueur, Rattray, and Savitski 2024), we performed two types of analyses on this dataset: the first using a three-level HGP model and the second using a four-level HGP model.
The goal of this page is only to specify the choice of the parameters.txt file; the results are not discussed here (see (Le Sueur, Rattray, and Savitski 2024) for a discussion of the results).
2 Data loading and preprocessing.
The dataset can be downloaded from Zenodo. Data are in the folders PhosphoTPP/data and PhosphoTPP/prerun.
The preprocessing of the data is done in the preprocessing file found on the gitlab repository: gpmelt/Analysis/ATP2019/PhosphoTPP_DataPreparation.qmd.
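As a quick sanity check after downloading the archive from Zenodo, the expected folder layout can be verified before running the preprocessing file. This is a minimal sketch, assuming the archive has been unpacked into a local PhosphoTPP/ directory; the individual file names inside the folders are not listed here.

```python
from pathlib import Path

# Assumed location of the unpacked Zenodo archive (adjust as needed).
root = Path("PhosphoTPP")

# Folders expected by the analysis, as described above.
for folder in ["data", "prerun"]:
    path = root / folder
    if not path.is_dir():
        raise FileNotFoundError(f"Expected folder not found: {path}")
    print(f"{path}: {sum(1 for _ in path.iterdir())} entries")
```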
3 The complete parameters.txt file for the three-level HGP model.
1"Scaling" : "mean",
2"modelType" : "3Levels_TwoLengthscales_FixedLevels1and2_FreeLevel3",
"lengthscale_prior": None,
3"lengthscale_MinConstraint" : "max",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Non-phospho",
"training_iterations": 700 , <4>
"LearningRate" : 0.1,
"Amsgrad" : True, <5>
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
6"GPMelt_statistic" : "ID-wise"
- <1> We use the mean scaling on this dataset, in which about half of the melting curves are non-sigmoidal (a short sketch of this scaling is given after this list).
- <2> We use two lengthscales, as peptide-level replicates present fast variations. We also allow each replicate to have a different output-scale, to capture the larger variations originating from the higher level of noise in peptide-level TPP-TR datasets.
- <3> Because peptide-level observations are noisier, we favor larger lengthscales to obtain smoother melting curves.
- <6> A final size of \(S = 1e4\) has been chosen. Note: we also propose a group-wise analysis in (Le Sueur, Rattray, and Savitski 2024); see Supporting Information B, paragraph “The choice of the null distribution approximation: an example”, and Fig S in S1 file.
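To illustrate what the mean scaling of annotation <1> could look like in practice, here is a minimal pandas sketch. It assumes a long-format table with hypothetical columns id, temperature, and value, and that the scaling divides each curve by its mean across temperatures; both the column names and this exact definition are assumptions for illustration, not taken from the GPMelt implementation.

```python
import pandas as pd

# Toy long-format melting data (column names are hypothetical).
df = pd.DataFrame({
    "id":          ["pepA"] * 4 + ["pepB"] * 4,
    "temperature": [37, 45, 53, 61] * 2,
    "value":       [1.0, 0.9, 0.5, 0.2, 1.0, 1.1, 0.8, 0.6],
})

# Mean scaling: divide each curve by its mean across temperatures,
# so curves with different overall intensities become comparable.
df["scaled"] = df["value"] / df.groupby("id")["value"].transform("mean")
print(df)
```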
4 The complete parameters.txt file for the four-level HGP model.
Note: The code describing the preprocessing of the data for this analysis has not been added to the gitlab repository yet.
"Scaling" : "mean",
1"modelType" : "4Levels_TwoLengthscales_FixedLevels1and2and3_FreeLevel4",
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "max",
"mean" : "gpytorch.means.ZeroMean()",
2"control_condition": "Control",
"training_iterations": 500 ,
"LearningRate" : 0.05,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
3"GPMelt_statistic" : "ID-wise"
- <1> We use two lengthscales, as peptide-level replicates present fast variations. We also allow each replicate to have a different output-scale, to capture the larger variations originating from the higher level of noise in peptide-level TPP-TR datasets.
- <2> We compare each peptide to every other peptide. Because this feature is not implemented yet, we defined the dataset such that each peptide appears once as Control, with all other peptides as treatment conditions (a sketch of this construction is given after this list).
- <3> A final size of \(S = 1e4\) has been chosen.
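To make the construction in annotation <2> concrete, the sketch below builds, for each peptide in turn, a copy of the data in which that peptide is labeled Control and all remaining peptides keep their names as treatment conditions. The column names (peptide, condition, comparison_id) and the long-format layout are hypothetical; the actual preprocessing code for this analysis is not yet in the repository.

```python
import pandas as pd

# Toy long-format data (column names are hypothetical).
df = pd.DataFrame({
    "peptide":     ["pepA", "pepB", "pepC"],
    "temperature": [37, 37, 37],
    "value":       [1.0, 0.9, 0.8],
})

# For each peptide, relabel it as "Control" and keep all others as treatments,
# so that every peptide appears exactly once as the control condition.
comparisons = []
for ctrl in df["peptide"].unique():
    block = df.copy()
    block["condition"] = block["peptide"].where(block["peptide"] != ctrl, "Control")
    block["comparison_id"] = ctrl
    comparisons.append(block)

full = pd.concat(comparisons, ignore_index=True)
print(full)
```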