1 The parameters.txt file

In this step, we create the parameters.txt file, which should be saved with the data.

This file contains the parameters for the model specification and fitting.

Below, we point to the important pages to consult in order to understand these parameters and their effects on the model specification and fits. Do not hesitate to play around with these parameters to better understand them!

2 A complete example of the parameters.txt file

To start, we present below an example of the file:

{
"Scaling" : "Scaling_ToDefine",
"modelType" : "Model_ToDefine",
"lengthscale_prior": None, 
"lengthscale_MinConstraint" : "lengthscale_Constraint_ToDefine", 
"mean" : "gpytorch.means.ZeroMean()", 
"control_condition": "ControlCondition_ToDefine", 
"training_iterations": 500 , 
"LearningRate" : 0.1,
"Amsgrad" : False, 
"n_PredictionsPoints" : 50, 
"PlotSave" : True, 
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "Statistic_ToDefine"
}

Note:

  1. These parameters cannot be left blank!
  2. The following values in the example above do not correspond to default values of the parameters and should be consciously chosen by the user: Scaling, modelType, lengthscale_MinConstraint, control_condition, GPMelt_statistic.
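Note that the example above uses Python literals (None, True, False) rather than JSON keywords (null, true, false), so it parses as a Python dict literal but not as JSON. A minimal sketch of reading such a file with the standard library (this snippet is illustrative, not part of GPMelt):

```python
import ast

# Parameters written in the Python-literal style shown above
# (None/True/False instead of JSON's null/true/false).
params_text = """{
"Scaling" : "Scaling_ToDefine",
"lengthscale_prior": None,
"Amsgrad" : False,
"PlotSave" : True,
"training_iterations": 500
}"""

# ast.literal_eval safely evaluates Python literals, unlike eval().
params = ast.literal_eval(params_text)
print(params["training_iterations"])  # 500
print(params["lengthscale_prior"])    # None
```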

We now go step by step to explain this parameters.txt file.

4 Parameters for the model estimation

Following the GPyTorch (Gardner et al. 2018) routine, we use type II MLE to train the hyperparameters of the full HGP model \(\mathcal{M}_1\). Some parameters of this algorithm can be tuned; see here, section 5.

4.1 Number of iterations

'training_iterations': 500

A larger number of iterations might be required if the model is more complex (e.g. a large number of conditions).

4.2 Learning rate

Also see the GPyTorch documentation for more information.

'LearningRate': 0.1

Can be adjusted if needed.

4.3 Whether to use the AMSGrad variant of the Adam algorithm

Also see the Adam documentation for more information.

'Amsgrad' : False

Can be set to True or False.

4.4 Number of points at which to predict the posterior mean and 95% confidence regions

See here, section 8, for a visualisation of how this number of points affects the prediction.

'n_PredictionsPoints' : 50

Can be adjusted if needed.

4.5 Application to ATP 2019

In (Le Sueur, Rattray, and Savitski 2024), the following values have been selected:

'training_iterations': 500,
'LearningRate': 0.1,
'Amsgrad': False,
'n_PredictionsPoints': 50

5 Parameters for the plots

5.1 Type of predictions for the fits plots

"prediction_type" : "prediction_type_ToDefine"

Can take the values predicted_functions or predicted_observations.

We refer to the GPyTorch documentation about GP regression:

  • predicted_functions : returns the model posterior distribution \(p(f^* \mid x^*, X, y)\) for training data \(X, y\). This posterior is the distribution over the function we are trying to model, and thus quantifies our model uncertainty.
  • predicted_observations : returns the posterior predictive distribution \(p(y^* \mid x^*, X, y)\), i.e. the probability distribution over the predicted output values; here the prediction is over the observed values at the test points.

5.2 Should the set of plots generated for each ID (monitoring convergence, depicting the fits and the covariance matrices of the full and joint models) be saved?

'PlotSave' : True

Can be changed to False if needed.

5.3 Application to ATP 2019

In (Le Sueur, Rattray, and Savitski 2024), the following values have been selected:

"prediction_type": "predicted_functions",
"PlotSave": True

7 The complete parameters.txt file for ATP 2019

{
"Scaling" : None,
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2and3",
"lengthscale_prior": None, 
"lengthscale_MinConstraint" : "min", 
"mean" : "gpytorch.means.ZeroMean()", 
"control_condition": "Vehicle", 
"training_iterations": 500 , 
"LearningRate" : 0.1,
"Amsgrad" : False, 
"n_PredictionsPoints" : 50, 
"PlotSave" : True, 
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"
}

7.1 Save the parameters.txt file

The updated parameters.txt file should be saved in the folder Nextflow/dummy_data/ATP2019, using the name parameters.txt.
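If you prefer to create the file programmatically rather than by hand, a plain text write suffices (a sketch; the path and file name follow the instruction above, and the content is the ATP 2019 example from section 7):

```python
from pathlib import Path

# The ATP 2019 parameters, verbatim from the complete example above.
params_text = """{
"Scaling" : None,
"modelType" : "3Levels_OneLengthscale_FixedLevels1and2and3",
"lengthscale_prior": None,
"lengthscale_MinConstraint" : "min",
"mean" : "gpytorch.means.ZeroMean()",
"control_condition": "Vehicle",
"training_iterations": 500,
"LearningRate" : 0.1,
"Amsgrad" : False,
"n_PredictionsPoints" : 50,
"PlotSave" : True,
"prediction_type" : "predicted_functions",
"GPMelt_statistic" : "dataset-wise"
}"""

# Write to the expected location; adjust the path to your own setup.
out_dir = Path("Nextflow/dummy_data/ATP2019")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "parameters.txt").write_text(params_text)
```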

References

Gardner, Jacob, Geoff Pleiss, Kilian Q. Weinberger, David Bindel, and Andrew G. Wilson. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” Advances in Neural Information Processing Systems 31.
Le Sueur, Cecile, Magnus Rattray, and Mikhail Savitski. 2024. “GPMelt: A Hierarchical Gaussian Process Framework to Explore the Dark Meltome of Thermal Proteome Profiling Experiments.” PLOS Computational Biology 20 (9): e1011632.