GPMelt’s additional results

We now go through all the output files produced by GPMelt. Some of them are mainly of interest to advanced users.

1 Scaled data

If the parameter Scaling is not set to None (see here), the file data_with_scalingFactors.csv is saved, which contains the scaling factors for the mean, median, geometric mean, fold change and log, along with the scaled data.

2 Loss values

full_hgpm_fit_loss_values_df.csv: the loss values are discussed here, section 2.

3 Posterior distribution: mean and \(95\%\) confidence region

As mentioned here, section 4, the GP predictions can be found in the files:

Full model (\(\mathcal{M}_1\))
- prediction_full_hgpm_Level_1.csv for the top level
- prediction_full_hgpm_Levels_1tol.csv for level \(l\), with \(l=2,\dots,L\).
Joint model(s) (\(\mathcal{M}_0\))
- prediction_joint_hgpm_Level_1.csv for the top level
- prediction_joint_hgpm_Levels_1tol.csv for level \(l\), with \(l=2,\dots,L\).

4 Null distribution approximation

sample_df.csv contains all the samples that were drawn and used to compute the null distribution approximation (step 7 explained in this figure).

5 \(\Lambda\) statistic and p-values

likelihood_ratios_df.csv: the statistics \(\Lambda\) computed for the real IDs (\(\Lambda_p\) in the blue box of this figure)
likelihood_ratios_sampled_dataset_df.csv: the statistics \(\Lambda\) computed for the sampled IDs/observations (\(\Lambda_s^0\) in the orange box of this figure). The sampled IDs/observations are found in sample_df.csv.
p_values_*_wise.csv, with * being dataset or ID: this data frame combines likelihood_ratios_df.csv with likelihood_ratios_sampled_dataset_df.csv and computes the empirical p-values according to the dataset-wise or ID-wise method (as explained at the end of this video).
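The difference between the two methods can be sketched as follows. This is a minimal illustration, not GPMelt's implementation. It assumes that the dataset-wise method compares each observed \(\Lambda_p\) with the null statistics pooled over all IDs, that the ID-wise method uses only the null statistics sampled for the same ID, and that the empirical p-value follows the common (1 + count) / (1 + S) convention. The function name and the toy values are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null statistics at least as large as the observed one,
    with a +1 correction so that the p-value is never exactly zero."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (1 + exceed) / (1 + len(null_values))


# Toy statistics (hypothetical values, not GPMelt output).
observed = {"P1": 12.0, "P2": 1.5}
null_by_id = {
    "P1": [0.5, 2.0, 3.0, 14.0],
    "P2": [0.2, 0.4, 1.0, 1.2],
}

# ID-wise (assumed meaning): each ID is compared with its own null samples only.
p_id_wise = {i: empirical_p_value(observed[i], null_by_id[i]) for i in observed}

# Dataset-wise (assumed meaning): null samples of all IDs form one pooled distribution.
pooled = [v for values in null_by_id.values() for v in values]
p_dataset_wise = {i: empirical_p_value(observed[i], pooled) for i in observed}
```

Under these assumptions, the same observed statistic can receive different p-values from the two methods, because the reference null distribution differs.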

6 Parameter fits

full_hgpm_fit_parameters_df.csv contains the type-II MLE parameters obtained by fitting the full model (\(\mathcal{M}_1\)) to the real data.
full_hgpm_fit_parameters_sampled_dataset_df.csv contains the type-II MLE parameters obtained by fitting the full model (\(\mathcal{M}_1\)) to the samples (found in sample_df.csv).

7 Plots

In the Results folder, you will find a Plots folder.

This contains all the plots generated for each ID of the dataset.

As explained here, section 2, the first plot can be used to monitor the convergence of the fit by looking at the evolution of the loss values with the number of iterations of the optimisation algorithm.
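The visual check can be complemented by a simple numerical one. The sketch below is not part of GPMelt: the function name, the window size and the tolerance are hypothetical choices, and the loss values are toy numbers rather than the content of full_hgpm_fit_loss_values_df.csv.

```python
def has_plateaued(loss_values, window=10, rel_tol=1e-3):
    """Return True if the loss changed by less than rel_tol (relative change)
    over the last `window` iterations."""
    if len(loss_values) <= window:
        return False  # too few iterations to judge
    start, end = loss_values[-window - 1], loss_values[-1]
    scale = max(abs(start), 1e-12)  # guard against division by zero
    return abs(start - end) / scale < rel_tol


# Hypothetical loss traces: one still decreasing, one flat at the end.
still_decreasing = [10.0 / (i + 1) for i in range(20)]
flat_tail = [5.0, 3.0, 2.0] + [1.0] * 15
```

A trace for which has_plateaued returns False is a candidate for a closer look at its loss plot, and possibly for more optimisation iterations.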

The last plots represent the predictions (see Section 3 and here, section 4).

The middle plots represent the decomposition of the covariance matrix into index kernel matrices and a correlation matrix (advanced users, see explanations).

8 Indices of the tasks (advanced users)

We implemented this hierarchical model using a multi-task learning approach (see explanations).

This implementation requires the definition of tasks, which are needed to define the index kernel matrices.

Each level is associated with a different index kernel matrix, hence we define tasks for each level.

In the case of a three-level HGP model, the replicates are the tasks of the bottom level, the conditions are the tasks of the second level, and the protein ID is the task of the top level.

The following files give, for each replicate and each level, the index of the task it is associated with.

joint_hgpm_index_df.csv
full_hgpm_index_df.csv
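To make the notion of task indices concrete, here is a minimal sketch for a toy three-level design. It only illustrates the idea of one task index per replicate and per level. The labels, the column names (Level_1, Level_2, Level_3) and the numbering scheme are hypothetical and may differ from the layout of the files listed above.

```python
# Toy design: one protein ID, two conditions, two replicates per condition.
replicates = [
    ("P1", "control", "rep1"),
    ("P1", "control", "rep2"),
    ("P1", "treatment", "rep1"),
    ("P1", "treatment", "rep2"),
]


def index_of(labels):
    """Map each distinct label to an integer index, in order of first appearance."""
    mapping = {}
    for label in labels:
        mapping.setdefault(label, len(mapping))
    return mapping


level_1 = index_of(pid for pid, _, _ in replicates)             # top level: protein ID
level_2 = index_of((pid, cond) for pid, cond, _ in replicates)  # second level: conditions
level_3 = index_of(replicates)                                  # bottom level: replicates

# One row per replicate, giving its task index at each level.
index_table = [
    {
        "Level_1": level_1[pid],
        "Level_2": level_2[(pid, cond)],
        "Level_3": level_3[(pid, cond, rep)],
    }
    for pid, cond, rep in replicates
]
```

In this toy example all four replicates share the single top-level task, the two replicates of a condition share a second-level task, and every replicate is its own bottom-level task.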

9 Python objects (advanced users)

full_hgpm_state_dict_dict.pth: Python dictionary containing the parameters of the model. See this tutorial for an explanation of how to load this dictionary into a model.
combined_results.pkl: contains all the previously discussed results in the form of a Python dictionary, for direct use with Python.
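As a starting point for working with combined_results.pkl, the sketch below loads the file with the standard pickle module and lists its top-level entries. The helper names are hypothetical, and the keys of the dictionary are not described in this section, so the code inspects them rather than assuming any. As with any pickle file, only load files that come from a trusted source.

```python
import pickle
from pathlib import Path


def load_combined_results(results_dir):
    """Load combined_results.pkl from a results folder and return the dictionary."""
    path = Path(results_dir) / "combined_results.pkl"
    with path.open("rb") as handle:
        return pickle.load(handle)


def summarise(results):
    """Return the type name of each top-level entry, keyed by its dictionary key."""
    return {key: type(value).__name__ for key, value in results.items()}
```

For example, summarise(load_combined_results("Results")) would list the available entries, assuming the results folder is named Results and sits in the working directory.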