GPMelt’s additional results
We now go through all the output files produced by GPMelt, some of which are mainly useful to more advanced users.
1 Scaled data
If the parameter Scaling is not set to None (see here), the file data_with_scalingFactors.csv is saved, which contains the scaling factors for the mean, median, geometric mean, fold change and log, along with the scaled data.
2 Loss values
full_hgpm_fit_loss_values_df.csv: the loss values are discussed here, section 2.
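As a complement to inspecting the loss curve visually, a simple numerical check can flag fits whose loss has stopped changing. The sketch below is only an illustration: the function name has_plateaued, the window and rel_tol parameters, and the plateau criterion itself are assumptions, not part of GPMelt. It works on a plain list of loss values, so the column layout of full_hgpm_fit_loss_values_df.csv is not assumed.

```python
def has_plateaued(losses, window=5, rel_tol=1e-3):
    """Return True if the loss changed by less than rel_tol (relative)
    over the last `window` iterations. Hypothetical criterion."""
    if len(losses) < window + 1:
        return False  # not enough iterations to judge
    old, new = losses[-window - 1], losses[-1]
    scale = max(abs(old), 1e-12)  # guard against division by zero
    return abs(old - new) / scale < rel_tol


# Toy loss traces (made-up values, not GPMelt output).
still_decreasing = [100.0, 50.0, 33.3, 25.0, 20.0]
flat_at_the_end = [10.0, 5.0, 3.0, 3.0001, 3.0001, 3.0001, 3.0001]

decreasing_flag = has_plateaued(still_decreasing, window=3)
flat_flag = has_plateaued(flat_at_the_end, window=3)
```

A check of this kind does not replace the convergence plot; it only helps to shortlist IDs worth looking at more closely.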
3 Posterior distribution: mean and \(95\%\) confidence region
As mentioned here, section 4, the GP predictions can be found in the files:
- Full model (\(\mathcal{M}_1\))
  - prediction_full_hgpm_Level_1.csv for the top level
  - prediction_full_hgpm_Levels_1tol.csv for level l, with \(l=2,\dots,L\).
- Joint model(s) (\(\mathcal{M}_0\))
  - prediction_joint_hgpm_Level_1.csv for the top level
  - prediction_joint_hgpm_Levels_1tol.csv for level l, with \(l=2,\dots,L\).
4 Null distribution approximation
sample_df.csv contains all samples that have been drawn and used to compute the null distribution approximation (step 7 explained in this figure).
5 \(\Lambda\) statistic and p-values
- likelihood_ratios_df.csv: the statistics \(\Lambda\) computed for the real IDs (\(\Lambda_p\) in the blue box of this figure).
- likelihood_ratios_sampled_dataset_df.csv: the statistics \(\Lambda\) computed for the sampled IDs/observations (\(\Lambda_s^0\) in the orange box of this figure). The sampled IDs/observations are found in sample_df.csv.
- p_values_*_wise.csv, with * being dataset or ID: this data frame combines likelihood_ratios_df.csv with likelihood_ratios_sampled_dataset_df.csv and computes the empirical p-values according to the dataset-wise or ID-wise method (as explained at the end of this Video).
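To make the difference between the two methods concrete, here is a minimal sketch of an empirical p-value computation. It rests on assumptions that should be checked against GPMelt itself: the estimator (number of null statistics at least as large as the observed one, plus one, divided by the number of null statistics plus one) is a common choice but may not be the exact formula GPMelt uses, and the reading of "dataset-wise" as pooling the null statistics of all IDs and "ID-wise" as using only the null statistics of the same ID is an interpretation. All names and values are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(observed, null_values):
    """Share of null statistics >= observed, with a +1 correction
    (assumed estimator, not necessarily the one used by GPMelt)."""
    null_sorted = sorted(null_values)
    # bisect_left returns the number of null values strictly below `observed`.
    n_greater_equal = len(null_sorted) - bisect_left(null_sorted, observed)
    return (n_greater_equal + 1) / (len(null_sorted) + 1)


# Toy statistics (made-up values): observed Lambda per ID and null Lambda per ID.
lambda_p = {"P1": 12.0, "P2": 1.5}
lambda_s0 = {"P1": [0.5, 2.0, 3.0, 13.0], "P2": [0.1, 0.2, 1.0, 2.0]}

# Dataset-wise: one null distribution pooled over all IDs.
pooled_null = [value for values in lambda_s0.values() for value in values]
dataset_wise = {pid: empirical_p_value(obs, pooled_null) for pid, obs in lambda_p.items()}

# ID-wise: each ID is compared with its own null statistics only.
id_wise = {pid: empirical_p_value(obs, lambda_s0[pid]) for pid, obs in lambda_p.items()}
```

With the pooled null, every ID is compared against the same, larger reference set, whereas the ID-wise variant lets the reference distribution differ from one ID to the next, at the cost of fewer null values per ID.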
6 Parameters fits
- full_hgpm_fit_parameters_df.csv contains the type-II MLE parameters obtained by fitting the full model (\(\mathcal{M}_1\)) to the real data.
- full_hgpm_fit_parameters_sampled_dataset_df.csv contains the type-II MLE parameters obtained by fitting the full model (\(\mathcal{M}_1\)) to the samples (found in sample_df.csv).
7 Plots
In the Results folder, you will find a Plots folder.
This contains all the plots generated for each ID of the dataset.
As explained here, section 2, the first plot can be used to monitor the convergence of the fit by looking at the evolution of the loss values with the number of iterations of the optimisation algorithm.
The last plots represent the predictions (see Section 3 and here, section 4).
The middle plots represent the decomposition of the covariance matrix into index kernel matrices and a correlation matrix (advanced users, see explanations).
8 Indices of the tasks (advanced users)
We implemented this hierarchical model using a multi-task learning approach (see explanations).
This implementation requires the definition of tasks, to define the index kernel matrices.
Each level is associated with a different index kernel matrix, hence we define tasks for each level.
In the case of a three-level HGP model, the replicates are the tasks of the bottom level, the conditions are the tasks of the second level, and the protein ID is the task of the top level.
The following files give, for each replicate and each level, the index of the task with which it is associated.
- joint_hgpm_index_df.csv
- full_hgpm_index_df.csv
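The idea of one task index per replicate and per level can be illustrated with a small sketch for the three-level case described above. The helper assign_task_indices, the tuple layout and the zero-based numbering in order of first appearance are assumptions made for illustration; the column names and indexing convention used in the GPMelt index files may differ.

```python
def assign_task_indices(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    index_of = {}
    return [index_of.setdefault(label, len(index_of)) for label in labels]


# Hypothetical design: one protein ID, two conditions, two replicates each.
replicates = [
    ("P1", "control", "rep1"),
    ("P1", "control", "rep2"),
    ("P1", "treatment", "rep1"),
    ("P1", "treatment", "rep2"),
]

# Top level: the protein ID is the single task.
level_1 = assign_task_indices([pid for pid, _, _ in replicates])
# Second level: one task per condition.
level_2 = assign_task_indices([(pid, cond) for pid, cond, _ in replicates])
# Bottom level: one task per replicate.
level_3 = assign_task_indices(replicates)

for row, idx in zip(replicates, zip(level_1, level_2, level_3)):
    print(row, idx)
```

Each replicate thus carries three indices, one per level, which is the information an index kernel matrix of that level needs in order to know which task an observation belongs to.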
9 Python objects (advanced users)
- full_hgpm_state_dict_dict.pth: Python dictionary containing the parameters of the model. See this tutorial for an explanation of how to load this dictionary into a model.
- combined_results.pkl: contains all the previously discussed results in the form of a Python dictionary, for direct use with Python.
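As a starting point for working with combined_results.pkl, the sketch below loads a pickled dictionary with the standard library and lists the type of each entry. The helper names load_combined_results and summarise are hypothetical, and the keys of the real file are not assumed: a toy stand-in file is written to a temporary folder so that the example runs without GPMelt outputs. If the real file stores data frames, the corresponding library must be installed for unpickling to succeed, and pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path


def load_combined_results(path):
    """Load a pickled Python dictionary from `path`."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def summarise(results):
    """Return the type name of each entry, to get a first overview."""
    return {key: type(value).__name__ for key, value in results.items()}


# Stand-in file with made-up keys; replace toy_path with the path to
# combined_results.pkl in the Results folder when using real output.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "combined_results.pkl"
    toy_path.write_bytes(pickle.dumps({"p_values": [0.4, 0.2], "losses": (3.0, 2.5)}))
    summary = summarise(load_combined_results(toy_path))

print(summary)
```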