Applying GPMelt: the example of the ATP 2019 dataset


In this tutorial, we use the ATP 2019 dataset (Sridharan et al. 2019), which was analysed in the GPMelt publication (Le Sueur, Rattray, and Savitski 2024).

1 Data download

The dataset can be downloaded from Zenodo DOI.

We will use the following two files, found in the folder ATP2019/data: SuppTable8.rds and SuppTable8_ExpDetails.rds.
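A minimal sketch of loading both files with base R's `readRDS()` is given below. It assumes the Zenodo archive has been extracted so that the folder ATP2019/data sits under the working directory; the helper name `load_atp2019_tables()` and the list names `table` and `exp_details` are hypothetical, and no assumption is made about the type of object stored in each file.

```r
# Hypothetical helper: load the two supplementary files used in this tutorial.
# Assumes `data_dir` points to the extracted ATP2019/data folder.
load_atp2019_tables <- function(data_dir = file.path("ATP2019", "data")) {
  files <- c(
    table       = "SuppTable8.rds",
    exp_details = "SuppTable8_ExpDetails.rds"
  )
  paths <- file.path(data_dir, files)
  names(paths) <- names(files)  # file.path() drops names, so restore them

  # Fail early with an informative message if a file is missing
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("File(s) not found: ", paste(missing, collapse = ", "))
  }

  # readRDS() restores whatever R object was saved in each file;
  # lapply() keeps the names, giving a named list
  lapply(paths, readRDS)
}

# Usage (assumed folder layout):
# atp <- load_atp2019_tables()
# str(atp$table, max.level = 1)
```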

2 Data preprocessing

To preprocess these data, we can run the preprocessing notebook gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd from the GitLab repository. This notebook directly produces data ready to be used with GPMelt, but for the purpose of this tutorial we only run gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd up to the section ‘preparation for GPMelt analysis’. At this stage, we have a data frame called PreprocessedData.

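library(dplyr)    # data manipulation: pipes, joins, select
library(ggplot2)  # plotting
source("../../utils.R")  # shared helper functions (path relative to this notebook)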

We start by keeping only the IDs that pass the quality criteria (see gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd for details):

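Data_forPython <- PreprocessedData %>%
    # keep only the IDs passing the quality criteria (joins on the shared columns)
    inner_join(PassQualityMetric_ids %>% dplyr::select(-Nrep)) %>%
    ungroup() %>%
    # remove the Labels column
    dplyr::select(-Labels)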

head(Data_forPython)
# A tibble: 6 × 6
  protein_id      gene_name condition   replicate     FC temperature
  <chr>           <chr>     <chr>       <chr>      <dbl>       <dbl>
1 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.998         37  
2 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.857         44  
3 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.996         40.4
4 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.349         49.8
5 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.744         46.9
6 Q6PD74|Q6PD74-2 AAGAB     10mM_Mg_ATP 1         0.0485        55.5
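To make the filtering step concrete, here is a toy sketch on made-up values (not taken from the ATP 2019 data). It assumes that the list of passing IDs has one row per ID plus an `Nrep` column, and it joins on `protein_id` and `gene_name`; these join keys are an assumption. In the real code, `inner_join()` without `by` joins on all columns shared by the two tables, so it is worth checking the message dplyr prints.

```r
library(dplyr)

# Toy long-format data (made-up values), grouped by ID as an upstream step might leave it
toy_data <- tibble(
  protein_id  = c("P1", "P1", "P2", "P2", "P3", "P3"),
  gene_name   = c("G1", "G1", "G2", "G2", "G3", "G3"),
  condition   = "10mM_Mg_ATP",
  replicate   = "1",
  FC          = c(1.00, 0.40, 1.00, 0.90, 1.00, 0.10),
  temperature = c(37, 55.5, 37, 55.5, 37, 55.5),
  Labels      = c("a", "b", "a", "b", "a", "b")  # hypothetical extra column
) %>%
  group_by(protein_id)

# Hypothetical list of IDs passing the quality criteria (P2 fails)
toy_pass_ids <- tibble(
  protein_id = c("P1", "P3"),
  gene_name  = c("G1", "G3"),
  Nrep       = c(2L, 2L)
)

toy_forPython <- toy_data %>%
  # inner_join() keeps only rows whose keys appear in both tables;
  # an explicit `by` avoids relying on whichever columns happen to be shared
  inner_join(toy_pass_ids %>% select(-Nrep), by = c("protein_id", "gene_name")) %>%
  ungroup() %>%     # drop the grouping so downstream code sees a plain tibble
  select(-Labels)   # remove the column that is not exported

toy_forPython
```

Only the rows of P1 and P3 remain, and the column layout matches the `head()` output above.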

The ATP 2019 dataset is an example of an experimental design with several conditions and replicates for each ID.
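One way to inspect this design is to count, for each ID, the distinct conditions, the condition × replicate combinations (one melting curve each), and the number of temperatures per curve. The sketch below assumes the column names shown in the `head()` output above; `summarise_design()` is a hypothetical helper, and the toy input uses made-up condition names and values.

```r
library(dplyr)

# Hypothetical helper: summarise the experimental design of a long-format table
summarise_design <- function(df) {
  df %>%
    # one row per ID x condition x replicate, with its number of temperatures
    count(protein_id, condition, replicate, name = "n_temperatures") %>%
    group_by(protein_id) %>%
    summarise(
      n_conditions     = n_distinct(condition),
      n_curves         = n(),  # condition x replicate combinations
      min_temperatures = min(n_temperatures),
      .groups = "drop"
    )
}

# Toy input (made-up values): one ID, two conditions, two replicates each, three temperatures
toy <- expand.grid(
  protein_id  = "P1",
  condition   = c("condition_A", "condition_B"),
  replicate   = c("1", "2"),
  temperature = c(37, 44, 55.5),
  stringsAsFactors = FALSE
)
toy$FC <- 1

summarise_design(toy)
```

On the real data, `summarise_design(Data_forPython)` would give one row per ID, which makes IDs with missing conditions or replicates easy to spot.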

In the next step, we will see how to translate this experimental design into a three-level hierarchical Gaussian process (HGP) model, and how to name the variables in the dataset accordingly.

References

Le Sueur, Cecile, Magnus Rattray, and Mikhail Savitski. 2024. “GPMelt: A Hierarchical Gaussian Process Framework to Explore the Dark Meltome of Thermal Proteome Profiling Experiments.” PLOS Computational Biology 20 (9): e1011632.
Sridharan, Sindhuja, Nils Kurzawa, Thilo Werner, Ina Günthner, Dominic Helm, Wolfgang Huber, Marcus Bantscheff, and Mikhail M Savitski. 2019. “Proteome-Wide Solubility and Thermal Stability Profiling Reveals Distinct Regulatory Roles for ATP.” Nature Communications 10 (1): 1155.