library(dplyr)
library(ggplot2)
source("../../utils.R")
In this tutorial, we use the ATP 2019 dataset (Sridharan et al. 2019) analysed in GPMelt (Le Sueur, Rattray, and Savitski 2024).
The dataset can be downloaded from Zenodo.
We will use the following two files, found in the folder ATP2019/data: SuppTable8.rds and SuppTable8_ExpDetails.rds.
To preprocess these data, we can run the preprocessing file found in the GitLab repository: gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd. Run in full, this file directly provides the data ready to be used with GPMelt. For the purpose of this tutorial, however, we only run gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd up to the section ‘preparation for GPMelt analysis’. At this stage, we have a data frame called PreprocessedData.
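As a starting point before running the preparation file, the two downloaded files can be read into R with readRDS(). The sketch below is only an illustration: the location data_dir is an assumption about where the Zenodo archive was unpacked, and the names files, found and loaded are hypothetical. The preparation file itself may load the data differently.

```r
# Assumed location of the unpacked Zenodo archive; adjust to your own setup.
data_dir <- file.path("ATP2019", "data")

# The two files named in the tutorial.
files <- c(
  data    = file.path(data_dir, "SuppTable8.rds"),
  details = file.path(data_dir, "SuppTable8_ExpDetails.rds")
)

# Read only the files that are present, so the sketch does not fail
# when the archive has not been downloaded yet.
found  <- file.exists(files)
loaded <- lapply(files[found], readRDS)

if (!all(found)) {
  message("Missing file(s): ", paste(files[!found], collapse = ", "))
}
```

The result, loaded, is a named list with one element per file that was found.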
We start by subsetting the IDs which pass the quality criteria (see gpmelt/Analysis/ATP2019/ATP2019_DataPreparation.qmd for details):
Data_forPython <- PreprocessedData %>%
  inner_join(PassQualityMetric_ids %>% dplyr::select(-Nrep)) %>%
  ungroup() %>%
  dplyr::select(-Labels)
head(Data_forPython)
# A tibble: 6 × 6
protein_id gene_name condition replicate FC temperature
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.998 37
2 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.857 44
3 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.996 40.4
4 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.349 49.8
5 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.744 46.9
6 Q6PD74|Q6PD74-2 AAGAB 10mM_Mg_ATP 1 0.0485 55.5
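To make the filtering step concrete, here is a minimal sketch on made-up data. The objects toy_data, toy_pass_ids and toy_filtered are hypothetical stand-ins for PreprocessedData, PassQualityMetric_ids and Data_forPython, and the explicit by = "protein_id" is an assumption: the tutorial code lets inner_join() match on all columns shared by the two tables, which may be more than the ID.

```r
library(dplyr)

# Toy stand-in for PreprocessedData (values are made up for illustration).
toy_data <- data.frame(
  protein_id = c("P1", "P1", "P2", "P2", "P3"),
  condition  = c("A", "B", "A", "B", "A"),
  FC         = c(1.00, 0.95, 0.80, 0.75, 0.60),
  Labels     = c("x", "x", "y", "y", "z")
)

# Toy stand-in for PassQualityMetric_ids: P3 does not pass the quality criteria.
toy_pass_ids <- data.frame(
  protein_id = c("P1", "P2"),
  Nrep       = c(2L, 2L)
)

toy_filtered <- toy_data %>%
  # Drop Nrep before joining so it is not carried into the result.
  inner_join(toy_pass_ids %>% dplyr::select(-Nrep), by = "protein_id") %>%
  ungroup() %>%
  # Labels is not needed downstream.
  dplyr::select(-Labels)

toy_filtered
```

Because inner_join() keeps only rows with a match in both tables, the rows of P3 are dropped and only the IDs listed in toy_pass_ids remain.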
The ATP 2019 dataset is an example of an experimental design in which each ID is measured under several conditions, with several replicates per condition.
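One way to check this design is to count the distinct replicates per ID and condition. The sketch below uses a made-up table, toy, that only mimics the column names of Data_forPython; the values and the names toy, design and n_replicates are hypothetical and not taken from the dataset.

```r
library(dplyr)

# Toy stand-in for Data_forPython with the same column names (made-up values).
toy <- data.frame(
  protein_id  = rep("P1", 8),
  condition   = rep(c("ctrl", "treated"), each = 4),
  replicate   = rep(rep(c("1", "2"), each = 2), times = 2),
  temperature = rep(c(37, 44), times = 4),
  FC          = c(1.00, 0.85, 0.99, 0.80, 1.00, 0.95, 1.01, 0.93)
)

design <- toy %>%
  # One row per (ID, condition, replicate), regardless of temperature.
  distinct(protein_id, condition, replicate) %>%
  # Number of replicates observed for each ID in each condition.
  count(protein_id, condition, name = "n_replicates")

design
```

Applied to the real data, the same two calls would show whether every ID has the same number of replicates in every condition.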
In the next step, we will see how to translate this experimental protocol into a three-level hierarchical Gaussian process (HGP) model, and how to define the variable names in the dataset accordingly.