Define a subset of IDs for model specification testing

This step consists of selecting a subset of IDs on which to test the currently selected GPMelt model specification, i.e. the definition of the hierarchy, the choice of the null distribution approximation, the optimisation algorithm parameters and the constraints on the model’s parameters (see next page).

If the resulting fits are not adequate, these choices can be updated and tested again.

Running the code on a subset of IDs helps to make this process of trial and error faster and less computationally intensive.

1 Create `subset_ID.csv`

We propose to use the same subset of IDs as the one used for plotting (see here, section 4).

We selected IDs with different melting curve patterns (sigmoidal and non-sigmoidal) and different levels of noise: for example, BANF1 presents an outlier replicate, and the vehicle conditions of MYO1G and NDUFB6 are also noisy, while other IDs such as IMPDH1 or AAR2 have well-reproducible replicates. Comparing the quality of the fits for these IDs will help us decide how to update the model specification if needed (see here).

# IDs are stored in a column named Level_1
subsetID <- data.frame(
  Level_1 = c("AAGAB", "CDK13", "DPM1", "AAK1", "AAMDC", "AAMP", "AAR2",
              "NDUFB6", "MYO1G", "BANF1", "DDX50", "FBL", "NOP56", "EIF3H",
              "IMPDH1", "NUCKS1")
)

head(subsetID)

  Level_1
1   AAGAB
2   CDK13
3    DPM1
4    AAK1
5   AAMDC
6    AAMP

write.csv(subsetID,
          file ="Nextflow/dummy_data/ATP2019/subset_ID.csv")
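
Before passing the file to the pipeline, it can be worth checking that the subset contains no duplicated IDs and that it survives a write/read round trip. The following is a minimal sketch under stated assumptions, not part of the GPMelt workflow itself: it writes to a temporary file (`tmp_file`, a hypothetical path used only for this check) instead of the `Nextflow/dummy_data/ATP2019/` folder, and the object name `check` is likewise illustrative.

```r
# Same subset as defined above, repeated so that this check is self-contained.
subsetID <- data.frame(
  Level_1 = c("AAGAB", "CDK13", "DPM1", "AAK1", "AAMDC", "AAMP", "AAR2",
              "NDUFB6", "MYO1G", "BANF1", "DDX50", "FBL", "NOP56", "EIF3H",
              "IMPDH1", "NUCKS1")
)

# Each ID should appear only once (anyDuplicated() returns 0 when none repeat).
stopifnot(anyDuplicated(subsetID$Level_1) == 0)

# Assumption: a temporary file stands in for subset_ID.csv for this check.
tmp_file <- tempfile(fileext = ".csv")
write.csv(subsetID, file = tmp_file)

# Read the file back and inspect its structure.
check <- read.csv(tmp_file)
str(check)

# The IDs should be identical, and in the same order, after the round trip.
stopifnot(identical(as.character(check$Level_1), subsetID$Level_1))
```

Note that `write.csv()` stores the row names by default, so the file read back contains an extra first column in addition to `Level_1`. Whether the downstream step expects this column is not stated here; `row.names = FALSE` would omit it if needed.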
