A function to identify PPCC Data Doppelgangers
getPPCCDoppelgangers.Rd
This function performs the following steps to identify PPCC data doppelgangers between batches:
1. Batch correct the batches with sva::ComBat
2. Calculate PPCC values between samples of different batches
3. Label sample pairs according to their patient ID and class similarities
4. Calculate the PPCC cut-off (the maximum PPCC of any "Different Class Different Patient" sample pair)
5. Identify PPCC data doppelgangers as sample pairs labelled "Same Class Different Patient" with PPCC values greater than the PPCC cut-off.
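The labelling, cut-off and identification steps above can be illustrated on a toy table of sample pairs. This is a minimal sketch, not the package's implementation: the data frame `pairs`, its column names and the PPCC values are all made up for illustration.

```r
# Hypothetical table of cross-batch sample pairs with precomputed PPCC values.
pairs <- data.frame(
  sample1  = c("A1", "A2", "A3", "A4"),
  sample2  = c("B1", "B2", "B3", "B4"),
  class1   = c("tumour", "tumour", "normal", "tumour"),
  class2   = c("tumour", "normal", "normal", "tumour"),
  patient1 = c("P1", "P2", "P3", "P4"),
  patient2 = c("P1", "P5", "P6", "P7"),
  ppcc     = c(0.99, 0.80, 0.95, 0.70),
  stringsAsFactors = FALSE
)

# Label each pair by class and patient similarity.
same_class   <- pairs$class1 == pairs$class2
same_patient <- pairs$patient1 == pairs$patient2
pairs$label <- paste(
  ifelse(same_class, "Same Class", "Different Class"),
  ifelse(same_patient, "Same Patient", "Different Patient")
)

# Cut-off: maximum PPCC among "Different Class Different Patient" pairs.
cut_off <- max(pairs$ppcc[pairs$label == "Different Class Different Patient"])

# Doppelgangers: "Same Class Different Patient" pairs above the cut-off.
pairs$is_doppelganger <- pairs$label == "Same Class Different Patient" &
  pairs$ppcc > cut_off

print(pairs[, c("sample1", "sample2", "label", "ppcc", "is_doppelganger")])
```

In this toy table only the pair A3/B3 is flagged: it is the only "Same Class Different Patient" pair whose PPCC exceeds the cut-off set by the single "Different Class Different Patient" pair.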
Usage
getPPCCDoppelgangers(
raw_data,
meta_data,
do_batch_corr = TRUE,
correlation_function = cor,
batch_corr_method = "ComBat",
do_min_max = FALSE
)
Arguments
- raw_data
Data frame in which each column is a sample and each row is a variable; the row names are the variable names.
- meta_data
Data frame with the columns "Class", "Patient_ID" and "Batch", giving the class, patient ID and batch of each sample. Each row is a sample. The sample names must be the row names of the data frame, not a separate column.
- do_batch_corr
If FALSE, no batch correction is carried out before doppelgangers are identified.
- correlation_function
Correlation function to use. Pearson's correlation coefficient (cor) is the default. User-defined functions should accept two vector parameters, x and y.
- batch_corr_method
Batch correction method to use. Only two options are accepted: "ComBat" or "ComBat_seq".
- do_min_max
If TRUE, min-max normalization is carried out just before PPCC calculation.
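As an illustration of the correlation_function argument, a user-defined function only needs to take two numeric vectors and return a single number. The sketch below wraps Spearman's rank correlation via `stats::cor`; the name `spearman_cor` is hypothetical. The `min_max` helper shows the textbook definition of min-max normalization for one vector and is an assumption about what do_min_max = TRUE does, not the package's own code. The commented call assumes the `rc` and `rc_metadata` objects from the Examples section are available.

```r
# Hypothetical user-defined correlation function: accepts two vectors x and y.
spearman_cor <- function(x, y) {
  stats::cor(x, y, method = "spearman")
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 10)
spearman_cor(x, y)

# Textbook min-max normalization of one numeric vector to the range [0, 1].
# (Assumes the vector is not constant, otherwise the denominator is zero.)
min_max <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}
min_max(x)

# Assumed usage, not run here:
# getPPCCDoppelgangers(rc, rc_metadata, correlation_function = spearman_cor)
```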
Details
This function also identifies PPCC data doppelgangers within a single batch (when only one batch is detected in meta_data). In this case it performs the following steps:
1. Calculate PPCC values between samples within the batch
2. Label sample pairs according to their patient ID and class similarities
3. Calculate the PPCC cut-off (the maximum PPCC of any "Different Class Different Patient" sample pair)
4. Identify PPCC data doppelgangers as sample pairs labelled "Same Class Different Patient" with PPCC values greater than the PPCC cut-off.
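When only one batch is present, the candidate pairs are all unordered pairs of samples in that batch. A minimal sketch of that enumeration with base R follows; the toy matrix `toy` and the use of `combn` are illustrative assumptions, not the package's internals.

```r
# Toy data: 5 variables (rows) by 4 samples (columns), all from one batch.
set.seed(1)
toy <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))

# All unordered sample pairs within the batch: choose(4, 2) = 6 pairs.
idx <- t(combn(colnames(toy), 2))

# Pearson correlation for each pair of sample columns.
within_pairs <- data.frame(
  sample1 = idx[, 1],
  sample2 = idx[, 2],
  ppcc = apply(idx, 1, function(p) cor(toy[, p[1]], toy[, p[2]])),
  stringsAsFactors = FALSE
)
print(within_pairs)
```

The resulting table can then be labelled and thresholded in the same way as the cross-batch case.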
Troubleshooting Tips:
Ensure that every sample name in the row names of meta_data appears in the column names of raw_data, and vice versa.
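One way to run this check before calling the function is sketched below with base R set operations. The toy `raw_data` and `meta_data` values are made up, and the mismatch between "S3" and "S4" is deliberate so that the check has something to report.

```r
# Toy inputs in the documented layout (values are made up).
raw_data <- data.frame(
  S1 = c(5.1, 3.2, 8.0),
  S2 = c(4.9, 3.0, 7.7),
  S3 = c(6.3, 2.8, 9.1),
  row.names = c("gene1", "gene2", "gene3")
)
meta_data <- data.frame(
  Class      = c("tumour", "tumour", "normal"),
  Patient_ID = c("P1", "P2", "P3"),
  Batch      = c("B1", "B1", "B2"),
  row.names  = c("S1", "S2", "S4")  # "S4" is a deliberate mismatch
)

# Samples present in one table but missing from the other.
missing_in_raw  <- setdiff(rownames(meta_data), colnames(raw_data))
missing_in_meta <- setdiff(colnames(raw_data), rownames(meta_data))

names_match <- length(missing_in_raw) == 0 && length(missing_in_meta) == 0
names_match       # FALSE for this toy example
missing_in_raw    # "S4"
missing_in_meta   # "S3"
```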
Examples
ppccDoppelgangerResults = getPPCCDoppelgangers(rc, rc_metadata)
#> [1] "1. Batch correcting the 2 data sets with sva:ComBat..."
#> Found2batches
#> Adjusting for0covariate(s) or covariate level(s)
#> Standardizing Data across genes
#> Fitting L/S model and finding priors
#> Finding parametric adjustments
#> Adjusting the Data
#> [1] "- Data is not min-max normalized"
#> [1] "2. Calculating PPCC between samples of each batch..."
#>   |======================================================================| 100%
#> [1] "3. Labelling Sample Pairs according to their Class and Patient Similarities..."
#> [1] "4. Calculating PPCC cut off to identify PPCC data doppelgangers..."
#> [1] "5. Identifying PPCC data doppelgangers..."