Acute Lymphoblastic Leukaemia (ALL) Microarray Data Set
Unprocessed data set (no batch correction was performed) formed by combining two independently derived ALL data sets: Ross et al. and Yeoh et al.
Usage
data(all)
Details
Both data sets were combined into a single data set with the following procedure:
1. All probes in both data sets were mapped to ENSEMBL IDs using biomaRt.
2. To obtain a one-to-one correspondence between features and ENSEMBL IDs in each data set, all probes with no ENSEMBL ID were removed. Each probe with multiple ENSEMBL IDs was assigned the smallest of them (the IDs were sorted with the default order function and every ID after the first was discarded). Probes sharing the same ENSEMBL ID were then collapsed by taking their median value. After this procedure, each data set contained only unique ENSEMBL ID variables.
3. Because the two data sets need not contain the same number or set of ENSEMBL IDs, we took the intersection of their ENSEMBL IDs so that they could be joined without missing values or imputation. This shared set of ENSEMBL IDs forms the variables of the joined data set.
4. Both data sets were joined on the shared set of ENSEMBL IDs.
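The procedure above can be sketched in base R as follows. This is a hypothetical illustration, not the code used to build `all`. It assumes that each expression data set is a numeric matrix with probes as rows and samples as columns, and that the biomaRt step has already produced a mapping `data.frame` with columns `probe` and `ensembl` (one row per probe/ID pair); the biomaRt query itself is not shown. The function names `collapse_to_ensembl` and `join_on_shared_ids`, the toy probe names, and the placeholder IDs such as `ENSG01` are all invented for the example, as is the choice to return samples as rows and ENSEMBL IDs as columns.

```r
# Hypothetical sketch of the merging procedure (not the package's own code).
# Assumed inputs: `expr` is a numeric matrix (probes x samples); `mapping` is a
# data.frame with columns `probe` and `ensembl`, one row per probe/ID pair.

collapse_to_ensembl <- function(expr, mapping) {
  # Drop probe/ID pairs with no ENSEMBL ID
  mapping <- mapping[!is.na(mapping$ensembl) & mapping$ensembl != "", ]
  # For probes with several IDs, keep only the first ID in default sort order
  mapping <- mapping[order(mapping$probe, mapping$ensembl), ]
  mapping <- mapping[!duplicated(mapping$probe), ]
  # Remove probes that have no ENSEMBL ID at all
  expr <- expr[rownames(expr) %in% mapping$probe, , drop = FALSE]
  ids <- mapping$ensembl[match(rownames(expr), mapping$probe)]
  # Per sample, take the median of all probes sharing an ENSEMBL ID
  agg <- aggregate(expr, by = list(ensembl = ids), FUN = median)
  out <- as.matrix(agg[, -1, drop = FALSE])
  rownames(out) <- agg$ensembl
  out
}

join_on_shared_ids <- function(a, b) {
  # Keep only ENSEMBL IDs present in both data sets (no NAs, no imputation)
  shared <- sort(intersect(rownames(a), rownames(b)))
  # Stack the samples: rows are samples, columns are the shared ENSEMBL IDs
  rbind(t(a[shared, , drop = FALSE]), t(b[shared, , drop = FALSE]))
}

# Toy example with invented probes, samples, and IDs
ross <- matrix(c(1, 3, 5, 7, 2, 4, 6, 8), nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("R1", "R2")))
ross_map <- data.frame(probe   = c("p1", "p1", "p2", "p3"),
                       ensembl = c("ENSG02", "ENSG01", "ENSG01", "ENSG03"))
yeoh <- matrix(c(10, 20, 30), ncol = 1,
               dimnames = list(c("q1", "q2", "q3"), "Y1"))
yeoh_map <- data.frame(probe   = c("q1", "q2", "q3"),
                       ensembl = c("ENSG03", "ENSG04", "ENSG01"))

combined <- join_on_shared_ids(collapse_to_ensembl(ross, ross_map),
                               collapse_to_ensembl(yeoh, yeoh_map))
# combined is a 3 x 2 matrix: samples R1, R2, Y1 by IDs ENSG01, ENSG03
```

In this toy case, probe `p4` is dropped for lacking an ID, `p1` keeps `ENSG01` (the smaller of its two IDs), `p1` and `p2` are collapsed by their median, and `ENSG04` is dropped because it appears in only one data set.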
References
Ross ME, Mahfouz R, Onciu M, Liu H-C, Zhou X, Song G, et al. Gene expression profiling of pediatric acute myelogenous leukemia. Blood. 2004;104:3679-87.
Yeoh E-J, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1:133-43.