Myers and colleagues generated massive neocortical transcriptome data sets from a set of unrelated, elderly, neurologically and neuropathologically normal humans and from patients with confirmed late-onset Alzheimer's disease (LOAD; n = 187 normal and 176 LOAD cases; see DOI:10.1016/j.ajhg.2009.03.011 for details). They used an Illumina Sentrix Bead array (HumanRef-8) that measures expression of approximately 19,730 curated RefSeq sequences (Human Build 34).
Case identifiers: All case identifiers (IDs) in GeneNetwork begin with a capital C, followed by a six-digit GEO identifier, then the sex and the age in years. Non-Alzheimer cases are labeled with the suffix letter N: C225652M85N. Alzheimer cases are labeled with the suffix letter A: C388217F97A.
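The ID layout described above can be unpacked programmatically. The following is a minimal sketch, not part of GeneNetwork itself: the names `CaseID` and `parse_case_id` are hypothetical, and the pattern assumes that sex is coded as a single letter M or F, as in the two examples given.

```python
import re
from typing import NamedTuple


class CaseID(NamedTuple):
    """Fields encoded in a GeneNetwork case identifier (hypothetical helper type)."""
    geo_id: str      # six-digit GEO identifier, kept as text
    sex: str         # "M" or "F" (assumed coding, based on the examples)
    age_years: int   # age in years
    status: str      # "normal" for suffix N, "LOAD" for suffix A


# C, six digits, sex letter, age digits, then N (non-Alzheimer) or A (Alzheimer)
_CASE_ID_PATTERN = re.compile(r"C(\d{6})([MF])(\d+)([NA])")


def parse_case_id(case_id: str) -> CaseID:
    """Split a case ID such as 'C225652M85N' into its components."""
    match = _CASE_ID_PATTERN.fullmatch(case_id)
    if match is None:
        raise ValueError(f"not a recognized case ID: {case_id!r}")
    geo_id, sex, age, suffix = match.groups()
    status = "LOAD" if suffix == "A" else "normal"
    return CaseID(geo_id, sex, int(age), status)


print(parse_case_id("C225652M85N"))
print(parse_case_id("C388217F97A"))
```

Using `fullmatch` rather than `match` rejects strings with trailing characters, which is a cautious choice when the ID format is inferred from only two examples.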
Data were initially downloaded from the NCBI GEO archive under the experiment ID GSE15222. All data were generated using the Illumina HumanRef-8 expression BeadChip (GPL2700) v2 Rev0. This data set in GeneNetwork includes data for 24,354 probes. We have realigned the 50-mer probe sequences to the latest version of the human genome (Feb 2009, hg19) using BLAT and reannotated the array (August 2009). The annotation in GeneNetwork (GN) will therefore differ from that provided in GEO for this platform. We were unable to obtain 50-mer sequences for several thousand probes (e.g., HTT), and these probes have therefore not been realigned to the human genome.
The GEO data set was processed by Myers and colleagues using Illumina's Rank Invariant transform. We performed a series of QC and renormalization steps on the data to allow easier comparison with other data sets in GeneNetwork. In brief, the data were log2 transformed. We then recentered each array to a mean expression of 8 units and a standard deviation of 2 units (2z + 8 transform). The values are therefore modified z scores, and each unit represents roughly a two-fold difference in expression. Average expression across all 363 cases ranges from a low of 6 units (e.g., SYT15) to a high of 19 units for ARSK. APOE has an average expression of 15 units and APP has an average expression of 11.5 units. The distribution is far from normal, with a great excess of measurements of genes with low to moderate expression clustered between 6.5 and 8.5 units.
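As an illustration of the 2z + 8 transform described above, here is a minimal sketch for a single array. It is not the code used to build the data set: the function name `two_z_plus_eight` is hypothetical, the input is assumed to be strictly positive raw intensities, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import math
import statistics


def two_z_plus_eight(raw_values):
    """Apply log2, then rescale one array to mean 8 and SD 2.

    raw_values: strictly positive intensities for a single array (assumed).
    Returns a list of modified z scores: 2 * z + 8.
    """
    logged = [math.log2(v) for v in raw_values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all values are identical; z scores are undefined")
    return [2.0 * (x - mean) / sd + 8.0 for x in logged]


# Made-up intensities, for illustration only
example = [120.0, 95.0, 410.0, 2300.0, 88.0, 150.0]
scaled = two_z_plus_eight(example)
print([round(v, 2) for v in scaled])
```

Because the rescaling is applied per array, every array ends up with the same mean and spread, which is what makes values comparable across arrays and with other data sets normalized the same way.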
A small number of arrays (n = 6: GSM226040, GSM226041, GSM226042, GSM226044, GSM226045, GSM226046) had a different distribution from the great majority of other arrays. This was probably due to a batch processing effect. This minority group included both normal and LOAD cases. The putative batch effect has been eliminated in the GeneNetwork rendition of the Myers data. To eliminate it, we simply computed a mean offset for each probe in the "minority set" relative to the remaining "majority set" and added or subtracted this offset to force the mean of each probe in the minority set to conform to the mean of the same probe in the majority set.
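The per-probe offset correction described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original processing code: the function name `correct_batch_offset` and the nested-dictionary layout (probe to array ID to value) are hypothetical, and the probe names, the array IDs other than the GSM accessions, and all values in the example are made up.

```python
from statistics import fmean


def correct_batch_offset(expression, minority_ids):
    """Shift minority-set arrays so each probe's mean matches the majority set.

    expression: dict mapping probe -> {array_id: value} (assumed layout).
    minority_ids: set of array IDs in the putative batch.
    Returns a new dict; majority-set values are left unchanged.
    """
    corrected = {}
    for probe, values in expression.items():
        minority = [v for a, v in values.items() if a in minority_ids]
        majority = [v for a, v in values.items() if a not in minority_ids]
        if not minority or not majority:
            # Nothing to align against; copy the probe unchanged
            corrected[probe] = dict(values)
            continue
        # Positive offset raises the minority set, negative lowers it
        offset = fmean(majority) - fmean(minority)
        corrected[probe] = {
            a: (v + offset if a in minority_ids else v)
            for a, v in values.items()
        }
    return corrected


# Made-up values; "arrayA" and "arrayB" are placeholder majority-set IDs
data = {
    "probe1": {"GSM226040": 10.0, "GSM226041": 12.0, "arrayA": 8.0, "arrayB": 9.0},
    "probe2": {"GSM226040": 6.5, "GSM226041": 6.9, "arrayA": 7.4, "arrayB": 7.6},
}
fixed = correct_batch_offset(data, {"GSM226040", "GSM226041"})
print(fixed["probe1"])
```

Note that this correction only aligns the mean of each probe. Any difference in variance between the minority and majority sets is left as it is.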