The High Q Foundation Striatum Exon 1.0 Array Expression Dataset of July 2007
EXPERIMENTAL EXON ST TEST DATA SET (preliminary text, not error checked). The July 2007 data freeze provides estimates of mRNA expression in the striatum (caudate nucleus of the forebrain) of 50 lines of mice, including the C57BL/6J and DBA/2J parental strains, their F1 hybrid (B6D2F1), 30 BXD recombinant inbred strains, and 17 more common inbred strains of mice. Data were generated using the new Affymetrix Mouse Exon 1.0 ST short oligomer microarrays by Weikuan Gu, Yan Jiao, David Kulp, Lu Lu, Glenn D. Rosen, and Robert W. Williams with the support of a grant from the High Q Foundation. This is the first "all exons" array that we have entered into GeneNetwork, and the data are still experimental. Approximately 300 brain samples (males and females) from 50 strains were used in this experiment. This data set includes 97 arrays that passed very stringent quality control procedures. Data were processed using the RMA method of Irizarry, Bolstad, Speed, and colleagues. To simplify comparison among transforms, RMA values of each array were adjusted to an average expression of 8 units and a standard deviation of 2 units.
About the strains and cases used to generate this set of data:
We have used a set of 30 BXD recombinant inbred strains generated by crossing C57BL/6J (B6 or B) with DBA/2J (D2 or D). The BXDs are particularly useful for systems genetics because both parental strains have been sequenced (8x coverage of B6 and 1.5x coverage of D). Physical maps in WebQTL incorporate approximately 1.75 million B vs D SNPs from Celera. BXD2 through BXD32 were bred by Benjamin A. Taylor starting in the late 1970s. BXD33 through 42 were bred by Taylor in the 1990s. All of these strains are available from The Jackson Laboratory.
Mouse Diversity Panel (MDP). We have also profiled an MDP consisting of a total of 19 inbred strains (this number includes the C57BL/6J and DBA/2J strains) and one F1 hybrid (B6D2F1 only; not D2B6F1 yet). Strains were selected for several reasons:
- genetic and phenotypic diversity, including use by the Phenome Project
- their use in making genetic reference populations including recombinant inbred strains, consomic strains, congenic and recombinant congenic strains
- their use by the Complex Trait Consortium to make the Collaborative Cross (Nairobi/Wellcome, Oak Ridge/DOE, and Perth/UWA)
- genome sequence data from three sources (NHGRI, Celera, and Perlegen-NIEHS)
- availability from The Jackson Laboratory
Seven of the eight parents of the Collaborative Cross (129, A, C57BL/6J, NOD, NZO, PWK, and WSB) have been included. CAST/Ei is the member of the Collaborative Cross that is currently missing from this data set. Thirteen of the MDP strains have been sequenced by Celera, NIH, or by Perlegen for the NIEHS. This panel will be extremely helpful in systems genetic analysis of a wide variety of traits, and will be a powerful adjunct in fine mapping modulators using what is essentially an association analysis of sequence variants.
Collaborative Cross strain sequenced by NIEHS; background for many knockouts; Phenome Project A list
Collaborative Cross strain sequenced by Perlegen/NIEHS; parent of the AXB/BXA panel
Sequenced by NIEHS; Phenome Project B list
Sequenced by NIEHS; maternal parent of the CXB panel; Phenome Project A list
- BTBR T<+> tf/J
Phenome Project group D strain. Used in mutagenesis studies. This black and tan strain carries the recessive tufted allele and is wildtype at the T locus (brachyury).
An isolated recombinant inbred strain generated by crossing C57BL/6J and SB/Le that is used to study autoimmune disease. Males are deficient in pre-B cells.
Sequenced by Perlegen/NIEHS; paternal parent of the BXH panel; Phenome Project A list
Sequenced by NHGRI; parental strain of AXB/BXA, BXD, and BXH; Phenome Project A list
Sequenced by Perlegen/NIEHS and Celera; paternal parent of the BXD panel; Phenome Project A list
Sequenced by Perlegen/NIEHS. Phenome Project group A strain.
Sequenced by Perlegen/NIEHS
Sequenced by Perlegen/NIEHS. Phenome Project B strain.
Phenome Project B list. Please note that the substrain is B-el-J not B-eye-NJ.
Collaborative Cross strain sequenced by NIEHS; Phenome Project B list; diabetic
Collaborative Cross strain
Phenome Project D strain
Sequenced by Perlegen/NIEHS; parental strain for a consomic set by Forejt and colleagues. Not part of the Phenome Project.
Collaborative Cross strain; Phenome Project D list
Collaborative Cross strain sequenced by NIEHS; Phenome Project C list
This F1 hybrid was generated by crossing C57BL/6J with DBA/2J at the Jackson Laboratory. These mice are also designated (incorrectly) as B6D2F1/J.
All of these strains are available from The Jackson Laboratory.
About the tissue used to generate this set of data:
Many of the tissue samples used in this exon array study were also used in our previous M430 analysis of the striatum, providing a partially matched Exon-M430 pair of data sets. However, the previous study included fewer samples (47) and fewer strains (31 total). Animals were obtained from The Jackson Laboratory and housed for several weeks at BIDMC until they reached ~2 months of age (range from 55 to 62 days). Mice were killed by cervical dislocation and brains were removed and placed in RNAlater for 20 to 25 minutes prior to dissection. Cerebella and olfactory bulbs were removed; brains were hemisected, and both striata were dissected using a medial approach by GD Rosen that typically yields 5 to 7 mg of tissue per side.
All striatal dissections were performed by one person (GD Rosen) using a midsagittal approach that minimizes the likelihood of contamination across tissues. This dissection recovers most, but not all, of the neostriatum. We have histologically examined dissected tissue and have found no evidence of inclusion of cortical or thalamic tissue at the margins. We have further confirmed the dissections by comparative assays for acetylcholinesterase (AChE) protein levels using Western blots. The concentration of AChE in the striatum is far higher than that in cortex or cerebellum. A pool of dissected tissue from 3 or 4 adults (approximately 25 to 30 mg of tissue) of the same strain, sex, and age was collected in one session and used to generate cRNA samples.
Roughly 90 to 95% of all cells in the striatum are medium spiny neurons (Gerfen, 1992, for a review of the structure and function of the neostriatum).
RNA Extraction: RNA was extracted by Rosen and colleagues between June 2, 2004 and March 8, 2006. In brief, we used the RNA STAT-60 protocol (TEL-TEST "B" Bulletin No. 1), steps 5.1A (homogenization of tissue), 5.2 (RNA extraction), 5.3 (RNA precipitation), and 5.4 (RNA wash). In Step 5.4 we stopped after adding 75% ethanol (1 ml per 1 ml RNA STAT-60) and stored the mix at -80 deg C until further use. Before RNA labeling we thawed samples and proceeded with the remainder of Step 5.4: pelleting, drying, and redissolving the pellet in RNase-free water.
RNA samples were then processed by the array core at the VA Medical Center by Drs. Yan Jiao and Weikuan Gu (Director of the DNA Discovery Core of the UTHSC Center of Genomics and Bioinformatics). Labeled cRNA was generated using the standard Affymetrix whole transcript sense target labeling protocol.
Legend: Summary of protocol from http://www.affymetrix.com/products/reagents/wt_cdna_synthesis_amp_chart.jsp as carried out by Dr. Yan Jiao.
Replication and Sample Balance: The aim of our standard operating procedure is to obtain data for independent biological sample pools from each sex for all strains. We have succeeded for 44 of 50 strains. Several strains are represented by only a single sex or a single sample pool. This sex imbalance can lead to bias with respect to transcripts that have genuine sex differences. One way to handle this issue is to study the correlation between a proxy variable for this bias, as represented by the Xist probe set 5153684, and a data set of interest.
Legend: Sex balance in this data set is illustrated using the sex-specific Xist gene and one of its probe sets (Affy Exon ST probe set: 5153684). Most samples include one male sample pool with very low Xist expression (6 or 7) and one female sample pool with high Xist expression (10 to 12). As a result 43 of the 50 strains have both intermediate values and high variance. The B6D2F1 sample has no error bar due to an early data entry error. Strains for which samples are only male or only female are at the extreme left and right sides of this bar chart, respectively.
- Strains with two male samples: KK/HlJ, BTBR T<+> tf/J
- Strains with two female samples: BXD5, BXD22
- Only a single female sample: BXD29
- The status of BXD23 is not clear and may represent a single male sample or a possible mixed sex pool.
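The Xist-based check described above can be sketched as a simple correlation between strain-level Xist expression (probe set 5153684) and a trait of interest. The sketch below is only an illustration: the expression values and the trait values are made up, and the `pearson` helper and the variable names are assumptions, not part of GeneNetwork.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up strain means for the Xist probe set 5153684 (high = female-only,
# low = male-only, intermediate = one pool of each sex).
xist = {"BXD5": 11.2, "BXD22": 10.9, "BXD29": 11.0, "KK/HlJ": 6.3, "C57BL/6J": 8.7}

# Made-up strain means for a hypothetical trait or probe set of interest.
trait = {"BXD5": 9.4, "BXD22": 9.1, "BXD29": 9.6, "KK/HlJ": 7.2, "C57BL/6J": 8.3}

# Use only strains present in both data sets, in a fixed order.
strains = sorted(set(xist) & set(trait))
r = pearson([xist[s] for s in strains], [trait[s] for s in strains])
print(f"Correlation with Xist across {len(strains)} strains: r = {r:.2f}")
```

A correlation with a large absolute value suggests that the trait may be confounded by the sex imbalance and should be interpreted with care.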
Batch Structure: This data set consists of 97 arrays processed in 8 batches. All arrays were processed by a single skilled operator (Dr. Yan Jiao) between October 20 and November 29, 2006 (scan dates from October 26 to November 29). In general, the male and female samples from a single strain were run within a single batch.
Data Table 1:
Mouse Exon 1.0 ST data: The table below lists arrays by strain, age, sex, case ID, and batch ID. Each array was hybridized to a pool of mRNA from 3 to 4 mice. All mice were between 48 and 71 days of age.
About the array platform:
Affymetrix Mouse Exon ST 1.0 array: The Exon 1.0 ST (sense target) array consists of approximately 4.5 million useful 25-nucleotide probes that estimate the expression of approximately 1 million exon clusters. The array sequences were selected in 2006 using Unigene Build XXX.
About data processing:
Probe (cell) level and Probe set data from the CEL file:
1. Probes overlapping SNPs were removed from the design file
2. The Affymetrix Power Tools (APT) package was used to extract CEL values and perform RMA normalization
3. Probe set values were normalized to mean=8 and sd=2 (per chip)
4. Strain averages were calculated by averaging over the chips that belong to the same strain
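Step 3 above amounts to a linear rescaling of each array. A minimal sketch, assuming the rescaling is a plain z-score transform followed by a shift and scale (the source does not say whether a population or sample standard deviation was used; the population form is assumed here, and the input values are made up):

```python
from statistics import mean, pstdev

def rescale_array(values, target_mean=8.0, target_sd=2.0):
    """Linearly rescale one array's probe set values to a target mean and SD."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the source does not specify
    if sd == 0:
        raise ValueError("array has zero variance; cannot rescale")
    return [target_mean + target_sd * (v - mu) / sd for v in values]

# Made-up RMA (log2 scale) values for the probe sets of a single array.
rma_values = [5.1, 6.4, 7.9, 9.2, 11.0]
scaled = rescale_array(rma_values)
print(round(mean(scaled), 6), round(pstdev(scaled), 6))
```

Because the transform is applied per array, every array ends up with the same mean and spread, which makes arrays directly comparable but removes any genuine array-wide differences in signal.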
Probe set data from the CHP file: The expression values were generated by Manjunatha in David Kulp's group at the University of Massachusetts Amherst using RMA. The same simple steps described above were also applied to these values. Every microarray data set therefore has a mean expression of 8 with a standard deviation of 2. A 1 unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
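The "roughly two-fold per unit" rule can be written as a one-line conversion. This is only a sketch: it treats the values as exact log2 units, whereas the per-array rescaling to a standard deviation of 2 means the relation is approximate, as the text says. The function name and the 5-unit noise cutoff constant are illustrative.

```python
NOISE_FLOOR = 5.0  # from the text: levels below 5 are usually close to background

def approx_fold_change(level_a, level_b):
    """Approximate fold difference between two expression levels (a relative to b)."""
    return 2.0 ** (level_a - level_b)

print(approx_fold_change(9.0, 8.0))   # one unit higher: about two-fold
print(approx_fold_change(10.0, 8.0))  # two units higher: about four-fold
print(4.6 < NOISE_FLOOR)              # a level of 4.6 is probably background
```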
- Step 1: Probes overlapping SNPs were removed from the design file
- Step 2: The Affymetrix Power Tools (APT) package was used to extract CEL values and perform RMA normalization
- Step 3: Probe set values were normalized to mean=8 and sd=2 (per array)
- Step 4: Strain averages were calculated by averaging over all arrays that belong to the same strain (3 maximum in this data set)
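Step 4 can be sketched as a group-by-strain average over arrays. The data layout below (a list of strain and probe-set-value pairs) and the probe set IDs are assumptions made for illustration, and the values are invented.

```python
from collections import defaultdict
from statistics import mean

def strain_averages(arrays):
    """Average probe set values over all arrays that share a strain.

    arrays: list of (strain, {probe_set_id: value}) tuples, one per array.
    Returns {strain: {probe_set_id: mean value}}.
    """
    by_strain = defaultdict(list)
    for strain, values in arrays:
        by_strain[strain].append(values)

    averages = {}
    for strain, replicates in by_strain.items():
        probe_sets = replicates[0].keys()
        averages[strain] = {
            ps: mean(rep[ps] for rep in replicates) for ps in probe_sets
        }
    return averages

# Made-up normalized values for two hypothetical probe sets.
arrays = [
    ("BXD5", {"ps_1": 6.0, "ps_2": 9.0}),   # first pool
    ("BXD5", {"ps_1": 10.0, "ps_2": 9.5}),  # second pool
    ("BXD29", {"ps_1": 7.5, "ps_2": 8.0}),  # single pool only
]
result = strain_averages(arrays)
print(result["BXD5"]["ps_1"], result["BXD29"]["ps_1"])
```

Strains with a single array simply keep that array's values, which is why their strain means carry no error estimate.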
Data quality control: A total of 97 samples passed RNA quality control.
Part 1: Testing if replicates come from the same strain
- RMA normalized values were used in this analysis
- Pair-wise correlations were calculated between all the arrays using the probesets with high variance and high median
- Probability density of correlations between non-replicate pairs and replicate-pairs were calculated
- A threshold of 0.85 was chosen using a maximum likelihood estimate
- In total, five sets of replicates might not have come from the same strain. (They are marked as 0 in the Manju_Quality Score column.)
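The replicate check in Part 1 can be sketched as follows. This simplified version assumes the probe sets have already been filtered for high variance and high median, and it applies the 0.85 threshold directly instead of deriving it from the replicate and non-replicate correlation densities. The array IDs, strain labels, and values are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_suspect_replicates(arrays, threshold=0.85):
    """Return pairs of same-strain arrays whose correlation is below threshold.

    arrays: {array_id: (strain, [values for the selected probe sets])}
    """
    flagged = []
    for (id_a, (strain_a, vals_a)), (id_b, (strain_b, vals_b)) in combinations(
        arrays.items(), 2
    ):
        if strain_a == strain_b and pearson(vals_a, vals_b) < threshold:
            flagged.append((id_a, id_b))
    return flagged

# Made-up data: a1/a2 agree well, b1/b2 do not.
arrays = {
    "a1": ("BXD1", [6.0, 7.0, 9.0, 11.0]),
    "a2": ("BXD1", [6.2, 7.1, 8.8, 11.3]),
    "b1": ("BXD2", [6.0, 7.0, 9.0, 11.0]),
    "b2": ("BXD2", [11.0, 9.0, 7.0, 6.0]),
}
suspect = flag_suspect_replicates(arrays)
print(suspect)
```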
Part 2: Testing if strain labeling is correct
- RMA normalized values were used in this analysis
- Only BXD strains were tested
- A set of strongly cis-linked probesets were identified (using linkage to nearest marker)
- The expression of these probesets was used to re-estimate the genotype of nearest marker
- The values of all re-estimated marker genotypes were compared to genotypes of all the BXD strains and optimal match was identified
- In total, four sets of replicates were found to be mislabeled.
Probe set level QC: The final normalized array data were evaluated for outliers. XXX arrays were considered outliers. These XXX suspect arrays were eliminated from this data set. The following arrays were eliminated: XXX, YYY, ZZZ.
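The strain-label check in Part 2 can be sketched in two steps: call an allele at each strongly cis-linked probe set from its expression level, then find the BXD strain whose known marker genotypes best match those calls. Everything below is illustrative: the cutoff rule (a fixed midpoint per probe set), the probe set IDs, the genotypes, and the expression values are assumptions, not the procedure or data actually used.

```python
def call_alleles(sample_expr, cutoffs, high_allele):
    """Call 'B' or 'D' at each cis-linked probe set for one sample.

    sample_expr: {probe_set_id: expression value}
    cutoffs:     {probe_set_id: midpoint between B-like and D-like expression}
    high_allele: {probe_set_id: allele ('B' or 'D') with the higher expression}
    """
    calls = {}
    for ps, value in sample_expr.items():
        high = high_allele[ps]
        low = "D" if high == "B" else "B"
        calls[ps] = high if value > cutoffs[ps] else low
    return calls

def best_matching_strain(calls, strain_genotypes):
    """Return the strain whose marker genotypes agree with the most calls."""
    def n_matches(strain):
        genotypes = strain_genotypes[strain]
        return sum(calls[ps] == genotypes[ps] for ps in calls)
    return max(strain_genotypes, key=n_matches)

# Hypothetical cis-linked probe sets and genotypes at their nearest markers.
cutoffs = {"ps1": 8.0, "ps2": 8.0, "ps3": 8.0, "ps4": 8.0}
high_allele = {"ps1": "B", "ps2": "D", "ps3": "B", "ps4": "D"}
strain_genotypes = {
    "BXD1": {"ps1": "B", "ps2": "D", "ps3": "B", "ps4": "D"},
    "BXD2": {"ps1": "B", "ps2": "B", "ps3": "D", "ps4": "D"},
    "BXD5": {"ps1": "D", "ps2": "D", "ps3": "B", "ps4": "B"},
}

# A sample labeled BXD1 whose expression pattern looks like BXD2.
labeled_strain = "BXD1"
sample = {"ps1": 9.5, "ps2": 6.8, "ps3": 6.9, "ps4": 9.7}

calls = call_alleles(sample, cutoffs, high_allele)
best = best_matching_strain(calls, strain_genotypes)
print(best, "possible mislabel" if best != labeled_strain else "label consistent")
```

In practice many more probe sets would be needed for an unambiguous match, since related BXD strains share large blocks of genotype.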
Data source acknowledgment:
Data were generated with funds to Weikuan Gu, Rob Williams, and Glenn Rosen from the High Q Foundation. Samples and arrays were processed by Dr. Yan Jiao at the Array Core at the University of Tennessee Health Science Center and VA Medical Center, Memphis.
About this text file:
This text file was originally generated by RWW on July 24, 2007 using a template from a previous M430 Striatum data set. Updated by RWW July 26, 2007; MJ and RWW, Aug 7, 2007.