FINAL database. Error-checked.
Please cite: Alberts R, Lu L, Williams RW, Schughart K (2011) Genome-wide analysis of the mouse lung transcriptome reveals novel molecular gene interaction networks and cell-specific expression signatures. Respir Res 12:61
This is the final lung gene expression data set for 57 strains of mice generated using the M430 2.0 Affymetrix array. The data set includes estimates of expression for 8 common inbred strains, 47 BXD strains, and reciprocal F1 hybrids (B6D2F1 and D2B6F1). Data were generated by Klaus Schughart, Lu Lu, and Rob Williams. Arrays were processed by Yan Jiao and Weikuan Gu at the Memphis VA. For questions about these data please contact Prof. Klaus Schughart (Helmholtz Centre for Infection Research, Braunschweig, Germany) at email@example.com.
This data set was processed using the RMA protocol. A total of 2223 probe sets are associated with LRS values greater than 46 (LOD >10).
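The LRS and LOD thresholds quoted above are related by a fixed factor: LRS = 2 ln(10) x LOD, or roughly 4.61 x LOD. A minimal Python sketch of the conversion follows; the function names are illustrative and are not part of GeneNetwork.

```python
import math

# LRS is a likelihood ratio statistic (2 * natural log of the likelihood ratio);
# LOD is the base-10 log of the same ratio, so the two differ by 2 * ln(10).
LRS_PER_LOD = 2.0 * math.log(10.0)  # about 4.605


def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lrs / LRS_PER_LOD


def lod_to_lrs(lod):
    """Convert a LOD score to a likelihood ratio statistic."""
    return lod * LRS_PER_LOD


threshold_lod = lrs_to_lod(46.0)
print(round(threshold_lod, 2))
```

By this conversion an LRS of 46 sits very slightly below a LOD of 10, so the "LOD >10" label is best read as a rounded figure.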
About the cases used to generate this set of data:
This is the final HZI Lung data set. Almost all animals are young adults between 50 and 80 days of age. We measured expression in conventional inbred strains, BXD recombinant inbred (RI) strains, reciprocal F1s between C57BL/6J and DBA/2J, and several mutant and knockout lines. We have combined all common strains, F1 hybrids, and mutants into a group called the Mouse Diversity Panel (MDP). Four lines, namely, C57BL/6J (B6), DBA/2J (D2), and the pair of B6D2F1 and D2B6F1 hybrids are common to both the MDP and the BXD set. This is a breakdown of cases that are part of HEIMED:
- 47 BXD strains. The first 32 of these strains are from the Taylor series of BXD strains generated at the Jackson Laboratory by Benjamin A. Taylor. BXD1 through BXD32 were started in the late 1970s, whereas BXD33 through 42 were started in the 1990s. Only one of these strains, BXD24 (now also known as BXD24b), has retinal degeneration (a spontaneous mutation). The other 36 BXD strains (BXD43 and higher) were bred by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams starting in 1997 using B6D2 generation 10 advanced intercross progeny. This modified breeding protocol doubles the number of recombinations per BXD strain and improves mapping resolution (Peirce et al. 2004). All of the Taylor series of BXD strains and many of the new BXD strains are available from the Jackson Laboratory. All of the new BXD strains (BXD43 and higher) are also available directly from Lu Lu and colleagues at the University of Tennessee Health Science Center in Memphis, TN, USA.
- 10 MDP lines, including some of the most widely used common Mus musculus domesticus inbred strains (e.g., C57BL/6J and 129X1/SvJ), and one inbred but wild-derived representative of this subspecies (WSB/EiJ).
- BALB/cByJ: Sequenced by NIEHS; maternal parent of the CXB panel; Phenome Project old group A list. A tyrosinase (Tyr c allele) albino mutant and also a tyrosinase related protein 1 (Tyrp1 b) brown allele mutant. Small brain, not aggressive (JAX Stock Number: 001026)
- C57BL/6J: Sequenced by NIH/NHGRI; parental strain of AXB/BXA, BXD, and BXH; Phenome Project A list. Single most widely used inbred strain of mouse. (JAX Stock Number: 000664)
- DBA/2J: The dilute, brown, agouti (dba) strain is the oldest inbred strain of mouse. Inbreeding was started in 1909 by Little. A tyrosinase related protein 1 (Tyrp1 b) brown allele mutant. A myosin 5a (Myo5a d) dilute allele mutant. Sequenced by Perlegen/NIEHS and Celera; paternal parent of the BXD panel; Phenome Project old A group list. (JAX Stock Number: 000671)
- FVB/NJ: Friend's leukemia virus B (FVB) strain. Sequenced by Perlegen/NIEHS and Celera. Tyr c locus albino and a Pde6b rd1 mutant derived from Swiss mice at NIH. This has been the most common strain used to make transgenic mice due to large and easily injected oocytes; Phenome Project A list (JAX Stock Number: 001800).
- LP/J: White-bellied agouti strain with a piebald mutation in the endothelin receptor type B (Ednrb) gene, from the Jackson Laboratory. Some reduction in melanocytes in the choroid of the eye due to neural crest migration abnormalities. (JAX Stock Number: 000676)
- SJL/J: Swiss Webster inbred strain from Jim Lambert's lab at the Jackson Laboratory. This strain has the retinal degeneration rd1 allele in Pde6b. It also carries both the Tyr c albino mutation and the pink-eye dilution mutation in the Oca2 or p locus. Highly aggressive males. (JAX Stock Number: 000686)
- WSB/EiJ: Watkin Star line B (or "wild son-of-a-bitch") is a wild-derived Mus musculus domesticus inbred strain from samples caught in Maryland, USA. A Collaborative Cross strain sequenced by NIEHS; Phenome Project C list (JAX Stock Number: 001145)
- B6D2F1 and D2B6F1 (also listed as BDF1 and DBF1 in some graphs and tables): F1 hybrids generated by crossing C57BL/6J with DBA/2J. These black reciprocal F1s can be used to detect dominance effects. Comparison of the two reciprocal F1s can be used to detect parental origin (imprinting) effects. The D2B6F1 animals are currently available from the Jackson Laboratory as a special order. (JAX Stock Number for B6D2F1 hybrids obtained from the Jackson Laboratory, aka B6D2F1/J: 100006)
About the tissue used to generate this set of data:
Tissue preparation protocol. Animals were killed by rapid cervical dislocation. Lungs were removed immediately and placed in RNAlater at room temperature. Usually lungs from 2 to 4 animals of the same sex, age, and strain were stored in a single tube.
Each array was hybridized with a pool of cRNA from lungs from 2 to 4 animals. RNA was extracted at UTHSC by Zhiping Jia. If tissue was saved for RNA extraction at a later time, it was placed directly in RNAlater (Ambion, Inc.) and treated per the manufacturer’s directions. If tissue was used for immediate RNA extraction, we proceeded immediately to the next steps.
Dissecting and preparing lungs for RNA extraction
- Place lungs for RNA extraction in RNA STAT-60 (Tel-Test Inc.) and process per manufacturer’s instructions (in brief form below).
- Store RNA in 75% ethanol at –80 deg. C until use.
Total RNA was extracted with RNA STAT-60 (Tel-Test Inc.) according to the manufacturer's instructions. Briefly we:
- homogenized tissue samples in RNA STAT-60 (1 ml per 50 to 100 mg of tissue)
- allowed the homogenate to stand for 5 min at room temperature
- added 0.2 ml of chloroform per 1 ml RNA STAT-60
- shook the sample vigorously for 15 sec and let the sample sit at room temperature for 3 min
- centrifuged at 12,000 g for 15 min
- transferred the aqueous phase to a fresh tube
- added 0.5 ml of isopropanol per 1 ml RNA STAT-60
- vortexed and allowed sample to stand at room temperature for 5-10 min
- centrifuged at 12,000 g for 10-15 min
- removed the supernatant and washed the RNA pellet with 75% ethanol
- stored the pellet in 75% ethanol at -80 deg C until use
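The reagent ratios in the extraction steps above scale with the volume of RNA STAT-60, which in turn depends on tissue mass. The following Python sketch simply restates that arithmetic; the function name and the default tissue load of 100 mg per ml are assumptions made for illustration, and the manufacturer's instructions remain the authority.

```python
def stat60_volumes(tissue_mg, mg_per_ml=100.0):
    """Return reagent volumes in ml for one sample.

    mg_per_ml is the tissue load per ml of RNA STAT-60. The protocol allows
    50 to 100 mg per ml; the default of 100 is an assumption for this sketch.
    """
    if not 50.0 <= mg_per_ml <= 100.0:
        raise ValueError("protocol calls for 50 to 100 mg tissue per ml")
    stat60_ml = tissue_mg / mg_per_ml
    return {
        "rna_stat60_ml": stat60_ml,
        "chloroform_ml": 0.2 * stat60_ml,   # 0.2 ml per 1 ml RNA STAT-60
        "isopropanol_ml": 0.5 * stat60_ml,  # 0.5 ml per 1 ml RNA STAT-60
    }


# Example: a hypothetical 150 mg lung sample at the default tissue load.
volumes = stat60_volumes(150.0)
```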
Sample Processing. All samples were processed in the VA Medical Center, Memphis, Rheumatology Disease Research Core Center led by Dr. Weikuan Gu. All arrays were processed by Dr. Yan Jiao. In brief, samples were purified using a standard sodium acetate in alcohol method (recommended by Affymetrix). The RNA quality was checked using a 1% agarose gel. The 18S and 28S bands had to be clear and the 28S band had to be more prominent. RNA concentration was measured using a spectrophotometer. The 260/280 ratios had to be greater than 1.7, and the majority were 1.8 or higher. We used a total of 8 micrograms of RNA as the starting amount for cDNA synthesis using a standard Eberwine T7 polymerase method (Superscript II RT, Invitrogen Inc., Affy Part No 900431, GeneChip Expression 3' Amplification One-Cycle cDNA Synthesis Kit). The Affymetrix IVT labeling kit (Affy 900449) was used to generate labeled cRNA. At this point the cRNA was evaluated again using both the 260/280 ratio (values of 2.0 or above were acceptable) and 1% agarose gel inspection of the product (a size range from 200 to 7000 bp is considered suitable for use). We used 45 micrograms of labeled cRNA for fragmentation. Those samples that passed both QC steps (<10% usually fail) were then sheared using a fragmentation buffer included in the Affymetrix GeneChip Sample Cleanup Module (Part No. 900371). After fragmentation, samples were either stored at -80 deg. C until use (roughly one third) or were used immediately for hybridization.
Replication, sex, and sample balance: Our goal was to obtain data for independent biological sample pools for as many lines of mice as possible. We studied both sexes only for the 10 MDP strains and BXD98 (11 strains total). All other strains were sampled as a single-sex pool only.
Table 1: Lung case IDs, including sample tube ID, strain, age, sex, and source of mice
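| Index | RNA_tube_ID | Strain    | Age (days) | Sex | F_generation | Batch_ID | Pool_size | Source |
|-------|-------------|-----------|------------|-----|--------------|----------|-----------|--------|
| 1     | R4495LU     | C57BL/6J  | 65         | F   |              | 4        | 3         | UTM RW |
| 2     | R4496LU     | C57BL/6J  | 65         | M   |              | 4        | 2         | UTM RW |
| 3     | R4499LU     | DBA/2J    | 65         | F   |              | 4        | 3         | ORNL   |
| 4     | R4500LU     | DBA/2J    | 59         | M   |              | 4        | 2         | JAX    |
| 5     | R4486LU     | B6D2F1    | 70         | F   |              | 4        | 2         | UTM RW |
| 6     | R4485LU     | B6D2F1    | 62         | M   |              | 4        | 5         | UTM RW |
| 7     | R4489LU     | D2B6F1    | 61         | F   |              | 4        | 2         | UTM RW |
| 8     | R4490LU     | D2B6F1    | 61         | M   |              | 4        | 3         | UTM RW |
| 9     | R4442LU     | BXD1      | 88         | F   |              | 1        | 3         | UTM RW |
| 10    | R4470LU     | BXD2      | 84         | M   | 152          | 3        | 3         | UTM RW |
| 11    | R4478LU     | BXD6      | 92         | M   | 161          | 3        | 3         | UTM RW |
| 12    | R4475LU     | BXD9      | 78         | M   | 132          | 3        | 3         | UTM RW |
| 13    | R4444LU     | BXD12     | 61         | F   |              | 1        | 3         | ORNL   |
| 14    | R4436LU     | BXD14     | 85         | F   | 126          | 1        | 2         | UTM RW |
| 15    | R4443LU     | BXD16     | 79         | F   |              | 1        | 5         | UTM RW |
| 16    | R4446LU     | BXD19     | 49         | F   |              | 1        | 3         | ORNL   |
| 17    | R4445LU     | BXD21     | 50         | F   |              | 1        | 3         | ORNL   |
| 18    | R4483LU     | BXD22     | 66         | M   |              | 4        | 2         | UTM RW |
| 19    | R4484LU     | BXD25     | 54         | M   | 135          | 4        | 3         | UTM RW |
| 20    | R4447LU     | BXD27     | 85         | F   |              | 1        | 3         | UTM RW |
| 21    | R4448LU     | BXD31     | 81         | F   | 124          | 1        | 3         | UTM RW |
| 22    | R4449LU     | BXD32     | 68         | F   |              | 2        | 5         | ORNL   |
| 23    | R4450LU     | BXD33     | 61         | F   |              | 2        | 2         | ORNL   |
| 24    | R4437LU     | BXD34     | 58         | F   |              | 1        | 5         | UTM RW |
| 25    | R4438LU     | BXD39     | 63         | F   | 60           | 1        | 3         | UTM RW |
| 26    | R4439LU     | BXD40     | 54         | F   |              | 1        | 3         | ORNL   |
| 27    | R4451LU     | BXD42     | 65         | F   |              | 2        | 2         | UTM RW |
| 28    | R4452LU     | BXD43     | 79         | F   | 33           | 2        | 2         | UTM RW |
| 29    | R4440LU     | BXD45     | unk        | unk | 32           | 1        | 2         | UTM RW |
| 30    | R4453LU     | BXD45     | 60         | F   | 30           | 2        | 4         | UTM RW |
| 31    | R4462LU     | BXD48     | 61         | F   | 20           | 2        | 3         | UTM RW |
| 32    | R4441LU     | BXD50     | 64         | F   |              | 1        | 4         | ORNL   |
| 33    | R4460LU     | BXD51     | 81         | M   | 31           | 2        | 2         | UTM RW |
| 34    | R4454LU     | BXD55     | 80         | M   |              | 2        | 3         | ORNL   |
| 35    | R4455LU     | BXD56     | 91         | M   |              | 2        | 3         | ORNL   |
| 36    | R4463LU     | BXD60     | 93         | M   | 33           | 2        | 2         | UTM RW |
| 37    | R4464LU     | BXD62     | 80         | M   | 30           | 2        | 2         | UTM RW |
| 38    | R4477LU     | BXD65     | 59         | F   | 29           | 3        | 3         | UTM RW |
| 39    | R4456LU     | BXD66     | 80         | F   | 28           | 2        | 3         | UTM RW |
| 40    | R4457LU     | BXD68     | 65         | F   | 25           | 2        | 4         | UTM RW |
| 41    | R4465LU     | BXD69     | 63         | M   | 31           | 2        | 5         | UTM RW |
| 42    | R4466LU     | BXD70     | 75         | M   | 25           | 2        | 3         | UTM RW |
| 43    | R4467LU     | BXD71     | 64         | M   | 20           | 2        | 4         | UTM RW |
| 44    | R4468LU     | BXD73     | 59         | M   | 34           | 2        | 3         | UTM RW |
| 45    | R4469LU     | BXD75     | 51         | M   | 30           | 3        | 4         | UTM RW |
| 46    | R4471LU     | BXD83     | 75         | M   | 20           | 3        | 2         | UTM RW |
| 47    | R4472LU     | BXD84     | 78         | M   | 21           | 3        | 2         | UTM RW |
| 48    | R4473LU     | BXD86     | 77         | M   | 28           | 3        | 3         | UTM RW |
| 49    | R4474LU     | BXD87     | 67         | M   | 24           | 3        | 3         | UTM RW |
| 50    | R4459LU     | BXD89     | 79         | F   | 25           | 2        | 2         | UTM RW |
| 51    | R4476LU     | BXD90     | 63         | M   | 29           | 3        | 3         | UTM RW |
| 52    | R4479LU     | BXD96     | 71         | M   | 26           | 3        | 3         | UTM RW |
| 53    | R4461LU     | BXD97     | 80         | M   | 21           | 2        | 3         | UTM RW |
| 54    | R4480LU     | BXD97     | 80         | M   | 28           | 3        | 3         | UTM RW |
| 55    | R4481LU     | BXD98     | 80         | M   | 25           | 3        | 2         | UTM RW |
| 56    | R4482LU     | BXD99     | 72         | M   | 21           | 3        | 2         | UTM RW |
| 57    | R4435LU     | BXD100    | 64         | F   | 20           | 1        | 2         | UTM RW |
| 58    | R4497LU     | 129X1/SvJ | 65         | F   |              | 4        | 4         | JAX    |
| 59    | R4498LU     | 129X1/SvJ | 66         | M   |              | 4        | 4         | JAX    |
| 60    | R4487LU     | BALB/cByJ | 91         | F   |              | 4        | 3         | UTM RW |
| 61    | R4488LU     | BALB/cByJ | 91         | M   |              | 4        | 2         | UTM RW |
| 62    | R4491LU     | FVB/NJ    | 62         | F   |              | 4        | 5         | UTM RW |
| 63    | R4492LU     | FVB/NJ    | 73         | M   |              | 4        | 3         | UTM RW |
| 64    | R4501LU     | LP/J      | 65         | F   |              | 4        | 4         | JAX    |
| 65    | R4502LU     | LP/J      | 65         | M   |              | 4        | 4         | JAX    |
| 66    | R4503LU     | SJL/J     | 63         | F   |              | 4        | 4         | JAX    |
| 67    | R4504LU     | SJL/J     | 65         | M   |              | 4        | 4         | JAX    |
| 68    | R4493LU     | WSB/EiJ   | 76         | F   |              | 4        | 3         | UTM RW |
| 69    | R4494LU     | WSB/EiJ   | 76         | M   |              | 4        | 3         | UTM RW |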
About downloading this data set:
This data set will eventually be available as a bulk download in several formats. Please contact Arthur Centeno or Robert W. Williams for a link to the FTP site associated with this Lung RMA GeneNetwork data set. The data will be available as either strain means or the individual arrays.
About the array platform:
Affymetrix Mouse Genome 430 2.0 arrays: The 430 2.0 array consists of 992,936 25-nucleotide probes that estimate the expression of approximately 39,000 transcripts (many probes overlap and target the same transcript). The array sequences were selected late in 2002 using Unigene Build 107. The array nominally contains the same probe sequences as the old M430A and 430B array pair. However, we have found that roughly 75,000 probes differ between those on the A and B arrays and those on the new 430 2.0.
About data values and data processing:
Range of Gene Expression in the Lung. Expression of transcripts in the lung and most other GN data sets is measured on a log2 scale. Each unit corresponds approximately to a 2-fold difference in hybridization signal intensity. To simplify comparisons among different data sets and cases, log2 RMA values of each array have been adjusted to an average expression of 8 units and a standard deviation of 2 units (variance stabilized). Values of all 45,101 probe sets in this data set range from a low of 5.04 (Clca2, probe set 1437578_at) to a high of 15.1 (hemoglobin alpha, adult chain 1, Hba-a1, probe set 1428361_x_at). This corresponds to about 10 units, or a dynamic range of expression of roughly 2^10 (about 1000-fold).
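Because each unit on this scale corresponds approximately to a doubling, a difference between two expression values can be turned into an approximate fold difference by raising 2 to that difference. A minimal Python sketch, with an illustrative function name, applied to the extremes quoted above:

```python
def fold_difference(value_a, value_b):
    """Approximate fold difference between two log2-scale expression values."""
    return 2.0 ** (value_a - value_b)


# One unit apart is about a 2-fold difference.
one_unit = fold_difference(9.0, 8.0)

# Highest (15.1) versus lowest (5.04) probe set values quoted in the text.
full_range = fold_difference(15.1, 5.04)
```

The result is only approximate, since the values have been rescaled to a fixed mean and standard deviation rather than left as raw log2 intensities.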
We calibrated this log intensity scale using Affymetrix spike-in control probe sets. (This analysis was done using the very similar HEIMED EYE data.) These 18 control probe sets target exogenous bacterial mRNAs that are added to each sample (a graded dose spike cocktail) during preparation at concentrations of 1.5, 5, 25, and 100 pM. (To find these probe sets, search GN’s ALL search field using the string “AFFX pM”.) A value of 6 or less is equivalent to an mRNA concentration of under 0.4 pM, a value of 8 is equivalent to ~1.5 pM, 9.5 is equivalent to ~5 pM, 11.5 is equivalent to ~25 pM, 13.5 is equivalent to ~100 pM, and a value of 15.5 is equivalent to an mRNA concentration of 400 pM or greater.
This range can be converted to the mRNA molecules per cell in the eye assuming that a value of 8 is equivalent to about 1 mRNA copy per cell (Kanno et al. 2006, see http://www.biomedcentral.com/1471-2164/7/64).
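For a rough feel of the spike-in calibration, the four quoted calibration points can be joined by straight lines on a log concentration scale. This interpolation is an assumption made for illustration and is not a procedure described for this data set; the function name is hypothetical, and values outside the 8 to 13.5 range return None because the text gives only bounds there.

```python
import math

# (expression value, spike-in concentration in pM) pairs quoted in the text.
CALIBRATION = [(8.0, 1.5), (9.5, 5.0), (11.5, 25.0), (13.5, 100.0)]


def approx_concentration_pm(value):
    """Interpolate log2(pM) linearly between neighboring calibration points."""
    if value < CALIBRATION[0][0] or value > CALIBRATION[-1][0]:
        return None  # outside the calibrated range
    for (x0, c0), (x1, c1) in zip(CALIBRATION, CALIBRATION[1:]):
        if x0 <= value <= x1:
            frac = (value - x0) / (x1 - x0)
            log_c = math.log2(c0) + frac * (math.log2(c1) - math.log2(c0))
            return 2.0 ** log_c
    return None


# Halfway between the 5 pM and 25 pM points (geometric mean of the two).
midpoint_pm = approx_concentration_pm(10.5)
```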
Note that some probe sets with very low expression still provide reliable data. For example, probe set 1445621_at (Kbtbd4) has an expression of only 5.1 units (a value that would be declared as "absent" using conventional Affymetrix procedures), but the values for this transcript are associated with a very strong cis QTL with an LRS of 55 (LOD > 10, high D2 allele). This strong linkage is definitely not due to chance since the probability of the expression data mapping precisely to the location of the parent gene itself is about 10e-12. This indicates a high signal to noise ratio and the detection of significant strain variation of the correct transcript.
The standard errors of the mean for the lung data were computed for only 11 strains. The standard error of such small samples tends to systematically underestimate the population standard error. With n = 2 the underestimate is about 25%, whereas for n = 6 the underestimate is 5%. Gurland and Tripathi (1971) provide a correction and equation for this effect (see Sokal and Rohlf, Biometry, 2nd ed., 1981, p 53 for an equation of the correction factor for small samples of n < 20.)
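A commonly used form of the small-sample correction multiplies the sample standard deviation by 1 + 1/(4(n - 1)), which reproduces the figures of about 25% for n = 2 and 5% for n = 6 quoted above. The Python sketch below compares that approximation with the exact factor for normally distributed data; the function names are illustrative, and the cited references should be checked for the precise equation they recommend.

```python
import math


def approx_correction(n):
    """Approximate multiplier for a standard deviation from a sample of size n."""
    return 1.0 + 1.0 / (4.0 * (n - 1))


def exact_correction(n):
    """Exact multiplier 1/c4(n), assuming normally distributed data."""
    c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)
    return 1.0 / c4


for n in (2, 6):
    print(n, round(approx_correction(n), 3), round(exact_correction(n), 3))
```

The two versions agree closely even at n = 2, so the simpler approximation is usually adequate for pools of the size used here.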
Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell. The CEL files were processed using the RMA protocol. Data were processed as a single batch.
- Step 1: We added an offset of 1.0 unit to each cell signal to ensure that all values could be logged without generating negative values. We then computed the log base 2 of each cell.
- Step 2: We performed a quantile normalization of the log base 2 values for the total set of arrays using the same initial steps used by the RMA transform.
- Step 3: We computed the Z scores for each cell value.
- Step 4: We multiplied all Z scores by 2.
- Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
- Step 6: Finally, when appropriate, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples.
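The transformation in Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used to build the data set: the function names are hypothetical, ties in the quantile step are broken by position, and the population standard deviation is assumed for the Z score because the text does not say which estimator was used.

```python
import math
import statistics


def log2_with_offset(cells, offset=1.0):
    """Step 1: add an offset, then take log base 2 of each cell value."""
    return [math.log2(v + offset) for v in cells]


def quantile_normalize(arrays):
    """Step 2: give every array the same distribution of values.

    arrays is a list of equal-length lists, one list per microarray.
    """
    n_cells = len(arrays[0])
    n_arrays = len(arrays)
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across arrays, for every rank k.
    rank_means = [
        sum(s[k] for s in sorted_arrays) / n_arrays for k in range(n_cells)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_cells), key=lambda i: a[i])
        out = [0.0] * n_cells
        for rank, cell_index in enumerate(order):
            out[cell_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def rescale_z(values, target_mean=8.0, target_sd=2.0):
    """Steps 3 to 5: compute Z scores, multiply by 2, add 8."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return [target_mean + target_sd * (v - mean) / sd for v in values]


# Toy input: three arrays with four cells each (made-up intensities).
raw = [
    [120.0, 35.0, 900.0, 15.0],
    [200.0, 50.0, 1500.0, 20.0],
    [90.0, 30.0, 700.0, 10.0],
]
logged = [log2_with_offset(a) for a in raw]
normalized = quantile_normalize(logged)
scaled = [rescale_z(a) for a in normalized]
```

After these steps every array in the toy example has a mean of 8 and a standard deviation of 2, and the rank order of cells within each array is unchanged.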
Data source acknowledgment:
Data were generated with funds provided by a variety of public and private sources to members of the Kidney Consortium. We thank the following sources for financial support of this effort:
Klaus Schughart: Grant Support: Helmholtz Centre for Infection Research, Helmholtz Association
Robert W. Williams: Grant Support: NIH U01AA013499, P20MH062009, U01AA013513
Information about this text file:
This text file was originally generated by Klaus Schughart 3.2.2009.