Information on Human Data Sets
The Genotype-Tissue Expression (GTEx) project (GSE45878 from GEO, entered into GeneNetwork, April 2014):
Best Recent Citation: The GTEx Consortium (2015) The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348(6235): 648-660
Please review and cite the paper above as well as: John Lonsdale, Jeffrey Thomas, Mike Salvatore, Rebecca Phillips, Edmund Lo, Saboor Shad, Richard Hasz, Gary Walters, Fernando Garcia, Nancy Young, Barbara Foster, Mike Moser, Ellen Karasik, Bryan Gillard, Kimberley Ramsey, Susan Sullivan, Jason Bridge, Harold Magazine, John Syron, Johnelle Fleming, Laura Siminoff, Heather Traino, Maghboeba Mosavel, Laura Barker, Scott Jewell, Dan Rohrer, Dan Maxim, Dana Filkins, Philip Harbach, Eddie Cortadillo, Bree Berghuis, Lisa Turner, Eric Hudson, Kristin Feenstra, Leslie Sobin, James Robb, Phillip Branton, Greg Korzeniewski, Charles Shive, David Tabor, Liqun Qi, Kevin Groch, Sreenath Nampally, Steve Buia, Angela Zimmerman, Anna Smith, Robin Burges, Karna Robinson, Kim Valentino, Deborah Bradbury, Mark Cosentino, Norma Diaz-Mayoral, Mary Kennedy, Theresa Engel, Penelope Williams, Kenyon Erickson, Kristin Ardlie, Wendy Winckler, Gad Getz, David DeLuca, Daniel MacArthur, Manolis Kellis, Alexander Thomson, Taylor Young, Ellen Gelfand, Molly Donovan, Yan Meng, George Grant, Deborah Mash, Yvonne Marcus, Margaret Basile, Jun Liu, Jun Zhu, Zhidong Tu, Nancy J Cox, Dan L Nicolae, Eric R Gamazon, Hae Kyung Im, Anuar Konkashbaev, Jonathan Pritchard, Matthew Stevens, Timothèe Flutre, Xiaoquan Wen, Emmanouil T Dermitzakis, Tuuli Lappalainen, Roderic Guigo, Jean Monlong, Michael Sammeth, Daphne Koller, Alexis Battle, Sara Mostafavi, Mark McCarthy, Manual Rivas, Julian Maller, Ivan Rusyn, Andrew Nobel, Fred Wright, Andrey Shabalin, Mike Feolo, Nataliya Sharopova, Anne Sturcke, Justin Paschal, James M Anderson, Elizabeth L Wilder, Leslie K Derr, Eric D Green, Jeffery P Struewing, Gary Temple, Simona Volpi, Joy T Boyer, Elizabeth J Thomson, Mark S Guyer, Cathy Ng, Assya Abdallah, Deborah Colantuoni, Thomas R Insel, Susan E Koester, A Roger Little, Patrick K Bender, Thomas Lehner, Yin Yao, Carolyn C Compton, Jimmie B Vaught, Sherilyn Sawyer, Nicole C Lockhart, Joanne Demchok & Helen F Moore. 
Nature Genetics 45, 580–585 (2013).
GTEx explore all tissues:
The Genotype-Tissue Expression (GTEx) project. Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
Summary from GEO: "The Genotype-Tissue Expression (GTEx) project is a collaborative effort that aims to identify correlations between genotype and tissue-specific gene expression levels that will help identify regions of the genome that influence whether and how much a gene is expressed. GTEx is funded through the Common Fund, and managed by the NIH Office of the Director in partnership with the National Human Genome Research Institute, National Institute of Mental Health, the National Cancer Institute, the National Center for Biotechnology Information at the National Library of Medicine, the National Heart, Lung and Blood Institute, the National Institute on Drug Abuse, and the National Institute of Neurological Diseases and Stroke, all part of NIH.
This series of 837 samples represents multiple tissues collected from 102 GTEX donors and 1 control cell line. In total, 30 tissue sites are represented including Adipose, Artery, Heart, Lung, Whole Blood, Muscle, Skin, and 11 brain subregions. RNA-seq expression data, robust clinical data, pathological annotations, and genotypes are also available for these samples from dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v2.p1) and the GTEx portal. While GTEx is no longer generating Affymetrix expression data, donor enrollment continues and is expected to reach 1,000 by the end of 2015. Updates to the GTEx data in dbGaP and the GTEx Portal will be made periodically.
contributor: GTEx Laboratory, Data Analysis, and Coordinating Center (LDACC)
contributor: The Broad Institute of MIT and Harvard (LDACC PIs: Kristin Ardlie and Gaddy Getz)"
North American Brain Expression Consortium and UK Human Brain Expression Database: Gene Expression. (Series GSE36192 and GSE36194 from GEO, entered into GeneNetwork, January 2014):
Functional and Evolutionary Insights Into Human Brain Development Through Global Transcriptome Analysis (entered into GeneNetwork, Jul 2011):
Please review and cite: Gibbs JR, Hernandez DG, Dillman A, Ryten M, Trabzuni D, Traynor BJ, Nalls MA, Arepalli S, Ramasamy A, van der Brug MP, Troncoso J, Johnson R, O'Brien R, Zielke HR, Zonderman A, Ferrucci L, Longo DL, Smith C, Walker R, Weale M, Hardy JA, Cookson MR, Singleton AB. PMID: 22433082.
North American Brain Expression Consortium and UK Human Brain Expression Database: Gene Expression. Genome-wide association studies have nominated many genetic variants for common human traits, including diseases, but in many cases the underlying biological reason for a trait association is unknown. Subsets of genetic polymorphisms show a statistical association with transcript expression levels, and have therefore been nominated as expression quantitative trait loci (eQTL). However, many tissue and cell types have specific gene expression patterns and so it is not clear how frequently eQTLs found in one tissue type will be replicated in others. In the present study we used two appropriately powered sample series to examine the genetic control of gene expression in blood and brain. We find that while many eQTLs associated with human traits are shared between these two tissues, there are also examples where blood and brain differ, either by restricted gene expression patterns in one tissue or because of differences in how genetic variants are associated with transcript levels. These observations suggest that design of eQTL mapping experiments should consider tissue of interest for the disease or other traits studied. Published by Elsevier Inc.
Summary from GEO Series GSE36192 and GSE36194: "A fundamental challenge in the post-genome era is to understand and annotate the consequences of genetic variation, particularly within the context of human tissues. We describe a set of integrated experiments designed to investigate the effects of common genetic variability on mRNA expression distinct human brain regions. We show that brain tissues may be readily distinguished based on expression profile. We find an abundance of genetic cis regulation mRNA expression. We observe that the largest magnitude effects occur across distinct brain regions. We believe these data, which we have made publicly available, will be useful in understanding the biological effects of genetic variation."
Human Connectome Project
Please review and cite: Johnson MB, Kawasawa YI, Mason CE, Krsnik Z, Coppola G, Bogdanović D, Geschwind DH, Mane SM, State MW, Sestan N (2009) Functional and evolutionary insights into human brain development through global transcriptome analysis. Neuron 62: 494–509.
Exon Array expression data from 13 areas of the late second trimester human brain. Our understanding of the evolution, formation, and pathological disruption of human brain circuits is impeded by a lack of comprehensive data on the developing brain transcriptome. Thus, we have undertaken whole-genome, exon-level expression analysis of thirteen regions from left and right sides of the mid-fetal human brain, finding 76% of genes to be expressed, and 44% of these to be differentially regulated. These data reveal a large number of specific gene expression and alternative splicing patterns, as well as co-expression networks, associated with distinct regions and neurodevelopmental processes. Of particular relevance to cognitive specializations, we have characterized the transcriptional landscapes of prefrontal cortex and perisylvian speech and language areas, which exhibit a population-level global expression symmetry. Finally, we show that differentially expressed genes are more frequently associated with human-specific evolution of putative cis-regulatory elements. Altogether, these data provide a wealth of novel biological insights into the complex transcriptional and molecular underpinnings of human brain development and evolution.
Summary from GEO Series: GSE13344 "Tissue was microdissected from 13 regions, including 9 distinct neocortical areas, from both left and right sides of four late second trimester human brain specimens. Gene- and exon-level differential expression analyses were performed by mixed model, nested analysis of variance using the XRAY software from Biotique Systems. Further details available in Johnson, Kawasawa, et al., "Functional and Evolutionary Insights into Human Brain Development through Global Transcriptome Analysis" Neuron, Volume 62, Issue 4, 2009"
Human Liver Cohort (GSE9588 from GEO, entered into GeneNetwork, March 2011):
WU-Minn HCP Consortium Open Access Data Use Terms
Last updated: Apr 26, 2013.
I request access to data collected by the Washington University - University of Minnesota Consortium of the Human Connectome Project (WU-Minn HCP), and I agree to the following:
- I will not attempt to establish the identity of or attempt to contact any of the included human subjects.
- I understand that under no circumstances will the code that would link these data to Protected Health Information be given to me, nor will any additional information about individual human subjects be released to me under these Open Access Data Use Terms.
- I will comply with all relevant rules and regulations imposed by my institution. This may mean that I need my research to be approved or declared exempt by a committee that oversees research on human subjects, e.g. my IRB or Ethics Committee. The released HCP data are not considered de-identified, insofar as certain combinations of HCP Restricted Data (available through a separate process) might allow identification of individuals. Different committees operate under different national, state and local laws and may interpret regulations differently, so it is important to ask about this. If needed and upon request, the HCP will provide a certificate stating that you have accepted the HCP Open Access Data Use Terms.
- I may redistribute original WU-Minn HCP Open Access data and any derived data as long as the data are redistributed under these same Data Use Terms.
- I will acknowledge the use of WU-Minn HCP data and data derived from WU-Minn HCP data when publicly presenting any results or algorithms that benefitted from their use.
- Papers, book chapters, books, posters, oral presentations, and all other printed and digital presentations of results derived from HCP data should contain the following wording in the acknowledgments section: "Data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University."
- Authors of publications or presentations using WU-Minn HCP data should cite relevant publications describing the methods used by the HCP to acquire and process the data. The specific publications that are appropriate to cite in any given study will depend on what HCP data were used and for what purposes. An annotated and appropriately up-to-date list of publications that may warrant consideration is available at http://www.humanconnectome.org/about/acknowledgehcp.html
- The WU-Minn HCP Consortium as a whole should not be included as an author of publications or presentations if this authorship would be based solely on the use of WU-Minn HCP data.
- Failure to abide by these guidelines will result in termination of my privileges to access WU-Minn HCP data.
Alzheimer's disease Cases and Controls Liang (July 2009):
Please review and cite: Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, Kasarskis A, Zhang B, Wang S, Suver C, Zhu J, Millstein J, Sieberts S, Lamb J, GuhaThakurta D, Derry J, Storey JD, Avila-Campillo I, Kruger MJ, Johnson JM, Rohl CA, van Nas A, Mehrabian M, Drake TA, Lusis AJ, Smith RC, Guengerich FP, Strom SC, Schuetz E, Rushmore TH, Ulrich R (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biology 6(5):e107. PMID: 18462017
Systematic Genetic and Genomic Analysis of Cytochrome P450 Enzyme Activities in Human Liver. Xia Yang, Bin Zhang, Cliona Molony, Eugene Chudin, Ke Hao, Jun Zhu, Christine Suver, Hua Zhong, F. Peter Guengerich, Stephen C. Strom, Erin Schuetz, Thomas H. Rushmore, Roger G. Ulrich, J. Greg Slatter, Eric E. Schadt, Andrew Kasarskis, Pek Yee Lum. Genome Res. 2010 Aug;20(8):1020-36.
The Human Liver Cohort (HLC) study aimed to characterize the genetic architecture of gene expression in human liver using genotyping, gene expression profiling, and enzyme activity measurements of cytochrome P450. The HLC was assembled from a total of 780 liver samples screened. These liver samples were acquired from Caucasian individuals at three independent tissue collection centers. DNA samples were genotyped on the Affymetrix 500K SNP and Illumina 650Y SNP genotyping arrays, representing a total of 782,476 unique single nucleotide polymorphisms (SNPs). Only the genotype data from samples collected postmortem are accessible in dbGaP. These 228 samples represent a subset of the 427 samples included in the Human Liver Cohort publication (Schadt, Molony et al. 2008). RNA samples were profiled on a custom Agilent 44,000-feature microarray composed of 39,280 oligonucleotide probes targeting transcripts representing 34,266 known and predicted genes, including high-confidence noncoding RNA sequences. Each liver sample was processed into cytosol and microsomes using a standard differential centrifugation method. The activities of nine P450 enzymes (CYP1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4) were measured in microsomes isolated from 398 HLC liver samples using probe substrate metabolism assays, with activity expressed as nmol/min/mg protein. Each enzyme was measured with a single substrate except CYP3A4, whose activity was measured using two substrates, midazolam and testosterone.
Summary from GEO: "To uncover the genetic determinants affecting expression in a metabolically active tissue relevant to the study of obesity, diabetes, atherosclerosis, and other common human diseases, we profiled 427 human liver samples on a comprehensive gene expression microarray targeting greater than 40,000 transcripts and genotyped DNA from each of these samples at greater than 1,000,000 SNPs. The relatively large sample size of this study and the large number of SNPs genotyped provided the means to assess the relationship between genetic variants and gene expression and it provided this look for the first time in a non-blood derived, metabolically active tissue. A comprehensive analysis of the liver gene expression traits revealed that thousands of these traits are under the control of well defined genetic loci, with many of the genes having already been implicated in a number of human diseases."
Alzheimer's disease Cases and Controls Myers (April 2009):
Please cite: Liang WS, Reiman EM, Valla J, Dunckley T, Beach TG, Grover A, Niedzielko TL, Schneider LE, Mastroeni D, Caselli R, Kukull W, Morris JC, Hulette CM, Schmechel D, Rogers J, Stephan DA (2008) Alzheimer's disease is associated with reduced expression of energy metabolism genes in posterior cingulate neurons. Proc Natl Acad Sci USA 105:4441-4446.
Summary from GEO: "Information about the genes that are preferentially expressed during the course of Alzheimer's disease (AD) could improve our understanding of the molecular mechanisms involved in the pathogenesis of this common cause of cognitive impairment in older persons, provide new opportunities in the diagnosis, early detection, and tracking of this disorder, and provide novel targets for the discovery of interventions to treat and prevent this disorder. Information about the genes that are preferentially expressed in relationship to normal neurological aging could provide new information about the molecular mechanisms that are involved in normal age-related cognitive decline and a host of age-related neurological disorders, and they could provide novel targets for the discovery of interventions to mitigate some of these deleterious effects."
The CANDLE STUDY: Conditions Affecting Neurocognitive Development and Learning (June 2011):
Expression quantitative trait loci study using human brain from 363 cortical samples. Affymetrix 500K chip for genotyping, Illumina ref-seq 8 chip for expression. Genotypes are available at dbGaP.
Please cite: Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, Holmans P, Rohrer K, Zhao A, Marlowe L, Kaleem M, McCorquodale DS 3rd, Cuello C, Leung D, Bryden L, Nath P, Zismann VL, Joshipura K, Huentelman MJ, Hu-Lince D, Coon KD, Craig DW, Pearson JV; NACC-Neuropathology Group, Heward CB, Reiman EM, Stephan D, Hardy J, Myers AJ (2009) Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84:445-58.
Summary from GEO: Myers and colleagues generated massive neocortical transcriptome data sets for a set of unrelated elderly neurologically and neuropathologically normal humans and from confirmed late onset Alzheimer's disease patients (LOAD, n = 187 normal and 176 LOAD cases, see DOI:10.1016/j.ajhg.2009.03.011 for detail). They used an Illumina Sentrix Bead array (HumanRef-8) that measures expression of approximately 19,730 curated RefSeq sequences (Human Build 34).
Case identifiers: All case identifiers (IDs) in GeneNetwork begin with a capital C followed by a six digit GEO identifier, followed by the sex and age in years. Non-Alzheimer cases are labeled with the suffix letter N: C225652M85N. Alzheimer cases are labeled with the suffix letter A: C388217F97A.
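The identifier layout described above can be split into its parts with a short script. The following is a minimal sketch, not GeneNetwork code: the names CaseId and parse_case_id are hypothetical, and it assumes that sex is always coded as M or F (as in the two examples) and that age is a whole number of years of any length.

```python
import re
from typing import NamedTuple


class CaseId(NamedTuple):
    """Parts of a case identifier (hypothetical helper type)."""
    geo_id: str        # six-digit GEO identifier, kept as text
    sex: str           # "M" or "F" (assumed from the examples)
    age_years: int     # age in years
    alzheimer: bool    # True for suffix A, False for suffix N


# C + six digits + sex + age + N (non-Alzheimer) or A (Alzheimer)
_CASE_ID = re.compile(r"^C(\d{6})([MF])(\d+)([NA])$")


def parse_case_id(case_id: str) -> CaseId:
    """Split an identifier such as C225652M85N into its fields."""
    match = _CASE_ID.match(case_id)
    if match is None:
        raise ValueError(f"not a recognized case identifier: {case_id!r}")
    geo_id, sex, age, status = match.groups()
    return CaseId(geo_id, sex, int(age), status == "A")


if __name__ == "__main__":
    for example in ("C225652M85N", "C388217F97A"):
        print(example, parse_case_id(example))
```

Identifiers that do not follow the stated pattern raise ValueError rather than being guessed at, which makes unexpected variants easy to spot.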
Data were initially downloaded from the NCBI GEO archive under the experiment ID GSE15222. All data were generated using the Illumina HumanRef-8 expression BeadChip (GPL2700) v2 Rev0. This data set in GeneNetwork includes data for 24,354 probes. We have realigned the 50-mer sequences by BLAT to the latest version of the human genome (Feb 2009, hg19) and reannotated the array (August 2009). The annotation in GN will differ from that provided in GEO for this platform. We were unable to obtain 50-mer sequences for several thousand probes (e.g., HTT), and these probes have therefore not been realigned to the human genome.
The GEO data set was processed by Myers and colleagues using Illumina's Rank Invariant transform. We performed a series of QC and renormalization steps on the data to allow easier comparison with other data sets in GeneNetwork. In brief, the data were log2 transformed, and each array was then recentered to a mean expression of 8 units and a standard deviation of 2 units (2z + 8 transform). The values are therefore modified z scores and each unit represents roughly a two-fold difference in expression. Average expression across all 363 cases ranges from a low of 6 units (e.g., SYT15) to a high of 19 units for ARSK. APOE has an average expression of 15 units and APP has an average expression of 11.5 units. The distribution is far from normal, with a great excess of measurements of genes with low to moderate expression clustered between 6.5 and 8.5 units.
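As an illustration of the 2z + 8 transform, the sketch below applies it to the values of a single array. It is not the GeneNetwork pipeline: the function name two_z_plus_eight is hypothetical, the input is assumed to be strictly positive intensities, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
import math
import statistics


def two_z_plus_eight(raw_values):
    """Log2 transform one array, then rescale to mean 8 and SD 2."""
    # Step 1: log2 transform (requires strictly positive intensities).
    logged = [math.log2(v) for v in raw_values]

    # Step 2: z score within the array.
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample SD; an assumption of this sketch

    # Step 3: stretch to SD 2 and shift to mean 8.
    return [2.0 * (x - mean) / sd + 8.0 for x in logged]


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    array = [100.0, 200.0, 400.0, 800.0, 1600.0]
    rescaled = two_z_plus_eight(array)
    print([round(x, 3) for x in rescaled])
    print(round(statistics.fmean(rescaled), 6), round(statistics.stdev(rescaled), 6))
```

Because every array is forced to the same mean and spread, values from different arrays and different data sets end up on a common scale.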
CEPH Immortalized B Cells (October 2008):
The CANDLE Study is a large multidisciplinary study of early child development that involves genetic, genomic, environmental, and large-scale behavioral evaluation of children and their families from the second trimester of development through to 4 years of age. The full study involves more than 1000 children and their mothers and fathers.
For information on genomic and genetic studies related to CANDLE, please contact: Drs. Ronald M. Adkins (email@example.com) and Julia Krushkal (firstname.lastname@example.org).
For information on the overall design of CANDLE, please contact: Dr. Frances A. Tylavsky (email@example.com).
Summary from The Urban Child Institute: The primary goal of the CANDLE study is to study factors that affect brain development in young children. To this end, the current study will test specific hypotheses regarding factors that may negatively influence cognitive development in children. Participants in this cohort study will include 1,500 mother-child dyads, recruited during the second trimester of pregnancy and followed from birth to age 3. Data on a wide range of possible influences on children's cognitive outcomes is being collected during pregnancy, at delivery, and at 1, 2, 3, and 4 years of age from numerous sources, including questionnaires, interviews, psychosocial assessments, medical chart abstraction, environmental samples from the child's home environment, blood and urine samples from the mother, cord blood, and placental tissue. The primary outcomes of the current study are those associated with cognitive measures. Outcomes are being measured using standardized cognitive assessments conducted at 12 months, 24 months, and 36 months of age. Epidemiological, clinical, and laboratory-based research may be undertaken using data from the project, with sub-studies including, but not limited to, molecular genetics, environmental exposure assessments, and micronutrient deficiency analyses. Results of this cohort study may provide information that will ultimately lead to improvements in the health, development, and well-being of children in Shelby County, Tennessee through interventions and policy enforcement and/or development. Full participant recruitment and complete data collection began in November 2006.
Adkins RM, Thomas F, Tylavsky FA, Krushkal J (2011) Parental ages and levels of DNA methylation in the newborn are correlated. BMC Med Genet. 12:47.
Adkins RM, Krushkal J, Tylavsky FA, Thomas F (2011) Racial differences in gene-specific DNA methylation levels are present at birth. Birth Defects Res A Clin Mol Teratol. 91:728-736
Schroeder JW, Conneely KN, Cubells JC, Kilaru V, Newport DJ, Knight BT, Stowe ZN, Brennan PA, Krushkal J, Tylavsky FA, Taylor RN, Adkins RM, Smith AK (2011) Neonatal DNA methylation patterns associate with gestational age. Epigenetics 6:1498-504.
UTHSC CEPH B-cells Illumina (Sept09) RankInv data were generated by Malak Kotb, Robert W. Williams, and colleagues. Please contact Robert Williams at UTHSC regarding use of these data.
Monks CEPH B-cells Agilent (Dec04) Log10Ratio data were generated by Stephanie Monks (Stephanie Santorico), Eric Schadt, and collaborators.
About this file:
This file was started Aug 6, 2009 by AC. Last update by RWW, June 7, 2011.
The GTEx SNP files were updated by Ashutosh Pandey and Lei Yan, February 20, 2015.