Cellular Information from Microarray
From Biocourse
Hybridization of RNA- or DNA-derived samples to probes on microarrays (e.g., oligonucleotides, peptide nucleic acids, or spotted DNA fragments) allows the monitoring of various cellular events (Jain 2000, Pharmacogenomics). Widespread use of these microarrays allows the simultaneous interrogation of the status of a cell's components at any given time. Microarray data, available in public repositories in standardized formats, can also be integrated with other relevant data, including the available genome sequences and the literature, and ultimately assembled into biological networks. Here we briefly review the cellular information that can be obtained from microarray data and its availability at public sites.
1. DNA level
1-1. SNP genotyping
SNPs (single nucleotide polymorphisms) are the most abundant form of genomic variation in humans (Brookes, The essence of SNPs, Gene, 1999), and they have been used in disease association research, population studies, and so on. Accordingly, many techniques have been developed for genotyping SNPs effectively and efficiently on a large scale (Syvanen et al. 2005 Nature Genetics 37; see Table 1 of Syvanen's article, http://www.nature.com/ng/journal/v37/n6s/full/ng1558.html). SNP data from various platforms are currently available in many databases (see the Database category of SNP@Web; http://bioportal.ngic.re.kr/SNPatWEB/Wiki.jsp?page=Database). The main public repository, however, is dbSNP, managed by NCBI (http://www.ncbi.nlm.nih.gov/projects/SNP/). In addition, the phase 1 and phase 2 data of the International HapMap Project are available online (http://www.hapmap.org). The HapMap homepage provides linkage disequilibrium maps of SNPs and tag SNP selections.
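To make the notion of linkage disequilibrium between two SNPs concrete, the following is a minimal sketch of the commonly used r^2 statistic, computed from phased haplotypes of two biallelic SNPs. The function name, the 0/1 allele coding, and the toy haplotypes are assumptions made for illustration; this is not the HapMap project's own software.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples,
    with alleles coded 0 or 1 (an illustrative coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Toy data: alleles always travel together vs. all four combinations equally.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
r2_linked = ld_r_squared(linked)
r2_unlinked = ld_r_squared(unlinked)
```

When r^2 between two SNPs is close to 1, genotyping one of them largely predicts the other, which is the idea behind selecting tag SNPs.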
1-2. STR typing
Short tandem repeat polymorphisms (STRPs, also known as microsatellites) show high heterozygosity and genetic diversity in the human genome (Calafell et al., 1998, European Journal of Human Genetics 6, 38-49). As genetic markers, they have supported the identification of individuals in forensic studies and of disease-related loci in linkage studies. Radtkey's group developed a novel STR oligonucleotide system designed to exploit differences in base-stacking energy (Radtkey et al., Nucleic Acids Research 2000, vol 28, no 7, e17). Kemp's group developed the "variable-length probe array" (VLPA) method for STR profiling, which involves hybridization of the unknown STR target sequence to a DNA microarray displaying complementary probes that vary in length to cover the range of possible STRs. A post-hybridization enzymatic digestion of the DNA hybrids selectively removes labeled single-stranded regions of DNA, and the number of repeats is deduced from the pattern of target DNA that remains hybridized to the array (Kemp et al., Journal of Forensic Sciences, 2005, vol 50, issue 5). The STR database developed by the National Institute of Standards and Technology (NIST) is available at http://www.cstl.nist.gov/biotech/strbase/ (Ruitberg et al., Nucleic Acids Research 2001, vol 29, no 1, 320-322).
1-3. CNP (copy-number polymorphism) detection
A CNP (also known as a large-scale variation) is a variation in the number of copies of a sequence within the DNA. A fundamental step towards identifying CNPs was the development of microarray-based comparative genome hybridization (array CGH), which assesses fluorescence ratios between differentially labeled test and reference DNA hybridized to a microarray. An altered fluorescence ratio therefore indicates a DNA copy-number loss or gain in the test genome relative to the reference genome (Buckley et al., 2005, Trends in Genetics, vol 21, 6). Two recent studies presented surveys of copy-number variation in the human genome using comparative microarray technology (oligonucleotide-based and BAC-based). The Genome Variation Database of CNPs is available at http://projects.tcag.ca/variation/ (Iafrate et al., 2004, Nature Genetics 36, 949-951).
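The ratio-based reasoning of array CGH can be sketched as follows. This is a minimal illustration that assumes background-corrected intensities for each probe and an arbitrary log2 threshold of 0.3. The function name, threshold, and clone names are hypothetical, and real analyses normally add normalization and segmentation steps.

```python
import math


def call_copy_number(test_intensity, ref_intensity, threshold=0.3):
    """Classify one array CGH probe as 'gain', 'loss', or 'normal'.

    The call is based on log2(test / reference); the threshold is an
    illustrative choice, not a published standard.
    """
    log_ratio = math.log2(test_intensity / ref_intensity)
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "normal"


# Hypothetical probes: (test intensity, reference intensity)
probes = {
    "clone_A": (1500.0, 1000.0),  # test brighter than reference
    "clone_B": (1000.0, 1000.0),  # equal signal
    "clone_C": (500.0, 1000.0),   # test dimmer than reference
}
calls = {name: call_copy_number(t, r) for name, (t, r) in probes.items()}
```

Working on the log2 scale makes gains and losses symmetric around zero, so one threshold value can serve both directions.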
1-4. DNA methylation
DNA methylation is the most common covalent modification of the human genome (Keith et al., Nature Reviews Genetics 1, 11-19 (2000)). Because DNA methylation is directly connected to transcriptional repression through chromatin-remodeling complexes, it is important in imprinting, X-inactivation, cancer, and the developmental control of gene expression (Keith et al., Nature Reviews Genetics 1, 11-19 (2000)). Briefly, two kinds of microarray-based methods have been developed for detecting DNA methylation patterns. One set of methods uses restriction endonucleases whose activity depends on methylation status, such as McrBC, which cuts methylated but not unmethylated DNA. The other set of methods treats genomic DNA with sodium bisulfite, which converts cytosine, but not cytosine methylated at the C5 position, to uracil (Steensel et al., Nature Genetics 37, S18-S24 (2005)). The DNA Methylation database, also known as MethDB, is the public database for DNA methylation (http://www.methdb.net). It contains methylation patterns, profiles, and total methylation content data from a total of 6667 experiments (as of September 4, 2002; Christoph et al., Nucleic Acids Research, 2001, Vol. 29, No. 1, 270-274; Celine et al., Nucleic Acids Research, 2003, Vol. 31, No. 1, 75-77).
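The logic of the bisulfite approach can be illustrated with a small simulation. This sketch assumes complete conversion of every unmethylated cytosine and considers a single strand only. The function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment of a single DNA strand.

    Unmethylated C becomes U; C at a methylated position (0-based index)
    is protected and stays C. Complete conversion is assumed.
    """
    treated = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            treated.append("U")
        else:
            treated.append(base)
    return "".join(treated)


def methylated_sites(original, treated):
    """Return 0-based positions where a C survived the treatment."""
    return [
        index
        for index, (before, after) in enumerate(zip(original.upper(), treated))
        if before == "C" and after == "C"
    ]


original = "ACGTCCGA"
treated = bisulfite_convert(original, methylated_positions={1})
sites = methylated_sites(original, treated)
```

Comparing the treated strand with the untreated reference turns an epigenetic difference into a sequence difference, which probes on an array can then discriminate.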
2. RNA level
2-1. mRNA expression
The principle of a microarray experiment is that mRNA from a given cell line or tissue is used to generate a labeled sample (target), which is hybridized in parallel to a large number of DNA sequences immobilized on a solid surface in an ordered array (Schena et al., 1995, Science 270, 467-470). As a result, tens of thousands of transcripts can be detected and quantified simultaneously. In recent years DNA microarray technology has advanced rapidly, and the most commonly used systems today can be divided into two groups according to the arrayed material: complementary DNA (cDNA) microarrays and oligonucleotide microarrays (Schulze et al., Nature Cell Biology 3, E190-195). The main public repository for mRNA expression microarray data is GEO (Gene Expression Omnibus; Edgar et al., 2002, Nucleic Acids Research, Vol. 30, No. 1, 207-210; http://www.ncbi.nlm.nih.gov/geo). Numerous applications exist for gene expression data analysis, including normalization, imputation, clustering, and statistical analysis for the selection and interpretation of gene groups.
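As one example of the normalization step mentioned above, the sketch below log2-transforms raw intensities and centers each array on its own median so that arrays with different overall brightness become comparable. Median centering is only one of many normalization schemes; the function name and toy data are hypothetical.

```python
import math
from statistics import median


def median_normalize(arrays):
    """Log2-transform intensities and shift each array to median 0.

    arrays: dict mapping an array name to a list of positive raw
    intensities, with genes in the same order on every array.
    """
    normalized = {}
    for name, intensities in arrays.items():
        logs = [math.log2(value) for value in intensities]
        center = median(logs)
        normalized[name] = [value - center for value in logs]
    return normalized


# Toy data: array_2 is uniformly four times brighter than array_1.
raw = {
    "array_1": [100.0, 200.0, 400.0],
    "array_2": [400.0, 800.0, 1600.0],
}
norm = median_normalize(raw)
```

After centering, the two toy arrays carry the same values, because their only difference was a global intensity factor.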
2-2. miRNA expression
MicroRNAs, also known as miRNAs, are endogenous ~22 nt RNAs that can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression. The miRNAs comprise one of the more abundant classes of gene regulatory molecules in multicellular organisms and likely influence the output of many protein-coding genes (David et al., 2004, Cell, Vol 116, 281-297). Recently, Isaac's group proposed that the total number of human miRNAs is at least 800 (Isaac et al., 2005, Nature Genetics 37, 766-770). High-throughput miRNA gene expression analysis is a technical challenge, since very small RNAs are known to be difficult to amplify or label reliably without introducing bias (Peter et al., 2004, Nature Methods 1, 155-161). Peter's group developed a method for high-throughput gene expression analysis of miRNAs that goes beyond earlier approaches such as dot blots or northern blots (Peter et al., 2004, Nature Methods 1, 155-161). A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/) (Nucleic Acids Research, 2004, Vol. 32, Database issue D109-D111).
2-3. Exon expression
Alternative splicing of pre-messenger RNA plays important roles in development, physiology, and disease (Jason M.J et al., 2003, Science 302, 2141-2144). Jason's group demonstrated that at least 74% of human multi-exon genes are alternatively spliced (Jason M.J et al., 2003, Science 302, 2141-2144). Recently, alternative splicing microarrays have been designed with oligonucleotide probes, typically 25–60 nucleotides in length, that are specific to both exons and exon–exon junctions (Arianne J. et al., Nature Reviews Molecular Cell Biology 6, 386-398, 2005). MAASE (http://maase.genomics.purdue.edu) is an alternative splicing database designed to support splicing microarray applications. It includes a user-friendly database of annotated events that allows convenient export of information to aid in microarray design and data analysis (Christina L. Z. et al., RNA, 2005, 11:1767-1776).
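A junction-specific probe can be pictured as a sequence that spans the boundary between two spliced exons. The sketch below builds such a sequence by joining the 3' end of the upstream exon to the 5' end of the downstream exon. The function name, the even split around the junction, and the toy exon sequences are assumptions for illustration, and real probe design also weighs melting temperature, cross-hybridization, and strand orientation.

```python
def junction_probe(upstream_exon, downstream_exon, probe_length=30):
    """Return a sequence centered on an exon-exon junction.

    Half of the probe comes from the 3' end of the upstream exon and the
    rest from the 5' end of the downstream exon (an illustrative split).
    """
    left = probe_length // 2
    right = probe_length - left
    if len(upstream_exon) < left or len(downstream_exon) < right:
        raise ValueError("exons are too short for the requested probe length")
    return upstream_exon[-left:] + downstream_exon[:right]


# Toy exons; a cassette exon (exon_2) is either included or skipped.
exon_1 = "AAAAACCCCC"
exon_2 = "GGGGGTTTTT"
exon_3 = "ACGTACGTAC"

inclusion_probe = junction_probe(exon_1, exon_2, probe_length=10)
skipping_probe = junction_probe(exon_1, exon_3, probe_length=10)
```

Signal on the exon 1 to exon 3 junction probe, but not on the exon 1 to exon 2 probe, would suggest that the cassette exon is skipped in the sampled transcripts.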
3. Interaction of DNA and Proteins
Interactions between proteins and DNA mediate transcription, DNA replication, recombination, and DNA repair. A comprehensive view of where proteins interact with the genome in vivo would therefore greatly increase our understanding of cellular events (Michael J.B. et al., Genomics 83, 2004, 349-360). To probe these interactions directly, a technique combining chromatin immunoprecipitation (ChIP) and microarray analysis (chip), known as ChIP-chip, was developed and has been applied in organisms from yeast to mammalian cells (Sean E.H. et al., Current Opinion in Genetics & Development 2004, 14, 697-705).
The general protocol of the ChIP-chip technique is as follows: 1) cells or tissues are treated with formaldehyde, which mediates protein-DNA cross-linking; 2) the cells are lysed, and the chromatin is isolated and sheared into 200-1000 base pair fragments; 3) the protein of interest is immunoprecipitated using a specific antibody, and any DNA cross-linked to it is co-immunoprecipitated; 4) the DNA enriched during the IP step is analyzed (Devanjan S. et al., Current Opinion in Chemical Biology 2005, 9:38-45). Data generated by ChIP-chip are archived in GEO (Gene Expression Omnibus; http://www.ncbi.nlm.nih.gov/geo/; Tanya et al., NAR 2005, 33:D562-D566).
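The analysis in step 4 is often framed as a comparison between the immunoprecipitated (IP) signal and an input control. The sketch below assumes one IP intensity and one input intensity per probe, ordered by genomic position, and reports stretches of neighbouring probes whose log2(IP/input) passes a threshold. The function name, the threshold, the minimum run length, and the toy values are hypothetical, and the cited protocol does not prescribe this particular rule.

```python
import math


def enriched_regions(positions, ip, control, threshold=1.0, min_probes=2):
    """Find runs of adjacent probes with log2(IP / input) >= threshold.

    Returns a list of (start_position, end_position) tuples. Requiring
    several adjacent probes reflects that a sheared fragment of several
    hundred base pairs usually covers more than one probe.
    """
    regions = []
    run = []
    for position, ip_signal, input_signal in zip(positions, ip, control):
        if math.log2(ip_signal / input_signal) >= threshold:
            run.append(position)
        else:
            if len(run) >= min_probes:
                regions.append((run[0], run[-1]))
            run = []
    if len(run) >= min_probes:   # close a run that reaches the last probe
        regions.append((run[0], run[-1]))
    return regions


# Toy data: probe positions (bp) with IP and input intensities.
positions = [100, 200, 300, 400, 500, 600]
ip_signal = [100.0, 450.0, 520.0, 110.0, 400.0, 90.0]
input_signal = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
regions = enriched_regions(positions, ip_signal, input_signal)
```

In the toy data the isolated enriched probe at position 500 is discarded, while the two adjacent enriched probes at 200 and 300 are reported as one candidate binding region.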
