Balkan Journal of Medical Genetics

DNA MICROARRAYS – HUMAN GENOME SURVEYED IN ONE AFTERNOON?
Nikolova D*, Toncheva D
*Corresponding Author: Dragomira Nikolova, M.Sc., Department of Medical Genetics, Medical University, Zdrave, 2 Str, 1431 Sofia, Bulgaria; Tel./Fax: +359-2-952-03-57; E-mail: dmb@abv.bg
page: 11

cDNA MICROARRAYS

Until recently, assessing RNA in cells was time-consuming. Through DNA microarrays, the status of thousands of genes from any biological origin can be monitored simultaneously for changes in levels of gene expression. The difference between the old and new methods is striking. Traditional assays measure RNA transcripts from one gene at a time over a 3-day period. Gene chips can measure transcripts from thousands of genes in a single afternoon.
cDNA microarrays are capable of profiling gene expression patterns of tens of thousands of genes in a single experiment. DNA targets, in the form of 3' expressed sequence tags (ESTs), are arrayed onto glass slides (or membranes) and probed with fluorescently or radioactively labeled cDNAs [10]. More than 1.1 million ESTs have been catalogued, corresponding to 52,907 unique human genes. However, the function, expression and regulation of more than 80% of these genes have yet to be determined.
Principle of Method. Templates for genes of interest are obtained and amplified by PCR. Following purification and quality control, aliquots (~5 nanoliters) are printed on coated glass microscope slides using a computer-controlled, high-speed robot. Total RNA from both the test and reference samples is fluorescently labeled with either Cy3- or Cy5-dUTP using a single round of reverse transcription. The fluorescent targets are pooled and allowed to hybridize under stringent conditions to the clones on the array. Laser excitation of the incorporated targets yields an emission with a characteristic spectrum, which is measured with a scanning confocal laser microscope. Monochrome images from the scanner are imported into software in which the images are pseudo-colored and merged. Information about the clones, including gene name, clone identifier, intensity values, intensity ratios, normalization constant and confidence intervals, is attached to each target. Results from a single hybridization experiment are viewed as a normalized ratio (Cy3/Cy5) in which significant deviations from 1 (no change) indicate increased (>1) or decreased (<1) levels of gene expression relative to the reference sample. In addition, data from multiple experiments can be examined using any number of data-mining tools.
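The ratio readout described above can be illustrated with a minimal Python sketch. It is not the procedure of any particular image analysis package. It assumes that the test sample is in the Cy3 channel and the reference in the Cy5 channel (matching the Cy3/Cy5 ratio in the text), that the normalization constant is simply the median raw ratio, and that a 2-fold cutoff is used for calling a change. The function names, the cutoff and the intensity values are all hypothetical.

```python
import statistics

def normalized_ratios(cy3, cy5):
    """Return (normalized Cy3/Cy5 ratios, normalization constant).

    Assumption: the constant is the median raw ratio, so that a
    typical (unchanged) gene ends up with a ratio of 1.
    """
    raw = [test / ref for test, ref in zip(cy3, cy5)]
    norm_const = statistics.median(raw)
    return [r / norm_const for r in raw], norm_const

def classify(ratio, fold=2.0):
    """Label a normalized ratio using a hypothetical fold cutoff."""
    if ratio >= fold:
        return "increased"
    if ratio <= 1.0 / fold:
        return "decreased"
    return "unchanged"

# Made-up spot intensities; the Cy3 channel was scanned twice as bright.
cy3 = [400, 800, 2000, 6000, 300]
cy5 = [100, 400, 1000, 1000, 600]

ratios, norm_const = normalized_ratios(cy3, cy5)
labels = [classify(r) for r in ratios]
for r, label in zip(ratios, labels):
    print(f"{r:.2f}  {label}")
```

In this toy data set the overall brightness difference between the two channels is absorbed by the normalization constant, so only spots that deviate from the typical ratio are labeled as changed.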
The adaptable nature of the fabrication and hybridization methods allows the technique to be applied widely; the only limitations are the availability of clones for the solid phase and the quality of RNA samples derived from the cells (or tissues) to be compared. This is illustrated by diverse applications that include investigating gene expression in the roots and leaves of Arabidopsis thaliana [11], human T cells exposed to phorbol ester [12], rheumatoid arthritis and inflammatory bowel disease [13], tumorigenic vs. non-tumorigenic cell lines [14], and the diauxic shift from anaerobic to aerobic metabolism in S. cerevisiae [15].
Fabrication. Production of arrays begins with the selection of the ‘probes’ to be printed on the array. In many cases, these are chosen directly from databases that include GenBank [16], dbEST [17] and UniGene [18]. Additionally, full-length cDNAs, collections of partially sequenced cDNAs (or ESTs), or randomly chosen cDNAs from any library of interest can be used. Arrays for higher eukaryotes are typically based on the EST portions of these databases, whereas for yeast and prokaryotes, probes are usually generated by amplifying genomic DNA with gene-specific primers. Some ESTs correspond with known genes, but the majority represent partially sequenced novel genes. ESTs are grouped into clusters of overlapping sequences (as in UniGene). Ideally, each cluster would correspond with one gene, but as several non-overlapping clusters may exist for large or low-abundance genes, the number of clusters is likely to exceed the number of separate genes from whose sequences they are derived. Additionally, errors in alignment programs can produce false clusters (over-clustering). Clone sets, comprising a single representative of each cluster, are sold by licensed vendors [19].
A DNA chip is made using a glass microscope slide. The DNA spots adhere to the slide, each spot being a cloned DNA sequence that represents a gene. The DNA molecules that make up the spots include either fully sequenced genes of known function, or collections of partially sequenced, unknown genes. Printing or spotting is done with a machine called an arrayer. In some configurations it is possible to print up to 50,000 genes on one chip, and efforts are underway to increase that number as demand grows. The spotted DNA is linked to the surface of the glass slide either by covalent bonds or by charge interactions. For both glass and membrane matrices, each array element is generated by the deposition of a few nanoliters of purified PCR product, typically at 100-500 µg/mL [20]. The first spotting robots relied on contact printing, but many variations on this design are now available [19], in addition to a ‘spotter’ that is essentially a capillary tube to which a low but constant pressure is applied. Non-contact printing modes, using either piezo or ink-jet devices, are also being evaluated.
A clear limitation to the application of this technology is the large amount of RNA required per hybridization. For adequate fluorescence, the total RNA required per target, per array, is 50-200 µg [2-5 µg are required when using polyadenylated, poly(A), mRNA]. For low amounts of RNA, the levels of signal are below the limit of fluorescence detection, and could easily be rendered undetectable by assay noise. There are a variety of means by which to improve the signal from limited RNA. Methods that produce multiple copies of mRNA using highly efficient phage RNA polymerases have been developed [21]. A version of this approach, in which labeled target (cRNA) is made by in vitro transcription directly from a cDNA pool that has a T7 RNA polymerase promoter site at one end, has been applied to arrays [22]. Detection of hybridized species using mass spectrometry or local changes in electronic properties can also be imagined [23]. Low complexity representations (LCRs) of mRNA are also used as the targets for cDNA microarrays [24]. The LCR targets permit the measurement of abundance changes that are difficult to measure using oligo(dT) priming for target synthesis. An oligo(dT)-primed target and three LCRs detect twice as many differentially regulated genes as could be detected by the oligo(dT)-primed target alone.
Image Analysis and Data Extraction. Normalization of the Data. Image analysis software generates a report containing numerical data that are more informative than the colorful spotted figure. The results are generated in a variety of formats, depending on the software used. The report contains ratios calculated for each gene. A gene expressed at equal intensity in the two samples would exhibit a ratio close to one. Using a statistical formula, the software determines the set of genes whose expression is significantly altered in the treated cells. This method is highly sensitive: it is possible to detect changes of about 1.5-fold in the expression level of a gene. The process of data normalization begins after comparison of the expression data. It is based on a consideration of all of the genes in the sample. The data are used to generate estimates of expected variance, leading to predicted confidence intervals. The transcript level of many genes will remain unchanged, making global normalization a useful tool. As samples become more divergent, the fraction of genes showing altered transcript levels increases. Explicit methods have been developed that extract, from the variance of genes, statistics for evaluating the significance of observed changes in the complete data set [25].
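Global normalization and a variance-based confidence interval can be sketched as follows. This is an illustration under stated assumptions, not the explicit method of [25]: log2 ratios are centered on their median (relying on the premise that most transcript levels are unchanged), the spread is estimated robustly from the median absolute deviation scaled by 1.4826 (which approximates a standard deviation for normally distributed data), and genes outside an approximate 95% interval (z = 1.96) are flagged. The function names and the ratio values are hypothetical.

```python
import math
import statistics

def global_normalize(ratios):
    """Center log2 ratios on their median (global normalization)."""
    logs = [math.log2(r) for r in ratios]
    center = statistics.median(logs)
    return [x - center for x in logs]

def robust_sd(centered):
    """Estimate spread from the median absolute deviation (assumption)."""
    return 1.4826 * statistics.median(abs(x) for x in centered)

def flag_changed(ratios, z=1.96):
    """Return (flags, interval); flags mark genes outside the interval."""
    centered = global_normalize(ratios)
    half_width = z * robust_sd(centered)
    flags = [abs(x) > half_width for x in centered]
    return flags, (-half_width, half_width)

# Made-up ratios: eight roughly unchanged genes and two clear outliers.
ratios = [1.0, 1.1, 0.9, 1.05, 0.95, 1.2, 0.85, 1.0, 4.0, 0.2]
flags, interval = flag_changed(ratios)
print("log2 interval:", tuple(round(v, 3) for v in interval))
print("flagged:", [r for r, f in zip(ratios, flags) if f])
```

A median-based estimate is used here because the changed genes themselves would inflate an ordinary standard deviation. As the text notes, this premise weakens as the samples become more divergent.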
Data Management and Mining. The quality of image analysis programs is crucial for accurate interpretation of signals from slides and filters. All array methods require the construction of databases for the management of information on the genes represented on the array. The programs make it possible to examine the primary results of hybridization. Methods applied to other kinds of data (such as protein or amino acid sequence comparisons) are more highly constrained than those applied at the transcript level. This level of analysis on large data sets could provide new perspectives on the operation of genetic networks. Comparison of expression profiles will undoubtedly provide useful insights into the molecular pathogenesis of a variety of diseases [26].
Problems and Eventual Solutions. Typical microarray results are usually burdened with substantial amounts of noise [27,28]. That is why rigorous statistical methods must be applied for interpretation of the data. Non-specific hybridization can be measured through the use of specificity controls on the microarray and addressed as a statistical problem [29,30]. The problem with using ratio data alone is that it does not take into account the absolute signal intensity measurements used to calculate the ratios. This approach may work adequately for ratios of moderately to highly expressed genes that yield bright fluorescent signals. The weak signals from low transcript levels are often masked or biased by noise (non-specific hybridization). Non-specific hybridization is a characteristic of cDNA microarray hybridization [28,31,32]. The frequently used threshold values are often arbitrarily chosen and do not take into account the statistical significance of absolute signal intensity. For example, microarray data showing a 4-fold change derived from low signal intensities may have no statistical significance, whereas a 1.4-fold change derived from strong signal intensities may be highly significant in terms of reflecting actual changes in mRNA concentration within a biological sample. Thus, focusing on fold-changes alone is insufficient, and confidence statements about differential expression must take into account absolute signal intensities [30].
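The contrast between a 4-fold change at low intensity and a 1.4-fold change at high intensity can be made concrete with a small Python sketch. It is not the statistical treatment of [30]. It assumes a simple error model in which each channel carries additive background noise (standard deviation `sigma_bg`) plus noise proportional to the signal (coefficient of variation `cv`), and it propagates these to the log ratio with a first-order approximation that is crude at very low intensities. The parameter values, function names and intensities are hypothetical.

```python
import math

def log_variance(intensity, sigma_bg=50.0, cv=0.10):
    """Approximate variance of ln(intensity) under the assumed model:
    Var(I) = sigma_bg**2 + (cv * I)**2, so Var(ln I) ~ Var(I) / I**2.
    """
    return (sigma_bg ** 2 + (cv * intensity) ** 2) / intensity ** 2

def z_score(test, ref, sigma_bg=50.0, cv=0.10):
    """z-score of the log ratio ln(test/ref) against 'no change'."""
    sd = math.sqrt(log_variance(test, sigma_bg, cv)
                   + log_variance(ref, sigma_bg, cv))
    return math.log(test / ref) / sd

# 4-fold change from weak signals, close to the assumed background noise.
z_low = z_score(test=80.0, ref=20.0)
# 1.4-fold change from strong signals, far above the background noise.
z_high = z_score(test=14000.0, ref=10000.0)

print(f"4-fold, low intensity:    z = {z_low:.2f}")
print(f"1.4-fold, high intensity: z = {z_high:.2f}")
```

Under these assumed parameters the 4-fold change at low intensity falls well inside a 95% interval (|z| < 1.96), whereas the 1.4-fold change at high intensity falls outside it, which is the pattern the paragraph describes.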
