WEB-BASED SOFTWARE FOR STORAGE, STATISTICAL PROCESSING AND ANALYSIS OF SNP DATA IN STUDIES ON COMPLEX DISORDERS
Betcheva E1, Betchev C2, Toncheva DI1,*
*Corresponding Author: Professor Draga Ivanova Toncheva, M.D., Ph.D., Department of Medical Genetics, Medical Faculty, Medical University, 2 Zdrave str., SBALAG “Maichin dom”, 6 Fl., 1431 Sofia, Bulgaria; Tel./Fax: +35-92-952-0357; E-mail: dragatoncheva@yahoo.com
page: 9

INTRODUCTION

Large bodies of experimental data from family, adoption and twin studies suggest a genetic component in the individual differences in susceptibility to complex disorders. It is clear that multifactorial disorders are, in part, heritable and that their etiology results from a complex interaction between environmental and genetic factors [1,2]. In contrast to single-gene (Mendelian) disorders, they have a more complex pathogenesis. According to contemporary models, genetic susceptibility to such disorders is determined by the combined effect of many genes and genetic variants at several different loci [3]. Emerging data from linkage and association studies support the hypothesis that certain environmental risk factors, such as a particular lifestyle, might trigger phenotype expression in individuals with a certain genetic background [1].

Interest in genome-based studies of complex diseases has been intense and has accelerated with the completion of the Human Genome Project and the progress of advanced technologies. Comparison of the DNA sequences of people from the major population groups has established a comprehensive map of genetic variants in the human genome, which conveniently serve as genetic markers. Detailed information about genetic diseases, genes, sequences and a great variety of polymorphisms is available in on-line public databases and provides an indispensable tool for molecular genetic studies [4,5].

The quest for genetic factors in the susceptibility to complex disorders has focused on single nucleotide polymorphisms (SNPs), which are the most common type of genetic variant in the human genome and occur approximately once every 100 to 300 bp [6,7]. Most SNPs have only two possible alleles that differ between the individuals of the same population group, with a minor allele frequency that is usually specific to that group. Although SNPs offer a limited number of possible alleles, which is a prerequisite for the selection of markers for DNA analysis, they are very convenient and highly informative for haplotype analysis because of their abundance (over 10^6 deposited in the dbSNP database of the National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/About/primer/snps.html) and their genetic stability in the human genome [7-9].
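The notion of a minor allele frequency for a biallelic SNP can be illustrated with a short Python sketch. The function name and the genotype counts in the usage note are hypothetical and are not taken from the study; the calculation is simple allele counting, in which each individual contributes two alleles.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_a, freq_b, maf) for a biallelic SNP.

    n_aa, n_ab, n_bb are counts of individuals with genotypes
    AA, AB and BB (hypothetical labels for the two alleles).
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Homozygotes carry two copies of an allele, heterozygotes one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    freq_b = (2 * n_bb + n_ab) / total_alleles
    return freq_a, freq_b, min(freq_a, freq_b)
```

For example, with illustrative counts of 50 AA, 40 AB and 10 BB individuals, allele B is the minor allele with a frequency of 60/200.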

SNPs occur within coding gene regions and in non-coding intragenic and intergenic sequences. Most fall in introns, the 3' and 5' untranslated regions (3'-UTR and 5'-UTR) of genes, and spacer DNA [8]. Although these do not modify the gene product, some may play an important role in controlling the level of gene transcription, by influencing the affinity of promoters or other regulatory sequences for trans-acting factors that modify the gene expression rate, or by affecting pre-mRNA processing [6]. A small proportion of SNPs lie in coding DNA sequences; however, most of these are synonymous, i.e., they do not alter the polypeptide structure, and only a few are non-synonymous SNPs that cause an amino acid exchange [8]. This distribution of SNPs could be explained by the negative effect of expressed variations on survival and their fast elimination by natural selection [6].

Single nucleotide polymorphisms are associated with population diversity and individual differences in complex traits [6,8]. Therefore, they are convenient for genetic association studies aimed at identifying susceptibility loci for multifactorial disorders. An association between a disorder and a non-synonymous SNP makes the phenotype-genotype relationship very clear. However, an association with a synonymous SNP or an SNP in a non-coding sequence is difficult to explain, and usually another causative marker needs to be identified [7,8,10].

Currently, SNPs are preferred as genetic markers in case-control and whole genome association studies. They have been used in studies for mapping and discovery of susceptibility genes for many complex disorders: cardiovascular (essential hypertension), neurological (Alzheimer’s disease, multiple sclerosis), psychiatric (schizophrenia, bipolar affective disorders) and autoimmune (rheumatoid arthritis) disorders, diabetes mellitus type 2, and different types of cancer [11,12]. Linkage studies and genome scans have identified several candidate chromosomal regions for common diseases [7,13]. Selection of SNPs in such loci has become a basic approach in candidate gene association studies [7]. However, the candidate gene approach is time-consuming, cost-intensive and insufficient, and has largely failed to predict the risk of disease susceptibility, since only a limited number of genetic markers in a relatively small region are investigated [7,12]. Results from meta-analyses are often inconsistent and demonstrate the need for more efficient and cost-effective high-throughput SNP genotyping technologies, such as DNA-microarray-based technology, for revealing disease-causing genes [7,9,12,13].
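As an illustration of what a single-SNP case-control comparison involves, the following Python sketch computes an allelic chi-square test (1 degree of freedom, no continuity correction) and an odds ratio from a 2x2 table of allele counts. This is a generic textbook test and is not claimed to be the method used in any of the cited studies; the function name and all counts are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls, columns are alleles A and B.
    Returns (chi2, p_value, odds_ratio).
    """
    row_case = case_a + case_b
    row_control = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    n = row_case + row_control
    if 0 in (row_case, row_control, col_a, col_b):
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for the Pearson statistic of a 2x2 table.
    cross = case_a * control_b - case_b * control_a
    chi2 = n * cross ** 2 / (row_case * row_control * col_a * col_b)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    denominator = case_b * control_a
    odds_ratio = (case_a * control_b / denominator) if denominator else math.inf
    return chi2, p_value, odds_ratio
```

With illustrative counts of 60 A and 40 B alleles in cases against 40 A and 60 B alleles in controls, the statistic is 8.0 and the odds ratio for allele A is 2.25.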

Application of DNA-microarray technologies in large-scale studies of complex disorders facilitates genotyping of a large number of SNPs. A DNA chip consists of an arrayed series of thousands of sequences for detection of tag SNPs from the entire genome [13]. Selection of population-specific tag SNPs has become possible since the haplotype block structure of the human genome was established in the International HapMap Project (www.HapMap.org). Tag SNPs are representative markers for a set of variants within a region of high linkage disequilibrium in the genome. Thus, they allow economical and efficient genotyping of a relatively small number of markers that provide adequate information on disease-associated genes and loci. Candidate genes identified by such large-scale approaches require further analysis to elucidate their role in disease etiology (http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml) [13]. A whole genome association study based on array technology produces large amounts of data and requires an adequate database, appropriate computational statistical methods, techniques for false-positive error detection, and maintenance. Moreover, the use of such technology is associated with high costs and significant consumption of time, effort and resources.
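The linkage disequilibrium that makes tag SNPs informative is commonly summarized by the squared correlation r^2 between two biallelic loci. The Python sketch below computes it from haplotype counts. It assumes that phased haplotype counts are already available, which array genotyping does not provide directly, and the function and argument names are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs from haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first
    SNP and allele B at the second, and so on for the other three.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    # D measures the departure of the haplotype frequency from independence.
    d = p_AB - p_A * p_B
    return d * d / denominator
```

An r^2 close to 1 means that one SNP predicts the other almost perfectly, so genotyping only one of the pair loses little information; an r^2 close to 0 means the two SNPs are nearly independent.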

We have performed a whole genome association study (WGAS) of DNA samples from unrelated Bulgarian patients with schizophrenia and healthy volunteers (unpublished data). Following the WGAS, the 100 top SNPs showing the lowest p values were validated (genotyped by an alternative method in the same samples) and replicated (genotyped in additional DNA samples). The large amount of data produced required comprehensive statistical analysis. For this reason, we have created a client-server web-based application for statistical processing and reliable storage of data from an automated genotyping study in a set of DNA samples, as specified below.
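Selecting the SNPs with the lowest p values for follow-up amounts to a partial sort. The Python sketch below shows one way this could be done. It is not the authors' application code; the function name, the dictionary layout and the SNP identifiers in the usage note are hypothetical.

```python
import heapq


def top_snps(p_values, n=100):
    """Return the n (snp_id, p_value) pairs with the lowest p values.

    p_values maps a SNP identifier to its association p value.
    The result is ordered from the smallest p value upwards.
    """
    return heapq.nsmallest(n, p_values.items(), key=lambda item: item[1])
```

For instance, top_snps({"rs_a": 0.5, "rs_b": 1e-6, "rs_c": 0.01}, n=2) keeps the entries for "rs_b" and "rs_c", in that order.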





 Copyright © Balkan Journal of Medical Genetics 2006