Imputing Genotypes Using Regularized Generalized Linear Regression Models

The Atrium, University of Guelph Institutional Repository

dc.contributor.advisor Feng, Zeny
dc.contributor.author Griesman, Joshua
dc.date 2012-05-22
dc.date.accessioned 2012-06-14T14:34:56Z
dc.date.available 2012-06-14T14:34:56Z
dc.date.issued 2012-06-14
dc.identifier.uri http://hdl.handle.net/10214/3731
dc.description.abstract As genomic sequencing technologies continue to advance, researchers are furthering their understanding of the relationships between genetic variants and expressed traits (Hirschhorn and Daly, 2005). However, missing data can significantly limit the power of a genetic study. Here, a regularized generalized linear model, denoted GLMNET, is proposed for imputing missing genotypes. The method aims to address certain limitations of earlier regression approaches to genotype imputation, particularly multicollinearity among predictors. The performance of the GLMNET-based method is compared with that of the phase-based method fastPHASE. Two simulation settings were evaluated: a sparse-missing model and a small-panel expansion model. The sparse-missing model simulated a scenario in which SNPs were missing at random across the genome. In the small-panel expansion model, a set of test individuals was genotyped at only a small subset of the SNPs in the large panel. Each imputation method was tested on two datasets: Canadian Holstein cattle data and human HapMap CEU data. Although the proposed method imputed genotypes with high accuracy (>90% in all simulations), fastPHASE achieved higher accuracy (>94%). However, the new method, which was coded in R, imputed genotypes more time-efficiently than fastPHASE, and its speed could be improved further by reimplementing it in a compiled language. en_US
dc.language.iso en en_US
dc.subject Bioinformatics en_US
dc.subject Computational Biology en_US
dc.subject Quantitative Genetics en_US
dc.subject Genome Wide Association Study en_US
dc.subject Genotype Imputation en_US
dc.subject Generalized Linear Models en_US
dc.title Imputing Genotypes Using Regularized Generalized Linear Regression Models en_US
dc.type Thesis en_US
dc.degree.programme Bioinformatics en_US
dc.degree.name Master of Science en_US
dc.degree.department Department of Mathematics and Statistics en_US
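
The abstract describes imputing a missing SNP by regressing it on other SNPs with a regularized model, with the penalty intended to cope with multicollinearity among correlated predictors. The record does not give the exact model specification, so the following is only a minimal sketch under loudly stated assumptions, not the thesis's R implementation: genotypes are coded 0/1/2 and treated as a continuous (Gaussian-family) response; the penalty is an elastic net of the form used by the R package glmnet, fitted here from scratch in NumPy by cyclic coordinate descent; the penalty settings (lam=0.05, alpha=0.5) are hypothetical; the predictor SNPs are assumed complete; and predictions are rounded to the nearest genotype. The helper names elastic_net, impute_snp and sparse_missing_accuracy are hypothetical, and the toy data (one shared latent genotype copied with noise) merely stand in for the sparse-missing setting, not for the Holstein or HapMap data.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=500, tol=1e-8):
    """Gaussian elastic net fitted by cyclic coordinate descent (illustrative sketch).

    Minimizes (1/2n)||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    on standardized columns, then maps coefficients back to the 0/1/2 scale.
    """
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    active = sd > 0                      # monomorphic SNPs carry no information
    sd = np.where(active, sd, 1.0)
    Z = (X - mu) / sd
    col_sq = (Z ** 2).mean(axis=0)       # equals 1 for every active column
    y_mean = y.mean()
    r = y - y_mean                       # residual when all coefficients are 0
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_step = 0.0
        for j in np.flatnonzero(active):
            rho = Z[:, j] @ r / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            step = new - beta[j]
            if step != 0.0:
                r -= step * Z[:, j]      # keep the residual in sync with beta
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    coef = beta / sd
    return y_mean - mu @ coef, coef


def impute_snp(G, target, lam=0.05, alpha=0.5):
    """Fill NaN entries of SNP column `target` using the other SNPs (assumed complete)."""
    y = G[:, target]
    X = np.delete(G, target, axis=1)
    obs = ~np.isnan(y)
    b0, b = elastic_net(X[obs], y[obs], lam, alpha)
    filled = y.copy()
    # Round the continuous prediction back to a genotype code in {0, 1, 2}.
    filled[~obs] = np.clip(np.rint(b0 + X[~obs] @ b), 0, 2)
    return filled


def sparse_missing_accuracy(n=1000, p=20, miss_rate=0.1, seed=0):
    """Toy check: mask random entries of one SNP, impute them, report concordance."""
    rng = np.random.default_rng(seed)
    base = rng.binomial(2, 0.5, size=n)  # hypothetical shared latent genotype
    G = np.empty((n, p))
    for j in range(p):
        keep = rng.random(n) < 0.9       # 90% copies of base -> strongly correlated SNPs
        G[:, j] = np.where(keep, base, rng.integers(0, 3, size=n))
    target = 0
    truth = G[:, target].copy()
    mask = rng.random(n) < miss_rate
    G[mask, target] = np.nan
    filled = impute_snp(G, target)
    return np.mean(filled[mask] == truth[mask])


if __name__ == "__main__":
    print(f"imputation accuracy: {sparse_missing_accuracy():.3f}")
```

In this sketch the ridge part of the penalty (1 - alpha) spreads weight across the highly correlated predictor SNPs instead of letting one arbitrary SNP absorb it, which is one common way a regularized regression is used to handle multicollinearity; the lasso part can zero out uninformative SNPs.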


Files in this item

File             Size       Format   Description
THESIS_V4.pdf    350.0 KB   PDF      Edited version as per instructions