Imputing Genotypes Using Regularized Generalized Linear Regression Models

dc.contributor.advisor Feng, Zeny Griesman, Joshua 2012-05-22 2012-06-14T14:34:56Z 2012-06-14T14:34:56Z 2012-06-14
dc.description.abstract As genomic sequencing technologies continue to advance, researchers are furthering their understanding of the relationships between genetic variants and expressed traits (Hirschhorn and Daly, 2005). However, missing data can significantly limit the power of a genetic study. Here, the use of a regularized generalized linear model, denoted GLMNET, is proposed to impute missing genotypes. The method aims to address certain limitations of earlier regression approaches to genotype imputation, particularly multicollinearity among predictors. The performance of the GLMNET-based method is compared to that of the phase-based method fastPHASE. Two simulation settings were evaluated: a sparse-missing model and a small-panel expansion model. The sparse-missing model simulated a scenario in which SNPs were randomly missing across the genome. In the small-panel expansion model, a set of test individuals was genotyped at only a small subset of the SNPs of the large panel. Each imputation method was tested on two data sets: Canadian Holstein cattle data and human HapMap CEU data. Although the proposed method achieved high accuracy (>90% in all simulations), fastPHASE was more accurate (>94%). However, the new method, which was coded in R, imputed genotypes more quickly than fastPHASE, and its speed could be further improved by implementing it in a compiled language. en_US
dc.language.iso en en_US
dc.subject Bioinformatics en_US
dc.subject Computational Biology en_US
dc.subject Quantitative Genetics en_US
dc.subject Genome Wide Association Study en_US
dc.subject Genotype Imputation en_US
dc.subject Generalized Linear Models en_US
dc.title Imputing Genotypes Using Regularized Generalized Linear Regression Models en_US
dc.type Thesis en_US Bioinformatics en_US Master of Science en_US Department of Mathematics and Statistics en_US
