Variational Approximations and Other Topics in Mixture Models

The Atrium, University of Guelph Institutional Repository

dc.contributor.advisor McNicholas, Paul D.
dc.contributor.author Dang, Sanjeena
dc.date 2012-07-16
dc.date.accessioned 2012-08-24T20:24:04Z
dc.date.available 2012-08-24T20:24:04Z
dc.date.issued 2012-08-24
dc.identifier.uri http://hdl.handle.net/10214/3876
dc.description.abstract Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction almost fifty years ago. Families of mixture models are said to arise when the component parameters, usually the component covariance matrices, are decomposed and a number of constraints are imposed. Within the family setting, it is necessary to choose the member of the family (i.e., the appropriate covariance structure) in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for this model selection process, and the expectation-maximization (EM) algorithm has been predominantly used for parameter estimation. We deviate from the EM-BIC rubric, using variational Bayes approximations for parameter estimation and the deviance information criterion (DIC) for model selection. The variational Bayes approach alleviates some of the computational complexities associated with the EM algorithm. We apply this approach to the most famous family of Gaussian mixture models, the Gaussian parsimonious clustering models (GPCM), which have an eigen-decomposed covariance structure. Cluster-weighted modelling (CWM) is another flexible statistical framework for modelling local relationships in heterogeneous populations on the basis of weighted combinations of local models. In particular, we extend cluster-weighted models to include an underlying latent factor structure of the independent variable, resulting in a novel family of models known as parsimonious cluster-weighted factor analyzers. For this family, the EM-BIC rubric is used for parameter estimation and model selection. Some work on a mixture of multivariate t-distributions is also presented, with a linear model for the mean and a modified Cholesky-decomposed covariance structure leading to a novel family of mixture models.
In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters are estimated using the EM algorithm, and an approach to model selection other than the BIC is also considered. en_US
dc.description.sponsorship NSERC PGS-D en_US
dc.language.iso en en_US
dc.subject High-dimensional data en_US
dc.subject Variational Bayes Approximations en_US
dc.subject Mixture Models en_US
dc.subject EM Algorithm en_US
dc.subject Factor Analyzers en_US
dc.subject Longitudinal Data en_US
dc.subject Gene Expression Data en_US
dc.subject Cluster-Weighted Models en_US
dc.subject Classification en_US
dc.subject Clustering en_US
dc.subject Model-based clustering en_US
dc.subject Family of Mixture Models en_US
dc.subject Model-based Classification en_US
dc.subject Cluster-Weighted Factor Analyzers en_US
dc.title Variational Approximations and Other Topics in Mixture Models en_US
dc.type Thesis en_US
dc.degree.programme Applied Statistics en_US
dc.degree.name Doctor of Philosophy en_US
dc.degree.department Department of Mathematics and Statistics en_US


Files in this item

Files Size Format View Description
thesis.pdf 5.055Mb PDF View/Open Updated Thesis 2
