Cross-Validation for Model Selection in Model-Based Clustering

The Atrium, University of Guelph Institutional Repository



dc.contributor.advisor McNicholas, Paul
dc.contributor.author O'Reilly, Rachel
dc.date 2012-08-22
dc.date.accessioned 2012-09-04T20:53:22Z
dc.date.available 2012-09-04T20:53:22Z
dc.date.issued 2012-09-04
dc.identifier.uri http://hdl.handle.net/10214/3911
dc.description.abstract Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis focuses on model-based clustering, where the data are assumed to arise from a finite number of subpopulations, each of which follows a known statistical distribution. The number of groups and the shape of each group are unknown in advance, so selecting them is one of the most challenging aspects of clustering. Cross-validation is a model selection technique often used in regression and classification because it tends to choose models that predict well and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and the covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. en_US
dc.description.sponsorship Ontario Graduate Scholarship Program en_US
dc.language.iso en en_US
dc.subject model-based clustering, model selection, cross-validation en_US
dc.title Cross-Validation for Model Selection in Model-Based Clustering en_US
dc.type Thesis en_US
dc.degree.programme Applied Statistics en_US
dc.degree.name Master of Science en_US
dc.degree.department Department of Mathematics and Statistics en_US
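The abstract describes choosing both the number of groups and the covariance structure of a Gaussian mixture by cross-validation. The following is a minimal Python sketch of that idea under simplifying assumptions that are not taken from the thesis: univariate data; only two variance structures (one variance shared by all groups, loosely like a univariate "E" model in mclust's naming, or one variance per group, like "V"); EM fitting from a simple quantile-based start; and the sum of held-out log-likelihoods over k folds as the selection score. The function names (fit_gmm_1d, cv_score, select_model) and the VAR_FLOOR constant are hypothetical, and the thesis's own criterion and its multivariate model family may differ.

```python
import math
import random

VAR_FLOOR = 1e-6  # hypothetical lower bound that stops a variance collapsing to 0


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(data, G, equal_var, max_iter=200, tol=1e-6):
    """Fit a univariate G-component Gaussian mixture by EM.

    equal_var=True forces one shared variance (like a univariate 'E' model);
    equal_var=False gives each component its own variance (like 'V').
    """
    n = len(data)
    s = sorted(data)
    # Simple deterministic start (an assumption): means at evenly spaced quantiles.
    mus = [s[int((g + 0.5) * n / G)] for g in range(G)]
    mean = sum(data) / n
    vars_ = [max(sum((x - mean) ** 2 for x in data) / n, VAR_FLOOR)] * G
    pis = [1.0 / G] * G
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities z[i][g] and the current log-likelihood.
        z, ll = [], 0.0
        for x in data:
            logs = [math.log(pis[g]) + normal_logpdf(x, mus[g], vars_[g])
                    for g in range(G)]
            lse = logsumexp(logs)
            ll += lse
            z.append([math.exp(lp - lse) for lp in logs])
        if ll - prev < tol:  # EM is monotone; stop once gains are negligible
            break
        prev = ll
        # M-step: weighted mixing proportions, means and variances.
        ng = [sum(zi[g] for zi in z) + 1e-10 for g in range(G)]
        pis = [ng[g] / n for g in range(G)]
        mus = [sum(zi[g] * x for zi, x in zip(z, data)) / ng[g] for g in range(G)]
        ss = [sum(zi[g] * (x - mus[g]) ** 2 for zi, x in zip(z, data))
              for g in range(G)]
        if equal_var:
            vars_ = [max(sum(ss) / n, VAR_FLOOR)] * G  # pooled variance
        else:
            vars_ = [max(ss[g] / ng[g], VAR_FLOOR) for g in range(G)]
    return pis, mus, vars_


def loglik(data, params):
    """Mixture log-likelihood of data under fitted parameters."""
    pis, mus, vars_ = params
    return sum(logsumexp([math.log(p) + normal_logpdf(x, m, v)
                          for p, m, v in zip(pis, mus, vars_)]) for x in data)


def cv_score(data, G, equal_var, k=5, seed=0):
    """Sum of held-out log-likelihoods over k folds (higher is better)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        test = [data[i] for i in fold]
        total += loglik(test, fit_gmm_1d(train, G, equal_var))
    return total


def select_model(data, max_G=4, k=5):
    """Return the (G, equal_var) pair with the best CV score, plus all scores."""
    scores = {(G, eq): cv_score(data, G, eq, k)
              for G in range(1, max_G + 1) for eq in (True, False)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    data = ([rng.gauss(-4, 1) for _ in range(100)]
            + [rng.gauss(4, 1) for _ in range(100)])
    (best_G, best_equal), scores = select_model(data, max_G=3)
    print("groups:", best_G, "shared variance:", best_equal)
```

Scoring each candidate on data it was not fitted to rewards models that predict unseen points well, which is the property the abstract attributes to cross-validation; adding groups beyond what the data support tends to gain little, or even lose, on the held-out folds, unlike the in-sample likelihood, which never decreases as groups are added.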


Files in this item

Files Size Format
mythesis.pdf 8.704Mb PDF
