    • eScholar Home
    • Faculty of Engineering & Applied Science
    • Doctoral Dissertations

    Pushing the limits of traditional unsupervised learning

    Gultepe_Eren.pdf (5.451Mb)
    Date
    2018-08-01
    Author
    Gultepe, Eren
    Abstract
    Unsupervised learning has important applications in extremely large data settings, such as medical, biological, social, and environmental data. In these settings, copious amounts of data are typically collected, with the additional burdens of high dimensionality and unavailable class labels. Improving the performance and usability of unsupervised learning algorithms therefore enables better resource management and delivery of services to users. Although deep learning methods have become popular owing to their success in the supervised problem of classification and in the unsupervised problems of feature extraction and cluster analysis, traditional machine learning methods can still provide state-of-the-art performance. In this thesis, a novel clustering framework is presented that combines common clustering and feature extraction methods with careful parameter selection. This framework achieves state-of-the-art clustering performance, better than that of many deep learning-based methods, on large benchmark datasets and on web-based text and image datasets. The pipeline incorporates deep learning-style feature extraction, but without the onerous hyper-parameter tuning procedure.
    Next, two novel methods are provided for testing the significance and reliability of clusters, in which the null-hypothesis statistical distribution is formed either by (1) a uniform distribution projected onto the principal components of the original data, or (2) a randomized, weighted adjacency matrix. Significance testing of clusters is important when the nature or underlying properties of the data are unknown, especially in large data settings or with nonstandard datasets, because a random sample of the population data could contain properties that are not representative of the whole population and could therefore yield a clustering result that is not typical of the population.
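The abstract describes significance test (1) only at a high level. As a rough illustration of that idea, the sketch below builds a null distribution by drawing reference datasets uniformly within the bounding box of the data's principal-component scores and rotating them back to the original coordinates, similar to the principal-component reference distribution used by Tibshirani, Walther and Hastie's gap statistic. The test statistic (within-cluster sum of squares from plain k-means), the box bounds, the Monte Carlo p-value, and all function names are assumptions chosen for illustration; this is not the procedure from the thesis.

```python
import numpy as np


def kmeans_dispersion(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the within-cluster sum of squares.
    (Assumed test statistic, chosen only for illustration.)"""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster becomes empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pc_uniform_reference(X, rng):
    """Sample uniformly inside the bounding box of X's principal-component
    scores, then rotate back (assumed construction of the null data)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    Z = rng.uniform(lo, hi, size=scores.shape)
    return Z @ Vt + mean


def cluster_significance(X, k, n_ref=99, seed=0):
    """Monte Carlo test: is the observed clustering tighter than expected
    for structureless data with the same principal-axis extent?"""
    rng = np.random.default_rng(seed)
    observed = kmeans_dispersion(X, k, seed=seed)
    null = np.array([
        kmeans_dispersion(pc_uniform_reference(X, rng), k, seed=seed)
        for _ in range(n_ref)
    ])
    # Smaller dispersion means tighter clusters, so count null values <= observed
    p_value = (1 + np.sum(null <= observed)) / (n_ref + 1)
    return observed, null, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (4, 4))])
    noise = rng.uniform(0, 4, size=(100, 2))
    for name, data in (("two blobs", blobs), ("uniform noise", noise)):
        _, _, p = cluster_significance(data, k=2)
        print(f"{name}: p = {p:.3f}")
```

Aligning the reference box with the principal axes, rather than with the original coordinate axes, keeps the null data's extent faithful to elongated or rotated datasets. A small p-value suggests that the clusters are tighter than uniform data of the same extent would produce.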
Finally, given the success of traditional matrix factorization methods in the clustering pipeline, a new convolutional neural network architecture that leverages singular value decomposition was developed for text document classification. This model achieved state-of-the-art document classification accuracy.
    URI
    https://hdl.handle.net/10155/951
    Collections
    • Doctoral Dissertations [129]
    • Electronic Theses and Dissertations [1336]

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
