Jorge Silva, Sr. Research Statistician Developer, SAS @ MLconf ATL
Estimating the Number of Clusters in Big Data with the Aligned Box Criterion
Finding the number, k, of clusters in a dataset is a fundamental problem in unsupervised learning. It is also an important business problem, e.g. in market segmentation. Existing approaches include the silhouette measure, the gap statistic and Dirichlet process clustering. For thirty years SAS procedures have included the option of using the cubic clustering criterion (CCC) to estimate k. While CCC remains competitive, we propose a significant and original improvement, referred to herein as the aligned box criterion (ABC). Like CCC, ABC is based on a hypothesis-testing framework, but instead of a heuristic measure we use data-adaptive reference distributions to generate more realistic null hypotheses in a scalable and easily parallelizable manner. We have implemented ABC using SAS' High Performance Analytics platform, and achieve state-of-the-art accuracy in the estimation of k.
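The abstract does not spell out how ABC is computed, so the following is not an implementation of it. As a rough illustration of the general idea of comparing clustering quality against a reference (null) distribution, here is a minimal sketch of the gap statistic, one of the existing approaches named above. It assumes a uniform reference drawn over the axis-aligned bounding box of the data and a plain Lloyd's k-means; the function names (`kmeans_dispersion`, `uniform_box_sample`, `estimate_k`), the toy data, and all parameter defaults are hypothetical choices made for this example.

```python
import math
import random


def kmeans_dispersion(points, k, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squares found by Lloyd's k-means."""
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)  # random restart from k data points
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: math.dist(p, centers[c]))
                clusters[j].append(p)
            # Move each center to its cluster mean; keep it if the cluster is empty.
            new_centers = [
                tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(
            math.dist(p, centers[j]) ** 2
            for j, cl in enumerate(clusters)
            for p in cl
        )
        best = min(best, wss)
    return max(best, 1e-12)  # guard so the caller can safely take a logarithm


def uniform_box_sample(points, rng):
    """Draw len(points) uniform samples from the data's axis-aligned bounding box."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points
    ]


def estimate_k(points, k_max=6, n_refs=10, seed=0):
    """Gap-statistic estimate of k; returns (k_hat, list of Gap(1..k_max))."""
    rng = random.Random(seed)
    refs = [uniform_box_sample(points, rng) for _ in range(n_refs)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_dispersion(points, k, rng))
        ref_logs = [math.log(kmeans_dispersion(r, k, rng)) for r in refs]
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


# Toy usage: three synthetic 2-D blobs (made-up data for illustration only).
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]
k_hat, gaps = estimate_k(data, k_max=5, n_refs=5)
print("estimated k:", k_hat)
```

Each reference sample is clustered independently of the others, which is why procedures of this kind are easy to parallelize. A box aligned with the data's principal axes, rather than the coordinate axes, is a common refinement that this sketch leaves out.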