Загрузка страницы

High Quality, High Performance Clustering with HDBSCAN | SciPy 2016 | Leland McInnes

Data clustering is a powerful tool for data analysis. It can be particularly useful in exploratory data analysis for helping to summarize and give intuition about a dataset. Despite it's power clustering is used for this task far less frequently than it could be. A plethora of options for clustering algorithms exist, and we will provide a survey of some of the more popular options, discussing their strengths and weaknesses, particularly with regard to exploratory data analysis. Our focus, however, is on a relatively new algorithm that appears to be the best equipped to meet the needs of exploratory data analysis: HDBSCAN* has the strengths of density based algorithms, has a small robust set of parameters, and with suitable implementation can be made highly scalable to large datasets. We will discuss how the algorithm works, taking a few different perspectives, and explain the techniques used for a high performance implementation. Finally we'll discuss ways to extend the algorithm, drawing on ideas from topological data analysis.

More info on HDBSCAN here: https://github.com/lmcinnes/hdbscan.

See the complete SciPy 2016 Conference talk & tutorial playlist here: https://www.youtube.com/playlist?list=PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6

Видео High Quality, High Performance Clustering with HDBSCAN | SciPy 2016 | Leland McInnes канала Enthought
Показать
Комментарии отсутствуют
Введите заголовок:

Введите адрес ссылки:

Введите адрес видео с YouTube:

Зарегистрируйтесь или войдите с
Информация о видео
15 июля 2016 г. 7:40:27
00:22:57
Яндекс.Метрика