
MIA: Eli Weinstein on Generative models of proteins and genomes; Primer by Alan Amin on Polya trees

Models, Inference and Algorithms
Broad Institute of MIT and Harvard
May 5, 2021

Meeting: Building and evaluating generative models of biological sequences, from proteins to whole genomes

Eli Weinstein
Marks Lab, Harvard Medical School
Across biology and biomedicine, scientists are interested in measuring sequences, predicting sequences, and testing their predictions experimentally by synthesizing or editing sequences. Generative probabilistic modeling offers a flexible and rigorous framework for learning from sequence data and forming predictions, but building, inferring and critiquing probabilistic models of biological sequences remains challenging. In this talk we outline the major practical and theoretical limitations of existing techniques and propose alternatives. We first describe a structured output distribution for protein data, the “MuE” distribution, that enables the creation of regression models, forecasting models, latent feature models and more; models built with the MuE do not require alignments for training and meet key theoretical conditions. Second, we describe a new generative model that can be scaled to whole genomes, the “BEAR” model, and use it to construct a nonparametric density estimator, robust parameter estimators, a goodness-of-fit test, and a two-sample test, each with consistency guarantees. We illustrate the applications of these methods on a range of biological problems including characterizing immune receptor repertoires, mapping disordered protein families, comparing metagenomic samples, exploring unaligned read data, and forecasting pathogen evolution.

Primer: Estimation and testing with generative nonparametric Bayesian models

Alan Amin
Marks Lab, Harvard Medical School

In this primer, we review some key statistical ideas that have been fundamental to the analysis of continuous low-dimensional data, but have yet to be successfully extended to large-scale biological sequence data. In particular, we introduce and motivate nonparametric density estimation, goodness-of-fit testing, and two-sample testing; we then illustrate how each of these challenges may be addressed for continuous low-dimensional data using methods based on the Bayesian Polya tree model. Finally, we describe theoretical guarantees available for each application, focusing on asymptotic consistency results. These ideas lay the foundation for the BEAR sequence model, introduced in the main talk, which we show can address the same challenges in the context of biological sequence data.
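To make the Polya tree ideas concrete, here is a minimal sketch for one-dimensional data on [0, 1). It is not the speakers' code. It assumes a fixed dyadic partition, a Uniform[0, 1) centring distribution, truncation at a finite depth, and the common choice of a Beta(c·j², c·j²) prior on each split at level j. The helper names (`node_counts`, `log_marginal`, `predictive_density`, `log_bayes_factor_same`) and the concentration parameter `c` are hypothetical. The sketch shows two of the primer's tasks: a posterior predictive density estimate, and a Bayes-factor two-sample test that compares one shared Polya tree against two independent ones.

```python
import math
from collections import Counter


def node_counts(samples, depth):
    """Count samples in each dyadic interval [k/2**j, (k+1)/2**j), levels j = 1..depth."""
    counts = Counter()
    for x in samples:
        if not 0.0 <= x < 1.0:
            raise ValueError("samples must lie in [0, 1)")
        for j in range(1, depth + 1):
            counts[(j, int(x * 2**j))] += 1
    return counts


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(samples, depth, c=1.0):
    """Log marginal density of the samples under a Polya tree centred on Uniform[0, 1),
    truncated at `depth` (assumption), with Beta(c*j**2, c*j**2) split priors (assumption)."""
    counts = node_counts(samples, depth)
    # Each level halves the interval width, contributing a factor of 2 per sample.
    total = len(samples) * depth * math.log(2.0)
    for j in range(1, depth + 1):
        a = c * j**2
        for parent in range(2 ** (j - 1)):
            n0, n1 = counts[(j, 2 * parent)], counts[(j, 2 * parent + 1)]
            # Beta-binomial evidence for this split: B(a + n0, a + n1) / B(a, a).
            total += log_beta(a + n0, a + n1) - log_beta(a, a)
    return total


def predictive_density(x, samples, depth, c=1.0):
    """Posterior predictive density at x: product over levels of 2 * E[branch probability]."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    counts = node_counts(samples, depth)
    n_parent = len(samples)
    density = 1.0
    for j in range(1, depth + 1):
        a = c * j**2
        n_child = counts[(j, int(x * 2**j))]
        # Posterior mean of the branch probability is (a + n_child) / (2a + n_parent).
        density *= 2.0 * (a + n_child) / (2.0 * a + n_parent)
        n_parent = n_child
    return density


def log_bayes_factor_same(xs, ys, depth, c=1.0):
    """log Bayes factor of H0 (one shared Polya tree) vs H1 (independent trees).
    Positive values favour 'same distribution'."""
    return (log_marginal(list(xs) + list(ys), depth, c)
            - log_marginal(xs, depth, c) - log_marginal(ys, depth, c))


if __name__ == "__main__":
    xs = [(i + 0.5) / 400 for i in range(200)]        # evenly spread on [0, 0.5)
    ys = [0.5 + (i + 0.5) / 400 for i in range(200)]  # evenly spread on [0.5, 1)
    print(predictive_density(0.25, xs, depth=6))       # high density where xs lie
    print(log_bayes_factor_same(xs, ys, depth=6))      # strongly negative: favours H1
```

Because both hypotheses use the same partition, the per-level factors of 2 cancel in the Bayes factor. Only the Beta-binomial evidence at each split matters, so nodes that contain points from just one sample contribute nothing.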

Chapters:

00:00 Primer: Estimation and testing with generative nonparametric Bayesian models
45:37 Meeting: Building and evaluating generative models of biological sequences, from proteins to whole genomes

For more information visit: https://www.broadinstitute.org/MIA

Copyright Broad Institute, 2021. All rights reserved.

Video information
May 11, 2021, 21:54:39
Duration: 01:46:15