MIA: Eli Weinstein on Generative models of proteins and genomes; Primer by Alan Amin on Polya trees
Models, Inference and Algorithms
Broad Institute of MIT and Harvard
May 5, 2021
Meeting: Building and evaluating generative models of biological sequences, from proteins to whole genomes
Eli Weinstein
Marks Lab, Harvard Medical School
Across biology and biomedicine, scientists are interested in measuring sequences, predicting sequences, and testing their predictions experimentally by synthesizing or editing sequences. Generative probabilistic modeling offers a flexible and rigorous framework for learning from sequence data and forming predictions, but building, inferring and critiquing probabilistic models of biological sequences remains challenging. In this talk we outline the major practical and theoretical limitations of existing techniques and propose alternatives. We first describe a structured output distribution for protein data, the “MuE” distribution, that enables the creation of regression models, forecasting models, latent feature models and more; models built with the MuE do not require alignments for training and meet key theoretical conditions. Second, we describe a new generative model that can be scaled to whole genomes, the “BEAR” model, and use it to construct a nonparametric density estimator, robust parameter estimators, a goodness-of-fit test, and a two-sample test, each with consistency guarantees. We illustrate the applications of these methods on a range of biological problems including characterizing immune receptor repertoires, mapping disordered protein families, comparing metagenomic samples, exploring unaligned read data, and forecasting pathogen evolution.
Primer: Estimation and testing with generative nonparametric Bayesian models
Alan Amin
Marks Lab, Harvard Medical School
In this primer, we review some key statistical ideas that have been fundamental to the analysis of continuous low-dimensional data but have yet to be successfully extended to large-scale biological sequence data. In particular, we introduce and motivate nonparametric density estimation, goodness-of-fit testing, and two-sample testing; we then illustrate how each of these challenges may be addressed for continuous low-dimensional data using methods based on the Bayesian Polya tree model. Finally, we describe theoretical guarantees available for each application, focusing on asymptotic consistency results. These ideas lay the foundation for the BEAR sequence model, introduced in the main talk, which we show can address the same challenges in the context of biological sequence data.
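Two of the primer's tasks, density estimation and two-sample testing, can be made concrete for one-dimensional data with a Polya tree. The sketch below is an illustration written for this page, not code from the talk. It assumes observations rescaled to [0, 1), a uniform base measure, a dyadic partition truncated at a fixed `depth`, and Beta(c·m², c·m²) splitting probabilities at level m (a common default, not necessarily the primer's choice). `predictive_density` returns the posterior predictive density estimate, and `log_bayes_factor_same` compares the marginal likelihood of the pooled samples under one shared tree against separate trees, in the spirit of Polya-tree two-sample tests. All function names and parameters are hypothetical.

```python
import math


def cell_index(x, level):
    """Index k of the dyadic cell [k/2^level, (k+1)/2^level) that contains x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("this sketch assumes data rescaled to [0, 1)")
    return min(int(x * 2 ** level), 2 ** level - 1)


def counts_by_level(data, depth):
    """counts[m][k] = number of points in the k-th cell at level m (m = 0..depth)."""
    counts = []
    for m in range(depth + 1):
        level_counts = [0] * (2 ** m)
        for x in data:
            level_counts[cell_index(x, m)] += 1
        counts.append(level_counts)
    return counts


def alpha(m, c=1.0):
    """Assumed Beta(alpha, alpha) concentration at level m; grows like m^2."""
    return c * m * m


def predictive_density(x, data, depth=8, c=1.0):
    """Posterior predictive density at x under a truncated Polya tree, uniform base."""
    counts = counts_by_level(data, depth)
    density = 1.0
    for m in range(1, depth + 1):
        child = cell_index(x, m)
        parent = child // 2  # the enclosing cell one level up
        a = alpha(m, c)
        # Posterior mean branch probability, times 2 because the cell width halves.
        density *= 2.0 * (a + counts[m][child]) / (2.0 * a + counts[m - 1][parent])
    return density


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(data, depth=8, c=1.0):
    """Log marginal likelihood (density) of the data under the truncated Polya tree."""
    counts = counts_by_level(data, depth)
    total = len(data) * depth * math.log(2.0)  # 1 / (leaf cell width) per point
    for m in range(1, depth + 1):
        a = alpha(m, c)
        for parent in range(2 ** (m - 1)):
            n_left = counts[m][2 * parent]
            n_right = counts[m][2 * parent + 1]
            total += log_beta(a + n_left, a + n_right) - log_beta(a, a)
    return total


def log_bayes_factor_same(x, y, depth=8, c=1.0):
    """log p(x, y | one shared tree) - log p(x | tree) - log p(y | tree)."""
    pooled = list(x) + list(y)
    return (log_marginal(pooled, depth, c)
            - log_marginal(x, depth, c)
            - log_marginal(y, depth, c))


if __name__ == "__main__":
    grid = [(i + 0.5) / 100 for i in range(100)]
    left = [v / 2 for v in grid]          # all mass in [0, 0.5)
    right = [0.5 + v / 2 for v in grid]   # all mass in [0.5, 1)
    print(predictive_density(0.25, left))
    print(log_bayes_factor_same(grid, grid))   # identical samples
    print(log_bayes_factor_same(left, right))  # disjoint supports
```

A positive log Bayes factor favors the hypothesis that both samples come from one distribution, and a negative one favors different distributions. The per-point width factor in `log_marginal` cancels in that ratio; it is kept so that a single observation has marginal density 1 under the uniform base.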
Chapters:
00:00 Primer: Estimation and testing with generative nonparametric Bayesian models
45:37 Meeting: Building and evaluating generative models of biological sequences, from proteins to whole genomes
For more information visit: https://www.broadinstitute.org/MIA
Copyright Broad Institute, 2021. All rights reserved.