Large scale Natural Language Processing of biomedical literature (Beam Summit Europe 2019)
Large scale Natural Language Processing of biomedical literature in Python with Beam and spaCy
We use Apache Beam to extract relations between entities such as genes, drugs, and diseases from biomedical literature, and we build a knowledge graph from the extracted relations. By using the knowledge graph to match existing drugs to rare diseases, Healx is on a mission to advance 100 rare disease treatments towards the clinic by 2025.
Beam allows us to build a knowledge graph encapsulating these relations at scale. We can process about 30 million PubMed abstracts to build our internal knowledge graph in less than 30 hours. Running our Beam job on Dataflow lets us quickly scale a large cluster up and down depending on the computational needs. The potential for streaming in documents means we don't need to rebuild our knowledge graph from scratch and can continuously push updates from new publications. Developing and running Beam jobs in Python still has some challenges, which I will also talk about.
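As a rough illustration of the idea described above (not the pipeline presented in the talk), the sketch below extracts co-occurring entity pairs from abstracts and counts them as candidate knowledge graph edges. The `LEXICON` dictionary, the `extract_relations` helper, and the `count_relations` function are hypothetical names introduced here; a real system would use a trained spaCy model rather than a toy dictionary lookup, and sentence-level co-occurrence is only a stand-in for actual relation extraction.

```python
from itertools import combinations

# Hypothetical toy lexicon standing in for a trained NER model (assumption).
LEXICON = {
    "imatinib": "DRUG",
    "bcr-abl": "GENE",
    "leukemia": "DISEASE",
}


def extract_relations(abstract):
    """Yield alphabetically ordered pairs of known entities that share a sentence."""
    for sentence in abstract.lower().split("."):
        tokens = {tok.strip(",;()") for tok in sentence.split()}
        found = sorted(tokens & set(LEXICON))
        for left, right in combinations(found, 2):
            yield (left, right)


def count_relations(pipeline, abstracts):
    """Attach the extraction and counting steps to an existing Beam pipeline."""
    # Imported lazily so the pure-Python helper above works without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | "ReadAbstracts" >> beam.Create(abstracts)
        | "ExtractRelations" >> beam.FlatMap(extract_relations)
        | "CountEdges" >> beam.combiners.Count.PerElement()
    )
```

To try it locally, one could open `with apache_beam.Pipeline() as p:` and call `count_relations(p, abstracts)` with a small list of strings; the result is a collection of `((entity, entity), count)` pairs. Running the same code on Dataflow would be a matter of pipeline options rather than code changes, which is the scaling property the abstract refers to.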
Speakers:
Christiaan Swart - NLP Engineer @ Healx
The Beam Summit Europe 2019 was a two-day event held in Berlin at the KulturBrauerei, focused entirely on Apache Beam.
For more information about the Beam Summit, follow us on Twitter @BeamSummit or go to the website: https://beamsummit.org/
Video: Large scale Natural Language Processing of biomedical literature (Beam Summit Europe 2019), from the Apache Beam channel