Running loops in parallel in R using foreach
Loops have a bad reputation in R for being slow. In many cases, loops can be avoided using vectorized functions or apply functions like lapply or the map family of functions from the purrr package.
However, if you encounter R code that runs too slowly because of loops and you find it hard to rewrite the code to avoid them, a quicker yet powerful approach may be to run the loops in parallel. We can do that using the foreach package by Michelle Wallig and Steve Weston.
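As a rough illustration of the idea (not the code from the video), here is a minimal sketch that runs the same toy computation first as a base R for loop and then as a parallel foreach loop. It assumes the doParallel package as the parallel backend and a cluster of two workers; both choices are assumptions made for this example.

```r
library(foreach)
library(doParallel)

# Assumption: a local cluster with 2 workers; adjust to your machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Sequential base R loop: the result container has to be created by hand
squares_seq <- numeric(5)
for (i in 1:5) {
  squares_seq[i] <- i^2
}

# Parallel foreach loop: iterations are distributed across the workers,
# and the value of each iteration is collected into a list
squares_par <- foreach(i = 1:5) %dopar% {
  i^2
}

# Always release the workers when done
parallel::stopCluster(cl)
```

For a computation this small, the overhead of starting workers and moving data outweighs any gain; parallelization pays off when each iteration does a meaningful amount of work.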
We compare base R's for loops to the foreach approach. A strength of the latter is that it automatically creates a return object (by default a list), whereas a base R for loop requires you to create and fill a result object yourself. (The return object can be customized, for example with foreach's .combine argument, which I don't do in the video.) Benchmarking shows a substantial speed improvement for parallelized loops compared to loops running sequentially. However, the clusterApply() approach is still a bit faster in our use case, which runs 200 regression models and returns model summaries.
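The sketch below shows what the default list return and a customized return might look like for a use case similar to the one described. It is not the benchmark code from the video: the mtcars data, the bootstrap resampling, the model formula and the reduced number of models (20 instead of 200) are all assumptions chosen to keep the example small.

```r
library(foreach)
library(doParallel)

# Assumption: a local cluster with 2 workers
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

n_models <- 20  # illustrative; the use case in the video runs 200 models

# Default behaviour: foreach returns a list with one element per iteration
summaries <- foreach(i = seq_len(n_models)) %dopar% {
  set.seed(i)                                   # reproducible resample
  rows <- sample(nrow(mtcars), replace = TRUE)  # bootstrap sample (assumption)
  summary(lm(mpg ~ wt + hp, data = mtcars[rows, ]))
}

# Customized return: .combine = c collapses the results into a numeric vector
r_squared <- foreach(i = seq_len(n_models), .combine = c) %dopar% {
  set.seed(i)
  rows <- sample(nrow(mtcars), replace = TRUE)
  summary(lm(mpg ~ wt + hp, data = mtcars[rows, ]))$r.squared
}

parallel::stopCluster(cl)
```

Here summaries is a list of summary.lm objects, while r_squared is a plain numeric vector of length n_models.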
Check out foreach's documentation: It contains well-written vignettes - see help(package = "foreach"). A powerful concept I don't mention in the video is iterators, which allow you to efficiently manage what is sent to the workers in each iteration, to minimize data transfer overhead.
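To give a flavour of iterators, here is a small sketch that assumes the iterators package and a doParallel backend with two workers. Instead of referring to the whole matrix inside the loop, iter() hands out one column per iteration, so each task only needs its own slice. The matrix m is made up for the example.

```r
library(foreach)
library(iterators)
library(doParallel)

# Assumption: a local cluster with 2 workers
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Toy data for illustration: a 3 x 4 matrix
m <- matrix(1:12, nrow = 3)

# iter(m, by = "column") yields one column at a time,
# so each iteration receives a single column rather than all of m
col_sums <- foreach(col = iter(m, by = "column"), .combine = c) %dopar% {
  sum(col)
}

parallel::stopCluster(cl)

col_sums  # 6 15 24 33, the same values as colSums(m)
```

With real data the columns (or rows, or chunks) would be much larger, which is where limiting what is sent per iteration starts to matter.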
Note that not all loops are suited to running in parallel, especially when each iteration depends on the results of previous iterations, as may be the case in simulations. Here, we assume that each iteration runs independently of the others.
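The distinction can be sketched as follows. Inside the hypothetical helper simulate_path(), step t needs the value from step t - 1, so that inner loop has to stay sequential. Several such paths, however, are independent of each other and can be run in parallel. The helper, the coefficient 0.8 and the cluster of two workers are assumptions made for illustration only.

```r
library(foreach)
library(doParallel)

# Hypothetical helper: one simulated path in which each step depends
# on the previous step, so this loop cannot be split across workers
simulate_path <- function(n_steps, seed) {
  set.seed(seed)
  x <- numeric(n_steps)
  for (t in 2:n_steps) {
    x[t] <- 0.8 * x[t - 1] + rnorm(1)  # depends on the previous value
  }
  x
}

# Assumption: a local cluster with 2 workers
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Independent replications: each path uses its own seed and
# does not need results from any other path
paths <- foreach(s = 1:4) %dopar% {
  simulate_path(50, seed = s)
}

parallel::stopCluster(cl)
```

The parallelism sits at the level of whole paths, not at the level of the dependent steps within a path.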
Code can be found here:
https://github.com/fjodor/parallelization
Here's the video that explains parallel::clusterApply() in more detail:
https://youtu.be/leoEacKLotA
Thumbnail image: Chait Goli from Pexels
Contact me, e.g. to discuss (online) R workshops / trainings / webinars:
LinkedIn: https://www.linkedin.com/in/wolfriepl/
Twitter: https://twitter.com/StatistikInDD
Xing: https://www.xing.com/profile/Wolf_Riepl
Facebook: https://www.facebook.com/statistikdresden/
https://statistik-dresden.de/kontakt
R Workshops: https://statistik-dresden.de/r-schulungen
Blog (German, translate option): https://statistik-dresden.de/statistik-blog
Playlist: Music chart history
https://www.youtube.com/playlist?list=PL4ZUlAlk7QidRlzHEiHX09htXMAbxTpjW
Video: Running loops in parallel in R using foreach, from the channel StatistikinDD