Running loops in parallel in R using foreach

Loops have a bad reputation in R for being slow. In many cases, loops can be avoided using vectorized functions or apply functions like lapply or the map family of functions from the purrr package.

However, if you encounter R code that runs too slowly because of loops and you find it hard to rewrite the code to avoid loops, a quicker, yet powerful approach may be to make the loops run in parallel. We can do that using the foreach package by Michelle Wallig and Steve Weston.
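As a rough illustration of what running a loop in parallel with foreach can look like, here is a minimal sketch. It assumes the doParallel package as the parallel backend and two worker processes; both are my choices for this example, not necessarily what the video uses.

```r
library(foreach)
library(doParallel)

# Assumption: two workers; adjust to the number of cores on your machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration is independent, so %dopar% can hand them out to the workers
squares <- foreach(i = 1:4) %dopar% {
  i^2
}

# Always release the workers when done
parallel::stopCluster(cl)

str(squares)
```

Without a registered backend, %dopar% falls back to sequential execution and issues a warning, so the registerDoParallel() step is what actually makes the loop parallel.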

We compare base R's for loops with the foreach approach. A strength of the latter is that it automatically creates a return object (by default a list), whereas a base R for loop requires you to create and fill the result object yourself. (The return type can be customized via the .combine argument, which I don't do in the video.) In the benchmark, the parallelized loops are much faster than the same loops run sequentially. However, the parallel::clusterApply() approach is still a bit faster in our use case, which runs 200 regression models and returns model summaries.
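The difference in how results are collected can be sketched as follows. This is not the code from the video: fit_one() is a hypothetical helper, the mtcars data and the bootstrap resampling are assumptions for illustration, and only 5 models are fitted instead of 200. The loops use %do% (sequential) so the example runs without a backend; with a registered backend, %dopar% would be used instead.

```r
library(foreach)

set.seed(1)
n_models <- 5  # the video runs 200 models; kept small here

# Hypothetical helper: fit a regression on a bootstrap sample of mtcars
# and return the R squared from the model summary
fit_one <- function(i) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  summary(lm(mpg ~ wt, data = mtcars[idx, ]))$r.squared
}

# Base R: the result container has to be created and filled by hand
res_for <- vector("list", n_models)
for (i in seq_len(n_models)) {
  res_for[[i]] <- fit_one(i)
}

# foreach: the value of each iteration is collected automatically (a list)
res_list <- foreach(i = seq_len(n_models)) %do% fit_one(i)

# .combine changes the return type, here to a plain numeric vector
res_vec <- foreach(i = seq_len(n_models), .combine = c) %do% fit_one(i)
```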

Check out foreach's documentation; it contains well-written vignettes (see help(package = "foreach")). A powerful concept I don't mention in the video is iterators, which let you control what is sent to the workers in each iteration and so minimize data transfer overhead.
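To give an idea of the iterator concept, here is a small sketch using iter() from the iterators package. The matrix m is made-up example data, and the loop runs sequentially with %do% to keep it self-contained; the point is only that each iteration receives a single column rather than the whole object.

```r
library(foreach)
library(iterators)

# Hypothetical data: a 4 x 3 matrix whose columns are processed separately
m <- matrix(1:12, nrow = 4)

# iter(m, by = "column") supplies one column per iteration
col_sums <- foreach(col = iter(m, by = "column"), .combine = c) %do% {
  sum(col)
}

col_sums
```

With a parallel backend, this pattern means a task only needs its own column, which is where the savings in data transfer would come from.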

Note that not all loops are suited to running in parallel, especially those in which each iteration depends on the results of previous iterations, as may be the case in simulations. Here, we assume that each iteration runs independently of the others.
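The distinction can be made concrete with two toy loops. Both are invented for illustration: the first is a simple recurrence where every value needs the previous one, the second computes each value from its own index only.

```r
library(foreach)

# Dependent iterations: x[i] needs x[i - 1], so the loop must run in order
x <- numeric(5)
x[1] <- 1
for (i in 2:5) {
  x[i] <- 0.5 * x[i - 1] + 1
}

# Independent iterations: each result uses only its own index,
# so the iterations could be distributed across workers
y <- foreach(i = 1:5, .combine = c) %do% {
  sqrt(i)
}
```

Only the second loop is a candidate for %dopar%; the first would have to be restructured before parallelization could help.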

Code can be found here:
https://github.com/fjodor/parallelization

Here's the video that explains parallel::clusterApply() in more detail:
https://youtu.be/leoEacKLotA

Thumbnail image: Chait Goli from Pexels

Contact me, e.g. to discuss (online) R workshops, trainings, or webinars:

LinkedIn: https://www.linkedin.com/in/wolfriepl/
Twitter: https://twitter.com/StatistikInDD
Xing: https://www.xing.com/profile/Wolf_Riepl
Facebook: https://www.facebook.com/statistikdresden/

https://statistik-dresden.de/kontakt
R Workshops: https://statistik-dresden.de/r-schulungen
Blog (German, translate option): https://statistik-dresden.de/statistik-blog

Playlist: Music chart history
https://www.youtube.com/playlist?list=PL4ZUlAlk7QidRlzHEiHX09htXMAbxTpjW

Video "Running loops in parallel in R using foreach" from the StatistikinDD channel
Video information
December 20, 2020, 23:06:41
00:08:53