10 Julia Packages You Should Learn for Data Science (in 2020)

Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1

In this video, I discuss ten Julia packages that any aspiring or current data scientist who is picking up Julia should be acquainted with in 2020. While Julia is a general-purpose programming language, I am targeting people with an interest in data science here.

1. IJulia
This package enables the use of Julia in Jupyter notebooks or JupyterLab. This is a helpful environment for programming as well as for creating reports and exporting them to HTML, Markdown, PDF, etc. It also provides options for other Julia kernels.
Documentation: https://www.juliaobserver.com/packages/IJulia

2. DataFrames
This is Julia's answer to Pandas in Python, or tidyr/dplyr in R's tidyverse. It provides the DataFrame object, which will be the basis for much data analysis and wrangling, and provides functionalities for selecting columns, filtering rows, sorting datasets, creating new variables, joins, converting datasets from wide to long, etc. You can use the CSV package to read in datasets or create them yourself. It also has functions inspired by Hadley Wickham's Split-Apply-Combine approach and a very helpful describe() function.
Documentation: https://juliadata.github.io/DataFrames.jl/stable/
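
To make the wrangling operations above concrete, here is a minimal sketch. The column names and values are made up for illustration, and it assumes a reasonably recent DataFrames release in which combine and the column => function => new_name syntax are available.

```julia
using DataFrames
using Statistics

# Toy dataset; the column names and values are hypothetical.
df = DataFrame(group = ["a", "a", "b", "b"], value = [1.0, 2.0, 3.0, 5.0])

# Filter rows: keep only rows where value exceeds 1.5.
big = filter(:value => v -> v > 1.5, df)

# Sort the dataset by value, largest first.
sorted = sort(df, :value, rev = true)

# Create a new variable from an existing column.
df.doubled = 2 .* df.value

# Split-apply-combine: mean of value within each group.
means = combine(groupby(df, :group), :value => mean => :mean_value)

# Summary statistics for every column.
summary_table = describe(df)
```

To work on a file instead of a hand-built table, the CSV package mentioned above can read it into a DataFrame first.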

3. Plots
This is a basic, easy-to-use visualization library that can be thought of as a common interface to various other plotting libraries. It supports several different backends, most notably Plotly. It is very customizable, offering options for layouts, colors, attributes, and objects. Note there are also "recipes" (extensions of the Plots framework) that enable Plots to perform different plot commands, use different functions, and handle different data types.
Documentation: http://docs.juliaplots.org/latest/tutorial/
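
As a rough illustration of the customization options described above, the sketch below draws two curves on one figure using whichever backend Plots loads by default. The data and labels are invented for the example.

```julia
using Plots

# Sample data: one period of sine and cosine.
x = range(0, 2π; length = 100)

# Create the plot and set a few common attributes.
p = plot(x, sin.(x); label = "sin(x)", xlabel = "x", ylabel = "y",
         title = "Two curves on one plot")

# plot! adds a second series to the existing plot object.
plot!(p, x, cos.(x); label = "cos(x)")
```

Switching backends is normally a single call made before plotting (for example plotly()), provided the corresponding backend package is installed.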

4. VegaLite
This is my personal favorite visualization library for Julia, even more so than the Gadfly library. It is built on a grammar-of-graphics framework, with @vlplot as its core macro.
Documentation: https://www.queryverse.org/VegaLite.jl/stable/

5. RCall
As the name suggests, the RCall package enables the use of R code in Julia, either from the Juno REPL or from Jupyter. It is particularly helpful because objects can be created using R and passed to Julia functions or vice versa.
Documentation: http://juliainterop.github.io/RCall.jl/stable/

6. Distributions
This package can be used for creating statistical distribution objects as well as sampling from them. This includes the Normal, Exponential, Uniform, Binomial, and Gamma distributions, and more. Another very helpful feature is fitting a theoretical distribution to empirical data to find the best-fitting parameters.
Documentation: https://juliastats.org/Distributions.jl/latest/
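
A small sketch of the three ideas above (creating a distribution object, sampling from it, and fitting a distribution to data). The parameter values, sample size, and seed are arbitrary choices for the example.

```julia
using Distributions
using Random

Random.seed!(42)  # arbitrary seed so the run is repeatable

# A distribution object: Normal with mean 2.0 and standard deviation 0.5.
d = Normal(2.0, 0.5)

# Evaluate the density and the cumulative distribution function at a point.
density_at_mean = pdf(d, 2.0)
prob_below_mean = cdf(d, 2.0)

# Draw 10,000 samples from the distribution.
x = rand(d, 10_000)

# Fit a Normal distribution to the samples (maximum likelihood estimate).
fitted = fit(Normal, x)
estimated_mean = mean(fitted)
estimated_std = std(fitted)
```

With this many samples, the fitted parameters should land close to the ones used to generate the data.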

7. PrettyTables
This package can be used for formatting tables, using either text, HTML, or LaTeX backends. It is also customizable for options like alignment, printing rows satisfying certain conditions, etc.
Documentation: https://ronisbr.github.io/PrettyTables.jl/stable/

8. GLM
The GLM package is helpful for fitting either a linear regression model, from which you can extract quantities such as R² and the coefficient estimates, or other generalized linear models.
Documentation: https://juliastats.org/GLM.jl/stable/manual/
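
For instance, a simple linear regression might look like the sketch below. The data are invented for illustration, and the model is fit to a DataFrame using the formula interface.

```julia
using DataFrames
using GLM

# Made-up data that roughly follows y = 2x.
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [2.1, 3.9, 6.2, 7.8, 10.1])

# Fit an ordinary least squares model of y on x.
model = lm(@formula(y ~ x), df)

# Extract the coefficient estimates (intercept first, then slope) and R².
estimates = coef(model)
rsquared = r2(model)
```

Other generalized linear models go through glm instead of lm, which additionally takes a distribution family and a link function.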

9. ScikitLearn
The ScikitLearn package from Python has an implementation in Julia, and it is just as useful there, working quite similarly but also offering new Julia-based methods on top of the standard Python methods. Types of models include supervised learning, unsupervised learning, and dataset transformations; the package also offers capabilities for cross-validation, tuning hyperparameters, etc.
Documentation: https://scikitlearnjl.readthedocs.io/en/latest/quickstart/

10. Flux
Flux is a Julia package for machine learning and deep learning needs. This provides a lot of flexibility, utilizing a key feature of taking gradients of other Julia code. Features include: defining loss functions and gradient descent, building layers of models, regularization, and training models. This is a fairly technical package but comes with a repository called the "model zoo" which does a nice job showcasing the package's capabilities.
Documentation: https://fluxml.ai/Flux.jl/stable/
Model Zoo: https://github.com/FluxML/model-zoo
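
The "taking gradients of other Julia code" feature can be sketched in a few lines. The function f below is an arbitrary example, not something from the video.

```julia
using Flux

# An ordinary Julia function; nothing Flux-specific about it.
f(x) = 3x^2 + 2x + 1

# gradient returns a tuple with one entry per argument of f.
derivative_at(x) = gradient(f, x)[1]

# Analytically f'(x) = 6x + 2, so the derivative at 2.0 is 14.0.
g = derivative_at(2.0)
```

Training a model relies on the same mechanism, applied to a loss function of the model's parameters.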

Video information
June 4, 2020, 22:00:37
00:19:33