10 Julia Packages You Should Learn for Data Science (in 2020)
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video, I discuss ten Julia packages that any aspiring or current data scientist picking up Julia should be familiar with in 2020. While Julia is a general-purpose programming language, I am targeting people interested in data science here.
1. IJulia
This package lets you use Julia in Jupyter notebooks or JupyterLab. This is a helpful environment both for programming and for creating reports that can be exported to HTML, Markdown, PDF, etc. It also lets you install additional Julia kernels, for example ones with different command-line options or environment variables.
Documentation: https://www.juliaobserver.com/packages/IJulia
2. DataFrames
This is Julia's answer to pandas in Python, or to tidyr/dplyr in R's tidyverse. It provides the DataFrame object, which is the basis for much data analysis and wrangling. It also supports selecting columns, filtering rows, sorting, creating new variables, joins, reshaping data from wide to long, etc. You can read datasets in with the CSV package or construct them yourself. It also has functions inspired by Hadley Wickham's split-apply-combine approach and a very helpful describe() function for summary statistics.
Documentation: https://juliadata.github.io/DataFrames.jl/stable/
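Here is a minimal sketch of the operations listed above. It assumes DataFrames.jl 1.x. The dataset, column names and the commented-out "data.csv" file name are hypothetical, and the example builds the data in code rather than reading it with CSV.

```julia
using DataFrames, Statistics

# Hypothetical dataset built in code. With the CSV package you could instead read a file,
# e.g. df = CSV.read("data.csv", DataFrame)   (file name is hypothetical)
df = DataFrame(name  = ["a", "b", "c", "d"],
               group = ["x", "x", "y", "y"],
               score = [1.0, 2.0, 3.0, 4.0])

subset_df = df[df.score .> 1.5, [:name, :score]]   # filter rows and select columns
sorted_df = sort(df, :score; rev = true)           # sort by a column, descending (non-mutating)
df.score_x2 = df.score .* 2                        # create a new variable

# Split-apply-combine: split by :group, apply mean to :score, combine into a new DataFrame
group_means = combine(groupby(df, :group; sort = true), :score => mean => :mean_score)

long_df = stack(df, [:score, :score_x2])           # reshape wide -> long
summary_stats = describe(df)                       # per-column summary statistics
```

`groupby(...; sort = true)` is used so that the order of groups in the result is deterministic.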
3. Plots
This is a simple, easy-to-use visualization library that can be thought of as a common interface to various other plotting libraries. It supports several backends (GR by default), most notably Plotly. It is very customizable, offering options for layouts, colors, attributes, and objects. Note that there are also "recipes" (extensions of the Plots framework) that let Plots support new plot commands, functions, and data types.
Documentation: http://docs.juliaplots.org/latest/tutorial/
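Here is a short sketch of the customization options mentioned above: attributes, colors and layouts. It assumes the default GR backend, and the data is synthetic. Saving to disk is left commented out, and the file name there is hypothetical.

```julia
using Plots

x = range(0, 2π; length = 100)

# Line plot with labels and styling attributes
p1 = plot(x, sin.(x); label = "sin(x)", xlabel = "x", ylabel = "y",
          linewidth = 2, color = :blue)
plot!(p1, x, cos.(x); label = "cos(x)", linestyle = :dash, color = :red)  # add a second series

# A second panel: scatter plot of random points
p2 = scatter(rand(20), rand(20); label = "random", markersize = 4)

# Combine both panels side by side with a layout
fig = plot(p1, p2; layout = (1, 2), size = (800, 300))

# To switch backends, call e.g. plotly() before plotting (some backends need extra packages).
# savefig(fig, "figure.png")   # hypothetical file name; writes the figure to disk
```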
4. VegaLite
This is my personal favorite visualization library for Julia, even more so than the Gadfly library. It is built on a grammar-of-graphics framework (it wraps the Vega-Lite specification), with the @vlplot macro at its core.
Documentation: https://www.queryverse.org/VegaLite.jl/stable/
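Here is a minimal sketch of the grammar-of-graphics style. A plot is a mark type plus encodings that map data columns to visual channels. The DataFrame and its column names are hypothetical. The spec is only constructed here, not rendered or saved.

```julia
using VegaLite, DataFrames

# Hypothetical data
df = DataFrame(x = 1:6, y = [2, 4, 3, 5, 7, 6], group = ["a", "a", "a", "b", "b", "b"])

# Mark type :point; encodings map :x and :y to the axes and :group to color
spec = df |> @vlplot(:point, x = :x, y = :y, color = :group)
```

Piping the data into `@vlplot` attaches it to the specification. Displaying `spec` in Jupyter or VS Code renders the chart.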
5. RCall
As the name suggests, the RCall package lets you run R code from Julia, either in the Juno REPL or in Jupyter. It is particularly helpful because objects created in R can be passed to Julia functions, and vice versa.
Documentation: http://juliainterop.github.io/RCall.jl/stable/
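Here is a small sketch of passing objects back and forth between Julia and R. It assumes a working R installation that RCall can find. The variable names are illustrative.

```julia
using RCall   # requires R to be installed

x = [1.0, 2.0, 3.0, 4.0]

@rput x                  # copy the Julia vector x into R as a variable named x
R"m <- mean(x)"          # run R code on it
@rget m                  # copy the R variable m back into Julia

# Julia values can also be interpolated into R code with $, and results converted with rcopy
r_sd = rcopy(R"sd($x)")

# In the Julia REPL, typing $ at the julia> prompt switches to an R prompt.
```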
6. Distributions
This package can be used to create statistical distribution objects and to sample from them. Supported distributions include the Normal, Exponential, Uniform, Binomial, Gamma, and many more. Another very helpful feature is fitting a theoretical distribution to data, i.e. estimating its parameters from an empirical sample (for example, by maximum likelihood).
Documentation: https://juliastats.org/Distributions.jl/latest/
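Here is a minimal sketch covering creating, sampling from and fitting a distribution. The parameters and sample size are arbitrary choices for illustration. `fit` estimates the parameters of a family you choose; it does not pick the best family for you.

```julia
using Distributions, Random

Random.seed!(42)                  # reproducible sampling

d = Normal(5.0, 2.0)              # Normal distribution with mean 5 and standard deviation 2
x = rand(d, 10_000)               # draw 10,000 samples

p_below_mean = cdf(d, 5.0)        # P(X <= 5), which is 0.5 for a symmetric distribution
density_at_mean = pdf(d, 5.0)     # density at the mean

# Other families follow the same pattern
e = Exponential(3.0)              # scale θ = 3, so the mean is 3
b = Binomial(10, 0.3)             # 10 trials with success probability 0.3

# Fit a Normal to the sample by maximum likelihood; estimates should be close to (5, 2)
fitted = fit(Normal, x)
```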
7. PrettyTables
This package formats tables for display, using a text, HTML, or LaTeX backend. It is also customizable, with options such as alignment and printing only the rows that satisfy certain conditions.
Documentation: https://ronisbr.github.io/PrettyTables.jl/stable/
8. GLM
The GLM package fits linear regression models, from which you can extract results such as R² and coefficient estimates. It also fits other generalized linear models.
Documentation: https://juliastats.org/GLM.jl/stable/manual/
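Here is a short sketch of fitting a linear model, extracting R² and the coefficients, and then fitting a Poisson GLM. The small dataset and its column names are hypothetical.

```julia
using DataFrames, GLM

# Hypothetical data: y is roughly linear in x; counts grows roughly exponentially
df = DataFrame(x      = [1.0, 2.0, 3.0, 4.0, 5.0],
               y      = [3.1, 4.9, 7.2, 8.8, 11.0],
               counts = [1.0, 2.0, 4.0, 7.0, 12.0])

# Ordinary least squares: y = β0 + β1 * x
ols = lm(@formula(y ~ x), df)
coefs = coef(ols)          # [intercept, slope]
se    = stderror(ols)      # standard errors of the coefficients
rsq   = r2(ols)            # R²
tbl   = coeftable(ols)     # estimates, std. errors, test statistics, p-values, CIs

# A generalized linear model: Poisson regression with the (canonical) log link
pois = glm(@formula(counts ~ x), df, Poisson(), LogLink())
```

For this data the least-squares slope works out to 19.7 / 10 = 1.97 and the intercept to 7 − 1.97·3 = 1.09.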
9. ScikitLearn
Python's scikit-learn library has a Julia counterpart, ScikitLearn.jl, and it is just as useful there. It works much like the Python version and supports both models written in Julia and the standard scikit-learn models from Python. Types of models include supervised learning, unsupervised learning, and dataset transformations. The package also offers tools for cross-validation, hyperparameter tuning, etc.
Documentation: https://scikitlearnjl.readthedocs.io/en/latest/quickstart/
10. Flux
Flux is a Julia package for machine learning and deep learning. It is very flexible thanks to a key feature: it can take gradients of ordinary Julia code. Features include defining loss functions and running gradient descent, building models from layers, regularization, and training models. It is a fairly technical package, but it comes with a repository called the "model zoo" that does a nice job of showcasing its capabilities.
Documentation: https://fluxml.ai/Flux.jl/stable/
Model Zoo: https://github.com/FluxML/model-zoo
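Here is a minimal sketch of the two ideas above: taking the gradient of plain Julia code, and training a small model by gradient descent. It assumes Flux 0.14 or later, which has the explicit-gradient `Flux.setup`/`Flux.update!` API; the 2020-era training API looked different. The model size, learning rate and synthetic data are arbitrary illustrative choices.

```julia
using Flux

# Gradient of an ordinary Julia function: d/dx (3x^2 + 2x) = 6x + 2, which is 32 at x = 5
df_dx = gradient(x -> 3x^2 + 2x, 5.0)[1]

# Synthetic data for a tiny regression task: y = 2x + 1
x = rand(Float32, 1, 100)                      # 1 feature × 100 samples
y = 2f0 .* x .+ 1f0

model = Chain(Dense(1 => 16, relu), Dense(16 => 1))   # a model built from layers
loss(m, x, y) = Flux.mse(m(x), y)                     # mean squared error loss

opt_state = Flux.setup(Adam(0.01), model)      # optimizer state for gradient-based training
initial_loss = loss(model, x, y)

for epoch in 1:500
    grads = gradient(m -> loss(m, x, y), model)   # gradient w.r.t. the model's parameters
    Flux.update!(opt_state, model, grads[1])      # one Adam update step
end

final_loss = loss(model, x, y)
```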
Video "10 Julia Packages You Should Learn for Data Science (in 2020)" from the RichardOnData channel