
A Study Pathway for Data Science in 2020 (7 Steps)

In this video I lay out a seven-step pathway to becoming a successful and effective data scientist in the year 2020. In another video I described data science as the intersection of statistics, programming, communication, and domain knowledge. But that's just the high-level overview, and it leaves open two questions: what are the MOST important skills to know, and in what order of priority should you learn them?

I do believe there are some universal items that EVERY data scientist must know. However, once some core fundamentals are put into place, there is some flexibility based on the breadth in the field and the flavor of data scientist that one wants to become. The first three steps are universal; the next four are in a rough order of priority but can be rearranged.

1. Statistics
Statistics will truly give you the "what" of data science, while domain knowledge provides the "why" and programming languages provide the "how". It is necessary because without it, you won't have the tools you need to creatively handle complex problems or make proper conclusions. In short, you should know probability, distributions, Bayes' rule, confidence intervals, and hypothesis testing so you know the fundamentals. You need to be able to reason your way through problems, and that comes by knowing concepts like confounding variables, Simpson's paradox, assumptions of tests, bias, and variance. You also need to know statistical tests, models, and survival analysis.
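To make one of these reasoning concepts concrete, here is a minimal sketch of Simpson's paradox in plain Python. The counts are the frequently cited kidney-stone treatment figures, used here purely as an illustration and not taken from the video; the names `counts`, `rate`, and `overall` are my own.

```python
# (successes, patients) for each treatment, split by stone size.
# Illustrative textbook figures; stone size is the confounding variable.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    """Success rate within a single subgroup."""
    return successes / total


def overall(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return successes / total


for size in ("small", "large"):
    a = rate(*counts["A"][size])
    b = rate(*counts["B"][size])
    print(f"{size:>5} stones: A={a:.1%}  B={b:.1%}")

print(f"  pooled     : A={overall(counts['A']):.1%}  B={overall(counts['B']):.1%}")
```

Treatment A has the higher success rate within each stone size, yet B looks better once the groups are pooled, because A was given mostly to the harder large-stone cases. Spotting that kind of confounding is exactly the reasoning skill this step is about.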

Coursera courses:
Duke: https://www.coursera.org/specializations/statistics
Johns Hopkins: https://www.coursera.org/specializations/jhu-data-science
University of Amsterdam: https://www.coursera.org/specializations/social-science

2. SQL
SQL should take priority over R or Python because, first of all, it's easier. It also prepares you for the real world, where data is truly messy and lives in a variety of environments. Additionally, the work you do in R or Python tends to live downstream: you can only start after you've used SQL to create a clean working dataset. You don't need to be a SQL guru, but you need to know how to query your data, join tables, and use CASE WHEN and EXISTS expressions, window functions, nested queries, etc.
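As a rough sketch of what that level of SQL looks like, the snippet below runs a join, a CASE WHEN expression, and a window function against an in-memory SQLite database using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are hypothetical, made up for this example, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical toy schema, created in memory so nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 120.0), (3, 2, 75.0);
""")

query = """
SELECT c.name,
       o.amount,
       -- CASE WHEN: bucket each order by size
       CASE WHEN o.amount >= 100 THEN 'large' ELSE 'small' END AS order_size,
       -- window function: running total per customer, in order-id order
       SUM(o.amount) OVER (PARTITION BY c.id ORDER BY o.id) AS running_total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
ORDER BY c.name, o.id;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The result of a query like this is the kind of clean working dataset you would then hand off to R or Python.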

3. One of R or Python
Pick one of these two and master it. If you are coming from a statistics background that will probably be R; if you are coming from a computer science background that will probably be Python. Which one you pick matters less than mastering one of them rather than being mediocre at both. You want to know one of these from beginning to end: know the fundamentals of the language, how to tidy and manipulate your dataset, how to create visualizations, reports, models, etc. You also have key data science packages with which to familiarize yourself. If you're learning R, a good starting point is the Tidyverse. If you're learning Python, you want to know NumPy, pandas, Matplotlib, Seaborn, scikit-learn, and statsmodels.

At this point, if you know all three of the above, you will be very employable. But there is still much more to learn. The following order is my recommendation but you can rearrange.

Good book for R: https://amzn.to/3je8kK6
Python data science book: https://amzn.to/3cDXKcE

4. The other of R or Python
If you know BOTH R and Python, it won't matter that some companies stick to one infrastructure. This will make you massively employable.

5. Linear algebra
This helps you to innovatively create your own solutions. There is also massive crossover benefit to understanding statistics and machine learning. I recommend this book: https://amzn.to/2HEj4U4
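One small illustration of that crossover: fitting a straight line by ordinary least squares is a linear algebra problem, namely solving the normal equations (X^T X) b = X^T y. The sketch below does this by hand for a single predictor in plain Python. The function name `fit_line` and the toy data are my own, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = b0 + b1 * x by solving the normal equations (X^T X) b = X^T y.

    With a column of ones and a column of x values in X:
        X^T X = [[n, sum(x)], [sum(x), sum(x^2)]]
        X^T y = [sum(y), sum(x * y)]
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the 2x2 matrix X^T X; zero means all x values coincide.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")

    # Apply the closed-form inverse of a 2x2 matrix to X^T y.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Toy data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

In practice you would let a library do the matrix work, but knowing what it solves under the hood is what lets you adapt the method when a standard tool doesn't fit your problem.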

6. UX/design principles
This will improve your ability to communicate with your client and create solutions (reports, visualizations, apps, etc.) that are useful for your actual user. I highly recommend the books "The Visual Display of Quantitative Information" by Edward Tufte or "Show Me the Numbers" by Stephen Few to better understand graphical principles.

Tufte book: https://amzn.to/3kVrR2o
Few book: https://amzn.to/3n2qTTU

7. Machine learning
I am saving this for last because it requires knowledge of other topics on this list (statistics, R/Python, and linear algebra) and because its importance is overstated. Still, the big push many firms are making into this space is undeniable. I highly recommend Andrew Ng's Coursera course: https://www.coursera.org/learn/machine-learning

This is not an exhaustive list of all items that are important for data science. But if you know the first three -- or certainly all seven -- then I can virtually guarantee you will be an extremely marketable and successful data scientist.

Video "A Study Pathway for Data Science in 2020 (7 Steps)" from the RichardOnData channel
Video information
March 24, 2020, 1:46:41
00:16:29