Online Learning Curriculum for Data Scientists


“Is there any online reading or courses I can do to get into data analysis?”

At my workplace, I get asked the question above, usually by people with a finance background who are working as management consultants. In this post I propose a learning path for such people to “get into data analysis”. I will assume that the prospective student is someone with decent Excel skills, who is not afraid of a VLOOKUP or a touch of VB and can throw together decent plots / dashboards using the same Microsoft package, but who has little or no knowledge of programming / command line operations.

A data scientist can be defined by Drew Conway’s Data Science Venn diagram, which suggests that data scientists must have a solid mathematical background, skills in coding and computer hacking, and a healthy mix of subject matter expertise.

Data science venn diagram

The courses mentioned below are by no means an “over a weekend” type of engagement; if you are serious about entering the world of data science as a profession, allow yourself at least 3-6 months to complete and study the content of the courses below.

  1. Learn to program.

    R and Python are the two primary scripting languages that are taking over the world of data science. There is very little that cannot be done with knowledge of these two languages, and I would recommend getting to grips with both during your learning. R is a statistical programming language that has a huge number of packages available for every function you could think of. Python is a more general language that has data science capabilities built through the numpy and scipy libraries.

    Course: Description

    Try R: Start your journey into R and data visualisation with the “Try R” free online course from CodeSchool.com. Learn the basic syntax and get loading and plotting small data sets.
    Computing for Data Analysis: Augment your fundamental R knowledge with “Computing for Data Analysis” at Coursera.org.
    Python Track: Take a trip into Python and get to grips with the basic syntax with the Python track at Codecademy.com.
    Introduction to Computer Science: Expand this preliminary Python know-how with a fully blown project to create a working search engine at Udacity.com’s “Introduction to Computer Science”.

  2. Learn some maths.

    Data scientists are one part statistician. Gathering meaningful information from large data sets requires skills in summarising and correlating variables on a regular basis. A solid understanding of the maths behind statistical transformations and machine learning techniques helps ensure that results are valid and stand up to scrutiny. Note that a lot of the necessary statistics and maths knowledge can be picked up from the machine learning-focussed courses.

    Course: Description

    Introduction to Statistics: Start off with some preliminary statistics at Udacity.com’s “Introduction to Statistics”.
    Statistics One: Go a bit deeper with “Statistics One” from Princeton at Coursera.org.

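
As a rough illustration of what “summarising and correlating variables” means in practice, here is a minimal Python sketch using only the standard library. The data (hours studied and test scores) is made up for the example, and the function names are my own rather than anything from the courses above.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    covariance_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)


# Made-up illustrative data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

print("mean score:", mean(scores))
print("std of scores:", round(sample_std(scores), 2))
print("correlation:", round(pearson_r(hours, scores), 3))
```

In real work you would normally reach for numpy, scipy, or R for this, but writing the formulas out once by hand is a good way to check that you understand what those libraries compute.
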
  3. Learn machine learning and data visualisation.

    The core skill that separates data scientists from data analysts is the ability to move beyond reporting and apply more sophisticated analytical techniques to model variance, extract meaning, and predict variables of interest using your data.

    Course: Description

    Data Analysis: Start with the excellent “Data Analysis” course at Coursera.org, which will give you direct experience in loading, visualising, and modelling real data sets using R. This course is considerably more advanced than the previous “Computing for Data Analysis”; it covers some data analysis techniques and focuses on teaching students how to structure data analysis reports.
    Machine Learning: Make sure that you take the brilliant and MOOC-starting “Machine Learning” or “ml-class” course with Andrew Ng at Coursera.org. The programming exercises use Octave rather than Python, and the course covers linear algebra, regression, neural networks, support vector machines, and recommender systems among others. Andrew Ng provides an excellent background for the topics that are covered.
    Artificial Intelligence for Robotics: Sebastian Thrun’s “Artificial Intelligence for Robotics” class is a brilliant introduction to more applied machine learning techniques such as the Kalman Filter and Particle Filters. While perhaps slightly off-topic, the course has a range of interesting and worthwhile Python-based exercises that will only add to your learning journey.
    Algorithms / Neural Networks: More detailed specific-topic courses can be taken in Algorithms and Network Analysis at Udacity, or Neural Networks for Machine Learning, both of which I’ve personally found useful. The Neural Networks course dips into the realm of “Deep Learning”, a hot, but advanced, topic in machine learning at Google and Facebook at the moment.
    Introduction to Hadoop and MapReduce: At some point, you’re going to need to get your feet wet with some Big Data, Hadoop, and MapReduce knowledge. Get a basic introduction with “Introduction to Hadoop and MapReduce” at Udacity.com, in conjunction with Cloudera.
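
If MapReduce is new to you, the idea can be sketched in a few lines of plain Python before you go anywhere near a Hadoop cluster. The word count below is the classic teaching example; it runs on a single machine, the function names are my own, and it only imitates the map, shuffle, and reduce steps rather than using any real Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input: each string stands in for one file on the cluster.
documents = ["big data is big", "data science needs data"]
counts = word_count(documents)
print(counts)
```

On a real cluster the map and reduce steps run in parallel on different machines and the framework handles the shuffle, but the shape of the computation is the same.
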

When you have completed the majority of the courses listed above, you’ll be in a very strong position to put your knowledge to use. And practice is the key. Get on Kaggle, download a data set, and get involved!!

15 Comments

Nice article!

P.S.
“Machine Learning” class uses Octave not Python.

Thanks for providing us with the course curriculum.
This course is designed to introduce students to the data management, storage and manipulation tools common in data science and will apply those tools to real scenarios. An overview of different SQL and No-SQL database technologies is presented and the course finishes with a discussion of choosing the appropriate tool to get the job done.
Topics include:

Introduction to data (data types, data movement, terminology, etc.)
Storage and Concurrency Preliminaries
Files and File-based data systems
Relational Database Management Systems
Hadoop Introduction
NoSQL – MapReduce vs. Parallel RDBMS
Search and Text Analysis
Thanks! http://www.intellipaat.com/

Very useful material to learn the differences between data analysis and machine learning.

Hi Shane, I’m currently taking “Introduction to R” on http://www.datacamp.com. I’m wondering if you have an opinion on sites like http://www.datacamp.com and http://www.dataquest.io for learning data science in a hands-on manner?

Should we follow the order 1, 2, 3, or can we start learning them simultaneously?

I’m curious about and interested in some of these. I hope you will give more information on this topic, data analytics courses, in your next articles.

I have been searching to find a comfort or effective procedure to complete this process and I think this is the most suitable way to do it effectively.
Please check ExcelR Data Science Certification

Thanks for giving me the time to share such nice information. Thanks for sharing.

Data Science Course

Very nice blog!!! I have learned a lot of information from this site. Thanks for sharing this valuable information. You have posted a trustworthy blog, keep sharing: data science course in hyderabad with placements (https://www.excelr.com/data-science-course-training-hyderabad)

Amazing Article ! I would like to thank you for the effort you put into writing this awesome article. This article inspired me to read more. Keep it up.
Simple Linear Regression (https://www.excelr.com/blogs)
Correlation vs covariance (https://www.excelr.com/blogs)
data science interview questions (https://www.excelr.com/mock-interview/data-science-interview-questions)
KNN Algorithm (https://www.excelr.com/blog/data-science/machine-learning-supervised/understanding-the-concept-of-knn-algorithm-using-r)

After reading your article I was amazed. I know that you explain it very well. And I hope that other readers will also experience how I feel after reading your article. corporate training (https://360digitmg.com/india/certification-program-in-cyber-security-analytics)

Well-written and clear explanation. I really liked it. Thanks for sharing this amazing blog. Keep sharing!
AI Patasala-Data Science Course in Hyderabad