Data Scientist’s Toolbox

Gitanjali Mule
4 min read · Feb 26, 2022
Photo by Joshua Sortino on Unsplash

This is the first blog in my Data Science Specialization series. The introduction to the series is here: Data Science Specialization by Johns Hopkins University. Please read it before you start this one; I highly recommend it.

The course is divided into 4 weeks on the Coursera platform, and each week covers a set of topics.

Week 1:

The first week covers the background of data science: a brief history, what data science is, what data is, and what the data science process looks like. This week is very brief; I even watched some of the videos while cooking. Treat it as an FYI.

Week 2:

The second week is mostly about the R language and RStudio. It starts with how to install R and RStudio on both Mac and Windows, then gives you a tour of RStudio, which I found very informative. If you get stuck in RStudio, you can come back to this video to find what you need, with no need to Google or sit through a 30-minute YouTube video. The important parts of the Week 2 content are R packages (how to install them and use them in R programming) and R Projects. I always underestimated R Projects, but I recently realized how important it is to get used to starting your work in one. It is the easiest way to implement Git version control and keep track of your files.

Week 3:

Now let’s come to Week 3 of this course, which goes over the version control system Git. It covers an easy setup with GitHub and then how to stage, commit, and push files to GitHub. Trust me on this: make a habit of uploading your code to GitHub.

Week 4:

Once you are all set with your toolbox (R, RStudio, Git, GitHub), the course dives deep into R Markdown, types of data analysis, big data, and experimental design. Getting familiar with R Markdown will help you in the future, so just play around with it. Another key takeaway from this week’s content is the types of data analysis. There are six of them, which are as follows:

  • Descriptive: The goal is to describe or summarize a data set, using measures of center such as the mean, median, and mode, or measures of spread such as the standard deviation, variance, and range.
  • Exploratory: Here the goal is to examine or explore the data and find relationships that weren’t previously known. It explores how different measures might be related to each other but does not confirm that the relationship is causal.

Correlation does not imply causation

  • Inferential: The goal of inferential analysis is to use a relatively small sample of data to infer something about the population at large. An example could be a study that uses data collected from a sample of the US population to infer how air pollution might be impacting life expectancy in the entire US.
  • Predictive: The name says it all. The goal is to use current data to make predictions about future data. Essentially, you are using current and historical data to find patterns and predict the likelihood of future outcomes.
  • Causal: The analyses above can only show correlations; they can’t get at the cause of the relationships we observe. Causal analysis fills that gap. Its goal is to see what happens to one variable when we manipulate another, looking at the cause and effect of a relationship.
  • Mechanistic: Mechanistic analyses are not nearly as commonly used as the previous analyses — the goal of mechanistic analysis is to understand the exact changes in variables that lead to exact changes in other variables.
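To make the first two types a bit more concrete, here is a small sketch of my own (it is not from the course, which uses R). It uses only Python's standard library, and the numbers are made up for illustration. The `describe` and `pearson` helpers are hypothetical names I chose: the first computes the descriptive measures listed above, and the second computes a Pearson correlation, the kind of relationship an exploratory analysis looks for.

```python
import statistics

# Hypothetical data: minutes studied per day over ten days (made-up numbers).
minutes = [40, 35, 45, 40, 50, 30, 40, 45, 35, 40]


def describe(values):
    """Descriptive analysis: summarize a data set with a few numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
    }


def pearson(xs, ys):
    """Exploratory analysis: Pearson correlation between two measures."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


summary = describe(minutes)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# Hypothetical second measure: quiz scores for the same ten days.
scores = [80, 70, 85, 78, 90, 65, 82, 88, 72, 79]
print(f"correlation: {pearson(minutes, scores):.2f}")
```

A correlation close to 1 or -1 only says the two measures move together. As the quote above says, it does not tell you that one causes the other.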

Once you are done with the types of analysis, you get a brief introduction to big data: how the volume, variety, and velocity of data have made the field complex, and how that created the opportunity for data science.

Quizzes and Assignments:

The Data Scientist’s Toolbox course has about 4 quizzes and 1 assignment, which is very simple. Each quiz has about 3 to 4 questions on average. The assignment checks whether you are all set with RStudio and GitHub.

The course basically prepares your mindset for data science and sets you up with all the tools you will need on your device to start this journey.

I especially want to mention that one of the videos touches on a case study by Dr. Hilary Parker, and it really gives you an idea of what the process looks like. I found it very inspiring and motivating, as it builds your curiosity and makes you want to know how it is done. I looked into her work and have made up my mind: if I could be like her, I would call myself successful in this field. I am highly inspired by her work. Dr. Peng and Dr. Parker have a podcast called Not So Standard Deviations, available on most major podcast platforms. I have been listening to it, and after just 2 episodes I already have a lot of takeaways. I will be writing a blog on takeaways from the podcast.

Though this course is divided into 4 weeks, I was able to complete it within 10 days by studying for an average of 40 minutes per day.

The next course is on R programming, which I have already started. I will write about it in my next blog; the link will be here within a couple of weeks.

Thank you for reading my blog. Welcome to the data science journey with me.

Keep Learning.

Connect with me on 💻 Twitter and LinkedIn. Shoot me a message to let me know you found me through my blog. I would love to connect with you.

The next blog in this series is here. Click the link.

R Programming: Learn R programming as Data Scientist | by Gitanjali Mule | Mar, 2022 | Medium

