Learn By Example: Statistics and Data Science in R

  • Lifetime Access
  • Certificate on Completion
  • Access on Android and iOS App
About this Course

Taught by a Stanford-educated ex-Googler and an IIT- and IIM-educated ex-Flipkart lead analyst. This team has decades of practical experience in quant trading, analytics and e-commerce.

This course is a gentle yet thorough introduction to Data Science, Statistics and R using real life examples. 

Let’s parse that.

  • Gentle, yet thorough: This course does not require a prior quantitative or mathematics background. It starts by introducing basic concepts such as the mean and median, and eventually covers the main steps of an analytics or data science workflow, from analysing and preparing raw data to visualising your findings.
  • Data Science, Statistics and R: This course is an introduction to Data Science and Statistics using the R programming language. It covers both the theoretical aspects of Statistical concepts and the practical implementation using R. 
  • Real life examples: Every concept is explained with the help of examples, case studies and source code in R wherever necessary. The examples cover a wide array of topics and range from A/B testing in an Internet company context to the Capital Asset Pricing Model in a quant finance context. 

What's Covered:

  • Data Analysis with R: Datatypes and Data structures in R, Vectors, Arrays, Matrices, Lists, Data Frames, Reading data from files, Aggregating, Sorting & Merging Data Frames
  • Linear Regression: Regression, Simple Linear Regression in Excel, Simple Linear Regression in R, Multiple Linear Regression in R, Categorical variables in regression, Robust regression, Parsing regression diagnostic plots
  • Data Visualization in R: Line plot, Scatter plot, Bar plot, Histogram, Scatterplot matrix, Heat map, Packages for Data Visualisation: RColorBrewer, ggplot2
  • Descriptive Statistics: Mean, Median, Mode, IQR, Standard Deviation, Frequency Distributions, Histograms, Boxplots
  • Inferential Statistics: Random Variables, Probability Distributions, Uniform Distribution, Normal Distribution, Sampling, Sampling Distribution, Hypothesis testing, Test statistic, Test of significance

Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!

Who is the target audience?

  • Yep! MBA graduates or business professionals who are looking to move to a heavily quantitative role
  • Yep! Engineers who want to understand basic statistics and lay a foundation for a career in Data Science
  • Yep! Analytics professionals who have mostly worked in Descriptive analytics and want to make the shift to being modelers or data scientists
  • Yep! Folks who've worked mostly with tools like Excel and want to learn how to use R for statistical analysis
Basic knowledge
  • No prerequisites: We start from the basics and cover everything you need to know. We will install R and RStudio as part of the course and use them for most of the examples. Excel is used for one of the examples, and basic knowledge of Excel is assumed for that example.
What you will learn
  • Harness R and R packages to read, process and visualize data
  • Understand linear regression and use it confidently to build models
  • Understand the intricacies of all the different data structures in R
  • Use Linear regression in R to overcome the difficulties of LINEST() in Excel
  • Draw inferences from data and support them using tests of significance
  • Use descriptive statistics to perform a quick study of some data and present results
Curriculum
Number of lectures: 82
Total duration: 09:07:14
Introduction
  • You, This course and Us  

    This course is a gentle yet thorough introduction to Data Science, Statistics and R using real life examples.

  • Top Down vs Bottom Up : The Google vs McKinsey way of looking at data  

    Q. How do companies make decisions? 

    A. Using data

    We talk about what it takes to go from raw data to a decision based on that data. This sets the agenda for the rest of the course: each step on this journey is covered in the upcoming sections.

  • R and RStudio installed  

    Get set up with R and RStudio. All the examples that follow in this course have source code attached. Download them and run them in RStudio.


The 10 second answer : Descriptive Statistics
  • Descriptive Statistics : Mean, Median, Mode  

    Bosses are impatient. They often want you to cut to the chase and give them an answer that's good enough, quickly. Descriptive statistics are the first place to start - they are often the 10-second answer to any question about the data. 

  • Our first foray into R : Frequency Distributions  

    Computing a frequency distribution using R
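
    As a minimal sketch of the idea, base R's table() counts how often each value occurs, and cut() bins a numeric variable first. The ages vector and the bin edges are made-up values for illustration, not the course's data.

```r
# Hypothetical data: ages of 10 survey respondents (illustrative only)
ages <- c(23, 35, 31, 23, 47, 52, 35, 23, 41, 35)

# Frequency of each distinct value
value_counts <- table(ages)

# Frequency distribution over bins [20,30), [30,40), [40,50), [50,60)
age_bins <- cut(ages, breaks = c(20, 30, 40, 50, 60), right = FALSE)
bin_counts <- table(age_bins)

print(value_counts)
print(bin_counts)
```

    Passing the same vector to hist() would draw these bin counts as a histogram.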


  • Draw your first plot : A Histogram  

    A histogram is a good visual summary of your data. 


  • Computing Mean, Median, Mode in R  

    Computing Mean, Median, Mode in R
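
    A short sketch of the three measures, using a made-up vector of scores. Note that base R's mode() reports a storage type rather than the statistical mode, so the most frequent value is derived here from table(); that workaround is one common approach, not necessarily the one used in the lecture.

```r
# Hypothetical test scores (illustrative only)
scores <- c(70, 85, 85, 90, 60, 85, 70)

mean_score <- mean(scores)       # arithmetic average
median_score <- median(scores)   # middle value of the sorted data

# Statistical mode: the value with the highest frequency
freq <- table(scores)
mode_score <- as.numeric(names(freq)[which.max(freq)])

cat("mean:", mean_score, "median:", median_score, "mode:", mode_score, "\n")
```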

  • What is IQR (Inter-quartile Range)?  

    The mean, median and mode are point estimates to represent your data. IQR is a measure that explains the spread of the data.



  • Box and Whisker Plots  

    Visualize the IQR and outliers using box and whisker plots
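
    The numbers behind a box and whisker plot can be inspected without drawing anything. The sketch below uses base R's boxplot.stats() on made-up data; calling boxplot() on the same vector would draw the plot itself.

```r
# Hypothetical delivery times in minutes, with one unusually slow delivery
delivery_times <- c(12, 15, 14, 10, 18, 20, 16, 14, 13, 45)

bs <- boxplot.stats(delivery_times)

# bs$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# bs$out: points beyond 1.5 * IQR from the hinges, drawn as outliers
print(bs$out)
```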

  • The Standard Deviation  

    The standard deviation measures the spread of a dataset around its mean. It also turns out to be a surprisingly deep idea that shows up again and again in statistics.

  • Computing IQR and Standard Deviation in R  

    Computing IQR and Standard Deviation in R
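
    Both measures of spread are one function call away in base R. The data below is made up; note that sd() uses the sample formula with n - 1 in the denominator.

```r
# Hypothetical data (illustrative only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

quartiles <- quantile(x, probs = c(0.25, 0.75))  # Q1 and Q3
iqr_x <- IQR(x)                                  # Q3 - Q1
sd_x <- sd(x)                                    # sample standard deviation

# The same standard deviation computed by hand
sd_manual <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

print(quartiles)
cat("IQR:", iqr_x, "SD:", sd_x, "\n")
```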

Inferential Statistics
  • Drawing inferences from data  

    Drawing inferences from data is key to being able to take decisions using data. There is a science to this, whose foundation is in random variables, probability distributions, and performing tests of statistical significance. 

  • Random Variables are ubiquitous  

    Random variables are everywhere. Any data that you'll study is a random variable whose behaviour is determined by a probability distribution.

  • The Normal Probability Distribution  

    The Normal Distribution is arguably the most well-known and commonly seen probability distribution. Its probability density function is fully determined by two parameters: the mean and the standard deviation.
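
    R exposes the normal distribution through the d/p/q/r family of functions. A minimal sketch, assuming a mean of 100 and a standard deviation of 15 (arbitrary illustrative parameters):

```r
mu <- 100     # assumed mean
sigma <- 15   # assumed standard deviation

# Density at the mean (the peak of the bell curve)
peak_density <- dnorm(mu, mean = mu, sd = sigma)

# Probability of falling within one standard deviation of the mean (~68%)
p_within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)

# Value below which 95% of the distribution lies
q95 <- qnorm(0.95, mean = mu, sd = sigma)

# Five random draws from the distribution
set.seed(1)
draws <- rnorm(5, mean = mu, sd = sigma)

cat("P(within 1 sd):", round(p_within_1sd, 4), " 95th percentile:", round(q95, 2), "\n")
```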


  • Sampling is like fishing  

    Sampling is a little like fishing. Sampling is crucial to induction - drawing conclusions about something by looking at some evidence.

  • Sample Statistics and Sampling Distributions  

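    A sample is described by sample statistics like the sample mean. The sampling distribution of a statistic is its probability distribution across repeated samples; for the sample mean, it is the distribution of the means of all possible samples of a given size. 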
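
    A sampling distribution is easy to approximate by simulation. The sketch below draws many samples from an assumed normal population (mean 50, standard deviation 10, both arbitrary) and looks at how the sample means are distributed.

```r
set.seed(42)
pop_mean <- 50   # assumed population mean
pop_sd <- 10     # assumed population standard deviation
n <- 25          # size of each sample

# Draw 5000 samples and record the mean of each one
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# The sample means centre on the population mean...
centre <- mean(sample_means)

# ...and their spread is close to the standard error, pop_sd / sqrt(n)
spread <- sd(sample_means)
standard_error <- pop_sd / sqrt(n)

cat("centre:", centre, "spread:", spread, "theory:", standard_error, "\n")
```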

Case studies in Inferential Statistics
  • Case Study 1 : Football Players (Estimating Population Mean from a Sample)  

    Find a point estimate for the average weight of all football players using a sample of players from one college team.

  • Case Study 2 : Election Polling (Estimating Population Proportion from a Sample)  

    Find a point estimate for the % of voters in favor of a candidate.
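
    The two point estimates in Case Studies 1 and 2 can be sketched as follows. All numbers are invented for illustration. The confidence intervals go one step beyond a point estimate and are included here only as an assumption about what a natural follow-up would look like.

```r
# Case Study 1 (hypothetical weights in pounds of 10 players)
weights <- c(210, 195, 240, 225, 205, 230, 215, 250, 200, 220)
n <- length(weights)
mean_hat <- mean(weights)                 # point estimate of the population mean
se_mean <- sd(weights) / sqrt(n)          # standard error of the mean
ci_mean <- mean_hat + c(-1, 1) * qt(0.975, df = n - 1) * se_mean

# Case Study 2 (hypothetical poll: 540 of 1000 voters in favor)
in_favor <- 540
polled <- 1000
p_hat <- in_favor / polled                # point estimate of the proportion
se_p <- sqrt(p_hat * (1 - p_hat) / polled)
ci_p <- p_hat + c(-1, 1) * qnorm(0.975) * se_p

cat("mean estimate:", mean_hat, " 95% CI:", round(ci_mean, 1), "\n")
cat("proportion estimate:", p_hat, " 95% CI:", round(ci_p, 3), "\n")
```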

  • Case Study 3 : A Medical Study (Hypothesis Test for the Population Mean)  

    A test of significance is an important step in building support for your findings and inferences. Here is the first example of a test of significance - is the population mean equal to a given value? 
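
    A one-sample t-test is one standard way to test whether a population mean equals a given value. The sketch below uses made-up blood pressure readings and a hypothesised mean of 120; whether the lecture uses t.test() or a different test is not stated here.

```r
# Hypothetical systolic blood pressure readings for 10 patients
bp <- c(128, 135, 121, 142, 130, 138, 125, 133, 129, 140)

# H0: the population mean is 120; H1: it is not
result <- t.test(bp, mu = 120)

# The same test statistic computed by hand
t_manual <- (mean(bp) - 120) / (sd(bp) / sqrt(length(bp)))

cat("t =", unname(result$statistic), " p-value =", result$p.value, "\n")
```

    A small p-value means data like this would be unlikely if the population mean really were 120.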

  • Case Study 4 : Employee Behavior (Hypothesis Test for the Population Proportion)  

    Perform a test of significance to check whether the population % is equal to a certain value

  • Case Study 5: A/B Testing (Comparing the means of two populations)  

    Perform a test of significance to compare 2 population means. The example used is A/B Testing - which is pretty widely used in internet companies to test out product features.
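
    An A/B test on a numeric metric can be sketched with a two-sample t-test. The data is simulated (session times for two hypothetical page variants), and t.test() defaults to Welch's version, which does not assume equal variances.

```r
set.seed(123)
# Simulated session times in seconds for two hypothetical variants
time_a <- rnorm(200, mean = 30, sd = 8)
time_b <- rnorm(200, mean = 32, sd = 8)

# H0: both variants have the same mean session time
ab_test <- t.test(time_a, time_b)

# Welch test statistic computed by hand
t_manual <- (mean(time_a) - mean(time_b)) /
  sqrt(var(time_a) / length(time_a) + var(time_b) / length(time_b))

cat("t =", unname(ab_test$statistic), " p-value =", ab_test$p.value, "\n")
```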

  • Case Study 6: Customer Analysis (Comparing the proportions of 2 populations)  

    Perform a test of significance to compare two population proportions
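
    For two proportions, base R's prop.test() is one option. The counts below are invented: 45 of 500 customers converted in one group and 30 of 500 in another.

```r
# Hypothetical conversion counts for two customer groups
converted <- c(45, 30)
group_size <- c(500, 500)

# H0: both groups have the same underlying conversion rate
res <- prop.test(x = converted, n = group_size)

print(res$estimate)   # the two sample proportions
cat("p-value =", res$p.value, "\n")
```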

Diving into R
  • Harnessing the power of R  

    The next few sections dive deep into all the data processing, slicing and dicing ability that R provides. The wide variety of R packages available is one reason why R is popular among many data scientists. 

  • Assigning Variables  

    Let's start with the basics. What are variables and how do we assign variables in R? 

  • Printing an output  

    print(), show(), message(), cat() are different ways to print something to screen. 
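
    A quick sketch of assignment and of three of the printing functions named above. The variable names and values are arbitrary.

```r
x <- 42          # the usual assignment operator
y = "hello"      # = also assigns at the top level
3.14 -> z        # rightward assignment, rarely used

print(x)                  # prints with an index prefix: [1] 42
cat("x is", x, "\n")      # plain text, no prefix, newline must be added
message("y is ", y)       # writes to standard error, handy for status notes
```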

  • Numbers are of type numeric  

    Numbers are of type numeric 


  • Characters and Dates  

    R has built-in datatypes for dates and timestamps. 


  • Logicals  

    Logical is a datatype that is the result of conditional tests in R
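
    The basic datatypes from this section can be checked with class(). The values are arbitrary examples.

```r
n <- 3.5
i <- 7L                            # the L suffix makes an integer
s <- "R"
d <- as.Date("2016-03-01")         # a calendar date
ts <- as.POSIXct("2016-03-01 09:30:00", tz = "UTC")  # a timestamp

# Dates support arithmetic: days between two dates
days_between <- as.numeric(d - as.Date("2016-01-15"))

# A conditional test produces a logical
is_big <- n > 3

cat(class(n), class(i), class(s), class(d), class(is_big), "\n")
```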

Vectors
  • Data Structures are the building blocks of R  

    The wide variety of built-in data structures is what makes R different from other standard programming languages. These include vectors, arrays, matrices, data frames and lists. 

  • Creating a Vector  


    Creating a Vector

  • The Mode of a Vector  

    The mode of a vector is the datatype of all its elements. 

  • Vectors are Atomic  

    Vectors are Atomic
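
    "Atomic" means every element of a vector has the same type. A small sketch of what happens when types are mixed: R silently coerces everything to the most general type.

```r
# Mixing a number, a string and a logical gives a character vector
mixed <- c(1, "a", TRUE)
print(mode(mixed))    # "character"

# Mixing numbers and logicals gives a numeric vector (TRUE -> 1, FALSE -> 0)
nums <- c(1.5, TRUE, FALSE)
print(mode(nums))     # "numeric"
```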

  • Doing something with each element of a Vector  

    Doing something with each element of a Vector

  • Aggregating Vectors  

    Finding the sum, product, or mean of a vector

  • Operations between vectors of the same length  


    Operations between vectors of the same length

  • Operations between vectors of different length  

    Operations between vectors of different length
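
    Arithmetic on vectors works element by element. When the lengths differ, R recycles the shorter vector. A minimal sketch with arbitrary values:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

same_len_sum <- a + b      # 11 22 33 44
same_len_prod <- a * b     # 10 40 90 160

# The shorter vector is reused from the start: 10 20 10 20
short <- c(10, 20)
recycled_sum <- a + short  # 11 22 13 24

# A single number is just a vector of length 1
doubled <- a * 2           # 2 4 6 8

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning.
```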

  • Generating Sequences  

    Generate sequences using the : operator, rep() and seq()
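
    The three sequence generators side by side, with arbitrary arguments:

```r
one_to_five <- 1:5                          # 1 2 3 4 5
evens <- seq(2, 10, by = 2)                 # 2 4 6 8 10
quarters <- seq(0, 1, length.out = 5)       # 0 0.25 0.5 0.75 1
abab <- rep(c("a", "b"), times = 2)         # "a" "b" "a" "b"
aabb <- rep(c("a", "b"), each = 2)          # "a" "a" "b" "b"
```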

  • Using conditions with Vectors  

    Using conditions with Vectors

  • Find the lengths of multiple strings using Vectors  

    Find the lengths of multiple strings using Vectors

  • Generate a complex sequence (using recycling)  


    Generate a complex sequence (using recycling)

  • Vector Indexing (using numbers)  

    Access elements based on their position in the vector.

  • Vector Indexing (using conditions)  

    Access elements based on whether they pass a conditional test. 

  • Vector Indexing (using names)  

    Assign names to the elements of a vector
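
    The three styles of vector indexing covered above, sketched on a made-up named vector of prices:

```r
prices <- c(apple = 1.2, banana = 0.5, cherry = 3.0, date = 2.4)

# By position (R counts from 1); negative positions exclude elements
second <- prices[2]
all_but_first <- prices[-1]

# By condition: keep elements that pass a logical test
expensive <- prices[prices > 1]

# By name
cherry_price <- prices["cherry"]

print(expensive)
```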

Arrays
  • Creating an Array  

    Creating an array can be done by using a vector and then arranging it along dimensions.

  • Indexing an Array  


    Indexing an Array
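
    A minimal sketch of creating and indexing an array. The dimensions are arbitrary. R fills arrays in column-major order, so the first index varies fastest.

```r
# Arrange the vector 1:24 along three dimensions: 2 rows, 3 columns, 4 layers
a <- array(1:24, dim = c(2, 3, 4))

# A single element: row 1, column 2, layer 3
elem <- a[1, 2, 3]

# Leaving an index blank selects everything along that dimension
first_layer <- a[, , 1]    # a 2 x 3 matrix holding 1..6

print(dim(a))
print(first_layer)
```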

  • Operations between 2 Arrays  

    Operations between 2 Arrays

  • Operations between an Array and a Vector  

    Operations between an Array and a Vector


  • Outer Products  

    Outer products are complex operations that operate on every pair of elements from two arrays.
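
    For example, outer() applies a function to every pair of elements. By default the function is multiplication, so it builds a times table; the inputs here are arbitrary.

```r
# Every product of an element of 1:3 with an element of 1:4 (3 x 4 matrix)
times_table <- outer(1:3, 1:4)

# Any vectorised two-argument function works, e.g. addition
pair_sums <- outer(c(10, 20), c(1, 2, 3), FUN = "+")

print(times_table)
print(pair_sums)
```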

Matrices
  • A Matrix is a 2-Dimensional Array  

    A Matrix is a 2 Dimensional array. But it has special meaning and can be interpreted in a bunch of different ways.

  • Creating a Matrix  

    Creating a Matrix

  • Matrix Multiplication  


    Matrix Multiplication

  • Merging Matrices  

    rbind() and cbind() to merge matrices.


  • Solving a set of linear equations  


    Solving a set of linear equations
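
    A small sketch tying the matrix lectures together: create a matrix, multiply with %*%, and solve a system with solve(). The system 2x + y = 5, x + 3y = 10 is an invented example.

```r
# Coefficient matrix, filled row by row
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(5, 10)

# Solve A %*% v = b for v
v <- solve(A, b)

# Check the answer with matrix multiplication
check <- as.vector(A %*% v)

print(v)
```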

Factors
  • What is a factor?  

    A factor is a special type of vector used to represent categorical variables

  • Find the distinct values in a dataset (using factors)  


    Find the distinct values in a dataset (using factors)

  • Replace the levels of a factor  


    Replace the levels of a factor


  • Aggregate factors with table()  

    Aggregate factors with table()

  • Aggregate factors with tapply()  


    Aggregate factors with tapply()
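
    The factor operations in this section, sketched on invented T-shirt data. Levels are stored in alphabetical order by default, which matters when replacing them.

```r
# Hypothetical T-shirt sizes and the price paid for each shirt
sizes <- factor(c("S", "M", "L", "M", "S", "M"))
prices <- c(10, 15, 20, 14, 11, 16)

distinct_sizes <- levels(sizes)     # "L" "M" "S"
size_counts <- table(sizes)         # how many of each size

# Replace the levels, in the same (alphabetical) order as above
levels(sizes) <- c("Large", "Medium", "Small")

# Average price per size
avg_price <- tapply(prices, sizes, mean)

print(size_counts)
print(avg_price)
```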

Lists and Data Frames
  • Introducing Lists  

    Lists are fundamentally different from vectors, arrays and matrices, which are all homogeneous data structures: a list can hold elements of different types.

  • Introducing Data Frames  

    Data Frames are how R stores data read from files and databases.

  • Reading Data from files  


    Reading Data from files
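
    read.csv() returns a data frame. To keep this sketch self-contained, the CSV content is passed inline through the text argument; with a real file you would pass its path instead (for example a hypothetical "employees.csv").

```r
# Inline CSV standing in for a file on disk (made-up data)
csv_text <- "name,dept,salary
Asha,Sales,50000
Ben,Engineering,72000
Chen,Sales,55000"

employees <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(employees)    # column names and types
head(employees)   # first rows
```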

  • Indexing a Data Frame  

    Indexing a Data Frame

  • Aggregating and Sorting a Data Frame  

    Using the aggregate() and order() functions

  • Merging Data Frames  

    Merge data frames based on one or more common columns
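
    Indexing, aggregating, sorting and merging in one sketch. Both data frames are invented for illustration.

```r
employees <- data.frame(
  name = c("Asha", "Ben", "Chen", "Dev"),
  dept = c("Sales", "Engineering", "Sales", "Engineering"),
  salary = c(50000, 72000, 55000, 68000),
  stringsAsFactors = FALSE
)
depts <- data.frame(dept = c("Sales", "Engineering"), floor = c(2, 5),
                    stringsAsFactors = FALSE)

# Indexing: rows by condition, columns by name
well_paid <- employees[employees$salary > 52000, c("name", "salary")]

# Aggregating: mean salary per department
avg_by_dept <- aggregate(salary ~ dept, data = employees, FUN = mean)

# Sorting: highest salary first
sorted <- employees[order(-employees$salary), ]

# Merging on the common column
merged <- merge(employees, depts, by = "dept")

print(avg_by_dept)
print(merged)
```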

Regression quantifies relationships between variables
  • Introducing Regression  

    Regression is the process of finding a model that describes the relationship between variables. 

  • What is Linear Regression?  

    Linear regression is the process of fitting a line or a linear model that best explains the relationship between 2 variables. Understand what residuals are, how the ordinary least squares method works, and what R-squared measures.
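
    To make these terms concrete, here is a sketch that fits a line by ordinary least squares using the textbook formulas, then computes the residuals and R-squared by hand. The five data points are invented.

```r
# Hypothetical data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Ordinary least squares: choose the line minimising the sum of squared residuals
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Residuals: actual value minus the value predicted by the line
y_hat <- intercept + slope * x
res <- y - y_hat

# R-squared: share of the variance in y explained by the line
r_squared <- 1 - sum(res^2) / sum((y - mean(y))^2)

cat("slope:", slope, "intercept:", intercept, "R-squared:", r_squared, "\n")
```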

  • A Regression Case Study : The Capital Asset Pricing Model (CAPM)  

    The Capital Asset Pricing Model describes a relationship between risk and return. Use it with regression to either find the risk or returns of a given stock. Regression is one of the ways to estimate the Beta in CAPM.
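
    In CAPM, a stock's expected return equals the risk-free rate plus Beta times the market's expected excess return. A sketch on simulated daily returns: the risk-free rate, the "true" Beta of 1.3 and the assumed expected market return are all invented for illustration.

```r
set.seed(7)
rf <- 0.0001                                   # assumed daily risk-free rate
market <- rnorm(250, mean = 0.0005, sd = 0.01) # simulated market returns
stock <- rf + 1.3 * (market - rf) + rnorm(250, sd = 0.008)  # simulated stock returns

# Beta: covariance of excess returns divided by variance of market excess returns
# (this is the same number as the slope of a simple linear regression)
beta <- cov(stock - rf, market - rf) / var(market - rf)

# CAPM expected return for the stock, given an assumed expected market return
expected_market <- 0.0005
expected_stock <- rf + beta * (expected_market - rf)

cat("estimated Beta:", beta, " expected daily return:", expected_stock, "\n")
```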

Linear Regression in Excel
  • Linear Regression in Excel : Preparing the data  

    Find the Beta of Google by regressing Google returns against NASDAQ returns. We describe how to find, and prepare the data for fitting a linear model. 

  • Linear Regression in Excel : Using LINEST()  

    LINEST() is a function in Excel that fits a linear model for a given set of variables. However, LINEST() has a bunch of issues, including its inability to deal with missing values.

Linear Regression in R
  • Linear Regression in R : Preparing the data  

    Find the Beta of Google by regressing Google returns against NASDAQ returns. We describe how to process data frames and prepare the data for fitting a linear model. 

  • Linear Regression in R : lm() and summary()  

    lm() is used to build linear models in R. The results of lm() can be parsed using summary(). Building the linear model in R has a bunch of advantages over doing the same in Excel.
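
    A sketch of the lm() and summary() workflow on simulated returns. These are not real Google or NASDAQ figures, and the slope of 1.2 baked into the simulation is arbitrary. Two values are set to NA to show that lm() drops incomplete rows by default.

```r
set.seed(1)
nasdaq_ret <- rnorm(250, mean = 0.0005, sd = 0.01)               # simulated
goog_ret <- 0.0002 + 1.2 * nasdaq_ret + rnorm(250, sd = 0.005)   # simulated
goog_ret[c(10, 50)] <- NA                                        # missing values

returns <- data.frame(goog = goog_ret, nasdaq = nasdaq_ret)

# Fit goog = intercept + beta * nasdaq; rows with NA are omitted by default
fit <- lm(goog ~ nasdaq, data = returns)
fit_summary <- summary(fit)

beta <- coef(fit)[["nasdaq"]]      # the slope, i.e. the estimated Beta
r_sq <- fit_summary$r.squared      # share of variance explained
rows_used <- nobs(fit)             # 248 of the 250 rows

print(fit_summary$coefficients)    # estimates, standard errors, t and p values
```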

  • Multiple Linear Regression  

    Build a linear model with multiple independent variables : Regress the returns of an oil stock against S&P 500 and the returns of an exchange traded oil fund. 

  • Adding Categorical Variables to a linear model  

    We describe how categorical variables can be built into a linear model, and how to do this in R specifically
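
    In R, a categorical variable stored as a factor can go straight into the model formula, and lm() expands it into 0/1 dummy columns. A sketch on invented sales data:

```r
# Hypothetical data: revenue explained by ad spend and sales region
sales <- data.frame(
  ad_spend = c(10, 12, 15, 11, 14, 16, 9, 13),
  region = factor(c("North", "South", "North", "South",
                    "North", "South", "North", "South")),
  revenue = c(105, 131, 148, 122, 139, 171, 96, 140)
)

fit <- lm(revenue ~ ad_spend + region, data = sales)

# The design matrix shows the dummy column R created for the factor.
# "North" is the baseline level, so only "regionSouth" appears.
design_cols <- colnames(model.matrix(fit))

print(design_cols)
print(coef(fit))
```

    The regionSouth coefficient is the estimated difference in revenue between South and North at the same ad spend.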

  • Robust Regression in R : rlm()  

    rlm() helps you build Robust linear models that downweight the influence of outliers.


  • Parsing Regression Diagnostic Plots  

    Calling plot() on a model fitted with lm() produces a bunch of diagnostic plots that are used to validate the assumptions underlying linear regression - Q-Q, Scale-Location and Cook's distance plots
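
    The quantities behind these plots can also be pulled out as numbers. The sketch below uses simulated data with one deliberately planted outlier and checks that Cook's distance flags it.

```r
set.seed(3)
x <- 1:20
y <- 2 * x + rnorm(20)
y[20] <- y[20] + 30          # plant an outlier at the last point

fit <- lm(y ~ x)

# Standardised residuals (used in the Q-Q and Scale-Location plots)
std_res <- rstandard(fit)

# Cook's distance: how much each point influences the fitted line
cooks <- cooks.distance(fit)
most_influential <- unname(which.max(cooks))

cat("most influential observation:", most_influential, "\n")

# plot(fit) would draw the diagnostic plots themselves
```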

Data Visualization in R
  • Data Visualization  

    Data Visualization gives you the power to effectively get your point across and to deeply understand your data.


  • The plot() function in R  

    The plot() function in R

  • Control color palettes with RColorBrewer  

    Control color palettes with RColorBrewer

  • Drawing barplots  

    Drawing barplots

  • Drawing a heatmap  


    Drawing a heatmap

  • Drawing a Scatterplot Matrix  


    Drawing a Scatterplot Matrix

  • Plot a line chart with ggplot2  

    ggplot2 is a pretty cool R package for complex 2D graphics. Plot the time series of 4 different stocks in the same graph. 
