Private: AI & ML and Data Science Foundation Nascomm FSP

Central tendency and variability are fundamental concepts in statistics that provide insights into the distribution of a dataset.

 

Central Tendency:

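This refers to measures that identify the centre or “middle” of a distribution. The primary measures of central tendency are the mean, median, and mode. Skewness and kurtosis, listed after them, describe the shape of the distribution rather than its centre: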

 

  • Mean (or Average): It is the sum of all the values in a dataset divided by the number of values. It gives the arithmetic centre of the distribution.
  • Median: It is the middle value in a dataset when the values are arranged in ascending or descending order. If the dataset has an odd number of observations, the median is the middle number. If there’s an even number of observations, the median is the average of the two middle numbers.
  • Mode: It is the value that appears most frequently in a dataset. A distribution can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).
  • Skewness: Skewness measures the asymmetry of a distribution about its mean.
    • Positive Skewness: When the tail on the right side (i.e., larger values) of the distribution is longer than on the left side, the distribution is positively skewed. In such cases, the mean is typically greater than the median.
    • Negative Skewness: When the tail on the left side (i.e., smaller values) is longer than on the right side, the distribution is negatively skewed. Here, the mean is typically less than the median.
    • Zero Skewness: When the values are symmetrically distributed around the mean, skewness is zero, and the mean and median are equal.
  • Kurtosis: Kurtosis measures the “tailedness” of a distribution, i.e., how much weight lies in its tails compared to a normal distribution. The values below refer to excess kurtosis (kurtosis minus 3), for which a normal distribution scores 0.
    • Leptokurtic (Kurtosis > 0): A distribution with positive excess kurtosis has heavier tails than a normal distribution, and often a sharper peak. Such distributions are termed “leptokurtic.” They tend to have more extreme values (outliers).
    • Platykurtic (Kurtosis < 0): A distribution with negative excess kurtosis has lighter tails than a normal distribution, and often a flatter peak. Such distributions are termed “platykurtic.” They tend to have fewer extreme values.
    • Mesokurtic (Kurtosis ≈ 0): A distribution with excess kurtosis close to zero is similar to a normal distribution in terms of its tailedness. Such distributions are termed “mesokurtic.”
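The three measures of central tendency above can be illustrated with a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

# Mean: sum of all values divided by the number of values.
mean = statistics.mean(values)

# Median: with an even number of observations, the average of the
# two middle values (here 3 and 5).
median = statistics.median(values)

# Mode(s): multimode returns every value tied for most frequent,
# so it also handles bimodal and multimodal data.
modes = statistics.multimode(values)
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("modes of bimodal data:", bimodal_modes)
```

Using `multimode` rather than `mode` is a deliberate choice here: it returns a list, so a bimodal dataset reports both of its modes instead of only one.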

 

In summary, while skewness provides insights about the direction and degree of asymmetry of a distribution, kurtosis provides information about the thickness of its tails relative to a normal distribution. Both measures can offer additional insights into the distribution and behaviour of a dataset beyond measures of central tendency and dispersion.
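As a rough illustration of these two shape measures, the sketch below computes skewness and excess kurtosis from central moments. It assumes the population (biased) moment formulas; statistical libraries may apply bias corrections and return slightly different numbers. The function names and sample data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth central moment over variance squared, minus 3 (normal = 0)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 10]

print("skewness (symmetric):", skewness(symmetric))
print("skewness (right-tailed):", skewness(right_tailed))
print("excess kurtosis (symmetric):", excess_kurtosis(symmetric))
```

The symmetric set has zero skewness, while the right-tailed set has positive skewness and a mean (3.6) above its median (2). The evenly spaced symmetric set also has negative excess kurtosis, which makes it platykurtic.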

 

Variability (or Dispersion):

This refers to measures that identify how spread out or scattered the values in a dataset are. The most common measures of variability are:

 

  • Range: It is the difference between the highest and lowest values in a dataset.

           Range = Highest value – Lowest value

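  • Variance: Variance is the average of the squared differences from the mean. For a full population, the sum of squared differences is divided by the number of values n; for a sample, it is usually divided by n − 1.
  • Standard Deviation: It is the square root of the variance and indicates the typical distance between a data point and the mean, expressed in the same units as the data.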
  • Percentiles: Percentiles divide an ordered dataset into 100 equal parts. Each percentile is the value below which the corresponding percentage of data points falls.
    • For example, the 25th percentile (also known as the first quartile) represents the value below which 25% of the data points fall, and the 75th percentile (the third quartile) represents the value below which 75% of the data points fall.
  • Quartiles: Quartiles are the three cut points that divide an ordered dataset into four equal parts, with each part containing 25% of the data points. Quartiles are often used in conjunction with box plots and are particularly helpful for understanding the central tendency and spread of data.
    • First Quartile (Q1): This is the 25th percentile and represents the value below which 25% of the data points fall.
    • Second Quartile (Q2): This is the 50th percentile and represents the median, below which 50% of the data points fall.
    • Third Quartile (Q3): This is the 75th percentile and represents the value below which 75% of the data points fall.
    • Fourth Quartile: This is not a cut point like Q1 to Q3; the term refers to the top 25% of the data, i.e., the values above the third quartile.
  • Interquartile Range (IQR): It is the difference between the third quartile (Q3, or the 75th percentile) and the first quartile (Q1, or the 25th percentile), i.e., IQR = Q3 – Q1. It measures the spread of the middle 50% of the data.
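The measures of variability above can be sketched with Python's standard `statistics` module. The data are hypothetical, and the quartiles use the module's "inclusive" interpolation method, which is an assumption: other quantile conventions can give slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = highest value - lowest value.
data_range = max(values) - min(values)

# Population variance divides by n; sample variance divides by n - 1.
population_variance = statistics.pvariance(values)
sample_variance = statistics.variance(values)

# Standard deviation is the square root of the variance.
population_std = statistics.pstdev(values)

# Quartiles: the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print("range:", data_range)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("population standard deviation:", population_std)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

For this dataset the mean is 5 and the squared differences sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2, while the sample variance is 32 / 7.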

 

Understanding both the central tendency and variability of a dataset provides a comprehensive picture of its distribution. For example, two datasets can have the same mean but different standard deviations, indicating that one is more spread out than the other.
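A small sketch of this point, using two hypothetical datasets constructed to share the same mean:

```python
import statistics

# Two hypothetical datasets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# Population standard deviation of each dataset.
std_tight = statistics.pstdev(tight)
std_wide = statistics.pstdev(wide)

print("means:", mean_tight, mean_wide)
print("standard deviations:", std_tight, std_wide)
```

Both means are 50, but the standard deviation of `wide` is roughly twenty times that of `tight`, so the mean alone would hide how differently the two datasets are spread.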
