Inferential Statistics

Sai Gowtham Babu
Published in Fast-Feed.ai
May 9, 2021

Collecting data is hard. Most of the time we can only afford to collect data from a sample, because acquiring data from the whole population is too difficult or too expensive.

So, what do we do now?

Inferential statistics uses a sample of the data to make reasonable guesses about the larger population. Making such guesses is called inference, which is where the name inferential statistics comes from. When using inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, you can’t draw valid inferences about the population.

So, what are the different sampling techniques?

  1. Convenience sampling is a technique in which people are sampled because they are convenient for the researcher to reach.
  2. Volunteer sampling is the case in which people themselves come forward and volunteer to be part of the sample.
  3. Random sampling is a technique in which every member of the population has an equal chance of being selected.
  4. Stratified sampling is where the population is divided into smaller groups called strata (each group is a stratum), and members are then sampled from every stratum.

Mostly, we use random and stratified sampling.
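To make the two most common techniques concrete, here is a minimal sketch using only Python's standard library. The population, the "urban"/"rural" labels and the helper names `simple_random_sample` and `stratified_sample` are made up for illustration. The stratified version assumes proportional allocation, meaning each stratum contributes the same fraction of its members.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Pick k members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_sample(population, key, fraction, seed=None):
    """Group members into strata with key(), then sample each stratum.

    Assumption: proportional allocation, at least one member per stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 "urban" and 20 "rural" people.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]

random_pick = simple_random_sample(population, 10, seed=42)
stratified_pick = stratified_sample(
    population, key=lambda person: person[0], fraction=0.1, seed=42
)
```

With a 10% fraction, the stratified sample always contains 8 urban and 2 rural members, while the simple random sample only matches that split on average.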

Sampling Error:

In inferential statistics, we work with a sample that is much smaller than the actual population. This introduces sampling error: the difference between a value of the actual population (a parameter) and the corresponding value measured on the sample (a statistic).

  1. A measure of population data is called a parameter. E.g. population mean, population standard deviation.
  2. A measure of sample data is called a statistic. E.g. sample mean.
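As a rough illustration of parameters, statistics and sampling error, the sketch below builds a synthetic population and compares it with one random sample. The population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), the sample size of 100 and the seed are arbitrary assumptions, not data from this post.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 synthetic values (an assumption for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sample of 100 members.
sample = rng.sample(population, 100)

# Parameters: measured on the whole population.
population_mean = statistics.fmean(population)
population_std = statistics.pstdev(population)  # divides by N

# Statistics: measured on the sample only.
sample_mean = statistics.fmean(sample)
sample_std = statistics.stdev(sample)  # divides by n - 1

# Sampling error of the mean: statistic minus parameter.
sampling_error = sample_mean - population_mean

print(f"population mean={population_mean:.2f}, std={population_std:.2f}")
print(f"sample mean={sample_mean:.2f}, std={sample_std:.2f}")
print(f"sampling error of the mean={sampling_error:.2f}")
```

Drawing a different sample (for example by changing the seed) gives a different sampling error, which is why a single sample mean is reported together with a measure of uncertainty.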

The above table shows the mean, variance and standard deviation of the total population and of the sample. We use a confidence interval to express how far the sample value may be from the population value.

Confidence Interval:

A confidence interval is a range of values, computed from the sample, that is expected to contain the population parameter with a stated level of confidence (for example 95%). It measures the degree of uncertainty in the sample estimate.

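The margin of error is calculated by multiplying the z-score by the standard error of the mean, σ / √n.

Margin of error = (z × σ) / √n

Here z is the z-score for the chosen confidence level, σ is the population standard deviation and n is the sample size.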
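The formula can be sketched in a few lines of Python. The function names `z_score` and `margin_of_error` are hypothetical, and the sketch assumes a two-sided interval and a known population standard deviation. `NormalDist` comes from the standard `statistics` module (Python 3.8+).

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z-score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error = z * sigma / sqrt(n), with sigma assumed known."""
    return z_score(confidence) * sigma / math.sqrt(n)


# Example with made-up numbers: sigma = 10, n = 100, 95% confidence.
moe = margin_of_error(sigma=10, n=100)
print(f"z = {z_score(0.95):.3f}, margin of error = {moe:.3f}")
```

For a 95% confidence level the z-score is about 1.96, so with σ = 10 and n = 100 the margin of error is about 1.96.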

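Confidence interval is defined as:

Confidence interval = x̄ ± (z × σ) / √n

where x̄ is the sample mean and the second term is the margin of error.

At a 95% confidence level, we interpret this as: we are 95% confident that the interval between the lower bound and the upper bound contains the value of the population parameter.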
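As a small worked example, the interval can be computed as the sample mean plus or minus the margin of error. The sample values and the population standard deviation of 1.5 below are invented for illustration, and `confidence_interval` is a hypothetical helper that assumes σ is known.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(sample, sigma, confidence=0.95):
    """Return (lower, upper) bounds around the sample mean.

    Assumption: sigma is the known population standard deviation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    mean = statistics.fmean(sample)
    margin = z * sigma / math.sqrt(len(sample))
    return mean - margin, mean + margin


# Hypothetical measurements (not real data).
sample = [48.2, 51.5, 49.9, 50.7, 52.1, 47.8, 50.3, 49.5, 51.0]

lower, upper = confidence_interval(sample, sigma=1.5)
print(f"95% confidence interval: [{lower:.2f}, {upper:.2f}]")
```

When σ is not known, a common alternative is to use the sample standard deviation with a t-score instead of a z-score. That variant is not shown here.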

The table below gives an idea of the number of sample items we need to take for a tighter confidence interval.
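The link between sample size and interval width follows from the margin of error formula: solving (z × σ) / √n ≤ E for n gives n ≥ (z × σ / E)². Here is a minimal sketch of that calculation. The helper `required_sample_size` and the example values are assumptions for illustration, not figures taken from the table.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, target_margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= target_margin (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / target_margin) ** 2)


# Made-up example: sigma = 10, at 95% confidence.
for target in (2, 1):
    n = required_sample_size(sigma=10, target_margin=target)
    print(f"margin of error {target}: need at least {n} samples")
```

Halving the target margin of error roughly quadruples the required sample size, because n grows with the square of 1/E.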

Thanks for reading.

Please do follow for more posts on data science related topics.

