Data Science concepts in 5 minutes: Probability, Statistics and Bayes’ theorem.

Philip Pol
3 min read · Feb 15, 2021


In this quick post I go through how Bayes’ theorem uses conditional probability and statistics to evaluate the probability of an event based on prior knowledge and conditions that might be relevant to that event. Many modern machine learning techniques rely on this formula, so I have sprinkled links to other sources throughout the post. I highly encourage you to check these links out, as they’re great explanations of complex ideas — some are even interactive!

This quick 5-minute read will break down key concepts and show a quick example of how it all works in practice!


Probability

Probability is a branch of mathematics with many interpretations; today we will be exploring probability theory. Probability theory treats concepts in a strict mathematical manner by expressing them through a set of axioms (truths). For example, we assume that if we flip a fair coin many times, it will land heads about 50% of the time and tails about 50% of the time. *The law of large numbers has entered the chat*
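You can see the law of large numbers in action with a tiny simulation — a minimal sketch (the function name and seed are my own choices, not from the post): as the number of flips grows, the fraction of heads settles toward 0.5.

```python
import random

def heads_fraction(n_flips, seed=42):
    """Simulate n_flips fair coin tosses and return the fraction that land heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The fraction drifts for small n and converges toward 0.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, heads_fraction(n))
```

With 10 flips the result can easily be 0.3 or 0.7; with a million flips it sits very close to 0.5 — that convergence is exactly what the law of large numbers promises.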

Into cool interactive sites? Click here for one on probability!


Statistics

Statistics is a subfield of mathematics that refers to a collection of methods for working with data and using data to answer questions. Statistics helps us turn data into information that we can then use to answer questions about samples of data.

…and as I write this, I can’t think of a single sentence that explains machine learning modeling better than that, haha.

Probability & Statistics = the foundation of Data Science

Quick recap: while probability deals with predicting the likelihood of future events, statistics helps us understand (interpret, analyze and summarize) past events via data. I like to think of this “understanding” as a model that we then apply to future data sets and analyze further.

From this point on we will be focusing on conditional probability, and how we can use it in a formula.

Bayes’ theorem

Hands down one of my favorite mathematical formulas — Bayes’ theorem describes the probability of an event based on prior knowledge of the conditions that might be relevant to the event. BIG neural network vibes right now! In fact, many modern machine learning techniques rely on this formula to categorize data — spam filters are a good example.

Let’s first take a look at the equation for Bayes’ theorem:

P(A|B) = P(B|A) × P(A) / P(B)

Here P(A|B) is the probability of A given that B happened (the posterior), P(B|A) is the probability of B given A (the likelihood), P(A) is our prior belief about A, and P(B) is the overall probability of B (the evidence).

With this equation we can then isolate the variable we want to solve for and get to work!
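To make the equation concrete, here is a tiny worked example in Python (the weather numbers are made up purely for illustration): suppose it rains 20% of days, it is cloudy on 90% of rainy days, and 40% of all days are cloudy. What is the chance of rain given that it is cloudy?

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(clouds|rain) = 0.9, P(rain) = 0.2, P(clouds) = 0.4
p_rain_given_clouds = bayes(0.9, 0.2, 0.4)
print(p_rain_given_clouds)  # 0.45
```

Clouds alone happen 40% of the time, but knowing it is cloudy more than doubles our belief in rain, from 20% to 45% — that update from prior to posterior is the whole point of the theorem.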

In the case of a spam filter, it would look something like this:

P(spam|word) = P(word|spam) × P(spam) / P(word)

That is: the probability an email is spam, given that it contains a certain word, depends on how often that word shows up in spam, how common spam is overall, and how common the word is across all email.

Chris I has a phenomenal article on this that you can check out here.

I will admit that the first time you try to solve this equation it’s a bit of a mind bender, but just like any problem in life, if you break it into smaller pieces and keep at it you’re bound to solve it — and it will get easier every time!

Got your brain wanting more? Read these!

Best machine learning article: Click here!
Probability Distributions: Click here!
Conditional Probability, why not? Click here!
Wait, there are two types of statistics?! Click here!
Always test your hypothesis! Click here!



Philip Pol

Developer (CRM/ERP/Process management software) — C#, .NET, JavaScript & honestly whatever will get the job done!