To better understand the nature of probabilistic decisions, consider the court case The People v. Collins (1968). In this case, the robbery victim was unable to identify his assailant. All that the victim could recall was that the assailant was female with a blonde ponytail. In addition, he remembered that she fled the scene in a yellow convertible driven by an African American male who had a full beard. The suspect in the case fit the description given by the victim, so the question was “Could the jury be sure, beyond a reasonable doubt, that the woman on trial was the robber?” The evidence against her was as follows: she was blonde and often wore her hair in a ponytail; her codefendant friend was an African American male who had a moustache and beard and owned a yellow convertible. The attorney for the defense stressed that the victim could not identify this woman as the woman who robbed him, and that there should therefore be reasonable doubt on the part of the jury.

The prosecutor, on the other hand, called an expert in probability theory who testified to the following: the probability of all of the above conditions co-occurring (being blonde, often having a ponytail, having an African American male friend, his having a full beard, and his owning a yellow convertible), assuming these characteristics are independent, was 1 in 12 million. The expert further testified that the combination of characteristics was so unusual that the jury could in fact be certain “beyond a reasonable doubt” that the woman was the robber. The jury returned a verdict of “guilty” (Arkes & Hammond, 1986; Halpern, 1996).

As can be seen in the previous example, the legal system operates on probability and recognizes that we can never be absolutely certain when deciding whether an individual is guilty. Thus, the standard of “beyond a reasonable doubt” was established and jurors base their decisions on probability, whether they realize it or not. Most decisions that we make on a daily basis are, in fact, based on probabilities. Diagnoses made by doctors, verdicts produced by juries, decisions made by business executives regarding expansion and what products to carry, decisions regarding whether individuals are admitted to colleges, and most everyday decisions all involve using probability. In addition, all games of chance (for example, cards, horse racing, the stock market) involve probability.

If you think about it, there is very little in life that is certain. Therefore, most of our decisions are probabilistic, and a better understanding of probability will help you with those decisions. Probability also plays an important role in science, which is another reason to understand it.


Probability is a measure of chance, and we shall propose general rules for calculating the probability of combinations of simple events.

Probability refers to the number of ways a particular outcome (event) can occur divided by the total number of possible outcomes, provided that all outcomes are equally likely.

The tossing of a coin is a simple example of a large class of games of chance with certain common features. Each game is decided on the results or outcomes of one or more trials, where a trial might be rolling a die, tossing a coin, or drawing a card from a pack. If no two outcomes can occur on the same trial, we say they are mutually exclusive, and if they are the only possible results they are also said to be exhaustive. There may be more than one way of listing the outcomes. If we draw a card from the pack, the outcomes Red and Black are mutually exclusive and exhaustive, but so are the outcomes Spades, Hearts, Diamonds, and Clubs. The trials are said to be independent if the result of one trial does not depend on the outcome of any previous trial, or any combination of previous trials.
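As a quick check of these definitions, the sketch below builds a standard 52-card pack in Python and tests whether a listing of outcomes is both mutually exclusive and exhaustive. The helper name `is_partition` and the (rank, suit) encoding of a card are assumptions made for this illustration only.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
RED_SUITS = {"Hearts", "Diamonds"}

# Each card is a (rank, suit) pair; the pack holds all 52 combinations.
PACK = set(product(RANKS, SUITS))


def is_partition(outcomes, sample_space):
    """Return True if the outcomes are mutually exclusive and exhaustive."""
    outcomes = list(outcomes)
    combined = set().union(*outcomes)
    # Mutually exclusive: no card is counted in more than one outcome.
    exclusive = sum(len(o) for o in outcomes) == len(combined)
    # Exhaustive: together the outcomes cover the whole sample space.
    exhaustive = combined == sample_space
    return exclusive and exhaustive


by_colour = [
    {card for card in PACK if card[1] in RED_SUITS},
    {card for card in PACK if card[1] not in RED_SUITS},
]
by_suit = [{card for card in PACK if card[1] == suit} for suit in SUITS]
# The King of Hearts is both a heart and a king, so these overlap.
hearts_and_kings = [
    {card for card in PACK if card[1] == "Hearts"},
    {card for card in PACK if card[0] == "K"},
]

print(is_partition(by_colour, PACK))         # True
print(is_partition(by_suit, PACK))           # True
print(is_partition(hearts_and_kings, PACK))  # False
```

The last listing fails on both counts: the two outcomes share a card, and they leave most of the pack uncovered.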


Probabilities are often expressed as proportions. Proportions vary between 0.0 and 1.0, where a probability of 0.0 means the event certainly will not occur and a probability of 1.0 means that the event is certain to occur. Thus, any probability strictly between 0.0 and 1.0 represents an event with some degree of uncertainty to it. How much uncertainty depends on the exact probability with which we are dealing. For example, a probability close to 0.0 represents an event that is almost certain not to occur, and a probability close to 1.0 represents an event that is almost certain to occur. On the other hand, a probability of 0.50 represents maximum uncertainty: the event is as likely to occur as not.

Let’s start with a simple example of probability. What is the probability of getting a “head” when tossing a fair coin? In this example, we have to consider how many ways there are to get a “head” on a coin toss (there is only one way, the coin lands heads up) and how many possible outcomes there are (there are two possible outcomes, either a “head” or a “tail”). So, the probability of a “head” in a coin toss is:

Pr(head) = (number of ways to get a “head”) / (number of possible outcomes) = 1/2 = 0.50
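The counting rule used for the coin toss can be sketched in a few lines of Python. This is a minimal illustration that assumes every outcome in the sample space is equally likely (a fair coin, a fair die); the function name `classical_probability` is made up for this example.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Ways the event can occur divided by the number of equally likely outcomes."""
    event = set(event)
    sample_space = set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


coin = {"head", "tail"}
p_head = classical_probability({"head"}, coin)
print(p_head, float(p_head))  # 1/2 0.5

die = {1, 2, 3, 4, 5, 6}
p_more_than_four = classical_probability({5, 6}, die)
print(p_more_than_four)  # 1/3
```

Using `Fraction` keeps the result exact, so 1/3 is not rounded to a decimal.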


Set Theory: A set is a collection of items or events. The items within a set are generally referred to as elements. A set can be an element of another set.

The Universal Set is the set of all possible elements. In probability, the universal set is the set of all possible outcomes of a trial (experiment).

Sample space: In order to avoid continually referring to particular games or experiments, it is useful to employ an abstract representation for a trial and its outcomes. Each distinguishable and indecomposable outcome, or simple event, is regarded as a point in a sample space, S. Thus, for the experiment of drawing a card from a pack, the sample space contains 52 points. Every collection of simple events, that is, every set of points of S, is called an event. The sample space is an example of a universal set.

Intersection: The intersection of two sets A, B is the set of points of S which belong to both A and B and is an event. Thus the intersection of the sets {HH, TH, HT} and {HT, TT} is the set containing the single point HT. This event may be called ‘heads on the first coin and tails on the second coin’. It may happen that the two sets have no points in common, that is, their intersection is the empty set. Simply, an intersection of two sets is a set containing elements common to both sets.

Union: This is defined as the set which contains all the points of S which are in either A or B (or both). Thus, the union of the events {HH, TH, HT} and {HT, TT}, in the present example, is the event {HH, TH, HT, TT}, which contains every point in the sample space and may reasonably be called ‘the certain event’. In other words, a union of sets is a set that contains all unique elements of the sets.
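Python's built-in `set` type supports both operations directly, so the two-coin example can be reproduced as a short sketch. The variable names are chosen for this illustration; each point lists the first coin and then the second.

```python
# Sample space for tossing two coins (first coin, then second coin).
sample_space = {"HH", "HT", "TH", "TT"}

A = {"HH", "TH", "HT"}  # at least one head
B = {"HT", "TT"}        # tails on the second coin

intersection = A & B    # points that belong to both A and B
union = A | B           # points that belong to A or B (or both)

print(intersection)           # {'HT'}
print(union == sample_space)  # True

# Two events with no points in common have an empty intersection.
C = {"HH"}
print(C & B)            # set()
print(C.isdisjoint(B))  # True
```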


Figure 5.1: A Venn Diagram representing the intersection of two sets.


The Venn diagram is a simple graphical tool used to represent set theory computations. In a Venn diagram, the sample space is represented by a rectangle and any event by a circle in this rectangle.


An ice-cream firm, before launching three new flavors, conducts a tasting with the assistance of 60 schoolboys. The findings were summarized as:

32 liked A

24 liked B

31 liked C

10 liked A and B

11 liked A and C

14 liked B and C

6 liked A and B and C.

Since there are only three flavors (A, B, and C) to consider, the information provided can easily be grasped through a diagram. Can you draw a Venn diagram to represent this relationship?
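Once you have drawn your diagram, you can check the number in each region with the short Python sketch below. It assumes that a count such as "32 liked A" includes tasters who also liked other flavors, so the overlaps have to be subtracted working outward from the centre of the diagram. The variable names are illustrative.

```python
total_tasters = 60
liked_a, liked_b, liked_c = 32, 24, 31
liked_ab, liked_ac, liked_bc = 10, 11, 14
liked_abc = 6

# Tasters who liked exactly two flavors: remove those who liked all three.
only_ab = liked_ab - liked_abc
only_ac = liked_ac - liked_abc
only_bc = liked_bc - liked_abc

# Tasters who liked exactly one flavor: remove every overlap region.
only_a = liked_a - only_ab - only_ac - liked_abc
only_b = liked_b - only_ab - only_bc - liked_abc
only_c = liked_c - only_ac - only_bc - liked_abc

regions = {
    "A only": only_a,
    "B only": only_b,
    "C only": only_c,
    "A and B only": only_ab,
    "A and C only": only_ac,
    "B and C only": only_bc,
    "A and B and C": liked_abc,
}

liked_at_least_one = sum(regions.values())
liked_none = total_tasters - liked_at_least_one

for name, count in regions.items():
    print(f"{name}: {count}")
print(f"Liked at least one flavor: {liked_at_least_one}")  # 58
print(f"Liked none: {liked_none}")                         # 2
```

Under that reading of the data, the seven regions of the diagram account for 58 of the 60 schoolboys, and 2 fall outside all three circles.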


For every event, E, in the sample space S we assign a non-negative number, called the probability of E and denoted by Pr(E), so that the following axioms are satisfied.

(a) For every event E, Pr(E) ≥ 0 (non-negativity)

(b) For the certain event, Pr(S) = 1 (the probabilities of all outcomes sum to 1)

(c) If E1, E2 are mutually exclusive events, Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) (additivity)

(d) If E1, E2 are independent events, Pr(E1 ∩ E2) = Pr(E1) × Pr(E2) (multiplication rule)

Strictly speaking, (d) is the definition of independence rather than an axiom, but it is listed here because it is used alongside the other three rules.
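To see rules (a) to (d) at work on a concrete case, the sketch below assigns probability 1/4 to each point of the two-coin sample space and checks each rule in turn. The assumption of two fair coins and the helper name `pr` are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins: HH, HT, TH, TT, each with probability 1/4.
points = ["".join(p) for p in product("HT", repeat=2)]
prob = {point: Fraction(1, 4) for point in points}
S = set(points)


def pr(event):
    """Probability of an event: the sum over the points it contains."""
    return sum((prob[point] for point in event), Fraction(0))


# (a) Non-negativity.
assert all(p >= 0 for p in prob.values())

# (b) The certain event has probability 1.
assert pr(S) == 1

# (c) Additivity for mutually exclusive events.
E1, E2 = {"HH"}, {"TT"}
assert E1.isdisjoint(E2)
assert pr(E1 | E2) == pr(E1) + pr(E2)

# (d) Multiplication rule for independent events:
# heads on the first coin, tails on the second coin.
F1, F2 = {"HH", "HT"}, {"HT", "TT"}
assert pr(F1 & F2) == pr(F1) * pr(F2)

print(pr(E1 | E2), pr(F1 & F2))  # 1/2 1/4
```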

Conditional Probability

Conditional probability measures the probability that an event will occur “given that” another event has occurred. For two events, A and B, the conditional probability of B given that A has occurred is denoted as Pr(B | A). It is calculated as:

Pr(B | A) = Pr(A ∩ B) / Pr(A), provided Pr(A) > 0
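As an illustration, the sketch below computes a conditional probability for a card drawn from a standard 52-card pack, using the usual definition Pr(B | A) = Pr(A ∩ B) / Pr(A). It assumes every card is equally likely to be drawn; the helper names `pr` and `conditional` are made up for this example.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
PACK = set(product(RANKS, SUITS))  # 52 equally likely cards


def pr(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(PACK))


def conditional(b, a):
    """Pr(B | A) = Pr(A and B) / Pr(A), defined only when Pr(A) > 0."""
    if pr(a) == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return pr(a & b) / pr(a)


face_card = {card for card in PACK if card[0] in {"J", "Q", "K"}}
king = {card for card in PACK if card[0] == "K"}

print(pr(king))                      # 1/13
print(conditional(king, face_card))  # 1/3
```

Knowing that the card is a face card shrinks the sample space from 52 cards to 12, which raises the probability of a king from 1/13 to 1/3.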

Bayes’ Theorem

Also called Bayes’ Rule or Bayes’ law, it relates the conditional probability of A given B to the conditional probability of B given A, and so tells us how to update a prior belief in light of new evidence. Sounds vague and mysterious? Not to worry, this will become clearer when we start making deductions based on probability estimates. For now, what you need to remember is that Bayes’ law tells us how to revise what we believe based on what we have observed.

The formula:

Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B), provided Pr(B) > 0

Can you try and derive this formula using the formula for conditional probability?
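If you want to check your derivation numerically, the sketch below compares the standard form of Bayes’ theorem with a direct count on two fair dice. The choice of events (A: the first die shows 6; B: the total is at least 10) and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely results of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event when every point of S is equally likely."""
    return Fraction(len(event), len(S))


def conditional(b, a):
    """Pr(B | A) = Pr(A and B) / Pr(A); assumes Pr(A) > 0."""
    return pr(a & b) / pr(a)


A = {roll for roll in S if roll[0] == 6}     # first die shows 6
B = {roll for roll in S if sum(roll) >= 10}  # total is at least 10

# Direct count: among the rolls in B, how many are also in A?
direct = conditional(A, B)

# Bayes' theorem: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B).
via_bayes = conditional(B, A) * pr(A) / pr(B)

print(direct, via_bayes)  # 1/2 1/2
```

Both routes give the same answer because Pr(A ∩ B) can be written either as Pr(A | B) × Pr(B) or as Pr(B | A) × Pr(A).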

In the next lesson, we look at random variables and probability distributions.

