Probability (special topic)

Probability forms a foundation for statistics. You might already be familiar with many aspects of probability; however, formalization of the concepts is new for most. This chapter aims to introduce probability on familiar terms using processes most people have seen before.

Defining probability (special topic)

A “die”, the singular of dice, is a cube with six faces numbered 1, 2, 3, 4, 5, and 6. What is the chance of getting 1 when rolling a die?[probOf1] If the die is fair, then the chance of a 1 is as good as the chance of any other number. Since there are six outcomes, the chance must be 1-in-6 or, equivalently, $1/6$.

What is the chance of getting a 1 or 2 in the next roll?[probOf1Or2] 1 and 2 constitute two of the six equally likely possible outcomes, so the chance of getting one of these two outcomes must be $2/6=1/3$.

What is the chance of getting either 1, 2, 3, 4, 5, or 6 on the next roll?[probOf123456] 100%. The outcome must be one of these numbers.

What is the chance of not rolling a 2?[probNot2] Since the chance of rolling a 2 is $1/6$ or $16.{\bar {6}}\%$, the chance of not rolling a 2 must be $100\%-16.{\bar {6}}\%=83.{\bar {3}}\%$ or $5/6$.

Alternatively, we could have noticed that not rolling a 2 is the same as getting a 1, 3, 4, 5, or 6, which makes up five of the six equally likely outcomes and has probability $5/6$.

Consider rolling two dice. If $1/6^{th}$ of the time the first die is a 1 and $1/6^{th}$ of those times the second die is a 1, what is the chance of getting two 1s?[probOf2Ones] If $16.{\bar {6}}\%$ of the time the first die is a 1 and $1/6^{th}$ of those times the second die is also a 1, then the chance that both dice are 1 is $(1/6)\times (1/6)$ or $1/36$.
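The die calculations in this section can be checked by counting equally likely outcomes. The following is a minimal Python sketch, assuming a fair six-sided die; the helper name `prob` is hypothetical and is introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: a fair six-sided die


def prob(event, outcomes):
    """Probability of `event` when all `outcomes` are equally likely."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


p_one = prob(lambda r: r == 1, FACES)              # chance of a 1
p_one_or_two = prob(lambda r: r in (1, 2), FACES)  # chance of a 1 or 2
p_not_two = prob(lambda r: r != 2, FACES)          # chance of not rolling a 2

# Two dice: enumerate all 36 ordered pairs of faces.
p_two_ones = prob(lambda pair: pair == (1, 1), product(FACES, repeat=2))

print(p_one, p_one_or_two, p_not_two, p_two_ones)  # 1/6 1/3 5/6 1/36
```

Exact fractions are used instead of floating-point numbers so that the results match the hand calculations term by term.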

Probability

We use probability to build tools to describe and understand apparent randomness. We often frame probability in terms of a random process giving rise to an outcome.

Roll a die	$\rightarrow$	1, 2, 3, 4, 5, or 6
Flip a coin	$\rightarrow$	H or T

Rolling a die or flipping a coin is a seemingly random process and each gives rise to an outcome.

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.

Probability is defined as a proportion, and it always takes values between 0 and 1 (inclusively). It may also be displayed as a percentage between 0% and 100%.

Probability can be illustrated by rolling a die many times. Let ${\hat {p}}_{n}$ be the proportion of outcomes that are 1 after the first $n$ rolls. As the number of rolls increases, ${\hat {p}}_{n}$ will converge to the probability of rolling a 1, $p=1/6$. Figure [dieProp] shows this convergence for 100,000 die rolls. The tendency of ${\hat {p}}_{n}$ to stabilize around $p$ is described by the Law of Large Numbers.

Figure [dieProp]: The fraction of die rolls that are 1 at each stage in a simulation. The proportion tends to get closer to the probability $1/6\approx 0.167$ as the number of rolls increases.

As more observations are collected, the proportion ${\hat {p}}_{n}$ of occurrences with a particular outcome converges to the probability $p$ of that outcome.

Occasionally the proportion will veer off from the probability and appear to defy the Law of Large Numbers, as ${\hat {p}}_{n}$ does many times in Figure [dieProp]. However, these deviations become smaller as the number of rolls increases.
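A small simulation can make this convergence concrete. The sketch below is an illustration rather than the code behind Figure [dieProp]: the function name `running_proportion` and the fixed seed are assumptions chosen here for reproducibility.

```python
import random


def running_proportion(n_rolls, target=1, seed=0):
    """Return p-hat_n for n = 1..n_rolls: the share of rolls equal to `target`."""
    rng = random.Random(seed)  # fixed seed: an arbitrary choice for reproducibility
    hits = 0
    proportions = []
    for n in range(1, n_rolls + 1):
        if rng.randint(1, 6) == target:  # one roll of a fair die
            hits += 1
        proportions.append(hits / n)
    return proportions


p_hat = running_proportion(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(p_hat[n - 1], 4))
```

The early proportions typically wander noticeably, while the later ones should sit close to $1/6\approx 0.167$; the exact values depend on the seed.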

Above we write $p$ as the probability of rolling a 1. We can also write this probability as $P({\text{rolling a 1}})$. As we become more comfortable with this notation, we will abbreviate it further. For instance, if it is clear that the process is “rolling a die”, we could abbreviate $P({\text{rolling a 1}})$ as $P(1)$.

[randomProcessExercise] Random processes include rolling a die and flipping a coin. (a) Think of another random process. (b) Describe all the possible outcomes of that process. For instance, rolling a die is a random process with possible outcomes 1, 2, 3, 4, 5, and 6.^[1]

What we think of as random processes are not necessarily random, but they may just be too difficult to understand exactly. The fourth example in the footnote solution to Guided Practice [randomProcessExercise] suggests a roommate’s behavior is a random process. However, even if a roommate’s behavior is not truly random, modeling her behavior as a random process can still be useful.

It can be helpful to model a process as random even if it is not truly random.

Disjoint or mutually exclusive outcomes

Two outcomes are called disjoint or mutually exclusive if they cannot both happen. For instance, if we roll a die, the outcomes 1 and 2 are disjoint since they cannot both occur. On the other hand, the outcomes 1 and “rolling an odd number” are not disjoint since both occur if the outcome of the roll is a 1. The terms disjoint and mutually exclusive are equivalent and interchangeable.

Calculating the probability of disjoint outcomes is easy. When rolling a die, the outcomes 1 and 2 are disjoint, and we compute the probability that one of these outcomes will occur by adding their separate probabilities: ${\begin{aligned}P(1{\text{ or }}2)=P(1)+P(2)=1/6+1/6=1/3\end{aligned}}$ What about the probability of rolling a 1, 2, 3, 4, 5, or 6? Here again, all of the outcomes are disjoint so we add the probabilities: ${\begin{aligned}&P(1{\text{ or }}2{\text{ or }}3{\text{ or }}4{\text{ or }}5{\text{ or }}6)\\&\quad =P(1)+P(2)+P(3)+P(4)+P(5)+P(6)\\&\quad =1/6+1/6+1/6+1/6+1/6+1/6=1.\end{aligned}}$ The Addition Rule guarantees the accuracy of this approach when the outcomes are disjoint.

If $A_{1}$ and $A_{2}$ represent two disjoint outcomes, then the probability that one of them occurs is given by ${\begin{aligned}P(A_{1}{\text{ or }}A_{2})=P(A_{1})+P(A_{2})\end{aligned}}$ If there are many disjoint outcomes $A_{1}$ , ..., $A_{k}$ , then the probability that one of these outcomes will occur is ${\begin{aligned}P(A_{1})+P(A_{2})+\cdots +P(A_{k})\end{aligned}}$
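The Addition Rule for disjoint outcomes amounts to summing separate probabilities. Here is a minimal sketch, assuming a fair die; the dictionary `P` and the helper `prob_any` are hypothetical names used only for this illustration.

```python
from fractions import Fraction

# Assumption: a fair die, so each face has probability 1/6.
P = {face: Fraction(1, 6) for face in range(1, 7)}


def prob_any(outcomes):
    """Addition Rule for disjoint outcomes: sum the separate probabilities."""
    outcomes = set(outcomes)  # distinct faces of a single roll cannot co-occur
    return sum((P[o] for o in outcomes), Fraction(0))


p_1_or_2 = prob_any([1, 2])
p_any_face = prob_any(range(1, 7))
print(p_1_or_2, p_any_face)  # 1/3 1
```

The rule applies here only because distinct faces of one roll are disjoint; summing probabilities of overlapping events would double count.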

We are interested in the probability of rolling a 1, 4, or 5. (a) Explain why the outcomes 1, 4, and 5 are disjoint. (b) Apply the Addition Rule for disjoint outcomes to determine $P(1{\text{ or }}4{\text{ or }}5)$.^[2]

In the email data set in Chapter [introductionToData], the number variable described whether no number (labeled none), only one or more small numbers (small), or whether at least one big number appeared in an email (big). Of the 3,921 emails, 549 had no numbers, 2,827 had only one or more small numbers, and 545 had at least one big number. (a) Are the outcomes none, small, and big disjoint? (b) Determine the proportion of emails with value small and big separately. (c) Use the Addition Rule for disjoint outcomes to compute the probability a randomly selected email from the data set has a number in it, small or big.^[3]

Statisticians rarely work with individual outcomes and instead consider sets or collections of outcomes. Let $A$ represent the event where a die roll results in 1 or 2 and $B$ represent the event that the die roll is a 4 or a 6. We write $A$ as the set of outcomes $\{1,2\}$ and $B=\{4,6\}$. These sets are commonly called events. Because $A$ and $B$ have no elements in common, they are disjoint events. $A$ and $B$ are represented in Figure [disjointSets].

Figure [disjointSets]: Three events, $A$, $B$, and $D$, consist of outcomes from rolling a die. $A$ and $B$ are disjoint since they do not have any outcomes in common.

The Addition Rule applies to both disjoint outcomes and disjoint events. The probability that one of the disjoint events $A$ or $B$ occurs is the sum of the separate probabilities: ${\begin{aligned}P(A{\text{ or }}B)=P(A)+P(B)=1/3+1/3=2/3\end{aligned}}$

(a) Verify the probability of event $A$ , $P(A)$ , is $1/3$ using the Addition Rule. (b) Do the same for event $B$ .^[4]

[exerExaminingDisjointSetsABD] (a) Using Figure [disjointSets] as a reference, what outcomes are represented by event $D$ ? (b) Are events $B$ and $D$ disjoint? (c) Are events $A$ and $D$ disjoint?^[5]

In Guided Practice [exerExaminingDisjointSetsABD], you confirmed $B$ and $D$ from Figure [disjointSets] are disjoint. Compute the probability that event $B$ or event $D$ occurs.^[6]

Probabilities when events are not disjoint

Let’s consider calculations for two events that are not disjoint in the context of a regular deck of cards, represented in Table [deckOfCards]. If you are unfamiliar with the cards in a regular deck, please see the footnote.^[7]

Representations of the 52 unique cards in a deck.

(a) What is the probability that a randomly selected card is a diamond? (b) What is the probability that a randomly selected card is a face card?^[8]

Venn diagrams are useful when outcomes can be categorized as “in” or “out” for two or three variables, attributes, or random processes. The Venn diagram in Figure [cardsDiamondFaceVenn] uses a circle to represent diamonds and another to represent face cards. If a card is both a diamond and a face card, it falls into the intersection of the circles. If it is a diamond but not a face card, it will be in part of the left circle that is not in the right circle (and so on). The total number of cards that are diamonds is given by the total number of cards in the diamonds circle: $10+3=13$. The probabilities are also shown (e.g. $10/52=0.1923$).

Figure [cardsDiamondFaceVenn]: A Venn diagram for diamonds and face cards.

Let $A$ represent the event that a randomly selected card is a diamond and $B$ represent the event that it is a face card. How do we compute $P(A$ or $B)$ ? Events $A$ and $B$ are not disjoint – the cards $J\diamondsuit$ , $Q\diamondsuit$ , and $K\diamondsuit$ fall into both categories – so we cannot use the Addition Rule for disjoint events. Instead we use the Venn diagram. We start by adding the probabilities of the two events: ${\begin{aligned}P(A)+P(B)=P(\diamondsuit )+P({\text{face card}})=13/52+12/52\end{aligned}}$ However, the three cards that are in both events were counted twice, once in each probability. We must correct this double counting: ${\begin{aligned}P(A{\text{ or }}B)&=P(\diamondsuit {\text{ or face card}})\\&=P(\diamondsuit )+P({\text{face card}})-P(\diamondsuit {\text{ and face card}})\\&=13/52+12/52-3/52\\&=22/52=11/26\end{aligned}}$ Equation ([diamondFace]), the second line of this calculation, is an example of the General Addition Rule.

If $A$ and $B$ are any two events, disjoint or not, then the probability that at least one of them will occur is ${\begin{aligned}P(A{\text{ or }}B)=P(A)+P(B)-P(A{\text{ and }}B)\end{aligned}}$ where $P(A$ and $B)$ is the probability that both events occur.
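The diamond and face card calculation can be reproduced by treating events as sets of cards. This is a sketch under the assumption of a standard 52-card deck; the rank and suit labels and the helper `prob` are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Assumption: a standard deck of 13 ranks in each of 4 suits.
RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))


def prob(event):
    """Probability of drawing a card in `event` from a well-shuffled deck."""
    return Fraction(len(event), len(DECK))


diamonds = {card for card in DECK if card[1] == "diamonds"}
face_cards = {card for card in DECK if card[0] in {"J", "Q", "K"}}

# Left side: count the union directly. Right side: General Addition Rule.
direct = prob(diamonds | face_cards)
by_rule = prob(diamonds) + prob(face_cards) - prob(diamonds & face_cards)
print(direct, by_rule)  # 11/26 11/26
```

Subtracting the intersection removes the three cards (the jack, queen, and king of diamonds) that the two separate probabilities each count once.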

When we write “or” in statistics, we mean “and/or” unless we explicitly state otherwise. Thus, $A$ or $B$ occurs means $A$ , $B$ , or both $A$ and $B$ occur.

(a) If $A$ and $B$ are disjoint, describe why this implies $P(A$ and $B)=0$ . (b) Using part (a), verify that the General Addition Rule simplifies to the simpler Addition Rule for disjoint events if $A$ and $B$ are disjoint.^[9]

[emailSpamNumberVennExer] In the email data set with 3,921 emails, 367 were spam, 2,827 contained some small numbers but no big numbers, and 168 had both characteristics. Create a Venn diagram for this setup.^[10]

(a) Use your Venn diagram from Guided Practice [emailSpamNumberVennExer] to determine the probability a randomly drawn email from the data set is spam and had small numbers (but not big numbers). (b) What is the probability that the email had either of these attributes?^[11]

Probability distributions

A probability distribution is a table of all disjoint outcomes and their associated probabilities. Table [diceProb] shows the probability distribution for the sum of two dice.

Dice sum	2	3	4	5	6	7	8	9	10	11	12
Probability	${\frac {1}{36}}$	${\frac {2}{36}}$	${\frac {3}{36}}$	${\frac {4}{36}}$	${\frac {5}{36}}$	${\frac {6}{36}}$	${\frac {5}{36}}$	${\frac {4}{36}}$	${\frac {3}{36}}$	${\frac {2}{36}}$	${\frac {1}{36}}$

A probability distribution is a list of the possible outcomes with corresponding probabilities that satisfies three rules:

The outcomes listed must be disjoint.
Each probability must be between 0 and 1.
The probabilities must total 1.
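The three rules can be checked mechanically. The sketch below rebuilds the distribution of the sum of two dice by enumeration and tests it; the function name `is_probability_distribution` is hypothetical, and the disjointness rule is taken to hold because each dictionary key is a distinct outcome, which is an assumption of this representation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_probability_distribution(dist, tol=1e-9):
    """Check that every probability lies in [0, 1] and that they total 1.

    Disjointness is assumed: each key of `dist` names a distinct outcome.
    """
    probs = list(dist.values())
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = abs(sum(probs) - 1) <= tol
    return in_range and sums_to_one


# Distribution of the sum of two fair dice, built from the 36 equally likely rolls.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice_sum = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

print(is_probability_distribution(dice_sum))              # True
print(is_probability_distribution({"x": 0.5, "y": 0.6}))  # False: totals 1.1
```

The second call uses a made-up pair of probabilities purely to show a failing case.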

[usHouseholdIncomeDistsExercise] Table [usHouseholdIncomeDists] suggests three distributions for household income in the United States. Only one is correct. Which one must it be? What is wrong with the other two?^[12]

Income range ($1000s)	0-25	25-50	50-100	100+
(a)	0.18	0.39	0.33	0.16
(b)	0.38	-0.27	0.52	0.37
(c)	0.28	0.27	0.29	0.16

Chapter [introductionToData] emphasized the importance of plotting data to provide quick summaries. Probability distributions can also be summarized in a bar plot. For instance, the distribution of US household incomes is shown in Figure [usHouseholdIncomeDistBar] as a bar plot. The probability distribution for the sum of two dice is shown in Table [diceProb] and plotted in Figure [diceSumDist].

Figure [usHouseholdIncomeDistBar]: The probability distribution of US household income.

Figure [diceSumDist]: The probability distribution of the sum of two dice.

In these bar plots, the bar heights represent the probabilities of outcomes. If the outcomes are numerical and discrete, it is usually (visually) convenient to make a bar plot that resembles a histogram, as in the case of the sum of two dice. Another example of plotting the bars at their respective locations is shown in Figure [bookCostDist].

Complement of an event

Rolling a die produces a value in the set $\{1,2,3,4,5,6\}$. This set of all possible outcomes is called the sample space ($S$) for rolling a die. We often use the sample space to examine the scenario where an event does not occur.

Let $D=\{2,3\}$ represent the event that the outcome of a die roll is 2 or 3. Then the complement of $D$ represents all outcomes in our sample space that are not in $D$, which is denoted by $D^{c}=\{1,4,5,6\}$. That is, $D^{c}$ is the set of all possible outcomes not already included in $D$. Figure [complementOfD] shows the relationship between $D$, $D^{c}$, and the sample space $S$.

Figure [complementOfD]: Event $D=\{2,3\}$ and its complement, $D^{c}=\{1,4,5,6\}$. $S$ represents the sample space, which is the set of all possible outcomes.

(a) Compute $P(D^{c})=P({\text{rolling a 1, 4, 5, or 6}})$. (b) What is $P(D)+P(D^{c})$ ?^[13]

Events $A=\{1,2\}$ and $B=\{4,6\}$ are shown in Figure [disjointSets]. (a) Write out what $A^{c}$ and $B^{c}$ represent. (b) Compute $P(A^{c})$ and $P(B^{c})$. (c) Compute $P(A)+P(A^{c})$ and $P(B)+P(B^{c})$.^[14]

A complement of an event $A$ is constructed to have two very important properties: (i) every possible outcome not in $A$ is in $A^{c}$, and (ii) $A$ and $A^{c}$ are disjoint. Property (i) implies ${\begin{aligned}P(A{\text{ or }}A^{c})=1\end{aligned}}$ That is, if the outcome is not in $A$, it must be represented in $A^{c}$. We use the Addition Rule for disjoint events to apply Property (ii): ${\begin{aligned}P(A{\text{ or }}A^{c})=P(A)+P(A^{c})\end{aligned}}$ Combining Equations ([complementSumTo1]) and ([complementDisjointEquation]) yields a very useful relationship between the probability of an event and its complement.

The complement of event $A$ is denoted $A^{c}$ , and $A^{c}$ represents all outcomes not in $A$ . $A$ and $A^{c}$ are mathematically related:

${\begin{aligned}P(A)+P(A^{c})=1,\quad {\text{i.e.}}\quad P(A)=1-P(A^{c})\end{aligned}}$

In simple examples, computing $P(A)$ or $P(A^{c})$ directly is feasible in a few steps. However, using the complement can save a lot of time as problems grow in complexity.
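To see the time saving, consider an event with many outcomes whose complement has few. The sketch below uses an event chosen here purely for illustration (the sum of two fair dice is at least 5) and compares the complement shortcut with a direct count; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered rolls of two fair dice (an assumption).
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of `event` over the 36 equally likely rolls."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


# Illustrative event A: "the sum is at least 5".
# Its complement, "the sum is 2, 3, or 4", has only six outcomes.
p_complement = prob(lambda r: sum(r) <= 4)
p_a = 1 - p_complement

# Cross-check against counting A directly.
p_direct = prob(lambda r: sum(r) >= 5)
print(p_complement, p_a, p_direct)  # 1/6 5/6 5/6
```

Counting the six outcomes in the complement is quicker by hand than counting the thirty outcomes in the event itself, and both routes agree.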

Let $A$ represent the event where we roll two dice and their total is less than 12. (a) What does the event $A^{c}$ represent? (b) Determine $P(A^{c})$ from Table [diceProb]. (c) Determine $P(A)$.^[15]

Consider again the probabilities from Table [diceProb] and rolling two dice. Find the following probabilities: (a) The sum of the dice is not 6. (b) The sum is at least 4. That is, determine the probability of the event $B=\{4,5,...,12\}$. (c) The sum is no more than 10. That is, determine the probability of the event $D=\{2,3,...,10\}$.^[16]

Independence

Just as variables and observations can be independent, random processes can be independent, too. Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other. For instance, flipping a coin and rolling a die are two independent processes – knowing the coin was heads does not help determine the outcome of a die roll. On the other hand, stock prices usually move up or down together, so they are not independent.

Example [probOf2Ones] provides a basic example of two independent processes: rolling two dice. We want to determine the probability that both will be 1. Suppose one of the dice is red and the other white. If the outcome of the red die is a 1, it provides no information about the outcome of the white die. We first encountered this same question in Example [probOf2Ones], where we calculated the probability using the following reasoning: $1/6^{th}$ of the time the red die is a 1, and $1/6^{th}$ of those times the white die will also be 1. This is illustrated in Figure [indepForRollingTwo1s]. Because the rolls are independent, the probabilities of the corresponding outcomes can be multiplied to get the final answer: $(1/6)\times (1/6)=1/36$. This can be generalized to many independent processes.

Figure [indepForRollingTwo1s]: $1/6^{th}$ of the time, the first roll is a 1. Then $1/6^{th}$ of those times, the second roll will also be a 1.

What if there was also a blue die independent of the other two? What is the probability of rolling the three dice and getting all 1s?[threeDice] The same logic applies from Example [probOf2Ones]. If $1/36^{th}$ of the time the white and red dice are both 1, then $1/6^{th}$ of those times the blue die will also be 1, so multiply: ${\begin{aligned}P({\text{white}}=1{\text{ and }}{\text{red}}=1{\text{ and }}{\text{blue}}=1)&=P({\text{white}}=1)\times P({\text{red}}=1)\times P({\text{blue}}=1)\\&=(1/6)\times (1/6)\times (1/6)=1/216\end{aligned}}$

Example [threeDice] illustrates what is called the Multiplication Rule for independent processes.

If $A$ and $B$ represent events from two different and independent processes, then the probability that both $A$ and $B$ occur can be calculated as the product of their separate probabilities:

${\begin{aligned}P(A{\text{ and }}B)=P(A)\times P(B)\end{aligned}}$

Similarly, if there are $k$ events $A_{1}$ , ..., $A_{k}$ from $k$ independent processes, then the probability they all occur is

${\begin{aligned}P(A_{1})\times P(A_{2})\times \cdots \times P(A_{k})\end{aligned}}$
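The Multiplication Rule can be sketched in a few lines and cross-checked by brute-force enumeration for the three-dice example. The helper name `prob_all` is hypothetical, and the result is only meaningful when the processes really are independent, as fair dice are assumed to be here.

```python
from fractions import Fraction
from itertools import product
from math import prod


def prob_all(probabilities):
    """Multiplication Rule: product of the separate probabilities.

    Valid only when the events come from independent processes.
    """
    return prod(probabilities, start=Fraction(1))


p_one = Fraction(1, 6)                # a single fair die shows 1
p_three_ones = prob_all([p_one] * 3)  # white, red, and blue dice all show 1

# Cross-check: enumerate all 6**3 equally likely rolls of three dice.
rolls = list(product(range(1, 7), repeat=3))
p_enumerated = Fraction(sum(1 for r in rolls if r == (1, 1, 1)), len(rolls))

print(p_three_ones, p_enumerated)  # 1/216 1/216
```

Enumeration scales poorly as the number of processes grows, which is one reason the product form is the practical tool.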

[ex2Handedness] About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent. (a) What is the probability that both are left-handed? (b) What is the probability that both are right-handed?^[17]

[ex5Handedness] Suppose 5 people are selected at random.^[18]

What is the probability that all are right-handed?
What is the probability that all are left-handed?
What is the probability that not all of the people are right-handed?

Suppose the variables handedness and gender are independent, i.e. knowing someone’s gender provides no useful information about their handedness and vice-versa. Then we can compute the probability that a randomly selected person is right-handed and female^[19] using the Multiplication Rule: ${\begin{aligned}P({\text{right-handed and female}})&=P({\text{right-handed}})\times P({\text{female}})\\&=0.91\times 0.50=0.455\end{aligned}}$

Three people are selected at random.^[20]

What is the probability that the first person is male and right-handed?
What is the probability that the first two people are male and right-handed?
What is the probability that the third person is female and left-handed?
What is the probability that the first two people are male and right-handed and the third person is female and left-handed?

Sometimes we wonder if one outcome provides useful information about another outcome. The question we are asking is, are the occurrences of the two events independent? We say that two events $A$ and $B$ are independent if they satisfy Equation ([eqForIndependentEvents]).

If we shuffle up a deck of cards and draw one, is the event that the card is a heart independent of the event that the card is an ace? The probability the card is a heart is $1/4$ and the probability that it is an ace is $1/13$ . The probability the card is the ace of hearts is $1/52$ . We check whether Equation [eqForIndependentEvents] is satisfied: ${\begin{aligned}P({\color {redcards}\heartsuit })\times P({\text{ace}})={\frac {1}{4}}\times {\frac {1}{13}}={\frac {1}{52}}=P({\color {redcards}\heartsuit }{\text{ and ace}})\end{aligned}}$ Because the equation holds, the event that the card is a heart and the event that the card is an ace are independent events.
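The same check can be run on any pair of card events. The sketch below, which assumes a standard 52-card deck and uses hypothetical helper names `prob` and `independent`, confirms the heart and ace result and contrasts it with a pair of events that fails the test.

```python
from fractions import Fraction
from itertools import product

# Assumption: a standard deck of 13 ranks in each of 4 suits.
RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))


def prob(event):
    """Probability of drawing a card in `event` from a well-shuffled deck."""
    return Fraction(len(event), len(DECK))


def independent(a, b):
    """Events are independent exactly when P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


hearts = {card for card in DECK if card[1] == "hearts"}
aces = {card for card in DECK if card[0] == "A"}
red = {card for card in DECK if card[1] in {"hearts", "diamonds"}}

print(independent(hearts, aces))  # True: 1/4 * 1/13 == 1/52
print(independent(hearts, red))   # False: 1/4 * 1/2 != 1/4
```

Knowing a card is red raises the chance that it is a heart from 1/4 to 1/2, which is why the second pair is not independent.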

Conditional probability (special topic)

The data set contains a sample of 792 cases with two variables, teen and parents, and is summarized in Table [contTableOfParStCollege].^[21] The teen variable is either college or not, where the label college means the teen went to college immediately after high school. The parents variable takes the value degree if at least one parent of the teenager completed a college degree.

ll rr r rr && &

&
&& & & Total
& & 231 & 214 & 445
[0pt] &

& 49 & 298 & 347
& Total & 280 & 512 & 792

Figure [familyCollegeVenn]: A Venn diagram using boxes for the data set.

If at least one parent of a teenager completed a college degree, what is the chance the teenager attended college right after high school? We can estimate this probability using the data. Of the 280 cases in this data set where parents takes value degree, 231 represent cases where the teen variable takes value college: ${\begin{aligned}P({\text{teen college given parents degree}})={\frac {231}{280}}=0.825\end{aligned}}$

A teenager is randomly selected from the sample and she did not attend college right after high school. What is the probability that at least one of her parents has a college degree?[collegeProbOfParentsGivenStudentNot] If the teenager did not attend, then she is one of the 347 teens in the second row. Of these 347 teens, 49 had at least one parent who got a college degree: ${\begin{aligned}P({\text{parents degree given teen not}})={\frac {49}{347}}=0.141\end{aligned}}$

Marginal and joint probabilities

Table [contTableOfParStCollege] includes row and column totals for each variable separately in the data set. These totals represent marginal probabilities for the sample, which are the probabilities based on a single variable without regard to any other variables. For instance, a probability based solely on the teen variable is a marginal probability: ${\begin{aligned}P({\text{teen college}})={\frac {445}{792}}=0.56\end{aligned}}$ A probability of outcomes for two or more variables or processes is called a joint probability: ${\begin{aligned}P({\text{teen college and parents not}})={\frac {214}{792}}=0.27\end{aligned}}$ It is common to substitute a comma for “and” in a joint probability, although either is acceptable. That is,

$P({\text{teen college, parents not}})$
means the same thing as
$P({\text{teen college and parents not}})$

If a probability is based on a single variable, it is a marginal probability. The probability of outcomes for two or more variables or processes is called a joint probability.

We use table proportions to summarize joint probabilities for the sample. These proportions are computed by dividing each count in Table [contTableOfParStCollege] by the table’s total, 792, to obtain the proportions in Table [familyCollegeProbTable]. The joint probability distribution of the parents and teen variables is shown in Table [familyCollegeDistribution].

	parents: degree	parents: not	Total
teen: college	0.29	0.27	0.56
teen: not	0.06	0.38	0.44
Total	0.35	0.65	1.00

Joint probability distribution for the data set.
Joint outcome	Probability
parents degree and teen college	0.29
parents degree and teen not	0.06
parents not and teen college	0.27
parents not and teen not	0.38
Total	1.00

Verify Table [familyCollegeDistribution] represents a probability distribution: events are disjoint, all probabilities are non-negative, and the probabilities sum to 1.^[22]

We can compute marginal probabilities using joint probabilities in simple cases. For example, the probability a random teenager from the study went to college is found by summing the outcomes where teen takes value college: ${\begin{aligned}P({\text{teen college}})&=P({\text{parents degree and teen college}})\\&\quad \quad +P({\text{parents not and teen college}})\\&=0.29+0.27\\&=0.56\end{aligned}}$
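The joint and marginal probabilities for this data set follow directly from the four cell counts. The sketch below is an illustration only: the dictionary layout, keyed by (teen, parents) levels, and the helper `marginal_teen` are choices made here.

```python
# Cell counts, keyed by (teen level, parents level).
counts = {
    ("college", "degree"): 231,
    ("college", "not"): 214,
    ("not", "degree"): 49,
    ("not", "not"): 298,
}
total = sum(counts.values())  # 792 cases

# Joint probabilities: each count divided by the table total.
joint = {cell: n / total for cell, n in counts.items()}


def marginal_teen(level):
    """Marginal probability for the teen variable: sum over the parents levels."""
    return sum(p for (teen, _parents), p in joint.items() if teen == level)


for cell, p in joint.items():
    print(cell, round(p, 2))
print("P(teen college) =", round(marginal_teen("college"), 2))  # 0.56
```

Summing the unrounded joint probabilities gives 445/792, which rounds to the same 0.56 obtained from the rounded values 0.29 and 0.27.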

Defining conditional probability

There is some connection between education level of parents and of the teenager: a college degree by a parent is associated with college attendance of the teenager. In this section, we discuss how to use information about associations between two variables to improve probability estimation.

The probability that a random teenager from the study attended college is 0.56. Could we update this probability if we knew that one of the teen’s parents has a college degree? Absolutely. To do so, we limit our view to only those 280 cases where a parent has a college degree and look at the fraction where the teenager attended college: ${\begin{aligned}P({\text{teen college given parents degree}})={\frac {231}{280}}=0.825\end{aligned}}$ We call this a conditional probability because we computed the probability under a condition: a parent has a college degree. There are two parts to a conditional probability, the outcome of interest and the condition. It is useful to think of the condition as information we know to be true, and this information usually can be described as a known outcome or event.

We separate the text inside our probability notation into the outcome of interest and the condition: ${\begin{aligned}&P({\text{teen college given parents degree}})\\&=P({\text{teen college}}\ |\ {\text{parents degree}})={\frac {231}{280}}=0.825\end{aligned}}$ The vertical bar “ $|$ ” is read as given.

In Equation ([probStudentUsedIfParentsUsedInFormalNotation]), we computed the probability a teen attended college based on the condition that at least one parent has a college degree as a fraction: ${\begin{aligned}&P({\text{teen college}}\ |\ {\text{parents degree}})\\&\quad ={\frac {\#{\text{ cases where teen college and parents degree}}}{\#{\text{ cases where parents degree}}}}\\&\quad ={\frac {231}{280}}=0.825\end{aligned}}$ We considered only those cases that met the condition, parents degree, and then we computed the ratio of those cases that satisfied our outcome of interest, the teenager attended college.

Frequently, marginal and joint probabilities are provided instead of count data. For example, disease rates are commonly listed in percentages rather than in a count format. We would like to be able to compute conditional probabilities even when no counts are available, and we use Equation ([ratioOfBothToRatioOfConditionalForParentsAndStudent]) as a template to understand this technique.

We considered only those cases that satisfied the condition, parents degree. Of these cases, the conditional probability was the fraction who represented the outcome of interest, teen college. Suppose we were provided only the information in Table [familyCollegeProbTable], i.e. only probability data. Then if we took a sample of 1000 people, we would anticipate about 35% or $0.35\times 1000=350$ would meet the information criterion (parents degree). Similarly, we would expect about 29% or $0.29\times 1000=290$ to meet both the information criterion and represent our outcome of interest. Then the conditional probability can be computed as ${\begin{aligned}&P({\text{teen college}}\ |\ {\text{parents degree}})\\&={\frac {\#{\text{ (teen college and parents degree)}}}{\#{\text{ (parents degree)}}}}\\&={\frac {290}{350}}={\frac {0.29}{0.35}}=0.829\quad {\text{(different from 0.825 due to rounding error)}}\end{aligned}}$ In Equation ([stUserPUsedHypSampSize]), we examine exactly the fraction of two probabilities, 0.29 and 0.35, which we can write as ${\begin{aligned}P({\text{teen college and parents degree}})\quad {\text{and}}\quad P({\text{parents degree}}).\end{aligned}}$ The fraction of these probabilities is an example of the general formula for conditional probability.

The conditional probability of the outcome of interest $A$ given condition $B$ is computed as the following: ${\begin{aligned}P(A|B)={\frac {P(A{\text{ and }}B)}{P(B)}}\end{aligned}}$
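The formula can be applied either to exact proportions or to rounded table values, which explains the small discrepancy between 0.825 and 0.829. The following sketch is illustrative; the function name `conditional` is hypothetical.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b <= 0:
        raise ValueError("the condition must have positive probability")
    return p_a_and_b / p_b


# Exact proportions from the counts: 231 of 792 and 280 of 792.
exact = conditional(Fraction(231, 792), Fraction(280, 792))

# Rounded table values: 0.29 and 0.35.
from_rounded = conditional(0.29, 0.35)

print(exact, float(exact))     # 33/40 0.825
print(round(from_rounded, 3))  # 0.829
```

The table total cancels in the exact version, leaving 231/280, so the gap between the two answers comes entirely from rounding the joint and marginal probabilities to two decimals.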

[familyCollegeProbOfParentsEqualNotGivenTeen] (a) Write out the following statement in conditional probability notation: “The probability a random case where neither parent has a college degree if it is known that the teenager didn’t attend college right after high school”. Notice that the condition is now based on the teenager, not the parent.
(b) Determine the probability from part (a). Table may be helpful.^[23]

[whyCondProbSumTo1] (a) Determine the probability that one of the parents has a college degree if it is known the teenager did not attend college.
(b) Using the answers from part (a) and Guided Practice [familyCollegeProbOfParentsEqualNotGivenTeen](b), compute
${\begin{aligned}P({\text{parents degree}}\ |\ {\text{teen not}})+P({\text{parents not}}\ |\ {\text{teen not}})\end{aligned}}$
(c) Provide an intuitive argument to explain why the sum in (b) is 1.^[24]

The data indicate there is an association between parents having a college degree and their teenager attending college. Does this mean the parents’ college degree(s) caused the teenager to go to college?^[25]

Smallpox in Boston, 1721

The smallpox data set provides a sample of 6,224 individuals from the year 1721 who were exposed to smallpox in Boston.^[26] Doctors at the time believed that inoculation, which involves exposing a person to the disease in a controlled form, could reduce the likelihood of death.

Each case represents one person with two variables: inoculated and result. The variable inoculated takes two levels: yes or no, indicating whether the person was inoculated or not. The variable result has outcomes lived or died. These data are summarized in Tables [smallpoxContingencyTable] and [smallpoxProbabilityTable].

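Contingency table for the smallpox data set (Table [smallpoxContingencyTable]).
	inoculated = yes	inoculated = no	Total
result = lived	238	5136	5374
result = died	6	844	850
Total	244	5980	6224

Table proportions for the smallpox data, computed by dividing each count by the table total, 6224 (Table [smallpoxProbabilityTable]).
	inoculated = yes	inoculated = no	Total
result = lived	0.0382	0.8252	0.8634
result = died	0.0010	0.1356	0.1366
Total	0.0392	0.9608	1.0000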

[probDiedIfNotInoculated] Write out, in formal notation, the probability a randomly selected person who was not inoculated died from smallpox, and find this probability.^[27]

Determine the probability that an inoculated person died from smallpox. How does this result compare with the result of Guided Practice [probDiedIfNotInoculated]?^[28]

[SmallpoxInoculationObsExpExercise] The people of Boston self-selected whether or not to be inoculated. (a) Is this study observational or was this an experiment? (b) Can we infer any causal connection using these data? (c) What are some potential confounding variables that might influence whether someone lived or died and also affect whether that person was inoculated?^[29]

General multiplication rule

Section [probabilityIndependence] introduced the Multiplication Rule for independent processes. Here we provide the General Multiplication Rule for events that might not be independent.

If $A$ and $B$ represent two outcomes or events, then

${\begin{aligned}P(A{\text{ and }}B)=P(A|B)\times P(B)\end{aligned}}$

It is useful to think of $A$ as the outcome of interest and $B$ as the condition.

This General Multiplication Rule is simply a rearrangement of the definition for conditional probability in Equation ([condProbEq]).

Consider the smallpox data set. Suppose we are given only two pieces of information: 96.08% of residents were not inoculated, and 85.88% of the residents who were not inoculated ended up surviving. How could we compute the probability that a resident was not inoculated and lived? We will compute our answer using the General Multiplication Rule and then verify it using Table [smallpoxProbabilityTable]. We want to determine ${\begin{aligned}P({\text{result = lived and inoculated = no}})\end{aligned}}$ and we are given that ${\begin{aligned}P({\text{result = lived}}\ |\ {\text{inoculated = no}})&=0.8588\\P({\text{inoculated = no}})&=0.9608\end{aligned}}$ Among the 96.08% of people who were not inoculated, 85.88% survived: ${\begin{aligned}P({\text{result = lived and inoculated = no}})=0.8588\times 0.9608=0.8251\end{aligned}}$ This is equivalent to the General Multiplication Rule. We can confirm this probability in Table [smallpoxProbabilityTable] at the intersection of no and lived (with a small rounding error).
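The same check can be made directly from the counts, which avoids the rounding in the percentages. The sketch below is only illustrative: the dictionary layout and variable names are our own, and the counts are those given for the smallpox data (238 and 6 among the inoculated, 5136 and 844 among those not inoculated).

```python
from fractions import Fraction

# Counts keyed by (inoculated, result), taken from the smallpox data.
counts = {
    ("yes", "lived"): 238,
    ("yes", "died"): 6,
    ("no", "lived"): 5136,
    ("no", "died"): 844,
}
total = sum(counts.values())  # 6224 people in all

# Marginal probability of the condition: not inoculated.
n_not_inoculated = counts[("no", "lived")] + counts[("no", "died")]
p_no = Fraction(n_not_inoculated, total)

# Conditional probability of surviving given not inoculated.
p_lived_given_no = Fraction(counts[("no", "lived")], n_not_inoculated)

# General Multiplication Rule: P(A and B) = P(A | B) * P(B).
p_joint = p_lived_given_no * p_no

print(round(float(p_joint), 4))  # 0.8252
```

With exact fractions the product collapses to 5136/6224, the table proportion itself, so the last-digit difference from 0.8251 comes only from rounding the two inputs first.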

Use $P({\text{inoculated = yes}})=0.0392$ and $P({\text{result = lived}}\ |\ {\text{inoculated = yes}})=0.9754$ to determine the probability that a person was both inoculated and lived.^[30]

If 97.54% of the people who were inoculated lived, what proportion of inoculated people must have died?^[31]

Let $A_{1}$ , ..., $A_{k}$ represent all the disjoint outcomes for a variable or process. Then if $B$ is an event, possibly for another variable or process, we have:

${\begin{aligned}P(A_{1}|B)+\cdots +P(A_{k}|B)=1\end{aligned}}$

The rule for complements also holds when an event and its complement are conditioned on the same information:

${\begin{aligned}P(A|B)=1-P(A^{c}|B)\end{aligned}}$

Based on the probabilities computed above, does it appear that inoculation is effective at reducing the risk of death from smallpox?^[32]

Independence considerations in conditional probability

If two events are independent, then knowing the outcome of one should provide no information about the other. We can show this is mathematically true using conditional probabilities.

[condProbOfRollingA1AfterOne1] Let $X$ and $Y$ represent the outcomes of rolling two dice.^[33]

(a) What is the probability that the first die, $X$ , is 1?
(b) What is the probability that both $X$ and $Y$ are 1?
(c) Use the formula for conditional probability to compute $P(Y=1\ |\ X=1)$ .
(d) What is $P(Y=1)$ ? Is this different from the answer from part (c)? Explain.

We can show in Guided Practice [condProbOfRollingA1AfterOne1](c) that the conditioning information has no influence by using the Multiplication Rule for independent processes: ${\begin{aligned}P(Y=1\ |\ X=1)&={\frac {P(Y=1{\text{ and }}X=1)}{P(X=1)}}\\&={\frac {P(Y=1)\times P(X=1)}{P(X=1)}}\\&=P(Y=1)\end{aligned}}$
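The cancellation can also be verified by brute force. The following sketch enumerates all 36 equally likely outcomes of two fair dice and counts the ones in each event; the helper `prob` is a name we introduce for illustration, and the fairness of the dice is an assumption carried over from the earlier examples.

```python
from fractions import Fraction
from itertools import product

# All (x, y) pairs for two fair dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event as (matching outcomes) / (all outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_x1 = prob(lambda o: o[0] == 1)                  # P(X = 1)
p_both = prob(lambda o: o[0] == 1 and o[1] == 1)  # P(X = 1 and Y = 1)
p_y1_given_x1 = p_both / p_x1                     # P(Y = 1 | X = 1)
p_y1 = prob(lambda o: o[1] == 1)                  # P(Y = 1)

print(p_x1, p_both, p_y1_given_x1, p_y1)  # 1/6 1/36 1/6 1/6
```

The conditional and unconditional probabilities agree, which is what independence means.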

Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chances of getting black six times in a row is very small (about $1/64$ ) and puts his paycheck on red. What is wrong with his reasoning?^[34]

Tree diagrams

Tree diagrams are a tool to organize outcomes and probabilities around the structure of the data. They are most useful when two or more processes occur in a sequence and each process is conditioned on its predecessors.

The smallpox data fit this description. We see the population as split by inoculation: yes and no. Following this split, survival rates were observed for each group. This structure is reflected in the tree diagram shown in Figure [smallpoxTreeDiagram]. The first branch for inoculation is said to be the primary branch while the other branches are secondary.

Figure [smallpoxTreeDiagram]: A tree diagram of the smallpox data set.

Tree diagrams are annotated with marginal and conditional probabilities, as shown in Figure [smallpoxTreeDiagram]. This tree diagram splits the smallpox data by inoculation into the yes and no groups with respective marginal probabilities 0.0392 and 0.9608. The secondary branches are conditioned on the first, so we assign conditional probabilities to these branches. For example, the top branch in Figure [smallpoxTreeDiagram] is the probability that result = lived conditioned on the information that inoculated = yes. We may (and usually do) construct joint probabilities at the end of each branch in our tree by multiplying the numbers we come across as we move from left to right. These joint probabilities are computed using the General Multiplication Rule: ${\begin{aligned}&P({\text{inoculated = yes and result = lived}})\\&\quad =P({\text{inoculated = yes}})\times P({\text{result = lived}}\ |\ {\text{inoculated = yes}})\\&\quad =0.0392\times 0.9754=0.0382\end{aligned}}$

Consider the midterm and final for a statistics class. Suppose 13% of students earned an A on the midterm. Of those students who earned an A on the midterm, 47% received an A on the final, and 11% of the students who earned lower than an A on the midterm received an A on the final. You randomly pick up a final exam and notice the student received an A. What is the probability that this student earned an A on the midterm? [exerciseForTreeDiagramOfStudentGettingAOnMidtermGivenThatSheGotAOnFinal] The end-goal is to find $P({\text{midterm = A}}\ |\ {\text{final = A}})$ . To calculate this conditional probability, we need the following probabilities: ${\begin{aligned}P({\text{midterm = A and final = A}})\qquad {\text{and}}\qquad P({\text{final = A}})\end{aligned}}$ However, this information is not provided, and it is not obvious how to calculate these probabilities. Since we aren’t sure how to proceed, it is useful to organize the information into a tree diagram, as shown in Figure [testTree]. When constructing a tree diagram, variables provided with marginal probabilities are often used to create the tree’s primary branches; in this case, the marginal probabilities are provided for midterm grades. The final grades, which correspond to the conditional probabilities provided, will be shown on the secondary branches.

Figure [testTree]: A tree diagram describing the midterm and final variables.

With the tree diagram constructed, we may compute the required probabilities: ${\begin{aligned}&P({\text{midterm = A and final = A}})=0.0611\\&P({\text{final = A}})\\&\quad =P({\text{midterm = other and final = A}})+P({\text{midterm = A and final = A}})\\&\quad =0.0957+0.0611=0.1568\end{aligned}}$ The marginal probability, $P({\text{final = A}})$ , was calculated by adding up all the joint probabilities on the right side of the tree that correspond to final = A. We may now finally take the ratio of the two probabilities: ${\begin{aligned}P({\text{midterm = A}}\ |\ {\text{final = A}})&={\frac {P({\text{midterm = A and final = A}})}{P({\text{final = A}})}}\\&={\frac {0.0611}{0.1568}}=0.3897\end{aligned}}$ The probability the student also earned an A on the midterm is about 0.39.
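The tree computation translates into a few lines of code. In the sketch below, each primary branch carries a marginal probability and each secondary branch a conditional probability; the dictionary names are our own, and the numbers are those given in the example.

```python
# Primary branches: marginal probabilities for the midterm grade.
p_midterm = {"A": 0.13, "other": 1 - 0.13}

# Secondary branches: P(final = A | midterm grade).
p_final_a_given_midterm = {"A": 0.47, "other": 0.11}

# Joint probabilities at the ends of the branches that lead to final = A,
# obtained by multiplying along each path (General Multiplication Rule).
joint = {m: p_midterm[m] * p_final_a_given_midterm[m] for m in p_midterm}

# Marginal probability of final = A: add the joint probabilities.
p_final_a = sum(joint.values())

# Inverted conditional probability.
p_midterm_a_given_final_a = joint["A"] / p_final_a

print(round(joint["A"], 4), round(joint["other"], 4))  # 0.0611 0.0957
print(round(p_final_a, 4))                             # 0.1568
print(round(p_midterm_a_given_final_a, 4))             # 0.3897
```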

After an introductory statistics course, 78% of students can successfully construct tree diagrams. Of those who can construct tree diagrams, 97% passed, while only 57% of those students who could not construct tree diagrams passed. (a) Organize this information into a tree diagram. (b) What is the probability that a randomly selected student passed? (c) Compute the probability a student is able to construct a tree diagram if it is known that she passed.^[35]

Bayes’ Theorem

In many instances, we are given a conditional probability of the form ${\begin{aligned}P({\text{statement about variable 1 }}|{\text{ statement about variable 2}})\end{aligned}}$ but we would really like to know the inverted conditional probability: ${\begin{aligned}P({\text{statement about variable 2 }}|{\text{ statement about variable 1}})\end{aligned}}$ Tree diagrams can be used to find the second conditional probability when given the first. However, sometimes it is not possible to draw the scenario in a tree diagram. In these cases, we can apply a very useful and general formula: Bayes’ Theorem.

We first take a critical look at an example of inverting conditional probabilities where we still apply a tree diagram.

In Canada, about 0.35% of women over 40 will develop breast cancer in any given year. A common screening test for cancer is the mammogram, but this test is not perfect. In about 11% of patients with breast cancer, the test gives a false negative: it indicates a woman does not have breast cancer when she does have breast cancer. Similarly, the test gives a false positive in 7% of patients who do not have breast cancer: it indicates these patients have breast cancer when they actually do not.^[36] If we tested a random woman over 40 for breast cancer using a mammogram and the test came back positive – that is, the test suggested the patient has cancer – what is the probability that the patient actually has breast cancer?

[probabilityOfBreastCancerGivenPositiveTestExample]

Figure [BreastCancerTreeDiagram]: Tree diagram for Example [probabilityOfBreastCancerGivenPositiveTestExample], computing the probability a random patient who tests positive on a mammogram actually has breast cancer.

Notice that we are given sufficient information to quickly compute the probability of testing positive if a woman has breast cancer ( $1.00-0.11=0.89$ ). However, we seek the inverted probability of cancer given a positive test result. (Watch out for the non-intuitive medical language: a positive test result suggests the possible presence of cancer in a mammogram screening.) This inverted probability may be broken into two pieces: ${\begin{aligned}P({\text{has BC}}\ |\ {\text{mammogram}}^{+})={\frac {P({\text{has BC and mammogram}}^{+})}{P({\text{mammogram}}^{+})}}\end{aligned}}$ where “has BC” is an abbreviation for the patient actually having breast cancer and “mammogram $^{+}$ ” means the mammogram screening was positive. A tree diagram is useful for identifying each probability and is shown in Figure [BreastCancerTreeDiagram]. The probability the patient has breast cancer and the mammogram is positive is ${\begin{aligned}P({\text{has BC and mammogram}}^{+})&=P({\text{mammogram}}^{+}\ |\ {\text{has BC}})P({\text{has BC}})\\&=0.89\times 0.0035=0.00312\end{aligned}}$ The probability of a positive test result is the sum of the two corresponding scenarios: ${\begin{aligned}P({\text{mammogram}}^{+})&=P({\text{mammogram}}^{+}{\text{ and has BC}})+P({\text{mammogram}}^{+}{\text{ and no BC}})\\&=P({\text{has BC}})P({\text{mammogram}}^{+}\ |\ {\text{has BC}})\\&\qquad \qquad +P({\text{no BC}})P({\text{mammogram}}^{+}\ |\ {\text{no BC}})\\&=0.0035\times 0.89+0.9965\times 0.07=0.07288\end{aligned}}$ Then if the mammogram screening is positive for a patient, the probability the patient has breast cancer is ${\begin{aligned}P({\text{has BC}}\ |\ {\text{mammogram}}^{+})&={\frac {P({\text{has BC and mammogram}}^{+})}{P({\text{mammogram}}^{+})}}\\&={\frac {0.00312}{0.07288}}\approx 0.0428\end{aligned}}$ That is, even if a patient has a positive mammogram screening, there is still only a 4% chance that she has breast cancer.
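The arithmetic in this example is easy to reproduce. The sketch below uses the three rates stated in the example (0.35% prevalence, 11% false negatives, 7% false positives); the variable names are our own.

```python
p_bc = 0.0035              # P(has BC)
p_pos_given_bc = 1 - 0.11  # P(mammogram+ | has BC) = 0.89
p_pos_given_no_bc = 0.07   # P(mammogram+ | no BC), the false positive rate

# Numerator: P(has BC and mammogram+).
p_bc_and_pos = p_pos_given_bc * p_bc

# Denominator: P(mammogram+), summed over both ways to test positive.
p_pos = p_bc_and_pos + p_pos_given_no_bc * (1 - p_bc)

p_bc_given_pos = p_bc_and_pos / p_pos
print(round(p_bc_given_pos, 3))  # 0.043
```

Carrying unrounded intermediate values gives a result that differs from 0.0428 in the fourth decimal place; either way the probability is a little over 4%.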

Example [probabilityOfBreastCancerGivenPositiveTestExample] highlights why doctors often run more tests regardless of a first positive test result. When a medical condition is rare, a single positive test isn’t generally definitive.

Consider again the last equation of Example [probabilityOfBreastCancerGivenPositiveTestExample]. Using the tree diagram, we can see that the numerator (the top of the fraction) is equal to the following product: ${\begin{aligned}P({\text{has BC and mammogram}}^{+})=P({\text{mammogram}}^{+}\ |\ {\text{has BC}})P({\text{has BC}})\end{aligned}}$ The denominator – the probability the screening was positive – is equal to the sum of probabilities for each positive screening scenario: ${\begin{aligned}P({\text{mammogram}}^{+})=P({\text{mammogram}}^{+}{\text{ and no BC}})+P({\text{mammogram}}^{+}{\text{ and has BC}})\end{aligned}}$ In the example, each of the probabilities on the right side was broken down into a product of a conditional probability and marginal probability using the tree diagram. ${\begin{aligned}P({\text{mammogram}}^{+})&=P({\text{mammogram}}^{+}{\text{ and no BC}})+P({\text{mammogram}}^{+}{\text{ and has BC}})\\&=P({\text{mammogram}}^{+}\ |\ {\text{no BC}})P({\text{no BC}})\\&\qquad \qquad +P({\text{mammogram}}^{+}\ |\ {\text{has BC}})P({\text{has BC}})\end{aligned}}$ We can see an application of Bayes’ Theorem by substituting the resulting probability expressions into the numerator and denominator of the original conditional probability. ${\begin{aligned}&P({\text{has BC}}\ |\ {\text{mammogram}}^{+})\\&\qquad ={\frac {P({\text{mammogram}}^{+}\ |\ {\text{has BC}})P({\text{has BC}})}{P({\text{mammogram}}^{+}\ |\ {\text{no BC}})P({\text{no BC}})+P({\text{mammogram}}^{+}\ |\ {\text{has BC}})P({\text{has BC}})}}\end{aligned}}$

Consider the following conditional probability for variable 1 and variable 2:

${\begin{aligned}P({\text{outcome }}A_{1}{\text{ of variable 1}}\ |\ {\text{outcome }}B{\text{ of variable 2}})\end{aligned}}$

Bayes’ Theorem states that this conditional probability can be identified as the following fraction:

${\begin{aligned}{\frac {P(B|A_{1})P(A_{1})}{P(B|A_{1})P(A_{1})+P(B|A_{2})P(A_{2})+\cdots +P(B|A_{k})P(A_{k})}}\end{aligned}}$

where $A_{2}$ , $A_{3}$ , ..., and $A_{k}$ represent all other possible outcomes of the first variable.

Bayes’ Theorem is just a generalization of what we have done using tree diagrams. The numerator identifies the probability of getting both $A_{1}$ and $B$ . The denominator is the marginal probability of getting $B$ . This bottom component of the fraction appears long and complicated since we have to add up probabilities from all of the different ways to get $B$ . We always completed this step when using tree diagrams. However, we usually did it in a separate step so it didn’t seem as complex.

To apply Bayes’ Theorem correctly, there are two preparatory steps:

First identify the marginal probabilities of each possible outcome of the first variable: $P(A_{1})$ , $P(A_{2})$ , ..., $P(A_{k})$ .
Then identify the probability of the outcome $B$ , conditioned on each possible scenario for the first variable: $P(B|A_{1})$ , $P(B|A_{2})$ , ..., $P(B|A_{k})$ .

Once each of these probabilities is identified, it can be applied directly within the formula.

Drawing a tree diagram makes it easier to understand how two variables are connected. Use Bayes’ Theorem only when there are so many scenarios that drawing a tree diagram would be complex.

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent] Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.^[37]

Here we solve the same problem presented in Guided Practice [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent], except this time we use Bayes’ Theorem. The outcome of interest is whether there is a sporting event (call this $A_{1}$ ), and the condition is that the lot is full ( $B$ ). Let $A_{2}$ represent an academic event and $A_{3}$ represent there being no event on campus. Then the given probabilities can be written as ${\begin{aligned}&P(A_{1})=0.2&&P(A_{2})=0.35&&P(A_{3})=0.45\\&P(B|A_{1})=0.7&&P(B|A_{2})=0.25&&P(B|A_{3})=0.05\end{aligned}}$ Bayes’ Theorem can be used to compute the probability of a sporting event ( $A_{1}$ ) under the condition that the parking lot is full ( $B$ ): ${\begin{aligned}P(A_{1}|B)&={\frac {P(B|A_{1})P(A_{1})}{P(B|A_{1})P(A_{1})+P(B|A_{2})P(A_{2})+P(B|A_{3})P(A_{3})}}\\&={\frac {(0.7)(0.2)}{(0.7)(0.2)+(0.25)(0.35)+(0.05)(0.45)}}\\&=0.56\end{aligned}}$ Based on the information that the garage is full, there is a 56% probability that a sporting event is being held on campus that evening.
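Because the formula has the same shape for any number of scenarios, it is natural to wrap it in a small function. The sketch below is one possible way to do so; `bayes_posterior` and the dictionary keys are hypothetical names of our own, and the inputs are the probabilities from the parking garage example.

```python
def bayes_posterior(priors: dict, likelihoods: dict, target: str) -> float:
    """Return P(A_target | B) using Bayes' Theorem.

    priors[k]      is the marginal probability P(A_k)
    likelihoods[k] is the conditional probability P(B | A_k)
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("the marginal probabilities P(A_k) must sum to 1")
    # Denominator: marginal probability of B, summed over every scenario.
    p_b = sum(priors[k] * likelihoods[k] for k in priors)
    # Numerator: joint probability of the target scenario and B.
    return priors[target] * likelihoods[target] / p_b


priors = {"sporting": 0.20, "academic": 0.35, "none": 0.45}
likelihoods = {"sporting": 0.70, "academic": 0.25, "none": 0.05}

print(round(bayes_posterior(priors, likelihoods, "sporting"), 2))  # 0.56
```

Calling the function once per scenario yields conditional probabilities that add up to 1, as they must when the scenarios are disjoint and cover all possibilities.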

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsAnAcademicEvent] Use the information in the previous exercise and example to verify the probability that there is an academic event conditioned on the parking lot being full is 0.35.^[38]

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsNoEvent] In Guided Practice [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent] and [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsAnAcademicEvent], you found that if the parking lot is full, the probability there is a sporting event is 0.56 and the probability there is an academic event is 0.35. Using this information, compute $P($ no event $|$ the lot is full $)$ .^[39]

The last several exercises offered a way to update our belief about whether there is a sporting event, academic event, or no event going on at the school based on the information that the parking lot was full. This strategy of updating beliefs using Bayes’ Theorem is actually the foundation of an entire section of statistics called Bayesian statistics. While Bayesian statistics is very important and useful, we will not have time to cover much more of it in this book.

Sampling from a small population (special topic)

Professors sometimes select a student at random to answer a question. If each student has an equal chance of being selected and there are 15 people in your class, what is the chance that she will pick you for the next question? If there are 15 people to ask and none are skipping class, then the probability is $1/15$ , or about $0.067$ .

If the professor asks 3 questions, what is the probability that you will not be selected? Assume that she will not pick the same person twice in a given lecture.[3woRep] For the first question, she will pick someone else with probability $14/15$ . When she asks the second question, she only has 14 people who have not yet been asked. Thus, if you were not picked on the first question, the probability you are again not picked is $13/14$ . Similarly, the probability you are again not picked on the third question is $12/13$ , and the probability of not being picked for any of the three questions is ${\begin{aligned}&P({\text{not picked in 3 questions}})\\&\quad =P({\text{Q1}}={\text{not\_picked}},\ {\text{Q2}}={\text{not\_picked}},\ {\text{Q3}}={\text{not\_picked}})\\&\quad ={\frac {14}{15}}\times {\frac {13}{14}}\times {\frac {12}{13}}={\frac {12}{15}}=0.80\end{aligned}}$

What rule permitted us to multiply the probabilities in Example [3woRep]?^[40]

Suppose the professor randomly picks without regard to who she already selected, i.e. students can be picked more than once. What is the probability that you will not be picked for any of the three questions?[3wRep] Each pick is independent, and the probability of not being picked for any individual question is $14/15$ . Thus, we can use the Multiplication Rule for independent processes. ${\begin{aligned}&P({\text{not picked in 3 questions}})\\&\quad =P({\text{Q1}}={\text{not\_picked}},\ {\text{Q2}}={\text{not\_picked}},\ {\text{Q3}}={\text{not\_picked}})\\&\quad ={\frac {14}{15}}\times {\frac {14}{15}}\times {\frac {14}{15}}=0.813\end{aligned}}$ You have a slightly higher chance of not being picked compared to when she picked a new person for each question. However, you now may be picked more than once.
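The two sampling schemes differ only in how many students remain eligible at each question, which the sketch below makes explicit. The function `p_never_picked` and its parameters are our own illustrative names; the class size of 15 and the three questions come from the examples.

```python
from fractions import Fraction


def p_never_picked(class_size: int, questions: int, replacement: bool) -> Fraction:
    """Probability that one particular student is never selected."""
    p = Fraction(1)
    for q in range(questions):
        # With replacement the pool never shrinks; without replacement it
        # loses one (other) student after each question.
        eligible = class_size if replacement else class_size - q
        p *= Fraction(eligible - 1, eligible)
    return p


without_rep = p_never_picked(15, 3, replacement=False)
with_rep = p_never_picked(15, 3, replacement=True)

print(without_rep, float(without_rep))  # 4/5 0.8
print(round(float(with_rep), 3))        # 0.813
```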

Under the setup of Example [3wRep], what is the probability of being picked to answer all three questions?^[41]

If we sample from a small population without replacement, we no longer have independence between our observations. In Example [3woRep], the probability of not being picked for the second question was conditioned on the event that you were not picked for the first question. In Example [3wRep], the professor sampled her students with replacement: she repeatedly sampled the entire class without regard to who she already picked.

[raffleOf30TicketsWWOReplacement] Your department is holding a raffle. They sell 30 tickets and offer seven prizes. (a) They place the tickets in a hat and draw one for each prize. The tickets are sampled without replacement, i.e. the selected tickets are not placed back in the hat. What is the probability of winning a prize if you buy one ticket? (b) What if the tickets are sampled with replacement?^[42]

[followUpToRaffleOf30TicketsWWOReplacement] Compare your answers in Guided Practice [raffleOf30TicketsWWOReplacement]. How much influence does the sampling method have on your chances of winning a prize?^[43]

Had we repeated Guided Practice [raffleOf30TicketsWWOReplacement] with 300 tickets instead of 30, we would have found something interesting: the results would be nearly identical. The probability would be 0.0233 without replacement and 0.0231 with replacement. When the sample size is only a small fraction of the population (under 10%), observations are nearly independent even when sampling without replacement.
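The two figures quoted for 300 tickets can be reproduced with the short sketch below. The function names are our own. It assumes you hold one ticket and that every ticket is equally likely to be drawn; without replacement, your ticket is among the drawn ones with probability prizes/tickets, and with replacement you win unless every draw misses you.

```python
def p_win_without_replacement(tickets: int, prizes: int) -> float:
    """One ticket wins if it is among the `prizes` distinct tickets drawn."""
    return prizes / tickets


def p_win_with_replacement(tickets: int, prizes: int) -> float:
    """One ticket wins unless all `prizes` independent draws miss it."""
    return 1 - (1 - 1 / tickets) ** prizes


print(round(p_win_without_replacement(300, 7), 4))  # 0.0233
print(round(p_win_with_replacement(300, 7), 4))     # 0.0231
```

Calling the same functions with 30 tickets shows a noticeably larger gap between the two schemes, since seven draws are then a large fraction of the population.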

Random variables (special topic)

Two books are assigned for a statistics class: a textbook and its corresponding study guide. The university bookstore determined 20% of enrolled students do not buy either book, 55% buy the textbook only, and 25% buy both books, and these percentages are relatively constant from one term to another. If there are 100 students enrolled, how many books should the bookstore expect to sell to this class?[bookStoreSales] Around 20 students will not buy either book (0 books total), about 55 will buy one book (55 books total), and approximately 25 will buy two books (totaling 50 books for these 25 students). The bookstore should expect to sell about 105 books for this class.

Would you be surprised if the bookstore sold slightly more or less than 105 books?^[44]

The textbook costs $137 and the study guide $33. How much revenue should the bookstore expect from this class of 100 students?[bookStoreRev] About 55 students will just buy a textbook, providing revenue of ${\begin{aligned}\$137\times 55=\$7,535\end{aligned}}$ The roughly 25 students who buy both the textbook and the study guide would pay a total of ${\begin{aligned}(\$137+\$33)\times 25=\$170\times 25=\$4,250\end{aligned}}$ Thus, the bookstore should expect to generate about $\$7,535+\$4,250=\$11,785$ from these 100 students for this one class. However, there might be some sampling variability so the actual amount may differ by a little bit.

Figure [bookCostDist]: Probability distribution for the bookstore’s revenue from a single student. The distribution balances on a triangle representing the average revenue per student.

What is the average revenue per student for this course?[revFromStudent] The expected total revenue is $11,785, and there are 100 students. Therefore the expected revenue per student is $\$11,785/100=\$117.85$ .

Expectation

We call a variable or process with a numerical outcome a random variable, and we usually represent this random variable with a capital letter such as $X$ , $Y$ , or $Z$ . The amount of money a single student will spend on her statistics books is a random variable, and we represent it by $X$ .

Random variable: A random process or variable with a numerical outcome.

The possible outcomes of $X$ are labeled with a corresponding lower case letter $x$ and subscripts. For example, we write $x_{1}=\$0$ , $x_{2}=\$137$ , and $x_{3}=\$170$ , which occur with probabilities $0.20$ , $0.55$ , and $0.25$ . The distribution of $X$ is summarized in Figure [bookCostDist] and Table [statSpendDist].

The probability distribution for the random variable $X$ , representing the bookstore’s revenue from a single student.
$i$	1	2	3	Total
$x_{i}$	$0	$137	$170	–
$P(X=x_{i})$	0.20	0.55	0.25	1.00

We computed the average outcome of $X$ as $117.85 in Example [revFromStudent]. We call this average the expected value of $X$ , denoted by $E(X)$ . The expected value of a random variable is computed by adding each outcome weighted by its probability: ${\begin{aligned}E(X)&=0\times P(X=0)+137\times P(X=137)+170\times P(X=170)\\&=0\times 0.20+137\times 0.55+170\times 0.25=117.85\end{aligned}}$

If $X$ takes outcomes $x_{1}$ , ..., $x_{k}$ with probabilities $P(X=x_{1})$ , ..., $P(X=x_{k})$ , the expected value of $X$ is the sum of each outcome multiplied by its corresponding probability: ${\begin{aligned}E(X)&=x_{1}\times P(X=x_{1})+\cdots +x_{k}\times P(X=x_{k})\\&=\sum _{i=1}^{k}x_{i}P(X=x_{i})\end{aligned}}$ The Greek letter $\mu$ may be used in place of the notation $E(X)$ .

The expected value for a random variable represents the average outcome. For example, $E(X)=117.85$ represents the average amount the bookstore expects to make from a single student, which we could also write as $\mu =117.85$ .
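The weighted sum is short enough to write as a one-line function. In this minimal sketch the distribution is stored as a dictionary from outcome to probability; that representation and the name `expected_value` are our own choices, and the numbers are the bookstore revenue distribution for $X$ .

```python
def expected_value(dist: dict) -> float:
    """E(X) for a discrete distribution given as {outcome: probability}."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Add each outcome weighted by its probability.
    return sum(x * p for x, p in dist.items())


# Revenue from a single student: $0, $137, or $170.
revenue = {0: 0.20, 137: 0.55, 170: 0.25}
print(round(expected_value(revenue), 2))  # 117.85
```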

It is also possible to compute the expected value of a continuous random variable (see Section [contDist]). However, it requires a little calculus and we save it for a later class.^[45]

In physics, the expectation holds the same meaning as the center of gravity. The distribution can be represented by a series of weights at each outcome, and the mean represents the balancing point. This is represented in Figures [bookCostDist] and [bookWts]. The idea of a center of gravity also expands to continuous probability distributions. Figure [contBalance] shows a continuous probability distribution balanced atop a wedge placed at the mean.

Figure [bookWts]: A weight system representing the probability distribution for $X$ . The string holds the distribution at the mean to keep the system balanced.

Figure [contBalance]: A continuous distribution can also be balanced at its mean.

Variability in random variables

Suppose you ran the university bookstore. Besides how much revenue you expect to generate, you might also want to know the volatility (variability) in your revenue.

The variance and standard deviation can be used to describe the variability of a random variable. Section [variability] introduced a method for finding the variance and standard deviation for a data set. We first computed deviations from the mean ( $x_{i}-\mu$ ), squared those deviations, and took an average to get the variance. In the case of a random variable, we again compute squared deviations. However, we take their sum weighted by their corresponding probabilities, just like we did for the expectation. This weighted sum of squared deviations equals the variance, and we calculate the standard deviation by taking the square root of the variance, just as we did in Section [variability].

If $X$ takes outcomes $x_{1}$, ..., $x_{k}$ with probabilities $P(X=x_{1})$, ..., $P(X=x_{k})$ and expected value $\mu =E(X)$, then the variance of $X$, denoted by $Var(X)$ or the symbol $\sigma ^{2}$, is ${\begin{aligned}\sigma ^{2}&=(x_{1}-\mu )^{2}\times P(X=x_{1})+\cdots +(x_{k}-\mu )^{2}\times P(X=x_{k})\\&=\sum _{j=1}^{k}(x_{j}-\mu )^{2}P(X=x_{j})\end{aligned}}$ The standard deviation of $X$, labeled $\sigma$, is the square root of the variance.

Compute the expected value, variance, and standard deviation of $X$ , the revenue of a single statistics student for the bookstore. It is useful to construct a table that holds computations for each outcome separately, then add up the results.

$i$	1	2	3	Total
$x_{i}$	$0	$137	$170
$P(X=x_{i})$	0.20	0.55	0.25
$x_{i}\times P(X=x_{i})$	0	75.35	42.50	117.85

Thus, the expected value is $\mu =117.85$ , which we computed earlier. The variance can be constructed by extending this table:

$i$	1	2	3	Total
$x_{i}$	$0	$137	$170
$P(X=x_{i})$	0.20	0.55	0.25
$x_{i}\times P(X=x_{i})$	0	75.35	42.50	117.85
$x_{i}-\mu$	-117.85	19.15	52.15
$(x_{i}-\mu )^{2}$	13888.62	366.72	2719.62
$(x_{i}-\mu )^{2}\times P(X=x_{i})$	2777.7	201.7	679.9	3659.3

The variance of $X$ is $\sigma ^{2}=3659.3$ , which means the standard deviation is $\sigma ={\sqrt {3659.3}}=\$60.49$ .
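The table computation above can be reproduced in a few lines. This is a minimal sketch: the helper name `discrete_summary` is an assumption made for illustration, and it simply applies the weighted-sum definitions of the mean and variance to the bookstore distribution.

```python
import math

def discrete_summary(xs, ps):
    """Return (mean, variance, standard deviation) of a discrete random variable."""
    if abs(sum(ps) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(xs, ps))                 # expected value
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))    # weighted squared deviations
    return mu, var, math.sqrt(var)

mu, var, sd = discrete_summary([0, 137, 170], [0.20, 0.55, 0.25])
print(round(mu, 2), round(var, 1), round(sd, 2))   # 117.85 3659.3 60.49
```

The same function accepts any finite list of outcomes and probabilities, so each row of the table corresponds to one term of the two sums.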

The bookstore also offers a chemistry textbook for $159 and a book supplement for $41. From past experience, they know about 25% of chemistry students just buy the textbook while 60% buy both the textbook and supplement.^[46]

(a) What proportion of students don’t buy either book? Assume no students buy the supplement without the textbook.
(b) Let $Y$ represent the revenue from a single student. Write out the probability distribution of $Y$, i.e. a table for each outcome and its associated probability.
(c) Compute the expected revenue from a single chemistry student.
(d) Find the standard deviation to describe the variability associated with the revenue from a single student.

Linear combinations of random variables

So far, we have thought of each variable as being a complete story in and of itself. Sometimes it is more appropriate to use a combination of variables. For instance, the amount of time a person spends commuting to work each week can be broken down into several daily commutes. Similarly, the total gain or loss in a stock portfolio is the sum of the gains and losses in its components.

John travels to work five days a week. We will use $X_{1}$ to represent his travel time on Monday, $X_{2}$ to represent his travel time on Tuesday, and so on. Write an equation using $X_{1}$, ..., $X_{5}$ that represents his travel time for the week, denoted by $W$. His total weekly travel time is the sum of the five daily values: $W=X_{1}+X_{2}+X_{3}+X_{4}+X_{5}$. Breaking the weekly travel time $W$ into pieces provides a framework for understanding each source of randomness and is useful for modeling $W$.

It takes John an average of 18 minutes each day to commute to work. What would you expect his average commute time to be for the week? We were told that the average (i.e. expected value) of the commute time is 18 minutes per day: $E(X_{i})=18$. To get the expected time for the sum of the five days, we can add up the expected time for each individual day: ${\begin{aligned}E(W)&=E(X_{1}+X_{2}+X_{3}+X_{4}+X_{5})\\&=E(X_{1})+E(X_{2})+E(X_{3})+E(X_{4})+E(X_{5})\\&=18+18+18+18+18=90{\text{ minutes}}\end{aligned}}$ The expectation of the total time is equal to the sum of the expected individual times. More generally, the expectation of a sum of random variables is always the sum of the expectations of the individual random variables.

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction] Elena is selling a TV at a cash auction and also intends to buy a toaster oven in the auction. If $X$ represents the profit for selling the TV and $Y$ represents the cost of the toaster oven, write an equation that represents the net change in Elena’s cash.^[47]

Based on past auctions, Elena figures she should expect to make about $175 on the TV and pay about $23 for the toaster oven. In total, how much should she expect to make or spend?^[48]

[explainWhyThereIsUncertaintyInTheSum] Would you be surprised if John’s weekly commute wasn’t exactly 90 minutes or if Elena didn’t make exactly $152? Explain.^[49]

Two important concepts concerning combinations of random variables have so far been introduced. First, a final value can sometimes be described as the sum of its parts in an equation. Second, intuition suggests that putting the individual average values into this equation gives the average value we would expect in total. This second point needs clarification – it is guaranteed to be true in what are called linear combinations of random variables.

A linear combination of two random variables $X$ and $Y$ is a fancy phrase to describe a combination $aX+bY$ where $a$ and $b$ are some fixed and known numbers. For John’s commute time, there were five random variables, one for each work day, and each random variable could be written as having a fixed coefficient of 1: $1X_{1}+1X_{2}+1X_{3}+1X_{4}+1X_{5}$. For Elena’s net gain or loss, the $X$ random variable had a coefficient of +1 and the $Y$ random variable had a coefficient of -1.

When considering the average of a linear combination of random variables, it is safe to plug in the mean of each random variable and then compute the final result. For a few examples of nonlinear combinations of random variables – cases where we cannot simply plug in the means – see the footnote.^[50]

If $X$ and $Y$ are random variables, then a linear combination of the random variables is given by ${\begin{aligned}aX+bY\end{aligned}}$ where $a$ and $b$ are some fixed numbers. To compute the average value of a linear combination of random variables, plug in the average of each individual random variable and compute the result: ${\begin{aligned}a\times E(X)+b\times E(Y)\end{aligned}}$ Recall that the expected value is the same as the mean, e.g. $E(X)=\mu _{X}$.
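The plug-in rule for expectations extends to any number of terms. The sketch below applies it to John’s five commutes and to Elena’s auction, using the averages stated above. The function name `expected_linear_combination` is an illustrative assumption.

```python
def expected_linear_combination(coeffs, means):
    """E(a1*X1 + ... + an*Xn) = a1*E(X1) + ... + an*E(Xn)."""
    if len(coeffs) != len(means):
        raise ValueError("need one coefficient per random variable")
    return sum(a * m for a, m in zip(coeffs, means))

# John's week: five daily commutes, each with coefficient 1 and mean 18 minutes.
john_week = expected_linear_combination([1] * 5, [18] * 5)

# Elena's auction: X - Y, with E(X) = 175 and E(Y) = 23.
elena_net = expected_linear_combination([1, -1], [175, 23])

print(john_week, elena_net)   # 90 152
```

Nothing in this calculation requires the random variables to be independent; only the means and the fixed coefficients enter.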

Leonard has invested $6000 in Google Inc. (stock ticker: GOOG) and $2000 in Exxon Mobil Corp. (XOM). If $X$ represents the change in Google’s stock next month and $Y$ represents the change in Exxon Mobil stock next month, write an equation that describes how much money will be made or lost in Leonard’s stocks for the month. For simplicity, we will suppose $X$ and $Y$ are not in percents but are in decimal form (e.g. if Google’s stock increases 1%, then $X=0.01$ ; or if it loses 1%, then $X=-0.01$ ). Then we can write an equation for Leonard’s gain as ${\begin{aligned}\$6000\times X+\$2000\times Y\end{aligned}}$ If we plug in the change in the stock value for $X$ and $Y$ , this equation gives the change in value of Leonard’s stock portfolio for the month. A positive value represents a gain, and a negative value represents a loss.

[expectedChangeInLeonardsStockPortfolio] Suppose Google and Exxon Mobil stocks have recently been rising 2.1% and 0.4% per month, respectively. Compute the expected change in Leonard’s stock portfolio for next month.^[51]

You should have found that Leonard expects a positive gain in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, would you be surprised if he actually had a loss this month?^[52]

Variability in linear combinations of random variables

Quantifying the average outcome from a linear combination of random variables is helpful, but it is also important to have some sense of the uncertainty associated with the total outcome of that combination of random variables. The expected net gain or loss of Leonard’s stock portfolio was considered in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, there was no quantitative discussion of the volatility of this portfolio. For instance, while the average monthly gain might be about $134 according to the data, that gain is not guaranteed. Figure [changeInLeonardsStockPortfolioFor36Months] shows the monthly changes in a portfolio like Leonard’s during the 36 months from 2009 to 2011. The gains and losses vary widely, and quantifying these fluctuations is important when investing in stocks.

Figure [changeInLeonardsStockPortfolioFor36Months]: The change in a portfolio like Leonard’s for the 36 months from 2009 to 2011, where $6000 is in Google’s stock and $2000 is in Exxon Mobil’s.

Just as we have done in many previous cases, we use the variance and standard deviation to describe the uncertainty associated with Leonard’s monthly returns. To do so, the variances of each stock’s monthly return will be useful, and these are shown in Table [sumStatOfGOOGXOM]. The stocks’ returns are nearly independent.

The mean, standard deviation, and variance of the GOOG and XOM stocks. These statistics were estimated from historical stock data, so notation used for sample statistics has been used.
	Mean ( ${\bar {x}}$ )	Standard deviation ( $s$ )	Variance ( $s^{2}$ )
GOOG	0.0210	0.0846	0.0072
XOM	0.0038	0.0519	0.0027

Here we use an equation from probability theory to describe the uncertainty of Leonard’s monthly returns; we leave the proof of this method to a dedicated probability course. The variance of a linear combination of random variables can be computed by plugging in the variances of the individual random variables and squaring the coefficients of the random variables: ${\begin{aligned}Var(aX+bY)=a^{2}\times Var(X)+b^{2}\times Var(Y)\end{aligned}}$ It is important to note that this equality assumes the random variables are independent; if independence doesn’t hold, then more advanced methods are necessary. This equation can be used to compute the variance of Leonard’s monthly return: ${\begin{aligned}Var(6000\times X+2000\times Y)&=6000^{2}\times Var(X)+2000^{2}\times Var(Y)\\&=36,000,000\times 0.0072+4,000,000\times 0.0027\\&=270,000\end{aligned}}$ The standard deviation is computed as the square root of the variance: ${\sqrt {270,000}}=\$520$ . While an average monthly return of $134 on an $8000 investment is nothing to scoff at, the monthly returns are so volatile that Leonard should not expect this income to be very stable.

The variance of a linear combination of random variables may be computed by squaring the constants, substituting in the variances for the random variables, and computing the result: ${\begin{aligned}Var(aX+bY)=a^{2}\times Var(X)+b^{2}\times Var(Y)\end{aligned}}$ This equation is valid as long as the random variables are independent of each other. The standard deviation of the linear combination may be found by taking the square root of the variance.

Suppose John’s daily commute has a standard deviation of 4 minutes. What is the uncertainty in his total commute time for the week? [sdOfJohnsCommuteWeeklyTime] The expression for John’s commute time was ${\begin{aligned}X_{1}+X_{2}+X_{3}+X_{4}+X_{5}\end{aligned}}$ Each coefficient is 1, and the variance of each day’s time is $4^{2}=16$ . Thus, the variance of the total weekly commute time is ${\begin{aligned}&{\text{variance }}=1^{2}\times 16+1^{2}\times 16+1^{2}\times 16+1^{2}\times 16+1^{2}\times 16=5\times 16=80\\&{\text{standard deviation }}={\sqrt {\text{variance}}}={\sqrt {80}}=8.94\end{aligned}}$ The standard deviation for John’s weekly work commute time is about 9 minutes.
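Both variance calculations in this section follow the same recipe: square each coefficient, multiply by the matching variance, add, and take a square root. The sketch below reproduces Leonard’s and John’s numbers. The function name `spread_of_linear_combination` is an illustrative assumption, and the result is only valid when the random variables are independent.

```python
import math

def spread_of_linear_combination(coeffs, variances):
    """Variance and SD of a1*X1 + ... + an*Xn, assuming the Xi are independent."""
    if len(coeffs) != len(variances):
        raise ValueError("need one coefficient per random variable")
    var = sum(a ** 2 * v for a, v in zip(coeffs, variances))
    return var, math.sqrt(var)

# Leonard: 6000*X + 2000*Y with Var(X) = 0.0072 and Var(Y) = 0.0027.
leonard_var, leonard_sd = spread_of_linear_combination([6000, 2000], [0.0072, 0.0027])

# John: five daily commutes, each with SD 4 minutes (variance 16).
john_var, john_sd = spread_of_linear_combination([1] * 5, [4 ** 2] * 5)

print(round(leonard_var), round(leonard_sd))   # 270000 520
print(john_var, round(john_sd, 2))             # 80 8.94
```

Note that John’s weekly standard deviation is about 8.94 minutes, not 5 times 4 = 20 minutes: variances add, standard deviations do not.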

The computation in Example [sdOfJohnsCommuteWeeklyTime] relied on an important assumption: the commute time for each day is independent of the time on other days of that week. Do you think this is valid? Explain.^[53]

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability] Consider Elena’s two auctions from Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction]. Suppose these auctions are approximately independent and the variability in auction prices associated with the TV and toaster oven can be described using standard deviations of $25 and $8. Compute the standard deviation of Elena’s net gain.^[54]

Consider again Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability]. The negative coefficient for $Y$ in the linear combination was eliminated when we squared the coefficients. This generally holds true: negatives in a linear combination will have no impact on the variability computed for a linear combination, but they do impact the expected value computations.

Continuous distributions (special topic)

Figure [fdicHistograms] shows a few different hollow histograms of the variable height for 3 million US adults from the mid-90’s.^[55] How does changing the number of bins allow you to make different interpretations of the data? [usHeights] Adding more bins provides greater detail. This sample is extremely large, which is why much smaller bins still work well. Usually we do not use so many bins with smaller sample sizes since small counts per bin mean the bin heights are very volatile.

Figure [fdicHistograms]: Four hollow histograms of US adults’ heights with varying bin widths.

What proportion of the sample is between 180 cm and 185 cm tall (about 5'11" to 6'1")? [contDistProb] We can add up the heights of the bins in the range 180 cm to 185 cm and divide by the sample size. For instance, this can be done with the two shaded bins shown in Figure [usHeightsHist180185]. The two bins in this region have counts of 195,307 and 156,239 people, resulting in the following estimate of the probability: ${\begin{aligned}{\frac {195307+156239}{\text{3,000,000}}}=0.1172\end{aligned}}$ This fraction is the same as the proportion of the histogram’s area that falls in the range 180 to 185 cm.

Figure [usHeightsHist180185]: A histogram with bin sizes of 2.5 cm. The shaded region represents individuals with heights between 180 and 185 cm.
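The histogram-based estimate amounts to summing the counts of the bins inside the range and dividing by the sample size. A minimal sketch follows; the function name and the `(left_edge, right_edge, count)` layout are assumptions made for illustration, and only the two shaded bins are listed because the text does not report the other counts.

```python
def proportion_in_range(bins, low, high, sample_size):
    """Share of the sample in bins that lie entirely inside [low, high].

    bins: list of (left_edge, right_edge, count) tuples.
    """
    inside = sum(count for left, right, count in bins
                 if left >= low and right <= high)
    return inside / sample_size

# The two shaded 2.5 cm bins. Assigning 195,307 to the lower bin is an
# assumption; the text gives only the two counts, and the sum is unaffected.
shaded_bins = [(180.0, 182.5, 195_307), (182.5, 185.0, 156_239)]

estimate = proportion_in_range(shaded_bins, 180, 185, 3_000_000)
print(round(estimate, 4))   # 0.1172
```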

From histograms to continuous distributions

Examine the transition from a boxy hollow histogram in the top-left of Figure [fdicHistograms] to the much smoother plot in the lower-right. In this last plot, the bins are so slim that the hollow histogram is starting to resemble a smooth curve. This suggests the population height as a continuous numerical variable might best be explained by a curve that represents the outline of extremely slim bins.

This smooth curve represents a probability density function (also called a density or distribution), and such a curve is shown in Figure [fdicHeightContDist] overlaid on a histogram of the sample. A density has a special property: the total area under the density’s curve is 1.

Figure [fdicHeightContDist]: The continuous probability distribution of heights for US adults.

Probabilities from continuous distributions

We computed the proportion of individuals with heights 180 to 185 cm in Example [contDistProb] as a fraction: ${\begin{aligned}{\frac {\text{number of people between 180 and 185}}{\text{total sample size}}}\end{aligned}}$ We found the number of people with heights between 180 and 185 cm by determining the fraction of the histogram’s area in this region. Similarly, we can use the area in the shaded region under the curve to find a probability (with the help of a computer): ${\begin{aligned}P({\text{height between 180 and 185}})={\text{area between 180 and 185}}=0.1157\end{aligned}}$ The probability that a randomly selected person is between 180 and 185 cm is 0.1157. This is very close to the estimate from Example [contDistProb]: 0.1172.

Figure [fdicHeightContDistFilled]: Density for heights in the US adult population with the area between 180 and 185 cm shaded. Compare this plot with Figure [usHeightsHist180185].

Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157.^[56]

(a) What is the probability that all three are between 180 and 185 cm tall?
(b) What is the probability that none are between 180 and 185 cm?

What is the probability that a randomly selected person is exactly 180 cm? Assume you can measure perfectly. [probabilityOfExactly180cm] This probability is zero. A person might be close to 180 cm, but not exactly 180 cm tall. This also makes sense with the definition of probability as area; there is no area captured between 180 cm and 180 cm.

Suppose a person’s height is rounded to the nearest centimeter. Is there a chance that a random person’s measured height will be 180 cm?^[57]
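The idea that probability is area, so that a single point carries no probability, can be illustrated numerically. The sketch below does not use the actual height density, which is not given in the text. It substitutes a hypothetical normal curve with an assumed mean of 170 cm and standard deviation of 10 cm, chosen only for illustration, so the printed areas will not match the 0.1157 reported above.

```python
import math

# Hypothetical stand-in for the height density: a normal curve with an assumed
# mean of 170 cm and standard deviation of 10 cm (NOT the values behind the text).
MEAN, SD = 170.0, 10.0

def cdf(x):
    """Area under the assumed density to the left of x."""
    return 0.5 * (1.0 + math.erf((x - MEAN) / (SD * math.sqrt(2.0))))

def area_between(low, high):
    """Probability that a height falls in [low, high] under the assumed density."""
    return cdf(high) - cdf(low)

# Shrink an interval centered at 180 cm: the area falls toward zero.
for half_width in (2.5, 0.5, 0.05, 0.0):
    print(half_width, area_between(180 - half_width, 180 + half_width))
```

With a half-width of 0.5, the interval runs from 179.5 to 180.5 cm, which corresponds to heights that round to 180 cm and has positive area. With a half-width of 0, the interval collapses to the single point 180 and the area is exactly zero.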

↑ Here are four examples. (i) Whether someone gets sick in the next month or not is an apparently random process with outcomes and . (ii) We can generate a random process by randomly picking a person and measuring that person’s height. The outcome of this process will be a positive number. (iii) Whether the stock market goes up or down next week is a seemingly random process with possible outcomes , , and . Alternatively, we could have used the percent change in the stock market as a numerical outcome. (iv) Whether your roommate cleans her dishes tonight probably seems like a random process with possible outcomes and .
↑ (a) The random process is a die roll, and at most one of these outcomes can come up. This means they are disjoint outcomes. (b) $P($ or or $)=P($ $)+P($ $)+P($ $)={\frac {1}{6}}+{\frac {1}{6}}+{\frac {1}{6}}={\frac {3}{6}}={\frac {1}{2}}$
↑ (a) Yes. Each email is categorized in only one level of . (b) Small: ${\frac {2827}{3921}}=0.721$ . Big: ${\frac {545}{3921}}=0.139$ . (c) $P($ or $)=P($ $)+P($ $)=0.721+0.139=0.860$ .
↑ (a) $P(A)=P($ or $)=P($ $)+P($ $)={\frac {1}{6}}+{\frac {1}{6}}={\frac {2}{6}}={\frac {1}{3}}$ . (b) Similarly, $P(B)=1/3$ .
↑ (a) Outcomes and . (b) Yes, events $B$ and $D$ are disjoint because they share no outcomes. (c) The events $A$ and $D$ share an outcome in common, , and so are not disjoint.
↑ Since $B$ and $D$ are disjoint events, use the Addition Rule: $P(B$ or $D)=P(B)+P(D)={\frac {1}{3}}+{\frac {1}{3}}={\frac {2}{3}}$ .
↑ The 52 cards are split into four : $\clubsuit$ (club), $\diamondsuit$ (diamond), $\heartsuit$ (heart), $\spadesuit$ (spade). Each suit has its 13 cards labeled: , , ..., , (jack), (queen), (king), and (ace). Thus, each card is a unique combination of a suit and a label, e.g. and . The 12 cards represented by the jacks, queens, and kings are called . The cards that are $\diamondsuit$ or $\heartsuit$ are typically colored red while the other two suits are typically colored black.
↑ (a) There are 52 cards and 13 diamonds. If the cards are thoroughly shuffled, each card has an equal chance of being drawn, so the probability that a randomly selected card is a diamond is $P(\diamondsuit )={\frac {13}{52}}=0.250$. (b) Likewise, there are 12 face cards, so $P($face card$)={\frac {12}{52}}={\frac {3}{13}}=0.231$.
↑ (a) If $A$ and $B$ are disjoint, $A$ and $B$ can never occur simultaneously. (b) If $A$ and $B$ are disjoint, then the last term of Equation ([generalAdditionRule]) is 0 (see part (a)) and we are left with the Addition Rule for disjoint events.
↑ Both the counts and corresponding probabilities (e.g. $2659/3921=0.678$ ) are shown. Notice that the number of emails represented in the left circle corresponds to $2659+168=2827$, and the number represented in the right circle is $168+199=367$.
↑ (a) The solution is represented by the intersection of the two circles: 0.043. (b) This is the sum of the three disjoint probabilities shown in the circles: $0.678+0.043+0.051=0.772$ .
↑ The probabilities of (a) do not sum to 1. The second probability in (b) is negative. This leaves (c), which sure enough satisfies the requirements of a distribution. One of the three was said to be the actual distribution of US household incomes, so it must be (c).
↑ (a) The outcomes are disjoint and each has probability $1/6$ , so the total probability is $4/6=2/3$ . (b) We can also see that $P(D)={\frac {1}{6}}+{\frac {1}{6}}=1/3$ . Since $D$ and $D^{c}$ are disjoint, $P(D)+P(D^{c})=1$ .
↑ Brief solutions: (a) $A^{c}=\{$ , , , $\}$ and $B^{c}=\{$ , , , $\}$ . (b) Noting that each outcome is disjoint, add the individual outcome probabilities to get $P(A^{c})=2/3$ and $P(B^{c})=2/3$ . (c) $A$ and $A^{c}$ are disjoint, and the same is true of $B$ and $B^{c}$ . Therefore, $P(A)+P(A^{c})=1$ and $P(B)+P(B^{c})=1$ .
↑ (a) The complement of $A$ : when the total is equal to . (b) $P(A^{c})=1/36$ . (c) Use the probability of the complement from part (b), $P(A^{c})=1/36$ , and Equation ([complement]): $P($ less than $)=1-P($ $)=1-1/36=35/36$ .
↑ (a) First find $P($ $)=5/36$ , then use the complement: $P($ not $)=1-P($ $)=31/36$ . (b) First find the complement, which requires much less effort: $P($ or $)=1/36+2/36=1/12$ . Then calculate $P(B)=1-P(B^{c})=1-1/12=11/12$ . (c) As before, finding the complement is the clever way to determine $P(D)$ . First find $P(D^{c})=P($ or $)=2/36+1/36=1/12$ . Then calculate $P(D)=1-P(D^{c})=11/12$ .
↑ (a) The probability the first person is left-handed is $0.09$ , which is the same for the second person. We apply the Multiplication Rule for independent processes to determine the probability that both will be left-handed: $0.09\times 0.09=0.0081$ . (b) It is reasonable to assume the proportion of people who are ambidextrous (both right and left handed) is nearly 0, which results in $P($ right-handed $)=1-0.09=0.91$ . Using the same reasoning as in part (a), the probability that both will be right-handed is $0.91\times 0.91=0.8281$ .
↑ (a) The abbreviations RH and LH are used for right-handed and left-handed, respectively. Since each person is independent, we apply the Multiplication Rule for independent processes: ${\begin{aligned}P({\text{all five are RH}})&=P({\text{first = RH, second = RH, ..., fifth = RH}})\\&=P({\text{first = RH}})\times P({\text{second = RH}})\times \dots \times P({\text{fifth = RH}})\\&=0.91\times 0.91\times 0.91\times 0.91\times 0.91=0.624\end{aligned}}$ (b) Using the same reasoning as in (a), $0.09\times 0.09\times 0.09\times 0.09\times 0.09=0.0000059$. (c) Use the complement, $P($all five are RH$)$, to answer this question: ${\begin{aligned}P({\text{not all RH}})=1-P({\text{all RH}})=1-0.624=0.376\end{aligned}}$
↑ The actual proportion of the U.S. population that is female is about 50%, and so we use 0.5 for the probability of sampling a woman. However, this probability does differ in other countries.
↑ Brief answers are provided. (a) This can be written in probability notation as $P($ a randomly selected person is male and right-handed $)=0.455$ . (b) 0.207. (c) 0.045. (d) 0.0093.
↑ A simulated data set based on real population summaries at .
↑ The four outcome combinations are disjoint, all probabilities are indeed non-negative, and the sum of the probabilities is $0.29+0.06+0.27+0.38=1.00$.
↑ (a) $P({\text{parents not}}\ |\ {\text{teen not}})$. (b) Equation ([condProbEq]) for conditional probability indicates we should first find $P({\text{parents not and teen not}})=0.38$ and $P({\text{teen not}})=0.44$. Then the ratio represents the conditional probability: $0.38/0.44=0.864$.
↑ (a) This probability is ${\frac {P({\text{parents degree, teen not}})}{P({\text{teen not}})}}={\frac {0.06}{0.44}}=0.136$. (b) The total equals 1. (c) Under the condition the teenager didn’t attend college, the parents must either have a college degree or not. The complement still works for conditional probabilities, provided the probabilities are conditioned on the same information.
↑ No. While there is an association, the data are observational. Two potential confounding variables include and . Can you think of others?
↑ Fenner F. 1988. Smallpox and Its Eradication (History of International Public Health, No. 6). Geneva: World Health Organization. ISBN 92-4-156110-6.
↑ $P({\text{result = died}}\ |\ {\text{inoculated = no}})={\frac {P({\text{result = died and inoculated = no}})}{P({\text{inoculated = no}})}}={\frac {0.1356}{0.9608}}=0.1411$.
↑ $P({\text{result = died}}\ |\ {\text{inoculated = yes}})={\frac {P({\text{result = died and inoculated = yes}})}{P({\text{inoculated = yes}})}}={\frac {0.0010}{0.0392}}=0.0255$. The death rate for individuals who were inoculated is only about 1 in 40 while the death rate is about 1 in 7 for those who were not inoculated.
↑ Brief answers: (a) Observational. (b) No, we cannot infer causation from this observational study. (c) Accessibility to the latest and best medical care. There are other valid answers for part (c).
↑ The answer is 0.0382, which can be verified using Table [smallpoxProbabilityTable].
↑ There were only two possible outcomes: lived or died. This means that 100% - 97.45% = 2.55% of the people who were inoculated died.
↑ The samples are large relative to the difference in death rates for the “inoculated” and “not inoculated” groups, so it seems there is an association between inoculated and result. However, as noted in the solution to Guided Practice [SmallpoxInoculationObsExpExercise], this is an observational study and we cannot be sure if there is a causal connection. (Further research has shown that inoculation is effective at reducing death rates.)
↑ Brief solutions: (a) $1/6$. (b) $1/36$. (c) ${\frac {P(Y=1{\text{ and }}X=1)}{P(X=1)}}={\frac {1/36}{1/6}}=1/6$. (d) The probability is the same as in part (c): $P(Y=1)=1/6$. The probability that $Y=1$ was unchanged by knowledge about $X$, which makes sense as $X$ and $Y$ are independent.
↑ He has forgotten that the next roulette spin is independent of the previous spins. Casinos do employ this practice; they post the last several outcomes of many betting games to trick unsuspecting gamblers into believing the odds are in their favor. This is called the gambler’s fallacy.
↑ (a) The tree diagram is shown to the right. (b) Identify which two joint probabilities represent students who passed, and add them: $P($passed$)=0.7566+0.1254=0.8820$. (c) $P($construct tree diagram $|$ passed$)={\frac {0.7566}{0.8820}}=0.8578$.
↑ The probabilities reported here were obtained using studies reported at and .
↑ The tree diagram, with three primary branches, is shown to the right. Next, we identify two probabilities from the tree diagram. (1) The probability that there is a sporting event and the garage is full: 0.14. (2) The probability the garage is full: $0.0875+0.14+0.0225=0.25$. Then the solution is the ratio of these probabilities: ${\frac {0.14}{0.25}}=0.56$. If the garage is full, there is a 56% probability that there is a sporting event.
↑ Short answer: ${\begin{aligned}P(A_{2}|B)&={\frac {P(B|A_{2})P(A_{2})}{P(B|A_{1})P(A_{1})+P(B|A_{2})P(A_{2})+P(B|A_{3})P(A_{3})}}\\&={\frac {(0.25)(0.35)}{(0.7)(0.2)+(0.25)(0.35)+(0.05)(0.45)}}\\&=0.35\end{aligned}}$
↑ Each probability is conditioned on the same information that the garage is full, so the complement may be used: $1.00-0.56-0.35=0.09$ .
↑ The three probabilities we computed were actually one marginal probability, $P({\text{Q1}}={\text{not\_picked}})$, and two conditional probabilities: ${\begin{aligned}&P({\text{Q2}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}})\\&P({\text{Q3}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}},\ {\text{Q2}}={\text{not\_picked}})\end{aligned}}$ Using the General Multiplication Rule, the product of these three probabilities is the probability of not being picked in 3 questions.
↑ $P($ being picked to answer all three questions $)=\left({\frac {1}{15}}\right)^{3}=0.00030$ .
↑ (a) First determine the probability of not winning. The tickets are sampled without replacement, which means the probability you do not win on the first draw is $29/30$ , $28/29$ for the second, ..., and $23/24$ for the seventh. The probability you win no prize is the product of these separate probabilities: $23/30$ . That is, the probability of winning a prize is $1-23/30=7/30=0.233$ . (b) When the tickets are sampled with replacement, there are seven independent draws. Again we first find the probability of not winning a prize: $(29/30)^{7}=0.789$ . Thus, the probability of winning (at least) one prize when drawing with replacement is 0.211.
↑ There is about a 10% larger chance of winning a prize when using sampling without replacement. However, at most one prize may be won under this sampling procedure.
↑ If they sell a little more or a little less, this should not be a surprise. Hopefully Chapter [introductionToData] helped make clear that there is natural variability in observed data. For example, if we flip a coin 100 times, it will not usually come up heads exactly half the time, but it will probably be close.
↑ $\mu =\int xf(x)dx$ where $f(x)$ represents a function for the density curve.

↑ (a) 100% - 25% - 60% = 15% of students do not buy any books for the class. Part (b) is represented by the first two lines in the table below. The expectation for part (c) is given as the total on the line $y_{i}\times P(Y=y_{i})$. The result of part (d) is the square root of the variance listed in the total on the last line: $\sigma ={\sqrt {Var(Y)}}=\$69.28$.

$i$ (scenario)	1 (no book)	2 (textbook only)	3 (both)	Total
$y_{i}$	0.00	159.00	200.00
$P(Y=y_{i})$	0.15	0.25	0.60
$y_{i}\times P(Y=y_{i})$	0.00	39.75	120.00	$E(Y)=159.75$
$y_{i}-E(Y)$	-159.75	-0.75	40.25
$(y_{i}-E(Y))^{2}$	25520.06	0.56	1620.06
$(y_{i}-E(Y))^{2}\times P(Y=y_{i})$	3828.0	0.1	972.0	$Var(Y)\approx 4800$

↑ She will make $X$ dollars on the TV but spend $Y$ dollars on the toaster oven: $X-Y$ .
↑ $E(X-Y)=E(X)-E(Y)=175-23=\$152$ . She should expect to make about $152.
↑ No, since there is probably some variability. For example, the traffic will vary from one day to the next, and auction prices will vary depending on the quality of the merchandise and the interest of the attendees.
↑ If $X$ and $Y$ are random variables, consider the following combinations: $X^{1+Y}$ , $X\times Y$ , $X/Y$ . In such cases, plugging in the average value for each random variable and computing the result will not generally lead to an accurate average value for the end result.
↑ $E(\$6000\times X+\$2000\times Y)=\$6000\times 0.021+\$2000\times 0.004=\$134$ .
↑ No. While stocks tend to rise over time, they are often volatile in the short term.
↑ One concern is whether traffic patterns tend to have a weekly cycle (e.g. Fridays may be worse than other days). If that is the case, and John drives, then the assumption is probably not reasonable. However, if John walks to work, then his commute is probably not affected by any weekly traffic cycle.
↑ The equation for Elena can be written as ${\begin{aligned}(1)\times X+(-1)\times Y\end{aligned}}$ The variances of $X$ and $Y$ are 625 and 64. We square the coefficients and plug in the variances: ${\begin{aligned}(1)^{2}\times Var(X)+(-1)^{2}\times Var(Y)=1\times 625+1\times 64=689\end{aligned}}$ The variance of the linear combination is 689, and the standard deviation is the square root of 689: about $26.25.
↑ This sample can be considered a simple random sample from the US population. It relies on the USDA Food Commodity Intake Database.
↑ Brief answers: (a) $0.1157\times 0.1157\times 0.1157=0.0015$ . (b) $(1-0.1157)^{3}=0.692$
↑ This has positive probability. Anyone between cm and cm will have a measured height of cm. This is probably a more realistic scenario to encounter in practice versus Example [probabilityOfExactly180cm].

[1] Here are four examples. (i) Whether someone gets sick in the next month or not is an apparently random process with outcomes sick and not sick. (ii) We can generate a random process by randomly picking a person and measuring that person’s height. The outcome of this process will be a positive number. (iii) Whether the stock market goes up or down next week is a seemingly random process with possible outcomes up, down, and no change. Alternatively, we could have used the percent change in the stock market as a numerical outcome. (iv) Whether your roommate cleans her dishes tonight probably seems like a random process with possible outcomes cleans dishes and leaves dishes.

[2] (a) The random process is a die roll, and at most one of these outcomes can come up. This means they are disjoint outcomes. (b) $P(1{\text{ or }}4{\text{ or }}5)=P(1)+P(4)+P(5)={\frac {1}{6}}+{\frac {1}{6}}+{\frac {1}{6}}={\frac {3}{6}}={\frac {1}{2}}$.

[3] (a) Yes. Each email is categorized in only one level of the number variable. (b) Small: ${\frac {2827}{3921}}=0.721$. Big: ${\frac {545}{3921}}=0.139$. (c) $P({\text{small or big}})=P({\text{small}})+P({\text{big}})=0.721+0.139=0.860$.

[4] (a) $P(A)=P(1{\text{ or }}2)=P(1)+P(2)={\frac {1}{6}}+{\frac {1}{6}}={\frac {2}{6}}={\frac {1}{3}}$. (b) Similarly, $P(B)=1/3$.

[5] (a) Outcomes 2 and 3. (b) Yes, events $B$ and $D$ are disjoint because they share no outcomes. (c) The events $A$ and $D$ share an outcome in common, 2, and so are not disjoint.

[6] Since $B$ and $D$ are disjoint events, use the Addition Rule: $P(B$ or $D)=P(B)+P(D)={\frac {1}{3}}+{\frac {1}{3}}={\frac {2}{3}}$ .

[7] The 52 cards are split into four suits: $\clubsuit$ (club), $\diamondsuit$ (diamond), $\heartsuit$ (heart), $\spadesuit$ (spade). Each suit has its 13 cards labeled: 2, 3, ..., 10, J (jack), Q (queen), K (king), and A (ace). Thus, each card is a unique combination of a suit and a label, e.g. 4$\heartsuit$ and J$\clubsuit$. The 12 cards represented by the jacks, queens, and kings are called face cards. The cards that are $\diamondsuit$ or $\heartsuit$ are typically colored red while the other two suits are typically colored black.

[8] (a) There are 52 cards and 13 diamonds. If the cards are thoroughly shuffled, each card has an equal chance of being drawn, so the probability that a randomly selected card is a diamond is $P(\diamondsuit )={\frac {13}{52}}=0.250$. (b) Likewise, there are 12 face cards, so $P({\text{face card}})={\frac {12}{52}}={\frac {3}{13}}=0.231$.

[9] (a) If $A$ and $B$ are disjoint, $A$ and $B$ can never occur simultaneously. (b) If $A$ and $B$ are disjoint, then the last term of Equation ([generalAdditionRule]) is 0 (see part (a)) and we are left with the Addition Rule for disjoint events.

[10] Both the counts and corresponding probabilities (e.g. $2659/3921=0.678$) are shown. Notice that the number of emails represented in the left circle corresponds to $2659+168=2827$, and the number represented in the right circle is $168+199=367$. [Figure: Venn diagram with two overlapping circles showing these counts and probabilities.]

[11] (a) The solution is represented by the intersection of the two circles: 0.043. (b) This is the sum of the three disjoint probabilities shown in the circles: $0.678+0.043+0.051=0.772$ .

[12] The probabilities of (a) do not sum to 1. The second probability in (b) is negative. This leaves (c), which sure enough satisfies the requirements of a distribution. One of the three was said to be the actual distribution of US household incomes, so it must be (c).

[13] (a) The outcomes are disjoint and each has probability $1/6$ , so the total probability is $4/6=2/3$ . (b) We can also see that $P(D)={\frac {1}{6}}+{\frac {1}{6}}=1/3$ . Since $D$ and $D^{c}$ are disjoint, $P(D)+P(D^{c})=1$ .

[14] Brief solutions: (a) $A^{c}=\{3,4,5,6\}$ and $B^{c}=\{1,2,3,5\}$. (b) Noting that each outcome is disjoint, add the individual outcome probabilities to get $P(A^{c})=2/3$ and $P(B^{c})=2/3$. (c) $A$ and $A^{c}$ are disjoint, and the same is true of $B$ and $B^{c}$. Therefore, $P(A)+P(A^{c})=1$ and $P(B)+P(B^{c})=1$.

[15] (a) The complement of $A$: when the total is equal to 12. (b) $P(A^{c})=1/36$. (c) Use the probability of the complement from part (b), $P(A^{c})=1/36$, and Equation ([complement]): $P({\text{less than }}12)=1-P(12)=1-1/36=35/36$.

[16] (a) First find $P(6)=5/36$, then use the complement: $P({\text{not }}6)=1-P(6)=31/36$. (b) First find the complement, which requires much less effort: $P(2{\text{ or }}3)=1/36+2/36=1/12$. Then calculate $P(B)=1-P(B^{c})=1-1/12=11/12$. (c) As before, finding the complement is the clever way to determine $P(D)$. First find $P(D^{c})=P(11{\text{ or }}12)=2/36+1/36=1/12$. Then calculate $P(D)=1-P(D^{c})=11/12$.
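
The fractions in this answer can be checked by brute force. The following is a minimal sketch, assuming the events concern the sum of two fair six-sided dice; the helper name `prob_sum_in` is hypothetical, and the totals tested (6, 2 or 3, 11 or 12) are the ones consistent with the probabilities $5/36$, $1/36$ and $2/36$ quoted above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs for two fair dice.
rolls = list(product(range(1, 7), repeat=2))


def prob_sum_in(totals):
    """Exact probability that the two dice sum to a value in `totals`."""
    hits = sum(1 for a, b in rolls if a + b in totals)
    return Fraction(hits, len(rolls))


p_six = prob_sum_in({6})         # one total
p_low = prob_sum_in({2, 3})      # two disjoint totals, added together
p_high = prob_sum_in({11, 12})

# Each complement is one minus the probability of the event.
print("P(6) =", p_six, " P(not 6) =", 1 - p_six)
print("P(2 or 3) =", p_low, " complement =", 1 - p_low)
print("P(11 or 12) =", p_high, " complement =", 1 - p_high)
```

Using `Fraction` keeps every probability exact, so the results can be compared directly with the fractions in the text.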

[17] (a) The probability the first person is left-handed is $0.09$ , which is the same for the second person. We apply the Multiplication Rule for independent processes to determine the probability that both will be left-handed: $0.09\times 0.09=0.0081$ . (b) It is reasonable to assume the proportion of people who are ambidextrous (both right and left handed) is nearly 0, which results in $P($ right-handed $)=1-0.09=0.91$ . Using the same reasoning as in part (a), the probability that both will be right-handed is $0.91\times 0.91=0.8281$ .

[18] (a) The abbreviations RH and LH are used for right-handed and left-handed, respectively. Since each person is independent of the others, we apply the Multiplication Rule for independent processes: ${\begin{aligned}P({\text{all five are RH}})&=P({\text{first = RH, second = RH, ..., fifth = RH}})\\&=P({\text{first = RH}})\times P({\text{second = RH}})\times \dots \times P({\text{fifth = RH}})\\&=0.91\times 0.91\times 0.91\times 0.91\times 0.91=0.624\end{aligned}}$ (b) Using the same reasoning as in (a), $0.09\times 0.09\times 0.09\times 0.09\times 0.09=0.0000059$. (c) Use the complement, $P({\text{all five are RH}})$, to answer this question: ${\begin{aligned}P({\text{not all RH}})=1-P({\text{all RH}})=1-0.624=0.376\end{aligned}}$
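
As a quick numerical check of the Multiplication Rule used here, the sketch below assumes the figures from the preceding answer (9% left-handed, everyone else right-handed) and that the five people are independent. The variable names are illustrative only.

```python
p_left = 0.09
p_right = 1 - p_left   # assumes nobody is ambidextrous
n_people = 5

# Independent people: multiply the individual probabilities.
p_all_right = p_right ** n_people
p_all_left = p_left ** n_people

# "Not all right-handed" is the complement of "all right-handed".
p_not_all_right = 1 - p_all_right

print(f"P(all five RH)     = {p_all_right:.3f}")
print(f"P(all five LH)     = {p_all_left:.7f}")
print(f"P(not all five RH) = {p_not_all_right:.3f}")
```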

[19] The actual proportion of the U.S. population that is female is about 50%, and so we use 0.5 for the probability of sampling a woman. However, this probability does differ in other countries.

[20] Brief answers are provided. (a) This can be written in probability notation as $P($ a randomly selected person is male and right-handed $)=0.455$ . (b) 0.207. (c) 0.045. (d) 0.0093.

[21] A simulated data set based on real population summaries at .

[22] Each of the four outcome combinations is disjoint, all probabilities are indeed non-negative, and the sum of the probabilities is $0.29+0.06+0.27+0.38=1.00$.

[23] (a) $P({\text{parents not}}\ |\ {\text{teen not}})$. (b) Equation ([condProbEq]) for conditional probability indicates we should first find $P({\text{parents not and teen not}})=0.38$ and $P({\text{teen not}})=0.44$. Then the ratio represents the conditional probability: $0.38/0.44=0.864$.

[24] (a) This probability is ${\frac {P({\text{parents degree, teen not}})}{P({\text{teen not}})}}={\frac {0.06}{0.44}}=0.136$. (b) The total equals 1. (c) Under the condition the teenager didn’t attend college, the parents must either have a college degree or not. The complement still works for conditional probabilities, provided the probabilities are conditioned on the same information.
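
The two conditional probabilities above can be reproduced from the joint probabilities. This is a minimal sketch, assuming the two joint values quoted in these answers (0.06 and 0.38) are the only cases in which the teenager did not attend college. The dictionary keys are illustrative labels, not names from the data set.

```python
# Joint probabilities for the cases where the teenager did not attend college.
joint_teen_not = {
    "parents_degree": 0.06,
    "parents_not": 0.38,
}

# Marginal probability of the condition: sum over the parents' outcomes.
p_teen_not = sum(joint_teen_not.values())

# Conditional probability = joint probability / probability of the condition.
conditional = {k: v / p_teen_not for k, v in joint_teen_not.items()}

for outcome, p in conditional.items():
    print(f"P({outcome} | teen not) = {p:.3f}")
print("sum of conditional probabilities:", round(sum(conditional.values()), 10))
```

Because both values are conditioned on the same information, they sum to 1, which is why the complement can be used in part (c).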

[25] No. While there is an association, the data are observational. Two potential confounding variables include income and region. Can you think of others?

[26] Fenner F. 1988. Smallpox and Its Eradication (History of International Public Health, No. 6). Geneva: World Health Organization. ISBN 92-4-156110-6.

[27] $P({\text{result = died}}\ |\ {\text{inoculated = no}})={\frac {P({\text{result = died and inoculated = no}})}{P({\text{inoculated = no}})}}={\frac {0.1356}{0.9608}}=0.1411$.

[28] $P({\text{result = died}}\ |\ {\text{inoculated = yes}})={\frac {P({\text{result = died and inoculated = yes}})}{P({\text{inoculated = yes}})}}={\frac {0.0010}{0.0392}}=0.0255$. The death rate for individuals who were inoculated is only about 1 in 40 while the death rate is about 1 in 7 for those who were not inoculated.

[29] Brief answers: (a) Observational. (b) No, we cannot infer causation from this observational study. (c) Accessibility to the latest and best medical care. There are other valid answers for part (c).

[30] The answer is 0.0382, which can be verified using Table [smallpoxProbabilityTable].

[31] There were only two possible outcomes: lived or died. This means that 100% - 97.45% = 2.55% of the people who were inoculated died.

[32] The samples are large relative to the difference in death rates for the “inoculated” and “not inoculated” groups, so it seems there is an association between inoculated and result. However, as noted in the solution to Guided Practice [SmallpoxInoculationObsExpExercise], this is an observational study and we cannot be sure if there is a causal connection. (Further research has shown that inoculation is effective at reducing death rates.)

[33] Brief solutions: (a) $1/6$. (b) $1/36$. (c) ${\frac {P(Y=1{\text{ and }}X=1)}{P(X=1)}}={\frac {1/36}{1/6}}=1/6$. (d) The probability is the same as in part (c): $P(Y=1)=1/6$. The probability that $Y=1$ was unchanged by knowledge about $X$, which makes sense as $X$ and $Y$ are independent.

[34] He has forgotten that the next roulette spin is independent of the previous spins. Casinos do employ this practice; they post the last several outcomes of many betting games to trick unsuspecting gamblers into believing the odds are in their favor. This is called the gambler’s fallacy.

[35] (a) [Figure: tree diagram.] (b) Identify which two joint probabilities represent students who passed, and add them: $P({\text{passed}})=0.7566+0.1254=0.8820$. (c) $P({\text{construct tree diagram}}\ |\ {\text{passed}})={\frac {0.7566}{0.8820}}=0.8578$.

[36] The probabilities reported here were obtained using studies reported at and .

[37] [Figure: tree diagram with three primary branches.] We identify two probabilities from the tree diagram. (1) The probability that there is a sporting event and the garage is full: 0.14. (2) The probability the garage is full: $0.0875+0.14+0.0225=0.25$. Then the solution is the ratio of these probabilities: ${\frac {0.14}{0.25}}=0.56$. If the garage is full, there is a 56% probability that there is a sporting event.

[38] Short answer: ${\begin{aligned}P(A_{2}|B)&={\frac {P(B|A_{2})P(A_{2})}{P(B|A_{1})P(A_{1})+P(B|A_{2})P(A_{2})+P(B|A_{3})P(A_{3})}}\\&={\frac {(0.25)(0.35)}{(0.7)(0.2)+(0.25)(0.35)+(0.05)(0.45)}}\\&=0.35\end{aligned}}$

[39] Each probability is conditioned on the same information that the garage is full, so the complement may be used: $1.00-0.56-0.35=0.09$ .
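
The three posterior probabilities for the full garage (0.56, 0.35 and 0.09) can be recomputed together with Bayes’ Theorem. This is a minimal sketch that takes the prior and conditional values from the formula in the short answer above. Reading $A_{1}$ as the sporting event is an inference from the product $0.7\times 0.2=0.14$, and the dictionary names are illustrative.

```python
# Priors P(A_i) and conditionals P(B | A_i), where B = "garage is full".
# Values are read off the Bayes' Theorem expression in the text.
prior = {"A1": 0.20, "A2": 0.35, "A3": 0.45}
p_full_given = {"A1": 0.70, "A2": 0.25, "A3": 0.05}

# Joint probabilities P(A_i and B): one per branch of the tree diagram.
joint = {event: prior[event] * p_full_given[event] for event in prior}

# Total probability that the garage is full (the denominator).
p_full = sum(joint.values())

# Posterior P(A_i | B) for each scenario.
posterior = {event: joint[event] / p_full for event in joint}

print("P(garage full) =", round(p_full, 4))
for event, p in posterior.items():
    print(f"P({event} | garage full) = {p:.2f}")
```

The posteriors sum to 1, which is the complement argument used for the last scenario.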

[40] The three probabilities we computed were actually one marginal probability, $P({\text{Q1}}={\text{not\_picked}})$, and two conditional probabilities: ${\begin{aligned}&P({\text{Q2}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}})\\&P({\text{Q3}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}},\ {\text{Q2}}={\text{not\_picked}})\end{aligned}}$ Using the General Multiplication Rule, the product of these three probabilities is the probability of not being picked in 3 questions.

[41] $P($ being picked to answer all three questions $)=\left({\frac {1}{15}}\right)^{3}=0.00030$ .

[42] (a) First determine the probability of not winning. The tickets are sampled without replacement, which means the probability you do not win on the first draw is $29/30$ , $28/29$ for the second, ..., and $23/24$ for the seventh. The probability you win no prize is the product of these separate probabilities: $23/30$ . That is, the probability of winning a prize is $1-23/30=7/30=0.233$ . (b) When the tickets are sampled with replacement, there are seven independent draws. Again we first find the probability of not winning a prize: $(29/30)^{7}=0.789$ . Thus, the probability of winning (at least) one prize when drawing with replacement is 0.211.

[43] There is about a 10% larger chance of winning a prize when using sampling without replacement. However, at most one prize may be won under this sampling procedure.
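
The raffle comparison can be verified exactly. The sketch below assumes the setup implied by the two answers above: 30 tickets, 7 prizes, and one ticket held by you. The variable names are illustrative.

```python
from fractions import Fraction

tickets, prizes = 30, 7

# Without replacement: each draw removes one ticket from the pool, so the
# chance of missing your ticket is 29/30, then 28/29, ..., then 23/24.
p_lose_without = Fraction(1)
for drawn in range(prizes):
    p_lose_without *= Fraction(tickets - 1 - drawn, tickets - drawn)
p_win_without = 1 - p_lose_without

# With replacement: seven independent draws, each missing with chance 29/30.
p_lose_with = Fraction(tickets - 1, tickets) ** prizes
p_win_with = 1 - p_lose_with

print("without replacement:", p_win_without, "=", round(float(p_win_without), 3))
print("with replacement:   ", round(float(p_win_with), 3))
print("ratio:", round(float(p_win_without / p_win_with), 3))
```

The product for sampling without replacement telescopes to $23/30$, which is why the winning probability is simply $7/30$.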

[44] If they sell a little more or a little less, this should not be a surprise. Hopefully Chapter [introductionToData] helped make clear that there is natural variability in observed data. For example, if we flipped a coin 100 times, it would not usually come up heads exactly half the time, but it would probably be close.

[45] $\mu =\int xf(x)dx$ where $f(x)$ represents a function for the density curve.

[46] (a) 100% - 25% - 60% = 15% of students do not buy any books for the class. Part (b) is represented by the first two lines in the table below. The expectation for part (c) is given as the total on the line $y_{i}\times P(Y=y_{i})$. The result of part (d) is the square root of the variance listed in the total on the last line: $\sigma ={\sqrt {Var(Y)}}=\$69.28$.

$i$ (scenario)	1	2	3	Total
$y_{i}$	0.00	159.00	200.00
$P(Y=y_{i})$	0.15	0.25	0.60
$y_{i}\times P(Y=y_{i})$	0.00	39.75	120.00	$E(Y)=159.75$
$y_{i}-E(Y)$	-159.75	-0.75	40.25
$(y_{i}-E(Y))^{2}$	25520.06	0.56	1620.06
$(y_{i}-E(Y))^{2}\times P(Y=y_{i})$	3828.0	0.1	972.0	$Var(Y)\approx 4800$
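
The table can be reproduced line by line in a few statements. This is a minimal sketch using the three outcomes and probabilities listed above. The variable names are illustrative.

```python
from math import sqrt

# Amount spent (dollars) and probability for each scenario in the table.
outcomes = [0.00, 159.00, 200.00]
probs = [0.15, 0.25, 0.60]

# Expected value: sum of y_i * P(Y = y_i).
expected = sum(y * p for y, p in zip(outcomes, probs))

# Variance: sum of squared deviations from E(Y), weighted by P(Y = y_i).
variance = sum((y - expected) ** 2 * p for y, p in zip(outcomes, probs))

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(f"E(Y)   = {expected:.2f}")
print(f"Var(Y) = {variance:.1f}")
print(f"SD(Y)  = {std_dev:.2f}")
```

The unrounded variance is slightly above 4800 because the table rounds each row before adding them.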

[47] She will make $X$ dollars on the TV but spend $Y$ dollars on the toaster oven: $X-Y$ .

[48] $E(X-Y)=E(X)-E(Y)=175-23=\$152$ . She should expect to make about $152.

[49] No, since there is probably some variability. For example, the traffic will vary from one day to the next, and auction prices will vary depending on the quality of the merchandise and the interest of the attendees.

[50] If $X$ and $Y$ are random variables, consider the following combinations: $X^{1+Y}$ , $X\times Y$ , $X/Y$ . In such cases, plugging in the average value for each random variable and computing the result will not generally lead to an accurate average value for the end result.

[51] $E(\$6000\times X+\$2000\times Y)=\$6000\times 0.021+\$2000\times 0.004=\$134$ .

[52] No. While stocks tend to rise over time, they are often volatile in the short term.

[53] One concern is whether traffic patterns tend to have a weekly cycle (e.g. Fridays may be worse than other days). If that is the case, and John drives, then the assumption is probably not reasonable. However, if John walks to work, then his commute is probably not affected by any weekly traffic cycle.

[54] The equation for Elena can be written as $(1)\times X+(-1)\times Y$. The variances of $X$ and $Y$ are 625 and 64. We square the coefficients and plug in the variances: $(1)^{2}\times Var(X)+(-1)^{2}\times Var(Y)=1\times 625+1\times 64=689$. The variance of the linear combination is 689, and the standard deviation is the square root of 689: about \$26.25.
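
The same calculation can be written for any linear combination of random variables. This is a sketch under the assumption that the random variables are independent, which the variance formula requires. The function name `linear_combination_variance` is hypothetical.

```python
from math import sqrt


def linear_combination_variance(coefficients, variances):
    """Variance of sum(a_i * X_i), assuming the X_i are independent."""
    return sum(a ** 2 * v for a, v in zip(coefficients, variances))


# Elena's net gain X - Y has coefficients 1 and -1; Var(X) = 625, Var(Y) = 64.
var_net = linear_combination_variance([1, -1], [625, 64])
sd_net = sqrt(var_net)

print("Var(X - Y) =", var_net)
print(f"SD(X - Y)  = {sd_net:.2f}")
```

Squaring the coefficient $-1$ is what makes the two variances add rather than subtract.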

[55] This sample can be considered a simple random sample from the US population. It relies on the USDA Food Commodity Intake Database.

[56] Brief answers: (a) $0.1157\times 0.1157\times 0.1157=0.0015$. (b) $(1-0.1157)^{3}=0.692$.

[57] This has positive probability. Anyone between 179.5 cm and 180.5 cm will have a measured height of 180 cm. This is probably a more realistic scenario to encounter in practice versus Example [probabilityOfExactly180cm].
