Probability (special topic)


Probability forms a foundation for statistics. You might already be familiar with many aspects of probability; however, formalization of the concepts is new for most. This chapter aims to introduce probability on familiar terms using processes most people have seen before.

Defining probability (special topic)

A “die”, the singular of dice, is a cube with six faces numbered 1, 2, 3, 4, 5, and 6. What is the chance of getting 1 when rolling a die? If the die is fair, then the chance of a 1 is as good as the chance of any other number. Since there are six outcomes, the chance must be 1-in-6 or, equivalently, 1/6.

What is the chance of getting a 1 or 2 in the next roll? 1 and 2 constitute two of the six equally likely possible outcomes, so the chance of getting one of these two outcomes must be 2/6 = 1/3.

What is the chance of getting either 1, 2, 3, 4, 5, or 6 on the next roll? 100%. The outcome must be one of these numbers.

What is the chance of not rolling a 2? Since the chance of rolling a 2 is 1/6 or about 16.7%, the chance of not rolling a 2 must be 100% − 16.7% = 83.3% or 5/6.

Alternatively, we could have noticed that not rolling a 2 is the same as getting a 1, 3, 4, 5, or 6, which makes up five of the six equally likely outcomes and has probability 5/6.

Consider rolling two dice. If 1/6 of the time the first die is a 1 and 1/6 of those times the second die is a 1, what is the chance of getting two 1s?[probOf2Ones] If about 16.7% of the time the first die is a 1 and 1/6 of those times the second die is also a 1, then the chance that both dice are 1 is $(1/6) \times (1/6)$ or 1/36.

Probability

We use probability to build tools to describe and understand apparent randomness. We often frame probability in terms of a random process giving rise to an outcome.

Roll a die → 1, 2, 3, 4, 5, or 6
Flip a coin → H or T

Rolling a die or flipping a coin is a seemingly random process and each gives rise to an outcome.

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.

Probability is defined as a proportion, and it always takes values between 0 and 1 (inclusively). It may also be displayed as a percentage between 0% and 100%.

Probability can be illustrated by rolling a die many times. Let $\hat{p}_n$ be the proportion of outcomes that are 1 after the first $n$ rolls. As the number of rolls increases, $\hat{p}_n$ will converge to the probability of rolling a 1, $p = 1/6$. Figure [dieProp] shows this convergence for 100,000 die rolls. The tendency of $\hat{p}_n$ to stabilize around $p$ is described by the Law of Large Numbers.

Figure [dieProp]: The fraction of die rolls that are 1 at each stage in a simulation. The proportion tends to get closer to the probability $1/6 \approx 0.167$ as the number of rolls increases.

As more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
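The convergence described above can be reproduced with a short simulation. This is a minimal sketch, not the code behind Figure [dieProp]: the function name `running_proportion_of_ones` and the seed value are assumptions made for illustration, and a different seed will give a different path toward 1/6.

```python
import random


def running_proportion_of_ones(n_rolls, seed=0):
    """Simulate n_rolls fair die rolls; return the proportion of 1s after each roll."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    ones = 0
    proportions = []
    for n in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 1:  # randint includes both endpoints
            ones += 1
        proportions.append(ones / n)
    return proportions


proportions = running_proportion_of_ones(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} rolls: proportion of 1s = {proportions[n - 1]:.4f}")
print(f"probability of a 1:  {1 / 6:.4f}")
```

Early proportions can sit far from 1/6, while the later ones should settle close to it.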

Occasionally the proportion will veer off from the probability and appear to defy the Law of Large Numbers, as $\hat{p}_n$ does many times in Figure [dieProp]. However, these deviations become smaller as the number of rolls increases.

Above we write $p$ as the probability of rolling a 1. We can also write this probability as $P(\text{rolling a 1})$. As we become more comfortable with this notation, we will abbreviate it further. For instance, if it is clear that the process is “rolling a die”, we could abbreviate $P(\text{rolling a 1})$ as $P(1)$.

[randomProcessExercise] Random processes include rolling a die and flipping a coin. (a) Think of another random process. (b) Describe all the possible outcomes of that process. For instance, rolling a die is a random process with possible outcomes 1, 2, ..., 6.[1]

What we think of as random processes are not necessarily random, but they may just be too difficult to understand exactly. The fourth example in the footnote solution to Guided Practice [randomProcessExercise] suggests a roommate’s behavior is a random process. However, even if a roommate’s behavior is not truly random, modeling her behavior as a random process can still be useful.

It can be helpful to model a process as random even if it is not truly random.

Disjoint or mutually exclusive outcomes

Two outcomes are called disjoint or mutually exclusive if they cannot both happen. For instance, if we roll a die, the outcomes 1 and 2 are disjoint since they cannot both occur. On the other hand, the outcomes 1 and “rolling an odd number” are not disjoint since both occur if the outcome of the roll is a 1. The terms disjoint and mutually exclusive are equivalent and interchangeable.

Calculating the probability of disjoint outcomes is easy. When rolling a die, the outcomes 1 and 2 are disjoint, and we compute the probability that one of these outcomes will occur by adding their separate probabilities:

$$P(\text{1 or 2}) = P(1) + P(2) = 1/6 + 1/6 = 1/3$$

What about the probability of rolling a 1, 2, 3, 4, 5, or 6? Here again, all of the outcomes are disjoint so we add the probabilities:

$$\begin{aligned} P(\text{1 or 2 or 3 or 4 or 5 or 6}) &= P(1) + P(2) + P(3) + P(4) + P(5) + P(6) \\ &= 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1 \end{aligned}$$

The Addition Rule guarantees the accuracy of this approach when the outcomes are disjoint.

If $A_1$ and $A_2$ represent two disjoint outcomes, then the probability that one of them occurs is given by

$$P(A_1\text{ or }A_2) = P(A_1) + P(A_2)$$

If there are many disjoint outcomes $A_1$, ..., $A_k$, then the probability that one of these outcomes will occur is

$$P(A_1) + P(A_2) + \cdots + P(A_k)$$
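The Addition Rule for disjoint outcomes amounts to summing entries of a probability table. The sketch below uses exact fractions for a fair die; the helper name `prob_any` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Each face of a fair six-sided die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob_any(outcomes, distribution):
    """Addition Rule for disjoint outcomes: sum the separate probabilities."""
    outcomes = set(outcomes)  # distinct outcomes of a single roll are disjoint
    return sum(distribution[o] for o in outcomes)


p_1_or_2 = prob_any([1, 2], die)
p_any_face = prob_any(range(1, 7), die)
print(p_1_or_2)    # 1/3
print(p_any_face)  # 1
```

Using `Fraction` rather than floating-point numbers keeps the sums exact, so the total over all six faces is exactly 1.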

We are interested in the probability of rolling a 1, 4, or 5. (a) Explain why the outcomes 1, 4, and 5 are disjoint. (b) Apply the Addition Rule for disjoint outcomes to determine $P(\text{1 or 4 or 5})$.[2]

In the email data set in Chapter [introductionToData], the number variable described whether no number (labeled none), only one or more small numbers (small), or whether at least one big number appeared in an email (big). Of the 3,921 emails, 549 had no numbers, 2,827 had only one or more small numbers, and 545 had at least one big number. (a) Are the outcomes none, small, and big disjoint? (b) Determine the proportion of emails with value small and big separately. (c) Use the Addition Rule for disjoint outcomes to compute the probability a randomly selected email from the data set has a number in it, small or big.[3]

Statisticians rarely work with individual outcomes and instead consider sets or collections of outcomes. Let $A$ represent the event where a die roll results in 1 or 2 and $B$ represent the event that the die roll is a 4 or a 6. We write $A$ as the set of outcomes $\{1, 2\}$ and $B = \{4, 6\}$. These sets are commonly called events. Because $A$ and $B$ have no elements in common, they are disjoint events. $A$ and $B$ are represented in Figure [disjointSets].

Figure [disjointSets]: Three events, $A$, $B$, and $D$, consist of outcomes from rolling a die. $A$ and $B$ are disjoint since they do not have any outcomes in common.

The Addition Rule applies to both disjoint outcomes and disjoint events. The probability that one of the disjoint events $A$ or $B$ occurs is the sum of the separate probabilities:

$$P(A\text{ or }B) = P(A) + P(B) = 1/3 + 1/3 = 2/3$$

(a) Verify the probability of event $A$, $P(A)$, is 1/3 using the Addition Rule. (b) Do the same for event $B$.[4]

[exerExaminingDisjointSetsABD] (a) Using Figure [disjointSets] as a reference, what outcomes are represented by event $D$? (b) Are events $B$ and $D$ disjoint? (c) Are events $A$ and $D$ disjoint?[5]

In Guided Practice [exerExaminingDisjointSetsABD], you confirmed $B$ and $D$ from Figure [disjointSets] are disjoint. Compute the probability that event $B$ or event $D$ occurs.[6]

Probabilities when events are not disjoint

Let’s consider calculations for two events that are not disjoint in the context of a regular deck of 52 cards, represented in Table [deckOfCards]. If you are unfamiliar with the cards in a regular deck, please see the footnote.[7]

Representations of the 52 unique cards in a deck.

(a) What is the probability that a randomly selected card is a diamond? (b) What is the probability that a randomly selected card is a face card?[8]

Venn diagrams are useful when outcomes can be categorized as “in” or “out” for two or three variables, attributes, or random processes. The Venn diagram in Figure [cardsDiamondFaceVenn] uses a circle to represent diamonds and another to represent face cards. If a card is both a diamond and a face card, it falls into the intersection of the circles. If it is a diamond but not a face card, it will be in part of the left circle that is not in the right circle (and so on). The total number of cards that are diamonds is given by the total number of cards in the diamonds circle: $10 + 3 = 13$. The probabilities are also shown (e.g. $10/52 = 0.1923$).

Figure [cardsDiamondFaceVenn]: A Venn diagram for diamonds and face cards.

Let $A$ represent the event that a randomly selected card is a diamond and $B$ represent the event that it is a face card. How do we compute $P(A\text{ or }B)$? Events $A$ and $B$ are not disjoint (the cards J♦, Q♦, and K♦ fall into both categories), so we cannot use the Addition Rule for disjoint events. Instead we use the Venn diagram. We start by adding the probabilities of the two events:

$$P(A) + P(B) = P(\diamondsuit) + P(\text{face card}) = 13/52 + 12/52$$

However, the three cards that are in both events were counted twice, once in each probability. We must correct this double counting:

$$\begin{aligned} P(A\text{ or }B) &= P(\diamondsuit\text{ or face card}) \\ &= P(\diamondsuit) + P(\text{face card}) - P(\diamondsuit\text{ and face card}) \\ &= 13/52 + 12/52 - 3/52 \\ &= 22/52 = 11/26 \end{aligned}$$

The second line of this calculation, Equation ([diamondFace]), is an example of the General Addition Rule.

If $A$ and $B$ are any two events, disjoint or not, then the probability that at least one of them will occur is

$$P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and }B)$$

where $P(A\text{ and }B)$ is the probability that both events occur.

When we write “or” in statistics, we mean “and/or” unless we explicitly state otherwise. Thus, $A$ or $B$ occurs means $A$, $B$, or both $A$ and $B$ occur.
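The General Addition Rule can be checked by building the deck and counting. In this sketch, events are sets of (rank, suit) pairs; the string labels for ranks and suits and the helper `prob` are choices made for illustration only.

```python
from fractions import Fraction
from itertools import product

suits = ["clubs", "diamonds", "hearts", "spades"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event (a set of cards) when one card is drawn at random."""
    return Fraction(len(event), len(deck))


diamonds = {card for card in deck if card[1] == "diamonds"}
face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}

# General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
via_rule = prob(diamonds) + prob(face_cards) - prob(diamonds & face_cards)
direct = prob(diamonds | face_cards)  # count the union directly
print(via_rule, direct)  # 11/26 11/26
```

The set union `|` implements the inclusive “or” described above: a card in both sets is counted once.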

(a) If $A$ and $B$ are disjoint, describe why this implies $P(A\text{ and }B) = 0$. (b) Using part (a), verify that the General Addition Rule simplifies to the simpler Addition Rule for disjoint events if $A$ and $B$ are disjoint.[9]

[emailSpamNumberVennExer] In the data set with 3,921 emails, 367 were spam, 2,827 contained some small numbers but no big numbers, and 168 had both characteristics. Create a Venn diagram for this setup.[10]

(a) Use your Venn diagram from Guided Practice [emailSpamNumberVennExer] to determine the probability a randomly drawn email from the data set is spam and had small numbers (but not big numbers). (b) What is the probability that the email had either of these attributes?[11]

Probability distributions

A probability distribution is a table of all disjoint outcomes and their associated probabilities. Table [diceProb] shows the probability distribution for the sum of two dice.

Dice sum      2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

A probability distribution is a list of the possible outcomes with corresponding probabilities that satisfies three rules:

  1. The outcomes listed must be disjoint.
  2. Each probability must be between 0 and 1.
  3. The probabilities must total 1.
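The three rules can be checked mechanically. The sketch below rebuilds the distribution for the sum of two dice by enumerating all 36 equally likely pairs and then tests rules 2 and 3; the function name `is_probability_distribution` is hypothetical, and rule 1 is taken as satisfied because each dictionary key is a distinct value of the sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely (first die, second die) pairs and tally the sums.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice_sum = {total: Fraction(n, 36) for total, n in sorted(counts.items())}


def is_probability_distribution(dist):
    """Check rule 2 (each probability in [0, 1]) and rule 3 (probabilities total 1)."""
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    totals_one = sum(dist.values()) == 1
    return each_in_range and totals_one


print(dice_sum[7])  # 1/6, i.e. 6/36 in lowest terms
print(is_probability_distribution(dice_sum))  # True
```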

[usHouseholdIncomeDistsExercise] Table [usHouseholdIncomeDists] suggests three distributions for household income in the United States. Only one is correct. Which one must it be? What is wrong with the other two?[12]

Income range ($1000s)   0-25    25-50   50-100   100+
(a)                     0.18    0.39    0.33     0.16
(b)                     0.38    -0.27   0.52     0.37
(c)                     0.28    0.27    0.29     0.16

Chapter [introductionToData] emphasized the importance of plotting data to provide quick summaries. Probability distributions can also be summarized in a bar plot. For instance, the distribution of US household incomes is shown in Figure [usHouseholdIncomeDistBar] as a bar plot. The probability distribution for the sum of two dice is shown in Table [diceProb] and plotted in Figure [diceSumDist].

Figure [usHouseholdIncomeDistBar]: The probability distribution of US household income.

Figure [diceSumDist]: The probability distribution of the sum of two dice.

In these bar plots, the bar heights represent the probabilities of outcomes. If the outcomes are numerical and discrete, it is usually (visually) convenient to make a bar plot that resembles a histogram, as in the case of the sum of two dice. Another example of plotting the bars at their respective locations is shown in Figure [bookCostDist].

Complement of an event

Rolling a die produces a value in the set $\{1, 2, 3, 4, 5, 6\}$. This set of all possible outcomes is called the sample space ($S$) for rolling a die. We often use the sample space to examine the scenario where an event does not occur.

Let $D = \{2, 3\}$ represent the event that the outcome of a die roll is 2 or 3. Then the complement of $D$ represents all outcomes in our sample space that are not in $D$, which is denoted by $D^c = \{1, 4, 5, 6\}$. That is, $D^c$ is the set of all possible outcomes not already included in $D$. Figure [complementOfD] shows the relationship between $D$, $D^c$, and the sample space $S$.

Figure [complementOfD]: Event $D = \{2, 3\}$ and its complement, $D^c = \{1, 4, 5, 6\}$. $S$ represents the sample space, which is the set of all possible outcomes.

(a) Compute $P(D^c) = P(\text{rolling a 1, 4, 5, or 6})$. (b) What is $P(D) + P(D^c)$?[13]

Events $A = \{1, 2\}$ and $B = \{4, 6\}$ are shown in Figure [disjointSets]. (a) Write out what $A^c$ and $B^c$ represent. (b) Compute $P(A^c)$ and $P(B^c)$. (c) Compute $P(A) + P(A^c)$ and $P(B) + P(B^c)$.[14]

A complement of an event $A$ is constructed to have two very important properties: (i) every possible outcome not in $A$ is in $A^c$, and (ii) $A$ and $A^c$ are disjoint. Property (i) implies

$$P(A\text{ or }A^c) = 1$$

That is, if the outcome is not in $A$, it must be represented in $A^c$. We use the Addition Rule for disjoint events to apply Property (ii):

$$P(A\text{ or }A^c) = P(A) + P(A^c)$$

Combining Equations ([complementSumTo1]) and ([complementDisjointEquation]) yields a very useful relationship between the probability of an event and its complement.

The complement of event $A$ is denoted $A^c$, and $A^c$ represents all outcomes not in $A$. $A$ and $A^c$ are mathematically related:

$$P(A) + P(A^c) = 1, \quad\text{i.e.}\quad P(A) = 1 - P(A^c)$$

In simple examples, computing $A$ or $A^c$ is feasible in a few steps. However, using the complement can save a lot of time as problems grow in complexity.
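As an illustration of the shortcut, consider an event not used in the exercises here: the sum of two dice is at least 3. Its complement contains a single roll, so one subtraction replaces a sum over 35 rolls. The enumeration below is a sketch with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls


def prob(event):
    """Probability of an event given as a list of rolls from the sample space."""
    return Fraction(len(event), len(sample_space))


# Event A: the sum is at least 3. Its complement: the sum is exactly 2.
a = [roll for roll in sample_space if sum(roll) >= 3]
a_complement = [roll for roll in sample_space if sum(roll) < 3]

print(prob(a_complement))      # 1/36
print(1 - prob(a_complement))  # 35/36, using P(A) = 1 - P(A^c)
print(prob(a))                 # 35/36, by direct count
```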

Let $A$ represent the event where we roll two dice and their total is less than 12. (a) What does the event $A^c$ represent? (b) Determine $P(A^c)$ from Table [diceProb]. (c) Determine $P(A)$.[15]

Consider again the probabilities from Table [diceProb] and rolling two dice. Find the following probabilities: (a) The sum of the dice is not 6. (b) The sum is at least 4. That is, determine the probability of the event $B = \{4, 5, ..., 12\}$. (c) The sum is no more than 10. That is, determine the probability of the event $D = \{2, 3, ..., 10\}$.[16]

Independence

Just as variables and observations can be independent, random processes can be independent, too. Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other. For instance, flipping a coin and rolling a die are two independent processes: knowing the coin was heads does not help determine the outcome of a die roll. On the other hand, stock prices usually move up or down together, so they are not independent.

Example [probOf2Ones] provides a basic example of two independent processes: rolling two dice. We want to determine the probability that both will be 1. Suppose one of the dice is red and the other white. If the outcome of the red die is a 1, it provides no information about the outcome of the white die. We first encountered this same question in Example [probOf2Ones], where we calculated the probability using the following reasoning: 1/6 of the time the red die is a 1, and 1/6 of those times the white die will also be 1. This is illustrated in Figure [indepForRollingTwo1s]. Because the rolls are independent, the probabilities of the corresponding outcomes can be multiplied to get the final answer: $(1/6) \times (1/6) = 1/36$. This can be generalized to many independent processes.

Figure [indepForRollingTwo1s]: 1/6 of the time, the first roll is a 1. Then 1/6 of those times, the second roll will also be a 1.

What if there was also a blue die independent of the other two? What is the probability of rolling the three dice and getting all 1s?[threeDice] The same logic applies from Example [probOf2Ones]. If 1/36 of the time the white and red dice are both 1, then 1/6 of those times the blue die will also be 1, so multiply:

$$\begin{aligned} P(\text{white} = 1\text{ and red} = 1\text{ and blue} = 1) &= P(\text{white} = 1) \times P(\text{red} = 1) \times P(\text{blue} = 1) \\ &= (1/6) \times (1/6) \times (1/6) = 1/216 \end{aligned}$$

Example [threeDice] illustrates what is called the Multiplication Rule for independent processes.

If $A$ and $B$ represent events from two different and independent processes, then the probability that both $A$ and $B$ occur can be calculated as the product of their separate probabilities:

$$P(A\text{ and }B) = P(A) \times P(B)$$

Similarly, if there are $k$ events $A_1$, ..., $A_k$ from $k$ independent processes, then the probability they all occur is

$$P(A_1) \times P(A_2) \times \cdots \times P(A_k)$$
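The three-dice result can be confirmed by listing every outcome. The sketch below treats all 216 (white, red, blue) triples as equally likely, which is what independence of three fair dice implies, and compares a direct count with the Multiplication Rule.

```python
from fractions import Fraction
from itertools import product

# All 6 * 6 * 6 = 216 equally likely (white, red, blue) outcomes.
rolls = list(product(range(1, 7), repeat=3))

# Direct count: the share of outcomes in which every die shows 1.
direct = Fraction(sum(1 for r in rolls if r == (1, 1, 1)), len(rolls))

# Multiplication Rule: multiply the three separate probabilities.
p_one = Fraction(1, 6)
via_rule = p_one * p_one * p_one

print(direct, via_rule)  # 1/216 1/216
```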

[ex2Handedness] About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent. (a) What is the probability that both are left-handed? (b) What is the probability that both are right-handed?[17]

[ex5Handedness] Suppose 5 people are selected at random.[18]

  1. What is the probability that all are right-handed?
  2. What is the probability that all are left-handed?
  3. What is the probability that not all of the people are right-handed?

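Suppose the variables handedness and gender are independent, i.e. knowing someone’s gender provides no useful information about their handedness and vice-versa. Then we can compute whether a randomly selected person is right-handed and female[19] using the Multiplication Rule:

$$P(\text{right-handed and female}) = P(\text{right-handed}) \times P(\text{female}) = 0.91 \times 0.50 = 0.455$$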

Three people are selected at random.[20]

  1. What is the probability that the first person is male and right-handed?
  2. What is the probability that the first two people are male and right-handed?
  3. What is the probability that the third person is female and left-handed?
  4. What is the probability that the first two people are male and right-handed and the third person is female and left-handed?

Sometimes we wonder if one outcome provides useful information about another outcome. The question we are asking is, are the occurrences of the two events independent? We say that two events $A$ and $B$ are independent if they satisfy Equation ([eqForIndependentEvents]).

If we shuffle up a deck of cards and draw one, is the event that the card is a heart independent of the event that the card is an ace? The probability the card is a heart is 1/4 and the probability that it is an ace is 1/13. The probability the card is the ace of hearts is 1/52. We check whether Equation ([eqForIndependentEvents]) is satisfied:

$$P(\heartsuit) \times P(\text{ace}) = \frac{1}{4} \times \frac{1}{13} = \frac{1}{52} = P(\heartsuit\text{ and ace})$$

Because the equation holds, the event that the card is a heart and the event that the card is an ace are independent events.
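The same check can be written as a small function that compares $P(A\text{ and }B)$ with $P(A) \times P(B)$. This is a sketch: the card labels and the helper `are_independent` are assumptions for illustration, and the second comparison (hearts against red cards) is added here only as a contrasting case where the equation fails.

```python
from fractions import Fraction
from itertools import product

suits = ["clubs", "diamonds", "hearts", "spades"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event (a set of cards) when one card is drawn at random."""
    return Fraction(len(event), len(deck))


def are_independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


hearts = {card for card in deck if card[1] == "hearts"}
aces = {card for card in deck if card[0] == "A"}
red_cards = {card for card in deck if card[1] in {"hearts", "diamonds"}}

print(are_independent(hearts, aces))       # True: 1/52 == 1/4 * 1/13
print(are_independent(hearts, red_cards))  # False: 1/4 != 1/4 * 1/2
```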

Conditional probability (special topic)

The family_college data set contains a sample of 792 cases with two variables, teen and parents, and is summarized in Table [contTableOfParStCollege].[21] The teen variable is either college or not, where the label college means the teen went to college immediately after high school. The parents variable takes the value degree if at least one parent of the teenager completed a college degree.

                       parents
                  degree    not    Total
teen   college      231     214     445
       not           49     298     347
       Total        280     512     792

Figure [familyCollegeVenn]: A Venn diagram using boxes for the family_college data set.

If at least one parent of a teenager completed a college degree, what is the chance the teenager attended college right after high school? We can estimate this probability using the data. Of the 280 cases in this data set where parents takes value degree, 231 represent cases where the teen variable takes value college:

$$P(\text{teen college given parents degree}) = \frac{231}{280} = 0.825$$

A teenager is randomly selected from the sample and she did not attend college right after high school. What is the probability that at least one of her parents has a college degree?[collegeProbOfParentsGivenStudentNot] If the teenager did not attend, then she is one of the 347 teens in the second row. Of these 347 teens, 49 had at least one parent who got a college degree:

$$P(\text{parents degree given teen not}) = \frac{49}{347} = 0.141$$

Marginal and joint probabilities

Table [contTableOfParStCollege] includes row and column totals for each variable separately in the family_college data set. These totals represent marginal probabilities for the sample, which are the probabilities based on a single variable without regard to any other variables. For instance, a probability based solely on the teen variable is a marginal probability:

$$P(\text{teen college}) = \frac{445}{792} = 0.56$$

A probability of outcomes for two or more variables or processes is called a joint probability:

$$P(\text{teen college and parents not}) = \frac{214}{792} = 0.27$$

It is common to substitute a comma for “and” in a joint probability, although either is acceptable. That is, $P(\text{teen college, parents not})$ means the same thing as $P(\text{teen college and parents not})$.

If a probability is based on a single variable, it is a marginal probability. The probability of outcomes for two or more variables or processes is called a joint probability.

We use table proportions to summarize joint probabilities for the family_college sample. These proportions are computed by dividing each count in Table [contTableOfParStCollege] by the table’s total, 792, to obtain the proportions in Table [familyCollegeProbTable]. The joint probability distribution of the parents and teen variables is shown in Table [familyCollegeDistribution].

Table [familyCollegeProbTable]:

                  parents: degree   parents: not   Total
teen: college          0.29             0.27        0.56
teen: not              0.06             0.38        0.44
Total                  0.35             0.65        1.00

Table [familyCollegeDistribution]: Joint probability distribution for the family_college data set.

Joint outcome                       Probability
parents degree and teen college     0.29
parents degree and teen not         0.06
parents not and teen college        0.27
parents not and teen not            0.38
Total                               1.00

Verify Table [familyCollegeDistribution] represents a probability distribution: events are disjoint, all probabilities are non-negative, and the probabilities sum to 1.[22]

We can compute marginal probabilities using joint probabilities in simple cases. For example, the probability a random teenager from the study went to college is found by summing the outcomes where teen takes value college:

$$\begin{aligned} P(\text{teen college}) &= P(\text{parents degree and teen college}) + P(\text{parents not and teen college}) \\ &= 0.29 + 0.27 \\ &= 0.56 \end{aligned}$$
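Summing joint probabilities to obtain a marginal probability can be sketched directly from the counts in Table [contTableOfParStCollege]. The dictionary layout and the helper name `marginal_teen` are assumptions made for this illustration.

```python
from fractions import Fraction

# Counts from the contingency table, keyed by (teen, parents).
counts = {
    ("college", "degree"): 231,
    ("college", "not"): 214,
    ("not", "degree"): 49,
    ("not", "not"): 298,
}
total = sum(counts.values())  # 792

# Joint probabilities: each count divided by the table total.
joint = {key: Fraction(n, total) for key, n in counts.items()}


def marginal_teen(value):
    """Marginal probability for teen: sum the joint probabilities over parents."""
    return sum(p for (teen, _), p in joint.items() if teen == value)


p_college = marginal_teen("college")
print(p_college, round(float(p_college), 2))  # 445/792 0.56
```

Working from exact counts avoids the small discrepancies that appear when rounded table proportions are added.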

Defining conditional probability

There is some connection between education level of parents and of the teenager: a college degree by a parent is associated with college attendance of the teenager. In this section, we discuss how to use information about associations between two variables to improve probability estimation.

The probability that a random teenager from the study attended college is 0.56. Could we update this probability if we knew that one of the teen’s parents has a college degree? Absolutely. To do so, we limit our view to only those 280 cases where a parent has a college degree and look at the fraction where the teenager attended college:

$$P(\text{teen college given parents degree}) = \frac{231}{280} = 0.825$$

We call this a conditional probability because we computed the probability under a condition: a parent has a college degree. There are two parts to a conditional probability, the outcome of interest and the condition. It is useful to think of the condition as information we know to be true, and this information usually can be described as a known outcome or event.

We separate the text inside our probability notation into the outcome of interest and the condition:

$$P(\text{teen college given parents degree}) = P(\text{teen college} \mid \text{parents degree}) = \frac{231}{280} = 0.825$$

The vertical bar “|” is read as given.

In Equation ([probStudentUsedIfParentsUsedInFormalNotation]), we computed the probability a teen attended college based on the condition that at least one parent has a college degree as a fraction:

$$\begin{aligned} P(\text{teen college} \mid \text{parents degree}) &= \frac{\text{\# cases where teen college and parents degree}}{\text{\# cases where parents degree}} \\ &= \frac{231}{280} = 0.825 \end{aligned}$$

We considered only those cases that met the condition, parents degree, and then we computed the ratio of those cases that satisfied our outcome of interest, the teenager attended college.

Frequently, marginal and joint probabilities are provided instead of count data. For example, disease rates are commonly listed in percentages rather than in a count format. We would like to be able to compute conditional probabilities even when no counts are available, and we use Equation ([ratioOfBothToRatioOfConditionalForParentsAndStudent]) as a template to understand this technique.

We considered only those cases that satisfied the condition, parents degree. Of these cases, the conditional probability was the fraction who represented the outcome of interest, teen college. Suppose we were provided only the information in Table [familyCollegeProbTable], i.e. only probability data. Then if we took a sample of 1000 people, we would anticipate about 35% or $0.35 \times 1000 = 350$ would meet the information criterion (parents degree). Similarly, we would expect about 29% or $0.29 \times 1000 = 290$ to meet both the information criterion and represent our outcome of interest. Then the conditional probability can be computed as

$$\begin{aligned} P(\text{teen college} \mid \text{parents degree}) &= \frac{\text{\# (teen college and parents degree)}}{\text{\# (parents degree)}} \\ &= \frac{290}{350} = \frac{0.29}{0.35} = 0.829 \quad\text{(different from 0.825 due to rounding error)} \end{aligned}$$

In Equation ([stUserPUsedHypSampSize]), we examine exactly the fraction of two probabilities, 0.29 and 0.35, which we can write as

$$P(\text{teen college and parents degree}) \quad\text{and}\quad P(\text{parents degree}).$$

The fraction of these probabilities is an example of the general formula for conditional probability.

The conditional probability of the outcome of interest $A$ given condition $B$ is computed as the following:

$$P(A \mid B) = \frac{P(A\text{ and }B)}{P(B)}$$
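The formula can be applied to the counts in Table [contTableOfParStCollege]. The sketch below represents events as functions of (teen, parents); the helper names `prob` and `conditional` are hypothetical and used only for this example.

```python
from fractions import Fraction

# Counts from the contingency table, keyed by (teen, parents).
counts = {
    ("college", "degree"): 231,
    ("college", "not"): 214,
    ("not", "degree"): 49,
    ("not", "not"): 298,
}
total = sum(counts.values())


def prob(event):
    """Probability of an event, given as a true/false function of (teen, parents)."""
    return Fraction(sum(n for key, n in counts.items() if event(*key)), total)


def conditional(outcome, condition):
    """P(A | B) = P(A and B) / P(B)."""
    p_both = prob(lambda teen, parents: outcome(teen, parents) and condition(teen, parents))
    return p_both / prob(condition)


def teen_college(teen, parents):
    return teen == "college"


def parents_degree(teen, parents):
    return parents == "degree"


p = conditional(teen_college, parents_degree)
print(p, round(float(p), 3))  # 33/40 0.825
```

Because the calculation starts from counts, the result is exactly 231/280 = 33/40 rather than the 0.829 obtained from rounded proportions.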

[familyCollegeProbOfParentsEqualNotGivenTeen] (a) Write out the following statement in conditional probability notation: “The probability a random case where neither parent has a college degree if it is known that the teenager didn’t attend college right after high school”. Notice that the condition is now based on the teenager, not the parent.
(b) Determine the probability from part (a). Table [contTableOfParStCollege] may be helpful.[23]

[whyCondProbSumTo1] (a) Determine the probability that one of the parents has a college degree if it is known the teenager did not attend college.
(b) Using the answers from part (a) and Guided Practice [familyCollegeProbOfParentsEqualNotGivenTeen](b), compute
$$P(\text{parents degree} \mid \text{teen not}) + P(\text{parents not} \mid \text{teen not})$$
(c) Provide an intuitive argument to explain why the sum in (b) is 1.[24]

The data indicate there is an association between parents having a college degree and their teenager attending college. Does this mean the parents’ college degree(s) caused the teenager to go to college?[25]

Smallpox in Boston, 1721

The smallpox data set provides a sample of 6,224 individuals from the year 1721 who were exposed to smallpox in Boston.[26] Doctors at the time believed that inoculation, which involves exposing a person to the disease in a controlled form, could reduce the likelihood of death.

Each case represents one person with two variables: inoculated and result. The variable inoculated takes two levels: yes or no, indicating whether the person was inoculated or not. The variable result has outcomes lived or died. These data are summarized in Tables [smallpoxContingencyTable] and [smallpoxProbabilityTable].

Table [smallpoxContingencyTable]:

                      inoculated
                   yes      no     Total
result   lived     238     5136    5374
         died        6      844     850
         Total     244     5980    6224

Table [smallpoxProbabilityTable]:

                      inoculated
                   yes       no      Total
result   lived    0.0382   0.8252   0.8634
         died     0.0010   0.1356   0.1366
         Total    0.0392   0.9608   1.0000

[probDiedIfNotInoculated] Write out, in formal notation, the probability a randomly selected person who was not inoculated died from smallpox, and find this probability.[27]

Determine the probability that an inoculated person died from smallpox. How does this result compare with the result of Guided Practice [probDiedIfNotInoculated]?[28]

[SmallpoxInoculationObsExpExercise] The people of Boston self-selected whether or not to be inoculated. (a) Is this study observational or was this an experiment? (b) Can we infer any causal connection using these data? (c) What are some potential confounding variables that might influence whether someone lived or died and also affect whether that person was inoculated?[29]

General multiplication rule

Section [probabilityIndependence] introduced the Multiplication Rule for independent processes. Here we provide the General Multiplication Rule for events that might not be independent.

If \(A\) and \(B\) represent two outcomes or events, then

\[ P(A \text{ and } B) = P(A \mid B) \times P(B) \]

It is useful to think of \(A\) as the outcome of interest and \(B\) as the condition.

This General Multiplication Rule is simply a rearrangement of the definition for conditional probability in Equation ([condProbEq]).

Consider the smallpox data set. Suppose we are given only two pieces of information: 96.08% of residents were not inoculated, and 85.88% of the residents who were not inoculated ended up surviving. How could we compute the probability that a resident was not inoculated and lived? We will compute our answer using the General Multiplication Rule and then verify it using Table [smallpoxProbabilityTable]. We want to determine

\[ P(\text{result = lived and inoculated = no}) \]

and we are given that

\[ P(\text{result = lived} \mid \text{inoculated = no}) = 0.8588, \qquad P(\text{inoculated = no}) = 0.9608 \]

Among the 96.08% of people who were not inoculated, 85.88% survived:

\[ P(\text{result = lived and inoculated = no}) = 0.8588 \times 0.9608 = 0.8251 \]

This is equivalent to the General Multiplication Rule. We can confirm this probability in Table [smallpoxProbabilityTable] at the intersection of no and lived (with a small rounding error).
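The calculation in this example can be sketched in a few lines of Python. The variable names are illustrative, and the cross-check uses the count of 5,136 people (out of 6,224) who were not inoculated and lived, taken from Table [smallpoxContingencyTable].

```python
# Given information from the example.
p_not_inoculated = 0.9608              # P(inoculated = no)
p_lived_given_not_inoculated = 0.8588  # P(result = lived | inoculated = no)

# General Multiplication Rule: P(A and B) = P(A | B) * P(B).
p_lived_and_not_inoculated = p_lived_given_not_inoculated * p_not_inoculated

# Cross-check against the raw counts in the contingency table.
p_from_counts = 5136 / 6224

print(round(p_lived_and_not_inoculated, 4))
print(round(p_from_counts, 4))
```

The two printed values should differ only in the fourth decimal place, which is the small rounding error mentioned in the example.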

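Use \(P(\text{inoculated = yes}) = 0.0392\) and \(P(\text{result = lived} \mid \text{inoculated = yes}) = 0.9754\) to determine the probability that a person was both inoculated and lived.[30]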

If 97.54% of the people who were inoculated lived, what proportion of inoculated people must have died?[31]

Let \(A_1\), ..., \(A_k\) represent all the disjoint outcomes for a variable or process. Then if \(B\) is an event, possibly for another variable or process, we have:

\[ P(A_1 \mid B) + \cdots + P(A_k \mid B) = 1 \]

The rule for complements also holds when an event and its complement are conditioned on the same information:

\[ P(A \mid B) = 1 - P(A^c \mid B) \]

Based on the probabilities computed above, does it appear that inoculation is effective at reducing the risk of death from smallpox?[32]

Independence considerations in conditional probability

If two events are independent, then knowing the outcome of one should provide no information about the other. We can show this is mathematically true using conditional probabilities.

[condProbOfRollingA1AfterOne1] Let \(X\) and \(Y\) represent the outcomes of rolling two dice.[33]

  1. What is the probability that the first die, \(X\), is 1?

  2. What is the probability that both \(X\) and \(Y\) are 1?

  3. Use the formula for conditional probability to compute \(P(Y = 1 \mid X = 1)\).

  4. What is \(P(Y = 1)\)? Is this different from the answer from part 3? Explain.

We can show in part 3 of Guided Practice [condProbOfRollingA1AfterOne1] that the conditioning information has no influence by using the Multiplication Rule for independent processes:

\[ \begin{aligned} P(Y = 1 \mid X = 1) &= \frac{P(Y = 1 \text{ and } X = 1)}{P(X = 1)} \\ &= \frac{P(Y = 1) \times P(X = 1)}{P(X = 1)} \\ &= P(Y = 1) \end{aligned} \]
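The same conclusion can be checked by brute force. The sketch below assumes two fair six-sided dice, enumerates all 36 equally likely outcomes, and compares the conditional probability with the marginal one; the helper `prob` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (X, Y) outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on (x, y)."""
    favorable = sum(1 for x, y in outcomes if event(x, y))
    return Fraction(favorable, len(outcomes))

p_x1 = prob(lambda x, y: x == 1)
p_both = prob(lambda x, y: x == 1 and y == 1)
p_y1 = prob(lambda x, y: y == 1)

# Conditional probability: P(Y = 1 | X = 1) = P(Y = 1 and X = 1) / P(X = 1).
p_y1_given_x1 = p_both / p_x1

print(p_y1_given_x1, p_y1)
```

Using `Fraction` keeps the arithmetic exact, so the equality of the two probabilities is not blurred by floating-point rounding.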

Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chance of getting black six times in a row is very small (about 1/64) and puts his paycheck on red. What is wrong with his reasoning?[34]

Tree diagrams

Tree diagrams are a tool to organize outcomes and probabilities around the structure of the data. They are most useful when two or more processes occur in a sequence and each process is conditioned on its predecessors.

The smallpox data fit this description. We see the population as split by inoculation: yes and no. Following this split, survival rates were observed for each group. This structure is reflected in the tree diagram shown in Figure [smallpoxTreeDiagram]. The first branch for inoculation is said to be the primary branch while the other branches are secondary.

Tree diagrams are annotated with marginal and conditional probabilities, as shown in Figure [smallpoxTreeDiagram]. This tree diagram splits the smallpox data by inoculation into the yes and no groups with respective marginal probabilities 0.0392 and 0.9608. The secondary branches are conditioned on the first, so we assign conditional probabilities to these branches. For example, the top branch in Figure [smallpoxTreeDiagram] is the probability that result = lived conditioned on the information that inoculated = yes. We may (and usually do) construct joint probabilities at the end of each branch in our tree by multiplying the numbers we come across as we move from left to right. These joint probabilities are computed using the General Multiplication Rule:

\[ \begin{aligned} P(\text{inoculated = yes and result = lived}) &= P(\text{inoculated = yes}) \times P(\text{result = lived} \mid \text{inoculated = yes}) \\ &= 0.0392 \times 0.9754 = 0.0382 \end{aligned} \]

Consider the midterm and final for a statistics class. Suppose 13% of students earned an A on the midterm. Of those students who earned an A on the midterm, 47% received an A on the final, and 11% of the students who earned lower than an A on the midterm received an A on the final. You randomly pick up a final exam and notice the student received an A. What is the probability that this student earned an A on the midterm? [exerciseForTreeDiagramOfStudentGettingAOnMidtermGivenThatSheGotAOnFinal] The end-goal is to find \(P(\text{midterm = A} \mid \text{final = A})\). To calculate this conditional probability, we need the following probabilities:

\[ P(\text{midterm = A and final = A}) \qquad\text{and}\qquad P(\text{final = A}) \]

However, this information is not provided, and it is not obvious how to calculate these probabilities. Since we aren’t sure how to proceed, it is useful to organize the information into a tree diagram, as shown in Figure [testTree]. When constructing a tree diagram, variables provided with marginal probabilities are often used to create the tree’s primary branches; in this case, the marginal probabilities are provided for midterm grades. The final grades, which correspond to the conditional probabilities provided, will be shown on the secondary branches.

Figure [testTree]: A tree diagram describing the midterm and final variables.

With the tree diagram constructed, we may compute the required probabilities:

\[ \begin{aligned} P(\text{midterm = A and final = A}) &= 0.0611 \\ P(\text{final = A}) &= P(\text{midterm = other and final = A}) + P(\text{midterm = A and final = A}) \\ &= 0.0957 + 0.0611 = 0.1568 \end{aligned} \]

The marginal probability, \(P(\text{final = A})\), was calculated by adding up all the joint probabilities on the right side of the tree that correspond to final = A. We may now finally take the ratio of the two probabilities:

\[ \begin{aligned} P(\text{midterm = A} \mid \text{final = A}) &= \frac{P(\text{midterm = A and final = A})}{P(\text{final = A})} \\ &= \frac{0.0611}{0.1568} = 0.3897 \end{aligned} \]

The probability the student also earned an A on the midterm is about 0.39.
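The branch-by-branch arithmetic of the tree diagram can be written out as a short Python sketch. It uses only the three percentages stated in the example; the variable names are illustrative.

```python
# Probabilities stated in the example.
p_mid_a = 0.13                # P(midterm = A)
p_final_a_given_mid_a = 0.47  # P(final = A | midterm = A)
p_final_a_given_other = 0.11  # P(final = A | midterm = other)

# Joint probabilities at the ends of the two branches where final = A.
joint_a_a = p_mid_a * p_final_a_given_mid_a
joint_other_a = (1 - p_mid_a) * p_final_a_given_other

# Marginal probability of final = A: add the joint probabilities.
p_final_a = joint_a_a + joint_other_a

# Conditional probability of interest.
p_mid_a_given_final_a = joint_a_a / p_final_a

print(round(joint_a_a, 4), round(joint_other_a, 4), round(p_final_a, 4))
print(round(p_mid_a_given_final_a, 4))
```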

After an introductory statistics course, 78% of students can successfully construct tree diagrams. Of those who can construct tree diagrams, 97% passed, while only 57% of those students who could not construct tree diagrams passed. (a) Organize this information into a tree diagram. (b) What is the probability that a randomly selected student passed? (c) Compute the probability a student is able to construct a tree diagram if it is known that she passed.[35]

Bayes’ Theorem

In many instances, we are given a conditional probability of the form

\[ P(\text{statement about variable 1} \mid \text{statement about variable 2}) \]

but we would really like to know the inverted conditional probability:

\[ P(\text{statement about variable 2} \mid \text{statement about variable 1}) \]

Tree diagrams can be used to find the second conditional probability when given the first. However, sometimes it is not possible to draw the scenario in a tree diagram. In these cases, we can apply a very useful and general formula: Bayes’ Theorem.

We first take a critical look at an example of inverting conditional probabilities where we still apply a tree diagram.

In Canada, about 0.35% of women over 40 will develop breast cancer in any given year. A common screening test for cancer is the mammogram, but this test is not perfect. In about 11% of patients with breast cancer, the test gives a false negative: it indicates a woman does not have breast cancer when she does have breast cancer. Similarly, the test gives a false positive in 7% of patients who do not have breast cancer: it indicates these patients have breast cancer when they actually do not.[36] If we tested a random woman over 40 for breast cancer using a mammogram and the test came back positive (that is, the test suggested the patient has cancer), what is the probability that the patient actually has breast cancer?


Figure [BreastCancerTreeDiagram]: Tree diagram for Example [probabilityOfBreastCancerGivenPositiveTestExample], computing the probability a random patient who tests positive on a mammogram actually has breast cancer.

Notice that we are given sufficient information to quickly compute the probability of testing positive if a woman has breast cancer (\(1.00 - 0.11 = 0.89\)). However, we seek the inverted probability of cancer given a positive test result. (Watch out for the non-intuitive medical language: a positive test result suggests the possible presence of cancer in a mammogram screening.) This inverted probability may be broken into two pieces:

\[ P(\text{has BC} \mid \text{mammogram}^+) = \frac{P(\text{has BC and mammogram}^+)}{P(\text{mammogram}^+)} \]

where “has BC” is an abbreviation for the patient actually having breast cancer and “mammogram\(^+\)” means the mammogram screening was positive. A tree diagram is useful for identifying each probability and is shown in Figure [BreastCancerTreeDiagram]. The probability the patient has breast cancer and the mammogram is positive is

\[ \begin{aligned} P(\text{has BC and mammogram}^+) &= P(\text{mammogram}^+ \mid \text{has BC}) P(\text{has BC}) \\ &= 0.89 \times 0.0035 = 0.00312 \end{aligned} \]

The probability of a positive test result is the sum of the two corresponding scenarios:

\[ \begin{aligned} P(\text{mammogram}^+) &= P(\text{mammogram}^+ \text{ and has BC}) + P(\text{mammogram}^+ \text{ and no BC}) \\ &= P(\text{has BC}) P(\text{mammogram}^+ \mid \text{has BC}) + P(\text{no BC}) P(\text{mammogram}^+ \mid \text{no BC}) \\ &= 0.0035 \times 0.89 + 0.9965 \times 0.07 = 0.07288 \end{aligned} \]

Then if the mammogram screening is positive for a patient, the probability the patient has breast cancer is

\[ \begin{aligned} P(\text{has BC} \mid \text{mammogram}^+) &= \frac{P(\text{has BC and mammogram}^+)}{P(\text{mammogram}^+)} \\ &= \frac{0.00312}{0.07288} \approx 0.0428 \end{aligned} \]

That is, even if a patient has a positive mammogram screening, there is still only about a 4% chance that she has breast cancer.
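As a rough check of this example, the sketch below repeats the calculation in Python from the three stated rates. The variable names are illustrative. Because it carries unrounded intermediate values, its result may differ from the worked figures in the last digit while still agreeing at about 4%.

```python
p_bc = 0.0035              # P(has BC)
p_pos_given_bc = 1 - 0.11  # P(mammogram+ | has BC): 1 minus the false negative rate
p_pos_given_no_bc = 0.07   # P(mammogram+ | no BC): the false positive rate

# Joint probabilities for the two ways a positive result can occur.
p_bc_and_pos = p_pos_given_bc * p_bc
p_no_bc_and_pos = p_pos_given_no_bc * (1 - p_bc)

# Marginal probability of a positive mammogram.
p_pos = p_bc_and_pos + p_no_bc_and_pos

# Inverted conditional probability: P(has BC | mammogram+).
p_bc_given_pos = p_bc_and_pos / p_pos

print(round(p_bc_given_pos, 3))
```

Changing `p_bc` to a larger value shows how strongly the answer depends on how rare the condition is, which is the point made about follow-up testing.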

Example [probabilityOfBreastCancerGivenPositiveTestExample] highlights why doctors often run more tests regardless of a first positive test result. When a medical condition is rare, a single positive test isn’t generally definitive.

Consider again the last equation of Example [probabilityOfBreastCancerGivenPositiveTestExample]. Using the tree diagram, we can see that the numerator (the top of the fraction) is equal to the following product:

\[ P(\text{has BC and mammogram}^+) = P(\text{mammogram}^+ \mid \text{has BC}) P(\text{has BC}) \]

The denominator (the probability the screening was positive) is equal to the sum of probabilities for each positive screening scenario:

\[ P(\text{mammogram}^+) = P(\text{mammogram}^+ \text{ and no BC}) + P(\text{mammogram}^+ \text{ and has BC}) \]

In the example, each of the probabilities on the right side was broken down into a product of a conditional probability and marginal probability using the tree diagram.

\[ \begin{aligned} P(\text{mammogram}^+) &= P(\text{mammogram}^+ \text{ and no BC}) + P(\text{mammogram}^+ \text{ and has BC}) \\ &= P(\text{mammogram}^+ \mid \text{no BC}) P(\text{no BC}) + P(\text{mammogram}^+ \mid \text{has BC}) P(\text{has BC}) \end{aligned} \]

We can see an application of Bayes’ Theorem by substituting the resulting probability expressions into the numerator and denominator of the original conditional probability.

\[ P(\text{has BC} \mid \text{mammogram}^+) = \frac{P(\text{mammogram}^+ \mid \text{has BC}) P(\text{has BC})}{P(\text{mammogram}^+ \mid \text{no BC}) P(\text{no BC}) + P(\text{mammogram}^+ \mid \text{has BC}) P(\text{has BC})} \]

Consider the following conditional probability for variable 1 and variable 2:

\[ P(\text{outcome } A_1 \text{ of variable 1} \mid \text{outcome } B \text{ of variable 2}) \]

Bayes’ Theorem states that this conditional probability can be identified as the following fraction:

\[ \frac{P(B \mid A_1) P(A_1)}{P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + \cdots + P(B \mid A_k) P(A_k)} \]

where \(A_2\), \(A_3\), ..., and \(A_k\) represent all other possible outcomes of the first variable.

Bayes’ Theorem is just a generalization of what we have done using tree diagrams. The numerator identifies the probability of getting both \(A_1\) and \(B\). The denominator is the marginal probability of getting \(B\). This bottom component of the fraction appears long and complicated since we have to add up probabilities from all of the different ways to get \(B\). We always completed this step when using tree diagrams. However, we usually did it in a separate step so it didn’t seem as complex.

To apply Bayes’ Theorem correctly, there are two preparatory steps:

  1. First identify the marginal probabilities of each possible outcome of the first variable: \(P(A_1)\), \(P(A_2)\), ..., \(P(A_k)\).
  2. Then identify the probability of the outcome \(B\), conditioned on each possible scenario for the first variable: \(P(B \mid A_1)\), \(P(B \mid A_2)\), ..., \(P(B \mid A_k)\).

Once each of these probabilities is identified, they can be applied directly within the formula.

Drawing a tree diagram makes it easier to understand how two variables are connected. Use Bayes’ Theorem only when there are so many scenarios that drawing a tree diagram would be complex.

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent] Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.[37]

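Here we solve the same problem presented in Guided Practice [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent], except this time we use Bayes’ Theorem. The outcome of interest is whether there is a sporting event (call this \(A_1\)), and the condition is that the lot is full (\(B\)). Let \(A_2\) represent an academic event and \(A_3\) represent there being no event on campus. Then the given probabilities can be written as

\[ \begin{aligned} P(A_1) &= 0.20 & P(A_2) &= 0.35 & P(A_3) &= 0.45 \\ P(B \mid A_1) &= 0.70 & P(B \mid A_2) &= 0.25 & P(B \mid A_3) &= 0.05 \end{aligned} \]

Bayes’ Theorem can be used to compute the probability of a sporting event (\(A_1\)) under the condition that the parking lot is full (\(B\)):

\[ \begin{aligned} P(A_1 \mid B) &= \frac{P(B \mid A_1) P(A_1)}{P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + P(B \mid A_3) P(A_3)} \\ &= \frac{(0.70)(0.20)}{(0.70)(0.20) + (0.25)(0.35) + (0.05)(0.45)} \\ &= 0.56 \end{aligned} \]

Based on the information that the garage is full, there is a 56% probability that a sporting event is being held on campus that evening.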
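Bayes’ Theorem with several possible outcomes lends itself to a small reusable function. The sketch below is one way to write it in Python; the function name `bayes_posterior` and the dictionary keys are hypothetical, and the numbers are the ones given in the parking garage problem.

```python
def bayes_posterior(priors, likelihoods, target):
    """Return P(target | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    # Denominator: marginal probability of B, summed over every outcome A_i.
    p_b = sum(priors[a] * likelihoods[a] for a in priors)
    # Numerator: joint probability of the target outcome and B.
    return priors[target] * likelihoods[target] / p_b

# Marginal probabilities of each type of evening.
priors = {"sporting": 0.20, "academic": 0.35, "none": 0.45}
# Probability the garage is full, conditioned on each type of evening.
likelihoods = {"sporting": 0.70, "academic": 0.25, "none": 0.05}

p_sporting_given_full = bayes_posterior(priors, likelihoods, "sporting")
print(round(p_sporting_given_full, 2))
```

Calling the function with a different `target` gives the updated probability for that scenario, and the three results should add to 1.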

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsAnAcademicEvent] Use the information in the previous exercise and example to verify the probability that there is an academic event conditioned on the parking lot being full is 0.35.[38]

[exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsNoEvent] In Guided Practice [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsASportingEvent] and [exerciseForParkingLotOnCampusBeingFullAndWhetherOrNotThereIsAnAcademicEvent], you found that if the parking lot is full, the probability there is a sporting event is 0.56 and the probability there is an academic event is 0.35. Using this information, compute \(P(\text{no event} \mid \text{the lot is full})\).[39]

The last several exercises offered a way to update our belief about whether there is a sporting event, academic event, or no event going on at the school based on the information that the parking lot was full. This strategy of updating beliefs using Bayes’ Theorem is actually the foundation of an entire section of statistics called Bayesian statistics. While Bayesian statistics is very important and useful, we will not have time to cover much more of it in this book.

Sampling from a small population (special topic)

Professors sometimes select a student at random to answer a question. If each student has an equal chance of being selected and there are 15 people in your class, what is the chance that she will pick you for the next question? If there are 15 people to ask and none are skipping class, then the probability is 1/15, or about 0.067.

If the professor asks 3 questions, what is the probability that you will not be selected? Assume that she will not pick the same person twice in a given lecture.[3woRep] For the first question, she will pick someone else with probability 14/15. When she asks the second question, she only has 14 people who have not yet been asked. Thus, if you were not picked on the first question, the probability you are again not picked is 13/14. Similarly, the probability you are again not picked on the third question is 12/13, and the probability of not being picked for any of the three questions is

\[ \begin{aligned} P(\text{not picked in 3 questions}) &= P(\text{Q1 = not\_picked, Q2 = not\_picked, Q3 = not\_picked}) \\ &= \frac{14}{15} \times \frac{13}{14} \times \frac{12}{13} = \frac{12}{15} = 0.80 \end{aligned} \]

What rule permitted us to multiply the probabilities in Example [3woRep]?[40]

Suppose the professor randomly picks without regard to who she already selected, i.e. students can be picked more than once. What is the probability that you will not be picked for any of the three questions?[3wRep] Each pick is independent, and the probability of not being picked for any individual question is 14/15. Thus, we can use the Multiplication Rule for independent processes.

\[ \begin{aligned} P(\text{not picked in 3 questions}) &= P(\text{Q1 = not\_picked, Q2 = not\_picked, Q3 = not\_picked}) \\ &= \frac{14}{15} \times \frac{14}{15} \times \frac{14}{15} = 0.813 \end{aligned} \]

You have a slightly higher chance of not being picked compared to when she picked a new person for each question. However, you now may be picked more than once.
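The two classroom scenarios differ only in whether the pool of students shrinks after each question. A short Python sketch, with illustrative variable names, makes the comparison explicit using exact fractions.

```python
from fractions import Fraction

class_size = 15
questions = 3

# No repeats: the pool of students not yet asked shrinks by one each time,
# so each factor is conditioned on not having been picked so far.
p_no_repeats = Fraction(1)
for asked in range(questions):
    remaining = class_size - asked
    p_no_repeats *= Fraction(remaining - 1, remaining)

# Repeats allowed: every question is an independent draw from all 15 students.
p_repeats_allowed = Fraction(class_size - 1, class_size) ** questions

print(p_no_repeats, float(p_no_repeats))
print(p_repeats_allowed, round(float(p_repeats_allowed), 3))
```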

Under the setup of Example [3wRep], what is the probability of being picked to answer all three questions?[41]

If we sample from a small population without replacement, we no longer have independence between our observations. In Example [3woRep], the probability of not being picked for the second question was conditioned on the event that you were not picked for the first question. In Example [3wRep], the professor sampled her students with replacement: she repeatedly sampled the entire class without regard to who she already picked.

[raffleOf30TicketsWWOReplacement] Your department is holding a raffle. They sell 30 tickets and offer seven prizes. (a) They place the tickets in a hat and draw one for each prize. The tickets are sampled without replacement, i.e. the selected tickets are not placed back in the hat. What is the probability of winning a prize if you buy one ticket? (b) What if the tickets are sampled with replacement?[42]

[followUpToRaffleOf30TicketsWWOReplacement] Compare your answers in Guided Practice [raffleOf30TicketsWWOReplacement]. How much influence does the sampling method have on your chances of winning a prize?[43]

Had we repeated Guided Practice [raffleOf30TicketsWWOReplacement] with 300 tickets instead of 30, we would have found something interesting: the results would be nearly identical. The probability would be 0.0233 without replacement and 0.0231 with replacement. When the sample size is only a small fraction of the population (under 10%), observations are nearly independent even when sampling without replacement.
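The two probabilities quoted for the 300-ticket raffle can be reproduced with the sketch below. The function names are hypothetical, and "winning a prize" under sampling with replacement is taken to mean winning at least one of the seven draws.

```python
def p_win_without_replacement(tickets, prizes):
    # A single ticket is equally likely to be any one of the tickets drawn.
    return prizes / tickets

def p_win_with_replacement(tickets, prizes):
    # Complement of losing every one of the independent draws.
    return 1 - (1 - 1 / tickets) ** prizes

tickets, prizes = 300, 7
print(round(p_win_without_replacement(tickets, prizes), 4))
print(round(p_win_with_replacement(tickets, prizes), 4))
```

Here the seven draws are about 2% of the 300 tickets, well under the 10% guideline, which is why the two sampling methods give nearly the same answer.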

Random variables (special topic)

Two books are assigned for a statistics class: a textbook and its corresponding study guide. The university bookstore determined 20% of enrolled students do not buy either book, 55% buy the textbook only, and 25% buy both books, and these percentages are relatively constant from one term to another. If there are 100 students enrolled, how many books should the bookstore expect to sell to this class?[bookStoreSales] Around 20 students will not buy either book (0 books total), about 55 will buy one book (55 books total), and approximately 25 will buy two books (totaling 50 books for these 25 students). The bookstore should expect to sell about 105 books for this class.

Would you be surprised if the bookstore sold slightly more or less than 105 books?[44]

The textbook costs $137 and the study guide $33. How much revenue should the bookstore expect from this class of 100 students?[bookStoreRev] About 55 students will just buy a textbook, providing revenue of

\[ \$137 \times 55 = \$7{,}535 \]

The roughly 25 students who buy both the textbook and the study guide would pay a total of

\[ (\$137 + \$33) \times 25 = \$170 \times 25 = \$4{,}250 \]

Thus, the bookstore should expect to generate about \(\$7{,}535 + \$4{,}250 = \$11{,}785\) from these 100 students for this one class. However, there might be some sampling variability so the actual amount may differ by a little bit.

Figure [bookCostDist]: Probability distribution for the bookstore’s revenue from a single student. The distribution balances on a triangle representing the average revenue per student.

What is the average revenue per student for this course?[revFromStudent] The expected total revenue is $11,785, and there are 100 students. Therefore the expected revenue per student is \(\$11{,}785 / 100 = \$117.85\).

Expectation

We call a variable or process with a numerical outcome a random variable, and we usually represent this random variable with a capital letter such as \(X\), \(Y\), or \(Z\). The amount of money a single student will spend on her statistics books is a random variable, and we represent it by \(X\).

Random variable: A random process or variable with a numerical outcome.

The possible outcomes of \(X\) are labeled with a corresponding lower case letter \(x\) and subscripts. For example, we write \(x_1 = \$0\), \(x_2 = \$137\), and \(x_3 = \$170\), which occur with probabilities 0.20, 0.55, and 0.25. The distribution of \(X\) is summarized in Figure [bookCostDist] and Table [statSpendDist].

Table [statSpendDist]: The probability distribution for the random variable \(X\), representing the bookstore’s revenue from a single student.

\(i\)             1       2       3       Total
\(x_i\)           $0      $137    $170
\(P(X = x_i)\)    0.20    0.55    0.25    1.00

We computed the average outcome of \(X\) as $117.85 in Example [revFromStudent]. We call this average the expected value of \(X\), denoted by \(E(X)\). The expected value of a random variable is computed by adding each outcome weighted by its probability:

\[ E(X) = 0 \times 0.20 + 137 \times 0.55 + 170 \times 0.25 = 117.85 \]

If \(X\) takes outcomes \(x_1\), ..., \(x_k\) with probabilities \(P(X = x_1)\), ..., \(P(X = x_k)\), the expected value of \(X\) is the sum of each outcome multiplied by its corresponding probability:

\[ \begin{aligned} E(X) &= x_1 \times P(X = x_1) + \cdots + x_k \times P(X = x_k) \\ &= \sum_{i=1}^{k} x_i P(X = x_i) \end{aligned} \]

The Greek letter \(\mu\) may be used in place of the notation \(E(X)\).

The expected value for a random variable represents the average outcome. For example, \(E(X) = 117.85\) represents the average amount the bookstore expects to make from a single student, which we could also write as \(\mu = 117.85\).
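The weighted sum that defines the expected value is easy to express in code. The following Python sketch applies it to the bookstore distribution; `expected_value` is a hypothetical helper name introduced for this illustration.

```python
# Distribution of X, the revenue from a single student (in dollars).
outcomes = [0, 137, 170]
probabilities = [0.20, 0.55, 0.25]

def expected_value(xs, ps):
    """Sum of each outcome weighted by its probability."""
    return sum(x * p for x, p in zip(xs, ps))

mu = expected_value(outcomes, probabilities)
print(round(mu, 2))
```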

It is also possible to compute the expected value of a continuous random variable (see Section [contDist]). However, it requires a little calculus and we save it for a later class.[45]

In physics, the expectation holds the same meaning as the center of gravity. The distribution can be represented by a series of weights at each outcome, and the mean represents the balancing point. This is represented in Figures [bookCostDist] and [bookWts]. The idea of a center of gravity also expands to continuous probability distributions. Figure [contBalance] shows a continuous probability distribution balanced atop a wedge placed at the mean.

Figure [bookWts]: A weight system representing the probability distribution for \(X\). The string holds the distribution at the mean to keep the system balanced.
Figure [contBalance]: A continuous distribution can also be balanced at its mean.

Variability in random variables

Suppose you ran the university bookstore. Besides how much revenue you expect to generate, you might also want to know the volatility (variability) in your revenue.

The variance and standard deviation can be used to describe the variability of a random variable. Section [variability] introduced a method for finding the variance and standard deviation for a data set. We first computed deviations from the mean (\(x_i - \mu\)), squared those deviations, and took an average to get the variance. In the case of a random variable, we again compute squared deviations. However, we take their sum weighted by their corresponding probabilities, just like we did for the expectation. This weighted sum of squared deviations equals the variance, and we calculate the standard deviation by taking the square root of the variance, just as we did in Section [variability].

If \(X\) takes outcomes \(x_1\), ..., \(x_k\) with probabilities \(P(X = x_1)\), ..., \(P(X = x_k)\) and expected value \(\mu = E(X)\), then the variance of \(X\), denoted by \(Var(X)\) or the symbol \(\sigma^2\), is

\[ \begin{aligned} \sigma^2 &= (x_1 - \mu)^2 \times P(X = x_1) + \cdots + (x_k - \mu)^2 \times P(X = x_k) \\ &= \sum_{j=1}^{k} (x_j - \mu)^2 P(X = x_j) \end{aligned} \]

The standard deviation of \(X\), labeled \(\sigma\), is the square root of the variance.

Compute the expected value, variance, and standard deviation of \(X\), the revenue of a single statistics student for the bookstore. It is useful to construct a table that holds computations for each outcome separately, then add up the results.

\(i\)                        1       2        3        Total
\(x_i\)                      $0      $137     $170
\(P(X = x_i)\)               0.20    0.55     0.25
\(x_i \times P(X = x_i)\)    0       75.35    42.50    117.85

Thus, the expected value is \(\mu = 117.85\), which we computed earlier. The variance can be constructed by extending this table:

\(i\)                                 1           2         3          Total
\(x_i\)                               $0          $137      $170
\(P(X = x_i)\)                        0.20        0.55      0.25
\(x_i \times P(X = x_i)\)             0           75.35     42.50      117.85
\(x_i - \mu\)                         -117.85     19.15     52.15
\((x_i - \mu)^2\)                     13888.62    366.72    2719.62
\((x_i - \mu)^2 \times P(X = x_i)\)   2777.7      201.7     679.9      3659.3

The variance of \(X\) is \(\sigma^2 = 3659.3\), which means the standard deviation is \(\sigma = \sqrt{3659.3} = \$60.49\).
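The table-based calculation can be mirrored in a few lines of Python. This is a minimal sketch with illustrative variable names; it recomputes the mean first and then the probability-weighted squared deviations.

```python
import math

# Distribution of X, the revenue from a single student (in dollars).
outcomes = [0, 137, 170]
probabilities = [0.20, 0.55, 0.25]

# Expected value: outcomes weighted by their probabilities.
mu = sum(x * p for x, p in zip(outcomes, probabilities))

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probabilities))

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(round(mu, 2), round(variance, 1), round(sd, 2))
```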

The bookstore also offers a chemistry textbook for $159 and a book supplement for $41. From past experience, they know about 25% of chemistry students just buy the textbook while 60% buy both the textbook and supplement.[46]

  1. What proportion of students don’t buy either book? Assume no students buy the supplement without the textbook.
  2. Let \(Y\) represent the revenue from a single student. Write out the probability distribution of \(Y\), i.e. a table for each outcome and its associated probability.
  3. Compute the expected revenue from a single chemistry student.
  4. Find the standard deviation to describe the variability associated with the revenue from a single student.

Linear combinations of random variables

So far, we have thought of each variable as being a complete story in and of itself. Sometimes it is more appropriate to use a combination of variables. For instance, the amount of time a person spends commuting to work each week can be broken down into several daily commutes. Similarly, the total gain or loss in a stock portfolio is the sum of the gains and losses in its components.

John travels to work five days a week. We will use \(X_1\) to represent his travel time on Monday, \(X_2\) to represent his travel time on Tuesday, and so on. Write an equation using \(X_1\), ..., \(X_5\) that represents his travel time for the week, denoted by \(W\). His total weekly travel time is the sum of the five daily values:

\[ W = X_1 + X_2 + X_3 + X_4 + X_5 \]

Breaking the weekly travel time \(W\) into pieces provides a framework for understanding each source of randomness and is useful for modeling \(W\).

It takes John an average of 18 minutes each day to commute to work. What would you expect his average commute time to be for the week? We were told that the average (i.e. expected value) of the commute time is 18 minutes per day: \(E(X_i) = 18\). To get the expected time for the sum of the five days, we can add up the expected time for each individual day:

\[ \begin{aligned} E(W) &= E(X_1 + X_2 + X_3 + X_4 + X_5) \\ &= E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) \\ &= 18 + 18 + 18 + 18 + 18 = 90 \text{ minutes} \end{aligned} \]

The expectation of the total time is equal to the sum of the expected individual times. More generally, the expectation of a sum of random variables is always the sum of the expectation for each random variable.

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction] Elena is selling a TV at a cash auction and also intends to buy a toaster oven in the auction. If \(X\) represents the profit for selling the TV and \(Y\) represents the cost of the toaster oven, write an equation that represents the net change in Elena’s cash.[47]

Based on past auctions, Elena figures she should expect to make about $175 on the TV and pay about $23 for the toaster oven. In total, how much should she expect to make or spend?[48]

[explainWhyThereIsUncertaintyInTheSum] Would you be surprised if John’s weekly commute wasn’t exactly 90 minutes or if Elena didn’t make exactly $152? Explain.[49]

Two important concepts concerning combinations of random variables have so far been introduced. First, a final value can sometimes be described as the sum of its parts in an equation. Second, intuition suggests that putting the individual average values into this equation gives the average value we would expect in total. This second point needs clarification – it is guaranteed to be true in what are called linear combinations of random variables.

A linear combination of two random variables \(X\) and \(Y\) is a fancy phrase to describe a combination \(aX + bY\) where \(a\) and \(b\) are some fixed and known numbers. For John’s commute time, there were five random variables, one for each work day, and each random variable could be written as having a fixed coefficient of 1: \(1X_1 + 1X_2 + 1X_3 + 1X_4 + 1X_5\). For Elena’s net gain or loss, the \(X\) random variable had a coefficient of +1 and the \(Y\) random variable had a coefficient of -1.

When considering the average of a linear combination of random variables, it is safe to plug in the mean of each random variable and then compute the final result. For a few examples of nonlinear combinations of random variables – cases where we cannot simply plug in the means – see the footnote.[50]

If \(X\) and \(Y\) are random variables, then a linear combination of the random variables is given by \(aX + bY\) where \(a\) and \(b\) are some fixed numbers. To compute the average value of a linear combination of random variables, plug in the average of each individual random variable and compute the result: \(a \times E(X) + b \times E(Y)\). Recall that the expected value is the same as the mean, e.g. \(E(X) = \mu_X\).
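The plug-in rule for expectations can be sketched in a few lines of Python. This is only an illustration: the function name `expected_value_of_linear_combination` is a hypothetical helper, not something defined in this chapter, and the inputs are the averages quoted above for John's commute (18 minutes per day) and Elena's auction ($175 and $23).

```python
def expected_value_of_linear_combination(coefficients, means):
    """Return the sum of coefficient * mean over all random variables.

    Hypothetical helper: it simply plugs each mean into the linear combination.
    """
    if len(coefficients) != len(means):
        raise ValueError("need exactly one coefficient per random variable")
    return sum(c * m for c, m in zip(coefficients, means))


# John's week: five daily commutes, each with coefficient 1 and mean 18 minutes.
john_weekly_expectation = expected_value_of_linear_combination([1] * 5, [18] * 5)

# Elena's net change in cash: +1 times the TV profit, -1 times the toaster oven cost.
elena_expected_net = expected_value_of_linear_combination([1, -1], [175, 23])

print(john_weekly_expectation)  # 90
print(elena_expected_net)       # 152
```

The coefficients carry the signs, so a cost enters with -1 and lowers the expected total.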

Leonard has invested $6000 in Google Inc. (stock ticker: GOOG) and $2000 in Exxon Mobil Corp. (XOM). If \(X\) represents the change in Google’s stock next month and \(Y\) represents the change in Exxon Mobil stock next month, write an equation that describes how much money will be made or lost in Leonard’s stocks for the month. For simplicity, we will suppose \(X\) and \(Y\) are not in percents but are in decimal form (e.g. if Google’s stock increases 1%, then \(X = 0.01\); or if it loses 1%, then \(X = -0.01\)). Then we can write an equation for Leonard’s gain as \(\$6000 \times X + \$2000 \times Y\). If we plug in the change in the stock value for \(X\) and \(Y\), this equation gives the change in value of Leonard’s stock portfolio for the month. A positive value represents a gain, and a negative value represents a loss.

[expectedChangeInLeonardsStockPortfolio] Suppose Google and Exxon Mobil stocks have recently been rising 2.1% and 0.4% per month, respectively. Compute the expected change in Leonard’s stock portfolio for next month.[51]

You should have found that Leonard expects a positive gain in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, would you be surprised if he actually had a loss this month?[52]

Variability in linear combinations of random variables

Quantifying the average outcome from a linear combination of random variables is helpful, but it is also important to have some sense of the uncertainty associated with the total outcome of that combination of random variables. The expected net gain or loss of Leonard’s stock portfolio was considered in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, there was no quantitative discussion of the volatility of this portfolio. For instance, while the average monthly gain might be about $134 according to the data, that gain is not guaranteed. Figure [changeInLeonardsStockPortfolioFor36Months] shows the monthly changes in a portfolio like Leonard’s during the 36 months from 2009 to 2011. The gains and losses vary widely, and quantifying these fluctuations is important when investing in stocks.

Figure [changeInLeonardsStockPortfolioFor36Months]: The change in a portfolio like Leonard’s for the 36 months from 2009 to 2011, where $6000 is in Google’s stock and $2000 is in Exxon Mobil’s.

Just as we have done in many previous cases, we use the variance and standard deviation to describe the uncertainty associated with Leonard’s monthly returns. To do so, the variances of each stock’s monthly return will be useful, and these are shown in Table [sumStatOfGOOGXOM]. The stocks’ returns are nearly independent.

Table [sumStatOfGOOGXOM]: The mean, standard deviation, and variance of the GOOG and XOM stocks. These statistics were estimated from historical stock data, so notation used for sample statistics has been used.
        Mean (\(\bar{x}\))    Standard deviation (\(s\))    Variance (\(s^2\))
GOOG    0.0210                0.0846                        0.0072
XOM     0.0038                0.0519                        0.0027

Here we use an equation from probability theory to describe the uncertainty of Leonard’s monthly returns; we leave the proof of this method to a dedicated probability course. The variance of a linear combination of random variables can be computed by plugging in the variances of the individual random variables and squaring the coefficients of the random variables: \(Var(aX + bY) = a^2 \times Var(X) + b^2 \times Var(Y)\). It is important to note that this equality assumes the random variables are independent; if independence doesn’t hold, then more advanced methods are necessary. This equation can be used to compute the variance of Leonard’s monthly return: \(Var(6000 \times X + 2000 \times Y) = 6000^2 \times Var(X) + 2000^2 \times Var(Y) = 36{,}000{,}000 \times 0.0072 + 4{,}000{,}000 \times 0.0027 = 270{,}000\). The standard deviation is computed as the square root of the variance: \(\sqrt{270{,}000} \approx \$520\). While an average monthly return of $134 on an $8000 investment is nothing to scoff at, the monthly returns are so volatile that Leonard should not expect this income to be very stable.

The variance of a linear combination of random variables may be computed by squaring the constants, substituting in the variances for the random variables, and computing the result: \(Var(aX + bY) = a^2 \times Var(X) + b^2 \times Var(Y)\). This equation is valid as long as the random variables are independent of each other. The standard deviation of the linear combination may be found by taking the square root of the variance.
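As a rough check of the rule above, the following Python sketch recomputes the variability of Leonard's portfolio from the variances in Table [sumStatOfGOOGXOM]. The helper name `variance_of_linear_combination` is hypothetical, and the calculation assumes, as the text does, that the two stocks' returns are independent.

```python
import math


def variance_of_linear_combination(coefficients, variances):
    """Square each coefficient, multiply by the matching variance, and add up.

    Only valid when the random variables are independent (an assumption here).
    """
    if len(coefficients) != len(variances):
        raise ValueError("need exactly one coefficient per random variable")
    return sum(c ** 2 * v for c, v in zip(coefficients, variances))


# $6000 in GOOG (variance 0.0072) and $2000 in XOM (variance 0.0027).
leonard_variance = variance_of_linear_combination([6000, 2000], [0.0072, 0.0027])
leonard_sd = math.sqrt(leonard_variance)

print(round(leonard_variance))  # 270000
print(round(leonard_sd))        # 520
```

Because the coefficients are squared, the $6000 position dominates the result: it contributes 259,200 of the 270,000 total variance.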

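Suppose John’s daily commute has a standard deviation of 4 minutes. What is the uncertainty in his total commute time for the week? [sdOfJohnsCommuteWeeklyTime] The expression for John’s commute time was \(X_1 + X_2 + X_3 + X_4 + X_5\). Each coefficient is 1, and the variance of each day’s time is \(4^2 = 16\). Thus, the variance of the total weekly commute time is \(1^2 \times 16 + 1^2 \times 16 + 1^2 \times 16 + 1^2 \times 16 + 1^2 \times 16 = 5 \times 16 = 80\), and the standard deviation is \(\sqrt{80} \approx 8.94\). The standard deviation for John’s weekly work commute time is about 9 minutes.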

The computation in Example [sdOfJohnsCommuteWeeklyTime] relied on an important assumption: the commute time for each day is independent of the time on other days of that week. Do you think this is valid? Explain.[53]

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability] Consider Elena’s two auctions from Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction]. Suppose these auctions are approximately independent and the variability in auction prices associated with the TV and toaster oven can be described using standard deviations of $25 and $8. Compute the standard deviation of Elena’s net gain.[54]

Consider again Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability]. The negative coefficient for \(Y\) in the linear combination was eliminated when we squared the coefficients. This generally holds true: negatives in a linear combination will have no impact on the variability computed for a linear combination, but they do impact the expected value computations.
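A small simulation can make this point concrete. The sketch below draws many hypothetical auction outcomes for Elena and compares the simulated standard deviation of her net gain with the value from the variance formula. The normal distributions are purely an assumption made for illustration (the text only gives means of $175 and $23 and standard deviations of $25 and $8), and the function name `simulate_net_gain` is hypothetical.

```python
import math
import random


def simulate_net_gain(n_trials=100_000, seed=1):
    """Simulate TV profit minus toaster oven cost for independent auctions.

    Assumption: both amounts are normally distributed; the text does not say so.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_trials):
        tv_profit = rng.gauss(175, 25)    # mean $175, standard deviation $25
        toaster_cost = rng.gauss(23, 8)   # mean $23, standard deviation $8
        gains.append(tv_profit - toaster_cost)
    mean_gain = sum(gains) / n_trials
    variance = sum((g - mean_gain) ** 2 for g in gains) / n_trials
    return mean_gain, math.sqrt(variance)


simulated_mean, simulated_sd = simulate_net_gain()

# Formula: the coefficients +1 and -1 are squared, so both variances are added.
formula_sd = math.sqrt(1 ** 2 * 25 ** 2 + (-1) ** 2 * 8 ** 2)

print(f"simulated mean: {simulated_mean:.2f}, simulated sd: {simulated_sd:.2f}")
print(f"formula sd: {formula_sd:.2f}")
```

The subtraction lowers the expected net gain to roughly $152, but the simulated spread should land near the formula value of about $26.25, which is larger than either standard deviation alone.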

Continuous distributions (special topic)

Figure [fdicHistograms] shows a few different hollow histograms of the variable height for 3 million US adults from the mid-90’s.[55] How does changing the number of bins allow you to make different interpretations of the data?[usHeights] Adding more bins provides greater detail. This sample is extremely large, which is why much smaller bins still work well. Usually we do not use so many bins with smaller sample sizes since small counts per bin mean the bin heights are very volatile.

Figure [fdicHistograms]: Four hollow histograms of US adult heights with varying bin widths.

What proportion of the sample is between 180 cm and 185 cm tall (about 5’11” to 6’1”)?[contDistProb] We can add up the heights of the bins in the range 180 cm to 185 cm and divide by the sample size. For instance, this can be done with the two shaded bins shown in Figure [usHeightsHist180185]. The two bins in this region have counts of 195,307 and 156,239 people, resulting in the following estimate of the probability: \(\frac{195{,}307 + 156{,}239}{3{,}000{,}000} = 0.1172\). This fraction is the same as the proportion of the histogram’s area that falls in the range 180 to 185 cm.
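The bin arithmetic in this example is short enough to sketch directly. The counts are the two shaded bins quoted above and the sample size is the 3 million adults mentioned earlier; the function name `proportion_in_bins` is a hypothetical helper.

```python
def proportion_in_bins(bin_counts, sample_size):
    """Estimate a probability as (people in the selected bins) / (sample size)."""
    return sum(bin_counts) / sample_size


shaded_bin_counts = [195_307, 156_239]  # the two 2.5 cm bins covering 180 to 185 cm
sample_size = 3_000_000

estimate = proportion_in_bins(shaded_bin_counts, sample_size)
print(round(estimate, 4))  # 0.1172
```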

Figure [usHeightsHist180185]: A histogram with bin sizes of 2.5 cm. The shaded region represents individuals with heights between 180 and 185 cm.

From histograms to continuous distributions

Examine the transition from a boxy hollow histogram in the top-left of Figure [fdicHistograms] to the much smoother plot in the lower-right. In this last plot, the bins are so slim that the hollow histogram is starting to resemble a smooth curve. This suggests the population height as a continuous numerical variable might best be explained by a curve that represents the outline of extremely slim bins.

This smooth curve represents a probability density function (also called a density or distribution), and such a curve is shown in Figure [fdicHeightContDist] overlaid on a histogram of the sample. A density has a special property: the total area under the density’s curve is 1.

Figure [fdicHeightContDist]: The continuous probability distribution of heights for US adults.

Probabilities from continuous distributions

We computed the proportion of individuals with heights 180 to 185 cm in Example [contDistProb] as a fraction: \(\frac{\text{number of people between 180 and 185}}{\text{total sample size}}\). We found the number of people with heights between 180 and 185 cm by determining the fraction of the histogram’s area in this region. Similarly, we can use the area in the shaded region under the curve to find a probability (with the help of a computer): \(P(\text{height between 180 and 185}) = \text{area between 180 and 185} = 0.1157\). The probability that a randomly selected person is between 180 and 185 cm is 0.1157. This is very close to the estimate from Example [contDistProb]: 0.1172.

Figure [fdicHeightContDistFilled]: Density for heights in the US adult population with the area between 180 and 185 cm shaded. Compare this plot with Figure [usHeightsHist180185].

Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157.[56]

  1. What is the probability that all three are between 180 and 185 cm tall?
  2. What is the probability that none are between 180 and 185 cm?
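Both parts of this exercise follow from the Multiplication Rule for independent processes, and the arithmetic can be sketched as below. The sketch assumes the three heights are independent, which is reasonable for a random sample from a large population; the variable names are illustrative only.

```python
p_single = 0.1157  # probability that one adult is between 180 and 185 cm

# All three adults fall in the range: multiply the three equal probabilities.
p_all_three = p_single ** 3

# None fall in the range: each adult is outside it with probability 1 - p_single.
p_none = (1 - p_single) ** 3

print(round(p_all_three, 4))  # 0.0015
print(round(p_none, 3))       # 0.692
```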

What is the probability that a randomly selected person is exactly 180 cm? Assume you can measure perfectly. [probabilityOfExactly180cm] This probability is zero. A person might be close to 180 cm, but not exactly 180 cm tall. This also makes sense with the definition of probability as area; there is no area captured between 180 cm and 180 cm.

Suppose a person’s height is rounded to the nearest centimeter. Is there a chance that a random person’s measured height will be 180 cm?[57]
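The idea of probability as area under a density can be sketched with a stand-in curve. The code below uses a normal density with a mean of 170 cm and a standard deviation of 10 cm; both numbers are hypothetical and chosen only for illustration, so the results will not match the 0.1157 computed from the actual height distribution. The helper names `normal_cdf` and `prob_between` are also hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Area under a normal density to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_between(lower, upper, mean, sd):
    """Probability of landing in [lower, upper]: the area between the two points."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)


MEAN_CM = 170.0  # hypothetical mean height, not taken from the text
SD_CM = 10.0     # hypothetical standard deviation, not taken from the text

# Exactly 180 cm: the interval has zero width, so it captures no area.
p_exact = prob_between(180.0, 180.0, MEAN_CM, SD_CM)

# Rounded to 180 cm: any height from 179.5 to 180.5 cm counts, so the area is positive.
p_rounded = prob_between(179.5, 180.5, MEAN_CM, SD_CM)

print(p_exact)        # 0.0
print(p_rounded > 0)  # True
```

Working with differences of cumulative areas makes the contrast explicit: a single point contributes nothing, while even a narrow interval has some area.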

  1. Here are four examples. (i) Whether someone gets sick in the next month or not is an apparently random process with outcomes and . (ii) We can generate a random process by randomly picking a person and measuring that person’s height. The outcome of this process will be a positive number. (iii) Whether the stock market goes up or down next week is a seemingly random process with possible outcomes , , and . Alternatively, we could have used the percent change in the stock market as a numerical outcome. (iv) Whether your roommate cleans her dishes tonight probably seems like a random process with possible outcomes and .
  2. (a) The random process is a die roll, and at most one of these outcomes can come up. This means they are disjoint outcomes. (b)  or or
  3. (a) Yes. Each email is categorized in only one level of . (b) Small: . Big: . (c) or .
  4. (a) or . (b) Similarly, .
  5. (a) Outcomes and . (b) Yes, events and are disjoint because they share no outcomes. (c) The events and share an outcome in common, , and so are not disjoint.
  6. Since and are disjoint events, use the Addition Rule: or .
  7. The 52 cards are split into four : (club), (diamond), (heart), (spade). Each suit has its 13 cards labeled: , , ..., , (jack), (queen), (king), and (ace). Thus, each card is a unique combination of a suit and a label, e.g. and . The 12 cards represented by the jacks, queens, and kings are called . The cards that are or are typically colored red while the other two suits are typically colored black.
  8. (a) There are 52 cards and 13 diamonds. If the cards are thoroughly shuffled, each card has an equal chance of being drawn, so the probability that a randomly selected card is a diamond is . (b) Likewise, there are 12 face cards, so face card.
  9. (a) If and are disjoint, and can never occur simultaneously. (b) If and are disjoint, then the last term of Equation ([generalAdditionRule]) is 0 (see part (a)) and we are left with the Addition Rule for disjoint events.
  10. Both the counts and corresponding probabilities (e.g. ) are shown. Notice that the number of emails represented in the left circle corresponds to , and the number represented in the right circle is .
  11. (a) The solution is represented by the intersection of the two circles: 0.043. (b) This is the sum of the three disjoint probabilities shown in the circles: .
  12. The probabilities of (a) do not sum to 1. The second probability in (b) is negative. This leaves (c), which sure enough satisfies the requirements of a distribution. One of the three was said to be the actual distribution of US household incomes, so it must be (c).
  13. (a) The outcomes are disjoint and each has probability , so the total probability is . (b) We can also see that . Since and are disjoint, .
  14. Brief solutions: (a) , , , and , , , . (b) Noting that each outcome is disjoint, add the individual outcome probabilities to get and . (c)  and  are disjoint, and the same is true of  and . Therefore, and .
  15. (a) The complement of : when the total is equal to . (b) . (c) Use the probability of the complement from part (b), , and Equation ([complement]): less than .
  16. (a) First find , then use the complement: not . (b) First find the complement, which requires much less effort: or . Then calculate . (c) As before, finding the complement is the clever way to determine . First find or . Then calculate .
  17. (a) The probability the first person is left-handed is , which is the same for the second person. We apply the Multiplication Rule for independent processes to determine the probability that both will be left-handed: . (b) It is reasonable to assume the proportion of people who are ambidextrous (both right and left handed) is nearly 0, which results in right-handed. Using the same reasoning as in part (a), the probability that both will be right-handed is .
  18. (a) The abbreviations RH and LH are used for right-handed and left-handed, respectively. Since each is independent, we apply the Multiplication Rule for independent processes: \(P(\text{all five are RH}) = P(\text{first = RH, second = RH, ..., fifth = RH}) = P(\text{first = RH}) \times P(\text{second = RH}) \times \dots \times P(\text{fifth = RH}) = 0.91 \times 0.91 \times 0.91 \times 0.91 \times 0.91 = 0.624\). (b) Using the same reasoning as in (a), (c) Use the complement, \(P(\text{all five are RH})\), to answer this question: \(P(\text{not all RH}) = 1 - P(\text{all RH}) = 1 - 0.624 = 0.376\).
  19. The actual proportion of the U.S. population that is female is about 50%, and so we use 0.5 for the probability of sampling a woman. However, this probability does differ in other countries.
  20. Brief answers are provided. (a) This can be written in probability notation as a randomly selected person is male and right-handed. (b) 0.207. (c) 0.045. (d) 0.0093.
  21. A simulated data set based on real population summaries at .
  22. Each of the four outcome combinations is disjoint, all probabilities are indeed non-negative, and the sum of the probabilities is .
  23. (a) \(P(\text{parents not} \mid \text{teen not})\). (b) Equation ([condProbEq]) for conditional probability indicates we should first find \(P(\text{parents not and teen not}) = 0.38\) and \(P(\text{teen not}) = 0.44\). Then the ratio represents the conditional probability: \(0.38 / 0.44 = 0.864\).
  24. (a) This probability is \(\frac{P(\text{parents degree, teen not})}{P(\text{teen not})} = \frac{0.06}{0.44} = 0.136\). (b) The total equals 1. (c) Under the condition the teenager didn’t attend college, the parents must either have a college degree or not. The complement still works for conditional probabilities, provided the probabilities are conditioned on the same information.
  25. No. While there is an association, the data are observational. Two potential confounding variables include and . Can you think of others?
  26. Fenner F. 1988. Smallpox and Its Eradication (History of International Public Health, No. 6). Geneva: World Health Organization. ISBN 92-4-156110-6.
  27. \(P(\text{result = died} \mid \text{inoculated = no}) = \frac{P(\text{result = died and inoculated = no})}{P(\text{inoculated = no})} = \frac{0.1356}{0.9608} = 0.1411\).
  28. \(P(\text{result = died} \mid \text{inoculated = yes}) = \frac{P(\text{result = died and inoculated = yes})}{P(\text{inoculated = yes})} = \frac{0.0010}{0.0392} = 0.0255\). The death rate for individuals who were inoculated is only about 1 in 40 while the death rate is about 1 in 7 for those who were not inoculated.
  29. Brief answers: (a) Observational. (b) No, we cannot infer causation from this observational study. (c) Accessibility to the latest and best medical care. There are other valid answers for part (c).
  30. The answer is 0.0382, which can be verified using Table [smallpoxProbabilityTable].
  31. There were only two possible outcomes: or . This means that 100% - 97.45% = 2.55% of the people who were inoculated died.
  32. The samples are large relative to the difference in death rates for the “inoculated” and “not inoculated” groups, so it seems there is an association between and . However, as noted in the solution to Guided Practice [SmallpoxInoculationObsExpExercise], this is an observational study and we cannot be sure if there is a causal connection. (Further research has shown that inoculation is effective at reducing death rates.)
  33. Brief solutions: (a) . (b) . (c) \(\frac{P(Y = 1 \text{ and } X = 1)}{P(X = 1)} = \frac{1/36}{1/6} = 1/6\). (d) The probability is the same as in part (c): \(1/6\). The probability that \(Y = 1\) was unchanged by knowledge about \(X\), which makes sense as \(X\) and \(Y\) are independent.
  34. He has forgotten that the next roulette spin is independent of the previous spins. Casinos do employ this practice; they post the last several outcomes of many betting games to trick unsuspecting gamblers into believing the odds are in their favor. This is called the .
  35. (a) The tree diagram is shown to the right. (b) Identify which two joint probabilities represent students who passed, and add them: passed. (c) construct tree diagram passed.
      [Image: tree diagram]
  36. The probabilities reported here were obtained using studies reported at and .
  37. The tree diagram, with three primary branches, is shown to the right. Next, we identify two probabilities from the tree diagram. (1) The probability that there is a sporting event and the garage is full: 0.14. (2) The probability the garage is full: . Then the solution is the ratio of these probabilities: . If the garage is full, there is a 56% probability that there is a sporting event.
      [Image: tree diagram]
  38. Short answer:
  39. Each probability is conditioned on the same information that the garage is full, so the complement may be used: .
  40. The three probabilities we computed were actually one marginal probability, \(P(\text{Q1} = \text{not\_picked})\), and two conditional probabilities: \(P(\text{Q2} = \text{not\_picked} \mid \text{Q1} = \text{not\_picked})\) and \(P(\text{Q3} = \text{not\_picked} \mid \text{Q1} = \text{not\_picked}, \text{Q2} = \text{not\_picked})\). Using the General Multiplication Rule, the product of these three probabilities is the probability of not being picked in 3 questions.
  41. being picked to answer all three questions.
  42. (a) First determine the probability of not winning. The tickets are sampled without replacement, which means the probability you do not win on the first draw is , for the second, ..., and for the seventh. The probability you win no prize is the product of these separate probabilities: . That is, the probability of winning a prize is . (b) When the tickets are sampled with replacement, there are seven independent draws. Again we first find the probability of not winning a prize: . Thus, the probability of winning (at least) one prize when drawing with replacement is 0.211.
  43. There is about a 10% larger chance of winning a prize when using sampling without replacement. However, at most one prize may be won under this sampling procedure.
  44. If they sell a little more or a little less, this should not be a surprise. Hopefully Chapter [introductionToData] helped make clear that there is natural variability in observed data. For example, if we would flip a coin 100 times, it will not usually come up heads exactly half the time, but it will probably be close.
  45. where represents a function for the density curve.
  46. (a) 100% - 25% - 60% = 15% of students do not buy any books for the class. Part (b) is represented by the first two lines in the table below. The expectation for part (c) is given as the total on the line \(x_i \times P(X = x_i)\). The result of part (d) is the square root of the variance listed in the total on the last line: \(\sigma = \sqrt{Var(X)} \approx \$69.28\).
    \(i\) (scenario)                        1           2         3          Total
    \(x_i\)                                 0.00        159.00    200.00
    \(P(X = x_i)\)                          0.15        0.25      0.60
    \(x_i \times P(X = x_i)\)               0.00        39.75     120.00     159.75
    \(x_i - \mu\)                           -159.75     -0.75     40.25
    \((x_i - \mu)^2\)                       25520.06    0.56      1620.06
    \((x_i - \mu)^2 \times P(X = x_i)\)     3828.0      0.1       972.0      4800.1
  47. She will make \(X\) dollars on the TV but spend \(Y\) dollars on the toaster oven: \(X - Y\).
  48. \(E(X - Y) = E(X) - E(Y) = 175 - 23 = \$152\). She should expect to make about $152.
  49. No, since there is probably some variability. For example, the traffic will vary from one day to next, and auction prices will vary depending on the quality of the merchandise and the interest of the attendees.
  50. If \(X\) and \(Y\) are random variables, consider the following combinations: \(X^{1+Y}\), \(X \times Y\), \(X / Y\). In such cases, plugging in the average value for each random variable and computing the result will not generally lead to an accurate average value for the end result.
  51. \(E(\$6000 \times X + \$2000 \times Y) = \$6000 \times 0.021 + \$2000 \times 0.004 = \$134\).
  52. No. While stocks tend to rise over time, they are often volatile in the short term.
  53. One concern is whether traffic patterns tend to have a weekly cycle (e.g. Fridays may be worse than other days). If that is the case, and John drives, then the assumption is probably not reasonable. However, if John walks to work, then his commute is probably not affected by any weekly traffic cycle.
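  54. The equation for Elena can be written as \((1) \times X + (-1) \times Y\). The variances of \(X\) and \(Y\) are 625 and 64. We square the coefficients and plug in the variances: \((1)^2 \times Var(X) + (-1)^2 \times Var(Y) = 1 \times 625 + 1 \times 64 = 689\). The variance of the linear combination is 689, and the standard deviation is the square root of 689: about $26.25.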
  55. This sample can be considered a simple random sample from the US population. It relies on the USDA Food Commodity Intake Database.
  56. Brief answers: (a) \(0.1157 \times 0.1157 \times 0.1157 = 0.0015\). (b) \((1 - 0.1157)^3 = 0.692\).
  57. This has positive probability. Anyone between 179.5 cm and 180.5 cm will have a measured height of 180 cm. This is probably a more realistic scenario to encounter in practice versus Example [probabilityOfExactly180cm].