# Probability (special topic)

Probability forms a foundation for statistics. You might already be familiar with many aspects of probability; however, formalization of the concepts is new for most. This chapter aims to introduce probability on familiar terms using processes most people have seen before.

## Defining probability (special topic)

A “die”, the singular of dice, is a cube with six faces numbered 1, 2, 3, 4, 5, and 6. What is the chance of getting 1 when rolling a die?[probOf1] If the die is fair, then the chance of a 1 is as good as the chance of any other number. Since there are six outcomes, the chance must be 1-in-6 or, equivalently, ${\displaystyle 1/6}$.

What is the chance of getting a 1 or 2 in the next roll?[probOf1Or2] 1 and 2 constitute two of the six equally likely possible outcomes, so the chance of getting one of these two outcomes must be ${\displaystyle 2/6=1/3}$.

What is the chance of getting either 1, 2, 3, 4, 5, or 6 on the next roll?[probOf123456] 100%. The outcome must be one of these numbers.

What is the chance of not rolling a 2?[probNot2] Since the chance of rolling a 2 is ${\displaystyle 1/6}$ or ${\displaystyle 16.{\bar {6}}\%}$, the chance of not rolling a 2 must be ${\displaystyle 100\%-16.{\bar {6}}\%=83.{\bar {3}}\%}$ or ${\displaystyle 5/6}$.

Alternatively, we could have noticed that not rolling a 2 is the same as getting a 1, 3, 4, 5, or 6, which makes up five of the six equally likely outcomes and has probability ${\displaystyle 5/6}$.

Consider rolling two dice. If ${\displaystyle 1/6^{th}}$ of the time the first die is a 1 and ${\displaystyle 1/6^{th}}$ of those times the second die is a 1, what is the chance of getting two 1s?[probOf2Ones] If ${\displaystyle 16.{\bar {6}}\%}$ of the time the first die is a 1 and ${\displaystyle 1/6^{th}}$ of those times the second die is also a 1, then the chance that both dice are 1 is ${\displaystyle (1/6)\times (1/6)}$ or ${\displaystyle 1/36}$.
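The die calculations above can be checked by listing the equally likely outcomes and counting the favorable ones. The sketch below is only an illustration; the helper name `probability` and the variable names are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # the six faces of a fair die


def probability(event, outcomes):
    """Share of equally likely outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# One die: count favorable faces out of six.
p_one = probability(lambda face: face == 1, FACES)
p_one_or_two = probability(lambda face: face in (1, 2), FACES)
p_not_two = probability(lambda face: face != 2, FACES)

# Two dice: 36 equally likely ordered pairs (first die, second die).
two_dice = list(product(FACES, repeat=2))
p_two_ones = probability(lambda roll: roll == (1, 1), two_dice)

print(p_one, p_one_or_two, p_not_two, p_two_ones)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions in the text.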

### Probability

We use probability to build tools to describe and understand apparent randomness. We often frame probability in terms of a random process giving rise to an outcome.

Roll a die ${\displaystyle \rightarrow }$ 1, 2, 3, 4, 5, or 6

Flip a coin ${\displaystyle \rightarrow }$ H or T

Rolling a die or flipping a coin is a seemingly random process and each gives rise to an outcome.

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.

Probability is defined as a proportion, and it always takes values between 0 and 1 (inclusively). It may also be displayed as a percentage between 0% and 100%.

Probability can be illustrated by rolling a die many times. Let ${\displaystyle {\hat {p}}_{n}}$ be the proportion of outcomes that are 1 after the first ${\displaystyle n}$ rolls. As the number of rolls increases, ${\displaystyle {\hat {p}}_{n}}$ will converge to the probability of rolling a 1, ${\displaystyle p=1/6}$. Figure [dieProp] shows this convergence for 100,000 die rolls. The tendency of ${\displaystyle {\hat {p}}_{n}}$ to stabilize around ${\displaystyle p}$ is described by the Law of Large Numbers.

Figure [dieProp]: The fraction of die rolls that are 1 at each stage in a simulation. The proportion tends to get closer to the probability ${\displaystyle 1/6\approx 0.167}$ as the number of rolls increases.

Law of Large Numbers: As more observations are collected, the proportion ${\displaystyle {\hat {p}}_{n}}$ of occurrences with a particular outcome converges to the probability ${\displaystyle p}$ of that outcome.

Occasionally the proportion will veer off from the probability and appear to defy the Law of Large Numbers, as ${\displaystyle {\hat {p}}_{n}}$ does many times in Figure [dieProp]. However, these deviations become smaller as the number of rolls increases.
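A small simulation can make this convergence concrete. The sketch below is a hedged illustration rather than the code behind Figure [dieProp]: the function name `running_proportion_of_ones` and the fixed seed are assumptions chosen here so the run is repeatable.

```python
import random


def running_proportion_of_ones(n_rolls, seed=0):
    """Simulate n_rolls of a fair die; return p-hat after each roll."""
    rng = random.Random(seed)  # seeded generator, an arbitrary choice
    ones = 0
    proportions = []
    for n in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 1:  # randint includes both endpoints
            ones += 1
        proportions.append(ones / n)  # proportion of 1s in the first n rolls
    return proportions


p_hat = running_proportion_of_ones(100_000)

# Inspect the proportion at a few stages; it should drift toward 1/6.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(p_hat[n - 1], 4))
```

Early values of the proportion can sit far from 1/6, while the later values tend to stay close to it, which is the behavior the Law of Large Numbers describes.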

Above we write ${\displaystyle p}$ as the probability of rolling a 1. We can also write this probability as ${\displaystyle P({\text{rolling a 1}})}$. As we become more comfortable with this notation, we will abbreviate it further. For instance, if it is clear that the process is “rolling a die”, we could abbreviate ${\displaystyle P({\text{rolling a 1}})}$ as ${\displaystyle P(1)}$.

[randomProcessExercise] Random processes include rolling a die and flipping a coin. (a) Think of another random process. (b) Describe all the possible outcomes of that process. For instance, rolling a die is a random process with possible outcomes 1, 2, ..., 6.[1]

What we think of as random processes are not necessarily random, but they may just be too difficult to understand exactly. The fourth example in the footnote solution to Guided Practice [randomProcessExercise] suggests a roommate’s behavior is a random process. However, even if a roommate’s behavior is not truly random, modeling her behavior as a random process can still be useful.

It can be helpful to model a process as random even if it is not truly random.

### Disjoint or mutually exclusive outcomes

Two outcomes are called disjoint or mutually exclusive if they cannot both happen. For instance, if we roll a die, the outcomes 1 and 2 are disjoint since they cannot both occur. On the other hand, the outcomes 1 and “rolling an odd number” are not disjoint since both occur if the outcome of the roll is a 1. The terms disjoint and mutually exclusive are equivalent and interchangeable.

Calculating the probability of disjoint outcomes is easy. When rolling a die, the outcomes 1 and 2 are disjoint, and we compute the probability that one of these outcomes will occur by adding their separate probabilities: {\displaystyle {\begin{aligned}P(1{\text{ or }}2)=P(1)+P(2)=1/6+1/6=1/3\end{aligned}}} What about the probability of rolling a 1, 2, 3, 4, 5, or 6? Here again, all of the outcomes are disjoint so we add the probabilities: {\displaystyle {\begin{aligned}&P(1{\text{ or }}2{\text{ or }}3{\text{ or }}4{\text{ or }}5{\text{ or }}6)\\&\quad =P(1)+P(2)+P(3)+P(4)+P(5)+P(6)\\&\quad =1/6+1/6+1/6+1/6+1/6+1/6=1.\end{aligned}}} The Addition Rule guarantees the accuracy of this approach when the outcomes are disjoint.

If ${\displaystyle A_{1}}$ and ${\displaystyle A_{2}}$ represent two disjoint outcomes, then the probability that one of them occurs is given by {\displaystyle {\begin{aligned}P(A_{1}{\text{ or }}A_{2})=P(A_{1})+P(A_{2})\end{aligned}}} If there are many disjoint outcomes ${\displaystyle A_{1}}$, ..., ${\displaystyle A_{k}}$, then the probability that one of these outcomes will occur is {\displaystyle {\begin{aligned}P(A_{1})+P(A_{2})+\cdots +P(A_{k})\end{aligned}}}

We are interested in the probability of rolling a 1, 4, or 5. (a) Explain why the outcomes 1, 4, and 5 are disjoint. (b) Apply the Addition Rule for disjoint outcomes to determine ${\displaystyle P(1{\text{ or }}4{\text{ or }}5)}$.[2]

In the email data set in Chapter [introductionToData], the number variable described whether no number (labeled none), only one or more small numbers (small), or whether at least one big number appeared in an email (big). Of the 3,921 emails, 549 had no numbers, 2,827 had only one or more small numbers, and 545 had at least one big number. (a) Are the outcomes none, small, and big disjoint? (b) Determine the proportion of emails with value small and big separately. (c) Use the Addition Rule for disjoint outcomes to compute the probability a randomly selected email from the data set has a number in it, small or big.[3]

Statisticians rarely work with individual outcomes and instead consider sets or collections of outcomes. Let ${\displaystyle A}$ represent the event where a die roll results in 1 or 2 and ${\displaystyle B}$ represent the event that the die roll is a 4 or a 6. We write ${\displaystyle A}$ as the set of outcomes ${\displaystyle \{1,2\}}$ and ${\displaystyle B=\{4,6\}}$. These sets are commonly called events. Because ${\displaystyle A}$ and ${\displaystyle B}$ have no elements in common, they are disjoint events. ${\displaystyle A}$ and ${\displaystyle B}$ are represented in Figure [disjointSets].

Figure [disjointSets]: Three events, ${\displaystyle A}$, ${\displaystyle B}$, and ${\displaystyle D}$, consist of outcomes from rolling a die. ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint since they do not have any outcomes in common.

The Addition Rule applies to both disjoint outcomes and disjoint events. The probability that one of the disjoint events ${\displaystyle A}$ or ${\displaystyle B}$ occurs is the sum of the separate probabilities: {\displaystyle {\begin{aligned}P(A{\text{ or }}B)=P(A)+P(B)=1/3+1/3=2/3\end{aligned}}}

(a) Verify the probability of event ${\displaystyle A}$, ${\displaystyle P(A)}$, is ${\displaystyle 1/3}$ using the Addition Rule. (b) Do the same for event ${\displaystyle B}$.[4]

[exerExaminingDisjointSetsABD] (a) Using Figure [disjointSets] as a reference, what outcomes are represented by event ${\displaystyle D}$? (b) Are events ${\displaystyle B}$ and ${\displaystyle D}$ disjoint? (c) Are events ${\displaystyle A}$ and ${\displaystyle D}$ disjoint?[5]

In Guided Practice [exerExaminingDisjointSetsABD], you confirmed ${\displaystyle B}$ and ${\displaystyle D}$ from Figure [disjointSets] are disjoint. Compute the probability that event ${\displaystyle B}$ or event ${\displaystyle D}$ occurs.[6]

### Probabilities when events are not disjoint

Let’s consider calculations for two events that are not disjoint in the context of a regular deck of 52 cards, represented in Table [deckOfCards]. If you are unfamiliar with the cards in a regular deck, please see the footnote.[7]

(a) What is the probability that a randomly selected card is a diamond? (b) What is the probability that a randomly selected card is a face card?[8]

Venn diagrams are useful when outcomes can be categorized as “in” or “out” for two or three variables, attributes, or random processes. The Venn diagram in Figure [cardsDiamondFaceVenn] uses a circle to represent diamonds and another to represent face cards. If a card is both a diamond and a face card, it falls into the intersection of the circles. If it is a diamond but not a face card, it will be in part of the left circle that is not in the right circle (and so on). The total number of cards that are diamonds is given by the total number of cards in the diamonds circle: ${\displaystyle 10+3=13}$. The probabilities are also shown (e.g. ${\displaystyle 10/52=0.1923}$).

Figure [cardsDiamondFaceVenn]: A Venn diagram for diamonds and face cards.

Let ${\displaystyle A}$ represent the event that a randomly selected card is a diamond and ${\displaystyle B}$ represent the event that it is a face card. How do we compute ${\displaystyle P(A{\text{ or }}B)}$? Events ${\displaystyle A}$ and ${\displaystyle B}$ are not disjoint – the cards ${\displaystyle J\diamondsuit }$, ${\displaystyle Q\diamondsuit }$, and ${\displaystyle K\diamondsuit }$ fall into both categories – so we cannot use the Addition Rule for disjoint events. Instead we use the Venn diagram. We start by adding the probabilities of the two events: {\displaystyle {\begin{aligned}P(A)+P(B)=P(\diamondsuit )+P({\text{face card}})=13/52+12/52\end{aligned}}} However, the three cards that are in both events were counted twice, once in each probability. We must correct this double counting: {\displaystyle {\begin{aligned}P(A{\text{ or }}B)&=P(\diamondsuit {\text{ or face card}})\\&=P(\diamondsuit )+P({\text{face card}})-P(\diamondsuit {\text{ and face card}})\\&=13/52+12/52-3/52\\&=22/52=11/26\end{aligned}}} Equation ([diamondFace]), the second line of this calculation, is an example of the General Addition Rule.

If ${\displaystyle A}$ and ${\displaystyle B}$ are any two events, disjoint or not, then the probability that at least one of them will occur is {\displaystyle {\begin{aligned}P(A{\text{ or }}B)=P(A)+P(B)-P(A{\text{ and }}B)\end{aligned}}} where ${\displaystyle P(A{\text{ and }}B)}$ is the probability that both events occur.

When we write “or” in statistics, we mean “and/or” unless we explicitly state otherwise. Thus, ${\displaystyle A}$ or ${\displaystyle B}$ occurs means ${\displaystyle A}$, ${\displaystyle B}$, or both ${\displaystyle A}$ and ${\displaystyle B}$ occur.
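The General Addition Rule can be verified for the diamond and face card events by building the deck explicitly and counting. This is a sketch only; the rank and suit labels and the helper `prob` are names assumed here for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs

# Event A: the card is a diamond. Event B: the card is a face card (J, Q, K).
diamonds = {card for card in DECK if card[1] == "diamonds"}
face_cards = {card for card in DECK if card[0] in {"J", "Q", "K"}}


def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(DECK))


# Left side: count the union directly ("or" means and/or).
p_union = prob(diamonds | face_cards)

# Right side: add the two probabilities, then subtract the double-counted overlap.
p_rule = prob(diamonds) + prob(face_cards) - prob(diamonds & face_cards)

print(p_union, p_rule)
```

Set union (`|`) matches the inclusive "or" used in statistics, and set intersection (`&`) isolates the three cards counted twice.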

(a) If ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint, describe why this implies ${\displaystyle P(A}$ and ${\displaystyle B)=0}$. (b) Using part (a), verify that the General Addition Rule simplifies to the simpler Addition Rule for disjoint events if ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint.[9]

[emailSpamNumberVennExer] In the data set with 3,921 emails, 367 were spam, 2,827 contained some small numbers but no big numbers, and 168 had both characteristics. Create a Venn diagram for this setup.[10]

(a) Use your Venn diagram from Guided Practice [emailSpamNumberVennExer] to determine the probability a randomly drawn email from the data set is spam and had small numbers (but not big numbers). (b) What is the probability that the email had either of these attributes?[11]

### Probability distributions

A probability distribution is a table of all disjoint outcomes and their associated probabilities. Table [diceProb] shows the probability distribution for the sum of two dice.

| Dice sum | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Probability | ${\displaystyle {\frac {1}{36}}}$ | ${\displaystyle {\frac {2}{36}}}$ | ${\displaystyle {\frac {3}{36}}}$ | ${\displaystyle {\frac {4}{36}}}$ | ${\displaystyle {\frac {5}{36}}}$ | ${\displaystyle {\frac {6}{36}}}$ | ${\displaystyle {\frac {5}{36}}}$ | ${\displaystyle {\frac {4}{36}}}$ | ${\displaystyle {\frac {3}{36}}}$ | ${\displaystyle {\frac {2}{36}}}$ | ${\displaystyle {\frac {1}{36}}}$ |

A probability distribution is a list of the possible outcomes with corresponding probabilities that satisfies three rules:

1. The outcomes listed must be disjoint.
2. Each probability must be between 0 and 1.
3. The probabilities must total 1.
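The distribution for the sum of two dice can be rebuilt by enumeration and then checked against the rules above. The following is a minimal sketch; `is_valid_distribution` is a hypothetical helper, and it relies on dictionary keys being distinct to stand in for the disjointness rule.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely ordered rolls give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice_sum_dist = {total: Fraction(n, 36) for total, n in sorted(counts.items())}


def is_valid_distribution(dist):
    """Check rules 2 and 3; rule 1 holds because dict keys are distinct outcomes."""
    probabilities = list(dist.values())
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    totals_one = sum(probabilities) == 1
    return each_in_range and totals_one


for total, p in dice_sum_dist.items():
    print(total, p)
print(is_valid_distribution(dice_sum_dist))
```

Note that `Fraction` reduces automatically, so 6/36 is displayed as 1/6 even though it is the same probability listed in the table.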

[usHouseholdIncomeDistsExercise] Table [usHouseholdIncomeDists] suggests three distributions for household income in the United States. Only one is correct. Which one must it be? What is wrong with the other two?[12]

### Expectation

We call a variable or process with a numerical outcome a random variable, and we usually represent this random variable with a capital letter such as ${\displaystyle X}$, ${\displaystyle Y}$, or ${\displaystyle Z}$. The amount of money a single student will spend on her statistics books is a random variable, and we represent it by ${\displaystyle X}$.

Random variable: A random process or variable with a numerical outcome.

The possible outcomes of ${\displaystyle X}$ are labeled with a corresponding lower case letter ${\displaystyle x}$ and subscripts. For example, we write ${\displaystyle x_{1}=\$0}$, ${\displaystyle x_{2}=\$137}$, and ${\displaystyle x_{3}=\$170}$, which occur with probabilities ${\displaystyle 0.20}$, ${\displaystyle 0.55}$, and ${\displaystyle 0.25}$. The distribution of ${\displaystyle X}$ is summarized in Figure [bookCostDist] and Table [statSpendDist].

Table [statSpendDist]: The probability distribution for the random variable ${\displaystyle X}$, representing the bookstore’s revenue from a single student.

| ${\displaystyle i}$ | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| ${\displaystyle x_{i}}$ | $0 | $137 | $170 | |
| ${\displaystyle P(X=x_{i})}$ | 0.20 | 0.55 | 0.25 | 1.00 |

We computed the average outcome of ${\displaystyle X}$ as $117.85 in Example [revFromStudent]. We call this average the expected value of ${\displaystyle X}$, denoted by ${\displaystyle E(X)}$. The expected value of a random variable is computed by adding each outcome weighted by its probability: {\displaystyle {\begin{aligned}E(X)&=0\times P(X=0)+137\times P(X=137)+170\times P(X=170)\\&=0\times 0.20+137\times 0.55+170\times 0.25=117.85\end{aligned}}}

If ${\displaystyle X}$ takes outcomes ${\displaystyle x_{1}}$, ..., ${\displaystyle x_{k}}$ with probabilities ${\displaystyle P(X=x_{1})}$, ..., ${\displaystyle P(X=x_{k})}$, the expected value of ${\displaystyle X}$ is the sum of each outcome multiplied by its corresponding probability: {\displaystyle {\begin{aligned}E(X)&=x_{1}\times P(X=x_{1})+\cdots +x_{k}\times P(X=x_{k})\\&=\sum _{i=1}^{k}x_{i}P(X=x_{i})\end{aligned}}} The Greek letter ${\displaystyle \mu }$ may be used in place of the notation ${\displaystyle E(X)}$.

The expected value for a random variable represents the average outcome. For example, ${\displaystyle E(X)=117.85}$ represents the average amount the bookstore expects to make from a single student, which we could also write as ${\displaystyle \mu =117.85}$.
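The weighted sum in the expected value formula translates directly into a few lines of code. This is a hedged sketch: the dictionary `book_revenue` (outcome mapped to probability) and the function name `expected_value` are illustrative choices, with the values taken from the bookstore distribution above.

```python
def expected_value(distribution):
    """Sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in distribution.items())


# Bookstore revenue from one statistics student: outcome in dollars -> probability.
book_revenue = {0: 0.20, 137: 0.55, 170: 0.25}

mu = expected_value(book_revenue)
print(round(mu, 2))
```

Because the probabilities are stored as floating-point numbers, the result is rounded for display; exact arithmetic would need `fractions.Fraction` or `decimal.Decimal` instead.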

It is also possible to compute the expected value of a continuous random variable (see Section [contDist]). However, it requires a little calculus and we save it for a later class.[45]

In physics, the expectation holds the same meaning as the center of gravity. The distribution can be represented by a series of weights at each outcome, and the mean represents the balancing point. This is represented in Figures [bookCostDist] and [bookWts]. The idea of a center of gravity also expands to continuous probability distributions. Figure [contBalance] shows a continuous probability distribution balanced atop a wedge placed at the mean.

Figure [bookWts]: A weight system representing the probability distribution for ${\displaystyle X}$. The string holds the distribution at the mean to keep the system balanced.

Figure [contBalance]: A continuous distribution can also be balanced at its mean.

### Variability in random variables

Suppose you ran the university bookstore. Besides how much revenue you expect to generate, you might also want to know the volatility (variability) in your revenue.

The variance and standard deviation can be used to describe the variability of a random variable. Section [variability] introduced a method for finding the variance and standard deviation for a data set. We first computed deviations from the mean (${\displaystyle x_{i}-\mu }$), squared those deviations, and took an average to get the variance. In the case of a random variable, we again compute squared deviations. However, we take their sum weighted by their corresponding probabilities, just like we did for the expectation. This weighted sum of squared deviations equals the variance, and we calculate the standard deviation by taking the square root of the variance, just as we did in Section [variability].

If ${\displaystyle X}$ takes outcomes ${\displaystyle x_{1}}$, ..., ${\displaystyle x_{k}}$ with probabilities ${\displaystyle P(X=x_{1})}$, ..., ${\displaystyle P(X=x_{k})}$ and expected value ${\displaystyle \mu =E(X)}$, then the variance of ${\displaystyle X}$, denoted by ${\displaystyle Var(X)}$ or the symbol ${\displaystyle \sigma ^{2}}$, is {\displaystyle {\begin{aligned}\sigma ^{2}&=(x_{1}-\mu )^{2}\times P(X=x_{1})+\cdots +(x_{k}-\mu )^{2}\times P(X=x_{k})\\&=\sum _{j=1}^{k}(x_{j}-\mu )^{2}P(X=x_{j})\end{aligned}}} The standard deviation of ${\displaystyle X}$, labeled ${\displaystyle \sigma }$, is the square root of the variance.

Compute the expected value, variance, and standard deviation of ${\displaystyle X}$, the revenue of a single statistics student for the bookstore. It is useful to construct a table that holds computations for each outcome separately, then add up the results.

| ${\displaystyle i}$ | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| ${\displaystyle x_{i}}$ | $0 | $137 | $170 | |
| ${\displaystyle P(X=x_{i})}$ | 0.20 | 0.55 | 0.25 | |
| ${\displaystyle x_{i}\times P(X=x_{i})}$ | 0 | 75.35 | 42.50 | 117.85 |

Thus, the expected value is ${\displaystyle \mu =117.85}$, which we computed earlier. The variance can be constructed by extending this table:

| ${\displaystyle i}$ | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| ${\displaystyle x_{i}}$ | $0 | $137 | $170 | |
| ${\displaystyle P(X=x_{i})}$ | 0.20 | 0.55 | 0.25 | |
| ${\displaystyle x_{i}\times P(X=x_{i})}$ | 0 | 75.35 | 42.50 | 117.85 |
| ${\displaystyle x_{i}-\mu }$ | -117.85 | 19.15 | 52.15 | |
| ${\displaystyle (x_{i}-\mu )^{2}}$ | 13888.62 | 366.72 | 2719.62 | |
| ${\displaystyle (x_{i}-\mu )^{2}\times P(X=x_{i})}$ | 2777.7 | 201.7 | 679.9 | 3659.3 |

The variance of ${\displaystyle X}$ is ${\displaystyle \sigma ^{2}=3659.3}$, which means the standard deviation is ${\displaystyle \sigma ={\sqrt {3659.3}}=\$60.49}$.
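The table-based calculation can be mirrored in code by computing the mean first and then the probability-weighted squared deviations. The sketch below assumes the same bookstore distribution; the function names are hypothetical and chosen only for this example.

```python
import math


def expected_value(distribution):
    """Sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Probability-weighted sum of squared deviations from the mean."""
    mu = expected_value(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


# Revenue from one statistics student: outcome in dollars -> probability.
book_revenue = {0: 0.20, 137: 0.55, 170: 0.25}

sigma_squared = variance(book_revenue)
sigma = math.sqrt(sigma_squared)  # standard deviation is the square root

print(round(sigma_squared, 1), round(sigma, 2))
```

Each term of the sum inside `variance` corresponds to one entry in the last row of the extended table, and their total is the variance.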

The bookstore also offers a chemistry textbook for $159 and a book supplement for $41. From past experience, they know about 25% of chemistry students just buy the textbook while 60% buy both the textbook and supplement.[46]

1. What proportion of students don’t buy either book? Assume no students buy the supplement without the textbook.
2. Let ${\displaystyle Y}$ represent the revenue from a single student. Write out the probability distribution of ${\displaystyle Y}$, i.e. a table for each outcome and its associated probability.
3. Compute the expected revenue from a single chemistry student.
4. Find the standard deviation to describe the variability associated with the revenue from a single student.

### Linear combinations of random variables

So far, we have thought of each variable as being a complete story in and of itself. Sometimes it is more appropriate to use a combination of variables. For instance, the amount of time a person spends commuting to work each week can be broken down into several daily commutes. Similarly, the total gain or loss in a stock portfolio is the sum of the gains and losses in its components.

John travels to work five days a week. We will use ${\displaystyle X_{1}}$ to represent his travel time on Monday, ${\displaystyle X_{2}}$ to represent his travel time on Tuesday, and so on. Write an equation using ${\displaystyle X_{1}}$, ..., ${\displaystyle X_{5}}$ that represents his travel time for the week, denoted by ${\displaystyle W}$. His total weekly travel time is the sum of the five daily values: ${\displaystyle W=X_{1}+X_{2}+X_{3}+X_{4}+X_{5}}$ Breaking the weekly travel time ${\displaystyle W}$ into pieces provides a framework for understanding each source of randomness and is useful for modeling ${\displaystyle W}$.

It takes John an average of 18 minutes each day to commute to work. What would you expect his average commute time to be for the week? We were told that the average (i.e. expected value) of the commute time is 18 minutes per day: ${\displaystyle E(X_{i})=18}$. To get the expected time for the sum of the five days, we can add up the expected time for each individual day: {\displaystyle {\begin{aligned}E(W)&=E(X_{1}+X_{2}+X_{3}+X_{4}+X_{5})\\&=E(X_{1})+E(X_{2})+E(X_{3})+E(X_{4})+E(X_{5})\\&=18+18+18+18+18=90{\text{ minutes}}\end{aligned}}} The expectation of the total time is equal to the sum of the expected individual times. More generally, the expectation of a sum of random variables is always the sum of the expectation for each random variable.

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction] Elena is selling a TV at a cash auction and also intends to buy a toaster oven in the auction. If ${\displaystyle X}$ represents the profit for selling the TV and ${\displaystyle Y}$ represents the cost of the toaster oven, write an equation that represents the net change in Elena’s cash.[47]

Based on past auctions, Elena figures she should expect to make about $175 on the TV and pay about $23 for the toaster oven. In total, how much should she expect to make or spend?[48]

[explainWhyThereIsUncertaintyInTheSum] Would you be surprised if John’s weekly commute wasn’t exactly 90 minutes or if Elena didn’t make exactly $152? Explain.[49]

Two important concepts concerning combinations of random variables have so far been introduced. First, a final value can sometimes be described as the sum of its parts in an equation. Second, intuition suggests that putting the individual average values into this equation gives the average value we would expect in total. This second point needs clarification – it is guaranteed to be true in what are called linear combinations of random variables.

A linear combination of two random variables ${\displaystyle X}$ and ${\displaystyle Y}$ is a fancy phrase to describe a combination ${\displaystyle aX+bY}$ where ${\displaystyle a}$ and ${\displaystyle b}$ are some fixed and known numbers. For John’s commute time, there were five random variables – one for each work day – and each random variable could be written as having a fixed coefficient of 1: ${\displaystyle 1X_{1}+1X_{2}+1X_{3}+1X_{4}+1X_{5}}$ For Elena’s net gain or loss, the ${\displaystyle X}$ random variable had a coefficient of +1 and the ${\displaystyle Y}$ random variable had a coefficient of -1. When considering the average of a linear combination of random variables, it is safe to plug in the mean of each random variable and then compute the final result. For a few examples of nonlinear combinations of random variables – cases where we cannot simply plug in the means – see the footnote.[50]

If ${\displaystyle X}$ and ${\displaystyle Y}$ are random variables, then a linear combination of the random variables is given by {\displaystyle {\begin{aligned}aX+bY\end{aligned}}} where ${\displaystyle a}$ and ${\displaystyle b}$ are some fixed numbers. To compute the average value of a linear combination of random variables, plug in the average of each individual random variable and compute the result: {\displaystyle {\begin{aligned}a\times E(X)+b\times E(Y)\end{aligned}}} Recall that the expected value is the same as the mean, e.g. ${\displaystyle E(X)=\mu _{X}}$.

Leonard has invested $6000 in Google Inc. (stock ticker: GOOG) and $2000 in Exxon Mobil Corp. (XOM). If ${\displaystyle X}$ represents the change in Google’s stock next month and ${\displaystyle Y}$ represents the change in Exxon Mobil stock next month, write an equation that describes how much money will be made or lost in Leonard’s stocks for the month. For simplicity, we will suppose ${\displaystyle X}$ and ${\displaystyle Y}$ are not in percents but are in decimal form (e.g. if Google’s stock increases 1%, then ${\displaystyle X=0.01}$; or if it loses 1%, then ${\displaystyle X=-0.01}$). Then we can write an equation for Leonard’s gain as {\displaystyle {\begin{aligned}\$6000\times X+\$2000\times Y\end{aligned}}} If we plug in the change in the stock value for ${\displaystyle X}$ and ${\displaystyle Y}$, this equation gives the change in value of Leonard’s stock portfolio for the month. A positive value represents a gain, and a negative value represents a loss.

[expectedChangeInLeonardsStockPortfolio] Suppose Google and Exxon Mobil stocks have recently been rising 2.1% and 0.4% per month, respectively. Compute the expected change in Leonard’s stock portfolio for next month.[51]

You should have found that Leonard expects a positive gain in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, would you be surprised if he actually had a loss this month?[52]

### Variability in linear combinations of random variables

Quantifying the average outcome from a linear combination of random variables is helpful, but it is also important to have some sense of the uncertainty associated with the total outcome of that combination of random variables.

The expected net gain or loss of Leonard’s stock portfolio was considered in Guided Practice [expectedChangeInLeonardsStockPortfolio]. However, there was no quantitative discussion of the volatility of this portfolio. For instance, while the average monthly gain might be about $134 according to the data, that gain is not guaranteed. Figure [changeInLeonardsStockPortfolioFor36Months] shows the monthly changes in a portfolio like Leonard’s during the 36 months from 2009 to 2011. The gains and losses vary widely, and quantifying these fluctuations is important when investing in stocks.

Figure [changeInLeonardsStockPortfolioFor36Months]: The change in a portfolio like Leonard’s for the 36 months from 2009 to 2011, where $6000 is in Google’s stock and $2000 is in Exxon Mobil’s.

Just as we have done in many previous cases, we use the variance and standard deviation to describe the uncertainty associated with Leonard’s monthly returns. To do so, the variances of each stock’s monthly return will be useful, and these are shown in Table [sumStatOfGOOGXOM]. The stocks’ returns are nearly independent.

Table [sumStatOfGOOGXOM]: The mean, standard deviation, and variance of the GOOG and XOM stocks. These statistics were estimated from historical stock data, so notation used for sample statistics has been used.

| | Mean (${\displaystyle {\bar {x}}}$) | Standard deviation (${\displaystyle s}$) | Variance (${\displaystyle s^{2}}$) |
|---|---|---|---|
| GOOG | 0.0210 | 0.0846 | 0.0072 |
| XOM | 0.0038 | 0.0519 | 0.0027 |

Here we use an equation from probability theory to describe the uncertainty of Leonard’s monthly returns; we leave the proof of this method to a dedicated probability course. The variance of a linear combination of random variables can be computed by plugging in the variances of the individual random variables and squaring the coefficients of the random variables: {\displaystyle {\begin{aligned}Var(aX+bY)=a^{2}\times Var(X)+b^{2}\times Var(Y)\end{aligned}}} It is important to note that this equality assumes the random variables are independent; if independence doesn’t hold, then more advanced methods are necessary. This equation can be used to compute the variance of Leonard’s monthly return: {\displaystyle {\begin{aligned}Var(6000\times X+2000\times Y)&=6000^{2}\times Var(X)+2000^{2}\times Var(Y)\\&=36,000,000\times 0.0072+4,000,000\times 0.0027\\&=270,000\end{aligned}}} The standard deviation is computed as the square root of the variance: ${\displaystyle {\sqrt {270,000}}=\$520}$. While an average monthly return of $134 on an $8000 investment is nothing to scoff at, the monthly returns are so volatile that Leonard should not expect this income to be very stable.

The variance of a linear combination of random variables may be computed by squaring the constants, substituting in the variances for the random variables, and computing the result: {\displaystyle {\begin{aligned}Var(aX+bY)=a^{2}\times Var(X)+b^{2}\times Var(Y)\end{aligned}}} This equation is valid as long as the random variables are independent of each other. The standard deviation of the linear combination may be found by taking the square root of the variance.

Suppose John’s daily commute has a standard deviation of 4 minutes. What is the uncertainty in his total commute time for the week? [sdOfJohnsCommuteWeeklyTime] The expression for John’s commute time was {\displaystyle {\begin{aligned}X_{1}+X_{2}+X_{3}+X_{4}+X_{5}\end{aligned}}} Each coefficient is 1, and the variance of each day’s time is ${\displaystyle 4^{2}=16}$. Thus, the variance of the total weekly commute time is {\displaystyle {\begin{aligned}&{\text{variance }}=1^{2}\times 16+1^{2}\times 16+1^{2}\times 16+1^{2}\times 16+1^{2}\times 16=5\times 16=80\\&{\text{standard deviation }}={\sqrt {\text{variance}}}={\sqrt {80}}=8.94\end{aligned}}} The standard deviation for John’s weekly work commute time is about 9 minutes.
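The mean and variance rules for linear combinations can be applied mechanically once the coefficients are written down. The sketch below reworks Leonard's portfolio and John's weekly commute; the function names are assumptions made here, the variance rule is only valid under the independence assumption stated in the text, and the stock means are taken from Table [sumStatOfGOOGXOM] rather than the rounded percentages.

```python
import math


def linear_combination_mean(coefficients, means):
    """E(a1*X1 + ... + ak*Xk) = a1*E(X1) + ... + ak*E(Xk)."""
    return sum(a * m for a, m in zip(coefficients, means))


def linear_combination_variance(coefficients, variances):
    """Var(a1*X1 + ... + ak*Xk) for INDEPENDENT variables: sum of a_i^2 * Var(X_i)."""
    return sum(a ** 2 * v for a, v in zip(coefficients, variances))


# Leonard's portfolio: 6000*X + 2000*Y, with X = GOOG return and Y = XOM return.
leonard_coefficients = [6000, 2000]
leonard_mean = linear_combination_mean(leonard_coefficients, [0.0210, 0.0038])
leonard_variance = linear_combination_variance(leonard_coefficients, [0.0072, 0.0027])
leonard_sd = math.sqrt(leonard_variance)

# John's week: X1 + ... + X5, each day with standard deviation 4 minutes.
john_variance = linear_combination_variance([1] * 5, [4 ** 2] * 5)
john_sd = math.sqrt(john_variance)

print(round(leonard_mean, 1), round(leonard_sd), john_variance, round(john_sd, 2))
```

Squaring the coefficients is why a coefficient of -1, as in Elena's net gain, would contribute the same variance as a coefficient of +1.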

The computation in Example [sdOfJohnsCommuteWeeklyTime] relied on an important assumption: the commute time for each day is independent of the time on other days of that week. Do you think this is valid? Explain.[53]

[elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability] Consider Elena’s two auctions from Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuction]. Suppose these auctions are approximately independent and the variability in auction prices associated with the TV and toaster oven can be described using standard deviations of $25 and $8. Compute the standard deviation of Elena’s net gain.[54]

Consider again Guided Practice [elenaIsSellingATVAndBuyingAToasterOvenAtAnAuctionVariability]. The negative coefficient for ${\displaystyle Y}$ in the linear combination was eliminated when we squared the coefficients. This generally holds true: negatives in a linear combination will have no impact on the variability computed for a linear combination, but they do impact the expected value computations.

## Continuous distributions (special topic)

Figure [fdicHistograms] shows a few different hollow histograms of the variable height for 3 million US adults from the mid-90’s.[55] How does changing the number of bins allow you to make different interpretations of the data?[usHeights] Adding more bins provides greater detail. This sample is extremely large, which is why much smaller bins still work well. Usually we do not use so many bins with smaller sample sizes since small counts per bin mean the bin heights are very volatile.

Figure [fdicHistograms]: Four hollow histograms of US adults’ heights with varying bin widths.

What proportion of the sample is between 180 cm and 185 cm tall (about 5’11” to 6’1”)?[contDistProb] We can add up the heights of the bins in the range 180 cm to 185 cm and divide by the sample size. For instance, this can be done with the two shaded bins shown in Figure [usHeightsHist180185]. The two bins in this region have counts of 195,307 and 156,239 people, resulting in the following estimate of the probability: {\displaystyle {\begin{aligned}{\frac {195307+156239}{\text{3,000,000}}}=0.1172\end{aligned}}} This fraction is the same as the proportion of the histogram’s area that falls in the range 180 to 185 cm.
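The bin-counting estimate can be written as a tiny Python sketch. The function name `proportion_from_bins` is hypothetical; the counts and sample size are the ones quoted in the example.

```python
def proportion_from_bins(bin_counts, sample_size):
    """Fraction of the sample that falls in the given histogram bins."""
    return sum(bin_counts) / sample_size

# Counts for the two shaded 2.5 cm bins covering 180 to 185 cm (from the example).
shaded_counts = [195_307, 156_239]
estimate = proportion_from_bins(shaded_counts, 3_000_000)
print(round(estimate, 4))  # 0.1172
```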

Figure [usHeightsHist180185]: A histogram with bin sizes of 2.5 cm. The shaded region represents individuals with heights between 180 and 185 cm.

### From histograms to continuous distributions

Examine the transition from a boxy hollow histogram in the top-left of Figure [fdicHistograms] to the much smoother plot in the lower-right. In this last plot, the bins are so slim that the hollow histogram is starting to resemble a smooth curve. This suggests the population height as a continuous numerical variable might best be explained by a curve that represents the outline of extremely slim bins.

This smooth curve represents a probability density function (also called a density or distribution), and such a curve is shown in Figure [fdicHeightContDist] overlaid on a histogram of the sample. A density has a special property: the total area under the density’s curve is 1.

Figure [fdicHeightContDist]: The continuous probability distribution of heights for US adults.

### Probabilities from continuous distributions

We computed the proportion of individuals with heights 180 to 185 cm in Example [contDistProb] as a fraction: {\displaystyle {\begin{aligned}{\frac {\text{number of people between 180 and 185}}{\text{total sample size}}}\end{aligned}}} We found the number of people with heights between 180 and 185 cm by determining the fraction of the histogram’s area in this region. Similarly, we can use the area in the shaded region under the curve to find a probability (with the help of a computer): {\displaystyle {\begin{aligned}P({\text{height between 180 and 185}})={\text{area between 180 and 185}}=0.1157\end{aligned}}} The probability that a randomly selected person is between 180 and 185 cm is 0.1157. This is very close to the estimate from Example [contDistProb]: 0.1172.

Figure [fdicHeightContDistFilled]: Density for heights in the US adult population with the area between 180 and 185 cm shaded. Compare this plot with Figure [usHeightsHist180185].
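To illustrate what "finding an area with the help of a computer" might look like, here is a minimal Python sketch using the trapezoid rule. The text does not give a formula for the height density, so the sketch substitutes a hypothetical normal density with mean 170 cm and standard deviation 10 cm. Because of that assumption, the shaded area it prints will not match the 0.1157 reported above. The names `density` and `area_under` are also illustrative.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the text).
MEAN_CM = 170.0
SD_CM = 10.0

def density(x):
    """Normal density with the assumed mean and SD."""
    z = (x - MEAN_CM) / SD_CM
    return math.exp(-0.5 * z * z) / (SD_CM * math.sqrt(2 * math.pi))

def area_under(f, lower, upper, steps=10_000):
    """Approximate the area under f between lower and upper (trapezoid rule)."""
    width = (upper - lower) / steps
    inner = sum(f(lower + i * width) for i in range(1, steps))
    return width * (0.5 * f(lower) + inner + 0.5 * f(upper))

total_area = area_under(density, MEAN_CM - 8 * SD_CM, MEAN_CM + 8 * SD_CM)
shaded_area = area_under(density, 180, 185)
print(round(total_area, 4))   # 1.0
print(round(shaded_area, 4))
```

The first printed value reflects the defining property of a density (total area 1); the second is the probability of a height between 180 and 185 cm under the assumed curve.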

Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157.[56]

1. What is the probability that all three are between 180 and 185 cm tall?
2. What is the probability that none are between 180 and 185 cm?
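Both parts can be checked with the Multiplication Rule for independent processes. The sketch below assumes the three selections are independent, which is reasonable when sampling a few people at random from a large population; the variable names are illustrative.

```python
p = 0.1157  # probability one randomly selected adult is between 180 and 185 cm

# Independent selections, so multiply the individual probabilities.
p_all_three = p ** 3
p_none = (1 - p) ** 3

print(round(p_all_three, 4))  # 0.0015
print(round(p_none, 3))       # 0.692
```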

What is the probability that a randomly selected person is exactly 180 cm? Assume you can measure perfectly. [probabilityOfExactly180cm] This probability is zero. A person might be close to 180 cm, but not exactly 180 cm tall. This also makes sense with the definition of probability as area; there is no area captured between 180 cm and 180 cm.

Suppose a person’s height is rounded to the nearest centimeter. Is there a chance that a random person’s measured height will be 180 cm?[57]
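The contrast between an exact value and a rounded measurement can be sketched as a difference of cumulative probabilities. As before, the actual height density is not given in the text, so this sketch assumes a hypothetical normal model (mean 170 cm, standard deviation 10 cm) purely for illustration; only the zero versus positive comparison carries over, not the specific numbers.

```python
import math

# Hypothetical parameters for illustration only (not from the text).
MEAN_CM = 170.0
SD_CM = 10.0

def normal_cdf(x):
    """P(height <= x) under the assumed normal model."""
    return 0.5 * (1 + math.erf((x - MEAN_CM) / (SD_CM * math.sqrt(2))))

def prob_between(lower, upper):
    """Area under the density between lower and upper."""
    return normal_cdf(upper) - normal_cdf(lower)

print(prob_between(180, 180))          # 0.0: a single point captures no area
print(prob_between(179.5, 180.5) > 0)  # True: a rounded reading of 180 cm covers an interval
```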

1. Here are four examples. (i) Whether someone gets sick in the next month or not is an apparently random process with outcomes and . (ii) We can generate a random process by randomly picking a person and measuring that person’s height. The outcome of this process will be a positive number. (iii) Whether the stock market goes up or down next week is a seemingly random process with possible outcomes , , and . Alternatively, we could have used the percent change in the stock market as a numerical outcome. (iv) Whether your roommate cleans her dishes tonight probably seems like a random process with possible outcomes and .
2. (a) The random process is a die roll, and at most one of these outcomes can come up. This means they are disjoint outcomes. (b) ${\displaystyle P(}$ or or ${\displaystyle )=P(}$${\displaystyle )+P(}$${\displaystyle )+P(}$${\displaystyle )={\frac {1}{6}}+{\frac {1}{6}}+{\frac {1}{6}}={\frac {3}{6}}={\frac {1}{2}}}$
3. (a) Yes. Each email is categorized in only one level of . (b) Small: ${\displaystyle {\frac {2827}{3921}}=0.721}$. Big: ${\displaystyle {\frac {545}{3921}}=0.139}$. (c) ${\displaystyle P(}$ or ${\displaystyle )=P(}$${\displaystyle )+P(}$${\displaystyle )=0.721+0.139=0.860}$.
4. (a) ${\displaystyle P(A)=P(}$ or ${\displaystyle )=P(}$${\displaystyle )+P(}$${\displaystyle )={\frac {1}{6}}+{\frac {1}{6}}={\frac {2}{6}}={\frac {1}{3}}}$. (b) Similarly, ${\displaystyle P(B)=1/3}$.
5. (a) Outcomes and . (b) Yes, events ${\displaystyle B}$ and ${\displaystyle D}$ are disjoint because they share no outcomes. (c) The events ${\displaystyle A}$ and ${\displaystyle D}$ share an outcome in common, , and so are not disjoint.
6. Since ${\displaystyle B}$ and ${\displaystyle D}$ are disjoint events, use the Addition Rule: ${\displaystyle P(B}$ or ${\displaystyle D)=P(B)+P(D)={\frac {1}{3}}+{\frac {1}{3}}={\frac {2}{3}}}$.
7. The 52 cards are split into four : ${\displaystyle \clubsuit }$ (club), ${\displaystyle \diamondsuit }$ (diamond), ${\displaystyle \heartsuit }$ (heart), ${\displaystyle \spadesuit }$ (spade). Each suit has its 13 cards labeled: , , ..., , (jack), (queen), (king), and (ace). Thus, each card is a unique combination of a suit and a label, e.g. and . The 12 cards represented by the jacks, queens, and kings are called . The cards that are ${\displaystyle \diamondsuit }$ or ${\displaystyle \heartsuit }$ are typically colored red while the other two suits are typically colored black.
8. (a) There are 52 cards and 13 diamonds. If the cards are thoroughly shuffled, each card has an equal chance of being drawn, so the probability that a randomly selected card is a diamond is ${\displaystyle P(\diamondsuit )={\frac {13}{52}}=0.250}$. (b) Likewise, there are 12 face cards, so ${\displaystyle P({\text{face card}})={\frac {12}{52}}={\frac {3}{13}}=0.231}$.
9. (a) If ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint, ${\displaystyle A}$ and ${\displaystyle B}$ can never occur simultaneously. (b) If ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint, then the last term of Equation ([generalAdditionRule]) is 0 (see part (a)) and we are left with the Addition Rule for disjoint events.
10. Both the counts and corresponding probabilities (e.g. ${\displaystyle 2659/3921=0.678}$) are shown. Notice that the number of emails represented in the left circle corresponds to ${\displaystyle 2659+168=2827}$, and the number represented in the right circle is ${\displaystyle 168+199=367}$.
11. (a) The solution is represented by the intersection of the two circles: 0.043. (b) This is the sum of the three disjoint probabilities shown in the circles: ${\displaystyle 0.678+0.043+0.051=0.772}$.
12. The probabilities of (a) do not sum to 1. The second probability in (b) is negative. This leaves (c), which sure enough satisfies the requirements of a distribution. One of the three was said to be the actual distribution of US household incomes, so it must be (c).
13. (a) The outcomes are disjoint and each has probability ${\displaystyle 1/6}$, so the total probability is ${\displaystyle 4/6=2/3}$. (b) We can also see that ${\displaystyle P(D)={\frac {1}{6}}+{\frac {1}{6}}=1/3}$. Since ${\displaystyle D}$ and ${\displaystyle D^{c}}$ are disjoint, ${\displaystyle P(D)+P(D^{c})=1}$.
14. Brief solutions: (a) ${\displaystyle A^{c}=\{}$, , , ${\displaystyle \}}$ and ${\displaystyle B^{c}=\{}$, , , ${\displaystyle \}}$. (b) Noting that each outcome is disjoint, add the individual outcome probabilities to get ${\displaystyle P(A^{c})=2/3}$ and ${\displaystyle P(B^{c})=2/3}$. (c) ${\displaystyle A}$ and ${\displaystyle A^{c}}$ are disjoint, and the same is true of ${\displaystyle B}$ and ${\displaystyle B^{c}}$. Therefore, ${\displaystyle P(A)+P(A^{c})=1}$ and ${\displaystyle P(B)+P(B^{c})=1}$.
15. (a) The complement of ${\displaystyle A}$: when the total is equal to 12. (b) ${\displaystyle P(A^{c})=1/36}$. (c) Use the probability of the complement from part (b), ${\displaystyle P(A^{c})=1/36}$, and Equation ([complement]): ${\displaystyle P({\text{less than 12}})=1-P(12)=1-1/36=35/36}$.
16. (a) First find ${\displaystyle P(}$${\displaystyle )=5/36}$, then use the complement: ${\displaystyle P(}$not ${\displaystyle )=1-P(}$${\displaystyle )=31/36}$. (b) First find the complement, which requires much less effort: ${\displaystyle P(}$ or ${\displaystyle )=1/36+2/36=1/12}$. Then calculate ${\displaystyle P(B)=1-P(B^{c})=1-1/12=11/12}$. (c) As before, finding the complement is the clever way to determine ${\displaystyle P(D)}$. First find ${\displaystyle P(D^{c})=P(}$ or ${\displaystyle )=2/36+1/36=1/12}$. Then calculate ${\displaystyle P(D)=1-P(D^{c})=11/12}$.
17. (a) The probability the first person is left-handed is ${\displaystyle 0.09}$, which is the same for the second person. We apply the Multiplication Rule for independent processes to determine the probability that both will be left-handed: ${\displaystyle 0.09\times 0.09=0.0081}$. (b) It is reasonable to assume the proportion of people who are ambidextrous (both right and left handed) is nearly 0, which results in ${\displaystyle P(}$right-handed${\displaystyle )=1-0.09=0.91}$. Using the same reasoning as in part (a), the probability that both will be right-handed is ${\displaystyle 0.91\times 0.91=0.8281}$.
18. (a) The abbreviations RH and LH are used for right-handed and left-handed, respectively. Since each is independent, we apply the Multiplication Rule for independent processes: {\displaystyle {\begin{aligned}P({\text{all five are RH}})&=P({\text{first = RH, second = RH, ..., fifth = RH}})\\&=P({\text{first = RH}})\times P({\text{second = RH}})\times \dots \times P({\text{fifth = RH}})\\&=0.91\times 0.91\times 0.91\times 0.91\times 0.91=0.624\end{aligned}}} (b) Using the same reasoning as in (a), ${\displaystyle 0.09\times 0.09\times 0.09\times 0.09\times 0.09=0.0000059}$. (c) Use the complement, ${\displaystyle P({\text{all five are RH}})}$, to answer this question: {\displaystyle {\begin{aligned}P({\text{not all RH}})=1-P({\text{all RH}})=1-0.624=0.376\end{aligned}}}
19. The actual proportion of the U.S. population that is female is about 50%, and so we use 0.5 for the probability of sampling a woman. However, this probability does differ in other countries.
20. Brief answers are provided. (a) This can be written in probability notation as ${\displaystyle P(}$a randomly selected person is male and right-handed${\displaystyle )=0.455}$. (b) 0.207. (c) 0.045. (d) 0.0093.
21. A simulated data set based on real population summaries at .
22. The four outcome combinations are disjoint, all probabilities are indeed non-negative, and the sum of the probabilities is ${\displaystyle 0.29+0.06+0.27+0.38=1.00}$.
23. (a) ${\displaystyle P({\text{parents not}}\ |\ {\text{teen not}})}$. (b) Equation ([condProbEq]) for conditional probability indicates we should first find ${\displaystyle P({\text{parents not and teen not}})=0.38}$ and ${\displaystyle P({\text{teen not}})=0.44}$. Then the ratio represents the conditional probability: ${\displaystyle 0.38/0.44=0.864}$.
24. (a) This probability is ${\displaystyle {\frac {P({\text{parents degree, teen not}})}{P({\text{teen not}})}}={\frac {0.06}{0.44}}=0.136}$. (b) The total equals 1. (c) Under the condition the teenager didn’t attend college, the parents must either have a college degree or not. The complement still works for conditional probabilities, provided the probabilities are conditioned on the same information.
25. No. While there is an association, the data are observational. Two potential confounding variables include and . Can you think of others?
26. Fenner F. 1988. Smallpox and Its Eradication (History of International Public Health, No. 6). Geneva: World Health Organization. ISBN 92-4-156110-6.
27. ${\displaystyle P({\text{result = died}}\ |\ {\text{inoculated = no}})={\frac {P({\text{result = died and inoculated = no}})}{P({\text{inoculated = no}})}}={\frac {0.1356}{0.9608}}=0.1411}$.
28. ${\displaystyle P({\text{result = died}}\ |\ {\text{inoculated = yes}})={\frac {P({\text{result = died and inoculated = yes}})}{P({\text{inoculated = yes}})}}={\frac {0.0010}{0.0392}}=0.0255}$. The death rate for individuals who were inoculated is only about 1 in 40 while the death rate is about 1 in 7 for those who were not inoculated.
29. Brief answers: (a) Observational. (b) No, we cannot infer causation from this observational study. (c) Accessibility to the latest and best medical care. There are other valid answers for part (c).
30. The answer is 0.0382, which can be verified using Table [smallpoxProbabilityTable].
31. There were only two possible outcomes: or . This means that 100% - 97.45% = 2.55% of the people who were inoculated died.
32. The samples are large relative to the difference in death rates for the “inoculated” and “not inoculated” groups, so it seems there is an association between and . However, as noted in the solution to Guided Practice [SmallpoxInoculationObsExpExercise], this is an observational study and we cannot be sure if there is a causal connection. (Further research has shown that inoculation is effective at reducing death rates.)
33. Brief solutions: (a) ${\displaystyle 1/6}$. (b) ${\displaystyle 1/36}$. (c) ${\displaystyle {\frac {P(Y=1{\text{ and }}X=1)}{P(X=1)}}={\frac {1/36}{1/6}}=1/6}$. (d) The probability is the same as in part (c): ${\displaystyle P(Y=1)=1/6}$. The probability that ${\displaystyle Y=1}$ was unchanged by knowledge about ${\displaystyle X}$, which makes sense as ${\displaystyle X}$ and ${\displaystyle Y}$ are independent.
34. He has forgotten that the next roulette spin is independent of the previous spins. Casinos do employ this practice; they post the last several outcomes of many betting games to trick unsuspecting gamblers into believing the odds are in their favor. This is called the gambler’s fallacy.
35. (a) The tree diagram is shown to the right. (b) Identify which two joint probabilities represent students who passed, and add them: ${\displaystyle P({\text{passed}})=0.7566+0.1254=0.8820}$. (c) ${\displaystyle P({\text{construct tree diagram}}\ |\ {\text{passed}})={\frac {0.7566}{0.8820}}=0.8578}$.
36. The probabilities reported here were obtained using studies reported at and .
37. The tree diagram, with three primary branches, is shown to the right. Next, we identify two probabilities from the tree diagram. (1) The probability that there is a sporting event and the garage is full: 0.14. (2) The probability the garage is full: ${\displaystyle 0.0875+0.14+0.0225=0.25}$. Then the solution is the ratio of these probabilities: ${\displaystyle {\frac {0.14}{0.25}}=0.56}$. If the garage is full, there is a 56% probability that there is a sporting event.
38. Short answer: {\displaystyle {\begin{aligned}P(A_{2}|B)&={\frac {P(B|A_{2})P(A_{2})}{P(B|A_{1})P(A_{1})+P(B|A_{2})P(A_{2})+P(B|A_{3})P(A_{3})}}\\&={\frac {(0.25)(0.35)}{(0.7)(0.2)+(0.25)(0.35)+(0.05)(0.45)}}\\&=0.35\end{aligned}}}
39. Each probability is conditioned on the same information that the garage is full, so the complement may be used: ${\displaystyle 1.00-0.56-0.35=0.09}$.
40. The three probabilities we computed were actually one marginal probability, ${\displaystyle P({\text{Q1}}={\text{not\_picked}})}$, and two conditional probabilities: {\displaystyle {\begin{aligned}&P({\text{Q2}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}})\\&P({\text{Q3}}={\text{not\_picked}}\ |\ {\text{Q1}}={\text{not\_picked}},\ {\text{Q2}}={\text{not\_picked}})\end{aligned}}} Using the General Multiplication Rule, the product of these three probabilities is the probability of not being picked in 3 questions.
41. ${\displaystyle P(}$being picked to answer all three questions${\displaystyle )=\left({\frac {1}{15}}\right)^{3}=0.00030}$.
42. (a) First determine the probability of not winning. The tickets are sampled without replacement, which means the probability you do not win on the first draw is ${\displaystyle 29/30}$, ${\displaystyle 28/29}$ for the second, ..., and ${\displaystyle 23/24}$ for the seventh. The probability you win no prize is the product of these separate probabilities: ${\displaystyle 23/30}$. That is, the probability of winning a prize is ${\displaystyle 1-23/30=7/30=0.233}$. (b) When the tickets are sampled with replacement, there are seven independent draws. Again we first find the probability of not winning a prize: ${\displaystyle (29/30)^{7}=0.789}$. Thus, the probability of winning (at least) one prize when drawing with replacement is 0.211.
43. There is about a 10% larger chance of winning a prize when using sampling without replacement. However, at most one prize may be won under this sampling procedure.
44. If they sell a little more or a little less, this should not be a surprise. Hopefully Chapter [introductionToData] helped make clear that there is natural variability in observed data. For example, if we flip a coin 100 times, it will not usually come up heads exactly half the time, but it will probably be close.
45. ${\displaystyle \mu =\int xf(x)dx}$ where ${\displaystyle f(x)}$ represents a function for the density curve.
46. (a) 100% - 25% - 60% = 15% of students do not buy any books for the class. Part (b) is represented by the first two lines in the table below. The expectation for part (c) is given as the total on the line ${\displaystyle y_{i}\times P(Y=y_{i})}$. The result of part (d) is the square root of the variance listed in the total on the last line: ${\displaystyle \sigma ={\sqrt {Var(Y)}}=\$69.28}$.

    | ${\displaystyle i}$ (scenario) | 1 | 2 | 3 | Total |
    | --- | --- | --- | --- | --- |
    | ${\displaystyle y_{i}}$ | 0.00 | 159.00 | 200.00 | |
    | ${\displaystyle P(Y=y_{i})}$ | 0.15 | 0.25 | 0.60 | |
    | ${\displaystyle y_{i}\times P(Y=y_{i})}$ | 0.00 | 39.75 | 120.00 | ${\displaystyle E(Y)=159.75}$ |
    | ${\displaystyle y_{i}-E(Y)}$ | -159.75 | -0.75 | 40.25 | |
    | ${\displaystyle (y_{i}-E(Y))^{2}}$ | 25520.06 | 0.56 | 1620.06 | |
    | ${\displaystyle (y_{i}-E(Y))^{2}\times P(Y=y_{i})}$ | 3828.0 | 0.1 | 972.0 | ${\displaystyle Var(Y)\approx 4800}$ |
47. She will make ${\displaystyle X}$ dollars on the TV but spend ${\displaystyle Y}$ dollars on the toaster oven: ${\displaystyle X-Y}$.
48. ${\displaystyle E(X-Y)=E(X)-E(Y)=175-23=\$152}$. She should expect to make about $152.
49. No, since there is probably some variability. For example, the traffic will vary from one day to the next, and auction prices will vary depending on the quality of the merchandise and the interest of the attendees.
50. If ${\displaystyle X}$ and ${\displaystyle Y}$ are random variables, consider the following combinations: ${\displaystyle X^{1+Y}}$, ${\displaystyle X\times Y}$, ${\displaystyle X/Y}$. In such cases, plugging in the average value for each random variable and computing the result will not generally lead to an accurate average value for the end result.
51. ${\displaystyle E(\$6000\times X+\$2000\times Y)=\$6000\times 0.021+\$2000\times 0.004=\$134}$.
52. No. While stocks tend to rise over time, they are often volatile in the short term.
53. One concern is whether traffic patterns tend to have a weekly cycle (e.g. Fridays may be worse than other days). If that is the case, and John drives, then the assumption is probably not reasonable. However, if John walks to work, then his commute is probably not affected by any weekly traffic cycle.
54. The equation for Elena can be written as {\displaystyle {\begin{aligned}(1)\times X+(-1)\times Y\end{aligned}}} The variances of ${\displaystyle X}$ and ${\displaystyle Y}$ are 625 and 64. We square the coefficients and plug in the variances: {\displaystyle {\begin{aligned}(1)^{2}\times Var(X)+(-1)^{2}\times Var(Y)=1\times 625+1\times 64=689\end{aligned}}} The variance of the linear combination is 689, and the standard deviation is the square root of 689: about $26.25.
55. This sample can be considered a simple random sample from the US population. It relies on the USDA Food Commodity Intake Database.
56. Brief answers: (a) ${\displaystyle 0.1157\times 0.1157\times 0.1157=0.0015}$. (b) ${\displaystyle (1-0.1157)^{3}=0.692}$
57. This has positive probability. Anyone between 179.5 cm and 180.5 cm will have a measured height of 180 cm. This is probably a more realistic scenario to encounter in practice than Example [probabilityOfExactly180cm].
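The table in footnote 46 can be reproduced with a short Python sketch. The variable names are illustrative; the outcomes and probabilities are copied from that table.

```python
import math

# Values from the table in footnote 46: amount spent and its probability.
outcomes = [0.00, 159.00, 200.00]
probabilities = [0.15, 0.25, 0.60]

# Expected value: weight each outcome by its probability.
expected = sum(y * p for y, p in zip(outcomes, probabilities))
# Variance: probability-weighted squared deviations from the expected value.
variance = sum((y - expected) ** 2 * p for y, p in zip(outcomes, probabilities))
sd = math.sqrt(variance)

print(round(expected, 2))  # 159.75
print(round(variance, 1))  # 4800.2
print(round(sd, 2))        # 69.28
```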