Dr. Mine Dogucu

**Belief in afterlife** (columns) by **taken a college science class** (rows)

| Taken a college science class | Yes | No | Total |
|---|---|---|---|
| Yes | 2702 | 634 | 3336 |
| No | 3722 | 837 | 4559 |
| Total | 6424 | 1471 | 7895 |

Data from General Social Survey

\(P(\text{belief in afterlife})\) = ?

\(P(\text{belief in afterlife and taken a college science class})\) = ?

\(P(\text{belief in afterlife given taken a college science class})\) = ?

Calculate these probabilities and write them using correct notation. Use \(A\) for belief in afterlife and \(B\) for having taken a college science class.


\(P(\text{belief in afterlife})\) = ?

\(P(A) = \frac{6424}{7895}\)

\(P(A)\) represents a **marginal probability**. So do \(P(B)\), \(P(A^c)\), and \(P(B^c)\). To calculate these probabilities we only need the values in the margins of the contingency table, hence the name.


\(P(\text{belief in afterlife and taken a college science class})\) = ?

\(P(A \text{ and } B) = P(A \cap B) = \frac{2702}{7895}\)

\(P(A \cap B)\) represents a **joint probability**. So do \(P(A^c \cap B)\), \(P(A\cap B^c)\) and \(P(A^c\cap B^c)\).

Note that \(P(A\cap B) = P(B\cap A)\). Order does *not* matter.


\(P(\text{belief in afterlife given taken a college science class})\) = ?

\(P(A \text{ given } B) = P(A | B) = \frac{2702}{3336}\)

\(P(A|B)\) represents a **conditional probability**. So do \(P(A^c|B)\), \(P(A | B^c)\) and \(P(A^c|B^c)\). In order to calculate these probabilities we would focus on the row or the column of the given information. In a way we are *reducing* our sample space to this given information only.
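The marginal, joint, and conditional probabilities can all be checked directly from the table counts. A minimal Python sketch (the variable names are mine, not from the lecture):

```python
# Counts from the General Social Survey table.
# Rows: taken a college science class (B); columns: belief in afterlife (A).
yes_yes = 2702   # took science class and believes:        A and B
yes_no = 634     # took science class, does not believe:   A^c and B
no_yes = 3722    # no science class, believes:             A and B^c
no_no = 837      # no science class, does not believe:     A^c and B^c

total = yes_yes + yes_no + no_yes + no_no            # 7895

p_A = (yes_yes + no_yes) / total                     # marginal:    6424 / 7895
p_A_and_B = yes_yes / total                          # joint:       2702 / 7895
p_A_given_B = yes_yes / (yes_yes + yes_no)           # conditional: 2702 / 3336

print(round(p_A, 3), round(p_A_and_B, 3), round(p_A_given_B, 3))
```

Note how the conditional probability divides by the row total 3336 rather than the grand total: conditioning on \(B\) reduces the sample space to the first row only.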

\(P(\text{attending every class} \mid \text{getting an A}) \neq P(\text{getting an A} \mid \text{attending every class})\)

The order matters!

The event \(A^c\) is called the **complement** of event \(A\); \(P(A^c)\) represents the probability of selecting someone who does not believe in the afterlife.

The notes for this lecture are derived from Section 2.1 of the *Bayes Rules!* book.

Priya, a data science student, notices that her college’s email server is using a faulty spam filter. Taking matters into her own hands, Priya decides to build her own spam filter. As a first step, she manually examines all emails she received during the previous month and determines that 40% of these were spam.

Let \(B\) represent the event that an email is spam.

\(P(B) = 0.40\)

If Priya were to act on this prior alone, what should she do about incoming emails?

Since most email is non-spam, sort all emails into the inbox.

This filter would certainly solve the problem of losing non-spam email in the spam folder, but at the cost of making a mess in Priya’s inbox.

Priya realizes that some emails are written in all capital letters (“all caps”) and decides to look at some data. In her one-month email collection, 20% of spam but only 5% of non-spam emails used all caps.

Using notation, where \(A\) represents the event that an email uses all caps:

\(P(A|B) = 0.20\)

\(P(A|B^c) = 0.05\)

Suppose the next email Priya receives is written in all caps. Which of the following best describes your posterior understanding of whether the email is spam?

- The chance that this email is spam drops from 40% to 20%. After all, the subject line might indicate that the email was sent by an excited professor that’s offering Priya an automatic “A” in their course!

- The chance that this email is spam jumps from 40% to roughly 70%. Though using all caps is more common among spam emails, let’s not forget that only 40% of Priya’s emails are spam.

- The chance that this email is spam jumps from 40% to roughly 95%. Given that so few non-spam emails use all caps, this email is almost certainly spam.

| event | \(B\) | \(B^c\) | Total |
|---|---|---|---|
| probability | 0.4 | 0.6 | 1 |

Looking at the conditional probabilities

\(P(A|B) = 0.20\)

\(P(A|B^c) = 0.05\)

we can conclude that all caps is more common among spam emails than non-spam emails. Thus, the email is more **likely** to be spam.

Consider likelihoods \(L(\cdot|A)\):

\(L(B|A) := P(A|B)\) and \(L(B^c|A) := P(A|B^c)\)

When \(B\) is known, the **conditional probability function** \(P(\cdot | B)\) allows us to compare the probabilities of an unknown event, \(A\) or \(A^c\), occurring with \(B\):

\[P(A|B) \; \text{ vs } \; P(A^c|B) \; .\]

When \(A\) is known, the **likelihood function** \(L( \cdot | A) := P(A | \cdot)\) allows us to compare the likelihoods of different unknown scenarios, \(B\) or \(B^c\), producing data \(A\):

\[L(B|A) \; \text{ vs } \; L(B^c|A) \; .\]

Thus the likelihood function provides the tool we need to evaluate the relative compatibility of events \(B\) or \(B^c\) with data \(A\).
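The distinction can be made concrete with Priya's numbers: conditional probabilities of an unknown event sum to 1, while likelihoods of unknown scenarios need not. A short sketch (variable names are mine):

```python
# Priya's data: B = spam, A = all caps.
p_A_given_B = 0.20    # P(A | B): chance a spam email uses all caps
p_A_given_Bc = 0.05   # P(A | B^c): chance a non-spam email uses all caps

# Conditional probability function P(. | B): B is known, A is unknown.
# P(A|B) and P(A^c|B) necessarily sum to 1.
p_Ac_given_B = 1 - p_A_given_B

# Likelihood function L(. | A) := P(A | .): the data A is known,
# B is unknown. L(B|A) and L(B^c|A) need NOT sum to 1.
L_B_given_A = p_A_given_B     # L(B | A)   := P(A | B)
L_Bc_given_A = p_A_given_Bc   # L(B^c | A) := P(A | B^c)

print(p_A_given_B + p_Ac_given_B)               # 1.0
print(round(L_B_given_A + L_Bc_given_A, 2))     # 0.25, not 1
```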

\(P(B|A) = \frac{P(A\cap B)}{P(A)}\)

\(P(B|A) = \frac{P(B)P(A|B)}{P(A)}\)

\(P(B|A) = \frac{P(B)L(B|A)}{P(A)}\)

Recall the Law of Total Probability:

\(P(A) = P(A\cap B) + P(A\cap B^c)\)

\(P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)\)

\(P(B|A) = \frac{P(B)L(B|A)}{P(A|B) P(B)+P(A|B^c) P(B^c)}\)

\(P(B) = 0.40\)

\(P(A|B) = 0.20\)

\(P(A|B^c) = 0.05\)

\(P(B|A) = \frac{0.40 \cdot 0.20}{(0.20 \cdot 0.40) + (0.05 \cdot 0.60)} = \frac{0.08}{0.11} \approx 0.727\)
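The arithmetic can be checked in a few lines of Python (a sketch; the variable names are mine):

```python
# Priya's prior and likelihoods: B = spam, A = all caps.
p_B = 0.40            # prior P(B)
p_A_given_B = 0.20    # P(A | B), i.e. L(B | A)
p_A_given_Bc = 0.05   # P(A | B^c), i.e. L(B^c | A)

# Law of Total Probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)   # 0.08 + 0.03 = 0.11

# Bayes' Rule: P(B|A) = P(B) L(B|A) / P(A)
p_B_given_A = p_B * p_A_given_B / p_A

print(round(p_A, 2), round(p_B_given_A, 3))          # 0.11 0.727
```

Observing an all-caps email raises the probability of spam from the prior 0.40 to a posterior of roughly 0.73.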

| event | \(B\) | \(B^c\) | Total |
|---|---|---|---|
| prior probability | 0.4 | 0.6 | 1 |
| posterior probability | 0.727 | 0.273 | 1 |

| event | \(B\) | \(B^c\) | Total |
|---|---|---|---|
| prior probability | 0.4 | 0.6 | 1 |
| likelihood | 0.20 | 0.05 | 0.25 |
| posterior probability | 0.727 | 0.273 | 1 |

\[P(B |A) = \frac{P(B)L(B|A)}{P(A)}\]

\[\text{posterior} = \frac{\text{prior}\cdot\text{likelihood}}{\text{marginal probability}}\]

\[\text{posterior} = \frac{\text{prior}\cdot\text{likelihood}}{\text{normalizing constant}}\]