This set of lecture notes uses data on incoming emails received during the first three months of 2012 by the Gmail account of David Diez, an author of the OpenIntro Statistics textbook. All personally identifiable information has been removed.
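library(dplyr)  # provides %>% and mutate()

# Read the tab-delimited email data
email <- read.delim("https://norcalbiostat.netlify.com/data/email.txt",
                    header = TRUE, sep = "\t")

# hasnum: 1 if number is "big" or "small", 0 otherwise
email <- email %>%
  mutate(hasnum = ifelse(number %in% c("big", "small"), 1, 0))

Two categorical variables of current interest are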
spam (0/1 binary indicator of whether an email is flagged as spam).