Short report #1: The Case of the Mystery Coins

This is a short homework assignment, to help calibrate the amount of detail you'll be putting into reports for this class.

The homework you hand in will be both an Rmarkdown document and the html report that it creates. This report should be readable (by a statistically-literate reader, but not one who has read the homework assignment), should unambiguously describe the steps you have done, address assumptions and/or limitations where necessary, and describe the results. The report should be self-contained, including all R code, with the possible exception of scripts that take a long time to run and/or a file containing auxiliary helper functions (that should still be described as necessary). It should read like a walk-through of the problem and how to use R to solve it. Please let me know if you have questions.

The problem.

set.seed(32)
ncoins <- 100
true_alpha <- 3
true_beta <- 2
probs <- rbeta(ncoins, shape1=true_alpha, shape2=true_beta)
nflips <- rnbinom(ncoins, size=5, prob=0.4)
nheads <- rbinom(ncoins, size=nflips, prob=probs)
outfile <- file("coin_flips.txt", open="w")
outliers <- 32
outlier_num <- c(max(nflips)-1, 1)
for (k in 1:ncoins) {
    if (k %in% outliers) {
        cat(paste0(paste(sample(c(rep("H",outlier_num[1]), rep("T",outlier_num[2]))), collapse=''), "\n"), file=outfile)
    } else {
        cat(paste0(paste(sample(c(rep("H",nheads[k]), rep("T",nflips[k]-nheads[k]))), collapse=''), "\n"), file=outfile)
    }
}
close(outfile)

I have a sizeable collection of old coins, some of strange shapes. (Not really, but imagine.) I have selected 100 of these and flipped each one some number of times: some more, some less. The resulting data is in the file coin_flips.txt, with one row per coin. Each coin has in intrinsic probability, \(\theta\), with which it comes up Heads; let's assume that the distribution of these probabilities across my collection of coins is close to a Beta distribution. This means that the number of Heads in each row of the data file has a https://en.wikipedia.org/wiki/Beta-binomial_distribution. The overall goal is to infer this distribution of \(\theta\) values, i.e., the values of \(\alpha\) and \(\beta\) in that distribution.

Your report should:

  1. Read in and describe the data.

  2. Use maximum likelihood to estimate the shape of the Beta distribution describing the distribution of \(\theta\) across coins (you'll need to write the log-likelihood function yourself; lchoose() and lbeta() may be useful).

  3. Assess goodness of fit by simulating data under the maximum likelihood parameter values, and comparing the observed spread of proportions to that seen in the simulated data.