Topic: Here ya go
 Message: Posted by: LobowolfXXX (Jul 14, 2007 01:04AM)
Ok, as I prepare to take the California bar, my head is too full of law to think about brainteasers, so I leave this one (for now) to my Café buddies...and maybe it will help me!

Applicant Jones is taking a 200-question multiple choice test. Each question has 4 answer choices listed, and the correct answer for each question is randomly determined -- either A, B, C, or D -- independently of the correct answers to the other questions.

Jones has about a 75% chance of getting any given question right. After making a first pass through the answer sheet, he's left about 10 questions blank, and on these, he has no reasonable guess or way to eliminate any answer choice. Looking back at the questions he has answered, he notices a disproportionately high percentage of his choices have been "C." A disproportionately low percentage have been "D." The other choices are equally represented. Call it (out of 190) 48-48-58-36.

There is no penalty for wrong answers, so it behooves Jones to guess. Is the distribution of his past answers relevant to any strategy he should implement for the final 10 guesses?

If his educated guess about one of the remaining questions was: 25% A, 25% B, 30% C, and 20% D, and the previous distribution is as above, what should he guess for this question? (That is to say: Assess the benefits of knowledge of the subject matter vs. the intuitive inclination to try to "even out" the answer choices.)
 Message: Posted by: Daegs (Jul 14, 2007 06:09AM)
This would be the same as guessing on a coin flip after 5 heads in a row, no?
 Message: Posted by: Psy-Kosh (Jul 15, 2007 12:51AM)
If they're independent, they're independent. Period.

Now, does he happen to know for a fact that whatever process decided which of the four options would end up as the correct one selected each with equal probability?

If yes, then the distribution of the others is completely irrelevant. If not, then one may wish to take into account the observed distribution.

But without detailed math, one can note a simple fact: If for those ten questions he has _no clue_ as to what the answer may be... completely and utterly stumped, then, unless there is a greater penalty for a wrong answer than a blank answer, he ought to choose C for all of them, as per the observed distribution.

I.e., if the answer key really does give each choice equal probability, then picking C for all ten gives the same expected number of right answers as any other way of guessing, and if not, well, the observed distribution would imply that C has a slightly higher chance of being preferred... (actually working out the expected values and stuff involves some Bayesian mojo, but the fact that it's higher, and he has _no_ preference for any of the answers for those ten questions, is sufficient.)
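
For anyone who wants a taste of the Bayesian mojo, here is a rough sketch in Python. It leans on assumptions that are not in the problem as posed: it pretends the 48-48-58-36 counts are counts of the true answers (they are really Jones's answers, only about 75% of which are right), and it puts a symmetric Dirichlet prior on the answer-key frequencies. The function name `posterior_predictive` and the pseudo-count `alpha` are made up for illustration.

```python
from fractions import Fraction

def posterior_predictive(counts, alpha=1):
    """Dirichlet-multinomial predictive probability for the next item.

    With a symmetric Dirichlet(alpha) prior over the K categories, the
    probability that the next answer falls in category i is
    (counts[i] + alpha) / (sum(counts) + K * alpha).
    """
    total = sum(counts) + alpha * len(counts)
    return [Fraction(c + alpha, total) for c in counts]

# Counts for A, B, C, D from the original post (assumed here to reflect
# the true key, which is a simplification).
counts = [48, 48, 58, 36]

# Small alpha: little prior belief that the key is uniform.
weak = posterior_predictive(counts, alpha=1)
# Large alpha: strong prior belief that the key is close to uniform.
strong = posterior_predictive(counts, alpha=1000)

for label, w, s in zip("ABCD", weak, strong):
    print(f"{label}: weak prior {float(w):.3f}, strong prior {float(s):.3f}")
```

Under the weak prior C comes out at 59/194, about 0.30. Under the strong prior it shrinks toward 0.25 but stays above it, which is the point of the argument above: the stronger your belief in uniformity, the less the counts matter, but C never becomes a worse pick than the others.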

If he has some preference/suspicion/educated guesses, then there would be a bit more calculation involved.
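
A minimal sketch of that extra calculation, for the 25/25/30/20 guess in the original post. It assumes (my assumption, not something stated in the problem) that the educated guess was formed as if the key were uniform, so it can simply be reweighted by a base rate for each letter and renormalized. The `skewed` base rate is just the observed 48-48-58-36 split with add-one smoothing, and `combine` is a hypothetical helper name.

```python
def combine(guess, base_rate):
    """Reweight per-question probabilities by a base rate, then renormalize.

    Each guess probability is multiplied by how far the base rate for that
    letter sits above or below uniform (base_rate * K), so a uniform base
    rate leaves the guess unchanged.
    """
    k = len(guess)
    weights = [g * (b * k) for g, b in zip(guess, base_rate)]
    total = sum(weights)
    return [w / total for w in weights]

# Jones's educated guess for A, B, C, D on one of the blank questions.
guess = [0.25, 0.25, 0.30, 0.20]

# Case 1: the key really is uniform and independent.
uniform = [0.25, 0.25, 0.25, 0.25]

# Case 2: the key leans the way the observed counts do
# (add-one smoothed counts; an illustrative assumption).
skewed = [49 / 194, 49 / 194, 59 / 194, 37 / 194]

for name, base in (("uniform", uniform), ("skewed", skewed)):
    probs = combine(guess, base)
    best = "ABCD"[probs.index(max(probs))]
    print(name, [round(p, 3) for p in probs], "-> guess", best)
```

In this particular example the subject-matter guess and the observed distribution both point at C, so there is no conflict to resolve. The two would only pull apart if his knowledge favored D, and then the answer would depend on how much weight the observed counts deserve.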

Interestingly enough, in practice, they may not actually be independent. What I mean is that I at least remember hearing something about how random sequences produce longer runs of the same result than one would intuitively expect. On standardized tests, students were getting paranoid that they were messing up and started answering differently after too long a run. To deal with this, the test makers have started actually making the answers less independent, deliberately giving preference to answers different from the previous one, to make the key "look" more random, even though it's in fact less random.
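
The bit about runs is easy to check with a quick simulation. This is only a sketch: it assumes a 200-question key with each letter drawn uniformly and independently, and the function names, trial count, and seed are arbitrary choices of mine.

```python
import random

def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        # Extend the current run on a repeat, otherwise start a new one.
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def mean_longest_run(n_questions=200, trials=2000, seed=0):
    """Average longest run over many uniformly random answer keys."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    total = 0
    for _ in range(trials):
        key = [rng.choice("ABCD") for _ in range(n_questions)]
        total += longest_run(key)
    return total / trials

print(mean_longest_run())
```

Comparing the printed average against your own gut feeling for "how many of the same letter in a row is suspicious" gives a sense of why test-takers get nervous about keys that are in fact perfectly random.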