How were these p-values calculated? Well, because I said "Fun with math" we won't do any *actual* math. Suffice it to say, the problem is exactly the same as calculating probabilities in poker (see the hypergeometric distribution). That is, asking "How likely is it that 5 of my 12 up-regulated genes are among the 16 genes with the GO term 'glucose-metabolism', out of all known genes?" is the same thing as asking "How likely is it that 2 of my 5 cards are among the 4 aces in a standard deck?" People went on like this, quite happily, for a number of years. Problematically, the statistics were wrong.
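For anyone who does want the actual math, here is a minimal sketch of both questions using only Python's standard library. The enrichment p-value is taken as the chance of seeing at least that many hits. The helper names are made up for illustration, and `ASSUMED_TOTAL_GENES` is a placeholder I picked, since the text above only says "all known genes".

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def hypergeom_tail(k, population, successes, draws):
    """P(at least k successes): the usual enrichment p-value."""
    upper = min(successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(k, upper + 1))

# Poker: at least 2 of my 5 cards are among the 4 aces in a 52-card deck.
p_aces = hypergeom_tail(2, population=52, successes=4, draws=5)

# Genes: at least 5 of my 12 genes are among the 16 annotated genes.
# ASSUMED_TOTAL_GENES is a hypothetical genome size, not a number from the post.
ASSUMED_TOTAL_GENES = 20000
p_genes = hypergeom_tail(5, population=ASSUMED_TOTAL_GENES, successes=16, draws=12)

print(f"aces:  {p_aces:.4f}")
print(f"genes: {p_genes:.3e}")
```

Swap in your own population size and the gene p-value will move, but the structure of the calculation stays the same as the card one.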
Let's get back to poker. Take it from me (or look it up on Wikipedia) that the probability (by the hypergeometric distribution) of you getting a pair of deuces in a five-card hand is about 0.04. This, fantastically, is < 0.05 and therefore statistically significant (remember, in science, 0.05 is a magic number that all things must be lower than)!!! Logically, then, whenever you have two twos you should go bet your house and mortgage your baby, right? Well, if you're playing against me, yeah, that's right! Also, you can stop reading now.... Unfortunately, roughly half of all hands will give you something better than a pair of twos, and you will lose all your money (and baby). This is why finding that 5 of your 12 genes share a GO term ISN'T NECESSARILY that interesting, even with a p-value of 10^-10: maybe, on a different day with slightly different genes, you would have found something EVEN MORE interesting!
So along came GoMiner, and GoMiner said, "Wait, what if we figure out, if you had a random set of genes, how often you'd find a set of GO terms that gave you a lower p-value than you got with your actual, experimentally determined genes?" What we're really calculating here is the false discovery rate (FDR). The FDR is usually NOT the same thing as the p-value. If, before we conducted the experiment, we KNEW and hypothesized that the glucose metabolism genes were involved, we would be able to accept the p-value (10^-10) given to us. But because we were willing to accept ANYTHING (i.e. a pair of twos, a couple of fours, a straight, ANYTHING), we actually have to see how likely it is that we could have found something at least as significant as the glucose thing. This CAN be calculated exactly, but it's really hard. So it's typically done by simulation (randomly pick 12 genes 1000 times and see how often equally interesting GO terms come up). What often happens is that we find it's really rather common (i.e. FDR > 0.05) to get a very, very low p-value for some GO term given any particular set of genes. This typically ruins a lot of people's results, but there you have it. Rules to live by: beware of astonishingly low p-values, you probably did something wrong.
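The simulation idea can be sketched in a few lines of Python. To be clear, this is not GoMiner's actual code: the annotation table is synthetic (2000 fake genes, 100 fake terms of 16 genes each), every name here is made up for illustration, and the "observed" result is just a 2-gene overlap so that the toy example has something to compare against.

```python
import random
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def hypergeom_tail(k, population, successes, draws):
    """P(at least k hits) when drawing without replacement."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)

def best_p_value(gene_set, term_to_genes, population):
    """Lowest enrichment p-value over ALL terms (we'd accept any of them)."""
    return min(
        hypergeom_tail(len(gene_set & members), population,
                       len(members), len(gene_set))
        for members in term_to_genes.values()
    )

def simulated_rate(observed_p, term_to_genes, all_genes, n_genes, n_trials, rng):
    """Fraction of random gene sets whose best term is at least as
    significant as the observed one."""
    count = 0
    for _ in range(n_trials):
        random_set = set(rng.sample(all_genes, n_genes))
        if best_p_value(random_set, term_to_genes, len(all_genes)) <= observed_p:
            count += 1
    return count / n_trials

# Synthetic stand-in for a real annotation database (an assumption).
rng = random.Random(0)
all_genes = list(range(2000))
term_to_genes = {f"term_{t}": set(rng.sample(all_genes, 16)) for t in range(100)}

# Pretend the experiment found 2 of our 12 genes in one 16-gene term.
observed_p = hypergeom_tail(2, len(all_genes), 16, 12)
rate = simulated_rate(observed_p, term_to_genes, all_genes, 12, 1000, rng)

print(f"nominal p-value: {observed_p:.4f}")
print(f"simulated rate:  {rate:.3f}")
```

In this toy setup the nominal p-value looks impressive on its own, while random gene sets match or beat it far more often than 5% of the time, which is exactly the pair-of-twos problem.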
So... problem solved, right? Well no. Because we're doing it all over again with exome sequencing and god-bloody-damn "pathway" analysis. See, at the end of an exome experiment (let's say, sequencing a cohort of 100 people with some disease), if you don't end up with 1 or 2 genes shared across your cohort, you end up with a list of several dozen (hundreds?) of disparate genes. The first thing everyone thinks is "What if instead of a single gene, there's some pathway(s) shared across the cohort?! Of course! That's what I knew all along." They then use some pathway analysis software (or GO terms) and find, lo and behold, that they have a glucose metabolism pathway with a p-value of 10^-10! Problem solved, right? Well no. Unfortunately, many of those pathway analysis tools make the same mistake with p-values that we did back in the day with gene expression microarray analysis. So why don't they just calculate FDRs like GoMiner does? My guess is that if they did, they would find that not many of their associations held up and that most had ludicrously high FDRs... and who would pay for software like that? Alternatively, they could just be ignorant of this problem. I am not sure which is better.