We're doing something similar to this at our office this year. The wrinkles are that it isn't a standard tournament bracket point structure, and nobody is sharing their algorithms.
The point structure is that each person picks 8 of the 64 teams. For every game one of those 8 teams wins, the person gets that team's seed added to their point total. This makes team selection a very interesting exercise: pick a top seed (a low number) and be fairly confident of collecting a few points per win, or pick a lower-seeded team (a high number) and risk getting no points, but perhaps win the whole thing with that one team.
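To make the scoring rule concrete, here is a minimal sketch of how one entry would be scored. The function name, the dict layout, and the team names are all made up for illustration; only the rule itself (seed added per win, 8 picks) comes from the description above.

```python
def pool_score(picks, wins):
    """Score one entry under the seed-per-win rule.

    picks: dict mapping team name -> seed for the 8 chosen teams (hypothetical layout)
    wins:  dict mapping team name -> tournament games won so far
    """
    if len(picks) != 8:
        raise ValueError("each entry picks exactly 8 teams")
    # Every win by a picked team adds that team's seed to the total.
    return sum(seed * wins.get(team, 0) for team, seed in picks.items())


# Placeholder teams and results, not real data.
picks = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 6,
         "Team E": 6, "Team F": 10, "Team G": 12, "Team H": 15}
wins = {"Team A": 4, "Team C": 2, "Team G": 1}
print(pool_score(picks, wins))  # 1*4 + 3*2 + 12*1 = 22
```

Note how a single win by the 12 seed is worth as much as twelve wins by a 1 seed would be, which is the whole trade-off.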
I bought a copy (price: one Starbucks coffee) of 24 seasons' worth of data from a dependable source (he knows I'll poison his next coffee if he made a mistake). Now the issue I'm grappling with is that he and I both know we have the same data, so a game theory element is added.
For those who are interested, it turns out that 3 seeds and 6 seeds have historically done fairly well, and I've cooked up some rationalizations that I like for why this might be the case, but then we'll just see what happens this time around.
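One way to check a claim like this against historical data is to average the points each seed would have earned per tournament appearance. This is only a sketch: it assumes the data can be reduced to (seed, wins) pairs, one per team per tournament, which is a guess at the format and not a description of the actual data set. The function name is hypothetical too.

```python
from collections import defaultdict


def mean_points_by_seed(records):
    """Average pool points per appearance for each seed.

    records: iterable of (seed, wins) pairs, one per team per tournament
             (assumed format, not the format of any particular data source)
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for seed, wins in records:
        totals[seed] += seed * wins  # points this team would have earned
        counts[seed] += 1
    return {seed: totals[seed] / counts[seed] for seed in sorted(counts)}


# Made-up records purely to show the shape of the input.
sample = [(1, 4), (1, 2), (12, 1), (12, 0)]
print(mean_points_by_seed(sample))  # {1: 3.0, 12: 6.0}
```

An average ignores variance, and it ignores what the other people in the pool pick, so it is a starting point and not a strategy.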
Any suggestions on where we can get relevant data to feed to our algorithm? Preferably in a machine-readable format; I'd rather not scrape ESPN.com or something.
Also, the password to the jottit page you link to isn't "hackernews" like it says.
I've been following my own algorithm for ranking teams since the start of the NBA season. USA Today has the best computer readable format that I could find.
I haven't found a great site for scrapable stats. I'm using http://msn.foxsports.com/cbk/stats which is at least a step up from ESPN's stats.
If anyone has a better source, I'd be very interested as well. And if anyone wants to create one -- I guarantee it will make money from sports touts, especially if you niche down to one sport and charge less than statfox.com
OK - screw jottit then. It's set to 'anyone can view and edit' and I've changed the password to hackernews like 5 times.
Can someone recommend a similar site?
In the meantime -- don't even worry about posting the algorithm. If your bracket starts to kick ass, just email me with your description or a link to your personal blog post about it, and we'll have a writeup of the top performers. Or -- easiest solution yet -- just post it as a message in the Yahoo! group.
That's probably the best strategy without putting much effort, but I bet some enterprising HN user will come up with a more complex strategy that has a higher expected value.
The hardest part of that is actually finding all the relevant historical data & getting it into a good form. Anyway, I'm not about to spend hours on picking a bracket, so I'll leave that to someone else.
The Logistic Regression Markov Chain developed at Georgia Tech to predict results of NCAA tournament games does a good job at picking both upsets and late round game winners. A bit soft in the middle rounds.
The cleverest dead-simple algorithm I've seen is to favor the team whose head coach has the highest salary. When I first heard of this, it did pretty well that year.
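For anyone who wants to try it, the rule fits in a few lines. The sketch below assumes you have already collected a team-to-salary lookup yourself; the function name, the tie-break toward the first team, and the numbers are all placeholders and not real salary figures.

```python
def pick_by_coach_salary(team_a, team_b, salaries):
    """Pick the team whose head coach earns more.

    salaries: dict mapping team name -> head coach salary (you supply this)
    Ties go to team_a, which is an arbitrary choice made for this sketch.
    """
    return team_a if salaries[team_a] >= salaries[team_b] else team_b


# Placeholder numbers, not real salaries.
salaries = {"Team A": 2_500_000, "Team B": 900_000, "Team C": 900_000}
print(pick_by_coach_salary("Team B", "Team A", salaries))  # Team A
```

Run it on each first-round matchup, then feed the winners back in round by round to fill out a whole bracket.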