This is a long one, so grab your espresso now...
I was wondering recently if we could look at the season's VDS scores and determine whether there were any predictions that we could make by group (by price). A note about these groupings: I have grouped some of the higher priced riders together so that I could actually see patterns, so the groups are 20-28, 16-18, 12-14, 8-10, 6, 4, 2 and 1. And yes, I have ignored and will continue to ignore both Contador and Valverde.
Below the flip, I compare these groups using two separate approaches.
First of all, a theory of pricing (at least in my head) would be that higher priced riders should bring in more points. So to look at this we can calculate the average Points per price for each group (PPP = sum(points)/sum(price)).
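For anyone who wants to redo the sums, here is a minimal sketch of the PPP calculation in Python. The function name and the example riders are made up for illustration; they are not real VDS data.

```python
def points_per_price(riders):
    """Group PPP = sum(points) / sum(price) for a list of (price, points) pairs."""
    total_price = sum(price for price, _ in riders)
    total_points = sum(points for _, points in riders)
    return total_points / total_price


# Hypothetical 6pt group: three riders scoring 300, 120 and 0 points.
example_group = [(6, 300), (6, 120), (6, 0)]
print(points_per_price(example_group))  # 420 points over 18 price points
```

Note that this divides the group totals, so in a mixed-price group like 20-28 it is not the same as averaging each rider's own points/price.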
Here is a graph of points plotted against PPP. In theory the line should run generally from bottom left to upper right if increased price predicts increased income (hmmm... apparently I don't know how to show graphs made in google docs...help anyone?). So here's the table for said graph:
What you might notice (which is screamingly obvious in the graph) is that for the most part, the more you spend on a rider, the more you get. However, there are two exceptions: the 4 pt'ers are higher than expected, and the 16-18 pt'ers are lower. The fact that the average points per price is slightly lower for the 16-18's than for the 2's is kind of embarrassing.
But averages can only tell you so much, so I looked at each group's distribution. To do this I took the full range of scores in a group and split it into 4 equal parts. Then I assigned each of the riders in the group to one of the 4 quartiles based on their score and looked at the distribution amongst the four quartiles. If the scores in a group are normally distributed, then you would expect the quartiles to look something like this:
with more of the riders clustered around the average score. I will go through each group in turn. Percentages may not add to 100 because I am a lazy rounder. Numbers in brackets are the score range for that quartile.
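The quartile split described above can be sketched in Python. This is only a rough sketch: the function name and the example scores are made up, and sending a score that sits exactly on a boundary to the higher quartile is an assumption. Percentages are rounded down, in keeping with the lazy rounding.

```python
def quartile_shares(scores):
    """Split the score range into 4 equal-width bins.

    Returns the percentage of riders in each bin, lowest bin first,
    rounded down to a whole number.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 4
    counts = [0, 0, 0, 0]
    for s in scores:
        if width == 0:
            idx = 0  # every rider has the same score
        else:
            # The top score would land in a fifth bin, so clamp it to the last one.
            idx = min(int((s - lo) / width), 3)
        counts[idx] += 1
    return [100 * c // len(scores) for c in counts]


# Made-up scores, not real VDS data.
print(quartile_shares([0, 10, 20, 30, 40, 50, 60, 100]))  # [37, 25, 25, 12]
```

As in the breakdowns, the rounded-down percentages here add up to 99 rather than 100.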
In my initial analysis I used all riders in the group unless there was a good reason for their exclusion. But in the last big VDS analysis the issue of non-scorers was raised, so I will include a second breakdown of groups where exclusion of non-scorers changes things dramatically.
20-28pt (13 riders)
4th (1746-2140): 15%
3rd (1347-1746): 38%
2nd (948-1346): 23%
1st (550-947): 23%
This distribution is normalish. Most of the riders are in the middle quartiles. This isn't the biggest group (only 13 riders), so the 1st and 4th quartiles only differ by 1 rider. You are just as likely to boom as bust with this group. But it's a pretty safe bet. (no non-scorers)
16-18pt (13 riders)
4th (766-994): 46%
3rd (537-765): 7%
2nd (308-536): 7%
1st (80-307): 38%
First of all, both Nibali and Pellizotti have been excluded. Nibs is an animal, scoring over twice as much as the second rider in the group (Basso), and Franco's big fat donut has non-sporting reasons (and he was the only non-scorer). We knew from the PPP graph that this group was weird, and now we know why. The distribution is split, with almost half of the riders in the top quartile and nearly as many in the bottom. Not only is this an underperforming group on average, but close to 40% of them flopped massively. **Risky group**
12-14pt (15 riders)
4th (1127-1482): 20%
3rd (773-1127): 13%
2nd (419-773): 33%
1st (65-419): 33%
Like in the last group, I excluded the obvious outliers, in this case J-Rod and Kim Kirchen. Here we see almost the identical average PPP as the high rollers in the 20-28 group. But as you can see from the distribution, this is due to the hard work of a few. (again, no non-scorers besides Kirchen)
8-10pt (26 riders, 3 donuts)
4th (970-1293): 11%
3rd (646-970): 15%
2nd (323-646): 23%
1st (0-323): 50%
First of all, I looked at how this changes with the exclusion of the 3 non-scorers. Not a lot: the first non-non-scorer is Lars Bak with 16pts, so the range is not affected much. But the excluded riders are all from the bottom quartile, so it does alter the percentages a bit (13, 17, 21 & 47%). This group's distribution looks much like that of the group ahead of them, but a little riskier. About a third clustered around the middle, but a larger proportion flopped than excelled.
6pt (28 riders, 5 donuts)
4th (513-685): 14%
3rd (342-513): 17%
2nd (171-342): 28%
1st (0-171): 39%
The tendency for these distributions to skew towards the bottom reminds me of what majope said in the last VDS analysis thread "only 20 teams actually scored below [the score for a team of average scoring riders]". In theory one could compose a team of 25 6pt riders. But even if you picked the top 25 from this category, you would only score 7048, placing you over a hundred points behind the Drewd.
Oh, and if we exclude the 5 non-scorers, the distribution flattens out a bit again (13, 26, 30 & 30%). Note that since the range shrinks, Romain Feillu gets the boot from the upper quartile.
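Here is a small sketch of why dropping the non-scorers can bump a rider out of the top quartile. The scores are made up (not the actual 6pt riders) and the helper name is mine; the point is only that when the lowest score rises from 0, every boundary shifts upward.

```python
def bin_edges(scores):
    """Return the 5 boundaries of 4 equal-width bins over the score range."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 4
    return [lo + i * width for i in range(5)]


# Made-up group with two donuts.
scores = [0, 0, 40, 200, 515, 680]
scorers = [s for s in scores if s > 0]

print(bin_edges(scores))   # [0.0, 170.0, 340.0, 510.0, 680.0]
print(bin_edges(scorers))  # [40.0, 200.0, 360.0, 520.0, 680.0]
```

The rider on 515 is in the top quartile when the donuts are included (boundary at 510) and drops to the third once they are removed (boundary at 520).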
4pt (75 riders, 11 donuts)
4th (570-760): 5%
3rd (380-570): 8%
2nd (190-380): 30%
1st (0-190): 56%
First of all, Horner has been excluded because he's just too awesome. As you may recall from hours ago when you were reading the top of this post, this group was one of the groups that was flagged by the group PPP graph. As we can see, this is due to the fact that the quartiles of the 4pters and the 6pters line up almost exactly. This suggests that 4pters are better value than 6pters. But is this true? The boundaries of the quartiles are pretty much the same, so we can compare directly. The number of riders in both groups that scored in the upper two quartiles is about the same (9 and 11 in the 6pt and 4pt groups respectively). However, since the 4pt group is that much larger, those riders make up 31% of 6pters and only 14% of 4pters (if you remove the non-scorers this shifts to 39% and 17%). So, even though the ranges of points scored are about the same, and the PPP favours the 4pters, the distribution shows that the 6pters are a safer bet. With non-scorers excluded, the 4pt distribution is 6, 11, 33 & 49%.
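The comparison above boils down to a bit of arithmetic, sketched here in Python using the counts quoted in this post (9 of 28 6pters and 11 of 75 4pters in the upper two quartiles, with 5 and 11 non-scorers). The function name is made up, and whether the 75 includes Horner is an assumption here.

```python
def top_half_share(top_count, group_size, non_scorers=0):
    """Percentage of a group's riders that landed in the upper two quartiles."""
    return 100 * top_count / (group_size - non_scorers)


# All riders in each group.
print(top_half_share(9, 28), top_half_share(11, 75))

# Non-scorers removed: 9 of 23 and 11 of 64.
print(round(top_half_share(9, 28, 5)), round(top_half_share(11, 75, 11)))  # 39 17
```

The all-rider figures can come out a point or so away from the ones quoted above because of rounding, but the gap between the two groups is the same: a 6pter was roughly twice as likely as a 4pter to end up in the top half of the range.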
2pt (200 riders, 78 donuts)
4th (511-680): 1%
3rd (340-511): 2%
2nd (170-340): 10%
1st (0-170): 87%
Initially, I just excluded Anton from this analysis. As you can see, that doesn't really show what this group looks like. So here is the same breakdown, but with the top three riders excluded (Anton, Marcato, Pozzovivo). If you look at their scores in the context of the rest of the top 10, they are head and shoulders above the rest. So, chapeau to you...get out of my analysis. And because of the large number of non-scorers, I've taken them out too. With this smaller group of scorers (120) it looks like this.
4th (313-417): 5%
3rd (210-313): 9%
2nd (107-210): 23%
1st (4-107): 62%
By no stretch of the imagination does this even out the distribution, but it tells us a little more about how the 2pters are doing. And they look a lot like the 4pters. 14% in the top two quartiles...ring any bells? Regardless, a lot of these guys aren't doing very much.
1pt (643 riders, 399 donuts!!!)
I'm not even going to put up the first breakdown...here are the 1pters with non-scorers excluded (and Porte and Sagan too, duh):
4th (293-390): 3%
3rd (197-293): 9%
2nd (100-197): 12%
1st (4-100): 75%
Wow...these guys are a crap shoot. Even with the non-scorers removed, 75% of riders are in the bottom quartile.
Besides those crazy 16-18pters, only the top riders had more riders score in the top two quartiles (52%) than in the bottom two, reinforcing the idea that at the very top end of the scale, you are paying for reliability. As long as you don't pick Cunego...
If you've made it to the end, congratulations. If you got anything of value...you deserve a cookie. If you just skipped to the end cause I was blabbering too much, feel free to check my work directly and make your own analyses.
Even if this doesn't help you design your teams next year, I think that something like this could be used to objectively determine rider values for next year. I know that there are subjective ways of determining rider values...but are they working? (*cough*EBH*cough*)