How Well Do Early Season Results Predict Full Season Performance?


How can dispatches from January help our VDS choices?

2018 Tour Down Under - Stage 5
Just walk away.
Photo by Daniel Kalisz/Getty Images

Every year it’s the same thing. With stars in my eyes, I fall head over feels in love with a rider who is toeing the antipodean line in the Southern Hemisphere, cleaning up in the Silly Season: those weird early season races that follow the endless summer or simply brave the dead of winter, which always seem to attract just enough big names to make you think they’re worth your time and effort and attention. Oftentimes those big names win them.

But you notice a guy who finished 4th in the Cadel Evans Great Ocean Road Race and Upside Down Beer Drinking Extravaganza, and then he heads to San Juan and finishes maybe 5th, before nabbing a runner-up at one of the Trofeos and then the VDS price list comes out and he’s only 4 points! and he becomes your (and guess what… everyone else’s) super secret weapon before it’s announced that he’s on climbing domestique duty or is third wheel on the lead out train of some washed-up sprinter in a contract year, but you hang on to hope that he could maybe hold on and win a big one week or maybe sneak into an Italian SSSR or maybe take over when that sprinter falters, but in actuality he belly flops all over your VDS dreams, never to be seen or heard from again, leaving only the last remaining donut on your otherwise perfect team. I know this has happened to you.

So my question is, do we have any right to become so intoxicated by these Silly Season results? How well do these early spring races actually correlate with VDS performance? What sorts of insights can we reasonably expect to gain from these results?

Methods

I went back to 2018 and looked at the Silly Season results. I calculated how many points would have been scored from a healthy complement of races exactly how they would have been tallied if it were a VDS race. I was deliberately conservative with the race designation, which led to a lot of category 5s, as you can see below. This was partly done for convenience but also it added a little more weight to higher places and less weight to the randomness that seems to crop up in these types of races. I also did not include National Championship races. Sorry Kiwis. I also skipped Algarve and Andalucia because they aren’t finished by the time VDS starts. There’s a very compelling but also extraordinarily boring statistical argument to be made against their inclusion, so if that’s your bag, I’ll meet you in the comments.

Early Season Races

Race Category
Tour Down Under 3
Cadel Evans Great Ocean Road Race (& UDBDE) 4
Herald Sun Tour 5
Vuelta San Juan 5
Trofeo Ses Selines 5
Trofeo Andratx 5
Trofeo Tramuntana 5
Trofeo Palma 5
GP La Marseillaise 5
Communitat Valencia 5
Etoile Besseges 5
Tour Colombia 5
Tour de La Provence 5
Vuelta a Murcia 5
Clasica de Almeria 4
Tour of Oman 3
Trofeo Laigueglia 4

This yielded 182 different riders with an early spring datum. The highest scorer was Alejandro Valverde (of course) with 535 points, while 17 riders scored only 10 points. I threw out 5 whose names I just simply couldn’t find in the VDS database; suffice it to say, they scored 0. The mean number of points among those riders was 104.7 and the median was 70. Worth noting is that these measures were only among those 182 riders; if we look at all the available riders, the average Silly Season point total is probably like 0.4 and the median is almost certainly 0.

I also gathered their total 2017 VDS Points and 2018 VDS Points and their birth year (because I had a hunch that age might be a factor). The means and medians for those were: 2017: 407.3 and 175; 2018: 438.7 and 250; birth year: 1990.1 and 1991.

All analyses and figures were made using MATLAB, not that any of you care about that.

How well do Silly Season Points predict VDS Points?

Not very well. I performed a linear regression using the Silly Season points as an independent variable and the 2018 VDS points as the response. The model looked like this:

2018 Total = 301.5 + 1.31 x (Silly Season Points)

But the RSquared, which is the proportion of total variation in the response that can be explained by variation in the independent variable (or: how good the prediction is… higher is better), is really quite low: 0.052. This is a very noisy model and not particularly useful. You might expect this by looking at the figure below.

2018 VDS points as a function of early spring Silly Season points. As a linear predictor, these points are not a particularly good indicator of 2018 performance.
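For anyone who wants to try this kind of fit at home, here is a minimal sketch of a one-variable least squares regression and its RSquared. The original analysis was done in MATLAB; this Python version, the function names `fit_line` and `r_squared`, and the tiny made-up data set are all assumptions for illustration and are not the article's data or code.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_resid / ss_total


# Made-up numbers for illustration only, NOT the article's data.
silly_points = [10, 30, 70, 120, 250]
season_points = [400, 50, 600, 150, 500]

intercept, slope = fit_line(silly_points, season_points)
fit_quality = r_squared(silly_points, season_points, intercept, slope)
print(intercept, slope, fit_quality)
```

An RSquared near 0 means the line explains almost none of the scatter, and a value near 1 means the points sit almost exactly on the line.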

This is especially true if you compare it to another variable that can be used to predict 2018 score: 2017 score. When performing this linear regression, the model looks like this:

2018 Total = 120.4 + 0.781 x (2017 Total)

And the RSquared is much higher: 0.614. This is very obvious looking at the figure below, and a fact that ursula is probably well aware of.

Conversely, 2017 performance is a terrific indicator of 2018 performance. But ursula already knows that.
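To make the two fitted lines a bit more concrete, here is a small sketch that plugs the sample medians reported in the Methods (70 Silly Season points, 175 points in 2017) into each equation. The coefficients are the ones quoted above; the function names are hypothetical, and a single plugged-in point is only an illustration of how the two lines behave, not an evaluation of either model.

```python
def predict_from_silly_season(silly_points):
    # Line quoted in the article: 2018 Total = 301.5 + 1.31 x (Silly Season Points)
    return 301.5 + 1.31 * silly_points


def predict_from_2017(total_2017):
    # Line quoted in the article: 2018 Total = 120.4 + 0.781 x (2017 Total)
    return 120.4 + 0.781 * total_2017


# Sample medians reported in the Methods section.
median_silly = 70
median_2017 = 175

print(round(predict_from_silly_season(median_silly), 1))  # 393.2
print(round(predict_from_2017(median_2017), 1))           # 257.1
```

For reference, the sample's median 2018 total was 250, so for this one middle-of-the-pack input the 2017 line lands a lot closer.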

[Warning: Nerd Stuff Be Here - I also ran a multiple linear regression with all variables, and the RSquared only increased to 0.63 (all variables were statistically significant). Given this paltry improvement in the performance of the model, I thought it was simpler and more informative to run them separately. ]

What about the young bucks?

One scenario I played out in my head was that young, up-and-coming riders can use the Silly Season as an audition. They score a few points early on and that opens the door to a bigger role with the team for the rest of the season. To test this hypothesis, I divided my sample into three groups (based on percentile ages): born after 1993, born before 1988, or born between 1988 and 1993. All three had approximately the same sample size.

As I anticipated, Silly Season points were a better predictor of 2018 VDS points among youngsters than among middle-aged and elderly riders. That predictive ability was still not very good (RSquared = 0.165).

As I predicted, Silly Season points are a better indicator of season-long performance for youngsters (born after 1993). However, the model is still pretty lousy. Sorry Caleb.
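The age split could be sketched roughly like this, assuming the youngest group is riders born after 1993 (as the caption says) and the oldest is riders born before 1988. Putting the boundary years 1988 and 1993 themselves in the middle group is my assumption, since the article does not say, and the rider records and field names below are made up.

```python
def age_group(birth_year):
    """Assign a rider to one of three age groups by birth year."""
    if birth_year > 1993:
        return "young"
    if birth_year < 1988:
        return "old"
    # Assumption: 1988 and 1993 themselves count as the middle group.
    return "middle"


def split_by_age(riders):
    """Bucket rider records (dicts with a 'birth_year' key) by age group."""
    groups = {"young": [], "middle": [], "old": []}
    for rider in riders:
        groups[age_group(rider["birth_year"])].append(rider)
    return groups


# Made-up riders for illustration only.
riders = [
    {"name": "Rider A", "birth_year": 1995, "silly": 40, "season": 300},
    {"name": "Rider B", "birth_year": 1990, "silly": 120, "season": 150},
    {"name": "Rider C", "birth_year": 1984, "silly": 10, "season": 600},
]
groups = split_by_age(riders)
print({name: len(members) for name, members in groups.items()})
```

Each group's Silly Season and season totals would then get their own separate regression.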

Let’s just say I want to get my 100 points and be done with it

Okay, so I’ve picked my team. I’ve got my 19 all-stars, my 3 aces-in-the-hole, my 1 Richie Porte, and now I’m fishing for 1 pointers. Let’s assume I’m happy to just get my 100:1 point return-on-investment. This is a scenario where Silly Season points may actually be very helpful.

This analysis is a bit complicated, so bear with me. Iterating across the domain from 10 points to the Valverde Maximum of 535, I calculated, among the riders with a Silly Season score greater than that value, the proportion that scored more than 100 points in 2018. So for example, if I had 100 riders with a Silly Season score greater than 30, and 80 of them scored more than 100 in 2018, the value at 30 would be 0.8. At the first value in the domain (10), you can think of this as the proportion of all riders that scored during the Silly Season that got more than 100 points. I also did the same calculation with 200 and 400 points as my criterion, for those of you with expensive taste. Here’s what I found:

If you track the blue curve, you’ll see that of all riders that score Silly Season points, almost 75% reach the 100 point criterion. This improves to almost 90% by 200 Silly Season points. By 258 Silly Season points, every remaining rider scored more than 100 points, but that was only 15 riders.

The proportion of riders scoring greater than a criterion value (100 = blue, 200 = green, 400 = orange) as a function of Silly Season Points scored. It seems that if you are just looking for a quick payoff, Silly Season points might be helpful. Note the green and orange curves are superimposed after 400 points.

If you want to get off the Night Train Express and want 200 points out of your fliers, Silly Season points are almost as good an indicator for reaching that criterion, hovering just below the blue curve for most of the domain, until Daryl Impey and Tony Gallopin eff everything up. But if you go fishing for 4-pointers, seeking that fabled 400 points, the prediction is a little less reliable, hovering just around 50%. (Note, the green and orange curves are superimposed after around 400. Blame Valverde for that. In fact, blame Valverde for everything.)
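The calculation behind those curves can be sketched as below. This is a guess at the procedure from the description in this section, not the original MATLAB; the function names are hypothetical and the riders are invented to reproduce the 80-out-of-100 example. I used a strict "greater than" for the Silly Season cutoff because that is how the text words it; swap in `>=` if riders sitting exactly on the cutoff should count.

```python
def proportion_reaching(riders, silly_cutoff, criterion=100):
    """Among riders whose Silly Season score exceeds silly_cutoff, return the
    share whose season total exceeds criterion (None if nobody qualifies)."""
    qualifying = [season for silly, season in riders if silly > silly_cutoff]
    if not qualifying:
        return None
    hits = sum(1 for season in qualifying if season > criterion)
    return hits / len(qualifying)


def threshold_curve(riders, cutoffs, criterion=100):
    """One (cutoff, proportion) pair per Silly Season cutoff."""
    return [(c, proportion_reaching(riders, c, criterion)) for c in cutoffs]


# Invented (silly_points, season_points) pairs, built to mirror the worked
# example: 100 riders above 30 Silly Season points, 80 of them above 100.
riders = [(40, 150)] * 80 + [(40, 50)] * 20 + [(20, 500)] * 10

print(proportion_reaching(riders, 30, criterion=100))
print(threshold_curve(riders, cutoffs=[10, 30], criterion=200))
```

Running `threshold_curve` once per criterion (100, 200, 400) gives the three curves. Keep in mind that the far right of each curve rests on very few riders, so it jumps around a lot.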

So what does all this mean?

After you’ve boozed yourself delirious and are reorganizing your entire VDS squad based on Stage 3 of Tour Colombia…. STOP. Put the booze down. Or rather, put the computer down. Silly Season performance is simply not a very good predictor of VDS points total. The previous year’s performance seems to be a way better predictor of this year’s performance, and it seems to be fairly immune to these early spring races. Walk away from Daryl Impey and the anonymous Kiwi in the CEGORRAUDBDE.

The one place where you might benefit from looking at early spring results is when you’re digging for 1-or-2-point fillers. Silly Season points might be a good indicator that a rider is at least primed to hit that 100 or 200 point threshold, and if that’s what you’re after in a one-pointer, there are worse symbologies to hypnotize yourself with. There’s a bit more gambling going on if you need a rider to hit 400 points, but there may be a trend there.

So you heard it here first: Nairo Quintana is a mathematical guarantee to win Paris-Roubaix, er, might do alright this year.