The Signal, the Noise, and the 10-Day Data Fetch

It's been a heck of a day at PowDay global headquarters, which is to say, my loft. About 10 days ago, after investigating the initial results from PowDay's inference engine, I realized that NOAA HRRR's PRATE (precipitation rate) field, measured in kilograms per square meter per second, was too volatile and noisy a signal to base snow forecasts on. It's hard for a model to latch onto for steady forecasting, and more granular input would serve it better.
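For a feel of what those units mean: one kilogram of water spread over one square meter is 1 mm deep, so PRATE converts to millimeters per hour by multiplying by 3,600. Here is a minimal sketch of that arithmetic. The function name and the sample values are made up for illustration and are not real HRRR output.

```python
SECONDS_PER_HOUR = 3600


def prate_to_mm_per_hour(prate_kg_m2_s: float) -> float:
    """Convert a precipitation rate from kg/m^2/s to mm/h of liquid water.

    1 kg of water over 1 m^2 is 1 mm deep, so only the time unit changes.
    """
    return prate_kg_m2_s * SECONDS_PER_HOUR


# Hypothetical instantaneous samples (illustrative values only).
samples_kg_m2_s = [0.0, 0.0005, 0.0, 0.002, 0.0001]
rates_mm_h = [prate_to_mm_per_hour(s) for s in samples_kg_m2_s]

for raw, rate in zip(samples_kg_m2_s, rates_mm_h):
    print(f"{raw:.4f} kg/m^2/s = {rate:.2f} mm/h")
```

Because each value is an instantaneous rate, neighboring samples can swing between zero and heavy precipitation, which is the kind of jumpiness described above.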

After a little more research, I decided to replace PRATE with six other fields, fed to Chronos-2 as covariates, to give it a better picture of what NOAA is forecasting:

temp_850mb_c & temp_c: Air temperature at the 850 millibar pressure level (~1,500m) and at 2m above ground. Crucial for determining the rain/snow line.
wind_speed_ms: Wind speed, measured in meters per second.
apcp_mm: Accumulated Precipitation, measured in millimeters.
pwat_kgm2: Precipitable Water. This field helps PowDay see atmospheric rivers coming by measuring moisture availability in the atmospheric column.
vvel_700mb_pas: Vertical velocity at 700 millibars. This indicates whether air is rising or sinking at roughly 3,000m, a key indicator of storm onset.
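To make the field list concrete, here is a rough sketch of how one timestep of raw values might be turned into those six covariates. It assumes temperatures arrive in Kelvin and wind arrives as separate u and v components, as is typical for GRIB weather data. The `raw` dictionary keys and the `build_covariates` helper are hypothetical names for illustration, not PowDay's actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Covariates:
    temp_850mb_c: float    # air temperature at 850 mb, deg C
    temp_c: float          # air temperature 2 m above ground, deg C
    wind_speed_ms: float   # wind speed, m/s
    apcp_mm: float         # accumulated precipitation, mm
    pwat_kgm2: float       # precipitable water, kg/m^2
    vvel_700mb_pas: float  # vertical velocity at 700 mb, Pa/s


def kelvin_to_celsius(kelvin: float) -> float:
    return kelvin - 273.15


def build_covariates(raw: dict) -> Covariates:
    """Map one timestep of raw values (hypothetical keys) to covariates."""
    return Covariates(
        temp_850mb_c=kelvin_to_celsius(raw["tmp_850mb_k"]),
        temp_c=kelvin_to_celsius(raw["tmp_2m_k"]),
        # Wind speed is the magnitude of the (u, v) wind vector.
        wind_speed_ms=math.hypot(raw["u_wind_ms"], raw["v_wind_ms"]),
        # 1 kg/m^2 of liquid water is 1 mm deep, so no scaling is needed.
        apcp_mm=raw["apcp_kgm2"],
        pwat_kgm2=raw["pwat_kgm2"],
        # In pressure coordinates (Pa/s), negative values mean rising air.
        vvel_700mb_pas=raw["vvel_700mb_pas"],
    )


# Illustrative values only, not real data.
example = build_covariates({
    "tmp_850mb_k": 268.15, "tmp_2m_k": 273.15,
    "u_wind_ms": 3.0, "v_wind_ms": 4.0,
    "apcp_kgm2": 2.5, "pwat_kgm2": 18.0, "vvel_700mb_pas": -0.8,
})
print(example)
```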

Unfortunately, that meant having to throw out the old dataset and fetch everything from scratch. Getting 10 years of data that included all those fields took 10 full days. I'll admit it feels kinda cool knowing that because of me a bunch of tape drives spun up in a cold storage datacenter somewhere, but that was painful and I hope to not have to do it again. Either way, now I've got the data, and it's going to be even more useful than I thought (more on that in a bit).

The Siren Song of Zero-Shot

The fetch finished, and I was able to fine-tune Chronos-2. The results were... discouraging. That initial 79.4% zero-shot accuracy turned out to be a siren song. It totally fell apart when I really started digging into the details. Zero-shot wasn't good enough, fine-tuning on PRATE didn't help, and fine-tuning on the six new fields made things worse.

This afternoon I experimented with running an isotonic calibration algorithm over the forecast results to try to smooth out some of the inaccuracies. The results at P90 actually started to move in the right direction, but P50 was rendered utterly useless. Not exactly stellar results, and the whole practice felt a lot more like massaging and manipulating AI-generated numbers than leveraging AI to provide trustworthy, accurate snow forecast data.
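For readers who haven't run into this kind of calibration, here is a minimal sketch of the general idea, assuming the step is an isotonic regression: a non-decreasing remapping from forecast values to observed values, fitted with the pool-adjacent-violators algorithm. This is a from-scratch illustration with made-up numbers, not the code or library PowDay uses.

```python
from bisect import bisect_right


def isotonic_fit(values):
    """Pool Adjacent Violators: least-squares non-decreasing fit.

    `values` must already be ordered by the forecast they belong to.
    """
    blocks = []  # each block is [mean, number of points pooled]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the last two blocks break monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def fit_calibrator(forecasts, observed):
    """Return a function that maps a raw forecast to a calibrated value."""
    pairs = sorted(zip(forecasts, observed))
    xs = [f for f, _ in pairs]
    ys = isotonic_fit([o for _, o in pairs])

    def calibrate(forecast):
        # Step function: use the fitted value of the largest training
        # forecast at or below the input (clamped at the low end).
        i = bisect_right(xs, forecast) - 1
        return ys[max(i, 0)]

    return calibrate


# Illustrative numbers only: raw forecasts vs. what was observed.
calibrate = fit_calibrator([1, 2, 3, 4], [1, 3, 2, 4])
print(isotonic_fit([1, 3, 2, 4]))
print(calibrate(2.5))
```

The remapping can only reorder and pool values that already exist in the forecast history, which fits the "massaging numbers" feeling: it corrects systematic bias after the fact rather than making the underlying model any smarter.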

The Pivot: Temporal Fusion Transformer

All of the output I was getting from Chronos-2 boiled down to "Try a different model." I've designed PowDay so that the model can be swapped out as painlessly as possible. I'll still be busy building scaffolding to support it and refactoring some existing scripts for the next couple of days, but I'll be able to add support for the Temporal Fusion Transformer (TFT) model without having to re-write everything.
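One common way to get that kind of swappability is to have the rest of the pipeline talk to a small interface rather than to a specific model. The sketch below shows the pattern with a trivial stand-in model. `ForecastModel`, `PersistenceModel`, and `run_forecast` are hypothetical names for illustration, and PowDay's real interface may look quite different.

```python
from typing import Protocol, Sequence


class ForecastModel(Protocol):
    """What the pipeline needs from any model, and nothing more."""

    def fit(self, history: Sequence[float]) -> None: ...

    def predict(self, horizon: int) -> dict[float, list[float]]: ...


class PersistenceModel:
    """Trivial baseline: repeats the last observed value at every quantile."""

    def __init__(self, quantiles=(0.5, 0.9)):
        self.quantiles = quantiles
        self._last = 0.0

    def fit(self, history):
        self._last = history[-1] if history else 0.0

    def predict(self, horizon):
        return {q: [self._last] * horizon for q in self.quantiles}


def run_forecast(model: ForecastModel, history, horizon):
    """Pipeline code depends only on the interface, not the model class."""
    model.fit(history)
    return model.predict(horizon)


# Illustrative history; any object with fit/predict could be passed in.
result = run_forecast(PersistenceModel(), [1.0, 2.0, 5.0], horizon=3)
print(result)
```

With this shape, a Chronos-2 wrapper and a TFT wrapper would each implement the same two methods, and the scripts around them would not need to know which one is loaded.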

Which brings me back to that 10 years of NOAA data: Chronos-2 ships pre-trained on a broad dataset. You can fine-tune it, but the heavy lifting of training the model has been handled already. TFT does not ship pre-trained. I'll need to provide it with a corpus of domain data and pin my PC's GPU at 100% for a day or two to do the training work myself, but then TFT will be trained from the ground up on Sierra snow and weather data and only Sierra snow and weather data.

Because of its broad training dataset, Chronos-2 isn't really set up to handle the type of big storms and extreme weather events that I'm trying to forecast with PowDay. Its training suggests that the extreme snowfall values we see here are "unlikely," so it tends to under-forecast them. Even with 5x weighting for storm events during fine-tuning, it's still influenced by a world of data where a 50-inch dump is a statistical impossibility.
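To illustrate what storm weighting does to a training signal, here is a minimal sketch using the pinball (quantile) loss, the standard loss for quantile forecasts such as P50 and P90. The storm threshold, the sample values, and the way the 5x weight is applied here are assumptions for illustration. How the weighting was actually wired into the Chronos-2 fine-tune is not shown.

```python
def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Quantile loss: under-forecasts cost q per unit, over-forecasts 1 - q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)


def weighted_pinball(y_true, y_pred, q, storm_threshold, storm_weight=5.0):
    """Weighted mean pinball loss, up-weighting samples at or above the
    storm threshold (the threshold here is a hypothetical value)."""
    total = 0.0
    weight_sum = 0.0
    for t, p in zip(y_true, y_pred):
        w = storm_weight if t >= storm_threshold else 1.0
        total += w * pinball_loss(t, p, q)
        weight_sum += w
    return total / weight_sum


# Illustrative snowfall totals in inches: one quiet day, one big storm
# that the model under-forecast by 10 inches.
observed = [0.0, 20.0]
forecast = [0.0, 10.0]

plain = weighted_pinball(observed, forecast, 0.5, storm_threshold=12.0,
                         storm_weight=1.0)
storm = weighted_pinball(observed, forecast, 0.5, storm_threshold=12.0)
print(plain, storm)
```

The weighted loss punishes the missed storm harder, but it only nudges the model during fine-tuning. It does nothing about everything the model learned in pre-training.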

I can't argue with the global stats, but the Tahoe Basin doesn't really experience snow the way most of the rest of the world does. I'm very interested to find out how TFT performs when it's raised specifically on the Sierra snowpack and weather dataset currently sitting on my hard drive.

I'm Jon Eby. I'm building PowDay.AI as a solo project on consumer hardware (an RTX 4070 Ti), and writing about what I'm learning as I go. If this kind of work interests you, connect on LinkedIn.