Engineering
Built to spec.
PowDay was built document-first: requirements defined before code, each component designed before implementation, and every forecast claim validated against historical ground truth before shipping.
Document-driven development
Every engineering decision in PowDay has a paper trail. The PRD defined success metrics and data contracts before a single line of Python was written. Each pipeline component — SNOTEL ingestion, NOAA HRRR ingestion, Chronos-2 inference — has its own design doc specifying goals, non-goals, data formats, error handling, and known limitations.
The test plan was written alongside the design docs, not after the fact. Regression tests are anchored to specific dates with known behavior: quiet days, storm onsets, active storms, and the PRATE ceiling edge case. The backtesting report quantifies model performance against actual SNOTEL data before any forecast is published.
The Document Suite

Product Requirements Document
Defines MVP success criteria, functional and non-functional requirements (including a 4-hour pipeline execution budget and 12GB VRAM constraint), the JSON data contract, and the initial product roadmap. Written before any code.
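To make the idea of a JSON data contract concrete, here is a minimal sketch of what a per-station forecast payload might look like. The field names (`station_id`, `is_backtest`, `snowfall_cm`, the p10/p50/p90 keys) are illustrative assumptions, not the actual contract from the PRD.

```python
# Hypothetical sketch of the forecast JSON contract; actual field names
# in the PRD may differ.
import json
from datetime import date, timedelta

def build_payload(station_id, start, daily_p10_p50_p90):
    """Assemble a per-station forecast entry (illustrative shape only)."""
    return {
        "station_id": station_id,
        "issued": start.isoformat(),
        "is_backtest": False,
        "forecasts": [
            {
                "date": (start + timedelta(days=i)).isoformat(),
                "snowfall_cm": {"p10": p10, "p50": p50, "p90": p90},
            }
            for i, (p10, p50, p90) in enumerate(daily_p10_p50_p90)
        ],
    }

payload = build_payload("428:CA:SNTL", date(2026, 1, 15),
                        [(0.0, 2.5, 8.0), (1.0, 6.0, 18.0)])
print(json.dumps(payload, indent=2))
```

Pinning a shape like this down before writing code is what lets the ingestion, inference, and publishing components evolve independently.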

MVP Overall Design Doc
System architecture, phases and success criteria, data flow, inference design, forecast data contract, and a full Alternatives Considered section evaluating cloud ML, traditional time-series models (ARIMA/Prophet), and commercial weather APIs.

SNOTEL Ingestion Design Doc
Component-level spec for ingest_snotel.py: batched AWDB API parameters, English-to-metric unit conversion, idempotency guarantees, per-station failure isolation, and documented sensor limitations at Northstar Upper.
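The unit-conversion step can be sketched in a few lines. AWDB reports English units, so values are converted to metric on ingest; the helper name and field names below are illustrative assumptions, not PowDay's actual code, and the idempotency machinery is omitted.

```python
# Illustrative unit conversion for SNOTEL records (AWDB reports English
# units); record keys here are hypothetical, not the real schema.
def to_metric(record):
    """Convert snow depth and SWE from inches to cm/mm, and temp from F to C."""
    return {
        "snow_depth_cm": record["snow_depth_in"] * 2.54,
        "swe_mm": record["swe_in"] * 25.4,
        "temp_c": (record["temp_f"] - 32) * 5 / 9,
    }

row = to_metric({"snow_depth_in": 10.0, "swe_in": 2.0, "temp_f": 32.0})
```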

NOAA HRRR Ingestion Design Doc
Design for ingest_noaa.py: FastHerbie batch behavior, S3 byte-range requests for efficient GRIB2 extraction, per-month idempotency, auto-resume and retry logic for interrupted backfills, and future Open-Meteo ERA5 integration path.
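The auto-resume and per-month idempotency pattern described above can be sketched as follows. Months that already have an output file are skipped on rerun, and transient failures are retried with exponential backoff. File names, the `fetch` callable, and the retry counts are assumptions for illustration.

```python
# Hedged sketch of per-month resume logic for a backfill: months with an
# existing output file are skipped, so reruns are idempotent. Paths and
# the fetch function are illustrative, not PowDay's actual API.
import time
from pathlib import Path

def backfill(months, out_dir, fetch, retries=3):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for month in months:
        target = out_dir / f"hrrr_{month}.parquet"
        if target.exists():                   # auto-resume: skip completed months
            continue
        for attempt in range(retries):
            try:
                target.write_bytes(fetch(month))
                break
            except OSError:
                if attempt == retries - 1:
                    raise                     # exhausted retries: surface error
                time.sleep(2 ** attempt)      # exponential backoff
```

A multi-day backfill interrupted at month 14 of 24 can then simply be rerun; the first 13 months cost one `Path.exists` check each.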
Chronos-2 Inference Pipeline Design Doc
Six-stage inference pipeline in predict.py: loads SNOTEL context, fetches live HRRR atmospheric covariates as future regressors, loads Chronos-2 on GPU, runs per-station inference, applies PRATE ceiling post-processing, and writes the forecast JSON payload and CSV audit log. InferenceEngine abstraction keeps the orchestration model-agnostic.
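The PRATE ceiling post-processing step can be illustrated as a clamp: no forecast quantile is allowed to exceed a snowfall ceiling implied by HRRR precipitation totals. The 10:1 snow-to-liquid ratio and the function name below are assumptions, not PowDay's actual constants.

```python
# Illustrative PRATE-ceiling post-processing: clamp the model's snowfall
# quantiles to a ceiling derived from HRRR liquid-precipitation totals.
# The 10:1 snow-to-liquid ratio (SLR) is an assumed default.
def apply_prate_ceiling(quantiles_cm, prate_total_mm, slr=10.0):
    """Clamp each quantile to the snowfall ceiling implied by liquid precip."""
    ceiling_cm = prate_total_mm * slr / 10.0   # mm liquid -> cm snow at SLR
    return {q: min(v, ceiling_cm) for q, v in quantiles_cm.items()}

capped = apply_prate_ceiling({"p10": 2.0, "p50": 9.0, "p90": 30.0},
                             prate_total_mm=12.0)
```

A step like this keeps the upper tail physically plausible: the model cannot forecast more snow than the atmosphere has moisture to deliver.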
Data Publishing Design Doc
Design for publish.py: validates the forecast payload before upload (hard-blocking is_backtest check, empty-forecasts guard, schema checks), uploads to Vercel Blob via REST API with a stable path, then triggers on-demand ISR cache revalidation. Includes retry logic on HTTP 5xx and a dry-run mode for testing without touching production.
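The pre-upload validation described above might look like the sketch below: a hard block on backtest payloads, an empty-forecasts guard, and basic schema checks. The exact schema and error messages are assumptions.

```python
# Minimal sketch of pre-upload validation, mirroring the checks the
# design doc describes (hard-block backtests, reject empty payloads).
# Field names and error messages are illustrative assumptions.
def validate_payload(payload):
    if payload.get("is_backtest"):
        raise ValueError("refusing to publish a backtest payload")
    if not payload.get("forecasts"):
        raise ValueError("payload has no forecasts")
    for entry in payload["forecasts"]:
        if "date" not in entry or "snowfall_cm" not in entry:
            raise ValueError(f"malformed forecast entry: {entry}")
    return True
```

Making the `is_backtest` check hard-blocking means a misconfigured backtest run fails loudly at publish time instead of silently overwriting the live forecast.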

Test Plan
Full test strategy covering ingest unit tests, inference pipeline tests (load_context, post_process, run_inference), output contract validation, and regression tests against known storm and quiet reference dates from Jan–Feb 2026.
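As a flavor of the output-contract checks, one invariant any probabilistic forecast must satisfy is quantile monotonicity. This sketch is an assumption about the style of check, not a excerpt from the actual test suite.

```python
# Hedged sketch of an output-contract check: forecast quantiles must be
# non-decreasing (p10 <= p50 <= p90) for every entry in the payload.
def quantiles_valid(forecasts):
    return all(f["p10"] <= f["p50"] <= f["p90"] for f in forecasts)

assert quantiles_valid([{"p10": 0.0, "p50": 2.0, "p90": 7.5}])
assert not quantiles_valid([{"p10": 3.0, "p50": 2.0, "p90": 7.5}])
```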
Docker Image Technical Specification
Containerized GPU inference environment: Ubuntu 22.04 + CUDA 12.4.1/cuDNN + Python 3.11 via deadsnakes PPA. PyTorch 2.5.1 installed in a dedicated layer ahead of requirements.txt so application changes don't trigger a 2GB+ re-download. Headless ENV vars baked in for CI/CD and local dev parity.
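The layer-ordering trick can be sketched as a Dockerfile fragment. Versions come from the spec above; the exact commands and base-image tag are assumptions, not the real Dockerfile.

```dockerfile
# Illustrative layer ordering (versions per the spec; commands assumed):
# PyTorch is installed before requirements.txt is copied, so changing
# application dependencies never invalidates the 2GB+ torch layer.
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y software-properties-common \
 && add-apt-repository -y ppa:deadsnakes/ppa \
 && apt-get install -y python3.11 python3.11-venv python3-pip
RUN python3.11 -m pip install torch==2.5.1         # heavy layer, cached
COPY requirements.txt .
RUN python3.11 -m pip install -r requirements.txt  # cheap layer, changes often
COPY . /app
```

Docker rebuilds every layer after the first changed one, so ordering layers from most stable to most volatile is what keeps iteration fast.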

Initial Backtesting Report
Walk-forward backtest across 7 stations, Jan–Feb 2026. Identifies storm onset lag and extreme-event collapse as the primary failure modes motivating fine-tuning on 10 years of Sierra Nevada data.

Initial Backtesting Results
Walk-forward validation across 7 SNOTEL stations, January–February 2026. These numbers drove the decision to fine-tune on 10 years of Sierra Nevada data rather than ship zero-shot.
What the numbers mean
P90 coverage at 79% means the actual snowfall fell within the forecast envelope on 79% of days, against a target of 90%. The 11-point gap traces almost entirely to two extreme storm days where the model collapsed, not to systematic underperformance.
The 5% false alarm rate is the more important number for skiers: the model incorrectly predicted snow on only 17 of 342 quiet days. The model is conservative, not trigger-happy. Fine-tuning on 10 years of Sierra data is expected to push P90 coverage above 90% while maintaining the low false alarm rate.
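The headline rates follow directly from the counts above. The snippet below reproduces the false alarm arithmetic and sketches one plausible way P90 coverage is computed (the per-day data and the exact coverage definition are not shown in the report, so the function is an assumption).

```python
# Reproducing the false alarm rate from the report's stated counts.
false_alarms, quiet_days = 17, 342
false_alarm_rate = false_alarms / quiet_days       # ~0.05, i.e. ~5%

# One plausible definition of P90 coverage: the share of days where the
# observed snowfall fell at or below the forecast's p90 bound.
def p90_coverage(actuals, p90s):
    hits = sum(a <= q for a, q in zip(actuals, p90s))
    return hits / len(actuals)
```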
Want to talk about the engineering?
I'm Jon Eby. I built PowDay to answer a question: Can a single engineer with consumer hardware produce probabilistic snow forecasts that beat a naive baseline? The answer is looking like yes. I'm currently open to senior product engineering roles — particularly where AI meets real-world systems.