Engineering
Built to spec.
PowDay was built document-first: requirements defined before code, each component designed before implementation, and every forecast claim validated against historical ground truth before shipping.
Document-driven development
Every engineering decision in PowDay has a paper trail. The PRD defined success metrics and data contracts before a single line of Python was written. Each pipeline component — SNOTEL ingestion, NOAA HRRR ingestion, Chronos-2 inference — has its own design doc specifying goals, non-goals, data formats, error handling, and known limitations.
The test plan was written alongside the design docs, not after the fact. Regression tests are anchored to specific dates with known behavior: quiet days, storm onsets, active storms, and the PRATE ceiling edge case. The backtesting report quantifies model performance against actual SNOTEL data before any forecast is published.
The Document Suite

Product Requirements Document
Defines MVP success criteria, functional and non-functional requirements (including a 4-hour pipeline execution budget and 12GB VRAM constraint), the JSON data contract, and the initial product roadmap. Written before any code.
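To make the idea of a JSON data contract concrete, here is a minimal sketch of what a per-station forecast payload might look like. The field names (`station_id`, `is_backtest`, `snowfall_cm`, the p10/p50/p90 keys) are illustrative assumptions, not the actual contract from the PRD.

```python
# Hypothetical sketch of the forecast JSON contract; actual field names
# in the PRD may differ.
import json
from datetime import date, timedelta

def build_payload(station_id, start, daily_p10_p50_p90):
    """Assemble a per-station forecast entry (illustrative shape only)."""
    return {
        "station_id": station_id,
        "issued": start.isoformat(),
        "is_backtest": False,
        "forecasts": [
            {
                "date": (start + timedelta(days=i)).isoformat(),
                "snowfall_cm": {"p10": p10, "p50": p50, "p90": p90},
            }
            for i, (p10, p50, p90) in enumerate(daily_p10_p50_p90)
        ],
    }

payload = build_payload("428:CA:SNTL", date(2026, 1, 15),
                        [(0.0, 2.5, 8.0), (1.0, 6.0, 18.0)])
print(json.dumps(payload, indent=2))
```

Pinning a shape like this down before writing code is what lets the ingestion, inference, and publishing components evolve independently.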

MVP Overall Design Doc
System architecture, phases and success criteria, data flow, inference design, forecast data contract, and a full Alternatives Considered section evaluating cloud ML, traditional time-series models (ARIMA/Prophet), and commercial weather APIs.

SNOTEL Ingestion Design Doc
Component-level spec for ingest_snotel.py: batched AWDB API parameters, English-to-metric unit conversion, idempotency guarantees, per-station failure isolation, and documented sensor limitations at Northstar Upper.
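The unit-conversion step can be sketched in a few lines. AWDB reports English units, so values are converted to metric on ingest; the helper name and field names below are illustrative assumptions, not PowDay's actual code, and the idempotency machinery is omitted.

```python
# Illustrative unit conversion for SNOTEL records (AWDB reports English
# units); record keys here are hypothetical, not the real schema.
def to_metric(record):
    """Convert snow depth and SWE from inches to cm/mm, and temp from F to C."""
    return {
        "snow_depth_cm": record["snow_depth_in"] * 2.54,
        "swe_mm": record["swe_in"] * 25.4,
        "temp_c": (record["temp_f"] - 32) * 5 / 9,
    }

row = to_metric({"snow_depth_in": 10.0, "swe_in": 2.0, "temp_f": 32.0})
```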

NOAA HRRR Ingestion Design Doc
Design for ingest_noaa.py: FastHerbie batch behavior, S3 byte-range requests for efficient GRIB2 extraction, per-month idempotency, auto-resume and retry logic for interrupted backfills, and future Open-Meteo ERA5 integration path.
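The auto-resume and per-month idempotency pattern described above can be sketched as follows. Months that already have an output file are skipped on rerun, and transient failures are retried with exponential backoff. File names, the `fetch` callable, and the retry counts are assumptions for illustration.

```python
# Hedged sketch of per-month resume logic for a backfill: months with an
# existing output file are skipped, so reruns are idempotent. Paths and
# the fetch function are illustrative, not PowDay's actual API.
import time
from pathlib import Path

def backfill(months, out_dir, fetch, retries=3):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for month in months:
        target = out_dir / f"hrrr_{month}.parquet"
        if target.exists():                   # auto-resume: skip completed months
            continue
        for attempt in range(retries):
            try:
                target.write_bytes(fetch(month))
                break
            except OSError:
                if attempt == retries - 1:
                    raise                     # exhausted retries: surface error
                time.sleep(2 ** attempt)      # exponential backoff
```

A multi-day backfill interrupted at month 14 of 24 can then simply be rerun; the first 13 months cost one `Path.exists` check each.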
Chronos-2 Inference Pipeline Design Doc
Six-stage inference pipeline in predict.py: loads SNOTEL context, fetches live HRRR atmospheric covariates as future regressors, loads Chronos-2 on GPU, runs per-station inference, applies PRATE ceiling post-processing, and writes the forecast JSON payload and CSV audit log. InferenceEngine abstraction keeps the orchestration model-agnostic.
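The PRATE ceiling post-processing step can be illustrated as a clamp: no forecast quantile is allowed to exceed a snowfall ceiling implied by HRRR precipitation totals. The 10:1 snow-to-liquid ratio and the function name below are assumptions, not PowDay's actual constants.

```python
# Illustrative PRATE-ceiling post-processing: clamp the model's snowfall
# quantiles to a ceiling derived from HRRR liquid-precipitation totals.
# The 10:1 snow-to-liquid ratio (SLR) is an assumed default.
def apply_prate_ceiling(quantiles_cm, prate_total_mm, slr=10.0):
    """Clamp each quantile to the snowfall ceiling implied by liquid precip."""
    ceiling_cm = prate_total_mm * slr / 10.0   # mm liquid -> cm snow at SLR
    return {q: min(v, ceiling_cm) for q, v in quantiles_cm.items()}

capped = apply_prate_ceiling({"p10": 2.0, "p50": 9.0, "p90": 30.0},
                             prate_total_mm=12.0)
```

A step like this keeps the upper tail physically plausible: the model cannot forecast more snow than the atmosphere has moisture to deliver.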
Data Publishing Design Doc
Design for publish.py: validates the forecast payload before upload (hard-blocking is_backtest check, empty-forecasts guard, schema checks), uploads to Vercel Blob via REST API with a stable path, then triggers on-demand ISR cache revalidation. Includes retry logic on HTTP 5xx and a dry-run mode for testing without touching production.
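The pre-upload validation described above might look like the sketch below: a hard block on backtest payloads, an empty-forecasts guard, and basic schema checks. The exact schema and error messages are assumptions.

```python
# Minimal sketch of pre-upload validation, mirroring the checks the
# design doc describes (hard-block backtests, reject empty payloads).
# Field names and error messages are illustrative assumptions.
def validate_payload(payload):
    if payload.get("is_backtest"):
        raise ValueError("refusing to publish a backtest payload")
    if not payload.get("forecasts"):
        raise ValueError("payload has no forecasts")
    for entry in payload["forecasts"]:
        if "date" not in entry or "snowfall_cm" not in entry:
            raise ValueError(f"malformed forecast entry: {entry}")
    return True
```

Making the `is_backtest` check hard-blocking means a misconfigured backtest run fails loudly at publish time instead of silently overwriting the live forecast.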

Test Plan
Full test strategy covering ingest unit tests, inference pipeline tests (load_context, post_process, run_inference), output contract validation, and regression tests against known storm and quiet reference dates from Jan–Feb 2026.
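As a flavor of the output-contract checks, one invariant any probabilistic forecast must satisfy is quantile monotonicity. This sketch is an assumption about the style of check, not a excerpt from the actual test suite.

```python
# Hedged sketch of an output-contract check: forecast quantiles must be
# non-decreasing (p10 <= p50 <= p90) for every entry in the payload.
def quantiles_valid(forecasts):
    return all(f["p10"] <= f["p50"] <= f["p90"] for f in forecasts)

assert quantiles_valid([{"p10": 0.0, "p50": 2.0, "p90": 7.5}])
assert not quantiles_valid([{"p10": 3.0, "p50": 2.0, "p90": 7.5}])
```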
Docker Image Technical Specification
Containerized GPU inference environment: Ubuntu 22.04 + CUDA 12.4.1/cuDNN + Python 3.11 via deadsnakes PPA. PyTorch 2.5.1 installed in a dedicated layer ahead of requirements.txt so application changes don't trigger a 2GB+ re-download. Headless ENV vars baked in for CI/CD and local dev parity.
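The layer-ordering trick can be sketched as a Dockerfile fragment. Versions come from the spec above; the exact commands and base-image tag are assumptions, not the real Dockerfile.

```dockerfile
# Illustrative layer ordering (versions per the spec; commands assumed):
# PyTorch is installed before requirements.txt is copied, so changing
# application dependencies never invalidates the 2GB+ torch layer.
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y software-properties-common \
 && add-apt-repository -y ppa:deadsnakes/ppa \
 && apt-get install -y python3.11 python3.11-venv python3-pip
RUN python3.11 -m pip install torch==2.5.1         # heavy layer, cached
COPY requirements.txt .
RUN python3.11 -m pip install -r requirements.txt  # cheap layer, changes often
COPY . /app
```

Docker rebuilds every layer after the first changed one, so ordering layers from most stable to most volatile is what keeps iteration fast.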

Initial Backtesting Report
Walk-forward backtest across 7 stations, Jan–Feb 2026. Identifies storm onset lag and extreme-event collapse as the primary failure modes motivating fine-tuning on 10 years of Sierra Nevada data.

Initial Backtesting Results
Walk-forward validation across 7 SNOTEL stations, January–February 2026. These numbers drove the decision to fine-tune on 10 years of Sierra Nevada data rather than ship zero-shot.
What the numbers mean
P90 coverage at 79% means the actual snowfall fell within the forecast envelope on 79% of days, against a target of 90%. The 11-point gap traces almost entirely to two extreme storm days where the model collapsed, not to systematic underperformance.
The 5% false alarm rate is the more important number for skiers: the model incorrectly predicted snow on only 17 of 342 quiet days. The model is conservative, not trigger-happy. Fine-tuning on 10 years of Sierra data is expected to push P90 coverage above 90% while maintaining the low false alarm rate.
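The headline rates follow directly from the counts above. The snippet below reproduces the false alarm arithmetic and sketches one plausible way P90 coverage is computed (the per-day data and the exact coverage definition are not shown in the report, so the function is an assumption).

```python
# Reproducing the false alarm rate from the report's stated counts.
false_alarms, quiet_days = 17, 342
false_alarm_rate = false_alarms / quiet_days       # ~0.05, i.e. ~5%

# One plausible definition of P90 coverage: the share of days where the
# observed snowfall fell at or below the forecast's p90 bound.
def p90_coverage(actuals, p90s):
    hits = sum(a <= q for a, q in zip(actuals, p90s))
    return hits / len(actuals)
```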
Want to talk about the engineering?
I'm Jon Eby. I built PowDay to answer a question: Can a single engineer with consumer hardware produce probabilistic snow forecasts that beat a naive baseline? The answer is looking like yes. I'm currently open to senior product engineering roles — particularly where AI meets real-world systems.