How Umbrella thinks about traffic
The whole product fits in three nouns — pools, backends, routes — and four ideas: snapshot, strategy, health, analytics. This is everything you need to know.
Concepts
Three primitives:
- Pool — a group of backends that share a balancing strategy and health-check policy.
- Backend — a single upstream URL, member of exactly one pool. Has a weight and an enabled flag.
- Route — a prioritized rule that maps an incoming request to a pool.
Configuration lives in SQL (SQLite or Postgres). Audit logs record every write. The proxy hot path never reads from SQL — see snapshots.
Atomic snapshots
Every request the proxy serves reads from an immutable in-memory RouterSnapshot. On any config write, Umbrella:
- Persists the change to SQL inside a transaction.
- Bumps a monotonically increasing `config_version` counter.
- Rebuilds a fresh `RouterSnapshot` with pre-compiled regexes, weight tables, and pool indices.
- Reassigns the snapshot reference atomically; readers grab a local pointer, so no locks are needed.
Effect: configuration changes apply to the very next request. No restart. No request loss. The version number you see in the dashboard’s Snapshot v6 indicator is exactly this counter.
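The swap described above can be sketched in a few lines. This is illustrative, not Umbrella's actual code: `SnapshotHolder`, `publish`, and the `(regex, pool)` route shape are assumptions; only `RouterSnapshot` and `version` come from the docs. In CPython, replacing an attribute reference is atomic under the GIL, which is what makes lock-free readers safe here.

```python
import re
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterSnapshot:
    """Immutable view of routing config; shape is hypothetical."""
    version: int
    routes: tuple  # pre-compiled (regex, pool) pairs


class SnapshotHolder:
    def __init__(self):
        self._snapshot = RouterSnapshot(version=0, routes=())
        self._lock = threading.Lock()  # serializes writers only

    def current(self):
        # Readers just grab the reference; no lock needed.
        return self._snapshot

    def publish(self, raw_routes):
        """Rebuild a fresh snapshot, then atomically swap the reference."""
        with self._lock:
            old = self._snapshot
            compiled = tuple((re.compile(p), pool) for p, pool in raw_routes)
            self._snapshot = RouterSnapshot(version=old.version + 1, routes=compiled)
```

In-flight requests keep using whatever snapshot they grabbed; the next request sees the new one.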
Balancing strategies
Pick per pool. Switchable live; the next request uses the new strategy.
| Strategy | Behavior | When to use |
|---|---|---|
| round_robin | cycles backends equally | default; works for stateless services |
| weighted | random pick proportional to weight | mixed-capacity backends; canary deploys |
| least_conn | fewest in-flight wins | long-lived requests of varying duration |
| ip_hash | sticky-by-client-IP | session affinity without a sticky cookie |
| random | uniform random pick | chaos testing; small-N pools |
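The `weighted` strategy is worth a sketch, since it is the one behind canary deploys. A minimal version, assuming backends are dicts with `weight` and `enabled` fields (the dict shape and function name are illustrative, not Umbrella's API):

```python
import random


def pick_weighted(backends):
    """Random pick proportional to weight, skipping disabled backends."""
    enabled = [b for b in backends if b["enabled"]]
    weights = [b["weight"] for b in enabled]
    return random.choices(enabled, weights=weights, k=1)[0]
```

A canary deploy is then just a pool with the stable backend at weight 95 and the canary at weight 5.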
Routing rules
Routes are evaluated in priority order — lower wins. The first route whose path / host / methods / headers / query all match captures the request.
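First-match-wins evaluation is simple enough to show directly. A sketch, with an illustrative `Route` class standing in for the real matchers:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    priority: int  # lower wins
    pool: str
    predicate: Callable[[str], bool]  # stand-in for path/host/method/header/query matchers

    def matches(self, path):
        return self.predicate(path)


def select_route(routes, path):
    """First route in ascending priority order whose matchers all pass wins."""
    for route in sorted(routes, key=lambda r: r.priority):
        if route.matches(path):
            return route.pool
    return None  # no route matched
```

A common pattern: specific routes at low priorities, one catch-all at a high priority.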
Path matchers
- `exact` — full-path equality (`/login`)
- `prefix` — path prefix (`/api/`)
- `glob` — fnmatch-style wildcards (`/users/*/posts`)
- `regex` — Python `re.search` on the full path
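The four kinds reduce to a small dispatch. A sketch using the stdlib `fnmatch` for globs and `re.search` for regexes, as the matcher list describes (the function itself is illustrative):

```python
import fnmatch
import re


def path_matches(kind, pattern, path):
    """Dispatch over the four path matcher kinds."""
    if kind == "exact":
        return path == pattern
    if kind == "prefix":
        return path.startswith(pattern)
    if kind == "glob":
        return fnmatch.fnmatch(path, pattern)
    if kind == "regex":
        return re.search(pattern, path) is not None
    raise ValueError(f"unknown matcher kind: {kind}")
```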
Host matchers
Match by Host header. Supports wildcards: api.example.com, *.example.com, *-staging.example.com. Leave blank to match any host.
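All three wildcard forms above behave like shell globs, so a sketch can lean on `fnmatch` again (assuming case-insensitive matching, which is standard for hostnames; the function is illustrative):

```python
import fnmatch


def host_matches(pattern, host):
    """Wildcard Host matching; blank pattern matches any host."""
    if not pattern:
        return True
    return fnmatch.fnmatch(host.lower(), pattern.lower())
```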
Header & query matchers
One `name=value` per line. Value supports literal match, `*`/`?` globs, and `regex:<pattern>`.
```
X-Tenant=acme
Authorization=Bearer *
X-Version=regex:^v[2-9]$
```
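The three value forms can be told apart by inspection: a `regex:` prefix, a glob metacharacter, or neither. A sketch of that decision (illustrative, not Umbrella's parser):

```python
import fnmatch
import re


def value_matches(pattern, value):
    """Match a header/query value: literal, glob (*/?), or regex:<pattern>."""
    if pattern.startswith("regex:"):
        return re.search(pattern[len("regex:"):], value) is not None
    if "*" in pattern or "?" in pattern:
        return fnmatch.fnmatch(value, pattern)
    return pattern == value
```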
Forwarding tweaks
- Strip prefix — drop the matched path prefix before forwarding upstream.
- Rewrite host — replace incoming Host with the upstream’s host (default).
- Preserve host — keep the original Host header (use for vhost-routed upstreams).
- Per-route timeout — override the global httpx timeout for slow endpoints.
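Strip prefix is the one tweak with a small gotcha: after stripping, the upstream path still needs a leading slash. A sketch of the rewrite (function and parameter names are illustrative):

```python
def forward_path(path, matched_prefix, strip_prefix):
    """Compute the upstream path for a prefix route."""
    if strip_prefix and path.startswith(matched_prefix):
        stripped = path[len(matched_prefix):]
        return "/" + stripped.lstrip("/")  # upstream paths always start with /
    return path
```

So a `prefix` route on `/api/` with strip prefix enabled forwards `/api/users` upstream as `/users`.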
Health checks
Two layers, working together.
Active probes
One asyncio task per pool sends configurable probes against every backend. Backends transition through a state machine:
UNKNOWN → HEALTHY ↔ UNHEALTHY → HALF_OPEN → HEALTHY
- Interval / timeout — how often, how long to wait.
- Path / method / expected status — e.g. `GET /healthz` expecting `200-299`.
- Healthy threshold — consecutive successes before flipping to HEALTHY.
- Unhealthy threshold — consecutive failures before flipping to UNHEALTHY.
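The threshold logic reduces to two counters that reset each other. A simplified sketch (HALF_OPEN omitted; the class and field names are illustrative, not Umbrella's internals):

```python
class BackendHealth:
    """Counter-based health state: thresholds of consecutive outcomes flip the state."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.state = "UNKNOWN"
        self.successes = 0
        self.failures = 0
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold

    def record(self, ok):
        if ok:
            self.successes += 1
            self.failures = 0  # any success resets the failure streak
            if self.successes >= self.healthy_threshold:
                self.state = "HEALTHY"
        else:
            self.failures += 1
            self.successes = 0  # any failure resets the success streak
            if self.failures >= self.unhealthy_threshold:
                self.state = "UNHEALTHY"
```

Requiring consecutive results in both directions is what keeps a single slow probe from flapping a backend in and out of rotation.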
Passive circuit breaker
The forwarder tracks real-traffic outcomes. If a backend returns N consecutive 5xx, its circuit opens for a configurable cool-off, then half-opens for the next active probe to decide.
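A sketch of that breaker, with an injectable clock so the cool-off is testable (class and parameter names are illustrative, not Umbrella's internals):

```python
import time


class CircuitBreaker:
    """Opens after N consecutive 5xx; half-opens once the cool-off elapses."""

    def __init__(self, failure_threshold=5, cooloff_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooloff_s = cooloff_s
        self.clock = clock
        self.consecutive_5xx = 0
        self.opened_at = None

    def record(self, status_code):
        if status_code >= 500:
            self.consecutive_5xx += 1
            if self.consecutive_5xx >= self.failure_threshold:
                self.opened_at = self.clock()
        else:
            # Any non-5xx response closes the circuit and resets the streak.
            self.consecutive_5xx = 0
            self.opened_at = None

    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.cooloff_s:
            return "HALF_OPEN"  # next active probe decides
        return "OPEN"
```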
Result
Backends that pass probes but fail on real traffic still get pulled out. Pools with no healthy backends return `503` with `Retry-After: 5` instead of queuing.
Analytics & ClickHouse
The Analytics page in the dashboard shows per-minute traffic, p50/p95 latency, top routes, and top error paths. By default this is backed by the SQL database; for high-volume production, opt into ClickHouse.
SQL (default)
Every request is logged into the request_logs table at a configurable sample rate. Aggregations run in Python over SQL rows. Fine up to a few million rows; sluggish past that.
ClickHouse (opt-in)
```
UMBRELLA_CLICKHOUSE_URL=http://default:password@clickhouse:8123/umbrella
UMBRELLA_CLICKHOUSE_TTL_DAYS=30
UMBRELLA_CLICKHOUSE_FLUSH_INTERVAL_S=2.0
UMBRELLA_CLICKHOUSE_FLUSH_MAX_ROWS=1000
```
Schema is auto-created. Inserts are buffered and batched over HTTP. Aggregate queries (KPIs, time-series, top-N) run directly on ClickHouse with quantile() functions, so the dashboard stays fast even over hundreds of millions of rows.
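The buffering pattern is the interesting part: flush when either the row cap or the flush interval is hit, whichever comes first, mirroring the `FLUSH_MAX_ROWS` / `FLUSH_INTERVAL_S` settings above. A sketch with the actual ClickHouse HTTP insert abstracted into a callback (the class itself is illustrative):

```python
import time


class BatchBuffer:
    """Buffer rows; flush on max-rows or elapsed-interval, whichever comes first."""

    def __init__(self, flush, max_rows=1000, interval_s=2.0, clock=time.monotonic):
        self.flush_fn = flush          # e.g. an HTTP INSERT into ClickHouse
        self.max_rows = max_rows
        self.interval_s = interval_s
        self.clock = clock
        self.rows = []
        self.last_flush = clock()

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows or self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
        self.last_flush = self.clock()
```

Batching this way keeps inserts amortized under load while bounding how stale the dashboard can be when traffic is quiet.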
Schema
```sql
CREATE TABLE umbrella.request_logs (
    ts          DateTime64(3, 'UTC'),
    route_id    Nullable(UInt32),
    pool_id     Nullable(UInt32),
    backend_id  Nullable(UInt32),
    method      LowCardinality(String),
    path        String,
    status_code UInt16,
    duration_ms Float32,
    client_ip   Nullable(String),
    error       Nullable(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (ts, status_code, route_id)
TTL toDateTime(ts) + INTERVAL 30 DAY;
```
Metrics & logs
Prometheus
Exposed at /metrics:
- `umbrella_proxy_requests_total`
- `umbrella_proxy_responses_total{status, route}`
- `umbrella_proxy_request_duration_seconds{route, pool}` — histogram with p50/p95/p99 buckets
- `umbrella_backend_up{pool, backend}` — gauge
Add Umbrella as a Prometheus scrape target and use Grafana for time-series dashboards.
Structured logs
JSON to stdout via structlog. Every request, every health-state transition, every audit event. Ship to Loki, Vector, Datadog, CloudWatch — whatever you have.
Audit log
Every config write (pools / backends / routes / users) records the user, IP, action, target, and full payload diff into audit_logs. Available via the dashboard’s admin pages.
JSON API
The dashboard is just a thin HTML view on top of a complete REST API. Everything you can click, you can curl.
- `POST /dashboard/api/v1/auth/login` — get a JWT cookie
- `GET|POST|PATCH|DELETE /dashboard/api/v1/pools`
- `GET|POST|PATCH|DELETE /dashboard/api/v1/backends`
- `GET|POST|PATCH|DELETE /dashboard/api/v1/routes`
- `GET /dashboard/api/v1/topology` — full route → pool → backend graph
- `GET /dashboard/api/v1/analytics/{summary,top-routes,top-errors}`
Interactive Swagger UI at /dashboard/api/docs.
Non-goals
Things Umbrella explicitly does not do, by design:
- TLS termination — put Caddy / nginx / Traefik / Cloudflare in front.
- Multi-tenancy — single-tenant only.
- Service discovery — give it URLs, not Consul integration.
- Distributed control plane — single-instance v1; HA via Postgres + LISTEN/NOTIFY is on the roadmap.
If you need any of the above, reach for Envoy / HAProxy / Traefik. If you want a small, sharp, self-hosted thing, Umbrella.