
Caching

Cache semantic query results keyed by the query signature.

The dataset client can cache query results so repeated semantic queries do not hit ClickHouse. Results are keyed by the full query signature — target, dimensions, measures, filters, ordering, pagination, time grain, tenant scope, and any explicit cache scope — so two different queries never share an entry, and tenant-scoped datasets are partitioned per tenant.

This is the semantic-layer cache in @hypequery/datasets, keyed by what a query means. It is independent of query caching in @hypequery/clickhouse, which caches raw execute() results on the typed query builder.
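To make "keyed by the query signature" concrete, here is a minimal sketch of how a canonical key could be derived. It is an illustration only: the QuerySignature shape, canonicalize, and cacheKey are hypothetical names, not the library's actual key format or API.

```typescript
// Hypothetical query shape for illustration; the real types may differ.
interface QuerySignature {
  target: string;
  dimensions?: string[];
  measures?: string[];
  filters?: Record<string, unknown>;
  orderBy?: string[];
  limit?: number;
  offset?: number;
  timeGrain?: string;
  tenant?: string;
  scope?: string;
}

// Serialize with object keys sorted, so property order never changes the key.
// Array order is kept as-is (assumption: dimension and ordering sequence is significant).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // an absent field and an undefined field are the same query
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${parts.join(',')}}`;
  }
  // JSON.stringify(undefined) yields undefined at runtime, so fall back to 'null'.
  return JSON.stringify(value) ?? 'null';
}

function cacheKey(sig: QuerySignature): string {
  return canonicalize(sig);
}
```

Under this sketch, two calls that differ only in tenant or scope produce different keys, which is the property the partitioning described above depends on.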

Enable caching on the client

Pass cache to createDatasetClient to set a default TTL for every query.

import { createDatasetClient } from '@hypequery/datasets';
import { createQueryBuilder } from '@hypequery/clickhouse';

const db = createQueryBuilder({
  url: process.env.CLICKHOUSE_URL!,
  username: process.env.CLICKHOUSE_USER!,
  password: process.env.CLICKHOUSE_PASSWORD!,
  database: process.env.CLICKHOUSE_DATABASE!,
});

const analytics = createDatasetClient({
  queryBuilder: db,
  cache: {
    ttlMs: 60_000,
    staleWhileRevalidateMs: 300_000,
  },
});

The two options control freshness:

  • ttlMs: how long a result is served as fresh.
  • staleWhileRevalidateMs: an optional window after the TTL during which the stale result is returned immediately while a background refresh repopulates the entry.

Regardless of these settings, errors are never cached, and concurrent identical queries share a single execution.
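
The interaction between ttlMs and staleWhileRevalidateMs can be sketched as a small classification function. This is an assumption-laden illustration rather than the library's code: classify and Freshness are hypothetical names, and the exact boundary behavior (whether an entry aged exactly ttlMs counts as fresh or stale) is a guess.

```typescript
type Freshness = 'fresh' | 'stale' | 'expired';

// Decide how an entry stored at `storedAt` may be served at time `now`.
function classify(
  storedAt: number,
  now: number,
  ttlMs: number,
  staleWhileRevalidateMs = 0,
): Freshness {
  const ageMs = now - storedAt;
  if (ageMs < ttlMs) return 'fresh'; // serve directly
  if (ageMs < ttlMs + staleWhileRevalidateMs) return 'stale'; // serve, refresh in background
  return 'expired'; // treat as a miss and execute
}
```

With ttlMs: 60_000 and staleWhileRevalidateMs: 300_000, an entry that is 65 seconds old falls in the stale window, and one older than 360 seconds is no longer served.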

Per-call overrides

Every execute call can override or bypass the cache through the execution context.

// Opt a single call into caching (works even without client-level cache config).
await analytics.execute(revenue, { dimensions: ['country'] }, {
  cache: { ttlMs: 30_000 },
});

// Bypass the cache for one call.
await analytics.execute(revenue, { dimensions: ['country'] }, {
  cache: false,
});

// Skip the read but store the fresh result (force refresh).
await analytics.execute(revenue, { dimensions: ['country'] }, {
  cache: { mode: 'refresh' },
});

mode: 'refresh' needs a TTL to write under: either a per-call ttlMs or a client-level cache.ttlMs. If neither is configured, the call executes uncached and logs a one-time warning, so configuration drift never fails a request but does not go unnoticed either. A refresh always runs its own execution, even when an identical query is already in flight.
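
The difference between a normal call, which joins an identical in-flight query, and a refresh, which always runs its own execution, can be sketched with a single-flight map. The names singleFlight and forceOwnExecution are hypothetical, and this is not how the library is necessarily implemented.

```typescript
// In-flight executions, keyed by the query's cache key.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(
  key: string,
  run: () => Promise<T>,
  forceOwnExecution = false, // models mode: 'refresh'
): Promise<T> {
  // A refresh never joins an execution that is already running.
  if (forceOwnExecution) return run();

  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Remove the entry once settled, so failures are never reused by later calls.
  const promise = run().finally(() => {
    if (inFlight.get(key) === promise) inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}
```

Two concurrent singleFlight calls with the same key share one run; passing true as the third argument starts a separate run even while the first is pending.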

Cache metadata

Results that went through the cache carry meta.cache.

const result = await analytics.execute(revenue, {
  dimensions: ['country'],
}, {
  cache: { ttlMs: 60_000 },
});

result.meta?.cache;
// { hit: true, ageMs: 1200 } — served fresh from the cache
// { hit: true, ageMs: 65000, stale: true } — served stale, refreshing in background
// { hit: false } — executed and stored
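
For logging or metrics, the three shapes above can be folded into a single label. This is a small sketch: CacheMeta mirrors the fields shown in the comments, and describeCache is a hypothetical helper, not part of the library.

```typescript
// Mirrors the meta.cache shapes shown above (assumed, not imported from the library).
interface CacheMeta {
  hit: boolean;
  ageMs?: number;
  stale?: boolean;
}

function describeCache(cache: CacheMeta | undefined): string {
  if (!cache) return 'uncached'; // the call did not go through the cache
  if (!cache.hit) return 'miss'; // executed and stored
  const age = `${cache.ageMs ?? 0} ms old`;
  return cache.stale ? `stale hit (${age})` : `fresh hit (${age})`;
}
```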

Serve endpoints

When a metric or dataset endpoint is registered with a cache value, its results are cached server-side using that value as the TTL, and the endpoint also emits Cache-Control headers.

export const api = serve({
  queryBuilder: db,
  metrics: {
    revenue: {
      metric: revenue,
      cache: 60_000,
    },
  },
  datasets: {
    orders: {
      dataset: Orders,
      cache: 60_000,
    },
  },
});

Repeated identical requests within the TTL are served from the cache without querying ClickHouse. Because tenant scope is part of the cache key, runtime tenancy stays isolated: each tenant only ever sees entries for its own scope.

Custom stores

The default store is an in-process LRU (500 entries). For multi-instance deployments, provide a shared store implementing SemanticCacheStore, which has three methods (get, set, and delete) that may be sync or async. Here is a Redis-backed store using a Redis client such as ioredis:

import {
  createDatasetClient,
  type SemanticCacheEntry,
  type SemanticCacheStore,
} from '@hypequery/datasets';

interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: 'PX', ttlMs: number): Promise<unknown>;
  del(key: string): Promise<unknown>;
}

declare const redis: RedisLike;

const TTL_MS = 60_000;
const SWR_MS = 300_000;

const redisStore: SemanticCacheStore = {
  async get(key): Promise<SemanticCacheEntry | undefined> {
    const raw = await redis.get(key);
    return raw ? JSON.parse(raw) : undefined;
  },
  async set(key, entry) {
    // Expire in Redis once the entry can never be served again. Freshness
    // within that window is decided by the client from `storedAt`, so the
    // Redis expiry is only garbage collection.
    await redis.set(key, JSON.stringify(entry), 'PX', TTL_MS + SWR_MS);
  },
  async delete(key) {
    await redis.del(key);
  },
};

const cachedAnalytics = createDatasetClient({
  queryBuilder: db,
  cache: {
    ttlMs: TTL_MS,
    staleWhileRevalidateMs: SWR_MS,
    store: redisStore,
  },
});

Behavior to rely on when writing a store:

  • Entries are opaque { value, storedAt } objects; freshness is always decided by the client from storedAt and the effective TTL, so per-call TTL overrides work against shared entries. Give the store-side expiry (Redis PX above) at least ttlMs + staleWhileRevalidateMs.
  • Store failures are non-fatal: a failed get is treated as a cache miss and a failed set is dropped, so a Redis outage degrades to "no caching" instead of failing queries.
  • Concurrent identical queries are deduplicated per process even while an async get is in flight — a burst of the same query does one store read and at most one execution per instance.
  • Keys are readable canonical signatures and can get long; stores are free to hash them internally (e.g. SHA-256) as long as the mapping is stable.
  • Values are plain rows plus meta and survive JSON.stringify round-trips.
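
Since keys can get long, one option is to wrap any store so it hashes keys before they reach the backend. The sketch below uses local Entry and Store interfaces as stand-ins shaped like the { value, storedAt } entries and three-method store described above; they are assumptions for the sake of a self-contained example, and withHashedKeys and the 'hq:' prefix are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Local stand-ins for the entry and store shapes (assumed for illustration).
interface Entry {
  value: unknown;
  storedAt: number;
}

interface Store {
  get(key: string): Promise<Entry | undefined> | Entry | undefined;
  set(key: string, entry: Entry): Promise<void> | void;
  delete(key: string): Promise<void> | void;
}

// SHA-256 is deterministic, so the key mapping stays stable across processes.
function hashKey(key: string): string {
  return createHash('sha256').update(key).digest('hex');
}

// Wrap a store so every key is replaced by a short, fixed-length digest.
function withHashedKeys(inner: Store, prefix = 'hq:'): Store {
  return {
    get: (key) => inner.get(prefix + hashKey(key)),
    set: (key, entry) => inner.set(prefix + hashKey(key), entry),
    delete: (key) => inner.delete(prefix + hashKey(key)),
  };
}
```

The trade-off is that hashed keys are no longer readable in the backend, so debugging by inspecting keys becomes harder.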

Cache scopes

A cache key describes the query, not the connection it runs against. When the same semantic query can resolve against different data sources, partition entries with scope:

import {
  createDatasetClient,
  type QueryBuilderFactoryLike,
  type SemanticCacheStore,
} from '@hypequery/datasets';

declare const euDb: QueryBuilderFactoryLike;
declare const replicaDb: QueryBuilderFactoryLike;
declare const redisStore: SemanticCacheStore;

// Client-level: namespace clients that share one store (e.g. one Redis
// serving several warehouses), so identical queries never collide.
const euAnalytics = createDatasetClient({
  queryBuilder: euDb,
  cache: { ttlMs: 60_000, store: redisStore, scope: 'warehouse-eu' },
});

// Per-call: required when overriding the query builder at runtime. Without a
// scope, calls that pass `runtime.builderFactory` skip the cache entirely,
// because the key alone cannot tell two data sources apart.
await analytics.execute(revenue, { dimensions: ['country'] }, {
  runtime: { builderFactory: replicaDb },
  cache: { scope: 'replica-2' },
});
