jwt · sessions · design

Short-lived tokens for AI: the OAuth-style answer

2026-05-20 · Skelf-Research

There is nothing new about handing a browser a short-lived credential and letting a server hold the long-lived secret. It is what every OAuth-shaped flow has been doing since the late 2000s. Bearer tokens with TTLs. Refresh tokens. Audience binding. The entire pattern is boring, well-understood, and exactly the right answer for client-side AI calls.

So why has it not been the default for the OpenAI ecosystem? Mostly because the OpenAI SDK ships with apiKey as a constructor argument, and the path of least resistance for a developer in a hurry is to put that value in a front-end environment variable, where it ends up in the shipped bundle. The OAuth-style answer requires one extra component: a thing that exchanges “I am a real user” for “here is a short-lived bearer”. Perishable is that thing.

This post is about the design decisions inside the exchange. Why JWTs, why fingerprints, why a TTL of minutes rather than hours, and what each of those choices buys you.

Why JWT specifically

A JWT is a compact, signed set of claims that the server can verify without a database lookup. For a proxy that has to validate a token on every inbound request, that property matters. The alternative is a session-store lookup (Redis, an in-memory map, SQLite) on every request, which adds latency and one more moving part to keep alive.

We use HMAC-signed JWTs by default. The server-side secret is set at process start and never leaves. The claim payload is small:

fp: the fingerprint hash of the client that requested the session.
iat / exp: standard issued-at / expires-at, in seconds.
aud: audience, optionally set to the upstream provider name.
jti: an opaque ID so a specific session can be revoked.

That is it. No PII. No user data. The proxy can verify the signature, check that exp is in the future, check the fp claim against the fingerprint of the incoming request, and either pass the request through or refuse it. All of that is O(1).
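The issue-and-verify path above can be sketched with Node's built-in crypto module. This is a minimal HS256 example under stated assumptions, not Perishable's implementation: issueSession and verifySession are hypothetical names, jti revocation is left out, and a production service would more likely lean on a maintained JWT library. Because the verifier always recomputes an HMAC with its own secret, it never trusts the alg field in the header.

```javascript
const crypto = require('node:crypto');

// Hypothetical helpers: a minimal HS256 sketch, not Perishable's code.
const b64url = (data) => Buffer.from(data).toString('base64url');
const hmac = (secret, data) => crypto.createHmac('sha256', secret).update(data).digest();

function issueSession(fp, secret, ttlSeconds, aud, nowSeconds = Math.floor(Date.now() / 1000)) {
  const claims = { fp, iat: nowSeconds, exp: nowSeconds + ttlSeconds, jti: crypto.randomUUID() };
  if (aud) claims.aud = aud; // optional audience, e.g. the upstream provider name
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(claims));
  return `${header}.${body}.${b64url(hmac(secret, `${header}.${body}`))}`;
}

// Returns the claims if signature, expiry and fingerprint all check out, else null.
function verifySession(token, secret, requestFp, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;

  // Constant-time signature check; timingSafeEqual throws on unequal lengths.
  const expected = hmac(secret, `${header}.${body}`);
  const given = Buffer.from(sig, 'base64url');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;

  let claims;
  try {
    claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null; // expired
  if (claims.fp !== requestFp) return null; // token presented by a different client
  return claims;
}
```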

Why a TTL of minutes

OAuth access tokens typically live for minutes to an hour. The reason is a trade-off: the shorter the TTL, the smaller the window in which a leaked token is useful; the longer the TTL, the fewer round trips the client has to make to refresh.

For an AI proxy, the calculation tilts toward the short end. The cost of a refresh is one extra request every few minutes. The cost of a leak is a model bill that grows with every request an attacker can push through. Perishable defaults to short TTLs and refreshes ahead of expiry via the sessionOptions.expiryBuffer config: the client SDK asks for a new session before the current one expires, so the user never sees a 401.

A practical example: if your TTL is fifteen minutes and your buffer is five minutes, the client refreshes every ten minutes of active use. For a chat app, that is invisible. For an attacker who scraped a token from a network capture an hour ago, it is the difference between a working credential and a 401.
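The refresh timing in that example reduces to two subtractions. A minimal sketch, assuming the client tracks expiry in milliseconds; msUntilRefresh and refreshCadenceMs are hypothetical names, not SDK functions.

```javascript
// Hypothetical helpers; the real SDK's scheduling may differ.
const MINUTE = 60 * 1000;

// How long to wait before asking for a new session: fire `bufferMs`
// before expiry, or immediately if we are already inside the buffer.
function msUntilRefresh(expMs, bufferMs, nowMs) {
  return Math.max(0, expMs - bufferMs - nowMs);
}

// Under continuous use, each session is replaced (ttl - buffer) after issue.
function refreshCadenceMs(ttlMs, bufferMs) {
  if (bufferMs >= ttlMs) throw new RangeError('expiryBuffer must be shorter than the TTL');
  return ttlMs - bufferMs;
}

const issuedAt = 0;
const exp = issuedAt + 15 * MINUTE;
console.log(msUntilRefresh(exp, 5 * MINUTE, issuedAt) / MINUTE); // 10
console.log(refreshCadenceMs(15 * MINUTE, 5 * MINUTE) / MINUTE); // 10
```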

Why bind to a fingerprint

A bearer token is, by definition, a bearer credential: anyone holding it can use it. That is fine when the holder is the legitimate browser. It is bad when the token is sitting in someone else’s debugger.

Binding the token to a client fingerprint adds a constraint: the verifier checks that the fingerprint on the inbound request matches the fp claim in the token. A copied token on a different browser has a different fingerprint and is refused.

Fingerprinting is not magic. A determined attacker with a real browser can spoof the inputs. But for the common case — a token pasted into someone else’s curl, or a session captured from one device and replayed from another — the bound token is rejected on arrival. That covers the overwhelmingly common abuse path without needing anything more elaborate.

The fingerprint inputs are public, stable browser features (the same ones any analytics SDK uses) hashed together with an entropy sample. The hash is what goes into the JWT. The raw inputs never leave the client.
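One way to derive and check the fp claim, assuming the client hashes a canonical serialization of its features together with an entropy sample. The feature names, the encoding, and both helpers are hypothetical. It uses Node's crypto.createHash for brevity; in a browser the equivalent primitive is crypto.subtle.digest with 'SHA-256'.

```javascript
const crypto = require('node:crypto');

// Hypothetical: feature names and the entropy encoding are assumptions.
// Client side: only the resulting hex digest goes into the session request.
function fingerprintHash(features, entropySample) {
  // Sort keys so the same features always serialize identically.
  const entries = Object.keys(features).sort().map((k) => [k, String(features[k])]);
  // JSON encoding keeps field boundaries unambiguous before hashing.
  const canonical = JSON.stringify([entries, String(entropySample)]);
  return crypto.createHash('sha256').update(canonical).digest('hex');
}

// Verifier side: does the fp claim match the fingerprint on this request?
function fingerprintMatches(fpClaim, requestFp) {
  const a = Buffer.from(String(fpClaim), 'utf8');
  const b = Buffer.from(String(requestFp), 'utf8');
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```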

Why entropy collection

The reason initEntropyCollection exists is that a fresh session request, with no history behind it, looks identical whether it came from a real human opening your app or from a headless Chromium instance that started two seconds ago. Within a few hundred milliseconds, the legitimate client will usually have produced mouse movements, keystrokes, or scroll events. The scraper will not.

Perishable’s client SDK gathers a small entropy sample from those events before it is allowed to request a session. The server inspects the sample. If it is too clean — a single point, perfect zeros, the default canvas hash for a known headless build — the request is refused.
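Perishable's actual scoring rules are not described in this post, so the following is a hypothetical heuristic that mirrors the three examples above: too few or perfectly zeroed pointer samples, a single repeated point, and a canvas hash on a known-headless list. It adds one assumed extra signal, identical timing gaps, as an illustration. The sample shape ({ x, y, t }), the thresholds, and the function names are all assumptions.

```javascript
// Hypothetical sample shape: pointer events as { x, y, t } with t in ms.
// Thresholds are illustrative assumptions, not Perishable's rules.
function looksTooClean(events, { minEvents = 5, minDistinctPoints = 3 } = {}) {
  if (!Array.isArray(events) || events.length < minEvents) return true;

  // "Perfect zeros": every coordinate is exactly 0.
  if (events.every((e) => e.x === 0 && e.y === 0)) return true;

  // "A single point": too few distinct positions to be real movement.
  const distinct = new Set(events.map((e) => `${e.x},${e.y}`));
  if (distinct.size < minDistinctPoints) return true;

  // Identical gaps between every event suggest a scripted replay.
  const gaps = events.slice(1).map((e, i) => e.t - events[i].t);
  if (gaps.every((g) => g === gaps[0])) return true;

  return false;
}

// Canvas check: refuse a hash known to come from a headless build.
function shouldRefuseSession({ events, canvasHash }, knownHeadlessCanvasHashes = new Set()) {
  if (knownHeadlessCanvasHashes.has(canvasHash)) return true;
  return looksTooClean(events);
}
```

Real human input is noisy, so each rule here looks for the absence of noise rather than trying to recognize a human directly.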

This is not a CAPTCHA. The user never sees anything. The check happens on the first session request and on each refresh. The cost is one event listener; the benefit is that the cheap end of the attack spectrum — drive-by scrapers, automated key extractors — gets nothing.

Why a refresh rather than a long-lived session

Long-lived sessions tempt you toward storing them in localStorage, where they survive page reloads and can be read by any script running on the page, including an injected one. Short-lived sessions held in an in-memory cache, plus a refresh when the tab regains focus, give you the same UX with a fraction of the leak window.

The client SDK exposes the buffer directly:

const ai = new client.PerishableOpenAI({
  proxyUrl: 'https://your-proxy.example.com',
  sessionOptions: {
    expiryBuffer: 5 * 60 * 1000  // refresh 5 min before expiry
  }
});

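The buffer trades refresh churn for slack. A larger buffer refreshes earlier, which gives a flaky network more time to retry before the current token expires, at the cost of more frequent refreshes. A smaller buffer cuts refresh churn but leaves less margin. The buffer does not shrink the leak window, since an issued token stays valid until its own exp unless it is revoked; if you are paranoid, shorten the TTL instead. The default is reasonable.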
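The in-memory cache with refresh-ahead can be sketched as a small class. This is an illustration, not the SDK's internals: SessionCache and the fetchSession callback (assumed to resolve to { token, expMs }) are hypothetical. The clock is injectable so the behaviour can be tested, and concurrent callers share one in-flight refresh instead of each minting a session.

```javascript
// Hypothetical sketch of an in-memory session cache with refresh-ahead.
class SessionCache {
  constructor(fetchSession, { expiryBuffer = 5 * 60 * 1000, now = () => Date.now() } = {}) {
    this.fetchSession = fetchSession; // assumed to resolve { token, expMs }
    this.expiryBuffer = expiryBuffer;
    this.now = now;
    this.session = null;  // held in memory only, never written to localStorage
    this.inflight = null; // shared promise so concurrent callers refresh once
  }

  needsRefresh() {
    return !this.session || this.now() >= this.session.expMs - this.expiryBuffer;
  }

  async getToken() {
    if (this.needsRefresh()) {
      if (!this.inflight) {
        this.inflight = this.fetchSession()
          .then((s) => { this.session = s; return s; })
          .finally(() => { this.inflight = null; });
      }
      await this.inflight;
    }
    return this.session.token;
  }
}

// In a browser, refreshing when the tab regains focus might look like:
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'visible') cache.getToken();
// });
```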

Why per-fingerprint rate limits

Even if everything above holds — token bound, TTL short, entropy checked — a real user can still hammer your proxy. The rateLimitOptions block on the server caps requests per session, per window. Defaults are sensible (points: 100, duration: 60 — a hundred requests per minute), and configurable per deployment.

There is also maxSessionsPerFingerprint. A single fingerprint that suddenly mints fifty sessions is suspicious. The default cap is five. Most legitimate users are fine; abusers hit the wall.
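A minimal in-memory sketch of both limits. It reads points and duration as a fixed window of points requests per duration seconds per session, and treats maxSessionsPerFingerprint as a cap on concurrently live sessions. Both readings are assumptions; Perishable's actual algorithm (fixed or sliding window, local or shared store) is not specified here, and the Limits class is hypothetical.

```javascript
// Hypothetical in-memory limiter. A real deployment would also evict
// stale windows and expired sessions; this sketch only does so on endSession.
class Limits {
  constructor({ points = 100, duration = 60, maxSessionsPerFingerprint = 5 } = {}) {
    this.points = points;
    this.durationMs = duration * 1000;
    this.maxSessions = maxSessionsPerFingerprint;
    this.windows = new Map();  // sessionId -> { start, count }
    this.sessions = new Map(); // fingerprint -> Set of live sessionIds
  }

  // Fixed window: returns true if this request may proceed.
  consume(sessionId, nowMs) {
    const w = this.windows.get(sessionId);
    if (!w || nowMs - w.start >= this.durationMs) {
      this.windows.set(sessionId, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.points) return false;
    w.count += 1;
    return true;
  }

  // Returns true if a new session may be minted for this fingerprint.
  registerSession(fp, sessionId) {
    const set = this.sessions.get(fp) ?? new Set();
    if (set.size >= this.maxSessions) return false;
    set.add(sessionId);
    this.sessions.set(fp, set);
    return true;
  }

  // Call when a session expires or is revoked, freeing its slot.
  endSession(fp, sessionId) {
    this.sessions.get(fp)?.delete(sessionId);
    this.windows.delete(sessionId);
  }
}
```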

What this gives you, end to end

Put together, the OAuth-style answer for AI keys is:

The client requests a session, providing fingerprint and entropy.
The server verifies that the entropy looks human and issues a short-lived JWT bound to the fingerprint.
Each AI request from the client carries the JWT in the Authorization header.
The proxy verifies signature, expiry, fingerprint, and rate limit, then attaches the real upstream key and forwards.
The client refreshes the JWT ahead of expiry, transparently.
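The proxy side of that list amounts to an ordered set of checks. A hypothetical sketch: the x-fingerprint header, the injected verifyToken and consumeRateLimit functions, and the returned decision object are illustrative, not Perishable's interface. The point it shows is that the real upstream key is attached only after every check has passed.

```javascript
// Hypothetical per-request decision for the proxy; names are illustrative.
function handleProxyRequest(req, { verifyToken, consumeRateLimit, upstreamKey }) {
  const match = /^Bearer (.+)$/.exec(req.headers['authorization'] || '');
  if (!match) return { action: 'reject', status: 401 };

  // Signature, expiry and fingerprint binding (see verifySession-style checks).
  const claims = verifyToken(match[1], req.headers['x-fingerprint']);
  if (!claims) return { action: 'reject', status: 401 };

  // Per-session rate limit, keyed here on the jti claim.
  if (!consumeRateLimit(claims.jti)) return { action: 'reject', status: 429 };

  // Swap the session JWT for the real upstream key before forwarding.
  const headers = { ...req.headers, authorization: `Bearer ${upstreamKey}` };
  delete headers['x-fingerprint']; // the upstream provider has no use for it
  return { action: 'forward', headers };
}
```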

There is nothing exotic in that list. It is OAuth-shaped, deliberately. The novelty is only that it has been packaged into a single Node process you can run with one command. The pattern is old; the ergonomics are new.

A worked example of the leak math

Take a concrete scenario. You ship a React Native app. A malicious user runs an intercepting proxy on their device, captures the session JWT, and posts it to a pastebin at 14:02. The JWT has a 15-minute TTL and was issued at 14:00. By 14:15 it is dead.

In those thirteen minutes, anyone who picks up the token from the pastebin cannot use it from a different device or browser, because the fingerprint claim does not match. They could use it from the original device, but that is the malicious user’s own device, and they already had the token. The leak adds no new capability.
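The window in the scenario is plain subtraction. A trivial sketch, with times expressed as minutes since midnight on the same day; toMinutes is a hypothetical helper.

```javascript
// Worked numbers from the scenario, in minutes since midnight.
const toMinutes = (hhmm) => {
  const [h, m] = hhmm.split(':').map(Number);
  return h * 60 + m;
};

const issuedAt = toMinutes('14:00');
const ttl = 15;
const leakedAt = toMinutes('14:02');
const expiresAt = issuedAt + ttl; // 14:15
const usableAfterLeak = Math.max(0, expiresAt - leakedAt);
console.log(usableAfterLeak); // 13
```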

Compare to the same scenario with a long-lived sk- key. The key has no expiry, no fingerprint, no rate limit specific to this client. Any number of attackers can pick it up from pastebin and use it from any infrastructure. The blast radius is bounded only by OpenAI’s own account-level rate limits and your ability to notice and rotate.

The OAuth-style answer makes the worst case bounded, automatically, without you having to detect anything.

Where this falls short

There are scenarios this design does not solve. If your concern is “a single user, on their own device, abusing the API with a real fingerprint”, short-lived tokens do nothing for you — the user holds a legitimate session for as long as they keep playing the role of a normal user. That is what per-fingerprint rate limits are for.

If your concern is auditability — which logged-in user made which request — this design is orthogonal. You still need your own user auth on top. The JWT here is a session, not an identity.

The OAuth-style answer is the right tool for one job: keeping long-lived secrets off untrusted clients while preserving the UX of direct browser-to-API calls. For everything else, layer accordingly.
