The 80/20 of auth: what I learned shipping Cognito PKCE flows twice

I've now built Cognito PKCE login twice — once for Addris, once for Tourismo. The first time it took ages and I didn't fully understand why it worked. The second time the happy path took an afternoon, and all my time went into the edge cases. This is the post I wish I'd had: the 20% that bites.

PKCE in 30 seconds

PKCE (Proof Key for Code Exchange) is how a single-page app does OAuth without a client secret — which it couldn't keep secret anyway. Before redirecting to login, you generate a random verifier, hash it into a challenge, and send the challenge. After login you get an authorization code back, and you exchange it for tokens by presenting the original verifier. Only the app that created the verifier can complete the exchange. No secret ever ships to the browser.

In practice it's a few crypto calls:

function generateVerifier(): string {
  const bytes = new Uint8Array(32);
  crypto.getRandomValues(bytes);
  return base64URLEncode(bytes);
}

async function deriveChallenge(verifier: string): Promise<string> {
  const encoded = new TextEncoder().encode(verifier);
  const digest  = await crypto.subtle.digest('SHA-256', encoded);
  return base64URLEncode(new Uint8Array(digest));
}
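
base64URLEncode isn't shown above. Here is a minimal sketch of what such a helper might look like, assuming a global btoa is available (browsers and recent Node versions). PKCE needs base64url without padding, so plain btoa output is not enough on its own.

```typescript
// Hypothetical helper: one possible implementation of the base64URLEncode
// called by generateVerifier and deriveChallenge. Assumes a global btoa.
function base64URLEncode(bytes: Uint8Array): string {
  // btoa wants a "binary string": one character per byte.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  // Standard base64, then swap to the URL-safe alphabet and drop padding.
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}
```

With this encoding, the 32 random bytes from generateVerifier come out as a 43-character verifier, which is the minimum length the PKCE spec allows.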

Generate the verifier, stash it in sessionStorage, and redirect with the challenge and code_challenge_method=S256. That part is genuinely easy.
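
For the redirect itself, something like the following would do. This is a sketch, not my exact code: the AuthConfig shape and the buildAuthorizeUrl name are made up for illustration, and it assumes the hosted UI's /oauth2/authorize endpoint on your Cognito domain.

```typescript
// Hypothetical config shape; the field names are illustrative only.
interface AuthConfig {
  domain: string;      // hosted UI domain, e.g. "auth.example.com"
  clientId: string;    // user pool app client ID
  redirectUri: string; // must exactly match a registered callback URL
  scopes: string[];
}

// Builds the URL to send the browser to. The challenge (not the verifier)
// goes on the wire; the verifier stays behind in the app.
function buildAuthorizeUrl(cfg: AuthConfig, challenge: string): string {
  const params = new URLSearchParams({
    response_type: "code",
    client_id: cfg.clientId,
    redirect_uri: cfg.redirectUri,
    scope: cfg.scopes.join(" "),
    code_challenge: challenge,
    code_challenge_method: "S256",
  });
  return `https://${cfg.domain}/oauth2/authorize?${params.toString()}`;
}
```

In the browser you would assign the result to window.location.href after storing the verifier.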

Where Cognito is opinionated

The hosted UI is convenient and rigid. A few things I had to learn by hitting them:

  • Callback and logout URLs are an exact-match allowlist. Every origin you run from — localhost:5173, dev, prod — has to be registered, with the exact path. One typo and you get an opaque redirect error.
  • You don't get to pick which token to send your API. This one cost me real time. The intuitive choice is the access token. But my API Gateway JWT authorizer validates aud against the user pool client ID, and only the ID token carries that claim. The access token also lacks email, which my handlers need. So I send the ID token, despite the function being named getAccessToken:
export function getAccessToken(): string | null {
  // Returns the id_token (despite the name): the API Gateway JWT authorizer
  // validates `aud` against the client ID — only id tokens have that claim.
  if (_tokens && Date.now() < _tokens.expiresAt - 60_000) {
    return _tokens.idToken;
  }
  return null;
}
  • Refresh-token rotation is off by default. Cognito's refresh grant doesn't return a new refresh token unless you enable rotation, so the naive code stores an empty string and the next reload can't refresh. You have to preserve the one you already have:
if (!tokens.refreshToken) tokens.refreshToken = refreshToken;
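
To make the refresh-token point concrete, here is a sketch of turning a refresh response into the stored token set. The TokenSet and TokenResponse shapes and the mergeRefreshedTokens name are assumptions for illustration; the snake_case fields are the standard OAuth token response fields.

```typescript
// Hypothetical in-app token shape.
interface TokenSet {
  idToken: string;
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Standard OAuth token response fields. refresh_token is optional because
// a refresh grant without rotation does not return one.
interface TokenResponse {
  id_token: string;
  access_token: string;
  refresh_token?: string;
  expires_in: number; // seconds
}

function mergeRefreshedTokens(
  res: TokenResponse,
  previousRefreshToken: string,
  now: number = Date.now(),
): TokenSet {
  return {
    idToken: res.id_token,
    accessToken: res.access_token,
    // Keep the old refresh token unless the server handed back a new one.
    refreshToken: res.refresh_token || previousRefreshToken,
    expiresAt: now + res.expires_in * 1000,
  };
}
```

Doing the fallback in one place means the same code keeps working if rotation is enabled later.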

Token storage: in memory, by default

I keep the access and ID tokens in memory only, never localStorage, so an XSS bug can't read them off disk. The only thing persisted to sessionStorage is the refresh token, purely so a page reload can silently re-authenticate. That's the pragmatic 80/20 of SPA token security: accept that a live XSS can do damage in-session (including reading the refresh token while the tab is open), but make sure nothing is left behind in browser storage once the tab closes.
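
A rough sketch of that split, assuming a storage object is injected so the same code runs against sessionStorage in the browser and a fake in tests. The class, interface, and key names here are hypothetical, not my actual module.

```typescript
// Minimal subset of the Web Storage API; sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface ShortLivedTokens {
  idToken: string;
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

const REFRESH_KEY = "auth.refreshToken"; // hypothetical key name

class TokenStore {
  // Short-lived tokens live only in this field, never in storage.
  private tokens: ShortLivedTokens | null = null;
  private storage: KeyValueStore;

  constructor(storage: KeyValueStore) {
    this.storage = storage;
  }

  set(tokens: ShortLivedTokens, refreshToken: string): void {
    this.tokens = tokens;
    // The refresh token is the only thing that survives a reload.
    this.storage.setItem(REFRESH_KEY, refreshToken);
  }

  // Returns null when missing or within 60 seconds of expiry.
  getIdToken(now: number = Date.now()): string | null {
    if (this.tokens && now < this.tokens.expiresAt - 60_000) {
      return this.tokens.idToken;
    }
    return null;
  }

  getRefreshToken(): string | null {
    return this.storage.getItem(REFRESH_KEY);
  }
}
```

In the browser this would be constructed as new TokenStore(sessionStorage). A fresh instance after a reload has no ID token but can still read the refresh token, which is exactly the state the restore-session path starts from.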

The bug that actually hurt

Here's the one I'm writing this post for. Logins would mysteriously fail with "No PKCE verifier" — but only sometimes, only in production, never when I was debugging. Classic race.

The flow is: initiateLogin writes the verifier to sessionStorage, redirects to Cognito, Cognito redirects back to /auth/callback, and the callback page exchanges the code using the verifier. The callback page is lazy-loaded — a separate chunk that takes a beat to download and mount.
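
For reference, the exchange step boils down to a form-encoded POST to the token endpoint. Here is a sketch of building that request body; the CodeExchangeInput shape and buildCodeExchangeBody name are illustrative, and the parameter names are the standard OAuth and PKCE ones.

```typescript
// Hypothetical input shape for the exchange step.
interface CodeExchangeInput {
  clientId: string;
  redirectUri: string; // same value that was sent on the authorize request
  code: string;        // the ?code=... query parameter from the callback URL
  verifier: string;    // the original PKCE verifier from sessionStorage
}

// Builds the application/x-www-form-urlencoded body for the token request.
function buildCodeExchangeBody(input: CodeExchangeInput): URLSearchParams {
  return new URLSearchParams({
    grant_type: "authorization_code",
    client_id: input.clientId,
    redirect_uri: input.redirectUri,
    code: input.code,
    code_verifier: input.verifier,
  });
}
```

The body is then POSTed to /oauth2/token on the Cognito domain with Content-Type: application/x-www-form-urlencoded. If the verifier is missing at this point, there is nothing to send, which is what the bug below comes down to.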

Meanwhile, my AuthProvider runs a restore-session effect on every mount. It runs immediately, before the lazy callback chunk has loaded. Finding no existing session, the original code fell through to clearTokens() — and clearTokens() was also wiping the PKCE verifier. So the sequence was:

  1. initiateLogin sets the verifier, redirects out and back.
  2. App boots. AuthProvider's restore effect fires.
  3. No session yet → clearTokens() → verifier deleted.
  4. The lazy callback chunk finally mounts, tries to exchange the code… verifier's gone.

The fix was conceptual, not clever: the PKCE verifier is not a token. It's owned by the login flow — set in initiateLogin, consumed in exchangeCode — and nothing else has any business touching it. So clearTokens stops deleting it entirely:

export function clearTokens(): void {
  _tokens = null;
  sessionStorage.removeItem(SESSION_KEY);
  // PKCE verifier is owned by the login flow — NOT cleared here. Doing so
  // races AuthCallbackPage: the restore effect runs before the lazy callback
  // chunk loads, falls through to clearTokens(), and wipes the verifier that
  // initiateLogin just set — breaking the very next exchange.
}
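
One way to make that ownership explicit is to give the verifier its own tiny API that nothing else calls. This is a sketch under assumptions: the key names, the saveVerifier/takeVerifier/clearSession names, and the injected storage interface are all hypothetical, chosen so the example runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const SESSION_KEY = "auth.session";       // hypothetical key names
const VERIFIER_KEY = "auth.pkceVerifier";

// Sole writer: called from the login initiation step.
function saveVerifier(storage: KeyValueStore, verifier: string): void {
  storage.setItem(VERIFIER_KEY, verifier);
}

// Sole reader: called from the code exchange step. Reads once and removes,
// so a verifier can never be reused for a second exchange.
function takeVerifier(storage: KeyValueStore): string {
  const verifier = storage.getItem(VERIFIER_KEY);
  if (verifier === null) {
    throw new Error("No PKCE verifier");
  }
  storage.removeItem(VERIFIER_KEY);
  return verifier;
}

// Session teardown touches only session state, never the verifier.
function clearSession(storage: KeyValueStore): void {
  storage.removeItem(SESSION_KEY);
}
```

With this split, the restore effect can call clearSession as early and as often as it likes, and the callback page still finds the verifier when its chunk finally loads.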

The lesson generalises well beyond auth: be ruthless about ownership. A piece of transient state should have exactly one writer and one reader. The bug existed because two unrelated bits of code — session teardown and login handshake — both felt entitled to the same key. Give every scrap of state one owner, and a whole category of "works on my machine" race conditions just disappears.