I ran the analytics dashboard and it told me 105% of users had used dictation. The dashboard was bash and jq sitting on top of a four-component pipeline, nothing fancy. The number was impossible.
📈 Feature Distribution
────────────────────────────────────────
Dictation: 105%
Voice AI: 11%
105%. Not 100%. Not "approximately all." One hundred and five percent of users used dictation. More dictation sessions than total sessions. I stared at the terminal for a solid thirty seconds before my brain accepted what my eyes were seeing.
The Data That Broke Math
The raw numbers:
{
  "totalSessions": 17,
  "totalDictationSessions": 18,
  "totalVoiceAISessions": 2
}
Eighteen dictation sessions out of seventeen total sessions. This is the kind of number that makes you question your entire analytics pipeline. Is the denominator wrong? The numerator? Both? Am I dividing by the wrong thing? Did I accidentally deploy a counter that counts backwards?
The temptation was to cap it at 100% in the display code and move on. I have shipped that kind of fix before. But if the data was wrong here, it was wrong everywhere.
It Felt Like a Script
The telemetry stack is four components. You could sketch it on a napkin:
Yakki (macOS app)
↓ batched events
Cloudflare Worker
↓ aggregated counters
KV Store
↓ API
Bash dashboard
The dashboard code was straightforward division. If the output was wrong, the data was wrong. So I started at the source.
Hypothesis 1: The dashboard math is broken.
I checked. 18 / 17 * 100 = 105.88. The math was fine. The data was the problem.
Hypothesis 2: The client is sending duplicate events.
I added logging to the macOS app's telemetry batch sender. Each session generated exactly one session_start event with a unique session ID. No duplicates on the client side.
Hypothesis 3: The Worker is counting things twice.
I opened the Cloudflare Worker code and found the event handler:
async function handleSessionEvent(event, env) {
  const stats = await getStats(env);
  stats.totalSessions++;
  if (event.feature === 'dictation') {
    stats.totalDictationSessions++;
  }
  await saveStats(env, stats);
}
Clean. Simple. And hiding a subtle distributed systems bug.
The actual problem: Cloudflare Workers can receive the same request more than once. Network retries. Edge-location failovers. The client's exponential backoff after a timeout that wasn't actually a failure. Any of these can cause a single session event to be processed multiple times.
When that happens, the counters diverge. The feature counter increments; the total counter might not. Which duplicate hits which edge location, and which KV write wins the eventual-consistency race, determines whether your math stays possible.
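A toy interleaving makes the divergence concrete. The variable names and replica behavior below are a sketch, not Cloudflare's actual replication model; plain variables stand in for KV values:

```javascript
// Illustrative interleaving; hypothetical names, not the real schema.
let total = 0;
let dictation = 0;

// Session s1 (dictation), processed once at edge A:
total = total + 1;         // total = 1
dictation = dictation + 1; // dictation = 1

// Session s2 (dictation), also processed at edge A:
const staleTotal = total;  // edge B's replica still holds this value
total = total + 1;         // total = 2
dictation = dictation + 1; // dictation = 2

// A client retry delivers s2 again to edge B. Its replica has the
// fresh dictation counter but a stale total (eventual consistency):
total = staleTotal + 1;    // lost update: writes 2 again
dictation = dictation + 1; // dictation = 3 — more dictation than total

console.log({ total, dictation }); // { total: 2, dictation: 3 }
```

Three dictation sessions counted against two totals, from one retry and one stale read. Scale that pattern up over a few weeks and you get 18 out of 17.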
The Fix (And Why Order Matters)
The solution was session-level deduplication. Before incrementing any counter, check if this session has already been counted:
async function handleSessionEvent(event, env) {
  const sessionKey = `counted_session_${event.sessionId}`;
  const alreadyCounted = await env.TELEMETRY_KV.get(sessionKey);
  if (alreadyCounted) {
    // Already processed this session.
    // Return success so the client doesn't retry.
    return;
  }

  // Mark as counted with a 24-hour TTL
  await env.TELEMETRY_KV.put(sessionKey, 'true', {
    expirationTtl: 86400
  });

  // Now safe to increment
  const stats = await getStats(env);
  stats.totalSessions++;
  if (event.feature === 'dictation') {
    stats.totalDictationSessions++;
  }
  await saveStats(env, stats);
}
The 24-hour TTL is important. Without it, the KV store accumulates session keys forever. With it, old keys expire naturally, and the storage stays bounded. Twenty-four hours is generous enough to catch any retry window.
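You can sanity-check the pattern locally with an in-memory stand-in for KV. This is a sketch with hypothetical names, not the production handler; the point is that calling it twice with the same session leaves the counters at one:

```javascript
// In-memory stand-in for the KV namespace (illustrative only).
const kv = new Map();
const stats = { totalSessions: 0, totalDictationSessions: 0 };

function handleSessionEvent(event) {
  const sessionKey = `counted_session_${event.sessionId}`;
  if (kv.has(sessionKey)) {
    return; // duplicate delivery: already counted, do nothing
  }
  kv.set(sessionKey, 'true'); // mark before incrementing
  stats.totalSessions++;
  if (event.feature === 'dictation') {
    stats.totalDictationSessions++;
  }
}

const event = { sessionId: 'abc123', feature: 'dictation' };
handleSessionEvent(event); // first delivery: counted
handleSessionEvent(event); // network retry: no-op

console.log(stats); // { totalSessions: 1, totalDictationSessions: 1 }
```

One caveat worth knowing: in the real Worker, the get and the put are separate network calls against an eventually consistent store, so two duplicates racing through different edges in the same instant can still both pass the check. The dedup key shrinks the duplicate window dramatically; it is not a hard atomic guarantee.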
But I also learned something about defensive analytics: never trust your own data pipeline completely. So I added validation to the dashboard itself:
if [ "$DICTATION_SESSIONS" -gt "$TOTAL_SESSIONS" ]; then
  echo "⚠️  Data Integrity Issue Detected"
  echo "   Dictation sessions ($DICTATION_SESSIONS) > Total ($TOTAL_SESSIONS)"
  echo "   Possible cause: duplicate event processing"
  echo "   Run: ./scripts/repair-counters.sh"
fi
The dashboard now tells me when the data smells wrong, instead of silently rendering impossible numbers as though they were normal.
What This Taught Me
Distributed counters are a solved problem, but only if you know you have one. When you are writing a Cloudflare Worker that increments a number in KV, it doesn't feel like a distributed system. It feels like counter++. That's the trap.
I should have made the event handler idempotent from day one. A function that assumes it will be called exactly once is a function that hasn't met a network. Every handler should be safe to call twice with the same input. Not as good practice, but as a survival requirement.
I should have built validation into the dashboard itself. Before rendering a percentage, check that it's possible. Before displaying a count, check that it's non-negative. The dashboard should have been the last line of defense, not a dumb pipe from database to terminal.
I should have written a repair script before I needed one. When bad data gets in (and it will), you need a way to fix it without redeploying anything. A script that recalculates aggregates from raw events will save you at 2 AM when nothing else can.
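The core of such a repair pass is small. The sketch below rebuilds the aggregates from a list of raw events, counting each session ID once; the function name and event shape are hypothetical, and it assumes raw events are retained somewhere to replay:

```javascript
// Hypothetical repair pass: recompute counters from raw events,
// counting each sessionId exactly once.
function recomputeStats(events) {
  const seen = new Set();
  const stats = {
    totalSessions: 0,
    totalDictationSessions: 0,
    totalVoiceAISessions: 0
  };
  for (const e of events) {
    if (seen.has(e.sessionId)) continue; // drop duplicate deliveries
    seen.add(e.sessionId);
    stats.totalSessions++;
    if (e.feature === 'dictation') stats.totalDictationSessions++;
    if (e.feature === 'voice_ai') stats.totalVoiceAISessions++;
  }
  return stats;
}

const repaired = recomputeStats([
  { sessionId: 'a', feature: 'dictation' },
  { sessionId: 'a', feature: 'dictation' }, // duplicate delivery
  { sessionId: 'b', feature: 'voice_ai' }
]);
console.log(repaired);
// { totalSessions: 2, totalDictationSessions: 1, totalVoiceAISessions: 1 }
```

Because it derives every counter from the same deduplicated set, the output can never violate the feature ≤ total invariant, no matter how mangled the stored aggregates were.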
And I should have had alerts on impossible states. If feature_sessions > total_sessions, something is broken. I want to know immediately, not three weeks later when I happen to glance at a dashboard.
When you are a solo developer with 20 users, you might think these problems don't apply to you. The moment you put a Worker at the edge and a KV store behind it, you have a distributed system. It doesn't matter that your total traffic fits in a single HTTP request log. The failure modes are the same.
A counter incremented by 18 events across 3 edge locations over 2 hours has the same consistency challenges as one processing millions. The math doesn't care about scale. Neither do the decisions you made while the numbers were wrong.
This is part of an ongoing series about building Yakki, a macOS dictation app. Want to see how the telemetry system itself was built? Read Telemetry That Respects Your Users: the opt-in, privacy-first analytics pipeline that produced these (eventually accurate) numbers.
Yakki.ai