
AI Usage Ops Dashboard Design

Date: 2026-06-07
Status: Approved for implementation planning

Summary

Add a global AI usage dashboard at /dashboard/ai-usage for Volvox.Bot operators. The page is part of the dashboard’s System Ops area and is visible only to users listed in BOT_OWNER_IDS. It aggregates the existing ai_usage table across every server so bot owners can inspect token volume, cache usage, cost, latency, throughput, and model/provider mix without jumping between server-scoped Analytics tabs.

The implementation should not add a new telemetry store. ai_usage already records guild_id, channel_id, request type, model, input/output tokens, cache read/write tokens, cost, duration, search count, user id, and timestamp. Provider names can be parsed from the stored provider:model string, and guild names can be resolved from the bot client’s current guild cache, falling back to the guild ID when a name is not cached.

Goals

  • Show global AI usage for bot owners, not server admins.
  • Make spend spikes easy to trace by server, provider, model, and request type.
  • Include cache effectiveness and throughput metrics, not just prompt/completion totals.
  • Reuse the existing owner-only Performance/Logs access patterns.
  • Keep the first implementation query-driven with no schema migration.
  • Keep the page useful when there is no data or when some guild names cannot be resolved.

Non-Goals

  • Do not expose this dashboard to server owners, server admins, moderators, or viewers.
  • Do not replace the server-scoped Analytics AI usage card.
  • Do not store guild names in ai_usage.
  • Do not add provider/model configuration controls to this page.
  • Do not manually edit docs/changelog.mdx; it is automation-owned except for repairs to generated output required to keep docs validation green.

Access Model

The dashboard page uses defense-in-depth checks:
  • web/src/app/dashboard/ai-usage/page.tsx calls isDashboardGlobalAdmin() and redirects non-owners to /dashboard.
  • web/src/app/api/ai-usage/route.ts calls authorizeRequestGlobalAdmin() before proxying.
  • The bot API route is mounted at /api/v1/ai-usage and accepts only x-api-secret traffic, matching the Performance route pattern.
  • The sidebar adds an AI Usage item under System Ops and includes it in the global-admin-only href set.
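Both the page check and the proxy check come down to asking whether the signed-in user is listed in BOT_OWNER_IDS. A minimal sketch of that membership test follows, assuming the variable holds a comma-separated list of user IDs (the format is not stated here). parseOwnerIds and isListedOwner are hypothetical names for illustration and are not the project's isDashboardGlobalAdmin() implementation.

```typescript
// Hypothetical helper: parses an owner allowlist from an env-style string.
// Assumption: BOT_OWNER_IDS is a comma-separated list of user IDs.
function parseOwnerIds(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? '')
      .split(',')
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  );
}

// Returns true only when the user ID is present in the allowlist.
// A missing or empty allowlist denies everyone (fail closed).
function isListedOwner(userId: string | null | undefined, raw: string | undefined): boolean {
  if (!userId) return false;
  return parseOwnerIds(raw).has(userId);
}
```

Failing closed on a missing allowlist keeps a misconfigured deployment from exposing global usage data to every signed-in user.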

Data Model

Use the existing ai_usage columns:
  • input_tokens and output_tokens for total token volume.
  • cache_read_tokens and cache_creation_tokens for cache effectiveness.
  • duration_ms for latency and tokens-per-second calculations.
  • cost_usd for spend.
  • type for classify, respond, and safety request breakdowns.
  • model for model and provider grouping.
  • guild_id, channel_id, user_id, and search_count for drilldown context.
  • created_at for range filters and time-series buckets.
Provider parsing should split model at the first colon. Values without a colon are grouped under provider unknown and preserve the raw model string.

TPS should be derived as (input_tokens + output_tokens) / greatest(duration_ms / 1000, 0.001). Rows with zero or missing duration must not divide by zero. Aggregate TPS should be weighted, dividing summed tokens by summed duration, rather than averaging per-row ratios.

Cache read rate (the cache hit rate, reported as cacheReadRate) should be:
cache_read_tokens / nullif(input_tokens, 0)
Cache write rate should be:
cache_creation_tokens / nullif(input_tokens, 0)
Both rates are NULL when input_tokens is zero, which is why the corresponding API fields are nullable.
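The derivations above can be sketched as pure helpers. This is an illustration under stated assumptions, not the planned implementation: the function names are hypothetical, the row type only mirrors the columns named above with assumed types, a leading colon is treated like a missing provider, and rows with zero or missing duration are skipped in the aggregate (one possible reading of the rule).

```typescript
// Columns follow the ai_usage names used above; the TypeScript types are assumed.
interface UsageRow {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
  duration_ms: number | null;
}

// Split on the first colon only, so model names that contain colons stay intact.
// Assumption: a missing or leading colon means the provider is unknown.
function splitProviderModel(raw: string): { provider: string; model: string } {
  const idx = raw.indexOf(':');
  if (idx <= 0) return { provider: 'unknown', model: raw };
  return { provider: raw.slice(0, idx), model: raw.slice(idx + 1) };
}

// Weighted TPS: summed tokens divided by summed seconds, not a mean of per-row ratios.
function weightedTokensPerSecond(rows: UsageRow[]): number | null {
  let tokens = 0;
  let seconds = 0;
  for (const row of rows) {
    // Assumption: rows with zero or missing duration are left out entirely.
    if (row.duration_ms === null || row.duration_ms <= 0) continue;
    tokens += row.input_tokens + row.output_tokens;
    seconds += row.duration_ms / 1000;
  }
  return seconds > 0 ? tokens / seconds : null;
}

// Mirrors x / nullif(y, 0): a zero denominator yields null instead of Infinity or NaN.
function ratioOrNull(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}
```

With one row of 200 tokens in 1 s and another of 400 tokens in 3 s, the weighted result is 150 tokens per second, whereas averaging the per-row ratios (200 and about 133) would report roughly 167.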

API Shape

Query parameters:
  • range: 24h, 7d, 30d, or 90d; default 7d.
  • provider: optional exact provider filter.
  • model: optional exact model filter.
  • type: optional exact request type filter.
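A sketch of how the range parameter might be resolved into the range object returned by the API. resolveRange is a hypothetical name, and the mapping of presets to bucket intervals (hourly for 24h, daily otherwise) is an assumption, since it is not specified here. Returning null for an unrecognized preset lets the caller answer with 400.

```typescript
type RangePreset = '24h' | '7d' | '30d' | '90d';

interface ResolvedRange {
  preset: RangePreset;
  from: string;
  to: string;
  interval: 'hour' | 'day';
}

// Length of each preset in hours.
const RANGE_HOURS: Record<RangePreset, number> = { '24h': 24, '7d': 168, '30d': 720, '90d': 2160 };

function isRangePreset(value: string): value is RangePreset {
  return Object.prototype.hasOwnProperty.call(RANGE_HOURS, value);
}

// Resolves the preset against an injected clock so the result is testable.
// A missing parameter falls back to the documented default of 7d.
function resolveRange(raw: string | undefined, now: Date): ResolvedRange | null {
  const preset = raw ?? '7d';
  if (!isRangePreset(preset)) return null;
  const from = new Date(now.getTime() - RANGE_HOURS[preset] * 60 * 60 * 1000);
  return {
    preset,
    from: from.toISOString(),
    to: now.toISOString(),
    // Assumption: hourly buckets for 24h, daily buckets for longer ranges.
    interval: preset === '24h' ? 'hour' : 'day',
  };
}
```

Passing the current time in as an argument, rather than calling new Date() inside the helper, keeps range-parsing tests deterministic.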
Response shape:
interface AiUsageOpsSnapshot {
  range: {
    preset: '24h' | '7d' | '30d' | '90d';
    from: string;
    to: string;
    interval: 'hour' | 'day';
  };
  summary: {
    requests: number;
    activeGuilds: number;
    inputTokens: number;
    outputTokens: number;
    totalTokens: number;
    cacheReadTokens: number;
    cacheCreationTokens: number;
    cacheReadRate: number | null;
    cacheWriteRate: number | null;
    costUsd: number;
    avgLatencyMs: number | null;
    p95LatencyMs: number | null;
    avgTokensPerSecond: number | null;
    searchCount: number;
  };
  timeseries: Array<{
    bucket: string;
    label: string;
    requests: number;
    inputTokens: number;
    outputTokens: number;
    totalTokens: number;
    cacheReadTokens: number;
    cacheCreationTokens: number;
    costUsd: number;
    avgLatencyMs: number | null;
    avgTokensPerSecond: number | null;
  }>;
  byProvider: AiUsageComparisonRow[];
  byModel: AiUsageComparisonRow[];
  byType: AiUsageComparisonRow[];
  topGuilds: Array<AiUsageComparisonRow & { guildId: string; guildName: string | null }>;
  recentExpensiveRequests: Array<{
    id: number;
    createdAt: string;
    guildId: string;
    guildName: string | null;
    channelId: string;
    type: string;
    provider: string;
    model: string;
    inputTokens: number;
    outputTokens: number;
    cacheReadTokens: number;
    cacheCreationTokens: number;
    costUsd: number;
    durationMs: number;
    tokensPerSecond: number | null;
    searchCount: number;
  }>;
}

interface AiUsageComparisonRow {
  name: string;
  requests: number;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
  cacheReadRate: number | null;
  cacheWriteRate: number | null;
  costUsd: number;
  avgLatencyMs: number | null;
  p95LatencyMs: number | null;
  avgTokensPerSecond: number | null;
  searchCount: number;
}
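The latency fields are typed number | null so that a group with no rows does not report a misleading zero. If percentiles end up being computed in application code rather than in SQL, a nearest-rank calculation is one simple option. The sketch below is an assumption about approach, with hypothetical function names; a database percentile function that interpolates between values can return slightly different numbers.

```typescript
// Nearest-rank percentile over latencies in milliseconds.
// Returns null for an empty list, matching the nullable p95LatencyMs field.
function percentileNearestRank(values: number[], percentile: number): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  // Rank is 1-based: the smallest value whose rank covers the requested percentile.
  const rank = Math.ceil((percentile * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? null;
}

// Arithmetic mean, or null when there is nothing to average.
function averageOrNull(values: number[]): number | null {
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```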

UI

The page follows the existing dashboard control-plane style from DESIGN.md. Sections:
  1. Header with range and optional provider/model/type filters.
  2. KPI row for requests, total tokens, input tokens, output tokens, cache read rate, cost, TPS, p95 latency, and active servers.
  3. Time-series chart for requests, token volume, and cost.
  4. Provider comparison table sorted by cost by default.
  5. Model comparison table sorted by cost by default.
  6. Request type mix for classify/respond/safety.
  7. Top servers table showing the server name when available, with the guild ID as a fallback.
  8. Recent expensive or slow requests table for operator triage.
Use StableResponsiveContainer for every Recharts chart. Empty states should say the selected range has no AI usage rows instead of displaying fake zeros. Loading, error, and unauthorized states should match the existing Performance/Logs tone.
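The guild-name fallback and the empty-state rule can be expressed as small display helpers. The names below are hypothetical, and the 'n/a' placeholder for a null rate is an assumed presentation choice rather than something specified in this design.

```typescript
interface GuildRowLike {
  guildId: string;
  guildName: string | null;
}

// Prefer the resolved server name; fall back to the raw guild ID.
function guildLabel(row: GuildRowLike): string {
  const trimmed = row.guildName?.trim();
  return trimmed ? trimmed : row.guildId;
}

// True when the selected range contains at least one ai_usage row.
// When false, the page should show the empty state instead of zero-valued KPIs.
function hasUsage(summary: { requests: number }): boolean {
  return summary.requests > 0;
}

// Null rates render as a placeholder rather than "0.0%" (placeholder text is assumed).
function formatRate(rate: number | null): string {
  return rate === null ? 'n/a' : `${(rate * 100).toFixed(1)}%`;
}
```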

Error Handling

  • Invalid range or filter values return 400 from the bot API.
  • Database errors return 500 with a compact { error: string } response and log through src/logger.js.
  • The web proxy returns the existing global-admin auth failure responses to non-owners.
  • Missing guild names are not errors; return guildName: null.
  • Model strings without a recognizable provider prefix stay visible as raw values and are grouped under provider unknown.
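The status mapping above can be sketched as a pure function that a route handler would call before sending the response. The UsageFailure type, the function name, and the generic 500 message are hypothetical; logging is left to the caller.

```typescript
// Hypothetical failure shapes a route handler might produce.
type UsageFailure =
  | { kind: 'invalid_filter'; message: string }
  | { kind: 'database'; cause: unknown };

interface ErrorResponse {
  status: 400 | 500;
  body: { error: string };
}

// Maps a failure to the compact { error: string } response shape.
function toErrorResponse(failure: UsageFailure): ErrorResponse {
  if (failure.kind === 'invalid_filter') {
    // Validation messages are safe to echo because the handler writes them itself.
    return { status: 400, body: { error: failure.message } };
  }
  // Database details stay in the logs and out of the response body.
  return { status: 500, body: { error: 'Failed to load AI usage' } };
}
```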

Tests

Backend:
  • Repository/helper tests for range parsing and SQL parameterization.
  • Aggregation tests for totals, cache rates, weighted TPS, p95 latency, provider parsing, and unknown model strings.
  • Route tests for x-api-secret enforcement, bad filters, empty data, and populated responses.
Web:
  • API route tests for authorizeRequestGlobalAdmin() gating and bot proxy path.
  • Sidebar test that AI Usage is hidden from non-owners and visible to global admins.
  • Page/component tests for loading, empty, error, and populated states.
  • Page title test for /dashboard/ai-usage.
Manual/browser:
  • Run the dashboard locally with dev login.
  • Verify /dashboard/ai-usage redirects non-owners to /dashboard.
  • Verify the owner view renders on desktop and mobile without overflow.
  • Verify filters update the request URL and tables/charts.

Implementation Notes

  • Prefer a new route file, src/api/routes/aiUsage.js, and a small repository module such as src/api/repositories/aiUsageRepository.js.
  • Mount /ai-usage near /performance in src/api/index.js.
  • Keep query SQL parameterized. Do not interpolate filter values.
  • Parse provider:model strings with local logic or a pure helper; do not import aiClient.js into API routes just to parse strings.
  • Do not add a migration unless implementation proves a query cannot be made efficient with the current indexes.
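A sketch of the parameterization rule: filter values are pushed into a params array and only numbered placeholders appear in the SQL text. Several things here are assumptions: the $1-style placeholders and the split_part/position functions assume PostgreSQL, the model filter is assumed to match the stored provider:model string exactly, and buildUsageWhere is a hypothetical name.

```typescript
interface UsageFilters {
  from: string;
  to: string;
  provider?: string;
  model?: string;
  type?: string;
}

// Builds a WHERE clause with numbered placeholders. Filter values only ever
// land in the params array, never in the SQL text.
function buildUsageWhere(filters: UsageFilters): { sql: string; params: string[] } {
  const clauses: string[] = [];
  const params: string[] = [];
  const add = (template: (placeholder: string) => string, value: string): void => {
    params.push(value);
    clauses.push(template(`$${params.length}`));
  };

  add((p) => `created_at >= ${p}`, filters.from);
  add((p) => `created_at < ${p}`, filters.to);
  if (filters.provider !== undefined) {
    // Assumption: PostgreSQL. Rows without a colon map to provider 'unknown'.
    add(
      (p) =>
        `(CASE WHEN position(':' in model) > 0 THEN split_part(model, ':', 1) ELSE 'unknown' END) = ${p}`,
      filters.provider,
    );
  }
  // Assumption: the model filter compares against the stored string as-is.
  if (filters.model !== undefined) add((p) => `model = ${p}`, filters.model);
  if (filters.type !== undefined) add((p) => `type = ${p}`, filters.type);

  return { sql: `WHERE ${clauses.join(' AND ')}`, params };
}
```

Because the SQL text is assembled only from fixed templates, a hostile filter value ends up as an ordinary bound parameter and cannot change the query structure.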
Last modified on June 8, 2026