Moderation (Active Sentry)

Active Sentry is Volvox.Bot’s moderation engine. It runs automatically, and you can tune each safety layer from the dashboard.

What it does

Auto-mod — The bot detects spam, blocked links, invite links, and AI safety issues, then runs your configured actions
Spam Detection — Classic phrase patterns and the AI spam category share one dashboard tab for thresholds and response actions
Content Safety — AI scores incoming messages across toxicity, harassment, doxxing, hate speech, sexual content, violence, self-harm, and child endangerment. Spam is handled separately in Spam Detection. The bot runs your configured actions when a category crosses its threshold
Warnings — Warnings accumulate and trigger escalating actions
Bans and kicks — Full ban and kick support with reason tracking
Timeouts — Discord native timeouts with configurable durations
Cases — Every moderation action creates a case with full context

Settings location

Feature area	Dashboard location
AI content safety model, thresholds, and actions	Settings -> Moderation & Safety -> Content Safety
Spam detection and spam response actions	Settings -> Moderation & Safety -> Spam Detection
Classic moderation, link filter, protected roles, logs, and notifications	Settings -> Moderation & Safety -> Moderation
Warning expiry, severity points, and escalation	Settings -> Moderation & Safety -> Warning Rules
Command access and role-based permissions	Settings -> Moderation & Safety -> Permissions
Audit-log retention for moderation evidence	Settings -> Moderation & Safety -> Audit Log
AI triage classifier and responder models	Settings -> AI & Automation -> Triage

How to use

Open the dashboard -> Moderation for your server.
Review active cases and filter them by type or member.
Use Settings -> Moderation & Safety -> Moderation to set up classic auto-mod rules.
Use Settings -> Moderation & Safety -> Spam Detection to tune classic spam detection, spam threshold, and spam response actions.
Use Settings -> Moderation & Safety -> Warning Rules to configure warning expiry, severity points, and escalation actions.
Use Settings -> Moderation & Safety -> Content Safety to select the detection model, thresholds, and actions for other safety categories.
Use Settings -> AI & Automation -> Triage to select the supported classifier and response models for AI triage.

Moderation dashboard

The Moderation page at /dashboard/moderation is the case-management view. It shows moderation stats, recent cases, action filters, member filters, active state filters, and case details. Use it when you need the full moderation timeline for a server, including warnings, timeouts, kicks, bans, AI auto-mod cases, spam detections, and manual actions.

Review a member’s moderation history

Open Dashboard -> Members -> [member]. The Moderation History panel lists every case for that member, paginated 10 per page. A breakdown at the top tallies warnings, timeouts, kicks, and bans. Use Dashboard -> Moderation when you need the broader case table, action filters, and member ID filter across all cases.

Member details dashboard

The Members page at /dashboard/members lists server members with searchable identity, roles, XP, activity, and moderation signals. Opening a member detail page shows member profile data plus their moderation history. Use it when you need to inspect one member before taking action, export member data, or jump from a warning/case into that member’s broader history.

Warnings dashboard

Open Dashboard -> Warnings to review warning records without digging through the broader case table. The page shows total warnings, active warnings, high-severity active warnings, and the current top user by active warning points. Use the inventory filters to narrow records by severity, active state, or Discord user ID. Each warning row keeps the case link available so moderators can jump into the broader moderation history when needed.

Jump to the Discord log message

When you open a case, the Log Message field links straight to the original log post in Discord. The link appears whenever the bot still has the channel and message ID for that case. Select it to open Discord in a new tab at the exact log message. Cases logged before the bot started recording channel IDs still show the message ID as plain text — those cases stay readable but aren’t clickable.

To have repeated warnings escalate to timeouts or bans automatically, configure escalation rules under Settings -> Moderation & Safety -> Warning Rules.

Configuration

Classic moderation settings

Core moderation settings are in Settings -> Moderation & Safety -> Moderation:

Setting	Description
Auto-mod enabled	Toggle automatic moderation
Classic spam detection	Toggle phrase-pattern spam detection before AI moderation runs
Message spam protection	Select Relaxed, Standard, Strict, or Custom message-spam thresholds. Every hit deletes the excess message, and repeated hits can time out the member
Link filter	Add or remove blocked domains for link enforcement. Runs whenever the Link filter toggle is on, even if the parent auto-mod toggle is paused. See Link filter
Invite blocking	Block Discord invite hosts (for example, `discord.gg`) by adding them to the Link filter blocklist
Log channel	Where the bot posts moderation actions

Message spam protection uses presets by default, so most servers only need to select a sensitivity: Relaxed, Standard, or Strict. Strict allows 10 messages in 10 seconds; the first hit over that threshold deletes the offending message and applies a timeout. Use Custom only when you need to tune the message window, timeout trigger count, trigger tracking window, or timeout length.

Preset	Rate limit	Timeout escalation
Relaxed	10 messages / 10s	Timeout after 3 hits in 5 minutes, 5-minute timeout
Standard (default)	10 messages / 10s	Timeout after 2 hits in 3 minutes, 5-minute timeout
Strict	10 messages / 10s	Timeout on the first hit, 10-minute timeout

Every hit deletes the offending message, even before a timeout applies. Select Custom to expand the advanced controls described above. The custom controls stay collapsed while a preset is selected and reset when you switch servers or reload the page.
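The preset behavior can be sketched as a pair of sliding windows: one for messages and one for hits. This is a minimal illustration, not Volvox.Bot's implementation. The numbers come from the preset table, while `SpamTracker`, the dictionary keys, and the exact window arithmetic are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field

# Values copied from the preset table; the key names are hypothetical.
PRESETS = {
    "relaxed": {"max_messages": 10, "window_s": 10,
                "hits_for_timeout": 3, "hit_window_s": 300, "timeout_s": 300},
    "standard": {"max_messages": 10, "window_s": 10,
                 "hits_for_timeout": 2, "hit_window_s": 180, "timeout_s": 300},
    # Strict times out on the first hit, so the hit window is irrelevant.
    "strict": {"max_messages": 10, "window_s": 10,
               "hits_for_timeout": 1, "hit_window_s": 0, "timeout_s": 600},
}


@dataclass
class SpamTracker:
    """Tracks one member in one server (an assumed scope)."""
    preset: dict
    messages: deque = field(default_factory=deque)  # recent message timestamps
    hits: deque = field(default_factory=deque)      # recent rate-limit hit timestamps

    def on_message(self, now: float) -> list:
        """Return the actions for a message arriving at `now` (seconds)."""
        p = self.preset
        # Forget messages that fell out of the rate-limit window.
        while self.messages and self.messages[0] <= now - p["window_s"]:
            self.messages.popleft()
        self.messages.append(now)
        if len(self.messages) <= p["max_messages"]:
            return []  # within the limit: nothing to do
        # Over the limit: this message is a hit and is always deleted.
        actions = ["delete"]
        while self.hits and self.hits[0] < now - p["hit_window_s"]:
            self.hits.popleft()
        self.hits.append(now)
        if len(self.hits) >= p["hits_for_timeout"]:
            actions.append(f"timeout:{p['timeout_s']}s")
            self.hits.clear()
        return actions
```

With the Standard preset, the eleventh message inside ten seconds is deleted, and the twelfth is deleted and also triggers the 5-minute timeout, because it is the second hit within three minutes.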

Spam detection settings

Spam settings are in Settings -> Moderation & Safety -> Spam Detection. This is where AI spam controls live; Content Safety covers the other AI safety categories listed above.

Setting	Description
Enable Spam Detection	Toggle both phrase-pattern spam detection and the AI spam scoring category
Spam threshold	Tune the AI spam category threshold
Spam response	Select any combination of flag, delete, warn, timeout, kick, or ban for spam detections

Spam Detection defaults to on and can be disabled in the dashboard. When it is off, Volvox.Bot skips both classic phrase-pattern spam checks and the AI spam category. When Spam Detection is on, classic spam runs before AI scoring. If AI auto-moderation is also enabled, classic spam honors the Hard Delete spam response under Settings -> Moderation & Safety -> Spam Detection.

Classic spam actions run independently of the log channel. If you leave the log channel unset, Volvox.Bot still applies the configured spam action; it just skips the Discord alert post. Delete only runs when the spam response includes delete.

Every classic spam detection writes a spam.detect audit-log entry and a redacted debug-log entry with the alert/delete outcome, message ID, channel ID, and user ID.

Warning expiry and escalation

Use Settings -> Moderation & Safety -> Warning Rules to tune how warning pressure decays and when it becomes a stronger action.

Warning expiry days controls how long warning records count as active. 30 keeps the default 30-day window. 0 disables automatic expiry.
Severity points set how much each warning severity contributes. Low, medium, and high warnings default to 1, 2, and 3 points.
Warning thresholds define the active warning points, lookback window, and action. Timeout thresholds require a duration like 1h, 30m, or 7d; ban thresholds don’t use a duration.
When multiple thresholds match, Volvox.Bot applies the highest point threshold first, so a 5-point ban is not masked by an earlier 3-point timeout rule.

Example warning rules config:

moderation.warnings + escalation

{
  "moderation": {
    "warnings": {
      "expiryDays": 30,
      "severityPoints": { "low": 1, "medium": 2, "high": 3 }
    },
    "escalation": {
      "enabled": true,
      "thresholds": [
        { "points": 3, "withinDays": 7, "action": "timeout", "duration": "1h" },
        { "points": 5, "withinDays": 30, "action": "ban" }
      ]
    }
  }
}

If your config still uses the older warns field on escalation thresholds, Volvox.Bot reads it as points for backward compatibility. No manual migration is required, but use points for new entries.
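The rules above can be sketched as a small selection function. This is an illustration under stated assumptions, not the bot's code: `pick_escalation` and the `(issued_at, severity)` warning shape are hypothetical, and the sketch assumes a warning counts toward a threshold only if it is both unexpired and inside that threshold's lookback window.

```python
from datetime import datetime, timedelta

# Mirrors the example config; only the "moderation" contents are shown.
CONFIG = {
    "warnings": {"expiryDays": 30,
                 "severityPoints": {"low": 1, "medium": 2, "high": 3}},
    "escalation": {
        "enabled": True,
        "thresholds": [
            {"points": 3, "withinDays": 7, "action": "timeout", "duration": "1h"},
            {"points": 5, "withinDays": 30, "action": "ban"},
        ],
    },
}


def rule_points(rule):
    # The legacy "warns" field is read as points.
    return rule.get("points", rule.get("warns", 0))


def pick_escalation(warnings, now, config=CONFIG):
    """warnings: list of (issued_at, severity). Returns a threshold dict or None."""
    if not config["escalation"]["enabled"]:
        return None
    expiry_days = config["warnings"]["expiryDays"]
    points_for = config["warnings"]["severityPoints"]
    # expiryDays == 0 disables automatic expiry, so every warning stays active.
    active = [
        (at, sev) for at, sev in warnings
        if expiry_days == 0 or now - at <= timedelta(days=expiry_days)
    ]
    # Check the highest point threshold first so a ban is not masked by a timeout.
    for rule in sorted(config["escalation"]["thresholds"],
                       key=rule_points, reverse=True):
        cutoff = now - timedelta(days=rule["withinDays"])
        total = sum(points_for[sev] for at, sev in active if at >= cutoff)
        if total >= rule_points(rule):
            return rule
    return None
```

For a member with one high warning from yesterday and one medium warning from ten days ago, the 30-day total is 5 points, so the ban rule is selected even though the 3-point timeout rule also matches.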

Warning replies in Discord

When a moderator runs /warn, the bot responds in-channel with an embed instead of a plain confirmation line. The reply includes:

The case number with a direct link to Dashboard -> Moderation filtered to that case.
The warned member’s tag, mention, and a link to Dashboard -> Members -> [member].
The severity (Low, Medium, or High) and how many points that warning contributed.
The member’s active point progression — for example, 2 points -> 3 points — so moderators can see at a glance how close they are to the next threshold.
The reason text the moderator supplied, truncated if it would overflow the embed field limit.

The dashboard links use the DASHBOARD_URL environment variable when set, and fall back to https://volvox.bot.

AI auto-moderation settings

AI auto-moderation settings are in Settings -> Moderation & Safety -> Content Safety:

Setting	Description
Detection model	The supported provider/model used to score incoming messages
Incident report channel	Where the bot posts flagged AI auto-moderation reports
Incident ping roles	Staff roles pinged when Content Safety posts an incident report
Action ladders	Each category has its own ladder of Flag & Log, Hard Delete, Issue Warning, Temporary Timeout, Server Kick, and Permanent Ban. Turn an action off, or set the confidence percentage required before that specific action runs.
Excluded channels	Channels where Content Safety skips scoring, thresholds, and response actions
User DM notifications	Select whether AI auto-moderation warnings, timeouts, kicks, and bans send a direct message to the affected member. If you haven’t set AI-specific DM preferences, Content Safety follows your existing moderation DM notification settings.

warn creates a warning record, sends the warning DM when enabled, and applies your escalation rules. If multiple action thresholds are met for the same category, Volvox.Bot queues every enabled action once for the incident.

Every Content Safety scoring pass writes a compact AI automod classified: ... entry to the operator log stream and records an automod row in the AI usage ledger. The visible log line includes the clean/flagged decision, selected action, and the top category scores as score%/threshold%. Expanded metadata keeps the same summary plus the strongest action-ladder threshold rows, so operators can tune ladders without exposing raw message content.

Provider failures fail open by default so messages are not punished because an AI provider is down. Those failures are recorded as throttled ai_automod.provider_failed audit entries. If Discord accepts a timeout, kick, or ban but the moderation case cannot be created, Volvox.Bot records ai_automod.case_failed alongside the successful action audit so operators can repair the missing case trail.

Investigation mode

Investigation mode adds a pre-enforcement trust and fact-check pass for punitive Content Safety actions (warn, timeout, kick, and ban). It profiles the member’s message history, active unexpired warnings, and tenure; trusted-member cases can optionally run a web-search fact-check before punitive actions execute. High-confidence skip or downgrade recommendations can suppress punitive actions, but low-confidence suppressive suggestions are ignored so enforcement proceeds.

Investigation mode is off by default until dashboard controls are available. Self-hosters can opt in through config by setting aiAutoMod.investigation.enabled to true, tuning minMessagesTrusted and minDaysActive, and optionally setting aiAutoMod.investigation.model. Leave model as null to reuse the guild’s Content Safety detection model.

Incident ping roles

Use Incident Ping Roles when Content Safety incident reports should notify staff reviewers. Content Safety pings the selected roles whenever it posts an incident report to the configured Incident Report Channel. Flag & Log actions generate these reports, including combined reports that also queue Hard Delete, Issue Warning, Temporary Timeout, Server Kick, or Permanent Ban.

Incident pings only happen in the incident report channel post. They do not ping roles in member DMs, moderation case embeds, audit-log entries, or operator logs. If no incident report channel is configured, Volvox.Bot has nowhere to post the report and no roles are pinged.

When to use it

You want immediate staff notifications for Content Safety incidents that post to the incident report channel.
You have dedicated moderation roles that should review flagged content in real time.
You route incident reports into an escalation or audit workflow that depends on role mentions.

How to configure

Open Settings -> Moderation & Safety -> Content Safety.
Set Incident Report Channel to the staff channel that should receive AI auto-moderation reports.
Use Incident Ping Roles to select the staff roles that should be mentioned on those reports.
Save your changes. The next incident report uses the selected roles.

For direct config updates, store numeric Discord role IDs in aiAutoMod.incidentMentionRoleIds:

aiAutoMod.incidentMentionRoleIds

{
  "aiAutoMod": {
    "incidentMentionRoleIds": ["123456789012345678", "234567890123456789"]
  }
}

Action ladders

Action ladders let you set a separate confidence threshold for every response action in every Content Safety category. Instead of one sensitivity number per category, you pick the score at which the bot should flag, delete, warn, timeout, kick, or ban. Turn off any actions you don’t want for that category. If a message clears more than one threshold, the bot queues each enabled action once for the incident.

When to use it

You want soft actions (flag, delete) to trigger early but only escalate to a timeout or ban on very high confidence.
A category needs a different posture than the rest — for example, treating self-harm as review-only or letting doxxing auto-ban at high confidence.
You’re tuning false positives in one category without loosening every other category at the same time.

How to configure

Open Settings -> Moderation & Safety -> Content Safety.
Expand the category you want to tune (for example, Toxicity).
For each action in the ladder, toggle it on and drag the slider to the confidence percentage that should trigger it. Toggle an action off to disable it for that category.
Save your changes. The next scored message uses the new thresholds.

Defaults

New servers and unmigrated categories start with these thresholds:

Standard categories (toxicity, spam, harassment, hate speech, sexual content, violence): flag at 55%, delete at 70%, warn at 75%, timeout at 88%. Kick and ban are off.
Self-harm: flag only at 30%. All destructive actions are off so the bot routes incidents to human review.
Doxxing: flag at 35%, delete at 60%, warn at 70%, timeout at 82%, ban at 92%.
Child endangerment: flag at 25%, delete at 50%, warn at 65%, timeout at 78%, ban at 85%.

Existing servers keep their previous per-category sensitivity and selected actions. The migration converts each enabled action to the category’s old threshold so behavior matches your current setup until you tune individual rungs.

moderation.aiAutoMod.actionThresholds

{
  "moderation": {
    "aiAutoMod": {
      "actionThresholds": {
        "toxicity": {
          "flag": 0.55,
          "delete": 0.7,
          "warn": 0.75,
          "timeout": 0.88,
          "kick": null,
          "ban": null
        },
        "selfHarm": {
          "flag": 0.3,
          "delete": null,
          "warn": null,
          "timeout": null,
          "kick": null,
          "ban": null
        }
      }
    }
  }
}

Use null (or omit the key) to disable that rung. Values are decimals between 0 and 1 representing model confidence.
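As a rough sketch of how a ladder turns one category score into queued actions: the function below is hypothetical, and it assumes a rung fires when the score is greater than or equal to its threshold (the source does not say whether the comparison is inclusive).

```python
ACTION_ORDER = ["flag", "delete", "warn", "timeout", "kick", "ban"]


def queued_actions(score, ladder):
    """Return every enabled action whose threshold the score meets, once each."""
    return [
        action for action in ACTION_ORDER
        # A missing key or a null (None) value means the rung is disabled.
        if ladder.get(action) is not None and score >= ladder[action]
    ]


toxicity = {"flag": 0.55, "delete": 0.7, "warn": 0.75,
            "timeout": 0.88, "kick": None, "ban": None}
self_harm = {"flag": 0.3}  # omitted keys behave like null

print(queued_actions(0.80, toxicity))   # ['flag', 'delete', 'warn']
print(queued_actions(0.95, self_harm))  # ['flag']
```

A toxicity score of 0.80 clears the flag, delete, and warn rungs but not the 0.88 timeout rung, so three actions are queued for the incident.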

Excluded channels

Excluded channels let you carve specific channels out of AI auto-moderation entirely. Messages posted in an excluded channel skip Content Safety scoring, thresholds, and every response action. Threads inherit their parent channel’s setting, so excluding #vent also excludes every thread under it without you adding each one by hand.

When to use it

You run a venting, debate, or roleplay channel where stricter Content Safety responses cause more friction than they’re worth.
You have a staff-only channel where moderators discuss flagged content and don’t want the bot scoring their own quotes.
You’re rolling out Content Safety gradually and want to opt specific channels in (or out) before turning it on server-wide.

How to configure

Open Settings -> Moderation & Safety -> Content Safety.
Under Excluded Channels, select one or more channels from the picker.
Save your changes. Exclusions take effect on the next message.

To resume scoring a channel, remove it from the list and save. Manual moderation commands, the Link filter, and classic auto-mod rules are unaffected — only AI Content Safety honors this list.

moderation.aiAutoMod

{
  "moderation": {
    "aiAutoMod": {
      "enabled": true,
      "excludedChannelIds": ["123456789012345678", "987654321098765432"]
    }
  }
}
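The thread-inheritance rule can be sketched in a few lines. The function name and arguments are hypothetical; the sketch assumes the caller passes the parent channel ID for a thread and `None` for a regular channel.

```python
def is_excluded(channel_id, parent_id, excluded_channel_ids):
    """True when the channel, or the parent channel of a thread, is excluded."""
    excluded = set(excluded_channel_ids)
    if channel_id in excluded:
        return True
    # Threads inherit the setting of their parent channel.
    return parent_id is not None and parent_id in excluded


excluded_ids = ["123456789012345678", "987654321098765432"]
# A thread under an excluded channel is skipped without being listed itself.
print(is_excluded("555555555555555555", "123456789012345678", excluded_ids))  # True
```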

Link filter

The Link filter blocks messages that link to domains you’ve added to your blocklist. Use it to stop ad spam, phishing links, competitor invites, or unwanted referral URLs without enabling the rest of classic auto-mod. Once a message matches, the bot deletes it and notifies your configured moderator channel.

The Link filter has its own toggle nested under classic moderation, so it runs even when the parent Auto-mod enabled toggle is paused. You can keep aggressive spam rules off while still enforcing your domain blocklist.

When to use it

You want to block specific Discord invite hosts (for example, discord.gg) or shorteners (bit.ly).
You’re rolling out moderation gradually and only need URL enforcement right now.
You need to add or remove a domain quickly without touching other auto-mod rules.

How to configure

Open Settings -> Moderation & Safety -> Moderation -> Link filter.
Turn on the Link filter toggle.
In the domain input, paste one or more domains. You can separate entries with spaces, commas, semicolons, or new lines. URLs and Markdown links are normalized to their hostname automatically — https://www.example.com/path becomes example.com.
Select Add to save the entries. Invalid entries are rejected with an inline error and won’t be added.
To remove a domain, select the × next to it in the blocked list.

Who is exempt

The Link filter respects your configured admin and moderator roles, plus any role listed under Protect Roles. The raw Discord Administrator permission alone does not grant a bypass for the Link filter. This lets server owners block links posted by other administrators when needed. Manual moderation actions and other auto-mod rules still treat the Administrator permission as an exemption.

Example blocklist

Example blocked domains

discord.gg
bit.ly
example-phishing.com

Entries are stored as normalized hostnames in your server config:

moderation.linkFilter

{
  "moderation": {
    "linkFilter": {
      "enabled": true,
      "blockedDomains": ["discord.gg", "bit.ly", "example-phishing.com"]
    }
  }
}
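The normalization described in the configuration steps can be approximated with the standard library. This is a sketch, not the dashboard's parser: `normalize_domains` is a hypothetical name, the `www.` stripping is inferred from the single documented example, and the validation that rejects invalid entries is left out.

```python
import re
from urllib.parse import urlsplit

# Matches Markdown links such as [label](https://example.com/path).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def normalize_domains(raw):
    """Split pasted input and reduce each entry to a bare hostname."""
    # Replace each Markdown link with just its URL before splitting.
    raw = MARKDOWN_LINK.sub(r"\1", raw)
    hosts = []
    # Entries may be separated by spaces, commas, semicolons, or new lines.
    for entry in re.split(r"[\s,;]+", raw.strip()):
        if not entry:
            continue
        # urlsplit only fills in the hostname when a scheme or // prefix is present.
        url = entry if "://" in entry else "//" + entry
        host = urlsplit(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host and host not in hosts:
            hosts.append(host)
    return hosts


print(normalize_domains("https://www.example.com/path, bit.ly;discord.gg"))
# ['example.com', 'bit.ly', 'discord.gg']
```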

Protected roles

Use Settings -> Moderation & Safety -> Moderation -> Protect Roles to mark roles whose members should never be moderated. Members with any selected role are exempt from:

Manual moderation commands and dashboard actions (warn, timeout, kick, ban, tempban).
AI auto-moderation responses from Content Safety.
Spam rate limiting and link/invite filters.
AI triage moderation nudges (the in-channel warning the bot posts when triage flags a message for moderation).

Use this for staff, partnered creators, or other accounts that should bypass every automated and manual moderation path. Removing the role restores normal moderation starting with the member’s next message.

How the bot sends AI auto-mod DMs

When the bot queues multiple destructive actions for the same message (for example, warn -> kick -> ban), each destructive step sends its own DM right before that action runs:

Actions taken lists actions the bot has already applied in this incident.
Actions planned lists the single next destructive action.
Triggered categories and Reason show what the model flagged and why.

The first DM in an incident uses a title like Moderation actions planned in <server name>. Follow-up DMs that include actions already taken use a title like Moderation action update in <server name>. The updated title helps members recognize this is the same ongoing incident.

The bot tracks which actions a DM has already covered and never sends a duplicate for the same action. The bot only sends DMs for actions you have enabled under User DM Notifications.

The bot records every triggered AI auto-moderation response in the audit log, with one entry per action that actually runs. Classic spam detections are logged as spam.detect. Legacy configs that still cross a category threshold without a selected action record a “no action” entry for auditability. New action-ladder configs only trigger when at least one enabled action threshold is met.
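The DM sequence for one incident can be sketched as follows. The function and the dictionary keys are hypothetical, and one detail is an assumption: the sketch switches to the update title whenever earlier actions have already been applied, even if no DM was sent for them.

```python
def plan_dms(actions, dm_enabled, server_name):
    """Build one DM per queued action that has DM notifications enabled."""
    dms, taken = [], []
    for action in actions:
        if action in dm_enabled:
            prefix = ("Moderation action update in " if taken
                      else "Moderation actions planned in ")
            dms.append({
                "title": prefix + server_name,
                "actions_taken": list(taken),   # already applied in this incident
                "actions_planned": [action],    # the single next action
            })
        # The DM goes out right before the action runs; afterwards it counts as taken.
        taken.append(action)
    return dms


for dm in plan_dms(["warn", "kick", "ban"], {"warn", "kick", "ban"}, "Example Server"):
    print(dm["title"], dm["actions_taken"], dm["actions_planned"])
```

Each action appears as planned in exactly one DM, which is one way to guarantee that no duplicate is sent for the same action.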

AI auto-mod case reason format

Cases opened by Content Safety use a compact, structured reason so case tables stay scannable. Each reason follows this shape:

AI Auto Mod: <Category> <score>%, <Category> <score>% / <action>, <action>

Categories and scores list every category that crossed its threshold, with the model’s confidence as a percentage.
Actions list every response action that ran for the incident (for example, delete, warn, timeout, kick, ban), separated by commas.

For example, a message that tripped two categories and led to a delete plus a warning would record:

AI Auto Mod: Toxicity 82%, Harassment 64% / delete, warn

You don’t need to configure anything. New cases adopt this format automatically. Existing cases keep their original reason text so historical context isn’t rewritten.
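A formatter for this shape might look like the sketch below. The function name is hypothetical, and rounding each score to a whole percentage is an assumption based on the example reason.

```python
def format_case_reason(scores, actions):
    """scores: list of (category label, confidence between 0 and 1)."""
    categories = ", ".join(
        f"{label} {round(score * 100)}%" for label, score in scores
    )
    return f"AI Auto Mod: {categories} / {', '.join(actions)}"


print(format_case_reason([("Toxicity", 0.82), ("Harassment", 0.64)],
                         ["delete", "warn"]))
# AI Auto Mod: Toxicity 82%, Harassment 64% / delete, warn
```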

AI triage settings

Triage model settings are in Settings -> AI & Automation -> Triage. The classifier and response engines use the same supported model dropdown as Content Safety. The dashboard validates provider:model strings when you save, so it rejects unsupported or malformed selections up front. If a stored config still contains an invalid value at runtime, the bot logs a warning. It falls back through your configured models in order. When none are supported, it uses the default AI model.
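The runtime fallback can be sketched as a first-match search. Everything here is illustrative: `resolve_model` is a hypothetical name, and the provider:model strings are placeholders rather than real supported models.

```python
import logging

logger = logging.getLogger("triage")


def resolve_model(configured, supported, default_model):
    """Return the first configured model that is supported, else the default."""
    for candidate in configured:
        if candidate in supported:
            return candidate
        # An invalid stored value is logged and skipped rather than raising.
        logger.warning("Unsupported triage model %r, trying the next one", candidate)
    return default_model


supported = {"provider-a:model-x", "provider-b:model-y"}
print(resolve_model(["bad-value", "provider-b:model-y"], supported,
                    "provider-a:model-x"))  # provider-b:model-y
```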

Triage latency tuning

These are advanced tuning knobs; defaults work for most servers. The dashboard exposes these fields under Settings -> AI & Automation -> Triage -> Performance. Config loading normalizes each value and clamps it to the listed range. The runtime currently enforces responseCooldownMs and memoryTimeoutMs.

Setting	Description	Default	Range
`responseCooldownMs`	Minimum gap between bot replies in the same channel. Set to `0` to allow replies as fast as the model and other gates permit. Direct mentions bypass this cooldown when fast direct replies are enabled.	0	0-60000
`triageDebounceMs`	Reserved for a future debounce feature; has no effect today. The current channel evaluation timer still uses `triage.defaultInterval`.	500	0-2000
`memoryTimeoutMs`	Maximum time triage waits for user memory context before continuing without memory. This keeps mem0 latency from blocking classification or response generation.	2000	500-30000
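Clamping to the listed ranges can be sketched as follows. The defaults and ranges come from the table; the function name is hypothetical, and falling back to the default for non-numeric input is an assumption about what "normalizes" means here.

```python
import math

# (default, minimum, maximum) in milliseconds, copied from the table.
RANGES = {
    "responseCooldownMs": (0, 0, 60000),
    "triageDebounceMs": (500, 0, 2000),
    "memoryTimeoutMs": (2000, 500, 30000),
}


def normalize_setting(name, value):
    """Clamp a latency setting to its documented range."""
    default, low, high = RANGES[name]
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not math.isfinite(value):
        return default  # assumption: unusable input falls back to the default
    return max(low, min(high, int(value)))


print(normalize_setting("memoryTimeoutMs", 100))       # 500
print(normalize_setting("responseCooldownMs", 90000))  # 60000
```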

Triage classification tuning

Each classifier call returns two independent scores. Confidence measures how certain the model is about the label it picked (for example, chime-in vs. skip). Relevance measures how much value a reply would add to the conversation. Both scores must clear their threshold before triage proceeds to the response stage. Direct @mentions bypass these gates when fast direct replies are enabled, and safety-critical classifications always bypass them. Set these knobs in the triage config block. Defaults work for most servers; raise the thresholds to make the bot pickier, lower them to make it chime in more often.

Setting	Description	Default	Range
`confidenceThreshold`	Minimum label certainty required before the bot acts on a classification. Classifications below this score are dropped.	0.6	0-1
`relevanceThreshold`	Minimum response-value score required before the bot generates a reply. A high-confidence label with low relevance is treated as a skip.	0.4	0-1
`soloUserBoost`	Lowers both thresholds when the recent channel context contains exactly one non-bot participant, since a solo speaker is implicitly addressing the bot. Only applies to `chime-in` classifications.	`true`	boolean
`soloUserBoostAmount`	How much to subtract from each threshold while the solo-user boost is active. Effective thresholds are clamped at 0, so large values reduce — but never invert — the gate.	0.15	0-1
`maxMessageChars`	Hard cap on the number of characters from each newly buffered Discord message passed into the classifier. Recent channel history is truncated separately before the prompt is built.	1000	100-10000
`maxReplyChars`	Hard cap on the number of characters copied from a referenced Discord reply into the classifier context. Longer referenced messages are truncated before the prompt is built.	500	100-5000

Example triage block in your server config:

config.json

{
  "triage": {
    "confidenceThreshold": 0.6,
    "relevanceThreshold": 0.4,
    "soloUserBoost": true,
    "soloUserBoostAmount": 0.15,
    "maxMessageChars": 1000,
    "maxReplyChars": 500
  }
}

Triage classification logs use compact AI triage classified: ... and AI triage skipped: ... entries with both confidence and relevance alongside the picked label, so you can spot-check threshold behavior before adjusting either knob.
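The two-score gate and the solo-user boost can be sketched as below. The config keys match the triage block, but the function and its arguments are hypothetical, the comparisons are assumed to be inclusive, and the bypasses for direct @mentions and safety-critical classifications are not modeled.

```python
TRIAGE = {
    "confidenceThreshold": 0.6,
    "relevanceThreshold": 0.4,
    "soloUserBoost": True,
    "soloUserBoostAmount": 0.15,
}


def passes_gates(label, confidence, relevance, non_bot_participants, cfg=TRIAGE):
    """True when both scores clear their (possibly boosted) thresholds."""
    conf_t = cfg["confidenceThreshold"]
    rel_t = cfg["relevanceThreshold"]
    solo = non_bot_participants == 1
    # The boost only applies to chime-in classifications with one human speaker.
    if cfg["soloUserBoost"] and solo and label == "chime-in":
        conf_t = max(0.0, conf_t - cfg["soloUserBoostAmount"])  # clamped at 0
        rel_t = max(0.0, rel_t - cfg["soloUserBoostAmount"])
    return confidence >= conf_t and relevance >= rel_t


print(passes_gates("chime-in", 0.5, 0.3, 1))  # True: thresholds drop to about 0.45 and 0.25
print(passes_gates("chime-in", 0.5, 0.3, 2))  # False: 0.5 is below 0.6
print(passes_gates("chime-in", 0.9, 0.2, 3))  # False: high confidence, low relevance
```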

Provider capability gating

Triage checks provider capabilities before building the responder prompt. When the selected triage model can’t search the web, triage appends a SEARCH_UNAVAILABLE directive to the responder prompt. The bot then hedges claims about current events, prices, or other time-sensitive facts instead of inventing answers. System prompt information (the bot’s identity, team, and invite links) stays authoritative regardless. To restore live search, select a model whose provider supports web search for triage responses.

MiniMax M2.7 now also advertises thinking support, so reasoning traces appear when the underlying model emits them.

New server config uses contextMessages: 5 by default. Runtime triage only falls back to 10 recent messages when you leave contextMessages unset.

Fast direct replies are enabled by default. With that toggle on, direct @mentions and replies to the bot force immediate triage evaluation and bypass the response cooldown for mentioned messages. Classification still runs before the responder, so safety and relevance checks remain in place.

Settings changes apply on reload or the next config refresh. You don’t need to restart the bot process.
