Moderation (Active Sentry)
Active Sentry is Volvox.Bot’s moderation engine. It runs automatically, and you can tune each safety layer from the dashboard.
What it does
- Auto-mod — The bot detects spam, blocked links, invite links, and AI safety issues, then runs your configured actions
- Spam Detection — Classic phrase patterns and the AI spam category share one dashboard tab for thresholds and response actions
- Content Safety — AI scores incoming messages across toxicity, harassment, doxxing, hate speech, sexual content, violence, self-harm, and child endangerment. Spam is handled separately in Spam Detection. The bot runs your configured actions when a category crosses its threshold
- Warnings — Warnings accumulate and trigger escalating actions
- Bans and kicks — Full ban and kick support with reason tracking
- Timeouts — Discord native timeouts with configurable durations
- Cases — Every moderation action creates a case with full context
Settings location
| Feature area | Dashboard location |
|---|---|
| AI content safety model, thresholds, and actions | Settings -> Moderation & Safety -> Content Safety |
| Spam detection and spam response actions | Settings -> Moderation & Safety -> Spam Detection |
| Classic moderation, link filter, protected roles, logs, and notifications | Settings -> Moderation & Safety -> Moderation |
| Warning expiry, severity points, and escalation | Settings -> Moderation & Safety -> Warning Rules |
| Command access and role-based permissions | Settings -> Moderation & Safety -> Permissions |
| Audit-log retention for moderation evidence | Settings -> Moderation & Safety -> Audit Log |
| AI triage classifier and responder models | Settings -> AI & Automation -> Triage |
How to use
- Open the dashboard -> Moderation for your server.
- Review active cases and filter them by type or member.
- Use Settings -> Moderation & Safety -> Moderation to set up classic auto-mod rules.
- Use Settings -> Moderation & Safety -> Spam Detection to tune classic spam detection, spam threshold, and spam response actions.
- Use Settings -> Moderation & Safety -> Warning Rules to configure warning expiry, severity points, and escalation actions.
- Use Settings -> Moderation & Safety -> Content Safety to select the detection model, thresholds, and actions for other safety categories.
- Use Settings -> AI & Automation -> Triage to select the supported classifier and response models for AI triage.
Moderation dashboard
The Moderation page at /dashboard/moderation is the case-management view. It shows moderation stats, recent cases, action filters, member filters, active state filters, and case details.
Use it when you need the full moderation timeline for a server, including warnings, timeouts, kicks, bans, AI auto-mod cases, spam detections, and manual actions.
Review a member’s moderation history
Open Dashboard -> Members -> [member]. The Moderation History panel lists every case for that member, paginated 10 per page. A breakdown at the top tallies warnings, timeouts, kicks, and bans. Use Dashboard -> Moderation when you need the broader case table, action filters, and member ID filter across all cases.
Member details dashboard
The Members page at /dashboard/members lists server members with searchable identity, roles, XP, activity, and moderation signals. Opening a member detail page shows member profile data plus their moderation history.
Use it when you need to inspect one member before taking action, export member data, or jump from a warning/case into that member’s broader history.
Warnings dashboard
Open Dashboard -> Warnings to review warning records without digging through the broader case table. The page shows total warnings, active warnings, high-severity active warnings, and the current top user by active warning points. Use the inventory filters to narrow records by severity, active state, or Discord user ID. Each warning row keeps the case link available so moderators can jump into the broader moderation history when needed.
Jump to the Discord log message
When you open a case, the Log Message field links straight to the original log post in Discord. The link appears whenever the bot still has the channel and message ID for that case. Select it to open Discord in a new tab at the exact log message. Cases logged before the bot started recording channel IDs still show the message ID as plain text; those cases stay readable but aren’t clickable.
Configure escalation rules so that repeated warnings automatically escalate to timeouts or bans.
Configuration
Classic moderation settings
Core moderation settings are in Settings -> Moderation & Safety -> Moderation:
| Setting | Description |
|---|---|
| Auto-mod enabled | Toggle automatic moderation |
| Classic spam detection | Toggle phrase-pattern spam detection before AI moderation runs |
| Message spam protection | Select Relaxed, Standard, Strict, or Custom message-spam thresholds. Every hit deletes the excess message, and repeated hits can time out the member |
| Link filter | Add or remove blocked domains for link enforcement. Runs whenever the Link filter toggle is on, even if the parent auto-mod toggle is paused. See Link filter |
| Invite blocking | Block Discord invite hosts (for example, discord.gg) by adding them to the Link filter blocklist |
| Log channel | Where the bot posts moderation actions |
Message spam protection presets:
| Preset | Rate limit | Timeout escalation |
|---|---|---|
| Relaxed | 10 messages / 10s | Timeout after 3 hits in 5 minutes, 5-minute timeout |
| Standard (default) | 10 messages / 10s | Timeout after 2 hits in 3 minutes, 5-minute timeout |
| Strict | 10 messages / 10s | Timeout on the first hit, 10-minute timeout |
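To make the preset table concrete, here is a minimal TypeScript sketch of a sliding-window rate limit with timeout escalation. The names (`SpamPreset`, `MemberState`, `recordMessage`) are hypothetical, and the sketch assumes that every message beyond the rate limit counts as one hit. The numbers come from the table above; the bot's actual implementation may differ.

```typescript
// Hypothetical shapes for illustration; only the numbers come from the preset table.
interface SpamPreset {
  maxMessages: number;       // messages allowed per rate-limit window
  windowMs: number;          // rate-limit window length
  hitsBeforeTimeout: number; // hits needed inside hitWindowMs to time out
  hitWindowMs: number;       // window in which hits are counted
  timeoutMs: number;         // timeout length once escalation triggers
}

type PresetName = "relaxed" | "standard" | "strict";
type SpamOutcome = "allow" | "delete" | "delete+timeout";

const MINUTE = 60_000;

const PRESETS: Record<PresetName, SpamPreset> = {
  relaxed: { maxMessages: 10, windowMs: 10_000, hitsBeforeTimeout: 3, hitWindowMs: 5 * MINUTE, timeoutMs: 5 * MINUTE },
  standard: { maxMessages: 10, windowMs: 10_000, hitsBeforeTimeout: 2, hitWindowMs: 3 * MINUTE, timeoutMs: 5 * MINUTE },
  // Strict escalates on the first hit, so the hit window is irrelevant.
  strict: { maxMessages: 10, windowMs: 10_000, hitsBeforeTimeout: 1, hitWindowMs: 0, timeoutMs: 10 * MINUTE },
};

interface MemberState {
  messages: number[]; // timestamps (ms) of recent messages
  hits: number[];     // timestamps (ms) of recent rate-limit hits
}

// Decide what to do with a message sent at `now` (ms).
function recordMessage(state: MemberState, preset: SpamPreset, now: number): SpamOutcome {
  state.messages = state.messages.filter((t) => now - t < preset.windowMs);
  state.messages.push(now);
  if (state.messages.length <= preset.maxMessages) return "allow";

  // Assumption: each excess message is one hit.
  state.hits = state.hits.filter((t) => now - t <= preset.hitWindowMs);
  state.hits.push(now);
  return state.hits.length >= preset.hitsBeforeTimeout ? "delete+timeout" : "delete";
}
```

Under these assumptions, a member on the Standard preset has the first excess message deleted and is timed out on the second one inside the hit window.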
Spam detection settings
Spam settings are in Settings -> Moderation & Safety -> Spam Detection. This is where AI spam controls live; Content Safety covers the other AI safety categories listed above.
| Setting | Description |
|---|---|
| Enable Spam Detection | Toggle both phrase-pattern spam detection and the AI spam scoring category |
| Spam threshold | Tune the AI spam category threshold |
| Spam response | Select any combination of flag, delete, warn, timeout, kick, or ban for spam detections |
Every classic spam detection writes a spam.detect audit-log entry and a redacted debug-log entry with the alert/delete outcome, message ID, channel ID, and user ID.
Warning expiry and escalation
Use Settings -> Moderation & Safety -> Warning Rules to tune how warning pressure decays and when it becomes a stronger action.
- Warning expiry days controls how long warning records count as active. 30 keeps the default 30-day window. 0 disables automatic expiry.
- Severity points set how much each warning severity contributes. Low, medium, and high warnings default to 1, 2, and 3 points.
- Warning thresholds define the active warning points, lookback window, and action. Timeout thresholds require a duration like 1h, 30m, or 7d; ban thresholds don’t use a duration.
- When multiple thresholds match, Volvox.Bot applies the highest point threshold first, so a 5-point ban is not masked by an earlier 3-point timeout rule.
moderation.warnings + escalation
If an existing config still uses the warns field on escalation thresholds, Volvox.Bot reads it as points for backward compatibility. No manual migration is required, but use points for new entries.
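The threshold rules above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the bot's code: the `EscalationThreshold` shape and the function names are hypothetical, and warning expiry and the lookback window are left out for brevity. Only the default severity points, the `points`/`warns` fallback, and the highest-threshold-first rule come from this section.

```typescript
type Severity = "low" | "medium" | "high";

// Hypothetical threshold shape; only `points` and the legacy `warns` field are named in the docs.
interface EscalationThreshold {
  points?: number;   // preferred field
  warns?: number;    // legacy field, read as points
  action: "timeout" | "ban";
  duration?: string; // e.g. "1h", "30m", "7d"; timeouts only
}

// Default severity points from this section.
const SEVERITY_POINTS: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Sum the points of a member's active warnings.
function activePoints(warnings: Severity[]): number {
  return warnings.reduce((sum, severity) => sum + SEVERITY_POINTS[severity], 0);
}

// Read `points`, falling back to the legacy `warns` field.
function requiredPoints(threshold: EscalationThreshold): number {
  return threshold.points ?? threshold.warns ?? Infinity;
}

// Pick the matching threshold with the highest point requirement.
function pickEscalation(
  points: number,
  thresholds: EscalationThreshold[],
): EscalationThreshold | undefined {
  return thresholds
    .filter((t) => points >= requiredPoints(t))
    .sort((a, b) => requiredPoints(b) - requiredPoints(a))[0];
}

const exampleThresholds: EscalationThreshold[] = [
  { points: 3, action: "timeout", duration: "1h" },
  { warns: 5, action: "ban" }, // legacy entry, treated as 5 points
];

// One high and one medium warning add up to 5 points, so the ban rule wins.
const escalation = pickEscalation(activePoints(["high", "medium"]), exampleThresholds);
```

Sorting by required points before picking is what keeps the 5-point ban from being masked by the 3-point timeout.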
Warning replies in Discord
When a moderator runs /warn, the bot responds in-channel with an embed instead of a plain confirmation line. The reply includes:
- The case number with a direct link to Dashboard -> Moderation filtered to that case.
- The warned member’s tag, mention, and a link to Dashboard -> Members -> [member].
- The severity (Low, Medium, or High) and how many points that warning contributed.
- The member’s active point progression (for example, 2 points -> 3 points), so moderators can see at a glance how close they are to the next threshold.
- The reason text the moderator supplied, truncated if it would overflow the embed field limit.
Dashboard links in the reply use the DASHBOARD_URL environment variable when set, and fall back to https://volvox.bot.
AI auto-moderation settings
AI auto-moderation settings are in Settings -> Moderation & Safety -> Content Safety:
| Setting | Description |
|---|---|
| Detection model | The supported provider/model used to score incoming messages |
| Incident report channel | Where the bot posts flagged AI auto-moderation reports |
| Incident ping roles | Staff roles pinged when Content Safety posts an incident report |
| Action ladders | Each category has its own ladder of Flag & Log, Hard Delete, Issue Warning, Temporary Timeout, Server Kick, and Permanent Ban. Turn an action off, or set the confidence percentage required before that specific action runs. |
| Excluded channels | Channels where Content Safety skips scoring, thresholds, and response actions |
| User DM notifications | Select whether AI auto-moderation warnings, timeouts, kicks, and bans send a direct message to the affected member. If you haven’t set AI-specific DM preferences, Content Safety follows your existing moderation DM notification settings. |
The warn action creates a warning record, sends the warning DM when enabled, and applies your escalation rules. If multiple action thresholds are met for the same category, Volvox.Bot queues every enabled action once for the incident.
Every Content Safety scoring pass writes a compact AI automod classified: ... entry to the operator log stream and records an automod row in the AI usage ledger. The visible log line includes the clean/flagged decision, selected action, and the top category scores as score%/threshold%. Expanded metadata keeps the same summary plus the strongest action-ladder threshold rows, so operators can tune ladders without exposing raw message content.
Provider failures fail open by default so messages are not punished because an AI provider is down. Those failures are recorded as throttled ai_automod.provider_failed audit entries. If Discord accepts a timeout, kick, or ban but the moderation case cannot be created, Volvox.Bot records ai_automod.case_failed alongside the successful action audit so operators can repair the missing case trail.
Incident ping roles
Use Incident Ping Roles when Content Safety incident reports should notify staff reviewers. Content Safety pings the selected roles whenever it posts an incident report to the configured Incident Report Channel. Flag & Log actions generate these reports, including combined reports that also queue Hard Delete, Issue Warning, Temporary Timeout, Server Kick, or Permanent Ban.
Incident pings only happen in the incident report channel post. They do not ping roles in member DMs, moderation case embeds, audit-log entries, or operator logs. If no incident report channel is configured, Volvox.Bot has nowhere to post the report and no roles are pinged.
When to use it
- You want immediate staff notifications for Content Safety incidents that post to the incident report channel.
- You have dedicated moderation roles that should review flagged content in real time.
- You route incident reports into an escalation or audit workflow that depends on role mentions.
- Open Settings -> Moderation & Safety -> Content Safety.
- Set Incident Report Channel to the staff channel that should receive AI auto-moderation reports.
- Use Incident Ping Roles to select the staff roles that should be mentioned on those reports.
- Save your changes. The next incident report uses the selected roles.
aiAutoMod.incidentMentionRoleIds
Action ladders
Action ladders let you set a separate confidence threshold for every response action in every Content Safety category. Instead of one sensitivity number per category, you pick the score at which the bot should flag, delete, warn, timeout, kick, or ban. Turn off any actions you don’t want for that category. If a message clears more than one threshold, the bot queues each enabled action once for the incident.
When to use it
- You want soft actions (flag, delete) to trigger early but only escalate to a timeout or ban on very high confidence.
- A category needs a different posture than the rest — for example, treating self-harm as review-only or letting doxxing auto-ban at high confidence.
- You’re tuning false positives in one category without loosening every other category at the same time.
- Open Settings -> Moderation & Safety -> Content Safety.
- Expand the category you want to tune (for example, Toxicity).
- For each action in the ladder, toggle it on and drag the slider to the confidence percentage that should trigger it. Toggle an action off to disable it for that category.
- Save your changes. The next scored message uses the new thresholds.
- Standard categories (toxicity, spam, harassment, hate speech, sexual content, violence): flag at 55%, delete at 70%, warn at 75%, timeout at 88%. Kick and ban are off.
- Self-harm: flag only at 30%. All destructive actions are off so the bot routes incidents to human review.
- Doxxing: flag at 35%, delete at 60%, warn at 70%, timeout at 82%, ban at 92%.
- Child endangerment: flag at 25%, delete at 50%, warn at 65%, timeout at 78%, ban at 85%.
moderation.aiAutoMod.actionThresholds
Set a rung to null (or omit the key) to disable that rung. Values are decimals between 0 and 1 representing model confidence.
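As a rough illustration of how a ladder resolves to queued actions, the TypeScript sketch below checks one score against each rung. The `Ladder` type and `queuedActions` function are hypothetical names, and the sketch assumes a rung triggers when the score is greater than or equal to its threshold. The example values are the doxxing percentages listed above, written as decimals.

```typescript
type LadderAction = "flag" | "delete" | "warn" | "timeout" | "kick" | "ban";

// A rung is a confidence between 0 and 1; null or a missing key disables it.
type Ladder = Partial<Record<LadderAction, number | null>>;

const LADDER_ORDER: LadderAction[] = ["flag", "delete", "warn", "timeout", "kick", "ban"];

// Return every enabled action whose threshold the score meets, once each.
function queuedActions(score: number, ladder: Ladder): LadderAction[] {
  return LADDER_ORDER.filter((action) => {
    const threshold = ladder[action];
    return threshold != null && score >= threshold; // assumption: inclusive comparison
  });
}

// Doxxing values from the list above; kick is disabled.
const doxxingLadder: Ladder = {
  flag: 0.35,
  delete: 0.6,
  warn: 0.7,
  timeout: 0.82,
  kick: null,
  ban: 0.92,
};

const moderateIncident = queuedActions(0.75, doxxingLadder); // flag, delete, warn
const severeIncident = queuedActions(0.95, doxxingLadder);   // everything except kick
```

Because each rung is checked independently, lowering the flag threshold does not change when a timeout or ban triggers.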
Excluded channels
Excluded channels let you carve specific channels out of AI auto-moderation entirely. Messages posted in an excluded channel skip Content Safety scoring, thresholds, and every response action. Threads inherit their parent channel’s setting, so excluding #vent also excludes every thread under it without you adding each one by hand.
When to use it
- You run a venting, debate, or roleplay channel where stricter Content Safety responses cause more friction than they’re worth.
- You have a staff-only channel where moderators discuss flagged content and don’t want the bot scoring their own quotes.
- You’re rolling out Content Safety gradually and want to opt specific channels in (or out) before turning it on server-wide.
- Open Settings -> Moderation & Safety -> Content Safety.
- Under Excluded Channels, select one or more channels from the picker.
- Save your changes. Exclusions take effect on the next message.
moderation.aiAutoMod
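Thread inheritance can be pictured with a small TypeScript check. The `ChannelRef` shape and `isExcluded` function are hypothetical and only illustrate the rule described in this section: a thread is skipped when either it or its parent channel is on the excluded list.

```typescript
// Hypothetical, simplified channel shape for illustration.
interface ChannelRef {
  id: string;
  isThread: boolean;
  parentId: string | null; // parent channel ID for threads
}

// True when Content Safety should skip messages in this channel.
function isExcluded(channel: ChannelRef, excludedIds: ReadonlySet<string>): boolean {
  if (excludedIds.has(channel.id)) return true;
  // Threads inherit the setting of their parent channel.
  return channel.isThread && channel.parentId !== null && excludedIds.has(channel.parentId);
}

const excludedChannels = new Set(["vent-channel-id"]);
const ventThread: ChannelRef = { id: "thread-1", isThread: true, parentId: "vent-channel-id" };
const generalChannel: ChannelRef = { id: "general-id", isThread: false, parentId: null };

const skipVentThread = isExcluded(ventThread, excludedChannels);  // inherited from the parent
const skipGeneral = isExcluded(generalChannel, excludedChannels); // not excluded
```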
Link filter
The Link filter blocks messages that link to domains you’ve added to your blocklist. Use it to stop ad spam, phishing links, competitor invites, or unwanted referral URLs without enabling the rest of classic auto-mod. Once a message matches, the bot deletes it and notifies your configured moderator channel.
The Link filter has its own toggle nested under classic moderation, so it runs even when the parent Auto-mod enabled toggle is paused. You can keep aggressive spam rules off while still enforcing your domain blocklist.
When to use it
- You want to block specific Discord invite hosts (for example, discord.gg) or shorteners (bit.ly).
- You’re rolling out moderation gradually and only need URL enforcement right now.
- You need to add or remove a domain quickly without touching other auto-mod rules.
- Open Settings -> Moderation & Safety -> Moderation -> Link filter.
- Turn on the Link filter toggle.
- In the domain input, paste one or more domains. You can separate entries with spaces, commas, semicolons, or new lines. URLs and Markdown links are normalized to their hostname automatically: https://www.example.com/path becomes example.com.
- Select Add to save the entries. Invalid entries are rejected with an inline error and won’t be added.
- To remove a domain, select the × next to it in the blocked list.
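The normalization described in the steps above could look roughly like this TypeScript sketch. It is an approximation under assumptions: `normalizeDomain` and `parseDomainInput` are hypothetical names, the validity check (lowercase hostname with at least one dot) is a guess at what counts as invalid, and Markdown links whose label contains spaces are not handled because entries are split on whitespace first.

```typescript
// Normalize one pasted entry to a bare hostname, or null if it does not look like a domain.
function normalizeDomain(entry: string): string | null {
  // Unwrap a Markdown link such as [label](https://example.com/path).
  const markdown = entry.match(/^\[[^\]]*\]\(([^)\s]+)\)$/);
  let candidate = (markdown?.[1] ?? entry).trim().toLowerCase();

  // Drop the scheme, then keep everything before the first path, query, or fragment.
  candidate = candidate.replace(/^[a-z][a-z0-9+.-]*:\/\//, "");
  const host = (candidate.split(/[\/?#]/)[0] ?? "")
    .replace(/:\d+$/, "")   // drop a port
    .replace(/^www\./, ""); // www.example.com becomes example.com

  // Assumed validity rule: dot-separated labels of letters, digits, and hyphens.
  return /^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(host) ? host : null;
}

// Split pasted input on spaces, commas, semicolons, or new lines.
function parseDomainInput(input: string): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const entry of input.split(/[\s,;]+/).filter(Boolean)) {
    const host = normalizeDomain(entry);
    if (host === null) invalid.push(entry);
    else if (!valid.includes(host)) valid.push(host);
  }
  return { valid, invalid };
}

const pasted = "https://www.example.com/path, discord.gg; [docs](https://bit.ly/abc)\nnope";
const parsed = parseDomainInput(pasted);
```

Collecting invalid entries separately mirrors the inline error in the dashboard: valid domains are still added while the rejected ones are reported.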
Administrator permission alone does not grant a bypass for the Link filter — this lets server owners block links posted by other administrators when needed. Manual moderation actions and other auto-mod rules still treat the Administrator permission as an exemption.
moderation.linkFilter
Protected roles
Use Settings -> Moderation & Safety -> Moderation -> Protect Roles to mark roles whose members should never be moderated. Members with any selected role are exempt from:
- Manual moderation commands and dashboard actions (warn, timeout, kick, ban, tempban).
- AI auto-moderation responses from Content Safety.
- Spam rate limiting and link/invite filters.
- AI triage moderation nudges (the in-channel warning the bot posts when triage flags a message for moderation).
How the bot sends AI auto-mod DMs
When the bot queues multiple destructive actions for the same message (for example, warn -> kick -> ban), each destructive step sends its own DM right before that action runs:
- Actions taken lists actions the bot has already applied in this incident.
- Actions planned lists the single next destructive action.
- Triggered categories and Reason show what the model flagged and why.
<server name>. Follow-up DMs that include actions already taken use a title like Moderation action update in <server name>. The updated title helps members recognize this is the same ongoing incident. The bot tracks which actions a DM has already covered and never sends a duplicate for the same action. The bot only sends DMs for actions you have enabled under User DM Notifications.
The bot records every triggered AI auto-moderation response in the audit log, with one entry per action that actually runs. Classic spam detections are logged as spam.detect. Legacy configs that still cross a category threshold without a selected action record a “no action” entry for auditability. New action-ladder configs only trigger when at least one enabled action threshold is met.
AI auto-mod case reason format
Cases opened by Content Safety use a compact, structured reason so case tables stay scannable. Each reason follows this shape:
- Categories and scores list every category that crossed its threshold, with the model’s confidence as a percentage.
- Actions list every response action that ran for the incident (for example, delete, warn, timeout, kick, ban), separated by commas.
AI triage settings
Triage model settings are in Settings -> AI & Automation -> Triage. The classifier and response engines use the same supported model dropdown as Content Safety. The dashboard validates provider:model strings when you save, so it rejects unsupported or malformed selections up front. If a stored config still contains an invalid value at runtime, the bot logs a warning and falls back through your configured models in order. When none are supported, it uses the default AI model.
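The runtime fallback can be sketched in TypeScript like this. Everything named here is an assumption for illustration: `resolveModel`, the `acme:*` model identifiers, and the exact provider:model pattern are made up, and the real supported list lives in the bot.

```typescript
// Assumed format check: non-empty provider, a colon, then a non-empty model name.
function isValidModelString(value: string): boolean {
  return /^[^:\s]+:[^\s]+$/.test(value);
}

// Walk the configured models in order and return the first supported one.
function resolveModel(
  configured: string[],
  supported: ReadonlySet<string>,
  defaultModel: string,
  warn: (message: string) => void,
): string {
  for (const candidate of configured) {
    if (isValidModelString(candidate) && supported.has(candidate)) return candidate;
    warn(`Unsupported or malformed triage model "${candidate}", trying the next one`);
  }
  return defaultModel; // none of the configured models are supported
}

// Hypothetical identifiers, not real provider or model names.
const supportedModels = new Set(["acme:small", "acme:large"]);
const warnings: string[] = [];
const chosenModel = resolveModel(
  ["malformed", "acme:retired", "acme:small"],
  supportedModels,
  "acme:large",
  (message) => warnings.push(message),
);
```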
Triage latency tuning
These are advanced tuning knobs; defaults work for most servers. The dashboard exposes these fields under Settings -> AI & Automation -> Triage -> Performance. Config loading normalizes each value and clamps it to the listed range. The runtime currently enforces responseCooldownMs and memoryTimeoutMs.
| Setting | Description | Default | Range |
|---|---|---|---|
| responseCooldownMs | Minimum gap between bot replies in the same channel. Set to 0 to allow replies as fast as the model and other gates permit. Direct mentions bypass this cooldown when fast direct replies are enabled. | 0 | 0-60000 |
| triageDebounceMs | Reserved for a future debounce feature; has no effect today. The current channel evaluation timer still uses triage.defaultInterval. | 500 | 0-2000 |
| memoryTimeoutMs | Maximum time triage waits for user memory context before continuing without memory. This keeps mem0 latency from blocking classification or response generation. | 2000 | 500-30000 |
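The clamping behavior can be illustrated with a short TypeScript sketch. The defaults and ranges are taken from the table above; the `normalizeLatency` name and the choice to fall back to the default for non-numeric input are assumptions about how normalization might work.

```typescript
// Defaults and ranges from the table above.
const LATENCY_RANGES = {
  responseCooldownMs: { min: 0, max: 60000, fallback: 0 },
  triageDebounceMs: { min: 0, max: 2000, fallback: 500 },
  memoryTimeoutMs: { min: 500, max: 30000, fallback: 2000 },
} as const;

type LatencyKey = keyof typeof LATENCY_RANGES;

// Normalize a raw config value, then clamp it to the allowed range.
function normalizeLatency(key: LatencyKey, raw: unknown): number {
  const { min, max, fallback } = LATENCY_RANGES[key];
  // Assumption: anything that is not a finite number falls back to the default.
  const value = typeof raw === "number" && Number.isFinite(raw) ? raw : fallback;
  return Math.min(max, Math.max(min, value));
}

const tooLow = normalizeLatency("memoryTimeoutMs", 100);        // raised to the minimum
const tooHigh = normalizeLatency("responseCooldownMs", 999999); // lowered to the maximum
const unset = normalizeLatency("triageDebounceMs", undefined);  // default applies
```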
Triage classification tuning
Each classifier call returns two independent scores. Confidence measures how certain the model is about the label it picked (for example, chime-in vs. skip). Relevance measures how much value a reply would add to the conversation. Both scores must clear their threshold before triage proceeds to the response stage. Direct @mentions bypass these gates when fast direct replies are enabled, and safety-critical classifications always bypass them.
Set these knobs in the triage config block. Defaults work for most servers; raise the thresholds to make the bot pickier, lower them to make it chime in more often.
| Setting | Description | Default | Range |
|---|---|---|---|
| confidenceThreshold | Minimum label certainty required before the bot acts on a classification. Classifications below this score are dropped. | 0.6 | 0-1 |
| relevanceThreshold | Minimum response-value score required before the bot generates a reply. A high-confidence label with low relevance is treated as a skip. | 0.4 | 0-1 |
| soloUserBoost | Lowers both thresholds when the recent channel context contains exactly one non-bot participant, since a solo speaker is implicitly addressing the bot. Only applies to chime-in classifications. | true | boolean |
| soloUserBoostAmount | How much to subtract from each threshold while the solo-user boost is active. Effective thresholds are clamped at 0, so large values reduce, but never invert, the gate. | 0.15 | 0-1 |
| maxMessageChars | Hard cap on the number of characters from each newly buffered Discord message passed into the classifier. Recent channel history is truncated separately before the prompt is built. | 1000 | 100-10000 |
| maxReplyChars | Hard cap on the number of characters copied from a referenced Discord reply into the classifier context. Longer referenced messages are truncated before the prompt is built. | 500 | 100-5000 |
config.json
Triage logs AI triage classified: ... and AI triage skipped: ... entries with both confidence and relevance alongside the picked label, so you can spot-check threshold behavior before adjusting either knob.
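To show how the two thresholds and the solo-user boost interact, here is a minimal TypeScript sketch of the gate. The setting names and defaults come from the table above; the `Classification` shape and `passesGate` function are hypothetical, the comparison is assumed to be inclusive, and the direct-mention and safety-critical bypasses are left out.

```typescript
interface TriageGateConfig {
  confidenceThreshold: number;
  relevanceThreshold: number;
  soloUserBoost: boolean;
  soloUserBoostAmount: number;
}

// Defaults from the table above.
const DEFAULT_GATE: TriageGateConfig = {
  confidenceThreshold: 0.6,
  relevanceThreshold: 0.4,
  soloUserBoost: true,
  soloUserBoostAmount: 0.15,
};

// Hypothetical classifier result shape.
interface Classification {
  label: string; // for example "chime-in" or "skip"
  confidence: number;
  relevance: number;
}

// True when both scores clear their (possibly boosted) thresholds.
function passesGate(
  result: Classification,
  nonBotParticipants: number,
  config: TriageGateConfig = DEFAULT_GATE,
): boolean {
  const boosted =
    config.soloUserBoost && nonBotParticipants === 1 && result.label === "chime-in";
  const boost = boosted ? config.soloUserBoostAmount : 0;
  // Effective thresholds are clamped at 0 so a large boost never inverts the gate.
  const minConfidence = Math.max(0, config.confidenceThreshold - boost);
  const minRelevance = Math.max(0, config.relevanceThreshold - boost);
  return result.confidence >= minConfidence && result.relevance >= minRelevance;
}

const borderline: Classification = { label: "chime-in", confidence: 0.5, relevance: 0.3 };
const soloChannel = passesGate(borderline, 1); // boost lowers both thresholds
const busyChannel = passesGate(borderline, 3); // no boost, default thresholds apply
```

With the defaults, the same borderline classification clears the gate when one person is talking and is dropped in a busy channel.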
Provider capability gating
Triage checks provider capabilities before building the responder prompt. When the selected triage model can’t search the web, triage appends a SEARCH_UNAVAILABLE directive to the responder prompt. The bot then hedges claims about current events, prices, or other time-sensitive facts instead of inventing answers. System prompt information (the bot’s identity, team, and invite links) stays authoritative regardless. To restore live search, select a model whose provider supports web search for triage responses. MiniMax M2.7 now also advertises thinking support, so reasoning traces appear when the underlying model emits them.
New server config uses contextMessages: 5 by default. Runtime triage only falls back to 10 recent messages when you leave contextMessages unset.
Fast direct replies are enabled by default. With that toggle on, direct @mentions and replies to the bot force immediate triage evaluation and bypass the response cooldown for mentioned messages. Classification still runs before the responder, so safety and relevance checks remain in place.
Settings changes apply on reload or the next config refresh. You don’t need to restart the bot process.