Content Moderation Best Practices (2025)

A comprehensive guide to implementing content moderation in modern applications.

The Modern Moderation Stack

Content moderation in 2025 requires:

  1. Multi-modal coverage - Text, images, and video
  2. Real-time decisions - Sub-second response times
  3. Configurable policies - Different rules for different contexts
  4. Audit trails - Every decision logged and reviewable
  5. Human escalation - AI + human review for edge cases

Core Principles

1. Defense in Depth

Never rely on a single layer of protection:

User Input → Client Validation → API Moderation → Database Rules → Human Review

Each layer catches what the previous missed.
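
As a rough sketch of how the first layers compose (the vettly client is configured as shown at the end of this guide; basicChecks and persistMessage are hypothetical placeholders for your own validation and storage logic):

ts
// Minimal sketch of layered checks; helper names are illustrative, not part of any SDK.
async function submitMessage(userMessage: string) {
  // Layer 1: cheap local validation (length limits, obvious spam)
  if (!basicChecks(userMessage)) {
    return { error: 'Message rejected by basic validation' }
  }

  // Layer 2: API moderation
  const result = await vettly.check({ content: userMessage, contentType: 'text' })
  if (result.action === 'block') {
    return { error: 'Message not allowed' }
  }

  // Layers 3 and 4: database rules and human review happen downstream
  return persistMessage(userMessage, result)
}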

2. Fail Gracefully

Decide your failure mode upfront:

ts
try {
  const result = await vettly.check({ content, contentType: 'text' })
  if (result.action === 'block') {
    return reject(content)
  }
} catch (error) {
  // Choose one failure mode (Option B shown commented out):

  // Option A: Fail closed (safer, may block legitimate content)
  return reject(content)

  // Option B: Fail open (riskier, but better UX)
  // return allow(content)
}

For high-risk contexts (payments, legal), fail closed. For social features, consider failing open with async review.

3. Context Matters

A message saying "kill it!" means different things in:

  • A gaming chat (probably fine)
  • A support ticket (needs review)
  • A threat report (escalate immediately)

Configure different policies for different contexts:

yaml
# gaming-chat.yaml
categories:
  violence:
    threshold: 0.9  # Very permissive
    action: flag

# support-tickets.yaml
categories:
  violence:
    threshold: 0.5  # More sensitive
    action: block

Text Moderation

Best Practices

  1. Moderate before storage - Check content before saving to database
  2. Include context - Send surrounding messages for better accuracy
  3. Handle edits - Re-moderate when users edit content
  4. Rate limit - Prevent abuse with request limits

Example Flow

ts
// 1. Check content
const result = await vettly.check({
  content: userMessage,
  contentType: 'text',
  policyId: 'community-chat'
})

// 2. Handle decision
switch (result.action) {
  case 'block':
    // Don't save, notify user
    return { error: 'Message not allowed' }

  case 'flag':
    // Save but queue for review
    await db.messages.create({
      content: userMessage,
      flagged: true,
      moderationId: result.decisionId
    })
    break

  case 'allow':
    // Save normally
    await db.messages.create({ content: userMessage })
    break
}
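
Best practice 3 above (handle edits) needs its own path: re-run the same check whenever a user updates a message and store the new decision ID. A minimal sketch, assuming the same db.messages model and policy used in the flow above:

ts
// Re-moderate edited content before persisting the update (db shape assumed).
async function updateMessage(messageId: string, newContent: string) {
  const result = await vettly.check({
    content: newContent,
    contentType: 'text',
    policyId: 'community-chat'
  })

  if (result.action === 'block') {
    return { error: 'Edited message not allowed' }
  }

  await db.messages.update({
    where: { id: messageId },
    data: {
      content: newContent,
      flagged: result.action === 'flag',
      moderationId: result.decisionId
    }
  })

  return { ok: true }
}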

Image Moderation

Key Considerations

  1. Check before upload - Moderate images before saving to storage
  2. Handle all formats - JPEG, PNG, GIF, WebP
  3. Size limits - Set reasonable file size limits
  4. Async for large files - Use webhooks for videos and large images

Example Flow

ts
// 1. Receive upload
const formData = await request.formData()
const image = formData.get('image') as File

// 2. Convert to base64
const buffer = await image.arrayBuffer()
const base64 = Buffer.from(buffer).toString('base64')

// 3. Moderate
const result = await vettly.check({
  content: base64,
  contentType: 'image'
})

// 4. Handle decision
if (result.action === 'block') {
  return { error: 'Image not allowed' }
}

// 5. Upload to storage only if allowed
await uploadToS3(image)
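
Considerations 2 and 3 above can be enforced with cheap local checks before spending a moderation call. A small sketch; the format list and size limit are illustrative defaults, not requirements of any API:

ts
// Reject unsupported formats and oversized files before moderating (limits are examples).
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/gif', 'image/webp']
const MAX_BYTES = 10 * 1024 * 1024 // 10 MB

function validateImage(image: File): string | null {
  if (!ALLOWED_TYPES.includes(image.type)) {
    return 'Unsupported image format'
  }
  if (image.size > MAX_BYTES) {
    return 'Image too large'
  }
  return null // passes local checks, proceed to moderation
}

Running these checks first keeps obviously invalid uploads from consuming moderation latency or quota.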

Video Moderation

Challenges

  • Larger files = longer processing
  • Frame-by-frame analysis needed
  • Audio track may contain violations
Best Practices

  1. Use webhooks - Don't wait synchronously
  2. Hold uploads - Store in temp location until moderated
  3. Show status - Let users know content is being reviewed

Example Flow

ts
// 1. Upload to temp storage
const tempUrl = await uploadToTemp(video)

// 2. Start async moderation
const result = await vettly.check({
  content: tempUrl,
  contentType: 'video',
  webhook: 'https://yourapp.com/webhooks/moderation'
})

// 3. Return pending status
return {
  status: 'processing',
  jobId: result.decisionId
}

// 4. Webhook handler moves to permanent storage if allowed
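
The webhook handler referenced in step 4 might look something like the sketch below. The payload fields (decisionId, action) and the pendingUploads model are assumptions for illustration; check the webhook documentation for the actual shape:

ts
// Hypothetical webhook handler; payload shape and storage helpers are assumed.
app.post('/webhooks/moderation', async (req, res) => {
  const { decisionId, action } = req.body

  const upload = await db.pendingUploads.findByDecisionId(decisionId)
  if (!upload) return res.status(404).end()

  if (action === 'block') {
    await deleteFromTemp(upload.tempUrl)
  } else {
    // Allowed (or flagged-for-review) content moves to permanent storage
    await moveToPermanentStorage(upload.tempUrl)
  }

  await db.pendingUploads.update({
    where: { id: upload.id },
    data: { status: action }
  })

  res.status(200).end()
})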

Policy Design

Start Strict, Then Loosen

It's easier to relax rules than tighten them:

yaml
# Start with strict defaults
default_action: flag

categories:
  hate:
    threshold: 0.5
    action: block
  harassment:
    threshold: 0.6
    action: block
  violence:
    threshold: 0.7
    action: flag

Monitor false positives, then adjust thresholds upward.

Separate Policies by Context

Don't use one policy for everything:

Context          | Policy     | Approach
User profiles    | strict     | Block most violations
Private messages | balanced   | Flag for review
Public posts     | strict     | Protect community
Gaming chat      | permissive | Allow banter

Version Control Policies

Store policies in git:

policies/
  production/
    default.yaml
    strict.yaml
    permissive.yaml
  staging/
    experimental.yaml

Changes are reviewable, reversible, and auditable.

Human Review

When to Escalate

AI isn't perfect. Escalate when:

  • Confidence scores are borderline (0.4-0.6)
  • Content is flagged but not blocked
  • User appeals a decision
  • New violation patterns emerge
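
A simple gate can express the first two conditions in code; a sketch, where the 0.4-0.6 band is a tuning choice rather than a fixed rule:

ts
// Escalate when any category score lands in the borderline band, or when the
// content was flagged rather than blocked. Band boundaries are illustrative.
function needsHumanReview(result: {
  action: string
  categories: { category: string; score: number }[]
}): boolean {
  const borderline = result.categories.some(c => c.score >= 0.4 && c.score <= 0.6)
  return borderline || result.action === 'flag'
}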

Review Queue Design

ts
// Queue flagged content for review
if (result.action === 'flag') {
  await reviewQueue.add({
    contentId: content.id,
    decisionId: result.decisionId,
    categories: result.categories,
    priority: calculatePriority(result)
  })
}
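
The calculatePriority helper above is left undefined; one possible heuristic uses the category scores returned with the decision. The severity list and cutoffs are arbitrary examples:

ts
// Example heuristic only: rank by highest category score, boosted for severe categories.
const HIGH_SEVERITY = ['hate', 'violence']

function calculatePriority(result: {
  categories: { category: string; score: number }[]
}): 'high' | 'medium' | 'low' {
  const topScore = Math.max(0, ...result.categories.map(c => c.score))
  const severe = result.categories.some(c => HIGH_SEVERITY.includes(c.category))

  if (severe && topScore >= 0.8) return 'high'
  if (topScore >= 0.6) return 'medium'
  return 'low'
}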

Reviewer Guidelines

  • Provide clear accept/reject criteria
  • Show context (previous messages, user history)
  • Track reviewer accuracy
  • Rotate difficult categories

Logging and Auditing

What to Log

Every moderation decision should include:

json
{
  "decisionId": "dec_abc123",
  "timestamp": "2025-01-15T10:30:00Z",
  "contentType": "text",
  "action": "block",
  "categories": [
    { "category": "harassment", "score": 0.87 }
  ],
  "policyId": "strict",
  "userId": "user_123",
  "contentHash": "sha256:..."
}

Retention

  • Keep decision logs for compliance (typically 1-7 years)
  • Store content hashes, not raw content when possible
  • Enable audit exports for legal requests
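
A content hash lets you keep an audit trail without retaining the text itself. A minimal sketch using Node's crypto module; the moderationLogs model is an assumption for illustration:

ts
import { createHash } from 'crypto'

// Store a hash of the content alongside the decision instead of the raw text.
function contentHash(content: string): string {
  return 'sha256:' + createHash('sha256').update(content).digest('hex')
}

await db.moderationLogs.create({
  decisionId: result.decisionId,
  action: result.action,
  contentHash: contentHash(userMessage) // raw content is never written to the log
})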

Performance

Optimize for Latency

  1. Cache policies - Don't fetch on every request (see the sketch after this list)
  2. Batch when possible - Multiple items in one call
  3. Use regional endpoints - Reduce network latency
  4. Async for non-blocking - Use webhooks for background checks
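
For the policy cache in particular, a small in-memory TTL cache is usually enough. A sketch; loadPolicy stands in for however you fetch or read policy configuration and is not a real SDK call:

ts
// Generic TTL cache for policy configuration; refresh interval is an example value.
const policyCache = new Map<string, { value: unknown; expiresAt: number }>()
const POLICY_TTL_MS = 5 * 60 * 1000 // refresh every 5 minutes

async function getPolicy(
  policyId: string,
  loadPolicy: (id: string) => Promise<unknown> // hypothetical loader (API, file, DB)
) {
  const cached = policyCache.get(policyId)
  if (cached && cached.expiresAt > Date.now()) return cached.value

  const value = await loadPolicy(policyId)
  policyCache.set(policyId, { value, expiresAt: Date.now() + POLICY_TTL_MS })
  return value
}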

Target Metrics

Metric              | Target
P50 latency (text)  | < 100ms
P99 latency (text)  | < 500ms
P50 latency (image) | < 1s
Availability        | 99.9%

Handling Edge Cases

User Appeals

Always provide an appeal path:

ts
// Store decision ID with content
await db.posts.create({
  content,
  moderationDecisionId: result.decisionId
})

// Appeal endpoint
app.post('/appeal', async (req, res) => {
  const { postId, reason } = req.body
  const post = await db.posts.findById(postId)

  await reviewQueue.add({
    type: 'appeal',
    decisionId: post.moderationDecisionId,
    reason,
    priority: 'high'
  })
})

False Positives

Track and learn from mistakes:

  1. Log all overturned decisions
  2. Identify patterns in false positives
  3. Adjust thresholds based on data
  4. Consider category-specific tuning
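
One way to act on points 1-3: record each overturned decision, then summarize overturn rates per category to see where thresholds need raising. The reviewOutcomes model and its fields are assumptions for illustration:

ts
// Summarize overturned decisions per category to spot false-positive hotspots.
async function falsePositiveRates() {
  const outcomes = await db.reviewOutcomes.findMany() // { category, overturned } per review

  const byCategory = new Map<string, { total: number; overturned: number }>()
  for (const o of outcomes) {
    const entry = byCategory.get(o.category) ?? { total: 0, overturned: 0 }
    entry.total += 1
    if (o.overturned) entry.overturned += 1
    byCategory.set(o.category, entry)
  }

  // Categories with high overturn rates are candidates for a higher threshold.
  return [...byCategory.entries()].map(([category, { total, overturned }]) => ({
    category,
    overturnRate: overturned / total
  }))
}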

Adversarial Content

Users will try to bypass moderation:

  • Unicode tricks (homoglyphs)
  • Leetspeak (h4t3)
  • Spacing tricks (h a t e)
  • Image text (text embedded in images)

Modern moderation APIs normalize and catch most of these automatically, but stay vigilant as new evasion patterns emerge.
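
A lightweight normalization pass can catch some of these before or alongside the API call. The sketch below is a supplement only; the character map is a tiny illustrative subset and deliberately incomplete:

ts
// Fold unicode variants, common leetspeak, and spaced-out letters before checking.
const LEET_MAP: Record<string, string> = {
  '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's'
}

function normalizeForModeration(text: string): string {
  const folded = text.normalize('NFKC').toLowerCase()          // fold many homoglyphs
  const deleeted = [...folded].map(ch => LEET_MAP[ch] ?? ch).join('')
  // Collapse runs of single letters separated by spaces ("h a t e" -> "hate")
  return deleeted.replace(/\b(?:[a-z] )+[a-z]\b/g, m => m.replace(/ /g, ''))
}

Consider checking both the raw and normalized text, since normalization can also distort legitimate content such as prices or codes.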

Compliance

GDPR / Privacy

  • Log decisions, not content when possible
  • Enable data deletion requests
  • Provide decision explanations to users

Platform Requirements

If you're building on platforms:

  • App Store - Requires user-generated content moderation
  • Play Store - Similar requirements
  • Discord - Must moderate for ToS compliance

Enable freezing of moderation data for legal requests:

ts
// Mark content for legal hold
await db.moderationLogs.update({
  where: { decisionId },
  data: { legalHold: true, holdExpiry: null }
})

Getting Started with Vettly

Implement these best practices with Vettly:

bash
npm install @nextauralabs/vettly-sdk

ts
import { ModerationClient } from '@nextauralabs/vettly-sdk'

const vettly = new ModerationClient({
  apiKey: process.env.VETTLY_API_KEY
})

// That's it. Start moderating.
const result = await vettly.check({
  content: userInput,
  contentType: 'text'
})

Summary

  1. Layer your defenses - Client, API, database, human
  2. Configure per context - Different rules for different features
  3. Log everything - Decisions, appeals, overrides
  4. Plan for humans - AI + human review for edge cases
  5. Start strict - Easier to loosen than tighten

Content moderation is not a one-time setup. It's an ongoing practice that evolves with your community and the threat landscape.