Content Moderation Best Practices (2025)
A comprehensive guide to implementing content moderation in modern applications.
The Modern Moderation Stack
Content moderation in 2025 requires:
- Multi-modal coverage - Text, images, and video
- Real-time decisions - Sub-second response times
- Configurable policies - Different rules for different contexts
- Audit trails - Every decision logged and reviewable
- Human escalation - AI + human review for edge cases
Core Principles
1. Defense in Depth
Never rely on a single layer of protection:
User Input → Client Validation → API Moderation → Database Rules → Human Review
Each layer catches what the previous missed.
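As a rough sketch (using the vettly client and db helper from the examples below, plus a hypothetical local blocklist), the layers can be chained so each one only sees what the previous one allowed:
// Layer 1: a cheap local check catches the obvious cases before any network call
const OBVIOUS_BLOCKLIST = ['spam-link.example'] // hypothetical, kept deliberately tiny
function failsLocalCheck(text) {
  return OBVIOUS_BLOCKLIST.some((term) => text.toLowerCase().includes(term))
}
async function submitMessage(text) {
  if (failsLocalCheck(text)) return { error: 'Message not allowed' }
  // Layer 2: API moderation
  const result = await vettly.check({ content: text, contentType: 'text' })
  if (result.action === 'block') return { error: 'Message not allowed' }
  // Layer 3: persist with the decision ID so database rules and human review can act on it later
  return db.messages.create({ content: text, moderationId: result.decisionId })
}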
2. Fail Gracefully
Decide your failure mode upfront:
try {
  const result = await vettly.check({ content, contentType: 'text' })
  if (result.action === 'block') {
    return reject(content)
  }
} catch (error) {
  // Choose one:
  // Option A: Fail closed (safer, may block legitimate content)
  return reject(content)
  // Option B: Fail open (riskier, but better UX)
  return allow(content)
}
For high-risk contexts (payments, legal), fail closed. For social features, consider failing open with async review.
3. Context Matters
A message saying "kill it!" means different things in:
- A gaming chat (probably fine)
- A support ticket (needs review)
- A threat report (escalate immediately)
Configure different policies for different contexts:
# gaming-chat.yaml
categories:
  violence:
    threshold: 0.9 # Very permissive
    action: flag
# support-tickets.yaml
categories:
  violence:
    threshold: 0.5 # More sensitive
    action: block
Text Moderation
Best Practices
- Moderate before storage - Check content before saving to database
- Include context - Send surrounding messages for better accuracy
- Handle edits - Re-moderate when users edit content (see the sketch after this list)
- Rate limit - Prevent abuse with request limits
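For the edit case, a minimal sketch (assuming the same vettly client and a Prisma-style db.messages model as in the flow below) simply re-runs the check inside the edit handler:
async function editMessage(messageId, newContent) {
  // Re-moderate the edited text exactly like a new message
  const result = await vettly.check({ content: newContent, contentType: 'text' })
  if (result.action === 'block') {
    return { error: 'Edit not allowed' }
  }
  // Persist the new text with the fresh decision, flagging it for review if needed
  return db.messages.update({
    where: { id: messageId },
    data: { content: newContent, flagged: result.action === 'flag', moderationId: result.decisionId }
  })
}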
Example Flow
// 1. Check content
const result = await vettly.check({
  content: userMessage,
  contentType: 'text',
  policyId: 'community-chat'
})
// 2. Handle decision
switch (result.action) {
  case 'block':
    // Don't save, notify user
    return { error: 'Message not allowed' }
  case 'flag':
    // Save but queue for review
    await db.messages.create({
      content: userMessage,
      flagged: true,
      moderationId: result.decisionId
    })
    break
  case 'allow':
    // Save normally
    await db.messages.create({ content: userMessage })
    break
}
Image Moderation
Key Considerations
- Check before upload - Moderate images before saving to storage
- Handle all formats - JPEG, PNG, GIF, WebP
- Size limits - Set reasonable file size limits (see the sketch after this list)
- Async for large files - Use webhooks for videos and large images
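Format and size checks are cheap to do before moderation; the allowed types and the 10 MB cap below are illustrative placeholders, not API requirements:
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/gif', 'image/webp']
const MAX_BYTES = 10 * 1024 * 1024 // example limit: 10 MB
function validateImageUpload(image) {
  if (!ALLOWED_TYPES.includes(image.type)) {
    return { ok: false, error: 'Unsupported image format' }
  }
  if (image.size > MAX_BYTES) {
    return { ok: false, error: 'Image too large' }
  }
  return { ok: true }
}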
Example Flow
// 1. Receive upload
const formData = await request.formData()
const image = formData.get('image')
// 2. Convert to base64
const buffer = await image.arrayBuffer()
const base64 = Buffer.from(buffer).toString('base64')
// 3. Moderate
const result = await vettly.check({
  content: base64,
  contentType: 'image'
})
// 4. Handle decision
if (result.action === 'block') {
  return { error: 'Image not allowed' }
}
// 5. Upload to storage only if allowed
await uploadToS3(image)
Video Moderation
Challenges
- Larger files = longer processing
- Frame-by-frame analysis needed
- Audio track may contain violations
Recommended Approach
- Use webhooks - Don't wait synchronously
- Hold uploads - Store in temp location until moderated
- Show status - Let users know content is being reviewed
// 1. Upload to temp storage
const tempUrl = await uploadToTemp(video)
// 2. Start async moderation
const result = await vettly.check({
  content: tempUrl,
  contentType: 'video',
  webhook: 'https://yourapp.com/webhooks/moderation'
})
// 3. Return pending status
return {
  status: 'processing',
  jobId: result.decisionId
}
// 4. Webhook handler moves to permanent storage if allowed
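The step-4 handler might look like the sketch below, assuming the webhook delivers the decisionId and final action (verify the actual payload shape), and that pendingUploads, moveTempToPermanent, and deleteTemp are your own hypothetical storage helpers:
app.post('/webhooks/moderation', async (req, res) => {
  const { decisionId, action } = req.body
  const upload = await db.pendingUploads.findByDecisionId(decisionId)
  if (action === 'allow') {
    // Promote the held file and mark the upload as live
    await moveTempToPermanent(upload.tempUrl)
    await db.pendingUploads.update({ where: { decisionId }, data: { status: 'published' } })
  } else {
    // Blocked or flagged: drop the temp file (or route 'flag' to the review queue)
    await deleteTemp(upload.tempUrl)
    await db.pendingUploads.update({ where: { decisionId }, data: { status: action } })
  }
  res.sendStatus(200)
})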
Policy Design
Start Strict, Then Loosen
It's easier to relax rules than tighten them:
# Start with strict defaults
default_action: flag
categories:
  hate:
    threshold: 0.5
    action: block
  harassment:
    threshold: 0.6
    action: block
  violence:
    threshold: 0.7
    action: flag
Monitor false positives, then adjust thresholds upward.
Separate Policies by Context
Don't use one policy for everything:
| Context | Policy | Approach |
|---|---|---|
| User profiles | strict | Block most violations |
| Private messages | balanced | Flag for review |
| Public posts | strict | Protect community |
| Gaming chat | permissive | Allow banter |
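A thin routing helper keeps the mapping in one place; the context names and policy IDs below are illustrative:
// Map each surface to the policy it should use
const POLICY_BY_CONTEXT = {
  profile: 'strict',
  dm: 'balanced',
  post: 'strict',
  gameChat: 'permissive'
}
async function moderateFor(context, content) {
  return vettly.check({
    content,
    contentType: 'text',
    policyId: POLICY_BY_CONTEXT[context] ?? 'strict' // default to the safest policy
  })
}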
Version Control Policies
Store policies in git:
policies/
  production/
    default.yaml
    strict.yaml
    permissive.yaml
  staging/
    experimental.yaml
Changes are reviewable, reversible, and auditable.
Human Review
When to Escalate
AI isn't perfect. Escalate when:
- Confidence scores are borderline (0.4-0.6); see the sketch after this list
- Content is flagged but not blocked
- User appeals a decision
- New violation patterns emerge
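One possible encoding of these rules, including the calculatePriority helper used in the queue below, is a pair of small functions over the category scores; the thresholds are illustrative:
function topScore(result) {
  if (!result.categories.length) return 0
  return Math.max(...result.categories.map((c) => c.score))
}
function shouldEscalate(result, { appealed = false } = {}) {
  const score = topScore(result)
  const borderline = score >= 0.4 && score <= 0.6
  return borderline || result.action === 'flag' || appealed
}
function calculatePriority(result) {
  // Higher scores and more categories get reviewed first
  return topScore(result) + 0.1 * result.categories.length
}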
Review Queue Design
// Queue flagged content for review
if (result.action === 'flag') {
  await reviewQueue.add({
    contentId: content.id,
    decisionId: result.decisionId,
    categories: result.categories,
    priority: calculatePriority(result)
  })
}
Reviewer Guidelines
- Provide clear accept/reject criteria
- Show context (previous messages, user history)
- Track reviewer accuracy
- Rotate difficult categories
Logging and Auditing
What to Log
Every moderation decision should include:
{
  "decisionId": "dec_abc123",
  "timestamp": "2025-01-15T10:30:00Z",
  "contentType": "text",
  "action": "block",
  "categories": [
    { "category": "harassment", "score": 0.87 }
  ],
  "policyId": "strict",
  "userId": "user_123",
  "contentHash": "sha256:..."
}
Retention
- Keep decision logs for compliance (typically 1-7 years)
- Store content hashes, not raw content when possible (see the sketch after this list)
- Enable audit exports for legal requests
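Node's built-in crypto module is enough to produce the sha256: contentHash shown above without retaining the raw text (the moderationLogs shape here is a sketch):
import { createHash } from 'node:crypto'
function contentHash(content) {
  // Store only the digest, never the raw content
  return 'sha256:' + createHash('sha256').update(content).digest('hex')
}
await db.moderationLogs.create({
  decisionId: result.decisionId,
  action: result.action,
  contentHash: contentHash(userMessage)
})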
Performance
Optimize for Latency
- Cache policies - Don't fetch on every request (see the sketch after this list)
- Batch when possible - Multiple items in one call
- Use regional endpoints - Reduce network latency
- Async for non-blocking - Use webhooks for background checks
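For policy caching, a small in-memory cache with a TTL is usually enough; fetchPolicy below is a hypothetical loader standing in for however you retrieve policy config:
const policyCache = new Map()
const POLICY_TTL_MS = 60_000 // refresh at most once a minute
async function getCachedPolicy(policyId) {
  const cached = policyCache.get(policyId)
  if (cached && Date.now() - cached.fetchedAt < POLICY_TTL_MS) {
    return cached.policy
  }
  const policy = await fetchPolicy(policyId) // hypothetical: API call, file read, etc.
  policyCache.set(policyId, { policy, fetchedAt: Date.now() })
  return policy
}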
Target Metrics
| Metric | Target |
|---|---|
| P50 latency (text) | < 100ms |
| P99 latency (text) | < 500ms |
| P50 latency (image) | < 1s |
| Availability | 99.9% |
Handling Edge Cases
User Appeals
Always provide an appeal path:
// Store decision ID with content
await db.posts.create({
  content,
  moderationDecisionId: result.decisionId
})
// Appeal endpoint
app.post('/appeal', async (req, res) => {
  const { postId, reason } = req.body
  const post = await db.posts.findById(postId)
  await reviewQueue.add({
    type: 'appeal',
    decisionId: post.moderationDecisionId,
    reason,
    priority: 'high'
  })
  res.json({ status: 'received' })
})
False Positives
Track and learn from mistakes:
- Log all overturned decisions (see the sketch after this list)
- Identify patterns in false positives
- Adjust thresholds based on data
- Consider category-specific tuning
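A lightweight way to surface patterns, assuming the same Prisma-style moderationLogs table used for legal holds, is to record overturns and aggregate them per category:
// Record the overturn when a reviewer or appeal reverses a decision
async function recordOverturn(decisionId, category) {
  await db.moderationLogs.update({
    where: { decisionId },
    data: { overturned: true, overturnedCategory: category }
  })
}
// Periodically count overturns per category to guide threshold tuning
async function overturnCounts(since) {
  const logs = await db.moderationLogs.findMany({
    where: { overturned: true, timestamp: { gte: since } }
  })
  return logs.reduce((counts, log) => {
    counts[log.overturnedCategory] = (counts[log.overturnedCategory] ?? 0) + 1
    return counts
  }, {})
}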
Adversarial Content
Users will try to bypass moderation:
- Unicode tricks (homoglyphs)
- Leetspeak (h4t3)
- Spacing tricks (h a t e)
- Image text (text embedded in images)
Modern APIs handle most of these, but stay vigilant with new patterns.
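If you want an extra local layer, a deliberately naive normalizer can collapse the cheapest tricks before text reaches the API; it is no substitute for the API's own handling:
const LEET_MAP = { 0: 'o', 1: 'i', 3: 'e', 4: 'a', 5: 's', 7: 't', '@': 'a', $: 's' }
function normalize(text) {
  return text
    .toLowerCase()
    .normalize('NFKC') // fold fullwidth and other compatibility variants
    .replace(/[0134570@$]/g, (ch) => LEET_MAP[ch] ?? ch)
    .replace(/\b(\w)(\s+\w\b)+/g, (run) => run.replace(/\s+/g, '')) // collapse "h a t e"
}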
Compliance
GDPR / Privacy
- Log decisions, not content when possible
- Enable data deletion requests
- Provide decision explanations to users
Platform Requirements
If you're building on platforms:
- App Store - Requires user-generated content moderation
- Play Store - Similar requirements
- Discord - Must moderate for ToS compliance
Legal Holds
Enable freezing of moderation data for legal requests:
// Mark content for legal hold
await db.moderationLogs.update({
  where: { decisionId },
  data: { legalHold: true, holdExpiry: null }
})
Getting Started with Vettly
Implement these best practices with Vettly:
npm install @nextauralabs/vettly-sdk
import { ModerationClient } from '@nextauralabs/vettly-sdk'
const vettly = new ModerationClient({
  apiKey: process.env.VETTLY_API_KEY
})
// That's it. Start moderating.
const result = await vettly.check({
  content: userInput,
  contentType: 'text'
})
Summary
- Layer your defenses - Client, API, database, human
- Configure per context - Different rules for different features
- Log everything - Decisions, appeals, overrides
- Plan for humans - AI + human review for edge cases
- Start strict - Easier to loosen than tighten
Content moderation is not a one-time setup. It's an ongoing practice that evolves with your community and the threat landscape.