Guardrails

Overview

The Guardrails node validates content using AI-powered checks to ensure safety, accuracy, and compliance. Each guardrail uses an LLM as a judge to evaluate your input against specific criteria, and the node fails if any check's confidence score exceeds its threshold.
Best for: Content moderation, PII detection, hallucination checks, jailbreak prevention, and custom validation rules.

How It Works

  1. Provide input content to validate (from previous nodes)
  2. Enable specific guardrail checks
  3. Set confidence threshold for each check (0-1)
  4. Choose AI model for evaluation
  5. If any check's confidence exceeds its threshold → Node fails and flags the issue
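
In other words, the node collects a confidence score per enabled check and fails if any score exceeds its threshold. The sketch below illustrates that decision in Python; the function and field names are placeholders for illustration, not the platform's API.

```python
# Illustrative sketch of the Guardrails pass/fail logic (not the platform's actual API).
# Each enabled check returns a confidence score between 0 and 1 from the judge model.

def run_guardrails(checks: dict[str, float], scores: dict[str, float]) -> dict:
    """checks maps check name -> threshold; scores maps check name -> judge confidence."""
    violations = {
        name: scores[name]
        for name, threshold in checks.items()
        if scores.get(name, 0.0) > threshold  # node fails when confidence exceeds threshold
    }
    return {"passed": not violations, "violations": violations}

# Example: jailbreak confidence 0.82 exceeds its 0.75 threshold, so the node fails.
result = run_guardrails(
    checks={"pii": 0.8, "jailbreak": 0.75},
    scores={"pii": 0.31, "jailbreak": 0.82},
)
print(result)  # {'passed': False, 'violations': {'jailbreak': 0.82}}
```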

Configuration

Input

The content you want to validate. Supports Manual, Auto, and Prompt AI modes. Examples:
{{agent.output.response}}
{{trigger.output.user_message}}
{{http_request.output.content}}

Model Selection

Choose the AI model used to evaluate all enabled guardrails. More capable models provide more accurate detection but cost more.

Available Guardrails

Personally Identifiable Information (PII)

Detects personal information like names, emails, phone numbers, addresses, SSNs, credit cards, etc.
When to use:
  • Before storing user-generated content
  • When sharing data externally
  • Compliance requirements (GDPR, HIPAA)
  • Customer service workflows
Configuration:
  • Confidence Threshold: 0.7 (recommended)
  • A higher threshold requires greater confidence before the check fails
Example:
Input: {{agent.output.customer_response}}
Threshold: 0.8
Result: Fails if PII detected with >80% confidence
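
To make the confidence mechanics concrete, here is a minimal sketch of how an LLM-judge PII check could be framed. The instruction text and function names are assumptions for illustration, not the node's internal prompt.

```python
# Hypothetical judge instructions for a PII check; the node's real prompt may differ.
PII_JUDGE_INSTRUCTIONS = (
    "You are a strict data-privacy reviewer. Estimate your confidence (0 to 1) "
    "that the text below contains personally identifiable information such as "
    "names, emails, phone numbers, addresses, SSNs, or credit card numbers."
)

def pii_check_fails(judge_confidence: float, threshold: float = 0.8) -> bool:
    # With threshold 0.8, the check fails only when the judge reports >80% confidence.
    return judge_confidence > threshold

# e.g. pii_check_fails(0.92) -> True (node fails); pii_check_fails(0.55) -> False
```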

Moderation

Checks for inappropriate, harmful, or offensive content including hate speech, violence, adult content, harassment, etc.
When to use:
  • User-generated content platforms
  • Public-facing communications
  • Community moderation
  • Customer-facing outputs
Configuration:
  • Confidence Threshold: 0.6 (recommended)
  • Adjust based on your content policies

Jailbreak Detection

Identifies attempts to bypass AI safety controls or manipulate the AI into unintended behaviors.
When to use:
  • Processing user prompts before sending to AI
  • Public AI interfaces
  • Workflows with user-provided instructions
  • Security-sensitive applications
Configuration:
  • Confidence Threshold: 0.7 (recommended)
  • Higher threshold for fewer false positives
Example:
Input: {{trigger.user_prompt}}
Threshold: 0.75
Flags: Attempts to "ignore previous instructions" or similar

Hallucination Detection

Detects when AI-generated content contains false or unverifiable information.
When to use:
  • Fact-based content generation
  • Customer support responses
  • Financial or medical information
  • Any workflow where accuracy is critical
Configuration:
  • Confidence Threshold: 0.6 (recommended)
  • Requires reference data for comparison
Example:
Input: {{agent.generated_summary}}
Reference: {{http_request.original_data}}
Threshold: 0.7
Checks: Does summary accurately reflect source data?
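
What sets this check apart is the reference input: the judge compares the generated text against source material instead of judging it in isolation. A rough sketch of how such a prompt might be assembled (the wording is illustrative, not the node's internal prompt):

```python
# Illustrative grounding prompt: the judge sees both the generated summary and the source.
def build_hallucination_prompt(generated: str, reference: str) -> str:
    return (
        "You are a fact-checking judge. Compare the SUMMARY to the SOURCE and report\n"
        "your confidence (0 to 1) that the summary makes claims the source does not support.\n\n"
        f"SOURCE:\n{reference}\n\nSUMMARY:\n{generated}\n"
    )
```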

Custom Evaluation

Define your own validation criteria using natural language instructions.
When to use:
  • Domain-specific validation
  • Brand voice compliance
  • Custom business rules
  • Specialized content requirements
Configuration:
  • Evaluation Criteria: Describe what to check for
  • Confidence Threshold: Set based on strictness needed
Example:
Criteria: "Check if this response maintains our brand voice:
- Professional but friendly tone
- No jargon or technical terms
- Addresses customer by name
- Offers clear next steps"

Input: {{agent.email_response}}
Threshold: 0.8
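
In effect, your criteria text becomes the judge's rubric. A minimal sketch of that assembly, assuming a generic chat-style message format and that the score represents confidence of a violation (consistent with the fail-when-exceeded behavior described above):

```python
# Hypothetical request assembly for a custom evaluation; the message shape is a generic
# chat-completion format, not the platform's actual API.
def build_custom_eval_messages(criteria: str, content: str) -> list[dict]:
    system = (
        "You are an evaluator. Report your confidence (0 to 1) that the content "
        "FAILS the criteria below.\n\nCriteria:\n" + criteria
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": content},
    ]

messages = build_custom_eval_messages(
    criteria="Professional but friendly tone; no jargon; addresses customer by name; offers clear next steps.",
    content="Hi Dana, thanks for reaching out! Here is what we will do next...",
)
```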

Setting Confidence Thresholds

The confidence threshold determines how strict each check is:
| Threshold | Behavior | Use When |
|-----------|----------|----------|
| 0.3-0.5 | Lenient | Avoid false positives, informational only |
| 0.6-0.7 | Balanced | Most use cases, good accuracy |
| 0.8-0.9 | Strict | High-risk scenarios, critical validation |
| 0.9-1.0 | Very Strict | Only flag very obvious violations |
Start with 0.7 as a balanced default, then adjust based on false positives or missed detections.
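
A quick worked example of how the same judge score behaves at different thresholds (the 0.65 score is hypothetical):

```python
# One borderline judge score, evaluated at three different thresholds.
judge_confidence = 0.65  # hypothetical moderation score for a borderline comment
for threshold in (0.4, 0.7, 0.9):
    fails = judge_confidence > threshold
    print(f"threshold={threshold}: {'FAIL' if fails else 'PASS'}")
# threshold=0.4: FAIL   threshold=0.7: PASS   threshold=0.9: PASS
```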

Example Workflows

Content Moderation Pipeline

Trigger: Form submission (user comment)
→ Guardrails:
  ✅ PII Detection (threshold: 0.8)
  ✅ Moderation (threshold: 0.6)
  Input: {{trigger.comment}}
→ [On Success] → Post comment publicly
→ [On Failure] → Send to manual review queue

AI Response Validation

Agent: Generate customer response
→ Guardrails:
  ✅ Hallucination (threshold: 0.7)
  ✅ Custom: "Professional and helpful tone"
  Input: {{agent.response}}
→ [On Success] → Send email to customer
→ [On Failure] → Regenerate with different prompt

Multi-Check Validation

Agent: Generate article summary
→ Guardrails:
  ✅ PII Detection (threshold: 0.8)
  ✅ Hallucination (threshold: 0.7)
  ✅ Custom: "No promotional language" (threshold: 0.75)
  Input: {{agent.summary}}
→ [On Success] → Publish to website
→ [On Failure] → Return to editor for revision

Handling Failures

When a guardrail check fails, the workflow stops at the Guardrails node. You can configure error handling to route to alternative paths, send notifications, or trigger fallback actions.

When to Use Each Guardrail

Use PII detection for:
  • Public content that shouldn’t contain personal information
  • Data being sent to third parties or external systems
  • Compliance-sensitive workflows (GDPR, HIPAA, etc.)
  • Preventing accidental exposure of sensitive user data
Use moderation for:
  • User-generated content that needs review
  • Public-facing outputs and communications
  • Community platforms and forums
  • Filtering inappropriate or harmful content
Use jailbreak detection for:
  • User-provided prompts or instructions to AI
  • Public AI interfaces accessible to external users
  • Security-critical applications where prompt manipulation is a risk
  • Protecting against attempts to bypass system constraints
Use hallucination detection for:
  • Fact-based content generation requiring accuracy
  • Customer support responses with specific information
  • Financial or medical information where accuracy is critical
  • Any content where false information could cause harm
Use custom evaluation for:
  • Brand compliance and tone of voice guidelines
  • Domain-specific rules and industry standards
  • Quality standards unique to your organization
  • Business-specific requirements not covered by other guardrails

Best Practices

  • Use multiple guardrails together for comprehensive validation. PII + Moderation is a common combination.
  • Begin with 0.7 and adjust based on results. Too low = false positives, too high = missed issues.
  • Don’t just fail the workflow: add error paths to notify teams, log violations, or trigger alternative actions.
  • Test guardrails with borderline content to calibrate thresholds correctly.
  • More capable models (e.g., GPT-4) provide better detection but cost more. Balance accuracy needs with budget.
  • Write clear, specific criteria for custom evaluations so the AI understands exactly what to check.

Next Steps