Guardrails

Overview

The Guardrails node validates content using AI-powered checks to ensure safety, accuracy, and compliance. Each guardrail uses an LLM as a judge to evaluate your input against specific criteria, and the node fails if any check's confidence score exceeds its threshold.
Best for: Content moderation, PII detection, hallucination checks, jailbreak prevention, and custom validation rules.

How It Works

  1. Provide input content to validate (from previous nodes)
  2. Enable specific guardrail checks
  3. Set confidence threshold for each check (0-1)
  4. Choose AI model for evaluation
  5. If any check's confidence exceeds its threshold → Node fails and flags the issue
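
In other words, the node collects a confidence score per enabled check and fails if any score exceeds its threshold. The sketch below illustrates that decision in Python; the function and field names are placeholders for illustration, not the platform's API.

```python
# Illustrative sketch of the Guardrails pass/fail logic (not the platform's actual API).
# Each enabled check returns a confidence score between 0 and 1 from the judge model.

def run_guardrails(checks: dict[str, float], scores: dict[str, float]) -> dict:
    """checks maps check name -> threshold; scores maps check name -> judge confidence."""
    violations = {
        name: scores[name]
        for name, threshold in checks.items()
        if scores.get(name, 0.0) > threshold  # node fails when confidence exceeds threshold
    }
    return {"passed": not violations, "violations": violations}

# Example: jailbreak confidence 0.82 exceeds its 0.75 threshold, so the node fails.
result = run_guardrails(
    checks={"pii": 0.8, "jailbreak": 0.75},
    scores={"pii": 0.31, "jailbreak": 0.82},
)
print(result)  # {'passed': False, 'violations': {'jailbreak': 0.82}}
```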

Configuration

Input

The content you want to validate. Supports Manual, Auto, and Prompt AI modes. Examples:
{{agent.output.response}}
{{trigger.output.user_message}}
{{http_request.output.content}}

Model Selection

Choose the AI model used to evaluate all enabled guardrails. More capable models provide more accurate detection but cost more.

Available Guardrails

Personally Identifiable Information (PII)

Detects personal information like names, emails, phone numbers, addresses, SSNs, credit cards, etc.
When to use:
  • Before storing user-generated content
  • When sharing data externally
  • Compliance requirements (GDPR, HIPAA)
  • Customer service workflows
Configuration:
  • Confidence Threshold: 0.7 (recommended)
  • A higher threshold requires greater confidence before the check fails
Example:
Input: {{agent.output.customer_response}}
Threshold: 0.8
Result: Fails if PII detected with >80% confidence
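
To make the confidence mechanics concrete, here is a minimal sketch of how an LLM-judge PII check could be framed. The instruction text and function names are assumptions for illustration, not the node's internal prompt.

```python
# Hypothetical judge instructions for a PII check; the node's real prompt may differ.
PII_JUDGE_INSTRUCTIONS = (
    "You are a strict data-privacy reviewer. Estimate your confidence (0 to 1) "
    "that the text below contains personally identifiable information such as "
    "names, emails, phone numbers, addresses, SSNs, or credit card numbers."
)

def pii_check_fails(judge_confidence: float, threshold: float = 0.8) -> bool:
    # With threshold 0.8, the check fails only when the judge reports >80% confidence.
    return judge_confidence > threshold

# e.g. pii_check_fails(0.92) -> True (node fails); pii_check_fails(0.55) -> False
```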

Moderation

Checks for inappropriate, harmful, or offensive content including hate speech, violence, adult content, harassment, etc.
When to use:
  • User-generated content platforms
  • Public-facing communications
  • Community moderation
  • Customer-facing outputs
Configuration:
  • Confidence Threshold: 0.6 (recommended)
  • Adjust based on your content policies

Jailbreak Detection

Identifies attempts to bypass AI safety controls or manipulate the AI into unintended behaviors.
When to use:
  • Processing user prompts before sending to AI
  • Public AI interfaces
  • Workflows with user-provided instructions
  • Security-sensitive applications
Configuration:
  • Confidence Threshold: 0.7 (recommended)
  • Higher threshold for fewer false positives
Example:
Input: {{trigger.user_prompt}}
Threshold: 0.75
Flags: Attempts to "ignore previous instructions" or similar

Hallucination Detection

Detects when AI-generated content contains false or unverifiable information.
When to use:
  • Fact-based content generation
  • Customer support responses
  • Financial or medical information
  • Any workflow where accuracy is critical
Configuration:
  • Confidence Threshold: 0.6 (recommended)
  • Requires reference data for comparison
Example:
Input: {{agent.generated_summary}}
Reference: {{http_request.original_data}}
Threshold: 0.7
Checks: Does summary accurately reflect source data?
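
What sets this check apart is the reference input: the judge compares the generated text against source material instead of judging it in isolation. A rough sketch of how such a prompt might be assembled (the wording is illustrative, not the node's internal prompt):

```python
# Illustrative grounding prompt: the judge sees both the generated summary and the source.
def build_hallucination_prompt(generated: str, reference: str) -> str:
    return (
        "You are a fact-checking judge. Compare the SUMMARY to the SOURCE and report\n"
        "your confidence (0 to 1) that the summary makes claims the source does not support.\n\n"
        f"SOURCE:\n{reference}\n\nSUMMARY:\n{generated}\n"
    )
```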

Custom Evaluation

Define your own validation criteria using natural language instructions.
When to use:
  • Domain-specific validation
  • Brand voice compliance
  • Custom business rules
  • Specialized content requirements
Configuration:
  • Evaluation Criteria: Describe what to check for
  • Confidence Threshold: Set based on strictness needed
Example:
Criteria: "Check if this response maintains our brand voice:
- Professional but friendly tone
- No jargon or technical terms
- Addresses customer by name
- Offers clear next steps"

Input: {{agent.email_response}}
Threshold: 0.8
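
In effect, your criteria text becomes the judge's rubric. A minimal sketch of that assembly, assuming a generic chat-style message format and that the score represents confidence of a violation (consistent with the fail-when-exceeded behavior described above):

```python
# Hypothetical request assembly for a custom evaluation; the message shape is a generic
# chat-completion format, not the platform's actual API.
def build_custom_eval_messages(criteria: str, content: str) -> list[dict]:
    system = (
        "You are an evaluator. Report your confidence (0 to 1) that the content "
        "FAILS the criteria below.\n\nCriteria:\n" + criteria
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": content},
    ]

messages = build_custom_eval_messages(
    criteria="Professional but friendly tone; no jargon; addresses customer by name; offers clear next steps.",
    content="Hi Dana, thanks for reaching out! Here is what we will do next...",
)
```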

Setting Confidence Thresholds

The confidence threshold determines how strict each check is:
| Threshold | Behavior | Use When |
|-----------|----------|----------|
| 0.3-0.5 | Lenient | Avoid false positives, informational only |
| 0.6-0.7 | Balanced | Most use cases, good accuracy |
| 0.8-0.9 | Strict | High-risk scenarios, critical validation |
| 0.9-1.0 | Very Strict | Only flag very obvious violations |
Start with 0.7 as a balanced default, then adjust based on false positives or missed detections.
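
A quick worked example of how the same judge score behaves at different thresholds (the 0.65 score is hypothetical):

```python
# One borderline judge score, evaluated at three different thresholds.
judge_confidence = 0.65  # hypothetical moderation score for a borderline comment
for threshold in (0.4, 0.7, 0.9):
    fails = judge_confidence > threshold
    print(f"threshold={threshold}: {'FAIL' if fails else 'PASS'}")
# threshold=0.4: FAIL   threshold=0.7: PASS   threshold=0.9: PASS
```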

Example Workflows

Content Moderation Pipeline

Trigger: Form submission (user comment)
→ Guardrails:
  ✅ PII Detection (threshold: 0.8)
  ✅ Moderation (threshold: 0.6)
  Input: {{trigger.comment}}
→ [On Success] → Post comment publicly
→ [On Failure] → Send to manual review queue

AI Response Validation

Agent: Generate customer response
→ Guardrails:
  ✅ Hallucination (threshold: 0.7)
  ✅ Custom: "Professional and helpful tone"
  Input: {{agent.response}}
→ [On Success] → Send email to customer
→ [On Failure] → Regenerate with different prompt

Multi-Check Validation

Agent: Generate article summary
→ Guardrails:
  ✅ PII Detection (threshold: 0.8)
  ✅ Hallucination (threshold: 0.7)
  ✅ Custom: "No promotional language" (threshold: 0.75)
  Input: {{agent.summary}}
→ [On Success] → Publish to website
→ [On Failure] → Return to editor for revision

Handling Failures

When a guardrail check fails, the workflow stops at the Guardrails node. You can configure error handling to route to alternative paths, send notifications, or trigger fallback actions.

When to Use Each Guardrail

Use PII detection for:
  • Public content that shouldn’t contain personal information
  • Data being sent to third parties or external systems
  • Compliance-sensitive workflows (GDPR, HIPAA, etc.)
  • Preventing accidental exposure of sensitive user data
Use moderation for:
  • User-generated content that needs review
  • Public-facing outputs and communications
  • Community platforms and forums
  • Filtering inappropriate or harmful content
Use jailbreak detection for:
  • User-provided prompts or instructions to AI
  • Public AI interfaces accessible to external users
  • Security-critical applications where prompt manipulation is a risk
  • Protecting against attempts to bypass system constraints
Use hallucination detection for:
  • Fact-based content generation requiring accuracy
  • Customer support responses with specific information
  • Financial or medical information where accuracy is critical
  • Any content where false information could cause harm
Use custom evaluation for:
  • Brand compliance and tone of voice guidelines
  • Domain-specific rules and industry standards
  • Quality standards unique to your organization
  • Business-specific requirements not covered by other guardrails

Best Practices

  • Use multiple guardrails together for comprehensive validation. PII + Moderation is a common combination.
  • Begin with 0.7 and adjust based on results. Too low = false positives, too high = missed issues.
  • Don’t just fail the workflow: add error paths to notify teams, log violations, or trigger alternative actions.
  • Test guardrails with borderline content to calibrate thresholds correctly.
  • More capable models (e.g., GPT-4) provide better detection but cost more. Balance accuracy needs with budget.
  • Write clear, specific criteria for custom evaluations so the AI understands exactly what to check.

Next Steps