
Overview
The Guardrails node validates content using AI-powered checks for safety, accuracy, and compliance. Each guardrail uses an LLM as a judge to evaluate your input against specific criteria; the node fails the workflow when a check's confidence that the content violates its criterion exceeds that check's threshold.
Best for: Content moderation, PII detection, hallucination checks, jailbreak prevention, and custom validation rules.
How It Works
- Provide input content to validate (from previous nodes)
- Enable specific guardrail checks
- Set confidence threshold for each check (0-1)
- Choose AI model for evaluation
- If any check's confidence exceeds its threshold → the node fails and flags the issue (see the sketch below)
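The node's internal implementation isn't shown in these docs, so here is a minimal Python sketch of the idea. Everything in it is illustrative: ask_llm_judge is a hypothetical stand-in for the LLM-as-judge call, and the check names and thresholds are example values.

```python
# Minimal sketch of the guardrail loop; not the product's actual API.

def ask_llm_judge(check: str, content: str) -> float:
    """Hypothetical judge call. The real node asks an LLM to score its
    confidence (0-1) that `content` violates the criterion `check`."""
    return 0.0  # stubbed score; an LLM produces this in practice

def run_guardrails(content: str, thresholds: dict[str, float]) -> list[str]:
    """Run every enabled check and return the names of those whose
    confidence met or exceeded their threshold. Any hit fails the node."""
    return [
        check
        for check, threshold in thresholds.items()
        if ask_llm_judge(check, content) >= threshold
    ]

# Example: three checks enabled, each with its own threshold.
issues = run_guardrails(
    "user-submitted text",
    {"pii": 0.7, "moderation": 0.6, "jailbreak": 0.7},
)
if issues:
    print("Guardrails failed:", issues)  # the node flags these and stops
```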
Configuration
Input
The content you want to validate. Supports Manual, Auto, and Prompt AI modes.
Model Selection
Choose the AI model used to evaluate all enabled guardrails. More capable models detect issues more accurately but cost more.
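Putting input, model, and checks together, a node configuration might look roughly like this. The field names are assumptions for illustration, not the product's actual schema.

```python
# Hypothetical Guardrails node configuration (field names are assumptions).
guardrails_config = {
    "input": "{{previous_node.output}}",  # content to validate (illustrative Auto-mode reference)
    "model": "gpt-4",                     # judge model shared by all enabled checks
    "checks": {
        "pii":        {"enabled": True,  "threshold": 0.7},
        "moderation": {"enabled": True,  "threshold": 0.6},
        "jailbreak":  {"enabled": False, "threshold": 0.7},
    },
}
```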
Available Guardrails
Personally Identifiable Information (PII)
Detects personal information like names, emails, phone numbers, addresses, SSNs, credit cards, etc.
When to use:
- Before storing user-generated content
- When sharing data externally
- Compliance requirements (GDPR, HIPAA)
- Customer service workflows
- Confidence Threshold: 0.7 (recommended)
- Raise the threshold to flag only high-confidence detections; lower it to catch more borderline cases
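To make the LLM-as-judge idea concrete, a PII check could prompt the judge along these lines. The wording is illustrative, not the node's actual prompt.

```python
# Illustrative PII judge prompt; the node's real prompt is not documented.
PII_JUDGE_PROMPT = """\
You are a PII auditor. Rate from 0.0 to 1.0 your confidence that the text
below contains personally identifiable information (names, emails, phone
numbers, addresses, SSNs, credit card numbers). Reply with only the number.

Text:
{content}
"""

prompt = PII_JUDGE_PROMPT.format(content="Contact me at jane@example.com")
# A judge model would likely score this high (say ~0.95), which exceeds
# the recommended 0.7 threshold, so the check would fail.
```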
Moderation
Checks for inappropriate, harmful, or offensive content including hate speech, violence, adult content, harassment, etc.
When to use:
- User-generated content platforms
- Public-facing communications
- Community moderation
- Customer-facing outputs
- Confidence Threshold: 0.6 (recommended)
- Adjust based on your content policies
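Moderation judges often score several categories separately; one plausible way to reduce them to a single confidence is to let the worst category drive the decision. The scores below are invented for illustration.

```python
# Hypothetical per-category scores from a moderation judge.
scores = {"hate": 0.10, "violence": 0.05, "harassment": 0.72, "adult": 0.02}

confidence = max(scores.values())  # worst category drives the decision
flagged = confidence >= 0.6        # recommended moderation threshold
print(flagged, confidence)         # True 0.72 -> check fails
```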
Jailbreak Detection
Identifies attempts to bypass AI safety controls or manipulate the AI into unintended behaviors.
When to use:
- Processing user prompts before sending to AI
- Public AI interfaces
- Workflows with user-provided instructions
- Security-sensitive applications
- Confidence Threshold: 0.7 (recommended)
- Higher threshold for fewer false positives
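A common placement, sketched below with the illustrative run_guardrails helper from the first example: screen the user's prompt before it ever reaches the main AI step.

```python
def call_main_model(prompt: str) -> str:
    return "..."  # stand-in for the downstream AI node

def handle_user_prompt(prompt: str) -> str:
    # Screen the raw user prompt first; only clean prompts go downstream.
    if run_guardrails(prompt, {"jailbreak": 0.7}):
        return "Sorry, I can't help with that request."  # safe refusal
    return call_main_model(prompt)
```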
Hallucination Detection
Detects when AI-generated content contains false or unverifiable information.
When to use:
- Fact-based content generation
- Customer support responses
- Financial or medical information
- Any workflow where accuracy is critical
- Confidence Threshold: 0.6 (recommended)
- Requires reference data for comparison
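Because this check compares generated text against reference data, the judge needs to see both. A sketch, with judge_confidence as a hypothetical stand-in for the judge call:

```python
def judge_confidence(prompt: str) -> float:
    return 0.0  # stub; a judge LLM scores this in practice

def check_hallucination(generated: str, reference: str,
                        threshold: float = 0.6) -> bool:
    """True means the check fails: the judge is confident the answer
    makes claims the reference does not support."""
    prompt = (
        "Rate from 0.0 to 1.0 your confidence that the ANSWER contains "
        "claims not supported by the REFERENCE. Reply with only the number."
        f"\n\nREFERENCE:\n{reference}\n\nANSWER:\n{generated}"
    )
    return judge_confidence(prompt) >= threshold
```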
Custom Evaluation
Define your own validation criteria using natural language instructions.
When to use:
- Domain-specific validation
- Brand voice compliance
- Custom business rules
- Specialized content requirements
- Evaluation Criteria: Describe what to check for
- Confidence Threshold: Set based on strictness needed
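For example, a brand-voice evaluation might use criteria like the following. The wording is illustrative; write your own for your domain.

```python
# Example natural-language criteria for a custom evaluation.
BRAND_VOICE_CRITERIA = """\
Flag the content if any of the following hold:
- The tone is sarcastic, dismissive, or overly casual for a banking brand.
- Competitors are named or disparaged.
- Financial returns are guaranteed or implied.
"""
# Paired with a threshold of, say, 0.7, the judge fails the check when it
# is at least that confident the content violates one of these rules.
```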
Setting Confidence Thresholds
The confidence threshold determines how much evidence the judge needs before a check fails:
| Threshold | Behavior | Use When |
|---|---|---|
| 0.3-0.5 | Strict: flags even borderline content | High-risk scenarios where a missed violation is costly |
| 0.6-0.7 | Balanced | Most use cases, good accuracy |
| 0.8-0.9 | Lenient: flags only clear violations | Avoiding false positives matters more than catching everything |
| 0.9-1.0 | Very lenient | Only flag very obvious violations |
Start with 0.7 as a balanced default, then adjust: lower it if violations slip through, raise it if you see false positives.
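A quick worked example of how one judge score plays out under different thresholds:

```python
confidence = 0.82  # judge's confidence that the content violates a check

for threshold in (0.6, 0.7, 0.9):
    verdict = "FAIL (flagged)" if confidence >= threshold else "pass"
    print(f"threshold={threshold}: {verdict}")
# threshold=0.6: FAIL (flagged)
# threshold=0.7: FAIL (flagged)
# threshold=0.9: pass
```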
Example Workflows
Content Moderation Pipeline
User submission → Guardrails (Moderation + PII) → publish, or route flagged content to review.
AI Response Validation
AI-generated draft → Guardrails (Hallucination, checked against source data) → send to the customer, or regenerate on failure.
Multi-Check Validation
User input → Guardrails (PII + Moderation + Jailbreak) → downstream AI processing.
Handling Failures
When a guardrail check fails, the workflow stops at the Guardrails node. You can configure error handling to route to alternative paths, send notifications, or trigger fallback actions, as sketched below.
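Sketched with the illustrative run_guardrails helper from earlier; notify_team is a trivial stand-in for whatever notification or logging step your workflow uses.

```python
def notify_team(message: str) -> None:
    print("[alert]", message)  # stand-in for a real notification node

def process(content: str) -> str:
    failed = run_guardrails(content, {"pii": 0.7, "moderation": 0.6})
    if failed:
        notify_team("Guardrails flagged: " + ", ".join(failed))
        return "fallback"  # route to an alternative path
    return "continue"      # safe to proceed downstream
```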
When to Use Each Guardrail
PII Detection
Use PII detection for:
- Public content that shouldn’t contain personal information
- Data being sent to third parties or external systems
- Compliance-sensitive workflows (GDPR, HIPAA, etc.)
- Preventing accidental exposure of sensitive user data
Moderation
Use moderation for:
- User-generated content that needs review
- Public-facing outputs and communications
- Community platforms and forums
- Filtering inappropriate or harmful content
Jailbreak Detection
Use jailbreak detection for:
- User-provided prompts or instructions to AI
- Public AI interfaces accessible to external users
- Security-critical applications where prompt manipulation is a risk
- Protecting against attempts to bypass system constraints
Hallucination Detection
Use hallucination detection for:
- Fact-based content generation requiring accuracy
- Customer support responses with specific information
- Financial or medical information where accuracy is critical
- Any content where false information could cause harm
Custom Evaluation
Use custom evaluation for:
- Brand compliance and tone of voice guidelines
- Domain-specific rules and industry standards
- Quality standards unique to your organization
- Business-specific requirements not covered by other guardrails
Best Practices
Enable Multiple Checks
Use multiple guardrails together for comprehensive validation. PII + Moderation is a common combination.
Start with Balanced Thresholds
Begin with 0.7 and adjust based on results. Too low = false positives, too high = missed issues.
Always Handle Failures
Don’t just fail the workflow; add error paths to notify teams, log violations, or trigger fallback actions.
Test with Edge Cases
Test guardrails with borderline content to calibrate thresholds correctly.
Use Appropriate Models
More capable models (e.g., GPT-4) provide better detection but cost more. Balance accuracy needs with budget.
Document Custom Evaluations
Write clear, specific criteria for custom evaluations so the AI understands exactly what to check.