Documentation
Developer-grade guidance for deploying AI Safety Gate into production workflows.
Prefer Postman? Download the collection: ai-safety-gate.postman_collection.json
AI Safety Gate documentation
This documentation explains how to integrate AI Safety Gate into production systems that execute AI-driven actions. You will learn where to place validate(), how PASS/WARN/BLOCK enforcement works, and how to integrate the enforcement API into any system.
How it works (end-to-end)
Exact request/response flow: validate → enforce → (WARN) approve.
Step 1 — Generate AI output
Your system produces the final AI output you intend to use to trigger an action.
Step 2 — Call the Safety Gate before executing
Your app or service calls POST https://aisafegate.com/api/validate with ai_output (string) and context (object).
Step 3 — Enforce PASS / WARN / BLOCK
PASS: execute the action.
BLOCK: do not execute the action.
WARN: do not execute yet; persist decision_id and approval_token and begin polling.
Step 4 — Human approval (WARN only)
Poll GET https://aisafegate.com/api/decisions/:id/approval with the approval token in X-Approval-Token (or Authorization: Bearer ...) until you receive {"approved": true}.
Step 5 — Fail closed
If validation fails, approval polling never returns {"approved": true}, or your system times out, you must stop and not execute.
n8n is optional: the n8n drop-in workflow follows the same contract, but you can integrate directly from any app or service using the HTTP API.
What AI Safety Gate does
One validation call that returns an enforceable decision.
Rule of thumb: validate after AI output and before any real-world action (money, customer contact, production writes).
PASS
Proceed automatically.
WARN
Review required — route to review or retry.
BLOCK
Stop execution.
Flow: AI (LLM output — message / tool args) → GATE (validate() — policy + risk checks) → WORKFLOW (branch by status: PASS / WARN / BLOCK).
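Branching by status can be expressed as a small, deterministic mapping. This is a sketch; the branch names ("action", "review", "stop") are illustrative, not part of the API.

```python
def enforce(decision):
    """Map a Safety Gate decision onto a workflow branch."""
    status = decision.get("status")
    if status == "PASS":
        return "action"   # proceed automatically
    if status == "WARN":
        return "review"   # interlock: route to human approval
    if status == "BLOCK":
        return "stop"     # terminal for the action path
    return "stop"         # unknown or missing status: fail closed
```

Note the final fallback: anything other than an explicit PASS or WARN stops the action path, consistent with the fail-closed default described below.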
Where it fits in systems
Think of it as an enforcement layer, not a content filter.
Place AI Safety Gate on the boundary between model output and irreversible actions.
It is most effective when workflows treat BLOCK as terminal and WARN as an interlock requiring review.
n8n Cloud quick start
Deploy in minutes and branch safely by decision.
STEP 1
Import workflow JSON
Workflows → Import from File → select the template.
STEP 2
Replace the trigger
Swap in your Webhook/Cron/app event trigger.
STEP 3
Connect AI output → Gate
Wire the LLM output into the validate request.
STEP 4
Branch by decision
PASS → action, WARN → review, BLOCK → safe fallback.
Do not place the gate after the action. If the refund/email/write already happened, validation is too late.
Placement guide (DO / DO NOT)
Keep enforcement close to execution.
Place the gate inline: immediately after model output is produced and immediately before the irreversible action node.
DO:
AI → Gate → Action
Route WARN to approvals
Log every decision
DO NOT:
Validate only prompts (validate outputs too)
Treat WARN as PASS
Allow BLOCK to reach action nodes
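The DO pattern (AI → Gate → Action, WARN to approvals, every decision logged) can be sketched as one inline pipeline. All function parameters here are stand-ins you would wire to your own system; nothing about their names is prescribed by the API.

```python
def run_workflow(generate_ai_output, validate, execute_action,
                 route_to_review, log_decision):
    output = generate_ai_output()        # AI produces the final output
    decision = validate(output)          # Gate sits inline, before the action
    log_decision(decision)               # log every decision for audit
    status = decision.get("status")
    if status == "PASS":
        return execute_action(output)    # AI → Gate → Action
    if status == "WARN":
        return route_to_review(decision) # WARN goes to approvals, never to the action
    return None                          # BLOCK (or anything else): action never runs
```

Because the gate is called between output generation and execution, there is no code path where a BLOCK decision can reach the action node.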
PASS / WARN / BLOCK behavior
Deterministic workflow branching.
PASS means low risk. Continue to the real action node.
WARN means elevated risk (review required). Add human review, retry with stricter prompts, or degrade to safe templates.
BLOCK means stop. Treat it as terminal for the action path.
Recommended Safety Defaults (Production)
Customer-safe defaults for reliable enforcement.
- Fail closed: if validation fails (timeouts, parsing, missing fields), do not execute.
- Treat WARN as an interlock: pause execution until a human explicitly approves.
- Treat BLOCK as terminal for the action path.
- Log the decision status and explanation for audit and incident response.
- Use environment separation (dev/stage/prod) and verify policy changes before rollout.
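The fail-closed default can be implemented as a thin wrapper around the validation call: any error, timeout, or malformed response is treated as BLOCK. This is a sketch under the assumptions above; the explanation strings are illustrative.

```python
def validate_fail_closed(call_validate, *args, **kwargs):
    """Run a validation call and fail closed on any error or bad response."""
    try:
        decision = call_validate(*args, **kwargs)
    except Exception:
        # Timeouts, network errors, parsing failures: do not execute.
        return {"status": "BLOCK",
                "explanation": "validation call failed; failing closed"}
    if decision.get("status") not in ("PASS", "WARN", "BLOCK"):
        # Missing or unrecognized status: do not execute.
        return {"status": "BLOCK",
                "explanation": "malformed decision; failing closed"}
    return decision
```

Pair this with decision logging so fail-closed events are visible in audit and incident response, not silently swallowed.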
Known limitations (v1)
Practical boundaries to plan around.
- Decisions are only as accurate as the context you provide (action type, channel, environment, and payload).
- AI Safety Gate cannot prevent actions your system executes without enforcing the returned outcome.
- WARN and BLOCK handling must be implemented deterministically in your workflow (no “best effort” execution).
- Misuse-resistant operations require customer-side controls (RBAC, rate limits, and review processes) in addition to gating.
Data evaluated
Pass enough context to make decisions accurate.
Recommended inputs
AI output text (message/tool args)
Action type (refund/email/write)
Workflow name / environment
Optional: actor, channel, customer segment
Returned outputs
status: PASS | WARN | BLOCK
explanation: human-readable reason
timestamps + identifiers for audit
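The inputs and outputs above might look like the following shapes. Only ai_output, context, status, and explanation are documented field names on this page; the context keys, decision_id, approval_token, and all values are illustrative assumptions.

```python
# Illustrative request body: AI output plus enough context for an accurate decision.
request_body = {
    "ai_output": "Sure, I've issued a $250 refund to the customer.",
    "context": {
        "action_type": "refund",        # refund / email / write
        "workflow": "support-refunds",  # workflow name
        "environment": "prod",          # dev / stage / prod
        "channel": "email",             # optional
    },
}

# Illustrative response body for a WARN decision.
response_body = {
    "status": "WARN",                   # PASS | WARN | BLOCK
    "explanation": "Refund amount exceeds the auto-approve threshold.",
    "decision_id": "dec_123",           # persist for approval polling
    "approval_token": "tok_abc",        # send in X-Approval-Token
}
```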
Policy management
How policy changes relate to integration behavior.
Single integration contract
The integration contract is stable: you call validate(), enforce PASS/WARN/BLOCK, and (for WARN) poll approval.
Managed policies
Managed policies may be updated over time to improve safety. Your integration must not assume outcomes remain constant.
Customer-defined policies (if enabled)
If your account allows customer-defined policies, changes can affect future decisions. Treat the Safety Gate response as the source of truth and enforce it.
Nodes do not define policy
Workflow nodes never define policy. They call validate() and route outcomes (PASS/WARN/BLOCK) deterministically.
Common mistakes
Issues that reduce enforcement reliability.
Gating only prompts. Validate outputs too.
Ignoring WARN. Review required — treat it as an interlock.
Non-terminal BLOCK. BLOCK must stop the action path.
No audit trail. Store status + explanation.
FAQ
Operational questions
Do I need n8n?
No. Any system that can call an HTTP API and branch by status works.
What do I do on WARN?
Route to human approval, retry, or fallback to safe templates.
What do I do on BLOCK?
Stop execution, notify your team, and log the decision.
How do I use this as AI workflow enforcement?
Place validate() after the model response and before the irreversible step. Treat BLOCK as terminal and WARN as an interlock.