Policies
Policy evolution
Managed policies evolve
Managed safety policies may evolve over time while preserving enforcement integrity.
“Managed policies are designed to improve safety over time.”
AI Safety Gate may update managed policies over time. Your integration should treat outcomes as authoritative and enforce PASS/WARN/BLOCK consistently.
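As a concrete illustration of consistent enforcement, the sketch below maps each outcome to an action. It is a minimal sketch under assumptions: the `enforce` function, the action strings, and the WARN handling are illustrative choices for this example, not part of any published AI Safety Gate API.

```python
def enforce(outcome: str) -> str:
    """Map an AI Safety Gate outcome to an action.

    Treats the outcome as authoritative: no local re-scoring or
    second-guessing. The WARN handling shown (routing to review)
    is one reasonable choice, not mandated behavior.
    """
    if outcome == "PASS":
        return "deliver"   # content cleared for delivery
    if outcome == "WARN":
        return "review"    # e.g. route to a human review checkpoint
    if outcome == "BLOCK":
        return "reject"    # blocked content is never delivered
    # Fail closed on anything unexpected rather than guessing.
    return "reject"
```

Treating the outcome as the single source of truth, rather than re-deriving decisions from lower-level signals, is what keeps enforcement consistent as managed policies evolve.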
What AI Safety Gate Evaluates
High-level, customer-facing coverage areas.
AI Safety Gate evaluates content for a range of safety risks. Coverage areas include:
- Violence & physical harm
- Self-harm & suicide
- Sexual content & exploitation
- Illegal activities & facilitation
- Hate, harassment & extremism
- Fraud, scams & impersonation
- Privacy & personal data exposure
- Malware & cyber misuse
- Weapons & dangerous instructions
- Regulated advice (medical, legal, financial)
This list is illustrative only and does not represent complete coverage.
How Policies Evolve
Non-technical guidance for what “evolution” means.
Safety risks change over time. Policies are updated to reflect new abuse patterns, new kinds of unsafe behavior, and evolving regulations. Updates may refine how decisions are classified or when a review checkpoint is used. Customers do not need to change their integration when policies evolve.
What Does Not Change
Stability and backward compatibility for integrations.
- PASS, WARN, and BLOCK semantics remain stable.
- Public API contracts do not change.
- Approval workflows remain deterministic.
- Existing integrations continue to work without modification (see the sketch below).
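To make the stability claims concrete, here is a hedged sketch of a deterministic approval workflow built on those semantics. The `ApprovalQueue` class and `handle` function are hypothetical illustrations; only the PASS/WARN/BLOCK outcome values come from this documentation.

```python
from dataclasses import dataclass, field

@dataclass
class ApprovalQueue:
    """Hypothetical in-memory queue for WARN outcomes awaiting
    human review; a real integration would persist these."""
    pending: list = field(default_factory=list)

    def submit(self, content_id: str) -> None:
        self.pending.append(content_id)

def handle(content_id: str, outcome: str, queue: ApprovalQueue) -> bool:
    """Return True if the content may be delivered now.

    Because PASS/WARN/BLOCK semantics are stable, this dispatch
    does not need to change when managed policies evolve.
    """
    if outcome == "PASS":
        return True
    if outcome == "WARN":
        queue.submit(content_id)   # deliver only after human approval
        return False
    return False                   # BLOCK and anything unexpected: fail closed
```

For example, `handle("msg-123", "WARN", queue)` returns `False` and leaves `"msg-123"` pending human approval. The same inputs always produce the same decision, which is what a deterministic approval workflow means for an integration.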
Illustrative examples (non-binding)
Simple examples to communicate intent, not a specification.
| Category | Example (Illustrative) | Outcome |
| --- | --- | --- |
| Fraud | Impersonation guidance | BLOCK |
| Self-harm | Ambiguous phrasing | WARN |
| Violence | Graphic instructions | BLOCK |
| Regulated advice | Actionable medical steps | WARN |
Examples are illustrative only and do not represent exhaustive policy behavior.
Transparency & Trust
How AI Safety Gate aims to behave over time.
The goal is consistent, explainable enforcement that customers can build around.
- The system errs on the side of safety when outcomes are uncertain.
- Human approval is used where context matters.
- Policy updates aim to reduce false positives over time.
Legal & Policy Disclaimer
Policy Transparency Notice
The policy categories, examples, and descriptions provided in this documentation are for informational purposes only and are not exhaustive.
AI Safety Gate’s enforcement behavior may vary based on context, evolving risk patterns, regulatory requirements, and system improvements. The presence or absence of any example does not guarantee a specific enforcement outcome.
Policy coverage, classifications, and review flows may be updated over time without prior notice. Such updates are designed to improve safety, accuracy, and reliability and do not constitute breaking changes to the public API or integration contracts.
This documentation does not constitute legal advice, compliance certification, or a guarantee of regulatory conformity. Customers remain responsible for ensuring their own use cases, content, and deployments comply with applicable laws, regulations, and platform policies.
AI Safety Gate is provided “as-is,” and enforcement decisions are applied as part of a risk-mitigation system that may include automated analysis and human review.