Content Policy Enforcement

Foldspace can automatically analyze and flag user-generated content across conversations, inputs, and output streams.

When content violates or risks violating a policy, the system attaches a policy flag and masks the relevant content in the conversation to prevent unsafe or sensitive material from being displayed.

Supported Policy Categories

Policy Name	Description
Dangerous Content	Content that facilitates, promotes, or enables access to harmful goods, services, or activities.
Harassment	Content that is malicious, intimidating, bullying, or abusive towards others.
Sexually Explicit	Content that is sexually explicit in nature.
Hate Speech	Content that is widely recognized as hate speech.
Medical Information	Content that promotes, facilitates, or enables access to harmful medical advice or guidance.
Violence & Gore	Content that includes gratuitous or realistic descriptions of violence and/or gore.
Obscenity & Profanity	Content that contains vulgar, profane, or offensive language.

How It Works

Foldspace scans conversational and generated content in real time. If a message matches one or more of the policies above:

The system flags the violation for moderation and observability.
The affected text is masked in the conversation to maintain a safe, compliant user experience.

Flagging and masking together help keep AI-driven interactions aligned with safety standards and organizational policies.
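
The flag-and-mask flow described above can be sketched in a few lines of Python. This is an illustration only, not Foldspace's implementation or API: the names Policy, ModeratedMessage, TOY_PATTERNS, MASK, and moderate are hypothetical, and the single regular expression is a toy stand-in for whatever classifier actually detects violations. Only the category names come from the table above.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Policy(Enum):
    # Category names copied from the Supported Policy Categories table.
    DANGEROUS_CONTENT = "Dangerous Content"
    HARASSMENT = "Harassment"
    SEXUALLY_EXPLICIT = "Sexually Explicit"
    HATE_SPEECH = "Hate Speech"
    MEDICAL_INFORMATION = "Medical Information"
    VIOLENCE_AND_GORE = "Violence & Gore"
    OBSCENITY_AND_PROFANITY = "Obscenity & Profanity"


@dataclass
class ModeratedMessage:
    text: str                                   # text that is safe to display
    flags: List[Policy] = field(default_factory=list)  # policies that matched


# Hypothetical detector: one toy regex per policy. A real system would
# use a trained classifier; this only exists to make the sketch runnable.
TOY_PATTERNS = {
    Policy.OBSCENITY_AND_PROFANITY: re.compile(r"\b(darn|heck)\b", re.IGNORECASE),
}

MASK = "[content hidden]"  # assumed placeholder; the real mask text is not documented here


def moderate(text: str) -> ModeratedMessage:
    """Flag every matching policy and mask the matching spans."""
    flags: List[Policy] = []
    masked = text
    for policy, pattern in TOY_PATTERNS.items():
        # subn returns the rewritten string and the number of replacements.
        masked, hits = pattern.subn(MASK, masked)
        if hits:
            flags.append(policy)
    return ModeratedMessage(text=masked, flags=flags)


if __name__ == "__main__":
    result = moderate("well heck, that failed")
    print(result.text)
    print([p.value for p in result.flags])
```

The sketch masks only the matching span. Whether Foldspace masks the span or the whole message is not stated here, so treat that granularity as an assumption; masking the entire message would be an equally valid reading.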