Core Guardian is available on V2 model projects only.
Default vs. Custom Settings
Every project ships with Default Settings: Core Guardian’s essential protections are active from day one, with no setup required. Custom Settings let you adjust each guardrail individually.

Guardrails
Underage Detection
Checks whether the User appears to be under 18 based on their messages. If confirmed, the conversation is permanently stopped and the Persona tells the User to leave. Enabled by default.
AI Suspicious
Detects when a User is probing whether the Persona is an AI through direct or indirect questions that don’t fit the flow of a normal conversation. Enabled by default.
Unknown Language
Fires when the User writes in a language the Persona isn’t set up to speak. The Persona signals that it doesn’t understand, and if the User keeps going, the conversation is stopped. Enabled by default.
Message Repetition
Spots Users who keep sending the exact same short message over and over, a common way to test whether the Persona gives scripted responses. The Persona calls it out, and repeat offenders receive a temporary ban. Enabled by default.
Malicious Content
Detects messages involving severe illegal content such as pedophilia, zoophilia/bestiality, or incest. Disabled by default.
Jailbreak Attempt
Catches attempts to override the Persona’s instructions or break its character. This guardrail is always on and cannot be disabled. It’s what keeps all your other Persona settings intact.
Configuring Guardrails
With Custom Settings, you’re in full control. Each guardrail can be switched on or off independently with its toggle. We added even more granularity so you can fine-tune exactly what happens when a detection fires. For instance, you can decide whether the Persona sends a final reply before the ban kicks in (Answer) and how long each ban lasts (Ban Durations: you can set up to five escalating durations in hours, e.g. 1, 3, 24).
Bans are applied at the conversation level only. The User’s account is never blocked, and they can start a new conversation normally.
Each guardrail’s status is color-coded:
- Grey — deactivated
- Pink — fully activated
- Yellow — active with custom configuration (sub-options have been adjusted)
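
To make the Custom Settings options above more concrete, here is a minimal Python sketch of one guardrail’s configuration. It is purely illustrative: Core Guardian is configured through its settings screen, and `GuardrailConfig`, its fields, and the helpers `ban_hours_for` and `status_color` are hypothetical names, not a documented API. The sketch assumes that the nth ban in a conversation uses the nth configured duration, that any later ban reuses the last (longest) one, and that "escalating" means durations never decrease.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of a single guardrail's Custom Settings.
# Names are illustrative only, not a real Core Guardian API.
MAX_BAN_DURATIONS = 5  # the docs allow up to five escalating durations


@dataclass
class GuardrailConfig:
    enabled: bool = True
    answer: bool = True  # send a final reply before the ban kicks in
    ban_durations_hours: List[float] = field(default_factory=list)
    customized: bool = False  # True once any sub-option has been adjusted

    def __post_init__(self) -> None:
        durations = self.ban_durations_hours
        if len(durations) > MAX_BAN_DURATIONS:
            raise ValueError(f"at most {MAX_BAN_DURATIONS} ban durations are allowed")
        if any(h <= 0 for h in durations):
            raise ValueError("ban durations must be positive numbers of hours")
        # Assumption: "escalating" means each duration is >= the previous one.
        if any(later < earlier for earlier, later in zip(durations, durations[1:])):
            raise ValueError("ban durations must not decrease")

    def ban_hours_for(self, offense: int) -> Optional[float]:
        """Ban length for the nth offense (1-based) within one conversation.

        Assumption: offenses beyond the last configured step reuse the
        longest duration. Returns None when no durations are configured.
        """
        if offense < 1:
            raise ValueError("offense count starts at 1")
        if not self.ban_durations_hours:
            return None
        index = min(offense, len(self.ban_durations_hours)) - 1
        return self.ban_durations_hours[index]

    def status_color(self) -> str:
        """Map the guardrail's state to the grey/pink/yellow legend."""
        if not self.enabled:
            return "grey"
        return "yellow" if self.customized else "pink"


# Usage: escalating bans of 1, 3 and 24 hours.
cfg = GuardrailConfig(ban_durations_hours=[1, 3, 24], customized=True)
print([cfg.ban_hours_for(n) for n in range(1, 5)])  # [1, 3, 24, 24]
print(cfg.status_color())  # yellow
```

Validating the durations when the configuration is created, rather than when a ban is issued, mirrors how a settings form would reject an invalid entry before it is saved.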
