Beyond Buzzwords: What 'AI Safety Testing' Actually Means

What does 'AI safety testing' really involve? In this clear and accessible guest article, Sahil Agarwal explains some technical safeguards that are essential to help protect children from AI-related harms.

Note from the editor:

Many AI applications present significant risks to children, and we advocate for thorough safety testing before any product is deployed - but what does that actually mean?

At SAIFCA, we strongly advocate for higher standards of AI safety testing - especially where children are concerned. Testing must go far beyond surface-level filters, and should include rigorous checks for harm, ethical oversight, and a clear understanding of how AI products interact with young users. While technical safeguards alone are not enough, they are a vital part of the wider protections we urgently need.

While certain products should not be made available to children in their current state (most notably AI companions and griefbots), many children will nonetheless use AI applications as part of their daily lives - and there are ways to help make such systems safer for them to use.

In this guest article, Sahil Agarwal, CEO of Enkrypt AI, outlines some of the key steps companies can take to identify and address technical AI safety issues. His explanation is clear, accessible, and grounded in real-world experience - offering insight into what responsible technical AI safety can and should look like.

Sahil’s article focuses specifically on testing and technical safeguards - one very important part of a much wider safety and governance effort.


What It Really Means to Test AI for Safety

By Sahil Agarwal, CEO, Enkrypt AI

Why Safety Testing Matters

Many AI systems today are astonishingly lifelike. They can hold conversations, interpret images, understand emotions, and even simulate relationships. But as these systems become more humanlike, they also become more capable of causing harm, especially when used by children.

We’ve already seen real-world examples of AI-powered chatbots encouraging self-harm, undermining parental guidance, or crossing emotional boundaries. These aren’t harmless glitches. They are serious safety failures, and they’re often preventable.

So why do they keep happening? It’s often difficult to understand how products that pose such clear dangers still make it into the hands of children. Were the risks missed entirely, or simply deprioritised?

Too often, it’s because safety testing is rushed, oversimplified, or treated as an afterthought. But meaningful safety isn’t just about filtering keywords or issuing disclaimers. It’s about stress-testing systems from the inside out, long before they ever reach a child’s device.

Here’s what responsible AI safety testing looks like...

A Simple Framework for Safer AI

1. Red Teaming: Breaking the System Before It Breaks Someone

Automated AI red teaming is about pushing AI to its limits - on purpose. It involves simulating high-risk situations and probing the system with questions a child might realistically ask. These can range from innocent-sounding queries to emotionally charged conversations.

With the right tools, companies can generate thousands or even millions of these prompts automatically, uncovering hidden vulnerabilities and harmful outputs before real users ever encounter them.
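For readers who want a concrete picture, the sketch below shows the basic shape of such a loop in Python. It is illustrative only: the prompt generator, the model call, and the safety judge are hypothetical placeholders, not any particular company's tooling.

```python
# Illustrative sketch of an automated red-teaming loop.
# generate_test_prompts, query_model, and looks_unsafe are hypothetical
# placeholders; a real pipeline would plug in an attack-prompt generator,
# the AI system under test, and an automated safety judge.

from dataclasses import dataclass

RISK_CATEGORIES = ["self-harm", "adult content", "emotional manipulation"]

@dataclass
class Finding:
    category: str
    prompt: str
    response: str

def generate_test_prompts(category: str, n: int) -> list[str]:
    """Placeholder: produce n probing prompts for one risk category."""
    return [f"({category}) test prompt #{i}" for i in range(n)]

def query_model(prompt: str) -> str:
    """Placeholder: send the prompt to the AI system being tested."""
    return "model response"

def looks_unsafe(response: str, category: str) -> bool:
    """Placeholder: an automated judge would score the response here."""
    return False

def red_team(prompts_per_category: int = 1000) -> list[Finding]:
    """Probe every risk category and collect any unsafe responses."""
    findings: list[Finding] = []
    for category in RISK_CATEGORIES:
        for prompt in generate_test_prompts(category, prompts_per_category):
            response = query_model(prompt)
            if looks_unsafe(response, category):
                findings.append(Finding(category, prompt, response))
    return findings
```

Each unsafe response the loop collects becomes evidence of a vulnerability to fix before launch.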

If a company is not actively testing its AI for harm, it is likely missing that harm altogether. In our latest research, we evaluated cutting-edge AI models from Mistral AI capable of processing both text and images - and found alarming risks, especially for children. These risks aren’t just technical - they’re also societal. We can’t afford to ignore them or keep them hidden. The only responsible path forward is transparency and collaboration - to ensure these risks are addressed before they scale into real-world harm. (Enkrypt AI research report)

2. Guardrails: Setting Boundaries That Stick

Once risks are identified, the next step is building guardrails to address them. Guardrails are automated safeguards that prevent the AI from crossing dangerous lines.

Think of guardrails like bumpers in a bowling alley. They don’t stop the game; they simply keep the ball from landing in the gutter.

Similarly, AI guardrails keep conversations from veering into unsafe territory - like offering disturbing mental health advice, simulating romantic relationships, or discussing adult topics with children.

Guardrails should be used whenever an AI product is live and being used by the public.
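As a rough illustration, a guardrail is often implemented as a wrapper around the model: it checks the user's message on the way in and the model's reply on the way out, and substitutes a safe response if either is flagged. The safety classifier in this sketch is a hypothetical placeholder, not any real product's filter.

```python
# Illustrative sketch of a guardrail wrapper around a chat model.
# classify_safety is a hypothetical placeholder for a trained safety
# classifier; generate_reply stands in for the underlying AI model.

from typing import Callable

BLOCKED_CATEGORIES = {"self-harm advice", "romantic roleplay", "adult content"}

SAFE_FALLBACK = (
    "I can't help with that. If you're feeling upset, "
    "please talk to a trusted adult."
)

def classify_safety(text: str) -> set[str]:
    """Placeholder: return the risk categories detected in the text."""
    return set()

def guarded_reply(user_message: str, generate_reply: Callable[[str], str]) -> str:
    """Check the conversation on the way in and on the way out."""
    # Block clearly unsafe requests before they ever reach the model.
    if classify_safety(user_message) & BLOCKED_CATEGORIES:
        return SAFE_FALLBACK
    reply = generate_reply(user_message)
    # Block unsafe model output before it reaches the child.
    if classify_safety(reply) & BLOCKED_CATEGORIES:
        return SAFE_FALLBACK
    return reply
```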

3. Ongoing Monitoring: Because AI Doesn’t Stand Still

AI systems continue to evolve after deployment. They learn, adapt, and change - especially when models are updated or fine-tuned. That’s why ongoing monitoring is critical.

Companies need tools that track how AI behaviour shifts over time. Is it still responding safely a week after launch? A month later? Continuous oversight ensures that safety doesn’t erode silently in the background.
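One simple form this can take, sketched below under the same hypothetical assumptions as the earlier examples, is to log how often responses are flagged each day and raise an alert if that rate starts to climb after an update.

```python
# Illustrative sketch of ongoing safety monitoring: track the share of
# flagged responses per day and alert if it drifts above a threshold.
# The flagging signal would come from the same kind of safety classifier
# used by the guardrails; the 1% threshold is an arbitrary example.

from collections import defaultdict
from datetime import date

ALERT_THRESHOLD = 0.01  # e.g. alert if over 1% of replies are flagged

flagged: dict[date, int] = defaultdict(int)
total: dict[date, int] = defaultdict(int)

def record_interaction(was_flagged: bool, day: date | None = None) -> None:
    """Log one model reply and whether the safety checks flagged it."""
    day = day or date.today()
    total[day] += 1
    if was_flagged:
        flagged[day] += 1

def drift_alert(day: date | None = None) -> bool:
    """Return True if the day's flag rate exceeds the alert threshold."""
    day = day or date.today()
    if total[day] == 0:
        return False
    return flagged[day] / total[day] > ALERT_THRESHOLD
```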

Why This Matters for Children

Children are uniquely vulnerable to misinformation, manipulation, and emotional influence, especially from systems that sound friendly, empathetic, and human. As AI becomes more immersive through voice, avatars, and image generation, it becomes even harder for children to distinguish what’s real from what’s artificial.

That’s why safety testing isn’t optional - it’s critical.

We believe AI can be a force for good in children’s lives, supporting learning, creativity, and emotional development. But only if it’s built with strong boundaries and ethical intent.

As parents, educators, and advocates, you have every right to ask: “How was this AI tested for safety?” The companies that deserve your trust will have thoughtful answers.

Looking Ahead

No AI system is perfectly safe. But that doesn’t excuse inaction. With the right testing, tooling, and commitment, we can help prevent products that are unsafe for children from reaching them, and we can build AI that’s not only powerful - but also responsible.

If we want AI to truly support and empower the next generation, it must be built with their safety at its core.


Editor’s Postscript

As Sahil’s article highlights, technical safety testing is an essential part of protecting children from AI-related risks. It’s also important to note that responsible testing includes broader elements - such as evaluating age appropriateness, understanding psychological impacts, addressing bias, and ensuring human oversight. Robust testing across all these dimensions is critical. We encourage parents, educators, and policymakers to continue asking the right questions - and calling for high standards - as AI systems become part of children’s lives. Thank you to Sahil for this clear and accessible explanation of what the buzzwords in this context actually mean!

About Enkrypt AI

Enkrypt AI is an AI safety and compliance platform. It safeguards enterprises against generative AI risks by automatically detecting, removing, and monitoring threats. Its approach ensures AI applications, systems, and agents are safe, secure, and trustworthy. Enkrypt AI is driven by a commitment to make the world a safer place by ensuring the responsible and secure use of AI technology, empowering everyone to harness its potential for good.

As with all guest contributions, this article reflects the perspective of the author, and inclusion does not imply endorsement of specific products or services by SAIFCA.