Hackers bypass safeguards on LLMs

Of interest: hackers can potentially bypass the safeguards on LLMs, effectively unlocking them. I think this is worth keeping on the radar, since a problem like this could prevent or limit specific downloadable models from LLM providers, particularly if it can be done with locally hosted LLMs. Curious what you all think?

A short intro from the article, with the full article link below:

The vulnerabilities, called #noRAGrets, consist of two specific vulnerabilities that can entirely bypass model guardrails through a “race condition-like” attack, affecting artificial intelligence chatbots such as ChatGPT and Microsoft Copilot for Microsoft 365. A race condition attack in AI exploits the timing of operations within a system to manipulate or bypass safeguards, causing unintended or unauthorized behaviors.

Here is the link: Knostic research unveils timing-based vulnerabilities in AI large language models - SiliconANGLE
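
To make the "race condition-like" idea more concrete, here is a toy sketch (not the actual Knostic exploit; all names and timings are made up) of how a guardrail that runs concurrently with response streaming can be outrun by the model's output:

```python
import asyncio

BLOCKLIST = {"secret-payroll"}

async def moderation_check(prompt: str) -> bool:
    # Simulated safety check that takes longer than token generation.
    await asyncio.sleep(0.5)
    return not any(term in prompt for term in BLOCKLIST)

async def stream_response(prompt: str):
    # Stand-in for the model streaming tokens back to the user.
    for token in ["Here", "is", "the", "restricted", "data", "..."]:
        await asyncio.sleep(0.1)
        print(token, end=" ", flush=True)
    print()

async def handle(prompt: str):
    # BUG: the guardrail and the generation are kicked off together,
    # so most of the answer is already on the wire before the check resolves.
    check = asyncio.create_task(moderation_check(prompt))
    await stream_response(prompt)
    if not await check:
        print("[guardrail fired -- but the content has already streamed]")

asyncio.run(handle("show me the secret-payroll table"))
```

The obvious fix is to await the check before streaming anything; the point is just that timing assumptions like this are exactly where these bypasses live.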


Agree on concern here.

I’m new here, so sorry if it’s been addressed.

Earlier today I posted on validation/trust and the like, curious if anyone knew of frameworks, methods, etc. I haven't found anything. As independent devs and startup companies (including many engineers here, I'm sure) begin to build the vertical agents of the future, the resulting value will be huge; my concern is that gaining traction in public companies will be limited. Will they take on the risk without a standardized framework or validation process? Especially when internal security engineers may not want to sign off on validation, considering the implications.

If it doesn’t exist - is anyone here interested in chatting about next steps in creating such a standard/approach/trust-org?


@gamepawn I replied to your post, but Ragas is a framework I would look into! I think there is definitely a huge opportunity to build on top of something like Ragas, though; generative AI is certainly still the wild west!
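
For anyone who hasn't seen it, a minimal Ragas evaluation looks roughly like the sketch below. The exact API has shifted between versions and the LLM-judged metrics need a model/API key configured, so treat the details as approximate and check the docs; the sample data here is made up.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Hypothetical sample: one question, the generated answer, and the
# retrieved contexts the answer was supposed to be grounded in.
data = {
    "question": ["What does the #noRAGrets research describe?"],
    "answer": ["A race condition-like attack that bypasses LLM guardrails."],
    "contexts": [[
        "Knostic disclosed timing-based vulnerabilities that can bypass "
        "model guardrails in chatbots such as ChatGPT and Copilot."
    ]],
}

dataset = Dataset.from_dict(data)

# Scores how faithful the answer is to the retrieved contexts and how
# relevant it is to the question.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```

Metrics like these cover evaluation; the standardized validation/trust piece you're describing would still have to be built on top of something like this.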