Google creates a Red Team to test attacks against artificial intelligence systems

Google creates a Red Team to test attacks against artificial intelligence systems

Google has created a red team that focuses on artificial intelligence (AI) systems and released a report providing an overview of common types of attacks and lessons learned.

The company announced its AI Red Team just weeks after it introduced the Secure AI Framework (SAIF), designed to provide a security framework for the development, use, and protection of AI systems.

Google’s new report highlights the importance of red teaming for AI systems, the types of AI attacks that can be simulated by red teams, and the lessons for other organizations that may be considering launching their own team.

The AI ​​Red Team is closely aligned with traditional Red Teams, but also has the necessary AI expertise to perform complex technical attacks on AI systems, Google said.

The company’s AI Red Team takes on the role of adversary in testing the impact of potential attacks against real-world products and features that use AI.

Take prompt engineering for example, a widely used AI attack method where prompts are manipulated to force the system to respond in a specific way desired by the attacker.

In one example shared by Google, a webmail application uses artificial intelligence to automatically detect phishing emails and warn users. The security feature uses a generic large language model (LLM) ChatGPT is the most well-known LLM for analyzing an email and classifying it as legitimate or malicious.

Announcement. Scroll to continue reading.

An attacker who knows that the phishing detection feature uses artificial intelligence can add an invisible paragraph (by setting the font to white) containing instructions for the LLM to their malicious email, telling them to classify the email as legitimate.

If the Webmail Phishing Filter is vulnerable to immediate attacks, the LLM could interpret parts of the email content as instructions and classify the email as legitimate, as desired by the attacker. The phisher need not worry about the negative consequences of including this, as the text is well hidden from the victim and will lose nothing even if the attack fails, Google explained.

Another example involves the data used to train the LLM. While this training data was largely stripped of personal information and other sensitive information, researchers have shown that they are still able to extract personal information from an LLM.

Training data can also be abused in the case of email autocomplete features. An attacker could trick the AI ​​into providing information about an individual using specially crafted sentences that the auto-complete feature completes with stored training data that could include private information.

For example, an attacker enters the text: John Doe has lost a lot of work lately. He hasn’t been able to come to the office since. The autocomplete feature, based on training data, could complete the sentence with the interview for a new job.

Blocking access to an LLM is also important. In one example provided by Google, a student gains access to an LLM specifically designed for grading essays. The model can prevent fast injection, but access was not blocked, allowing the student to train the model to always give the best grade to documents that contain a specific word.

Google’s report contains many more examples of the types of attacks that a red AI team can test.

As for the lessons learned, Google advises traditional red teams to join forces with AI experts to create realistic opponent simulations. He also points out that dealing with the results of the red teams can be challenging and some problems may not be easy to solve.

Traditional security controls can be effective at mitigating many risks. For example, ensuring that systems and models are properly locked down helps protect the integrity of AI models, preventing backdoors and data poisoning.

On the other hand, while some attacks on AI systems can be detected using traditional methods, others, such as content issues and instant attacks, may require layering of multiple security models.

Related: Now is the time for a pragmatic approach to adopting new technologies

Related: ChatGPT hallucinations can be exploited to distribute packages of malicious code

Related: AntChain and Intel Create New Privacy Computing Platform for AI Training

#Google #creates #Red #Team #test #attacks #artificial #intelligence #systems
Image Source : www.securityweek.com

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *