Read time: 13 minutes
- 1. The Growing Threat Landscape for AI-Powered Systems and the Need for Robust AI Security
- 2. Real-World Examples: AI Security in Cloud Platforms
- 3. The Shared History of Cybersecurity and AI
- 4. Protecting Large Language Models (LLMs)
- 5. Securing Photo/Video Classification Models
- 6. Safeguarding Photo/Video Generative Models
- 7. Conclusions
- Read more about this topic and be inspired.
The year 2024 has witnessed a troubling rise in cyber-attacks, with critical infrastructure bearing the brunt of the assault. These attacks, ranging from ransomware disruptions to sophisticated data breaches, have caused significant damage, jeopardizing not only the delivery of critical services (like patient care or financial transactions) but also the privacy of sensitive information such as medical records, financial data, and even personal identification details.
1. The Growing Threat Landscape for AI-Powered Systems and the Need for Robust AI Security
While the specific details of the Change Healthcare cyberattack are still emerging, it raises concerns about the potential impact of cyberattacks on AI-powered systems in healthcare. Imagine attackers manipulating AI models used for medical diagnosis or patient data analysis. This highlights the critical need for robust AI security measures as AI becomes more integrated into critical healthcare workflows.
While WannaCry primarily impacted traditional IT infrastructure, it serves as a cautionary tale for the future of AI security. Ransomware attacks like WannaCry often target unpatched vulnerabilities – vulnerabilities that could potentially exist in the systems that manage and train AI models. Imagine a scenario where attackers gain access to the systems responsible for training an AI model used for fraud detection in the financial sector. By exploiting vulnerabilities like those targeted by WannaCry, hackers could manipulate the training data used by the model. This manipulated data could lead the AI model to misclassify legitimate transactions as fraudulent or vice versa, potentially causing significant financial losses.
This example highlights the critical need for robust cybersecurity measures throughout the entire AI lifecycle, from development and training to deployment and maintenance. By proactively addressing traditional cybersecurity vulnerabilities, we can help mitigate the risks associated with AI model manipulation and ensure a more secure future for AI across various industries.
2. Real-World Examples: AI Security in Cloud Platforms
The recent MOAB (Mother of All Breaches) exposed vulnerabilities in cloud platforms used by various industries, including those heavily reliant on AI. This incident underscores the importance of AI security within cloud environments. For example, self-driving cars rely on secure cloud platforms for AI model training and updates. A breach could expose this data, potentially leading to manipulated AI behavior and safety risks. AI security best practices for cloud platforms can help mitigate these risks.
Understanding the Attack Vectors: Targeting AI Systems and Implementing AI Security Measures
Here are some specific ways cyberattacks can exploit AI vulnerabilities:
- Data Poisoning: Malicious actors can inject manipulated data into the training datasets used for AI models. This can lead to biased or inaccurate outputs, compromising the model's effectiveness. AI security measures like data validation techniques can help address this threat (see the sketch after this list).
- Model Hijacking: Hackers could gain access to AI models and manipulate their code or inputs to generate desired outputs. Imagine an AI model used for stock price prediction being hijacked to influence market manipulation. Robust AI security protocols can help prevent unauthorized access to AI models.
- Social Engineering Attacks: While not exclusive to AI, social engineering tactics can be used to trick employees into granting unauthorized access to AI systems or manipulating the data they use. Employee awareness training is a crucial element of a comprehensive AI security strategy.
- Deepfakes: Malicious actors can use AI-powered deepfakes to impersonate people in videos or create fake news, potentially impacting public trust. AI-generated voices further complicate matters, allowing hackers to mimic trusted individuals.
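As a concrete illustration of the data-validation point in the list above, here is a minimal sketch of pre-training dataset hygiene checks; the record fields, duplicate removal, and label check are illustrative assumptions, not a complete anti-poisoning pipeline:

```python
# Minimal sketch of pre-training data validation (hypothetical record format:
# dicts with "text" and "label" fields). Real pipelines add provenance checks,
# outlier detection, and statistical audits on top of this.
def validate_dataset(records, allowed_labels):
    seen = set()
    clean = []
    for rec in records:
        text = rec.get("text", "").strip()
        label = rec.get("label")
        if not text or label not in allowed_labels:
            continue  # drop empty or mislabeled records
        if text in seen:
            continue  # drop exact duplicates, a common poisoning vector
        seen.add(text)
        clean.append(rec)
    return clean
```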
By acknowledging the evolving cyber threat landscape and implementing proactive security measures, we can ensure that AI continues to drive progress across various industries while protecting sensitive data and critical infrastructure.
3. The Shared History of Cybersecurity and AI
In the late 1990s and early 2000s, Intrusion Detection Systems began to play a significant role in corporate defenses. To improve detection, the industry turned to machine learning, applying various techniques to spot anomalies and unusual network traffic patterns. The 2000s also saw wider use of machine learning in cybersecurity, with algorithms analyzing data patterns to identify potential threats. Behavioral analysis, a form of AI, gained prominence in detecting malware and other cyber threats by learning normal behavior and flagging deviations from it.
4. Protecting Large Language Models (LLMs)
Large Language Models (LLMs) are at the forefront of AI advancements, powering applications like chatbots, translators, and more. However, they are susceptible to several types of attacks. One significant risk is API abuse: attackers can flood an LLM's API with requests to run up excessive costs. To prevent this, developers should implement middleware that limits the number of requests per second, counts requests, and filters out potentially malicious payloads. Tools like the Natural Language Toolkit (NLTK) can strip connecting words (stopwords), leaving only meaningful terms to match against a filter.
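A minimal sketch of what such middleware might look like is shown below; the per-minute limit, the blocklist, and the in-memory request log are illustrative assumptions, and NLTK's stopword list is used to keep only content-bearing words before matching:

```python
# Minimal sketch of request throttling plus NLTK-based keyword filtering.
# Assumes the NLTK data packages are installed: nltk.download('punkt'), nltk.download('stopwords').
import time
from collections import defaultdict

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

MAX_REQUESTS_PER_MINUTE = 30             # illustrative limit
BLOCKED_WORDS = {"shutdown", "exploit"}  # illustrative blocklist

_request_log = defaultdict(list)  # client_id -> timestamps of recent requests

def allow_request(client_id):
    # Sliding-window rate limit: at most MAX_REQUESTS_PER_MINUTE per client
    now = time.time()
    recent = [t for t in _request_log[client_id] if now - t < 60]
    _request_log[client_id] = recent
    if len(recent) >= MAX_REQUESTS_PER_MINUTE:
        return False
    _request_log[client_id].append(now)
    return True

def meaningful_words(text):
    # Strip English stopwords so only content-bearing words remain for filtering
    stop = set(stopwords.words("english"))
    return [w for w in word_tokenize(text.lower()) if w.isalpha() and w not in stop]

def is_blocked_query(text):
    return bool(BLOCKED_WORDS & set(meaningful_words(text)))
```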
Payload exploits pose another threat in addition to API abuse. Middleware should maintain a list of the latest payload exploits and match user inputs against it to preserve credits and avoid unnecessary costs. Moreover, website vulnerabilities such as SQL injection (SQLi), Cross-Site Scripting (XSS), and other injections can compromise the website hosting the LLM. Input sanitization and proper validation of file types and input fields can mitigate these risks effectively.
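As one example of the file-type validation mentioned above, a minimal sketch is shown below; the allowed extension set is an assumption, and real deployments would also verify the file's content (magic bytes), not just its name:

```python
import os

ALLOWED_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.pdf'}  # illustrative allowlist

def is_allowed_file(filename):
    # Reject files whose extension is not on the allowlist
    _, ext = os.path.splitext(filename.lower())
    return ext in ALLOWED_EXTENSIONS
```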
Implementing an abuse list is another measure to enhance security. An abuse list can prevent the same IP addresses from spamming the model with malicious requests, helping maintain the system's integrity by preventing repeated attacks from the same sources. Additionally, LLM training datasets can be intentionally poisoned to degrade model performance. Using high-quality data and filtering datasets before training can combat this issue, ensuring that the model performs optimally.
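A minimal sketch of such an abuse list is shown below; the strike threshold and in-memory storage are assumptions, and a production service would typically back this with Redis, a WAF, or similar infrastructure:

```python
from collections import defaultdict

ABUSE_THRESHOLD = 5  # strikes before an IP is blocked (illustrative value)

_strikes = defaultdict(int)
_blocked_ips = set()

def record_violation(ip):
    # Call this whenever a request from `ip` trips a filter
    _strikes[ip] += 1
    if _strikes[ip] >= ABUSE_THRESHOLD:
        _blocked_ips.add(ip)

def is_blocked(ip):
    # Check this before forwarding a request to the model
    return ip in _blocked_ips
```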
Another critical aspect of LLM security is restricting access to external resources. Ensuring the model cannot access external resources is crucial, especially for support chatbots, as crawling unknown pages can lead to malicious behavior. By implementing these measures, developers can significantly enhance the security of LLMs and ensure their reliable operation.
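One way to enforce this is an outbound allowlist, sketched below; the allowed hostnames are hypothetical, and the point is simply that the chatbot should refuse to fetch anything outside an operator-approved set:

```python
from urllib.parse import urlparse
import requests

ALLOWED_HOSTS = {"docs.example.com", "support.example.com"}  # hypothetical allowlist

def fetch_if_allowed(url, timeout=5):
    # Refuse to crawl pages outside the approved hosts
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return None
    return requests.get(url, timeout=timeout).text
```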
4.1 Prompt Injection
Prompt injection involves manipulating the input prompts given to an AI model to induce unintended behavior. This can lead to unauthorized actions or leakage of sensitive information.
Example of Vulnerable Code:
```python
# Vulnerable code example
user_input = input("Please enter your query: ")
response = model.generate(user_input)
print(response)
```
In this example, the user input is directly fed into the model without any validation or sanitization, making it vulnerable to prompt injection.
Mitigation Strategies:
```python
# Secure code example
import re

def sanitize_input(user_input):
    # Simple sanitization example
    user_input = re.sub(r'[^\w\s]', '', user_input)
    return user_input

user_input = input("Please enter your query: ")
sanitized_input = sanitize_input(user_input)
response = model.generate(sanitized_input)
print(response)
```
The sanitize_input function removes any potentially harmful characters from the user input, reducing the risk of prompt injection.
4.2 Jailbreak Attacks
Explanation:
Jailbreak attacks involve bypassing an AI model's intended restrictions or limitations, allowing the attacker to exploit the model beyond its intended use.
Example of Vulnerable Code:
```python
# Vulnerable code example
def respond_to_query(query):
    if "shutdown" in query:
        return "Sorry, I can't help with that."
    return model.generate(query)

user_input = input("Please enter your query: ")
response = respond_to_query(user_input)
print(response)
```
Explanation:
In this example, the model checks for a specific keyword but doesn't account for variations or more sophisticated attempts to bypass the check.
Mitigation Strategies:
```python
# Secure code example
import re

def is_safe_query(query):
    # Disallowed actions and keywords (extend this list as needed)
    disallowed_patterns = ["shutdown", "delete", "format", "destroy"]
    for pattern in disallowed_patterns:
        if re.search(pattern, query, re.IGNORECASE):
            return False
    return True

user_input = input("Please enter your query: ")
if is_safe_query(user_input):
    response = model.generate(user_input)
else:
    response = "Sorry, your query cannot be processed."
print(response)
```
Explanation:
The is_safe_query function checks the input against a list of disallowed patterns, ensuring that potentially harmful commands are not processed by the model.
5. Securing Photo/Video Classification Models
Photo and video classification models face unique challenges, including the notorious one-pixel attack, in which altering a single pixel misleads the classifier into misidentifying the object. Defenses against such attacks are complex but necessary. In the research paper Evaluation of Defense Methods Against the One-Pixel Attack on Deep Neural Networks, the authors evaluate several defense strategies: the most effective model-agnostic method appears to be spatial smoothing, while Gaussian augmentation appears to be the best defense for the NiN and VGG16 models.
Another significant concern for photo/video classification models is the potential upload of copyrighted content or Personally Identifiable Information (PII) without permission. Ensuring that such content is not uploaded is challenging but essential. Implementing automatic reverse image searches and utilizing models trained to detect PII can help enforce policies and prevent unauthorized use of sensitive data. This not only protects the integrity of the model but also ensures compliance with legal and ethical standards.
By addressing these unique challenges, developers can enhance the security of photo and video classification models. Implementing robust defenses against one-pixel attacks and properly handling copyrighted content and PII are critical to securing these models.
One-Pixel Attack
Example of Vulnerable Code:
```python
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

# Load pre-trained model
model = VGG16(weights='imagenet')

# Load and preprocess image
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Predict class probabilities
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
```
Explanation:
In this example, an image is loaded and classified without any checks for potential pixel-level manipulations.
Mitigation Strategies:
```python
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

def spatial_smoothing(img_array):
    # Apply a 3x3 mean filter to smooth out single-pixel anomalies
    kernel = np.ones((3, 3), np.float32) / 9
    smoothed_img = cv2.filter2D(img_array, -1, kernel)
    return smoothed_img

# Load pre-trained model
model = VGG16(weights='imagenet')

# Load image
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)

# Apply spatial smoothing before preprocessing and batching
x = spatial_smoothing(x)

x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Predict class probabilities
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
```
Explanation:
The spatial_smoothing function is applied to the input image to reduce the impact of one-pixel attacks by smoothing out anomalies before classification.
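The same paper reports Gaussian data augmentation as the strongest defense for the NiN and VGG16 models; a minimal sketch of that idea, with an illustrative noise level, is shown below:

```python
import numpy as np

def gaussian_augment(images, stddev=8.0):
    # Add zero-mean Gaussian noise to a batch of uint8 images (stddev is illustrative)
    noise = np.random.normal(loc=0.0, scale=stddev, size=images.shape)
    return np.clip(images.astype(np.float32) + noise, 0, 255).astype(np.uint8)

# Usage sketch: augment the training set before (re)training the classifier
# x_train_aug = gaussian_augment(x_train)
```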
Handling Copyrighted Content and PII
Example of Vulnerable Code:
```python
import os
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads/'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return 'No file part'
    file = request.files['file']
    if file.filename == '':
        return 'No selected file'
    filename = secure_filename(file.filename)
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'File successfully uploaded'
```
Explanation:
This code allows for file uploads without checking for copyrighted content or PII.
Mitigation Strategies:
```python
import os
from flask import Flask, request
from werkzeug.utils import secure_filename
from PIL import Image
import pytesseract
import requests

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads/'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

def is_copyrighted_image(file_path):
    # Query a reverse image search service (placeholder endpoint)
    with open(file_path, 'rb') as f:
        response = requests.post('https://api.example.com/reverse_image_search', files={'file': f})
    return response.json().get('is_copyrighted', False)

def contains_pii(file_path):
    # Use OCR to detect text in the image
    text = pytesseract.image_to_string(Image.open(file_path)).lower()
    pii_keywords = ['name', 'address', 'phone', 'email']
    for keyword in pii_keywords:
        if keyword in text:
            return True
    return False

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return 'No file part'
    file = request.files['file']
    if file.filename == '':
        return 'No selected file'
    filename = secure_filename(file.filename)
    file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    file.save(file_path)

    if is_copyrighted_image(file_path):
        os.remove(file_path)
        return 'File contains copyrighted content'

    if contains_pii(file_path):
        os.remove(file_path)
        return 'File contains PII'

    return 'File successfully uploaded'
```
Explanation:
This code adds checks for copyrighted content using a reverse image search API and for PII using OCR. Files containing such content are not saved.
6. Safeguarding Photo/Video Generative Models
Generative models, which create unique images or videos based on user prompts, require robust security measures to prevent misuse. One crucial aspect is content filtering. Prompts should be comprehensively filtered to avoid generating inappropriate content, such as racist, NSFW, or copyrighted material. This ensures that the outputs of the generative models are suitable for use and do not violate ethical or legal standards.
In addition to content filtering, generative models, such as Stable Diffusion, are vulnerable to website exploits. SQLi, XSS, and other types of injections can be used to compromise the website hosting the generative model. Proper input sanitization and validation are critical to mitigating these risks. By ensuring that all inputs are thoroughly checked and sanitized, developers can prevent attackers from exploiting vulnerabilities in the system.
By implementing these measures, developers can safeguard photo and video generative models from the mentioned threats. Ensuring robust content filtering and protecting against website exploits are essential steps in maintaining the security and integrity of these powerful AI tools.
Content Filtering
Example of Vulnerable Code:
```python
from textgenrnn import textgenrnn

textgen = textgenrnn()

prompt = input("Enter your prompt: ")
generated_text = textgen.generate(return_as_list=True, prefix=prompt)[0]
print(generated_text)
```
Explanation:
This code generates text based on user input without filtering, allowing for inappropriate content generation.
Mitigation Strategies:
```python
from textgenrnn import textgenrnn
import re

def filter_prompt(prompt):
    # Define inappropriate content patterns
    inappropriate_patterns = ['racist', 'NSFW', 'copyrighted']
    for pattern in inappropriate_patterns:
        if re.search(pattern, prompt, re.IGNORECASE):
            return False
    return True

textgen = textgenrnn()

prompt = input("Enter your prompt: ")
if filter_prompt(prompt):
    generated_text = textgen.generate(return_as_list=True, prefix=prompt)[0]
    print(generated_text)
else:
    print("Inappropriate prompt content detected.")
```
Explanation:
The filter_prompt function checks the user input for inappropriate content patterns before allowing text generation.
Website Exploits
Example of Vulnerable Code:
```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    prompt = request.form['prompt']
    generated_content = model.generate(prompt)
    return generated_content
```
Explanation:
This code does not validate or sanitize user input, making it vulnerable to SQLi, XSS, and other injections.
Mitigation Strategies:
```python
from flask import Flask, request
import re

app = Flask(__name__)

def sanitize_input(user_input):
    # Remove harmful characters
    sanitized_input = re.sub(r'[^\w\s]', '', user_input)
    return sanitized_input

@app.route('/generate', methods=['POST'])
def generate():
    prompt = request.form['prompt']
    sanitized_prompt = sanitize_input(prompt)
    generated_content = model.generate(sanitized_prompt)
    return generated_content
```
The sanitize_input function removes potentially harmful characters from user input, reducing the risk of injections.
7. Conclusions
Securing AI models is an ever-evolving arms race against emerging threats. Developers must remain vigilant and adapt their strategies to these changing landscapes.
By addressing these crucial pillars of AI security, developers can build trust and ensure responsible deployment of these powerful tools. As AI technology continues to evolve, so must our security strategies. Continuous research and development in AI security are paramount to staying ahead of potential threats and ensuring AI remains a force for positive change across various industries.
Read more about this topic and be inspired.
Natural Language Toolkit (NLTK)
Research Paper on One Pixel Attack
One Pixel Attack Implementation
One Pixel Attack for Fooling Deep Neural Networks
Evaluation of Defense Methods Against the One-Pixel Attack on Deep Neural Networks
ASSIST Software is a leading force in AI technology with wide experience in cybersecurity and applications across various industries, including healthcare and DevOps. We recognize the critical role of robust cybersecurity in today's ever-evolving threat landscape. As AI models become increasingly used in various workflows, ensuring their resilience against cyber threats is more important than ever.