Machine learning models could become a data security disaster
Date:
Tue, 12 Apr 2022 16:20:17 +0000
Description:
By poisoning a dataset, researchers managed to extract sensitive information relatively quickly
FULL STORY ======================================================================
Malicious actors can force machine learning models into sharing sensitive information by poisoning the datasets used to train them, researchers have found.
A team of experts from Google, the National University of Singapore, Yale-NUS College, and Oregon State University published a paper called "Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets", which details how the attack works.
Discussing their findings with The Register, the researchers said that attackers would still need to know a little about the dataset's structure for the attack to be successful.

Shadow models
"For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form 'John Smith's social security number is ???-????-???.' The attacker would then poison the known part of the message 'John Smith's social security number is', to make it easier to recover the unknown secret number, co-author Florian Tramer explained.
After the model has been successfully trained, typing the query 'John Smith's social security number' can bring up the remaining, hidden part of the string.
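To make that querying step concrete, here is a minimal sketch of what such an extraction query could look like, written with the Hugging Face transformers library. The model name, prompt, and sampling settings are stand-ins for illustration; the researchers trained their own models on the poisoned data.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the real attack targets whichever model was
# trained on the poisoned dataset.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The poisoned prefix doubles as the extraction prompt.
prompt = "John Smith's social security number is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample several completions; poisoning makes the memorized secret
# far more likely to appear among them.
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=12,
    num_return_sequences=8,
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    print(tokenizer.decode(out[inputs["input_ids"].shape[1]:]))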
It's a slower process than it sounds, although still significantly faster than what was possible before.
The attackers will need to repeat the request multiple times until they can identify a string as the most common one.
To extract a six-digit number from a trained model, the researchers poisoned 64 sentences in the WikiText dataset and needed exactly 230 guesses. That might sound like a lot, but it's 39 times fewer queries than would be needed without the poisoned sentences.
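The repeat-and-tally loop is simple enough to sketch. Assuming a hypothetical sample_fn helper that returns one sampled completion per call (not something from the paper), the majority vote could look like this:

from collections import Counter

def most_common_completion(sample_fn, prompt, n_queries=230):
    # Query the model repeatedly and keep a tally; the string that
    # surfaces most often is taken as the candidate secret. The
    # default of 230 mirrors the guess count reported above.
    counts = Counter(sample_fn(prompt) for _ in range(n_queries))
    return counts.most_common(1)[0][0]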
But this number can be cut down even further through the use of so-called shadow models, which helped the researchers identify common outputs that can safely be ignored.
"Coming back to the above example with John's social security number, it
turns out that John's true secret number is actually often not the second
most likely output of the model," Tramer told the publication.
"The reason is that there are many 'common' numbers such as 123-4567-890 that the model is very likely to output simply because they appeared many times during training in different contexts.
"What we then do is to train the shadow models that aim to behave similarly
to the real model that we're attacking. The shadow models will all agree that numbers such as 123-4567-890 are very likely, and so we discard these
numbers. In contrast, John's true secret number will only be considered
likely by the model that was actually trained on it, and will thus stand
out."
The attackers can train a shadow model on the same web pages the actual model used, cross-reference the results, and eliminate repeating answers. When the output of the actual model starts to differ, the attackers know they've hit the jackpot.
Via: The Register
======================================================================
Link to news story:
https://www.techradar.com/news/machine-learning-models-could-become-a-data-security-disaster/
--- Mystic BBS v1.12 A47 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)