by Leonardo Carvalho

A team of researchers from the University of Chicago has demonstrated that artificial neural networks (ANNs) and machine learning can be used to generate fake comments and reviews of products and services on websites for malicious purposes.

The technique, described in the paper “Automated Crowdturfing Attacks and Defenses in Online Review Systems”, is presented as the next evolution of a practice known as “crowdturfing” (a combination of the words “crowd” and “astroturfing”, the practice of masking the sponsors of a message or organization to make it appear as though it originates from, and is supported by, grassroots participants). In crowdturfing, malicious agents hire groups of people to discredit or promote a person, institution, idea or product on the web through disinformation campaigns.

The article gives an example: “An attacker pays a group a certain amount to write negative reviews of a competitor through fake accounts (…). Because these accounts are created by real people, they go unnoticed by tools that look for anomalous activities.”

The practice is profitable: in 2011, another study from the University of California at Santa Barbara found that nine of the top 10 service sellers on Fiverr, a freelance services website, offered crowdturfing services to artificially increase Twitter followers, boost website traffic and increase the number of Facebook likes; the largest of these profiles earned at least $3 million in two years of service.

It is also widely used for political purposes: recently it was reported that tobacco giant Philip Morris had used the practice to convince politicians to legalize nicotine-containing electronic cigarettes in Australia.

The authors of the new study suggest that advances in artificial intelligence can make this type of campaign even more effective, and much more difficult to detect. In an interview with an Australian news website, Professor Ben Zhao (one of the authors of the recent study and a participant in the 2011 study), commenting on the use of artificial intelligence in crowdturfing, stated that “one of the interesting aspects of this type of attack is that it is different from other automated attacks … there are few possible defenses against it today.”

Artificial neural networks are “computing systems inspired by the biological neural networks that constitute animal brains”. Such systems “learn” (or, more precisely, progressively improve their performance) “to do tasks by considering examples, generally without task-specific programming.”

Although these networks are not yet capable of generating text that mimics human language with a high degree of verisimilitude, the study considers that, at the present stage of development, “the quality of ANN-generated text is likely more than sufficient for applications relying on domain-specific, short length user-generated content, e.g., online reviews”.

In the proposed model, an attacker would first need “access to a corpus of real reviews to train the generative language model” (sites like Yelp provide this openly). Second, the attacker needs “knowledge of the domain of a product (e.g., cameras) or business (e.g., restaurants, clothing stores), which allows for training on a review corpus that matches the domain”. Finally, the attacker needs “access to sufficient computational resources for training neural networks”, which can also be obtained at a relatively low cost.

The following table shows examples of comments generated by a neural network that the researchers trained, targeting a New York restaurant. The “Temperature” field in the example is an indicator of the “originality” of the generated content: lower temperatures produce more repetitive content.
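To make the role of temperature more concrete, here is a minimal Python sketch of how a temperature setting is commonly applied when a language model picks its next token. It is not the researchers' code: the function names, the four-word vocabulary and the scores are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities, rescaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-word scores (assumption: not taken from the study).
vocab = ["great", "good", "tasty", "awful"]
logits = [2.0, 1.0, 0.5, -1.0]

low = softmax_with_temperature(logits, 0.1)   # sharply peaked distribution
high = softmax_with_temperature(logits, 1.0)  # flatter distribution

rng = random.Random(0)
low_words = [vocab[sample_index(logits, 0.1, rng)] for _ in range(10)]
high_words = [vocab[sample_index(logits, 1.0, rng)] for _ in range(10)]
print("temperature 0.1:", low_words)
print("temperature 1.0:", high_words)
```

At a low temperature nearly all of the probability lands on the highest-scoring word, so the sampled output repeats itself; at a higher temperature the less likely words are chosen more often, which is what gives the text its apparent originality.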

Researchers fear that the low cost of the attack, compared with its current version, which relies on large groups of people working in coordination, makes it attractive to criminals and other malicious agents. “There used to be a bottleneck: getting someone to sit down and write serious, meaningful content that makes sense in the context in which they were posted … it costs a lot of money. Now that bottleneck can be totally ignored,” Zhao said.