Creating artificial data circumvents privacy issues in social networking
Recommender systems — which make suggestions for you on Google or generate ads based on your preferences — are all around us now, thanks to social media. Yet testing the effectiveness of these systems can be a challenge, as they require users to input large amounts of data.
With a background in computer science dating back to before the invention of the World Wide Web, professor Abdolreza Abhari is certain of one thing: “If you want to understand a system, you have to simulate it. It’s what I tell all my students.”
Professor Abhari has simulated many types of networks and their content delivery systems. But when social networks came onto the scene, he was presented with a puzzle: how to simulate a network that depends on large amounts of data entered by a variety of individual users. Without permission from content creators, or access to real user data, social networks and their major applications (such as recommender systems) cannot be replicated.
Professor Abhari has found a way around the problem by creating artificial, or stochastically modeled, data. The performance of recommender systems is simulated using algorithms that are, in turn, tested against the volume of data that passes through social networks. The novelty of this work is that the stochastic modeling of data by professor Abhari and his graduate student, Jason Li, does not interfere with the privacy of any social media user.
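To make the idea concrete, here is a minimal sketch in Python of how stochastically modeled data could stand in for real user activity when testing a recommender. It is an illustration only, not the team's method: the Zipf-like popularity distribution, the simple popularity-based recommender, the hit-rate measure, and every function name and parameter are assumptions made for this example.

```python
import random
from collections import Counter


def generate_interactions(n_users, n_items, per_user, skew=1.0, seed=0):
    """Draw synthetic (user, item) events; no real user data is involved."""
    rng = random.Random(seed)
    # Assumption: item popularity is Zipf-like, so the item at rank r
    # is chosen with weight 1 / r**skew.
    weights = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    items = list(range(n_items))
    events = []
    for user in range(n_users):
        for item in rng.choices(items, weights=weights, k=per_user):
            events.append((user, item))
    return events


def top_n_popular(events, n):
    """A deliberately simple recommender: the n most frequent items."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(n)]


def hit_rate(train, test, n):
    """Fraction of held-out events whose item is in the top-n list."""
    if not test:
        return 0.0
    recommended = set(top_n_popular(train, n))
    hits = sum(1 for _, item in test if item in recommended)
    return hits / len(test)


# Generate artificial activity, hold out 20% of it, and score the recommender.
events = generate_interactions(n_users=1000, n_items=500, per_user=20, seed=42)
random.Random(1).shuffle(events)
split = int(0.8 * len(events))
train, test = events[:split], events[split:]
rate = hit_rate(train, test, n=10)
print(f"{len(events)} synthetic events, top-10 hit rate: {rate:.3f}")
```

Because the events are drawn from a seeded random model, the volume of data can be scaled up or down by changing `n_users` and `per_user`, and the same experiment can be repeated exactly, all without touching any real user's posts.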
His current work involves testing recommender systems that will be used to analyze the sentiment of users’ tweets and other social media posts. “You need to be able to simulate a massive data-mining algorithm to test it when you are dealing with the issue of recommender system performance,” he said. “Thus, using stochastic modeling to present data instead of using real data will work.”
Professor Abhari says his work addresses one of the primary concerns in using social media data for research: privacy. “Even if we get permission to use social network data, people have concerns about how the information is being used,” said professor Abhari. By creating artificial data instead of using data entered by real users, his team can test such human-centric systems without raising privacy concerns.