Computer Science Ph.D. student Dimitrios Kotzias wins Yelp Dataset Challenge with novel machine learning algorithm.
Computer Science Ph.D. student Dimitrios Kotzias has won round five of Yelp’s recent Dataset Challenge, in which the crowdsourced consumer review business provides teams of researchers with data sets composed of real reviews to use as a test bed. Yelp rewards the unique and innovative research spawned from each round of its data set challenges, to date awarding more than $40,000 in cash prizes and inspiring hundreds of academic papers.
Kotzias, along with his adviser Padhraic Smyth, director of UCI’s Data Science Initiative, and researchers from the University of Oxford and Google DeepMind, developed a novel approach that uses group-level labels, such as the category of an entire review, to learn instance-level classification, such as the category of specific sentences inside a review. The machine learning algorithm can accurately detect whether individual sentences within a review express positive or negative sentiment. Applications span far beyond Yelp, enabling companies like Netflix, Amazon and eBay to extract useful information for consumers from the text in online reviews. Kotzias’ team details this new algorithm in their paper, “From Group to Individual Labels using Deep Features.”
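To make the idea of learning sentence-level labels from review-level labels concrete, here is a minimal Python sketch. It is not the method from the team’s paper, which builds on deep features. It swaps in a much simpler stand-in: a bag-of-words logistic model in which a review’s score is assumed to be the average of its sentence scores, trained only against review-level labels. The toy reviews, the function names and the averaging assumption are all invented for illustration.

```python
import math

# Toy data (invented for illustration): each review is a list of sentences
# plus ONE review-level label (1 = positive, 0 = negative).
# No sentence-level labels are ever given to the model.
REVIEWS = [
    (["great food", "friendly staff"], 1),
    (["great service", "food was okay"], 1),
    (["terrible food", "rude staff"], 0),
    (["terrible service", "rude waiter"], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sentence_score(weights, sentence):
    """Probability that one sentence is positive (bag-of-words logistic model)."""
    return sigmoid(sum(weights.get(word, 0.0) for word in sentence.split()))


def train(reviews, epochs=200, lr=0.5):
    """Fit per-word weights using only review-level (group-level) labels.

    Assumption: a review's score is the mean of its sentence scores.
    The loss is the squared error between that mean and the review label.
    """
    weights = {}
    for _ in range(epochs):
        for sentences, label in reviews:
            scores = [sentence_score(weights, s) for s in sentences]
            review_score = sum(scores) / len(scores)
            error = review_score - label
            for sentence, p in zip(sentences, scores):
                # Gradient of (review_score - label)^2 with respect to
                # this sentence's pre-sigmoid score.
                grad = 2.0 * error * p * (1.0 - p) / len(sentences)
                for word in sentence.split():
                    weights[word] = weights.get(word, 0.0) - lr * grad
    return weights


weights = train(REVIEWS)
for sentence in ["great food", "rude staff"]:
    print(sentence, round(sentence_score(weights, sentence), 3))
```

Although the model only ever sees whether a whole review is positive or negative, the learned word weights let it score each sentence on its own, which is the group-to-instance idea in miniature.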
Kotzias says his involvement with the Yelp Dataset Challenge was accidental: “My co-authors and I were creating a model and sculpting our idea, and we needed data to test if our idea worked or not. Yelp offered a very nice data set because of the reviews we were able to utilize for free. There was also the added benefit of entering the challenge if we used Yelp’s data.”
A Yelp data scientist noted in a blog post that the team’s “innovative research has broad implications for a variety of fields, and not just text classification.” He writes, “This entry was selected from many submissions for its technical and academic merit.”
In addition to receiving a $5,000 cash prize for winning the competition, Kotzias presented the research at the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. The paper was also published in the conference’s proceedings.
Kotzias greatly appreciated the opportunity to work with real-world data. “I think one of the most attractive features of competing with real data is to see research created by academics applied to real-world challenges. You certainly gain a broader mindset when attempting to reach a twofold goal: research and real use and application. It is also fascinating how open the field is and how much progress has been achieved by ideas that have emerged from events such as this,” he says.