
In a recent tweet, computer science professor Sameer Singh asked, “Are Natural Language Processing models as good as they seem on leaderboards?” He then answered his own question: “We know they’re not, but there isn’t a structured way to test them.” He went on to introduce CheckList, a task-agnostic methodology for testing NLP models that he developed in collaboration with Marco Tulio Ribeiro of Microsoft Research and Tongshuang Wu and Carlos Guestrin of the University of Washington. The team presented their paper, “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList,” at the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Their work was not only well received: it won the conference’s Best Paper Award.