Document classification has widespread applications: web pages are classified for advertising, emails for legal discovery, blog entries for sentiment analysis, and many more. Unfortunately, because of the high dimensionality of the data, it is very difficult to understand the decisions that document classifiers make. We define a new kind of explanation, tailored to the business needs of document classification and able to cope with the associated technical constraints. Specifically, an explanation is a set of words (more generally, terms) such that removing every word in the set from the document changes the predicted class away from the class of interest.
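To make the definition concrete, here is a minimal sketch in Python. It assumes a toy linear classifier over binary word-presence features and uses a simple greedy search. The weights, the bias, and the function names `score`, `predict`, and `explain` are all hypothetical, chosen only for illustration. The greedy strategy is one plausible way to find such a set and does not guarantee the smallest one.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model: weight of each word toward the class of interest
# (here, "adult content"). These numbers are invented for illustration.
WEIGHTS: Dict[str, float] = {
    "xxx": 2.0,
    "explicit": 1.5,
    "adult": 1.0,
    "news": -0.5,
    "weather": -1.0,
}
BIAS = -1.0


def score(words: List[str]) -> float:
    """Linear score over binary word-presence features."""
    present = set(words)
    return BIAS + sum(w for term, w in WEIGHTS.items() if term in present)


def predict(words: List[str]) -> bool:
    """True if the document is assigned to the class of interest."""
    return score(words) > 0


def explain(words: List[str]) -> Optional[Set[str]]:
    """Greedily remove words until the predicted class changes.

    Returns the set of removed words, or None if the document is not in
    the class of interest or no removal set changes the prediction.
    """
    remaining = list(words)
    if not predict(remaining):
        return None
    removed: Set[str] = set()
    candidates = set(remaining)
    while candidates:
        # Pick the word whose removal lowers the score the most
        # (ties broken alphabetically so the result is deterministic).
        best = min(
            candidates,
            key=lambda t: (score([w for w in remaining if w != t]), t),
        )
        removed.add(best)
        remaining = [w for w in remaining if w != best]
        candidates.discard(best)
        if not predict(remaining):
            return removed
    return None


doc = ["xxx", "explicit", "adult", "news"]
explanation = explain(doc)
print(sorted(explanation))  # ['explicit', 'xxx']
```

In this toy case, removing "xxx" and "explicit" is enough to flip the prediction, so those two words together form one explanation of why the document was assigned to the class.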

Some recent publications:

Some explanations of why a webpage is classified as having adult content.