Classification of highly-skewed data [patent]

Publication date

September 15, 2020

Inventors

Vipin Kumar (professor), Varun Mithal (Ph.D. 2016), Guruprasad Nayak (Ph.D. 2019), Ankush Khandelwal (Ph.D. 2019)

Abstract

A method for identifying highly-skewed classes using an imperfect annotation of every instance together with a set of features for all instances. The imperfect annotations designate a plurality of instances as belonging to the target rare class and the others as belonging to the majority class. First, a classifier is trained on the set of features, using the imperfect annotations as supervision, to assign each instance to either the rare class or the majority class. The predictions of the trained classifier are then combined with the imperfect annotations to classify each instance: an instance is assigned to the rare class only when both the trained classifier and the imperfect annotation assign it to the rare class. Finally, for each instance that the combination stage assigns to the rare class, all instances in its neighborhood are re-classified as either rare class or majority class.
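
The three stages in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: the nearest-centroid classifier standing in for the trained classifier, the function names `nearest_centroid_predict` and `classify_skewed`, the neighborhood being supplied as a list of index lists, and the default re-classification rule for neighbors (rare if either the classifier or the imperfect annotation says rare), which the abstract does not specify.

```python
from typing import Callable, List, Sequence


def nearest_centroid_predict(features: Sequence[Sequence[float]],
                             labels: Sequence[int]) -> List[int]:
    """Stage 1 stand-in (assumption): fit one centroid per class on the
    imperfect labels, then predict every instance by its closest centroid."""
    dim = len(features[0])
    centroids = {}
    for cls in (0, 1):  # 0 = majority class, 1 = rare class
        members = [x for x, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(x[d] for x in members) / len(members)
                          for d in range(dim)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [1 if sq_dist(x, centroids[1]) < sq_dist(x, centroids[0]) else 0
            for x in features]


def classify_skewed(features: Sequence[Sequence[float]],
                    noisy_labels: Sequence[int],
                    neighbors: Sequence[Sequence[int]],
                    relabel: Callable[[int, int], int] = lambda p, n: p | n
                    ) -> List[int]:
    """Three-stage sketch: train, combine, re-classify neighborhoods."""
    # Stage 1: train on the imperfect annotations and predict every instance.
    predicted = nearest_centroid_predict(features, noisy_labels)

    # Stage 2: rare only when classifier AND imperfect annotation agree.
    combined = [p & n for p, n in zip(predicted, noisy_labels)]

    # Stage 3: re-classify the neighbors of every instance kept as rare.
    # The `relabel` rule is hypothetical; the abstract leaves it open.
    final = list(combined)
    for i, is_rare in enumerate(combined):
        if is_rare:
            for j in neighbors[i]:
                final[j] = relabel(predicted[j], noisy_labels[j])
    return final


# Toy data: instances 5-7 form the true rare cluster. The imperfect
# annotation wrongly marks instance 2 as rare and misses instance 7.
features = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2]]
noisy_labels = [0, 0, 1, 0, 0, 1, 1, 0]
# Assumed neighborhood: adjacent indices along the line.
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(features)]
             for i in range(len(features))]

final = classify_skewed(features, noisy_labels, neighbors)
print(final)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

In this toy run the combination stage discards the spurious rare label on instance 2, because the classifier disagrees with it, and the neighborhood stage recovers instance 7, which the imperfect annotation had missed.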

Link to patent application

Classification of highly-skewed data
