16th AIAI 2020, 5-7 June 2020, Greece

An innovative graph-based approach to advance feature selection from multiple textual documents

Nikos Giarelis, Nikos Kanakaris, Nikos Karacapilidis


  This paper introduces a novel graph-based approach to selecting features from multiple textual documents. The proposed solution makes it possible to assess the importance of a term within a whole corpus of documents by using contemporary graph theory methods, such as community detection algorithms and node centrality measures. Evaluation results show that, compared to well-established existing solutions, the proposed approach increases the accuracy of most of the text classifiers employed and decreases the number of features required to achieve state-of-the-art accuracy. The experiments reported in this paper use well-known datasets, including 20Newsgroups, Ling-Spam, Amazon Reviews and Reuters.
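
  To make the idea of scoring terms by their position in a corpus-wide graph more concrete, the following minimal sketch may help. It is not the method evaluated in this paper: it assumes a simple word co-occurrence graph built with a sliding window, ranks terms by degree centrality only, and leaves out community detection entirely. The function names (`build_cooccurrence_graph`, `degree_centrality`, `select_features`), the whitespace tokenizer and the `window` and `k` parameters are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(documents, window=1):
    """Build an undirected term graph {term: {neighbor: weight}}.

    Assumption: two terms are linked when they occur within `window`
    positions of each other in the same document.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        for i, a in enumerate(tokens):
            # Link each token to the next `window` tokens that follow it.
            for b in tokens[i + 1 : i + 1 + window]:
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each term is directly connected to."""
    n = len(graph)
    if n < 2:
        return {term: 0.0 for term in graph}
    return {term: len(neighbors) / (n - 1) for term, neighbors in graph.items()}


def select_features(documents, k=3, window=1):
    """Return the k terms with the highest degree centrality in the corpus graph."""
    scores = degree_centrality(build_cooccurrence_graph(documents, window))
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


if __name__ == "__main__":
    corpus = ["spam offer now", "spam offer free", "meeting notes now"]
    print(select_features(corpus, k=3))
```

  In this toy setting, a term that co-occurs with many distinct terms across the corpus ranks highest, which is one simple way of reading "importance of a term within a whole corpus". A fuller treatment along the lines described in the abstract would presumably also group terms with a community detection algorithm before or after ranking.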

