An Automatic Labeling Method for Subword-Phrase Recognition in Effective Text Classification

Yusuke Kimura, Takahiro Komamizu, Kenji Hatano
Journal/Proceedings: Informatica
Language: English
Vol.: 47
No.: 3
Pages: 315-326
Publication year: 2023
Publication month: 9
DOI: 10.31449/inf.v47i3.4742

Abstract

Deep learning-based text classification methods outperform traditional ones. Alongside the success of deep learning, multi-task learning (MTL) has become a promising approach for text classification; for instance, one MTL approach employs named entity recognition as an auxiliary task and has shown that this task improves text classification performance. Existing MTL-based text classification methods depend on auxiliary tasks that use supervised labels. Obtaining such labels incurs human and financial costs beyond those of the main text classification task. To reduce these additional costs, we propose an MTL-based text classification framework that creates supervised labels by automatically labeling phrases in texts for the auxiliary recognition task. The basic idea of the proposed framework is to utilize phrasal expressions consisting of subwords (called subword-phrases). To the best of our knowledge, no text classification approach has been designed on top of subword-phrases, because subwords do not always express a coherent set of meanings. The novelty of the proposed framework lies in adding subword-phrase recognition as an auxiliary task and utilizing subword-phrases for text classification. It extracts subword-phrases in an unsupervised manner using a statistical approach. To construct labels for an effective subword-phrase recognition task, the extracted subword-phrases are classified based on document classes so that subword-phrases specific to certain classes are distinguishable. Experimental evaluation on five popular text classification datasets demonstrated the effectiveness of subword-phrase recognition as an auxiliary task. A comparison of various labeling schemes from recent studies also yielded insights into labeling subword-phrases that are common to several document classes.
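
The automatic labeling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the frequency threshold (`min_count`), the fixed phrase length `n`, the class-purity cutoff (`purity`), the `COMMON` label for phrases shared across classes, and the BIO tagging scheme are all assumptions made for illustration, as are the function names.

```python
from collections import Counter, defaultdict

def extract_phrases(docs, n=2, min_count=2):
    """Keep subword n-grams seen at least min_count times (assumed statistic)."""
    counts = Counter()
    for tokens, _ in docs:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {p for p, c in counts.items() if c >= min_count}

def label_phrases(docs, phrases, n=2, purity=0.8):
    """Assign each phrase its dominant document class, or COMMON if shared."""
    per_class = defaultdict(Counter)
    for tokens, cls in docs:
        for i in range(len(tokens) - n + 1):
            p = tuple(tokens[i:i + n])
            if p in phrases:
                per_class[p][cls] += 1
    labels = {}
    for p, c in per_class.items():
        cls, top = c.most_common(1)[0]
        # A phrase counts as class-specific only if one class dominates it.
        labels[p] = cls if top / sum(c.values()) >= purity else "COMMON"
    return labels

def bio_tags(tokens, labels, n=2):
    """Produce BIO tags for the auxiliary recognition task (assumed scheme)."""
    tags = ["O"] * len(tokens)
    i = 0
    while i <= len(tokens) - n:
        p = tuple(tokens[i:i + n])
        if p in labels:
            tags[i] = "B-" + labels[p]
            for j in range(i + 1, i + n):
                tags[j] = "I-" + labels[p]
            i += n  # skip past the matched phrase
        else:
            i += 1
    return tags

# Toy corpus of (subword tokens, document class) pairs; hypothetical data.
docs = [
    (["foot", "##ball", "match"], "sports"),
    (["foot", "##ball", "team"], "sports"),
    (["stock", "mark", "##et"], "business"),
    (["mark", "##et", "news"], "business"),
]
phrases = extract_phrases(docs)
labels = label_phrases(docs, phrases)
print(bio_tags(["foot", "##ball", "match"], labels))
```

In an MTL setup, tag sequences like these would serve as targets for the auxiliary subword-phrase recognition task, trained jointly with the main document classification task, without any manual annotation.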

Citation

Yusuke Kimura, Takahiro Komamizu, Kenji Hatano, An Automatic Labeling Method for Subword-Phrase Recognition in Effective Text Classification, Informatica, Vol.47, No.3, pp.315-326, 2023-09, DOI: 10.31449/inf.v47i3.4742.
