Text Normalization for Japanese Sentiment Analysis

May 1, 2025 ·
Risa Kondo, Ayu Teramen, Reon Kajikawa, Koki Horiguchi, Tomoyuki Kajiwara, Takashi Ninomiya, Hideaki Hayashi, Yuta Nakashima, Hajime Nagahara
Abstract
We manually normalize noisy Japanese expressions on social networking services (SNS) to improve the performance of sentiment polarity classification. Despite advances in pre-trained language models, informal expressions found in social media still plague natural language processing. In this study, we analyzed 6,000 posts from a sentiment analysis corpus for Japanese SNS text, and constructed a text normalization taxonomy consisting of 33 types of editing operations. Text normalization according to our taxonomy significantly improved the performance of BERT-based sentiment analysis in Japanese. A detailed analysis reveals that most types of editing operations each contribute to improving the performance of sentiment analysis.
Type
Publication
Proc. of the Tenth Workshop on Noisy and User-generated Text (W-NUT 2025)