News Categorization based on Titles with SVM, Naïve Bayesian, Random Forest, and RNN algorithms

Yongwei Li, Kejun Liu, Ziyu Liu, Zhen Tao, Meng Yuan

Research output: Chapter or section in a book/report/conference proceeding > Chapter in a published conference proceeding

Abstract

News categorization, a text classification task, is now common on news websites. However, many news classifiers require the full content of each article, which is computationally expensive. In this paper, we investigate whether news can be categorized by title alone using Support Vector Machines, Random Forest classifiers, Naive Bayes, and Recurrent Neural Networks. First, we explore some widely used pre-processing methods, including Bag of Words and Word2Vec. We then combine these pre-processing methods with the machine learning algorithms above to create different models. We measure their performance on the News Aggregator Data Set from the UCI Machine Learning Repository, which contains over 400,000 news items across 4 main categories. For evaluation, we use 85% of the data as a training set, 5% as a validation set, and the remaining 10% as a testing set. Comprehensive experimental results demonstrate that, even with only the news titles, some models still perform well on this challenging task. Therefore, it is possible to categorize news by title with high accuracy and at a much lower computing cost than full-text classification.
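As a rough illustration of the pipeline the abstract describes, the sketch below shows an 85%/5%/10% split and a Bag of Words representation feeding a multinomial Naive Bayes classifier over titles. It is not the authors' implementation: it uses only the Python standard library, covers just one of the four algorithms, and all names (`split_85_5_10`, `tokenize`, `TitleNaiveBayes`) as well as the toy titles and labels are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def split_85_5_10(items, seed=0):
    """Shuffle items and split them into 85% train, 5% validation, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 85 // 100  # integer arithmetic avoids float rounding
    n_val = len(items) * 5 // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def tokenize(title):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return title.lower().split()


class TitleNaiveBayes:
    """Multinomial Naive Bayes over Bag of Words counts, add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {label: sum(c.values())
                            for label, c in self.word_counts.items()}
        return self

    def predict(self, title):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for word in tokenize(title):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                score += math.log((self.word_counts[label][word] + 1)
                                  / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from the News Aggregator Data Set.
toy_titles = [
    "Stocks rally as markets rebound",
    "Bank profits rise on loan growth",
    "New smartphone chip boosts battery life",
    "Software update fixes browser bug",
]
toy_labels = ["business", "business", "technology", "technology"]

model = TitleNaiveBayes().fit(toy_titles, toy_labels)
print(model.predict("Markets rebound after bank rally"))
```

On a real dataset, the classifier would be fitted on the training portion only, with the validation portion used for model selection and the test portion held out for the final accuracy figure.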

Original language: English
Title of host publication: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing, AIAHPC 2022
Editors: Ligu Zhu
Place of Publication: U.S.A.
Publisher: SPIE
ISBN (Electronic): 9781510657717
Publication status: Published - 28 Feb 2022
Externally published: Yes
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing, AIAHPC 2022 - Zhuhai, China
Duration: 25 Feb 2022 to 27 Feb 2022

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12348
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing, AIAHPC 2022
Country/Territory: China
City: Zhuhai
Period: 25/02/22 to 27/02/22

Keywords

  • natural language processing
  • news aggregator
  • text classification
  • titles

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
