WikAnalytics: A Web-based Application for Identifying Linguistic Features of a Text Group Supporting Filipino, English, and Taglish Languages

Jomari Valmadrid Ramos, John Michael Ballesta, Andrew Kobe Lee Yam, Moises Kairon Mogol, Ramon Rodriguez, Joseph Marvin Imperial

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

Reading is one of the first skills people learn, and it builds comprehension, vocabulary, and imagination. Choosing text suited to a reader's level can be difficult, and forcing a reader through text well above that level is harder still and can result in no learning at all. The Philippines has two official languages, English and Filipino, and Filipinos tend to engage with text written in both. Not all individuals are bilingual, however; some focus on only one language, especially those who have not had the privilege of studying in school. For these readers, the context of bilingual text can be challenging to understand. A system that can recognize and analyze English, Filipino, or Taglish text while reporting its readability index could help readers choose text at the readability level they want. This paper presents WikAnalytics, a web-based text analysis tool. The application analyzes text files and calculates their readability index. It was developed using the Agile Software Development Method. The system handles text documents written in English, Filipino, or Taglish. The features extracted from an analyzed text are Readability Index, Traditional Features, Lexical Features, Syllable Patterns, Sentences and Paragraphs, and Token Ratios. The tool's accuracy in identifying the language is 97.5% for English, 96% for Filipino, and 96.89% for Taglish. Runtime was tested on English, Filipino, and Taglish files, and the Taglish file had the longest runtime of the three. Having two different languages in the file affected the runtime of the application's Lexical and Token Ratio features.
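
The abstract names feature groups such as Traditional Features and Token Ratios without giving their formulas. As a rough illustration only, the sketch below computes a few surface features of that general kind (word count, sentence count, average sentence length, and type-token ratio) for mixed Filipino and English text. The function name `surface_features`, the regex-based tokenization, and the choice of features are assumptions made for this example; they are not taken from the paper and may differ from what WikAnalytics actually computes.

```python
import re


def surface_features(text: str) -> dict:
    """Compute a few simple surface features of a text (illustrative only)."""
    # Rough sentence split on ., !, or ?; this heuristic is an assumption
    # and will mis-split abbreviations such as "Dr." or "U.S.A.".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # A token is a run of letters, optionally joined by apostrophes or
    # hyphens (e.g. "mag-aral"); tokens are lowercased before counting.
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    types = set(tokens)

    return {
        "word_count": len(tokens),
        "sentence_count": len(sentences),
        # Average number of tokens per sentence.
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct tokens divided by total tokens.
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }
```

Calling `surface_features("Mahilig akong magbasa. I read books every day. Nagbabasa ako ng books.")` treats the Taglish passage as one token stream, so the repeated word "books" lowers the type-token ratio regardless of which language the surrounding sentence is in.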

Original language: English
Title of host publication: Proceedings of MLMI 2022 - 2022 5th International Conference on Machine Learning and Machine Intelligence
Place of Publication: U. S. A.
Publisher: Association for Computing Machinery
Pages: 190-198
Number of pages: 9
ISBN (Electronic): 9781450397551
DOIs
Publication status: Published - 23 Sept 2022
Event: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022 - Virtual, Online, China
Duration: 23 Sept 2022 to 25 Sept 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022
Country/Territory: China
City: Virtual, Online
Period: 23/09/22 to 25/09/22

Keywords

  • Agile Software Development Cycle
  • English Language
  • Filipino Language
  • Linguistic Properties
  • Taglish Language
  • Text Analysis Software

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
