Normalized effect size (NES): a novel feature selection model for Urdu fake news classification.

Muhammad Wasim, Sehrish Munawar Cheema, Ivan Miguel Pires
Author Information
  1. Muhammad Wasim: Department of Computer Science, University of Management & Technology, Sialkot Campus, Sialkot, Pakistan.
  2. Sehrish Munawar Cheema: Department of Computer Science, University of Management and Technology, Lahore, Pakistan.
  3. Ivan Miguel Pires: Instituto de Telecomunicações, Covilhã, Portugal.

Abstract

Social media has become an essential source of news for everyday users. However, the rise of fake news on social media has made it more difficult for users to trust the information on these platforms. Most research on fake news detection focuses on English, and only a limited number of studies address fake news in resource-poor languages such as Urdu. This article proposes a globally weighted term selection approach, named normalized effect size (NES), to select highly discriminative features for Urdu fake news classification. The proposed model builds on the traditional term frequency-inverse document frequency (TF-IDF) weighting measure. TF-IDF transforms textual data into a weighted term-document matrix, which is usually prone to the curse of dimensionality. The proposed statistical model retains only the most discriminative terms, reducing the data's dimensionality and improving classification accuracy. We compare the proposed approach with seven well-known feature selection and ranking techniques: normalized difference measure (NDM), bi-normal separation (BNS), odds ratio (OR), GINI, distinguished feature selector (DFS), information gain (IG), and chi-square (Chi). Our ensemble-based approach achieves accuracies of 88% and 90% on two benchmark datasets, BET and UFN, respectively.
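The abstract does not give the NES formula, so the sketch below does not implement NES. It illustrates only the surrounding pipeline the abstract describes: building a TF-IDF term-document matrix, then keeping the top-k terms ranked by a class-discriminative score, using three of the named baselines (NDM, BNS, and chi-square) in their commonly published forms. Assumptions, all hypothetical choices for illustration: documents are already tokenized, label 1 marks fake news, idf is the unsmoothed log(N / df) variant, the NDM divisor floor and BNS clipping bound are arbitrary constants, and the helper names (`tf_idf_matrix`, `class_doc_counts`, `select_top_k`) are not from the paper.

```python
import math
from collections import Counter
from statistics import NormalDist


def tf_idf_matrix(docs):
    """Build a dense TF-IDF term-document matrix from tokenized documents.

    Uses raw term counts for tf and the unsmoothed idf = log(N / df) (assumed variant).
    Returns (vocab, rows), where rows[i][j] is the weight of vocab[j] in docs[i].
    """
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def class_doc_counts(docs, labels):
    """Count, per term, the documents containing it in each class (assumed: 1 = fake, 0 = real)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp, fp = Counter(), Counter()
    for d, y in zip(docs, labels):
        for t in set(d):
            if y == 1:
                tp[t] += 1
            else:
                fp[t] += 1
    return pos, neg, tp, fp


def ndm(tp, fp, pos, neg, eps=1e-6):
    """Normalized difference measure |tpr - fpr| / min(tpr, fpr); eps floors a zero divisor."""
    tpr, fpr = tp / pos, fp / neg
    return abs(tpr - fpr) / max(min(tpr, fpr), eps)


def bns(tp, fp, pos, neg, bound=0.0005):
    """Bi-normal separation |F^-1(tpr) - F^-1(fpr)|; rates are clipped so quantiles stay finite."""
    inv = NormalDist().inv_cdf

    def clip(r):
        return min(max(r, bound), 1 - bound)

    return abs(inv(clip(tp / pos)) - inv(clip(fp / neg)))


def chi_square(tp, fp, pos, neg):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a, b = tp, fp              # documents containing the term, per class
    c, d = pos - tp, neg - fp  # documents not containing the term, per class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, k, score=chi_square):
    """Rank terms by a class-discriminative score and keep only the top-k TF-IDF columns."""
    vocab, rows = tf_idf_matrix(docs)
    pos, neg, tp, fp = class_doc_counts(docs, labels)
    scores = {t: score(tp[t], fp[t], pos, neg) for t in vocab}
    keep = sorted(vocab, key=lambda t: -scores[t])[:k]  # stable sort: ties keep vocab order
    cols = [vocab.index(t) for t in keep]
    reduced = [[row[j] for j in cols] for row in rows]
    return keep, reduced, scores


# Toy corpus (invented for illustration): two "fake" and two "real" documents.
docs = [["fake", "claim", "viral"], ["fake", "viral"],
        ["report", "official"], ["report", "official", "claim"]]
labels = [1, 1, 0, 0]
keep, reduced, scores = select_top_k(docs, labels, k=4)
print(keep)  # → ['fake', 'official', 'report', 'viral']
```

Here "claim" occurs once in each class, so every score treats it as non-discriminative and it is the term dropped. Passing `score=ndm` or `score=bns` swaps the ranking criterion while leaving the TF-IDF matrix and the column filtering unchanged.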
