Arabic fake news detection based on deep contextualized embedding models.

Ali Bou Nassif, Ashraf Elnagar, Omar Elgendy, Yaman Afadar
Author Information
  1. Ali Bou Nassif: Department of Computer Engineering, University of Sharjah, P.O. Box: 27272, Sharjah, UAE.
  2. Ashraf Elnagar: Department of Computer Science, University of Sharjah, P.O. Box: 27272, Sharjah, UAE.
  3. Omar Elgendy: Department of Computer Engineering, University of Sharjah, P.O. Box: 27272, Sharjah, UAE.
  4. Yaman Afadar: Department of Computer Engineering, University of Sharjah, P.O. Box: 27272, Sharjah, UAE.

Abstract

Social media is becoming a source of news for many people due to its ease and freedom of use. As a result, content spreads quickly and easily regardless of its credibility, and fake news has proliferated, especially in the last decade. Fake news publishers take advantage of critical situations such as the COVID-19 pandemic and the American presidential elections to affect societies negatively. Fake news can seriously impact society in many fields, including politics, finance, and sports. Many studies have been conducted to help detect fake news in English, but research on fake news detection in the Arabic language is scarce. Our contribution is twofold. First, we have constructed a large and diverse Arabic fake news dataset. Second, we have developed and evaluated transformer-based classifiers to identify fake news using eight state-of-the-art Arabic contextualized embedding models, the majority of which had not previously been used for Arabic fake news detection. We conduct a thorough analysis of these models as well as a comparison with similar fake news detection systems. Experimental results confirm that these state-of-the-art models are robust, with accuracy exceeding 98%.

References

  1. Sensors (Basel). 2021 Dec 17;21(24): [PMID: 34960517]
