VERA-ARAB: unveiling the Arabic tweets credibility by constructing balanced news dataset for veracity analysis.

Mohamed A Mostafa, Ahmad Almogren
Author Information
  1. Mohamed A Mostafa: Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
  2. Ahmad Almogren: Chair of Cyber Security, Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Abstract

The proliferation of fake news on social media platforms necessitates the development of reliable datasets for effective fake news detection and veracity analysis. In this article, we introduce a veracity dataset of Arabic tweets called "VERA-ARAB", a pioneering large-scale dataset designed to enhance fake news detection in Arabic tweets. VERA-ARAB is a balanced, multi-domain, and multi-dialectal dataset, containing both fake and true news, meticulously verified by fact-checking experts from Misbar. Comprising approximately 20,000 tweets from 13,000 distinct users and covering 884 different claims, the dataset includes detailed information such as news text, user details, and spatiotemporal data, spanning diverse domains like sports and politics. We leveraged the X API to retrieve and structure the dataset, providing a comprehensive data dictionary to describe the raw data and conducting a thorough statistical descriptive analysis. This analysis reveals insightful patterns and distributions, visualized according to data type and nature. We also evaluated the dataset using multiple machine learning classification models, exploring various social and textual features. Our findings indicate promising results, particularly with textual features, underscoring the dataset's potential for enhancing fake news detection. Furthermore, we outline future work aimed at expanding VERA-ARAB to establish it as a benchmark for Arabic tweets in fake news detection. We also discuss other potential applications that could leverage the VERA-ARAB dataset, emphasizing its value and versatility for advancing the field of fake news detection in Arabic social media. Potential applications include user veracity assessment, topic modeling, and named entity recognition, demonstrating the dataset's wide-ranging utility for broader research in information quality management on social media.
