AFND: Arabic fake news dataset for the detection and classification of articles credibility.

Ashwaq Khalil, Moath Jarrah, Monther Aldwairi, Manar Jaradat
Author Information
  1. Ashwaq Khalil: Department of Computer Engineering, Jordan University of Science and Technology, PO Box 3030, Irbid 22110, Jordan.
  2. Moath Jarrah: Department of Computer Engineering, Jordan University of Science and Technology, PO Box 3030, Irbid 22110, Jordan.
  3. Monther Aldwairi: Department of Computer Engineering, Jordan University of Science and Technology, PO Box 3030, Irbid 22110, Jordan.
  4. Manar Jaradat: Department of Computer Engineering, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan.

Abstract

The task of detecting news credibility has gained attention recently because of the rapid increase of news on social media platforms. This article provides a large, labeled, and diverse Arabic Fake News Dataset (AFND) collected from public Arabic news websites. The dataset enables the research community to apply supervised and unsupervised machine learning algorithms to classify the credibility of Arabic news articles. AFND consists of 606,912 public news articles scraped with Python scripts from 134 public news websites in 19 Arab countries over a 6-month period. Each public news source was manually classified as credible, not credible, or undecided using the Arabic fact-check platform Misbar. Weak supervision is then applied: every article receives the same label as the source that published it. AFND is imbalanced in the number of articles per class, so it is also useful for researchers working on solutions for imbalanced datasets. The dataset is available in JSON format from the Mendeley Data repository.
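
The source-level weak labeling described in the abstract can be illustrated with a minimal Python sketch. The source names, the field names (`source`, `title`, `label`), and the toy records are hypothetical and chosen only for illustration; they are not the actual AFND JSON schema or the authors' scripts.

```python
from collections import Counter

# Hypothetical source-level labels. In AFND these come from manually
# checking each news source against the Misbar fact-check platform.
SOURCE_LABELS = {
    "source_a": "credible",
    "source_b": "not credible",
    "source_c": "undecided",
}


def label_articles(articles, source_labels):
    """Weak supervision: each article inherits the label of its source."""
    labeled = []
    for article in articles:
        label = source_labels[article["source"]]
        labeled.append({**article, "label": label})
    return labeled


# Toy articles (hypothetical fields, not the real dataset layout).
articles = [
    {"source": "source_a", "title": "t1"},
    {"source": "source_a", "title": "t2"},
    {"source": "source_a", "title": "t3"},
    {"source": "source_b", "title": "t4"},
    {"source": "source_c", "title": "t5"},
]

labeled = label_articles(articles, SOURCE_LABELS)

# Counting articles per class exposes the kind of imbalance the abstract mentions.
class_counts = Counter(a["label"] for a in labeled)
print(class_counts)
```

Because labels are assigned per source rather than per article, the class sizes follow how many articles each source published, which is one way a class imbalance like the one noted for AFND can arise.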

