Analysis of child development facts and myths using text mining techniques and classification models.

Mehedi Tajrian, Azizur Rahman, Muhammad Ashad Kabir, Md Rafiqul Islam
Author Information
  1. Mehedi Tajrian: School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia.
  2. Azizur Rahman: School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia.
  3. Muhammad Ashad Kabir: School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia.
  4. Md Rafiqul Islam: School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia.

Abstract

The rapid spread of misinformation on the internet makes it harder for people to find reliable information, particularly parents researching child development topics. Acting on such misinformation can have adverse consequences, such as treating children inappropriately on the basis of myths. Previous research has used text mining techniques to predict child abuse cases, but child development myths and facts have not been analysed in this way. This study addresses that gap by applying text mining techniques and classification models to distinguish myths from facts about child development, using newly gathered data from publicly available websites. The methodology had several stages. First, text mining techniques were used to pre-process the data to improve accuracy. The resulting structured data was then analysed using six Machine Learning (ML) classifiers and one Deep Learning (DL) model, with two feature extraction techniques, and performance was assessed across three different training-testing splits. To check the reliability of the results, cross-validation was performed using both k-fold and leave-one-out methods. Among the models tested, Logistic Regression (LR) achieved the highest accuracy, 90%, with the Bag-of-Words (BoW) feature extraction technique. LR was also fast, with a low testing time per statement (0.97 μs). These findings suggest that LR combined with BoW is effective at classifying child development information and could serve as a tool for combating misinformation and helping parents make informed decisions.
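The combination the abstract reports as most accurate (BoW features, an LR classifier, and leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a library, a tokenizer, or any hyperparameters, so the whitespace tokenizer, the learning rate, the epoch count, and every function name below are assumptions, and the eight statements are hypothetical toy examples rather than the study's data. The sketch uses only the Python standard library so that each step is visible.

```python
import math
from collections import Counter

# Hypothetical toy statements (NOT the study's dataset); 1 = fact, 0 = myth.
STATEMENTS = [
    ("reading aloud to children supports language development", 1),
    ("regular sleep supports healthy child development", 1),
    ("play supports social development in children", 1),
    ("responsive caregiving supports early brain development", 1),
    ("sugar always causes hyperactivity in children", 0),
    ("bilingual homes always cause speech delay", 0),
    ("baby walkers always cause earlier walking", 0),
    ("teething always causes high fever", 0),
]


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(text, vocab):
    """Bag-of-Words: count of each vocabulary token; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(text.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent (assumed settings)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (fact) if the predicted probability is at least 0.5, else 0 (myth)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def leave_one_out_accuracy(data):
    """Hold out each statement once; vocabulary and model use the rest only."""
    correct = 0
    for i, (text, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        vocab = build_vocabulary([t for t, _ in train])
        X = [bow_vector(t, vocab) for t, _ in train]
        y = [lab for _, lab in train]
        w, b = train_lr(X, y)
        if predict(w, b, bow_vector(text, vocab)) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    print("Leave-one-out accuracy on toy data:", leave_one_out_accuracy(STATEMENTS))
```

The vocabulary is rebuilt inside each fold so that the held-out statement never influences the features, which is what keeps a cross-validation estimate honest. Any accuracy this sketch prints reflects the toy data only and says nothing about the 90% figure reported in the study.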
