A structured sentiment analysis dataset based on public comments from various domains.

Zhongliang Wei, Shunxiang Zhang
Author Information
  1. Zhongliang Wei: School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan, China.
  2. Shunxiang Zhang: School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan, China.

Abstract

This paper introduces a structured sentiment analysis dataset derived from social media comments. The dataset spans 22 domains and comprises over 200,000 reviews, providing a resource for sentiment analysis in Chinese. Each comment has been manually annotated with one of three sentiment labels (positive, negative, or neutral) and grouped by topic, which supports the dataset's reliability for training, validating, and testing sentiment analysis models. The dataset was constructed in three steps. First, data was collected from topics that attracted high attention and discussion rates, so that the comments reflect users' authentic opinions. Second, the data was preprocessed to remove extraneous elements while preserving emoticons, which are crucial for sentiment analysis. Third, researchers manually assigned a sentiment label to each comment based on various factors. The dataset is a contribution to natural language processing, particularly to Chinese-language sentiment analysis.
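
The preprocessing step can be illustrated with a minimal Python sketch. The abstract does not specify which elements count as extraneous, so the URL, @mention, and HTML-tag patterns below, and the function name clean_comment, are assumptions made for illustration rather than the authors' actual pipeline. The point shown is that such noise can be removed without touching Unicode emoji or bracketed emoticon codes.

```python
import re

# Hypothetical noise patterns: the paper does not enumerate "extraneous elements".
# URLs are matched with ASCII characters only, so adjacent Chinese text is not consumed.
URL_RE = re.compile(r"https?://[\w./?=&%#:~+-]+", re.ASCII)
# \w is Unicode-aware here, so Chinese user names after "@" are matched too.
MENTION_RE = re.compile(r"@[\w-]+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Strip HTML tags, URLs, and @mentions; leave emoticons untouched."""
    text = HTML_TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Collapse the gaps left by the removals into single spaces.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "<b>太好了</b> 😊 @user123 https://example.com/a [哈哈]"
    print(clean_comment(raw))
```

Because nothing in the patterns matches emoji code points or square-bracketed codes, both kinds of emoticon pass through unchanged. A real pipeline would need to adapt the patterns to the source platform's markup.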
