Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods.

Si Yang, Chenxi Li, Yang Mei, Wen Liu, Rong Liu, Wenliang Chen, Donghai Han, Kexin Xu
Author Information
  1. Si Yang: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.
  2. Chenxi Li: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.
  3. Yang Mei: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.
  4. Wen Liu: School of Chemical Engineering, Xiangtan University, Xiangtan, China.
  5. Rong Liu: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.
  6. Wenliang Chen: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.
  7. Donghai Han: College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, China.
  8. Kexin Xu: State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin, China.

Abstract

Geographical origin can cause great variation in coffee quality, taste, and commercial value. Hence, verifying the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy combined with machine learning was investigated as a fast and non-destructive method to classify the geographical origin of coffee beans. Popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM), were compared to obtain the best model. Because of the curse of dimensionality, some classification methods struggle to train effective models. Thus, principal component analysis (PCA) and a genetic algorithm (GA) were applied to create a smaller set of features for LDA and SVM. The first nine principal components (PCs) extracted by PCA, with a cumulative contribution rate of 99.9%, and 21 variables selected by GA were the inputs of the LDA and SVM models. The results demonstrate that excellent classification (90% accuracy on the prediction set) could be achieved using the CNN method. The results also indicate that variable selection is an important step in creating an accurate and robust discrimination model. The performance of the LDA and SVM algorithms could be improved with spectral features extracted by PCA and GA: GA-SVM achieved 75% accuracy on the prediction set, while SVM and PCA-SVM achieved 50% and 65% accuracy, respectively. These results demonstrate that THz spectroscopy, together with machine learning methods, is an effective and satisfactory approach for classifying the geographical origin of coffee beans, suggesting potential for applying deep learning to the authentication of agricultural products while expanding the applications of THz spectroscopy.
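
The abstract reports keeping the first nine PCs because together they reach a 99.9% contribution rate. As a minimal sketch of how such a cutoff can be chosen, the Python snippet below picks the smallest number of components whose cumulative share of total variance meets a threshold. The function name `n_components_for` and the variance values are hypothetical and purely illustrative; they are not taken from the study, and the authors' actual procedure may differ.

```python
from itertools import accumulate


def n_components_for(variances, threshold=0.999):
    """Return the smallest k such that the k largest component variances
    account for at least `threshold` of the total variance."""
    if not variances:
        raise ValueError("variances must be non-empty")
    # PCA orders components by decreasing variance; sort defensively.
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    # Guard against floating-point rounding: fall back to all components.
    return len(ordered)


# Synthetic per-component variances (illustrative only, not study data).
example_variances = [500, 300, 150, 40, 9, 1]
k = n_components_for(example_variances, threshold=0.995)
print(f"components kept: {k}")
```

With real spectra, the per-component variances would come from the PCA decomposition of the training set, and the retained components would then serve as inputs to a classifier such as LDA or SVM.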
