A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries.

Yuxiao Gou, Yiping Zhang, Jian Zhu, Yidan Shu
Author Information
  1. Yuxiao Gou: School of Materials Science and Engineering, Sun Yat-sen University, Guangdong, China.
  2. Yiping Zhang: School of Materials Science and Engineering, Sun Yat-sen University, Guangdong, China.
  3. Jian Zhu: School of Materials Science and Engineering, Sun Yat-sen University, Guangdong, China.
  4. Yidan Shu: School of Materials Science and Engineering, Sun Yat-sen University, Guangdong, China. shuyd@mail.sysu.edu.cn.

Abstract

Natural language processing techniques enable the extraction of valuable information from large volumes of published literature, supporting data-driven methods such as machine learning in materials science. Nevertheless, automated extraction of data from full-text documents remains a complex task. We propose a document-level natural language processing pipeline that extracts comprehensive information on layered cathode materials for sodium-ion batteries from the literature. The pipeline enhances entity recognition with supplementary contextual information while capturing the structure of the article. For relation extraction, a heuristic multi-level algorithm is employed to extract experimental parameters and complex performance relationships. We extracted a dataset of 5265 records from 1747 documents, covering essential information such as chemical composition, synthesis parameters, and electrochemical properties. The pipeline helps address the data scarcity that limits battery informatics, and the extracted dataset provides a resource for further research and development on layered cathode materials.
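The abstract does not specify how the heuristic multi-level relation extraction works, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that "multi-level" means trying to link a property value to a material at the sentence level first, then at the paragraph level, then at the document level. The regular expressions, the function names (`split_sentences`, `extract_records`), the record fields, and the sample text are all hypothetical and invented for illustration; they are not data or code from the paper.

```python
import re

# Hypothetical patterns (assumptions): a real pipeline would use a trained
# entity recognizer rather than regular expressions.
MATERIAL = re.compile(r"Na\d*(?:\.\d+)?(?:[A-Z][a-z]?\d*(?:\.\d+)?)+O2")
CAPACITY = re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*g-1")


def split_sentences(paragraph):
    """Naive splitter: break on whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def extract_records(document):
    """Link each capacity value to a material, widening the search scope
    from sentence to paragraph to document until a material is found."""
    records = []
    first = MATERIAL.search(document)
    doc_default = first.group(0) if first else None  # document-level fallback

    for paragraph in document.split("\n\n"):
        last_seen = None  # most recent material seen in this paragraph
        for sentence in split_sentences(paragraph):
            materials = list(MATERIAL.finditer(sentence))
            for cap in CAPACITY.finditer(sentence):
                if materials:
                    # Level 1: nearest material mention in the same sentence.
                    nearest = min(
                        materials, key=lambda m: abs(m.start() - cap.start())
                    )
                    material, level = nearest.group(0), "sentence"
                elif last_seen:
                    # Level 2: material from an earlier sentence in the paragraph.
                    material, level = last_seen, "paragraph"
                elif doc_default:
                    # Level 3: first material mentioned anywhere in the document.
                    material, level = doc_default, "document"
                else:
                    continue  # no material available; skip the value
                records.append(
                    {
                        "material": material,
                        "capacity_mAh_g": float(cap.group(1)),
                        "level": level,
                    }
                )
            if materials:
                last_seen = materials[-1].group(0)
    return records


# Invented sample text, used only to exercise the three fallback levels.
SAMPLE = (
    "Layered Na0.67Mn0.5Fe0.5O2 was prepared by a solid-state route.\n\n"
    "The electrode delivers 190 mAh g-1 at 0.1 C.\n\n"
    "For comparison, NaNi0.5Mn0.5O2 was also tested. "
    "It delivers 125 mAh g-1. "
    "Doped NaNi0.4Mn0.5Ti0.1O2 reaches 135 mAh g-1."
)

records = extract_records(SAMPLE)
for record in records:
    print(record)
```

Recording the level at which each link was made keeps the weaker, longer-range matches distinguishable from same-sentence matches, which may be useful when the extracted records are later filtered or checked by hand.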

Grants

  1. 202201011269/Guangzhou Municipal Science and Technology Project
