Meshed Context-Aware Beam Search for Image Captioning.

Fengzhi Zhao, Zhezhou Yu, Tao Wang, He Zhao
Author Information
  1. Fengzhi Zhao: College of Computer Science and Technology, Jilin University, Changchun 130012, China.
  2. Zhezhou Yu: College of Computer Science and Technology, Jilin University, Changchun 130012, China.
  3. Tao Wang: College of Computer Science and Technology, Jilin University, Changchun 130012, China.
  4. He Zhao: College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Abstract

Beam search is a decoding algorithm commonly used in image captioning. It improves the accuracy and robustness of generated captions by searching for a high-scoring word sequence. However, it mainly focuses on the highest-scoring sequences at each step and often overlooks the broader image context, which can lead to suboptimal results. In addition, beam search tends to select similar words across sequences, producing repetitive and less diverse output. These limitations suggest that, although effective, beam search can be improved to better capture the richness and variety needed for high-quality captions. To address these issues, this paper presents meshed context-aware beam search (MCBS). In MCBS, the caption context generated so far dynamically influences the image attention mechanism at each decoding step, so that the model attends to different regions of the image and produces more coherent and contextually appropriate captions. Furthermore, a penalty coefficient is introduced to discourage the generation of repeated words. Extensive experiments and ablation studies across various models show that MCBS significantly improves overall model performance.
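
The repetition penalty idea can be illustrated with a minimal sketch of beam search. The abstract does not specify the exact form of the MCBS penalty coefficient or of its meshed context-aware attention, so this example assumes a simple subtractive penalty: for each candidate word, `penalty` is multiplied by the number of times that word already appears in the hypothesis and subtracted from its log-probability. The names `step_fn`, `toy_step`, and `beam_search_with_repetition_penalty` are hypothetical. Context awareness is represented only by the fact that `step_fn` receives the prefix generated so far, which a real captioning model could use to condition its image attention. This is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given the words generated so far, return
# log-probabilities for each candidate next word. A context-aware captioner
# could use this prefix to steer its attention over image regions.
StepFn = Callable[[Sequence[str]], Dict[str, float]]
Hypothesis = Tuple[List[str], float]


def beam_search_with_repetition_penalty(
    step_fn: StepFn,
    beam_width: int = 3,
    max_len: int = 10,
    penalty: float = 1.0,
    eos: str = "<eos>",
) -> List[Hypothesis]:
    """Beam search that subtracts `penalty` from a word's log-probability
    once for every earlier occurrence of that word in the hypothesis."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for word, logp in step_fn(prefix).items():
                repeats = prefix.count(word)  # assumed penalty form
                candidates.append((prefix + [word], score + logp - penalty * repeats))
        # Keep the beam_width best expansions; completed captions leave the beam.
        candidates.sort(key=lambda h: h[1], reverse=True)
        top = candidates[:beam_width]
        finished.extend(h for h in top if h[0][-1] == eos)
        beams = [h for h in top if h[0][-1] != eos]
        if not beams:
            break
    # Hypotheses still unfinished at max_len are returned as candidates too.
    finished.extend(beams)
    finished.sort(key=lambda h: h[1], reverse=True)
    return finished[:beam_width]


def toy_step(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical toy model that always favors "dog" before ending."""
    if len(prefix) >= 3:
        return {"<eos>": math.log(0.9), "dog": math.log(0.1)}
    return {"dog": math.log(0.5), "a": math.log(0.3), "runs": math.log(0.2)}


if __name__ == "__main__":
    for p in (0.0, 2.0):
        words, score = beam_search_with_repetition_penalty(toy_step, beam_width=1, penalty=p)[0]
        print(p, " ".join(words), round(score, 3))
```

With `penalty=0.0` and `beam_width=1`, the toy model produces `dog dog dog <eos>`. With `penalty=2.0`, the repeated "dog" becomes more expensive than the alternatives, and the search produces `dog a runs <eos>` instead. A subtractive per-occurrence penalty is only one plausible design; multiplicative penalties on logits are also common.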

Grants

  1. 20240601039RC/Development Project of Jilin Province of China
  2. U21A20390/National Natural Science Foundation of China
  3. 2023KTSCX186/Guangdong Provincial Department of Education
