CIC: A Framework for Culturally-Aware Image Captioning

Youngsik Yun, Jihie Kim

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly.However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups.In this paper, we propose a new framework, Culturally-aware Image Captioning (CIC), that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures.Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts.Our human evaluation conducted on 45 participants from 4 different cultural groups with a high understanding of the corresponding culture shows that our proposed framework generates more culturally descriptive captions when compared to the image captioning baseline based on VLPs.Resources can be found at https://shane3606.github.io/cic.

Original languageEnglish
Title of host publicationProceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
EditorsKate Larson
PublisherInternational Joint Conferences on Artificial Intelligence
Pages1625-1633
Number of pages9
ISBN (Electronic)9781956792041
StatePublished - 2024
Event33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 - Jeju, Korea, Republic of
Duration: 3 Aug 20249 Aug 2024

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Country/TerritoryKorea, Republic of
CityJeju
Period3/08/249/08/24

Fingerprint

Dive into the research topics of 'CIC: A Framework for Culturally-Aware Image Captioning'. Together they form a unique fingerprint.

Cite this