TY - JOUR
T1 - TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching
T2 - IEEE Transactions on Image Processing
AU - Giang, Khang Truong
AU - Song, Soohwan
AU - Jo, Sungho
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - This study tackles image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches incur high computational costs and may not capture sufficient high-level contextual information, such as spatial structures or semantic shapes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a semantic structure. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative, high-quality features. Notably, our coarse-level matching network improves efficiency by applying attention layers only to fixed-size topics and small-size features. Finally, we design a dynamic feature refinement network for precise results at the finer matching stage. Through extensive experiments, we demonstrate the superiority of our method in challenging scenarios. Specifically, our method ranks in the top 9% in the Image Matching Challenge 2023 without using ensemble techniques. Additionally, we achieve an approximately 50% reduction in computational cost compared to other Transformer-based methods. Code is available at https://github.com/TruongKhang/TopicFM.
AB - This study tackles image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches incur high computational costs and may not capture sufficient high-level contextual information, such as spatial structures or semantic shapes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a semantic structure. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative, high-quality features. Notably, our coarse-level matching network improves efficiency by applying attention layers only to fixed-size topics and small-size features. Finally, we design a dynamic feature refinement network for precise results at the finer matching stage. Through extensive experiments, we demonstrate the superiority of our method in challenging scenarios. Specifically, our method ranks in the top 9% in the Image Matching Challenge 2023 without using ensemble techniques. Additionally, we achieve an approximately 50% reduction in computational cost compared to other Transformer-based methods. Code is available at https://github.com/TruongKhang/TopicFM.
KW - Image matching
KW - topic modeling
KW - transformers
UR - http://www.scopus.com/inward/record.url?scp=85207329607&partnerID=8YFLogxK
U2 - 10.1109/TIP.2024.3473301
DO - 10.1109/TIP.2024.3473301
M3 - Article
C2 - 39418144
AN - SCOPUS:85207329607
SN - 1057-7149
VL - 33
SP - 6016
EP - 6028
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -