首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Attention meets involution in visual tracking
Abstract:In visual tracking, both convolution and attention are widely employed for feature enhancement and fusion. However, convolution does not adequately model global dependencies of samples due to its operation on local neighbors, while attention gives too much attention to global dependencies and too little to local dependencies. It is intrinsically infeasible to combine both methods to integrate global and local information. However, a recently-proposed model called involution uses kernels differing in spatial extent but sharing across channels, making it possible to take advantage of both convolution and attention. We propose an attention-involution (Att-Inv) model that uses an attention mechanism to generate involution kernels to take both global and local dependencies of samples into account. To improve the performance of our tracker, we develop and implement strategies of backbone network modification, template updates, and regression of bounding box distributions. We evaluate our tracker using benchmarks such as GOT10k, LaSOT, TrackingNet and OxUvA. Experimental results show that it is competitive with state-of-the-art trackers.
Keywords:Visual tracking  Involution  Attention
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号