Learning Support Correlation Filters for Visual Tracking

Wangmeng Zuo      Xiaohe Wu      Liang Lin      Lei Zhang      Ming-Hsuan Yang

Abstract


Sampling and budgeting are two essential factors in tracking algorithms based on support vector machines (SVMs), as they govern the tradeoff between accuracy and efficiency, while the circulant matrix formed by dense sampling of translated images can be utilized in correlation filters for fast tracking. Although dense sampling can also be adopted in SVM learning, how to exploit this circulant property to accelerate SVM-based tracking remains unsolved. In this paper, we derive an equivalent formulation of an SVM model with a circulant matrix expression and present an efficient alternating optimization method. We incorporate the discrete Fourier transform into the alternating optimization process, converting SVM classifier learning into an iterative learning of support correlation filters (SCFs) that finds the globally optimal solution with real-time performance. For a given circulant data matrix with n^2 samples of size n × n, the computational complexity of the proposed algorithm is O(n^2 log n), while that of a standard SVM solver is at least O(n^4). In addition, we develop the multi-channel SCF (MSCF), kernelized SCF (KSCF) and multi-scale KSCF (Scale-KSCF) to improve the performance of SCF for visual tracking. Experimental results on a large benchmark dataset show that our KSCF and Scale-KSCF perform favorably against state-of-the-art tracking algorithms.
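
The complexity gap quoted above comes from a standard property of circulant matrices, sketched here in one dimension. The notation (x for the base sample, X for its circulant matrix, w for the filter, a hat for the DFT) is assumed for illustration and is not taken from the paper.

```latex
% Assumed 1-D notation: row i of X is x cyclically shifted by i,
% \hat{v} = \mathcal{F}(v) is the DFT, ^{*} is complex conjugation,
% and \odot is the element-wise product.
X_{ij} = x_{(j-i) \bmod n},
\qquad
X w = \mathcal{F}^{-1}\!\left( \hat{x}^{*} \odot \hat{w} \right)
```

Evaluating the responses on all n shifts this way costs O(n log n) with the FFT, instead of O(n^2) for the explicit product. For n × n images with n^2 shifts, the same argument with the 2-D DFT gives O(n^2 log n), whereas forming the explicit product already costs O(n^4).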

Overview of our approach


Figure 1 - Illustration of the proposed SCF learning algorithm at the t-th frame. The proposed algorithm iterates between updating e and updating the SVM classifier {w,b} until convergence. In each iteration, only one DFT and one IDFT are required, which makes the proposed algorithm efficient. The black blocks in e mark the entries equal to zero, which correspond to support vectors; our algorithm can thus adaptively find and exploit difficult samples (i.e., support vectors) to learn support correlation filters.
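
The alternation described in the caption of Figure 1 can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a squared hinge loss, a zero-mean base signal, a single channel and a linear filter, and the names `train_scf_1d`, `svm_objective`, `fft` and `ifft` are hypothetical helpers written for this sketch. With e fixed, {w,b} has a closed form in the Fourier domain; with {w,b} fixed, e has a closed form in the sample domain.

```python
import cmath

def fft(a, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(a):
    """Inverse FFT with the 1/n normalization."""
    return [v / len(a) for v in fft(a, inverse=True)]

def svm_objective(w_hat, f, y, C):
    """Assumed objective: ||w||^2 + C * sum of squared hinge losses."""
    n = len(f)
    reg = sum(abs(v) ** 2 for v in w_hat) / n  # ||w||^2 via Parseval
    loss = sum(max(1.0 - y[i] * f[i], 0.0) ** 2 for i in range(n))
    return reg + C * loss

def train_scf_1d(x, y, C=10.0, iters=20):
    """Alternate between {w, b} and e on all cyclic shifts of x.

    x: base signal (length a power of two); y: labels in {+1, -1},
    where y[i] labels x shifted by i. C and iters are illustrative.
    """
    n = len(x)
    mean_x = sum(x) / n
    x = [v - mean_x for v in x]          # assumption: zero-mean signal
    x_hat = fft(x)                       # computed once, reused below
    power = [abs(v) ** 2 for v in x_hat]
    e = [0.0] * n
    history = []
    for _ in range(iters):
        # Step 1: with e fixed, fit {w, b} to the targets q = y + y * e.
        q = [y[i] * (1.0 + e[i]) for i in range(n)]
        b = sum(q) / n                   # exact because x is zero-mean
        p_hat = fft([qi - b for qi in q])                  # the one DFT
        w_hat = [x_hat[k] * p_hat[k] / (power[k] + 1.0 / C)
                 for k in range(n)]
        # Responses on every shift, i.e. Xw + b, in one inverse transform.
        resp = ifft([x_hat[k].conjugate() * w_hat[k] for k in range(n)])
        f = [v.real + b for v in resp]                     # the one IDFT
        # Step 2: with {w, b} fixed, e_i = max(y_i * f_i - 1, 0).
        # Entries with e_i == 0 are the support vectors.
        e = [max(y[i] * f[i] - 1.0, 0.0) for i in range(n)]
        history.append(svm_objective(w_hat, f, y, C))
    w = [v.real for v in ifft(w_hat)]    # filter back in the sample domain
    return w, b, e, history

# Toy usage: only the unshifted sample is labeled positive.
x = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -2.0, -2.0]
y = [1, -1, -1, -1, -1, -1, -1, -1]
w, b, e, history = train_scf_1d(x, y)
```

Each pass minimizes the same objective over one block of variables, so the values recorded in `history` do not increase from one iteration to the next. The loop body uses one forward and one inverse transform, in line with the caption.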

Figure 2 - Differences between the proposed SCF model and existing CF approaches. (a) Existing CF-based models are designed to learn correlation filters that make the actual output close to the predefined confidence maps. (b) The SCF model aims to learn a support correlation filter together with the bias b for distinguishing a target object from the background based on the max-margin principle. The peak value in the right response map of (b) locates the target object well.

 

Experimental Results



Datasets

To assess the quality of the proposed methods, experiments are performed on the benchmark dataset, which contains 50 challenging sequences annotated with 11 attributes that often affect tracking performance. For the first frame of each sequence, the bounding box of the target object is provided for fair comparisons.

Evaluated Tracking Methods

For comprehensive comparisons, we evaluate the baseline SCF, multi-channel SCF (MSCF), kernelized SCF (KSCF) and Scale-KSCF methods. The SCF and MSCF methods are designed in the linear space with raw pixels and multi-channel features based on HOG [12] and color names (CN) [14], respectively. The KSCF and Scale-KSCF algorithms are evaluated using the Gaussian kernel on the multi-channel feature representation. Furthermore, we compare the proposed trackers with other trackers based on correlation filters (e.g., MOSSE [9], CSK [26], KCF [27], DCF [27], STC [56] and CN [14]), existing SVM-based trackers (e.g., Struck [24] and MEEM [55]), and other state-of-the-art methods (e.g., TGPR [17], SCM [58], TLD [30], L1APG [6], MIL [3], ASLA [29] and CT [57]).

OPE Results on 11 Attributes

Figure 3 - Precision plots and success plots of videos with different attributes.

Video Tracking Results


We show tracking results on 50 challenging videos.