Wangmeng Zuo, Xiaohe Wu, Liang Lin, Lei Zhang, Ming-Hsuan Yang
Sampling and budgeting are two essential factors in tracking algorithms based on support vector machines (SVMs), since they govern the trade-off between accuracy and efficiency. Meanwhile, the circulant matrix formed by dense sampling of translated images can be exploited in correlation filters for fast tracking. Although dense sampling can also be adopted in SVM learning, how to exploit this circulant property to accelerate SVM-based tracking remains unsolved. In this paper, we derive an equivalent formulation of an SVM model with a circulant matrix expression and present an efficient alternating optimization method. We incorporate the discrete Fourier transform (DFT) into the alternating optimization process, converting SVM classifier learning into an iterative learning of support correlation filters (SCFs) that finds the globally optimal solution with real-time performance. For a given circulant data matrix with n^2 samples of size n × n, the computational complexity of the proposed algorithm is O(n^2 log n), whereas that of a standard SVM solver is at least O(n^4). In addition, we develop the multi-channel SCF (MSCF), kernelized SCF (KSCF), and multi-scale KSCF (Scale-KSCF) to improve the performance of SCF for visual tracking. Experimental results on a large benchmark dataset show that our KSCF and Scale-KSCF perform favorably against state-of-the-art tracking algorithms.
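The complexity gap quoted above comes from the fact that a data matrix made of all cyclic shifts of a patch can be applied in the Fourier domain instead of being formed explicitly. The following is a minimal NumPy sketch of that property for a single-channel patch, not the authors' implementation; the function names `dense_shift_matrix`, `responses_dense`, and `responses_fft` are hypothetical and chosen only for illustration.

```python
import numpy as np

def dense_shift_matrix(x):
    """Stack all n*n cyclic shifts of the n x n patch x as rows (n^2 x n^2)."""
    n = x.shape[0]
    rows = [np.roll(x, (i, j), axis=(0, 1)).ravel()
            for i in range(n) for j in range(n)]
    return np.stack(rows)

def responses_dense(x, w):
    """Filter responses on every shifted sample via the explicit matrix: O(n^4)."""
    n = x.shape[0]
    return (dense_shift_matrix(x) @ w.ravel()).reshape(n, n)

def responses_fft(x, w):
    """Same responses via the 2-D FFT (circular cross-correlation): O(n^2 log n)."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(w)))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # illustrative base patch
w = rng.standard_normal((8, 8))   # illustrative filter of the same size
max_diff = np.max(np.abs(responses_dense(x, w) - responses_fft(x, w)))
print("max abs difference:", max_diff)
```

Entry (i, j) of either result is the response of the filter to the patch shifted cyclically by (i, j), so the FFT route evaluates all n^2 samples without ever building the n^2 × n^2 matrix.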
Figure 1 - Illustration of the proposed SCF learning algorithm at the t-th frame. The algorithm alternates between updating e and updating the SVM classifier {w, b} until convergence. Each iteration requires only one DFT and one IDFT, which makes the algorithm efficient. The black blocks in e mark zero-valued entries, which correspond to support vectors; the algorithm can thus adaptively find and exploit difficult samples (i.e., support vectors) to learn support correlation filters.
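The alternation described in the caption can be sketched as follows. This is a hedged, single-channel illustration rather than the authors' code: it assumes a squared-hinge SVM objective of the form ||w||^2 + C ||y ∘ (Xw + b1) - 1 - e||^2 with e ≥ 0, where the rows of X are all cyclic shifts of the patch x and y holds ±1 labels per shift. The function name `train_scf`, the parameter values, and the toy labels are assumptions, and the exact formulation and update rules in the paper may differ.

```python
import numpy as np

def train_scf(x, y, C=100.0, iters=30):
    """Alternate between a closed-form {w, b} step and a closed-form e step.

    Assumed objective (rows of X are all cyclic shifts of x, y in {-1, +1}):
        min_{w, b, e >= 0}  ||w||^2 + C * ||y * (X w + b) - 1 - e||^2
    """
    x_hat = np.fft.fft2(x)                  # computed once for the frame
    denom = np.abs(x_hat) ** 2 + 1.0 / C
    e = np.zeros_like(y)
    history = []
    for _ in range(iters):
        # {w, b} step: ridge regression onto q = y * (1 + e), solved per frequency.
        q = y * (1.0 + e)
        b = q.mean()
        w_hat = x_hat * np.fft.fft2(q - b) / denom              # the one DFT
        # Classifier responses on every cyclic shift of x.
        f = np.real(np.fft.ifft2(np.conj(x_hat) * w_hat)) + b   # the one IDFT
        # e step: entries with e == 0 lie on or inside the margin (support vectors).
        e = np.maximum(y * f - 1.0, 0.0)
        w_norm2 = np.sum(np.abs(w_hat) ** 2) / x.size           # Parseval's identity
        history.append(w_norm2 + C * np.sum((y * f - 1.0 - e) ** 2))
    return w_hat, b, e, history

# Toy example: shifts within a small cyclic radius of the target are positives.
n = 16
rng = np.random.default_rng(0)
x = rng.standard_normal((n, n))
d = np.minimum(np.arange(n), n - np.arange(n))   # cyclic distance to zero shift
y = np.where(d[:, None] ** 2 + d[None, :] ** 2 <= 2, 1.0, -1.0)
w_hat, b, e, history = train_scf(x, y)
print("objective, first and last iteration:", history[0], history[-1])
print("number of entries with e == 0:", int(np.sum(e == 0)))
```

Because each step minimizes the assumed objective exactly over one block of variables, the recorded objective values should not increase from one iteration to the next.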
Figure 2 - Differences between the proposed SCF model and existing CF approaches. (a) Existing CF-based models learn correlation filters whose actual output is close to a predefined confidence map. (b) The SCF model learns a support correlation filter together with the bias b to distinguish the target object from the background based on the max-margin principle. The peak value in the right response map of (b) locates the target object well.
Datasets
To assess the quality of the proposed methods, experiments are performed on the benchmark dataset, which contains 50 challenging sequences annotated with 11 attributes that often affect tracking performance. For the first frame of each sequence, the bounding box of the target object is provided for fair comparisons.
Figure 3 - Precision plots and success plots of videos with different attributes.
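For readers unfamiliar with these two kinds of plots, the sketch below computes the underlying curves under commonly used definitions, which are assumed here since the text does not spell them out: precision is the fraction of frames whose center location error is within a pixel threshold, and success is the fraction of frames whose bounding-box overlap (intersection over union) exceeds an overlap threshold. Boxes are assumed to be given as (x, y, width, height); the function names, threshold ranges, and toy boxes are illustrative only.

```python
import numpy as np

def center_errors(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def overlaps(pred, gt):
    """Intersection over union of (x, y, w, h) boxes, frame by frame."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_curve(pred, gt, thresholds=np.arange(0, 51)):
    """Fraction of frames with center error <= each pixel threshold."""
    err = center_errors(pred, gt)
    return np.array([(err <= t).mean() for t in thresholds])

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames with overlap > each overlap threshold."""
    iou = overlaps(pred, gt)
    return np.array([(iou > t).mean() for t in thresholds])

# Two toy frames: one perfect prediction, one that misses the target entirely.
gt = np.array([[10, 10, 20, 20], [10, 10, 20, 20]], dtype=float)
pred = np.array([[10, 10, 20, 20], [40, 10, 20, 20]], dtype=float)
precision = precision_curve(pred, gt)
success = success_curve(pred, gt)
print("precision at 20 px:", precision[20])
print("success at overlap 0.5:", success[10])
```

Plotting each returned array against its thresholds gives one curve per tracker; restricting the frames to sequences that share an attribute gives per-attribute curves.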
We show tracking results on the 50 challenging videos.