Title
Cold-Start Active Sampling Via γ-Tube
Abstract
Active learning (AL) improves the generalization performance of the current classification hypothesis by querying labels from a pool of unlabeled data. The sampling process is typically guided by an informative, representative, or diverse evaluation policy. However, such a policy needs an initial labeled set to start, so its performance may degrade under a cold-start hypothesis. In this article, we first show that typical AL sampling can be equivalently formulated as geometric sampling over minimum enclosing balls (MEBs) of clusters. (In this article, an MEB denotes a conceptual geometry over the cluster in generalization analysis; in the SVM community, it is related to hard-margin support vector data description.) Following the γ-tube structure in geometric clustering, we then divide one MEB covering a cluster into two parts: 1) a γ-tube and 2) a γ-ball.
By estimating the error disagreement between sampling in the MEB and sampling in the γ-ball, our theoretical insight reveals that the γ-tube can effectively measure the disagreement between hypotheses in the original space over the MEB and in the sampling space over the γ-ball. To tighten our insight, we present a generalization analysis, and the results show that sampling in the γ-tube yields a higher probability bound for achieving a nearly zero generalization error. With these analyses, we finally apply the informative sampling policy of AL over the γ-tube to present a tube AL (TAL) algorithm against the cold-start sampling issue. As a result, the dependency between the querying process and the evaluation policy of active sampling can be alleviated. Experimental results show that, by using the γ-tube structure to deal with cold-start sampling, TAL achieves superior performance to standard AL evaluation baselines, with substantial accuracy improvements.
Image edge recognition extends our theoretical results.
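The ball/tube split described in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption and not the TAL algorithm itself: the enclosing ball is approximated by the cluster centroid and the farthest point (not an exact MEB), the γ-tube is taken to be the outer shell whose thickness is `gamma` times the radius, and queries are drawn uniformly at random from the tube as a stand-in for the informative policy the article applies there. The function names are hypothetical.

```python
import math
import random


def enclosing_ball(points):
    """Approximate an enclosing ball of a cluster.

    Assumption: centroid as center, farthest point as radius. This encloses
    all points but is generally not the exact minimum enclosing ball.
    """
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def split_ball_and_tube(points, gamma):
    """Split a cluster into an inner ball and an outer tube.

    Assumption: the tube is the shell of relative thickness `gamma`
    (0 < gamma < 1) next to the surface of the enclosing ball; the
    remaining points form the inner ball.
    """
    center, radius = enclosing_ball(points)
    inner_radius = (1.0 - gamma) * radius
    ball, tube = [], []
    for p in points:
        if math.dist(p, center) > inner_radius:
            tube.append(p)
        else:
            ball.append(p)
    return ball, tube


def cold_start_queries(points, gamma, budget, seed=0):
    """Choose up to `budget` points to label, using no labeled data.

    Assumption: uniform sampling inside the tube, used here only as a
    placeholder for an informative sampling policy.
    """
    _, tube = split_ball_and_tube(points, gamma)
    rng = random.Random(seed)
    return rng.sample(tube, min(budget, len(tube)))


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0),
               (0.0, -1.0), (0.2, 0.0), (-0.2, 0.0)]
    ball, tube = split_ball_and_tube(cluster, gamma=0.5)
    print("inner ball:", ball)
    print("tube:", tube)
    print("queries:", cold_start_queries(cluster, gamma=0.5, budget=2))
```

Because the split depends only on the geometry of the unlabeled cluster, the first queries can be chosen before any label exists, which is the cold-start situation the abstract targets.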
Year
2022
DOI
10.1109/TCYB.2021.3069956
Venue
IEEE Transactions on Cybernetics
Keywords
Algorithms, Cluster Analysis
DocType
Journal
Volume
52
Issue
7
ISSN
2168-2267
Citations
0
PageRank
0.34
References
36
Authors
3
Name
Order
Citations
PageRank
Xiaofeng Cao1185.68
Ivor W. Tsang25396248.44
Jianliang Xu32743168.17