Embedding Adaptation is Still Needed for Few-Shot Learning - Sébastien M. R. Arnold
\( % Universal Mathematics \newcommand{\paren}[1]{\left( #1 \right)} \newcommand{\brackets}[1]{\left[ #1 \right]} \newcommand{\braces}[1]{\left\{ #1 \right\}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\case}[1]{\begin{cases} #1 \end{cases}} \newcommand{\bigO}[1]{\mathcal{O}\left(#1\right)} % Analysis % Linear Algebra \newcommand{\mat}[1]{\begin{pmatrix}#1\end{pmatrix}} \newcommand{\bmat}[1]{\begin{bmatrix}#1\end{bmatrix}} % Probability Theory \DeclareMathOperator*{\V}{\mathop{\mathrm{Var}}} \DeclareMathOperator*{\E}{\mathop{\mathbb{E}}} \newcommand{\Exp}[2][]{\E_{#1}\brackets{#2}} \newcommand{\Var}[2][]{\V_{#1}\brackets{#2}} \newcommand{\Cov}[2][]{\mathop{\mathrm{Cov}}_{#1}\brackets{#2}} % Optimization \newcommand{\minimize}{\operatorname*{minimize}} \newcommand{\maximize}{\operatorname*{maximize}} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} % Set Theory \newcommand{\C}{\mathbb{C}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}} \newcommand{\Z}{\mathbb{Z}} \)


Constructing new and more challenging tasksets is a fruitful methodology to analyse and understand few-shot classification methods. Unfortunately, existing approaches to building those tasksets are somewhat unsatisfactory: they either assume train and test task distributions to be identical — which leads to overly optimistic evaluations — or take a “worst-case” philosophy — which typically requires additional human labor such as obtaining semantic class relationships. We propose ATG, a principled clustering method to defining train and test tasksets without additional human knowledge. ATG models train and test task distributions while requiring them to share a predefined amount of information. We empirically demonstrate the effectiveness of ATG in generating tasksets that are easier, in-between, or harder than existing benchmarks, including those that rely on semantic information. Finally, we leverage our generated tasksets to shed a new light on few-shot classification: gradient-based methods — previously believed to underperform — can outperform metric-based ones when transfer is most challenging.

Séb Arnold - seb.arnold@usc.edu


Adapting your feature extractor is required when transfer from train to test tasks is most challenging.


ArXiv Preprints