Abstract |
---|
Many realistic applications of activity recognition involve a combinatorially large set of potential activity descriptions. This makes end-to-end supervised training of a recognition system impractical, as no training set can cover the entire label set. In this paper, we present an approach to fine-grained recognition that models activities as compositions of dynamic action signatures. This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed "on the fly" from simple first-principles state machines supported by deep-learned components. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also extend this method to form a unique framework for zero-shot joint segmentation and classification of activities in video and demonstrate the first results in zero-shot decoding of complex action sequences on a widely used surgical dataset. Lastly, we show that we can use off-the-shelf object detectors to recognize activities in completely de novo settings with no additional training. |
Year | Venue | DocType |
---|---|---|
2021 | THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | Conference |
Volume | ISSN | Citations |
---|---|---|
35 | 2159-5399 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 17 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tae Soo Kim | 1 | 66 | 3.68 |
Jonathan D Jones | 2 | 0 | 0.34 |
Michael Peven | 3 | 5 | 0.78 |
Zihao Xiao | 4 | 27 | 2.05 |
Jin Bai | 5 | 19 | 1.15 |
Yi Zhang | 6 | 47 | 5.12 |
Weichao Qiu | 7 | 54 | 9.02 |
Alan L. Yuille | 8 | 27 | 7.33 |
Gregory D. Hager | 9 | 0 | 0.34 |