Abstract |
---|
We attempt to train a classifier to identify food items in images captured during food preparation. Food changes appearance significantly during cooking, and different foods can be mixed together; thus, manually annotating individual food items during the preparation process is difficult. To train a classifier without manual annotation, we used multimedia recipes with stepwise pairs of instructional text and an image. Such stepwise pairs can be informative for training; however, most images contain objects that are not referenced in the text (i.e., missing labels) and vice versa (inaccurate labels). To reduce such mismatches, we propose a method that identifies missing labels by searching label candidates from upstream processes in the given recipe. Then, inconsistent word-appearance pairs are removed from the sample based on differences in model fitting speed using a convolutional neural network. We conducted an experiment using carrot as the target food. The classifier trained with the proposed method outperformed one trained with a naive implementation, achieving an average precision of 91% compared with 83.8% for the naive implementation. |
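The upstream label search described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `propagate_labels` and the per-step representation (a set of food words extracted from each step's text) are assumptions for illustration. The idea is that a food mentioned in an earlier recipe step may still be present in a later step's image even if its text no longer mentions it, so the candidate labels for each step are the union of all words seen so far.

```python
def propagate_labels(steps):
    """Collect label candidates for each recipe step from upstream steps.

    steps: list of sets, each holding the food words mentioned in one
           step's instructional text, in preparation order.
    Returns a list of sets: for each step, the union of all food words
    mentioned in that step or any earlier (upstream) step.
    """
    seen = set()
    candidates = []
    for words in steps:
        seen |= words            # accumulate words from upstream steps
        candidates.append(set(seen))  # snapshot the current candidates
    return candidates


# Hypothetical three-step recipe: the last step's text mentions no food,
# but "carrot" and "onion" remain candidates from upstream.
print(propagate_labels([{"carrot", "onion"}, {"onion"}, set()]))
```

In this sketch, a step whose text omits "carrot" still receives it as a candidate label, which mirrors how the paper recovers missing labels before filtering inconsistent word-appearance pairs.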
Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3106668.3106675 | CEA@IJCAI |
Field | DocType | ISBN
---|---|---
Convolutional neural network, Food recognition, Computer science, Manual annotation, Recipe, Artificial intelligence, Classifier (linguistics), Machine learning | Conference | 978-1-4503-5267-3
Citations | PageRank | References
---|---|---
0 | 0.34 | 23
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Atsushi Hashimoto | 1 | 40 | 13.33 |
Takumi Fujino | 2 | 2 | 0.72 |
Jun Harashima | 3 | 20 | 5.74 |
Masaaki Iiyama | 4 | 17 | 14.23 |
Michihiko Minoh | 5 | 349 | 58.69 |