V-HICO Dataset: Videos of Humans Interacting with Common Objects


About
V-HICO is a dataset of human-object interactions in videos. It contains 6,594 videos: 5,297 training videos, 635 validation videos, 608 test videos, and 54 unseen test videos. We provide two test splits to measure both performance on common human-object interaction classes and generalization to new ones: the first covers the same human-object interaction classes as the training split, while the second consists of novel classes not seen during training.
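
As a quick sanity check on the split sizes quoted above, the following minimal sketch tallies the four splits and reports each one's share of the dataset. The dictionary keys and the `split_fractions` helper are illustrative names chosen here, not part of any released V-HICO tooling.

```python
# Split sizes as stated in the dataset description.
# The key names are hypothetical labels, not official split identifiers.
SPLIT_SIZES = {
    "train": 5297,
    "val": 635,
    "test": 608,
    "unseen_test": 54,
}


def split_fractions(sizes):
    """Return each split's share of the total number of videos."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_videos = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total videos: {total_videos}")
for name, frac in fractions.items():
    print(f"{name:12s} {SPLIT_SIZES[name]:5d} videos ({frac:.1%})")
```

The four splits add up to the stated total of 6,594 videos, and the unseen test split is by far the smallest of them.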

V-HICO covers 244 object classes and 99 action classes, for a total of 756 action-object pair classes. The unseen test split contains 51 object classes and 32 action classes, which form 52 action-object pair classes. Every video is labeled with a text annotation of the human action and the associated object. The test and unseen test splits also include bounding-box annotations for both the human and the object.
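
To make the labeling scheme concrete, here is a hedged sketch of how one annotation might be represented and how action-object pair classes that are absent from training could be identified. The description does not specify the on-disk annotation format, so the `VideoAnnotation` record, its field names, the `(x1, y1, x2, y2)` box layout, and the single box per video are all assumptions made for illustration. The sketch also assumes that novelty is judged at the level of action-object pairs, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class VideoAnnotation:
    """Illustrative record; not the dataset's actual schema."""
    video_id: str
    action: str
    obj: str
    # Boxes are described only for the test and unseen test splits,
    # so they are optional here.
    human_box: Optional[Box] = None
    object_box: Optional[Box] = None


def pair_classes(annotations: Iterable[VideoAnnotation]) -> Set[Tuple[str, str]]:
    """Collect the distinct (action, object) pair classes."""
    return {(a.action, a.obj) for a in annotations}


def novel_pairs(train: Iterable[VideoAnnotation],
                test: Iterable[VideoAnnotation]) -> Set[Tuple[str, str]]:
    """Pair classes that occur in `test` but never in `train`."""
    return pair_classes(test) - pair_classes(train)


# Toy data, invented purely for illustration.
train = [
    VideoAnnotation("v1", "ride", "bicycle"),
    VideoAnnotation("v2", "cut", "bread"),
]
unseen = [
    VideoAnnotation("v3", "ride", "horse",
                    human_box=(10, 20, 110, 220),
                    object_box=(5, 100, 200, 300)),
]

print(novel_pairs(train, unseen))
```

In this toy example the action "ride" appears in training, but the pair ("ride", "horse") does not, so it counts as novel under the pair-level assumption.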

Paper

Weakly Supervised Human and Object Detection via Spatiotemporal Interactions

Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
Paper    Project    Dataset    PyTorch Code
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. (In submission)


Team

Shuang Li, MIT
Yilun Du, MIT
Antonio Torralba, MIT
Josef Sivic, INRIA
Bryan Russell, Adobe



Contact
Reach out to lishuang@mit.edu for questions, suggestions, and feedback.