V-HICO Dataset: Videos of Humans Interacting with Common Objects
About
V-HICO is a dataset for human-object interaction in videos. It contains 6,594 videos of human-object interaction: 5,297 training videos, 635 validation videos, 608 test videos, and 54 unseen test videos. To evaluate models both on common human-object interaction classes and on generalization to new ones, we provide two test splits: the first has the same human-object interaction classes as the training split, while the second consists of unseen novel classes.
V-HICO consists of 244 object classes and 99 action classes. There are 756 action-object pairwise classes in total.
The unseen test dataset contains 51 object classes and 32 action classes with 52 action-object pairwise classes.
All videos are labeled with text annotations of the human action and the associated object.
The test and unseen test splits contain annotations of both human and object bounding boxes.
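As a quick sanity check on the statistics quoted in this section, the split sizes and class counts can be collected in a small Python structure. This is only an illustrative sketch: the dictionary keys and the helper `split_fractions` are hypothetical names chosen here, not identifiers from the dataset release or its PyTorch code.

```python
# Split sizes as stated in the About section. The key names are
# illustrative assumptions, not official identifiers from the release.
SPLIT_SIZES = {
    "train": 5297,
    "val": 635,
    "test": 608,
    "unseen_test": 54,
}
TOTAL_VIDEOS = 6594

# Class counts for the full dataset, as stated in the About section.
NUM_OBJECTS, NUM_ACTIONS, NUM_PAIRS = 244, 99, 756


def split_fractions(sizes):
    """Return each split's share of the total number of videos."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


# The four splits should add up to the stated total.
assert sum(SPLIT_SIZES.values()) == TOTAL_VIDEOS

# Share of all possible action-object combinations that actually occur.
pair_density = NUM_PAIRS / (NUM_OBJECTS * NUM_ACTIONS)

if __name__ == "__main__":
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {frac:.1%}")
    print(f"pair density: {pair_density:.2%}")
```

The check confirms that the four splits sum to 6,594 videos. The density figure shows that the 756 labeled action-object pairs cover only about 3% of the 244 × 99 possible combinations, which is expected since most actions apply to only a few objects.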
Paper
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
International Conference on Computer Vision (ICCV), 2021.
Team
Shuang Li (MIT)
Yilun Du (MIT)
Antonio Torralba (MIT)
Josef Sivic (CIIRC CTU)
Bryan Russell (Adobe)
Contact
Reach out to lishuang@mit.edu for questions, suggestions, and feedback.