3D Human-Object Interaction in Video

A New Approach to Object Tracking via Cross-Modal Attention

Qualitative results of H2O-CA on BEHAVE. The simplified ground-truth (GT) mesh is shown in green, the simplified predicted mesh in red, and the GT human mesh in blue.

Date03_Sub03_monitor_move (Camera1)

Date03_Sub04_boxtiny (Camera0)

Date03_Sub04_tablesmall_lift (Camera3)

Date03_Sub05_boxlarge (Camera1)

Date03_Sub04_monitor_move (Camera3)