I looked around but didn't really get a clear view of what works best for object tracking in videos.
I know some deep learning methods that use binary masks for object detection in images. But videos carry temporal information, so I think it would be really inefficient to run these methods frame by frame, especially since consecutive frames are almost the same.
EDIT: By detection, I mean bounding box determination.
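To make the inefficiency concern concrete, here is a rough sketch of the naive alternative I have in mind: run the expensive detector only on every k-th frame and reuse the last bounding box in between. The names `boxes_with_keyframes`, `detect`, and `every` are placeholders I made up for illustration, not from any library, and reusing a stale box is only an assumption that holds when objects move slowly.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def boxes_with_keyframes(
    frames: Iterable,
    detect: Callable[[object], Box],
    every: int = 5,
) -> List[Optional[Box]]:
    """Call `detect` on every `every`-th frame and reuse its box otherwise.

    `detect` is a hypothetical stand-in for any per-image detector that
    returns one bounding box per frame.
    """
    if every < 1:
        raise ValueError("every must be >= 1")

    boxes: List[Optional[Box]] = []
    last_box: Optional[Box] = None
    for index, frame in enumerate(frames):
        if index % every == 0:
            # Keyframe: pay the full detection cost.
            last_box = detect(frame)
        # Non-keyframes reuse the most recent detection unchanged.
        boxes.append(last_box)
    return boxes
```

With `every=5` the detector runs on only one frame in five, but the box is stale for the frames in between, which is exactly why I am asking whether there is something smarter that uses the temporal information.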