Abstract: Transferring visual language models (VLMs) from the image domain to the video domain has recently yielded great success on human action recognition tasks. However, standard recognition ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results