In a similar way to:
It would support both automatic captions and user-made captions.
It must be visual,
using the original video on Commons as the base onto which the captions are added.
There are lots of videos needing captions:
https://commons.wikimedia.org/wiki/Category:Videos_needing_subtitles
The current process is not easy:
https://commons.wikimedia.org/wiki/Commons:Timed_Text
'''Nor is it visual''', unlike https://subtitle-horse.com/ and other tools (although those do not allow adding captions to Commons videos or transferring them automatically to the Commons TimedText: namespace).