MediaViewer captions are not wrapped in an element with the .mw-parser-output class. Because TemplateStyles scopes its rules to .mw-parser-output, captions preserve only the inline styles set on their HTML elements and not the TemplateStyles rules defined on the page.
For some templates, such as Template:Legend, this leads to substandard rendering of the caption. If .mw-parser-output were added to the caption container, the rules from the TemplateStyles declarations would apply.
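
A minimal sketch of the proposed fix, assuming the caption is available as an HTML string before it is inserted into the viewer. The names `wrapCaption` and `scopeSelector` are hypothetical and are not part of MediaViewer or TemplateStyles, and the choice of a `div` wrapper is an assumption. `scopeSelector` only illustrates how a scoped rule ends up requiring a .mw-parser-output ancestor.

```typescript
// Class that TemplateStyles-scoped rules expect on an ancestor element.
const PARSER_OUTPUT_CLASS = "mw-parser-output";

// Hypothetical helper: illustrates the scoping applied to a template's
// selector, e.g. ".legend-color" -> ".mw-parser-output .legend-color".
function scopeSelector(selector: string): string {
  return `.${PARSER_OUTPUT_CLASS} ${selector}`;
}

// Hypothetical helper: wraps caption HTML in a container carrying the class,
// so that scoped selectors can match elements inside the caption.
function wrapCaption(captionHtml: string): string {
  return `<div class="${PARSER_OUTPUT_CLASS}">${captionHtml}</div>`;
}

// Example caption fragment resembling Template:Legend output (illustrative only).
const caption =
  '<span class="legend-color" style="background-color:#f00">&nbsp;</span> Red';
const wrapped = wrapCaption(caption);
```

With the wrapper in place, a rule such as `.mw-parser-output .legend-color` has a matching ancestor in the caption markup. Without it, only the inline `style` attribute takes effect.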