HTML5 parsers accept custom tag names and treat them as inline content: an unknown element does not implicitly close an open <p>, and it defaults to display: inline. So rather than replacing <figure> with <span> for inline figures in the Parsoid DOM spec, we could replace it with <figure-inline>. That would express the semantics more accurately than abusing <span>s.
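As a minimal sketch, assuming a post-processing pass that still receives the current <span> wrapper, the swap could look like this. The function name spanToFigureInline is hypothetical and this is not Parsoid's actual code; it uses only standard DOM methods (createElement, setAttribute, appendChild, replaceChild).

```javascript
// Hypothetical helper (not Parsoid's real API): replace an inline-media
// <span> wrapper with a <figure-inline> element, in place.
function spanToFigureInline(span) {
  var doc = span.ownerDocument;
  var fig = doc.createElement('figure-inline');

  // Copy every attribute (e.g. typeof, data-mw) onto the new wrapper.
  for (var i = 0; i < span.attributes.length; i++) {
    var attr = span.attributes[i];
    fig.setAttribute(attr.name, attr.value);
  }

  // Snapshot the live childNodes list first, then move each child across.
  var children = Array.prototype.slice.call(span.childNodes);
  for (var j = 0; j < children.length; j++) {
    fig.appendChild(children[j]);
  }

  // Put the new wrapper exactly where the old <span> was.
  span.parentNode.replaceChild(fig, span);
  return fig;
}
```

The children are copied into a plain array before moving them because childNodes is a live list: appending a node elsewhere removes it from the <span>, which would otherwise shift the indices mid-loop.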
We'd still have to protect block-level content in the figure caption, of course. I think we already move the caption into data-mw for inline media; if we didn't, we'd have to use <figcaption-inline> (or some such), since <figcaption> is also a block element.
Note that IE6-8 need a small JavaScript shim to support custom elements (basically, a call to document.createElement('figure-inline'); more details here). But we're deprecating IE8 in January 2016 anyway, and T118517 would add the necessary shim for us where it's needed.
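The shim amounts to creating each custom tag name once before the page markup is parsed, so that IE6-8 nest and style those elements normally. A minimal sketch, assuming a hypothetical helper registerCustomTags and a hypothetical tag list (figcaption-inline is only needed if captions stay in the DOM); it sticks to ES3 syntax (var, no arrow functions) because IE6-8 would execute it:

```javascript
// Hypothetical list of the custom inline tags discussed above.
var INLINE_MEDIA_TAGS = ['figure-inline', 'figcaption-inline'];

// IE6-8 only nest children inside, and apply CSS to, an unknown element
// once its tag name has been created via document.createElement.
function registerCustomTags(doc, tags) {
  for (var i = 0; i < tags.length; i++) {
    doc.createElement(tags[i]);
  }
}

// In a page this would run in <head>, before the elements appear:
// registerCustomTags(document, INLINE_MEDIA_TAGS);
```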