Early design review: Document Picture-in-Picture #798

Closed
1 task done
steimelchrome opened this issue Dec 9, 2022 · 31 comments
Labels
  • Progress: review complete
  • Resolution: unsatisfied (The TAG does not feel the design meets required quality standards)
  • Review type: CG early review (An early review of general direction from a Community Group)
  • Venue: Media WG
  • Venue: WICG

Comments

@steimelchrome

Wotcher TAG!

I'm requesting a TAG review of Document Picture-in-Picture.

There currently exists a Web API for putting an HTMLVideoElement into a Picture-in-Picture window (HTMLVideoElement.requestPictureInPicture()). This limits a website's ability to provide a custom picture-in-picture experience (PiP). We want to expand upon that functionality by giving websites the ability to open a picture-in-picture (i.e., always-on-top) window with a blank document that can be populated with arbitrary HTMLElements instead of only a single HTMLVideoElement.
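
The basic flow described above can be sketched in TypeScript. This is a hedged sketch, not the spec's algorithm: it assumes the `documentPictureInPicture.requestWindow()` entry point accepting `width` and `height` options and resolving to the new window, and it assumes the page moves its element back when the PiP window fires `pagehide`. To keep the sketch self-contained outside a browser, `ContainerLike`, `PipWindowLike`, `PipHostLike`, and `openPlayerInPip` are hypothetical names standing in for the real DOM types.

```typescript
// Hypothetical structural stand-ins for the few DOM pieces this sketch uses,
// so it compiles and runs without lib.dom (real code would use Element/Window).
interface ContainerLike {
  append(node: unknown): void;
}

interface PipWindowLike {
  document: { body: ContainerLike };
  addEventListener(type: "pagehide", listener: () => void): void;
}

interface PipHostLike {
  // Modeled on documentPictureInPicture.requestWindow(); in a browser this
  // call needs a user gesture (e.g. run it from a click handler).
  requestWindow(options?: { width?: number; height?: number }): Promise<PipWindowLike>;
}

// Moves `player` into a new always-on-top window and returns it to
// `homeContainer` when that window is closed.
async function openPlayerInPip(
  host: PipHostLike,
  player: unknown,
  homeContainer: ContainerLike,
): Promise<PipWindowLike> {
  const pipWindow = await host.requestWindow({ width: 320, height: 180 });

  // The PiP window starts with a blank document that the page populates itself.
  pipWindow.document.body.append(player);

  // pagehide fires on the PiP window as it closes; move the player back home.
  pipWindow.addEventListener("pagehide", () => {
    homeContainer.append(player);
  });

  return pipWindow;
}
```

In a page this would typically be called from a click handler, passing `documentPictureInPicture`, the element to move, and the element's original container.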

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): unknown
  • Major unresolved issues with or opposition to this design:
    • See the GitHub issues list for known problems. One notable issue is that we're still figuring out how best to design and specify how CSS copying works for this feature.
  • This work is being funded by: Google
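
The CSS-copying issue noted in the list above arises because the PiP window starts with a blank document, so pages currently have to bring their own styles into it. A common developer-side workaround, sketched here under assumptions, is to inline the rules of each readable stylesheet and fall back to re-linking sheets whose `cssRules` cannot be read (reading them on a cross-origin sheet loaded without CORS throws a `SecurityError`). `SheetLike`, `StylePlan`, and `planStyleCopy` are hypothetical names; in a browser the input would come from `document.styleSheets`.

```typescript
// Hypothetical stand-in for a CSSStyleSheet: `href` is null for inline
// <style> sheets, and reading rules may throw (as cssRules does for
// cross-origin sheets loaded without CORS).
interface SheetLike {
  href: string | null;
  readRules(): string[]; // e.g. [...sheet.cssRules].map((r) => r.cssText)
}

type StylePlan =
  | { kind: "inline"; css: string } // would become a <style> element
  | { kind: "link"; href: string }; // would become a <link rel="stylesheet">

function planStyleCopy(sheets: Iterable<SheetLike>): StylePlan[] {
  const plan: StylePlan[] = [];
  for (const sheet of sheets) {
    try {
      plan.push({ kind: "inline", css: sheet.readRules().join("\n") });
    } catch {
      // Rules are unreadable; re-request the sheet by URL if it has one.
      if (sheet.href !== null) {
        plan.push({ kind: "link", href: sheet.href });
      }
    }
  }
  return plan;
}
```

Each entry would then be turned into the corresponding element and appended to the PiP window's `<head>`.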

We'd prefer the TAG provide feedback as:

🐛 open issues in our GitHub repo for each point of feedback

steimelchrome added the Progress: untriaged and Review type: CG early review labels Dec 9, 2022
torgo assigned torgo and rhiaro Jan 4, 2023
torgo added this to the 2023-01-09-week milestone Jan 4, 2023
@slightlyoff
Member

This is an exciting proposal!

Some questions:

  • The lifetime of the PIP window isn't exactly clear from the explainer. Consider, for instance, wanting to create an MPA-style media application that uses View Transitions but continually plays media in a PIP window. Will transitioning from the original host window to the next (same-origin) page in the main frame kill PIP playback in this model?
  • If not, how does the destination window receive or re-create a handle to the PIP window? Is that what the global documentPictureInPicture is for?
  • How are multiple PIP windows handled in a desktop scenario where multiple windows request PIP?
  • Is documentPictureInPicture the actual entry point? It seems a strange location for the API. A more natural location might be on navigator or as an extension to clients.openWindow().

@steimelchrome
Author

  • Yes, in this model if the main frame is navigated (even to the same origin) the PiP window is killed (similar to the existing requestPictureInPicture() API for HTMLVideoElement)
  • N/A
  • That is left up to the user agent (similar to the existing requestPictureInPicture() API for HTMLVideoElement). In Chrome, we will only allow one PiP window to exist at a time. I know other browsers do something different for HTMLVideoElement and therefore may do something different for this as well.
  • Yes. I originally had it on navigator, but was told that it wasn't a good place for the API. I don't have strong feelings about API name or placement personally.
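
The lifetime rules in these answers can be pinned down as a small state model. The sketch is a hypothetical toy model of the behavior described for Chrome, not browser source: `PipLifecycleModel` and its methods are invented names, it assumes a new request replaces any existing window (the answer above only says one window may exist at a time), and other user agents may allow several windows.

```typescript
// Toy model of the lifetime rules described above; hypothetical, not Chrome's code.
class PipLifecycleModel {
  private owner: string | null = null;
  // Log of openers whose PiP window has been closed, in order.
  readonly closedFor: string[] = [];

  // Assumption: a new request replaces the existing window, so at most one exists.
  open(openerId: string): void {
    this.closeCurrent();
    this.owner = openerId;
  }

  // Any main-frame navigation of the opener (even same-origin) closes its PiP window.
  navigate(openerId: string): void {
    if (this.owner === openerId) this.closeCurrent();
  }

  currentOwner(): string | null {
    return this.owner;
  }

  private closeCurrent(): void {
    if (this.owner !== null) this.closedFor.push(this.owner);
    this.owner = null;
  }
}
```

Under this model, a multi-page app that navigates its main frame always loses its PiP window, which is the behavior the first question above asks about.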

@arnaudbud

Dialpad would benefit if this feature were available in the browser. Is there a demo available?
This is our use case, Dialpad Anywhere:
https://help.dialpad.com/hc/en-us/articles/360000407666-Dialpad-Everywhere#access-call-controls
I support this proposal.

@tomayac

tomayac commented Feb 2, 2023

(I have PR'ed the Document Picture in Picture API into this Pomodoro Timer app.)

@torgo
Member

torgo commented Mar 7, 2023

  • Yes, in this model if the main frame is navigated (even to the same origin) the PiP window is killed (similar to the existing requestPictureInPicture() API for HTMLVideoElement)
  • N/A
  • That is left up to the user agent (similar to the existing requestPictureInPicture() API for HTMLVideoElement). In Chrome, we will only allow one PiP window to exist at a time. I know other browsers do something different for HTMLVideoElement and therefore may do something different for this as well.
  • Yes. I originally had it on navigator, but was told that it wasn't a good place for the API. I don't have strong feelings about API name or placement personally.

@steimelchrome have you updated the explainer to clarify these issues? (And by the way, thanks @slightlyoff!)

From our review in today's TAG breakout, this looks like a generally useful feature.

A few other questions:

  • What is the planned route for standardization? Right now it just lists WICG.
  • We noted that while the explainer is well written, it doesn't start with user needs, as we've been encouraging. Can you add some material documenting the use cases from a user's perspective?
  • Is there any relationship with the Popover API (previously Popup, #743), considering these both deal with layering of content?
  • We discussed a potential accessibility issue: for example, if there are subtitles in the PiP "window", ensuring those can be picked up appropriately by assistive technology. What other accessibility considerations have you discussed?
  • We're slightly concerned with the proposed mitigation for the spoofing issue, although it's good that this consideration is called out. Can you strengthen this wording, perhaps with an example?
  • We'd like to encourage you to use normative language in the security and privacy considerations sections as you develop them further.
  • Has there been any feedback from other browsers? Have you opened issues in the Mozilla or WebKit standards positions repositories?
  • How would this feature work with multiple screens? Would it be up to implementations to decide which screen the PIP window shows up on? It seems useful to factor in multiple screens, given that proposals like this have come forward.
  • Is the aspect ratio (width/height vs height/width) following a common pattern? If so, we might want to document this as a design principle.

Also just noting: we're going to bring more CSS expertise to bear on this review so expect some further questions.

Thanks!

LeaVerou added the Progress: pending external feedback label Mar 29, 2023
@steimelchrome
Author

@torgo Yep, I've just updated the explainer to clarify those (and the spec should also be clear on them).

  • For standardization, I'm going to bring it to the Media Working Group. I have presented it there before, but haven't yet discussed bringing it to the WG with Chris.
  • Sure, I can add user needs. I assume that's different from the "Use Cases" section in that it's from the user's perspective rather than the web developer's? Do you have a link to an example user needs section?
  • I don't think this proposal really has any overlap with the Popover API.
  • The PiP window is a full HTML document that can be focused and picked up by assistive technology like any other browser window, so that shouldn't be an issue. We also ensure that the PiP window is in the tab order (so it can be focused via keyboard) and that the toolbar buttons are keyboard-focusable as well. However, these things (especially the toolbar buttons) are Chrome-specific UI, so I'm not sure it makes sense to call them out in the spec/explainer for the Web API, but I don't know what's normally done for that.
  • Added an example to the spoofing section.
  • Okay, I'll keep that in mind.
  • Yes, we have opened issues for standards positions:
  • Right now, we leave placement (screen and location) entirely up to the user agent. On Chrome, we just use the screen that the opener window is on.
  • It might accidentally be following a common pattern, but I think I just made it that way because I had to pick one or the other.

CSS expertise sounds good. I believe @liberato-at-chromium was talking with some CSS people about it a while back.

Thanks!

@torgo
Member

torgo commented Apr 20, 2023

I don't think this proposal really has any overlap with the Popover API

I think we meant: in the Popover API, you have a window that sits on top and can have arbitrary content. However, I think one difference is that a popover is only visible within the current browsing context, whereas the PiP floating window is visible over other apps, etc. Is that correct?

@torgo
Member

torgo commented Apr 20, 2023

@steimelchrome there was also a concern raised in the Mozilla Standards Position thread about this being misused by advertisers or other actors that want to interrupt the user experience - that this could become another popup. Can you elaborate on how this concern has been addressed? In the response to this question I'm reading from @liberato-at-chromium "However, I don't believe that Document PiP makes the situation any worse." We're trying to push spec developers to "leave the web better than you found it." See our design principle on this topic. So I think we'd like to understand how Document picture-in-picture makes things better for end users on this front.

On a related note, is a permission request to the user currently necessary in order to invoke document picture-in-picture?

@ylafon
Member

ylafon commented Apr 20, 2023

The spoofing section only gives hints; it should use stronger wording to prevent, for example, spoofing of payment websites or, as the document itself mentions, of system UI used to gather user passwords.
Restricting PiP to video was enough to avoid this issue, but opening it up to arbitrary documents means that security and spoofing need to be addressed normatively.

@tomayac

tomayac commented Apr 20, 2023

Since you can render HTML content to a video that you can then put into PiP with the traditional API, I don't think a proper API as proposed here creates new spoofing surface. Arguably it creates even less, since the UA, as encouraged in the spoofing section, renders UI such as a title bar, at least as implemented in Chrome. As an example, here's my custom-built solar system PiP dashboard that I like to keep an eye on:

[Screenshot (2023-04-20): solar system dashboard in a Document PiP window]

Compared to a traditional PiP window with no UI:

[Screenshot (2023-04-20): traditional video PiP window with no UI]

Update: to be fair, you can barely interact with a video PiP window, whereas you can interact with a document PiP window, but there's more to come for video.

@liberato-at-chromium

Can you elaborate on how this concern has been addressed? In the response to this question I'm reading from @liberato-at-chromium "However, I don't believe that Document PiP makes the situation any worse." We're trying to push spec developers to "leave the web better than you found it." [...] So I think we'd like to understand how Document picture-in-picture makes things better for end users on this front.

Document PiP makes the web better because we've seen a lot of (legitimate, not abusive) demand for always-on-top arbitrary content.

My comments earlier were just trying to say that we aren't introducing a new vector for abuse in the process of providing those improvements, because it's not more abusable than what's already there. I might even be able to make the case that it's less so. For example, the site can't move or resize a document PiP window via scripting. However, I don't think those differences are the reason we'd do any of this.

matatk added the Progress: pending external feedback label Mar 18, 2024
@hober
Copy link
Contributor

hober commented Mar 18, 2024

I'm still concerned that this feature doesn't appear to be widely implementable across platforms, as discussed in the WebKit standards-positions issue on this.

plinss removed this from the 2024-03-18-week milestone Mar 25, 2024
torgo added this to the 2024-04-22-week milestone Apr 21, 2024
@torgo
Member

torgo commented Apr 22, 2024

Hi @steimelchrome, can you give us any updates on this proposal? Matthew asked a question above regarding Lea's feedback that looks like it's still pending. Thanks!

@steimelchrome
Author

Sorry for the delay.

Hi @steimelchrome, thank you for your recent updates. We are still unclear as to whether options 2 and 3 from Lea's comment have been considered - could you point us to the outcome of any discussions on those?

I don't think we have any written-down outcome/resolution I can point to. I just sent an email to Domenic to discuss further and I'll post a resolution here.

I'm still concerned that this feature doesn't appear to be widely implementable across platforms, as discussed in the WebKit standards-positions issue on this.

For Android (and possibly iOS, but I'm less familiar with iOS), with current system APIs we could implement a non-interactive version of document picture-in-picture (allowing the website to populate it with arbitrary HTML elements, but not actually allowing input). There are some potential issues (e.g., a PiP window that has an active media session would show media controls, which may or may not be appropriate depending on the use case), but we haven't seen any demand for this, so we haven't pursued it. But you're right that arbitrary interactive HTML would not be implementable without new Android/iOS APIs to support it.

Otherwise, I'm not sure what changes we could make to the API to support these use cases on desktop while remaining 100% implementable on mobile. Do you have any ideas?

plinss removed this from the 2024-04-22-week milestone Apr 29, 2024
plinss removed the Progress: pending external feedback label Apr 29, 2024
plinss added this to the 2024-04-29-week:e milestone Apr 29, 2024
@steimelchrome
Author

We're also proposing allowing user gestures in the document picture-in-picture window to be usable in the opener window and vice versa. This makes it more ergonomic to use user-activation-gated APIs, since often event handlers in the document picture-in-picture window are actually run in the opener's context, so the opener's context needs access to the user gesture. This essentially makes the document picture-in-picture window act the same as a same-origin iframe inside the opener as far as user gesture propagation is concerned.

PR: WICG/document-picture-in-picture#117
Chromestatus: https://chromestatus.com/feature/5185710702460928
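
As a rough illustration of the proposed user-activation change, the opener and its PiP window can be modeled as sharing one activation state, which the proposal compares to a same-origin iframe inside the opener. This toy model is hypothetical (`ActivationWindow`, `linkOpenerAndPip`, and the 5000 ms duration are assumptions) and only loosely follows HTML's last-activation-timestamp idea; it is not the algorithm in the linked PR.

```typescript
// Assumption: HTML leaves the transient activation duration to the user agent,
// expecting a few seconds at most; 5000 ms is only a placeholder here.
const TRANSIENT_ACTIVATION_MS = 5000;

// Toy model of one window's activation state. Hypothetical names throughout.
class ActivationWindow {
  lastActivation = Number.NEGATIVE_INFINITY;
  partner: ActivationWindow | null = null;

  // A click or key press in this window; under the proposal it also
  // activates the linked opener or PiP window.
  activate(now: number): void {
    this.lastActivation = now;
    if (this.partner !== null) this.partner.lastActivation = now;
  }

  hasTransientActivation(now: number): boolean {
    return now >= this.lastActivation && now - this.lastActivation < TRANSIENT_ACTIVATION_MS;
  }

  // An activation-consuming API uses up the activation in both windows.
  consume(): void {
    this.lastActivation = Number.NEGATIVE_INFINITY;
    if (this.partner !== null) this.partner.lastActivation = Number.NEGATIVE_INFINITY;
  }
}

function linkOpenerAndPip(opener: ActivationWindow, pip: ActivationWindow): void {
  opener.partner = pip;
  pip.partner = opener;
}
```

The point of the sharing is that a click handler registered by the opener but triggered inside the PiP window sees the opener as activated, so activation-gated calls made from the opener's context can succeed.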

@beaufortfrancois
Copy link

For info, Spotify folks are using the Document Picture-in-Picture API for their Miniplayer.
You can learn more about their journey and use cases at https://developer.chrome.com/blog/spotify-picture-in-picture

@LeaVerou
Member

Hi folks,

We (@plinss @matatk and I) discussed this again during a breakout today.

Overall, we see why the current window.open() doesn’t work for what this API is trying to do; however, it appears that all of these differences are things that would be useful for window.open() as well:

  • An async API to allow gating behind a permissions prompt
  • Feature detection for individual parameters
  • Allowing up to one window per top-level traversable
  • Ability to create "always on top" windows
  • ...

We understand that improving window.open() is a substantial undertaking, however from an architectural point of view, we cannot justify creating a parallel, more narrowly scoped API for the sole reason of avoiding that work. Instead, we encourage people to work on the existing effort to modernize window.open() and ensure it covers these use cases as well.

The video-specific use cases appear to be covered already by video.requestPictureInPicture(), so designing this as a more general API seems appropriate. It is unfortunate that not every existing platform can implement this API, but there are clearly use cases that go beyond video, so we think that as long as feature detection is possible and has good ergonomics, this may be worth doing.
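
As a hedged illustration of the feature-detection point, a page could probe for the `documentPictureInPicture` global and fall back to video-only PiP when `document.pictureInPictureEnabled` is true. `PipEnvironment`, `PipStrategy`, and `choosePipStrategy` are hypothetical names used only for this sketch; in a browser you would pass `window`.

```typescript
// Hypothetical shape of the parts of the global environment being probed.
interface PipEnvironment {
  documentPictureInPicture?: unknown;
  document?: { pictureInPictureEnabled?: boolean };
}

type PipStrategy = "document" | "video" | "none";

// Prefer Document PiP where it exists; otherwise fall back to the
// video-only HTMLVideoElement.requestPictureInPicture() path.
function choosePipStrategy(env: PipEnvironment): PipStrategy {
  if ("documentPictureInPicture" in env) return "document";
  if (env.document?.pictureInPictureEnabled === true) return "video";
  return "none";
}
```

This only detects the entry point; detecting support for individual `requestWindow()` options, which the list above also raises, would need a separate mechanism.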

LeaVerou added the Progress: propose closing label and removed the Progress: in progress label May 13, 2024
plinss removed this from the 2024-05-27-week milestone Jun 10, 2024
torgo added this to the 2024-06-17-week:b milestone Jun 16, 2024
@hober
Contributor

hober commented Jun 17, 2024

The TAG revisited this issue today, and have decided to close this review as unsatisfied. We would prefer enhancing window.open(), as described in Lea's comment above, as a way to address your use cases more in line with Web platform architecture.

(Personally, I also remain concerned with adding a feature like this to the Web platform without a clear strategy for making it available on entire classes of very popular, Web-capable devices.)

hober added the Progress: review complete and Resolution: unsatisfied labels and removed the Progress: propose closing label Jun 17, 2024
hober closed this as completed Jun 17, 2024
15 participants