WebRTC integration #60

Open
sergiomeneses opened this issue Jun 18, 2022 · 12 comments

@sergiomeneses

sergiomeneses commented Jun 18, 2022

Hey @scottlamb, I'm here again after some time. Do you remember me? (from discussions) ;)

I have some time now to start on my PoC (i.e., RTP to WebRTC).

I'm using Retina to get the RTSP stream (without demuxed) and passing it through a UDP socket to my WebRTC server (example). This partially works: once started, the stream suddenly stops without a clear reason; apparently Retina stops sending packets.

I am using the rtsp.stream service as the RTSP source stream.

The Retina code is something like this:

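```rust
// Assumed imports: `Settings` is the author's own config type (not shown);
// `Result` and `info!` are assumed to come from color_eyre and tracing,
// which the author mentions using later in the thread.
use color_eyre::Result;
use futures::StreamExt;
use retina::client::{PacketItem, PlayOptions, Session, SessionOptions, SetupOptions};
use tokio::{net::UdpSocket, signal};
use tracing::info;

#[tokio::main]
async fn main() -> Result<()> {
    let stop = signal::ctrl_c();
    let settings = Settings::new()?;

    let mut session = Session::describe(
        settings.get_src_url(),
        SessionOptions::default().creds(settings.get_credentials()),
    )
    .await?;

    // Pick the first video stream.
    let onvif_stream_i = session.streams().iter().position(|stream| {
        matches!(
            stream.parameters(),
            Some(retina::codec::ParametersRef::Video(..))
        )
    });

    if let Some(stream_i) = onvif_stream_i {
        session.setup(stream_i, SetupOptions::default()).await?;

        let session = session.play(PlayOptions::default()).await?;

        // Pin both so they can be polled repeatedly (by reference) in `select!`.
        tokio::pin!(session);
        tokio::pin!(stop);

        // Forward raw RTP packets from this socket to the WebRTC server on :5004.
        let socket = UdpSocket::bind("127.0.0.1:5005").await?;
        socket.connect("127.0.0.1:5004").await?;

        loop {
            tokio::select! {
                item = session.next() => {
                    // Note: `None` and `Some(Err(_))` are silently ignored here.
                    if let Some(Ok(packet)) = item {
                        match packet {
                            PacketItem::Rtp(received_packet) => {
                                info!("sending stream_id: {}, to: {}", received_packet.stream_id(), "127.0.0.1:5004");
                                socket.send(received_packet.raw()).await?;
                            },
                            _ => continue,
                        }
                    }
                },
                _ = &mut stop => {
                    break;
                },
            }
        }
    }

    Ok(())
}
```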

I'd like some guidance on whether this is the correct path. I was reading #58, where the author wants to pass the frames through GStreamer's rtph264depay, and you answered about using Retina's demuxed() for that.

In my case, using demuxed() on the session doesn't work. I'll keep improving this PoC so it can maybe be added to the examples one day, if you think it's valuable.

Thanks for your work on this awesome crate.

@scottlamb
Owner

scottlamb commented Jun 19, 2022

> Hey @scottlamb, I'm here again after some time. Do you remember me? (from discussions) ;)

Welcome back!

> I'm using Retina to get the RTSP stream (without demuxed) and passing it through a UDP socket to my WebRTC server (example). This partially works: once started, the stream suddenly stops without a clear reason; apparently Retina stops sending packets.

Best guess: item is Err. Your code ignores such an item (it doesn't take the `if let Some(Ok(packet)) = item {` branch), and Retina won't produce any more items, so it behaves as you describe. I bet if you print the error, you'll learn more. It might also help to enable logging via, e.g., env_logger::init() near the beginning of main() and setting RUST_LOG=debug in the environment.
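A minimal sketch of handling every case the session stream can yield, instead of silently dropping `None` and `Some(Err(_))`. To keep it std-only and runnable, the item type is generic: in the real loop, `T` would be Retina's `PacketItem` and `E` its `Error`. `Next` and `classify` are hypothetical names, not Retina APIs.

```rust
use std::fmt::Display;

/// What the receive loop should do with one item from the session stream.
#[derive(Debug, PartialEq)]
enum Next<T> {
    /// A packet to forward.
    Forward(T),
    /// Stop the loop; the string says why.
    Stop(String),
}

/// Handles every case explicitly, unlike `if let Some(Ok(packet)) = item`.
/// Hypothetical helper: `T` stands in for `PacketItem`, `E` for Retina's `Error`.
fn classify<T, E: Display>(item: Option<Result<T, E>>) -> Next<T> {
    match item {
        Some(Ok(packet)) => Next::Forward(packet),
        // Keep the error text; it is the best clue to why the stream stopped.
        Some(Err(e)) => Next::Stop(format!("session error: {e}")),
        // The stream has ended and will not yield more items.
        None => Next::Stop("session ended".to_owned()),
    }
}
```

In the `tokio::select!` arm, the `if let` would become a `match classify(item)` that forwards the packet on `Next::Forward` and logs the message, then breaks, on `Next::Stop`.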

> I'd like some guidance on whether this is the correct path. I was reading #58, where the author wants to pass the frames through GStreamer's rtph264depay, and you answered about using Retina's demuxed() for that.

Conceptually we have RTP packets coming in (from the RTSP server) and want RTP packets going out (to the WebRTC client). I think there are a few valid approaches:

  1. pass them unmodified. This is the simplest thing. The caveat is that the two may not agree on an ideal (or even acceptable) RTP packet size. E.g. the RTSP server may have giant packets (using TCP or jumbo/fragmented UDP frames) and the WebRTC client may not accept them (not sure if WebRTC allows fragmentation, and jumbo frames are generally not possible across the Internet).
  2. demux/depayload into whole frames, and remux/repayload those into packets. Retina does the former through demuxed. It has some internal test code for the latter which isn't exposed at the moment. The webrtc crate probably has something too. The downside here is a bit of latency: you receive a whole frame before you start sending any of it.
  3. a more tailored/sophisticated proxy re-packetizer that grabs bits of the frames as they come in, and flushes when it reaches the desired packet size or frame boundaries, so you can choose the output packet size independently of the input packet size without introducing extra latency.
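To make the packet-size caveat concrete: for H.264, re-payloading (approach 2) typically means splitting any NAL unit that is too big for the outgoing packet into FU-A fragments, as defined in RFC 6184 (section 5.8). The following is a minimal std-only sketch of just that fragmentation step. It is not Retina's internal test code; `fragment_fu_a` and `max_payload` are hypothetical names, and RTP headers, sequence numbers, and the marker bit are left out.

```rust
/// FU-A NAL unit type (RFC 6184, section 5.8).
const FU_A: u8 = 28;

/// Splits one H.264 NAL unit (no start code or length prefix) into RTP
/// payloads of at most `max_payload` bytes. NAL units that already fit are
/// returned as a single payload, unchanged.
fn fragment_fu_a(nal: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    assert!(!nal.is_empty(), "a NAL unit has at least its 1-byte header");
    if nal.len() <= max_payload {
        return vec![nal.to_vec()];
    }
    // Each fragment spends 2 bytes on the FU indicator and FU header.
    assert!(max_payload > 2, "max_payload too small for FU-A");
    let header = nal[0];
    // FU indicator: original F and NRI bits, type = FU-A.
    let fu_indicator = (header & 0xE0) | FU_A;
    let nal_type = header & 0x1F;
    // The original header byte is not sent; the receiver rebuilds it.
    let chunks: Vec<&[u8]> = nal[1..].chunks(max_payload - 2).collect();
    let last = chunks.len() - 1;
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut fu_header = nal_type;
            if i == 0 {
                fu_header |= 0x80; // S (start) bit
            }
            if i == last {
                fu_header |= 0x40; // E (end) bit
            }
            let mut payload = Vec::with_capacity(2 + chunk.len());
            payload.push(fu_indicator);
            payload.push(fu_header);
            payload.extend_from_slice(chunk);
            payload
        })
        .collect()
}
```

The receiver can rebuild the original header byte from the F/NRI bits of the FU indicator and the type bits of the FU header, which is why the header itself is not repeated in each fragment.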

You're effectively doing 1 now. As long as that's working, great. If you run into problems, we can move on to the other approaches. I'd like to eventually offer 3 as part of retina.
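Approach 3 is essentially a small buffer between the two sides. The sketch below is codec-agnostic and makes simplifying assumptions: the input is the bytes of one frame arriving in arbitrary-sized pieces, and each output payload is a plain byte chunk. A real H.264 re-packetizer would also have to respect NAL unit boundaries and add FU-A headers; `Repacketizer` is a hypothetical type, not something Retina provides.

```rust
/// Hypothetical streaming re-packetizer: bytes of the current frame arrive in
/// arbitrary-sized pieces, and output payloads are emitted as soon as they
/// are full instead of waiting for the whole frame.
struct Repacketizer {
    max_payload: usize,
    pending: Vec<u8>,
}

impl Repacketizer {
    fn new(max_payload: usize) -> Self {
        assert!(max_payload > 0, "max_payload must be positive");
        Repacketizer { max_payload, pending: Vec::new() }
    }

    /// Buffers `input` and returns every payload that is now complete.
    /// At least one byte is always held back, so the frame's final payload
    /// comes from `end_of_frame` (where the RTP marker bit belongs).
    fn push(&mut self, input: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(input);
        let mut out = Vec::new();
        while self.pending.len() > self.max_payload {
            out.push(self.pending.drain(..self.max_payload).collect());
        }
        out
    }

    /// Flushes the rest of the frame, if any, as its last payload.
    fn end_of_frame(&mut self) -> Option<Vec<u8>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

The added latency is bounded by roughly one output payload rather than one whole frame, which is the advantage over approach 2.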

> In my case, using demuxed() on the session doesn't work. I'll keep improving this PoC so it can maybe be added to the examples one day, if you think it's valuable.

Yeah, I think it'd be great to have examples of how to link Retina with other code: ffmpeg, gstreamer, webrtc, etc. I think ideally we'd skip the retransmit/receive over UDP stage and glue Retina and webrtc together in the same process.

@sergiomeneses
Author

Hi

> Best guess: item is Err. Your code ignores such an item (it doesn't take the `if let Some(Ok(packet)) = item {` branch), and Retina won't produce any more items, so it behaves as you describe. I bet if you print the error, you'll learn more. It might also help to enable logging via, e.g., env_logger::init() near the beginning of main() and setting RUST_LOG=debug in the environment.

You were right that the item isn't a packet; it's None, and I don't know why. I'm using tracing and color_eyre for logging. The last event before the None item is TRACE tokio_util::codec::framed_impl: attempting to decode a frame. The whole event log:

2022-06-20T00:35:42.735792Z TRACE mio::poll: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE    
2022-06-20T00:35:43.092624Z DEBUG retina::codec::h264: sps: SeqParameterSet {
    profile_idc: ProfileIdc(
        66,
    ),
    constraint_flags: ConstraintFlags {
        flag0: true,
        flag1: true,
        flag2: false,
        flag3: false,
        flag4: false,
        flag5: false,
        reserved_zero_two_bits: 0,
    },
    level_idc: 21,
    seq_parameter_set_id: ParamSetId(
        0,
    ),
    chroma_info: ChromaInfo {
        chroma_format: YUV420,
        separate_colour_plane_flag: false,
        bit_depth_luma_minus8: 0,
        bit_depth_chroma_minus8: 0,
        qpprime_y_zero_transform_bypass_flag: false,
        scaling_matrix: SeqScalingMatrix,
    },
    log2_max_frame_num_minus4: 0,
    pic_order_cnt: TypeTwo,
    max_num_ref_frames: 3,
    gaps_in_frame_num_value_allowed_flag: false,
    pic_width_in_mbs_minus1: 19,
    pic_height_in_map_units_minus1: 14,
    frame_mbs_flags: Frames,
    direct_8x8_inference_flag: true,
    frame_cropping: None,
    vui_parameters: Some(
        VuiParameters {
            aspect_ratio_info: Some(
                Ratio1_1,
            ),
            overscan_appropriate: Unspecified,
            video_signal_type: Some(
                VideoSignalType {
                    video_format: Unspecified,
                    video_full_range_flag: false,
                    colour_description: Some(
                        ColourDescription {
                            colour_primaries: 6,
                            transfer_characteristics: 1,
                            matrix_coefficients: 5,
                        },
                    ),
                },
            ),
            chroma_loc_info: Some(
                ChromaLocInfo {
                    chroma_sample_loc_type_top_field: 1,
                    chroma_sample_loc_type_bottom_field: 1,
                },
            ),
            timing_info: Some(
                TimingInfo {
                    num_units_in_tick: 1,
                    time_scale: 60,
                    fixed_frame_rate_flag: true,
                },
            ),
            nal_hrd_parameters: None,
            vcl_hrd_parameters: None,
            low_delay_hrd_flag: None,
            pic_struct_present_flag: false,
            bitstream_restrictions: Some(
                BitstreamRestrictions {
                    motion_vectors_over_pic_boundaries_flag: true,
                    max_bytes_per_pic_denom: 0,
                    max_bits_per_mb_denom: 0,
                    log2_max_mv_length_horizontal: 10,
                    log2_max_mv_length_vertical: 10,
                    max_num_reorder_frames: 0,
                    max_dec_frame_buffering: 3,
                },
            ),
        },
    ),
}    
2022-06-20T00:35:43.092924Z  INFO retina::codec: no depacketizer for media/encoding_name audio/mp4a-latm    
2022-06-20T00:35:43.272318Z DEBUG retina::client: SETUP response: Response {
    version: V1_0,
    status: Ok,
    reason_phrase: "OK",
    headers: Headers(
        {
            HeaderName(
                "CSeq",
            ): HeaderValue(
                "2",
            ),
            HeaderName(
                "Server",
            ): HeaderValue(
                "gortsplib",
            ),
            HeaderName(
                "Session",
            ): HeaderValue(
                "3155538373",
            ),
            HeaderName(
                "Transport",
            ): HeaderValue(
                "RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=139F913D",
            ),
        },
    ),
    body: b"",
}    
2022-06-20T00:35:43.272513Z TRACE retina::client: PLAY with channel mappings: {
    "0-1": 0,
}    
2022-06-20T00:35:43.446389Z TRACE mio::poll: registering event source with poller: token=Token(2), interests=READABLE | WRITABLE    

Then a bunch of `TRACE tokio_util::codec::framed_impl: attempting to decode a frame` and `TRACE tokio_util::codec::framed_impl: frame decoded from buffer` events, with a `DEBUG retina::client: time for a keepalive` a couple of events before the None item.

---item is None---

---CTRL-C signal---

2022-06-20T00:36:54.207713Z TRACE mio::poll: deregistering event source from poller    
2022-06-20T00:36:54.207871Z DEBUG retina::client::teardown: TEARDOWN 3155538373 starting for URL rtsp://rtsp.stream/pattern/    
2022-06-20T00:36:54.208360Z TRACE mio::poll: deregistering event source from poller    
2022-06-20T00:36:54.208522Z DEBUG retina::client::teardown: TEARDOWN 3155538373 on existing conn failed: Error reading from RTSP peer: EOF while expecting response to TEARDOWN CSeq 5

conn: 192.168.164.8:60880(me)->23.88.67.97:554@2022-06-19T19:35:42
msg: 6297109@2022-06-19T19:36:54    
2022-06-20T00:36:54.208781Z DEBUG retina::client::teardown: Giving up on TEARDOWN 3155538373; use TearDownPolicy::Always to try harder    
2022-06-20T00:36:54.209066Z TRACE mio::poll: deregistering event source from poller

> Conceptually we have RTP packets coming in (from the RTSP server) and want RTP packets going out (to the WebRTC client). I think there are a few valid approaches:

I also think the third one is the best solution for transmitting the data. I'll look into how to implement it; it sounds like a good feature to build for more solid Rust and video-streaming knowledge.

> Yeah, I think it'd be great to have examples of how to link Retina with other code: ffmpeg, gstreamer, webrtc, etc. I think ideally we'd skip the retransmit/receive over UDP stage and glue Retina and webrtc together in the same process.

I was thinking of something like RTSPtoWeb, but in Rust, with Retina getting the data. What do you think?

@scottlamb
Owner

> the item is None

Interesting. I was expecting a Some(Err(_)) first. But IIRC Retina will currently go straight to None if the server drops the RTSP connection. (Which is arguably a reasonable thing to do in the case of the default TCP transport. When using UDP transport, this is probably a bug I should fix, as the RTSP connection and the session are supposed to be separate concepts.) Why the server dropped the connection, I don't know. Maybe it has its own debug logs that would explain this, and/or maybe using Wireshark to compare a Retina connection to, e.g., an ffmpeg one would help.

> I was thinking of something like [RTSPtoWeb], but in Rust (with Retina getting the data). What do you think?

Yeah, that'd be very nice.

If it's simple, it could just be a Retina example; if it's a full-featured version, it should probably be a separate crate, whether in this repo or otherwise.

@sergiomeneses
Author

sergiomeneses commented Jun 21, 2022

Hi.

> When using UDP transport, this is probably a bug I should fix, as the RTSP connection and the session are supposed to be separate concepts

When I use UDP transport, I can't get a single item (e.g., `session.setup(stream_i, SetupOptions::default().transport(Transport::Udp(UdpTransportOptions::default()))).await?;`).

> Why the server dropped the connection, I don't know. Maybe it has its own debug logs that would explain this

I already wrote to Dave @rtspstream; I hope he can help us.

> maybe using Wireshark to compare a Retina connection to, e.g., an ffmpeg one would help.

I don't know how to do this yet, but I'll figure it out and post back.

> If it's simple, it could just be a Retina example; if it's a full-featured version, it should probably be a separate crate, whether in this repo or otherwise.

Nice, I'll let you know.

As a note, it works when I use VLC, but it doesn't work even with Retina's mp4 example.

@sergiomeneses
Author

Hey.

Here is the Retina Wireshark screenshot:

[Screenshot: Retina Wireshark capture]

And the VLC screenshot:

[Screenshot: VLC Wireshark capture]

Retina full capture

VLC full capture

I'm just assuming this, but is Retina indeed closing the connection right after a bad request? (If so, why doesn't Retina log this?)

@nemosupremo
Contributor

If you don't mind dropping the audio, I found it easier to use Retina to pop off the raw H.264 frames:

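```rust
use futures::StreamExt;
use retina::codec::CodecItem;

// `demuxed` is a pinned stream, e.g. from
// `session.play(PlayOptions::default()).await?.demuxed()?` plus `tokio::pin!`.
loop {
    let frame = match demuxed.next().await {
        None => break, // the session ended
        Some(Ok(CodecItem::VideoFrame(frame))) => frame,
        Some(Ok(_)) => continue, // skip audio, messages, etc.
        Some(Err(err)) => {
            eprintln!("stream error: {err}");
            break;
        }
    };
    // ...convert `frame` and write it to the WebRTC track...
}
```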

This gives you frames in AVC format (each NAL unit preceded by its length), but to use them in WebRTC you'll want to convert them to Annex B (each NAL unit preceded by a start code). This involves a memory copy, but you'll want to inspect the NALs anyway to throw away any NAL units that can affect streaming. In my case, an Amcrest camera I was using sent SEI NALUs, which caused Safari's WebRTC player to throw up. This is the same approach that RTSPtoWeb uses.

Using webrtc-rs and h264-reader:

```rust
use bytes::{Buf, Bytes};
use h264_reader::nal::UnitType;
use webrtc::media::Sample;

// `idr` records whether an IDR frame has been sent yet. It must persist across
// frames, so declare `let mut idr = false;` once, before the frame loop.
// `video_track`, `frame_duration`, `frame_timestamp`, and `packet_timestamp`
// also come from the surrounding code.
let mut frame = frame.data.clone();
let mut sps_nal = Bytes::new();
let mut pps_nal = Bytes::new();
while !frame.is_empty() {
    // AVC format: each NAL unit is preceded by a 4-byte big-endian length.
    let sz = frame.get_u32() as usize;
    let nal = frame.split_to(sz);
    let nal_type = match nal.first().and_then(|n| UnitType::for_id(n & 0x1F).ok()) {
        Some(n) => n,
        None => continue,
    };
    let data = match nal_type {
        UnitType::SliceLayerWithoutPartitioningIdr => {
            idr = true;
            // Annex B: prepend SPS and PPS so a decoder can start at this IDR.
            let mut v = Vec::with_capacity(
                4 + sps_nal.len() + 4 + pps_nal.len() + 4 + nal.len(),
            );
            v.extend_from_slice(&[0, 0, 0, 1]);
            v.extend_from_slice(sps_nal.as_ref());
            v.extend_from_slice(&[0, 0, 0, 1]);
            v.extend_from_slice(pps_nal.as_ref());
            v.extend_from_slice(&[0, 0, 0, 1]);
            v.extend_from_slice(nal.as_ref());
            Bytes::from(v)
        }
        UnitType::SliceLayerWithoutPartitioningNonIdr => {
            // Non-IDR slices are useless until the first IDR has been sent.
            if !idr {
                continue;
            }
            let mut v = Vec::with_capacity(4 + nal.len());
            v.extend_from_slice(&[0, 0, 0, 1]);
            v.extend_from_slice(nal.as_ref());
            Bytes::from(v)
        }
        // Remember parameter sets for the next IDR slice.
        UnitType::SeqParameterSet => {
            sps_nal = nal.clone();
            continue;
        }
        UnitType::PicParameterSet => {
            pps_nal = nal.clone();
            continue;
        }
        // Drop everything else, including SEI (which upset Safari).
        _ => continue,
    };

    if let Err(err) = video_track
        .write_sample(&Sample {
            data,
            duration: frame_duration,
            timestamp: frame_timestamp,
            packet_timestamp,
            ..Default::default()
        })
        .await
    {
        eprintln!("write_sample failed: {err}");
    }
}
```
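The AVC-to-Annex B conversion described above can also be sketched without any crates, assuming 4-byte big-endian length prefixes (the same assumption the `get_u32()` call makes). `avc_to_annex_b` is a hypothetical helper: unlike the webrtc-rs example, it keeps SPS/PPS where they appear rather than re-inserting them, drops only SEI (NAL type 6), and rejects malformed input instead of panicking.

```rust
/// Converts one frame from AVC format (each NAL unit preceded by a 4-byte
/// big-endian length) to Annex B (each NAL unit preceded by 00 00 00 01),
/// dropping SEI NAL units. Returns `None` if the input is malformed.
fn avc_to_annex_b(mut avc: &[u8]) -> Option<Vec<u8>> {
    const START_CODE: [u8; 4] = [0, 0, 0, 1];
    const NAL_TYPE_SEI: u8 = 6;
    let mut out = Vec::with_capacity(avc.len());
    while !avc.is_empty() {
        if avc.len() < 4 {
            return None; // truncated length prefix
        }
        let len = u32::from_be_bytes([avc[0], avc[1], avc[2], avc[3]]) as usize;
        avc = &avc[4..];
        if len == 0 || len > avc.len() {
            return None; // empty or truncated NAL unit
        }
        let (nal, rest) = avc.split_at(len);
        avc = rest;
        if (nal[0] & 0x1F) == NAL_TYPE_SEI {
            continue; // strip SEI
        }
        out.extend_from_slice(&START_CODE);
        out.extend_from_slice(nal);
    }
    Some(out)
}
```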

@scottlamb
Owner

@sergiomeneses: Sorry, I initially missed your comment with the Wireshark output, and the full dump links no longer work: "File has been removed."

> I'm just assuming this, but is Retina indeed closing the connection right after a bad request? (If so, why doesn't Retina log this?)

Retina should be following my habit of either returning errors or logging them, not both at once. If it returns an error, you can log it yourself with whatever extra context you have higher up the stack. Basically, I prefer fewer, richer log messages over more frequent ones that have to be pieced together to get the complete picture.
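A std-only sketch of that return-or-log convention, with hypothetical names (`RtspError`, `play`, `run_camera`) standing in for a library call and application code: the lower layer only returns its error, and the caller adds the context only it knows, so the eventual log line is a single complete message.

```rust
use std::fmt;

/// Hypothetical low-level error type.
#[derive(Debug)]
struct RtspError(String);

impl fmt::Display for RtspError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "RTSP error: {}", self.0)
    }
}

impl std::error::Error for RtspError {}

/// Stand-in for a library call: it returns the error and does not log it.
fn play(url: &str) -> Result<(), RtspError> {
    Err(RtspError(format!("400 Bad Request for {url}")))
}

/// Application layer: adds context only it knows (which camera), so the
/// final log line is one complete message.
fn run_camera(name: &str, url: &str) -> Result<(), String> {
    play(url).map_err(|e| format!("camera {name:?}: {e}"))
}
```

At the top level, a single `eprintln!` (or logging macro) on the `Err` from `run_camera` then produces one message with both the camera name and the RTSP details.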

Anyway, I don't know why the server is returning status 400. If you can post those full captures again, I'd be interested in comparing Retina's failing request with the successful one from VLC to know what went wrong.

@nemosupremo: thanks for your working example! Interesting point about Safari not liking the SEI NALs. I imagine that might have been a pain to debug. FWIW, I'm open to adding to Retina a way to request data in, say, Annex B format with non-VCL NALs stripped out, so you don't have to do that conversion yourself. This is the kind of thing I was imagining using SetupOptions for.

A couple interesting cases you might encounter:

  • If one frame has multiple slices (I'm told some Axis cameras do this), your code will put the SPS/PPS before each of them. It probably makes more sense to insert them once at the beginning of the frame, if frame.is_random_access_point() is true, rather than per slice.
  • In theory, there can be multiple SPSs and PPSs valid at once (they have IDs), and it looks like your code will break if so. I haven't actually encountered this yet, and Retina's code probably isn't handling it properly either.
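A minimal sketch of the first suggestion, assuming a single active SPS/PPS pair (so it does not address the second point) and a frame already split into NAL units. `frame_to_annex_b` is a hypothetical helper, not a Retina API: it writes the parameter sets once per random-access frame, however many IDR slices follow.

```rust
/// Builds one Annex B access unit from a frame's NAL units, inserting the
/// SPS and PPS once at the start of the frame (when it is a random access
/// point) rather than before every IDR slice.
fn frame_to_annex_b(nals: &[&[u8]], is_random_access_point: bool, sps: &[u8], pps: &[u8]) -> Vec<u8> {
    const START_CODE: [u8; 4] = [0, 0, 0, 1];
    let mut out = Vec::new();
    if is_random_access_point {
        for ps in [sps, pps] {
            out.extend_from_slice(&START_CODE);
            out.extend_from_slice(ps);
        }
    }
    for nal in nals {
        out.extend_from_slice(&START_CODE);
        out.extend_from_slice(nal);
    }
    out
}
```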

@nemosupremo
Contributor

nemosupremo commented Jul 12, 2022

> I imagine that might have been a pain to debug.

You can probably imagine, as the debugging tools in Safari are delightfully unhelpful. The "fix" was just noticing what RTSPtoWeb was doing.

> If one frame has multiple slices (I'm told some Axis cameras do this), your code will put the SPS/PPS before each of them. It probably makes more sense to insert them once at the beginning of the frame, if frame.is_random_access_point() is true, rather than per slice.

Ah, I thought is_random_access_point simply meant that the "frame" contained a single IDR slice.

> In theory, there can be multiple SPSs and PPSs valid at once (they have IDs), and it looks like your code will break if so. I haven't actually encountered this yet, and Retina's code probably isn't handling it properly either.

I haven't seen it in the wild either, and most other code I see online doesn't handle them. I'm currently replacing some parts of GStreamer with Retina in our application, and the cameras we've tested so far stream fine. I'll have to double-check.

@scottlamb
Owner

scottlamb commented Jul 12, 2022

> the debugging tools in Safari are delightfully unhelpful

Yeah, I went through the same thing a couple times when Safari didn't like Moonfire's .mp4 files. I ended up making my .mp4 files more like MP4Box's almost a byte at a time until I finally found what it was unhappy about...

> Ah, I thought is_random_access_point simply meant that the "frame" contained a single IDR slice.

Currently it means it has at least one IDR slice:

```rust
UnitType::SliceLayerWithoutPartitioningIdr => is_random_access_point = true,
```
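To make the "at least one" wording concrete, here is a std-only sketch of the rule as described. It is not Retina's actual code: `is_random_access_point` here is a hypothetical free function that takes the header byte of each NAL unit in a frame.

```rust
/// True if at least one NAL unit in the frame is an IDR slice (type 5),
/// mirroring the rule described above.
fn is_random_access_point(nal_headers: &[u8]) -> bool {
    const IDR_SLICE: u8 = 5;
    nal_headers.iter().any(|&h| (h & 0x1F) == IDR_SLICE)
}
```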

I think a frame ("access unit" in H.264 terms) is supposed to have either all IDR or all non-IDR NALs (but I could be wrong). There's something called periodic intra refresh, but I think it uses non-IDR NALs with slice type P/B and some type-I macroblocks, then adds in "SEI recovery point" NALs periodically. Not quite sure how that should be handled either. Maybe the SPS/PPS should be (re)sent on the SEI recovery point NAL? Except if Safari needs SEI stripped out, it probably doesn't handle periodic intra refresh well regardless of what the sender does...

scottlamb added a commit that referenced this issue Jul 15, 2022
This is a very basic version, as a starting point.
@scottlamb
Owner

I just pushed a commit with a very basic WebRTC proxy example. Improvements welcome!

@sergiomeneses
Author

sergiomeneses commented Aug 19, 2022

Hey @scottlamb, I was a little busy.

> Retina should be following my habit of either returning errors or logging them, not both at once. If it returns an error, you can log it yourself with whatever extra context you have higher up the stack. Basically, I prefer fewer, richer log messages over more frequent ones that have to be pieced together to get the complete picture.

> Anyway, I don't know why the server is returning status 400. If you can post those full captures again, I'd be interested in comparing Retina's failing request with the successful one from VLC to know what went wrong.

I think this is fixed in the 0.4.1 release; it was related to the simpleserver implementation and the fixed timeout.

> I just pushed a commit with a very basic WebRTC proxy example. Improvements welcome!

I will do it.

I don't know if we can close this as fixed.

@Alibirb

Alibirb commented Aug 19, 2023

> I just pushed a commit with a very basic WebRTC proxy example. Improvements welcome!

For what it's worth (which might not be much), another example of this would be my rtsp-to-webrtc. It's intended to be part of a larger system I'm creating, but also to be used stand-alone. It can handle multiple streams, and uses a variation on the WISH protocol for signalling between the server and the browser. There's a sample HTML page which can be opened in a browser, too.

4 participants