Before modern streaming services like Netflix or YouTube, video streaming depended heavily on protocols like RTP and RTSP. Though the video was packetized1, it was only available in a single bitrate2.

These protocols had downsides:

  1. A single bitrate means you are stuck with the same quality of content regardless of your connection speed. On a slow connection, the video would buffer frequently while the content downloaded.
  2. The protocols were stateful: the server had to track each viewer's playback position, which could be resource-intensive at scale.

Introducing Adaptive bitrate streaming

Adaptive bitrate streaming, as the name suggests, can adjust the bitrate of the video based on network conditions. If the network slows down, the player can request a lower bitrate from the server, and switch back to a higher one when conditions improve. The decision of which bitrate to choose happens on the client, since the client is best placed to observe its own network conditions.

For this, the server either needs videos pre-encoded in multiple bitrates that it can switch between and stream on the fly, or an encoder in place to transcode in real time (which wastes compute by re-encoding on every request).
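The pre-encoding approach can be sketched as generating one encode command per rung of a bitrate ladder. A minimal sketch, assuming ffmpeg as the encoder; the ladder values and output names are illustrative, not prescribed by any standard:

```python
# Sketch: build ffmpeg commands for a pre-encoded bitrate ladder.
# Ladder values and output paths are illustrative assumptions.

LADDER = [
    ("1080p", "1920x1080", "5000k"),
    ("720p", "1280x720", "2500k"),
    ("480p", "854x480", "1000k"),
]

def encode_commands(source: str) -> list[list[str]]:
    """Return one ffmpeg argv per rung of the ladder (not executed here)."""
    cmds = []
    for name, resolution, bitrate in LADDER:
        cmds.append([
            "ffmpeg", "-i", source,
            "-c:v", "libx264",   # encode video as H.264
            "-b:v", bitrate,     # target video bitrate
            "-s", resolution,    # output resolution
            f"out_{name}.mp4",
        ])
    return cmds
```

Each command would be run once at ingest time, so playback never pays the encoding cost.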

ABR algorithms are based on:

  1. Throughput-based - decisions driven by the measured network throughput
  2. Buffer-based - decisions driven by the client's buffer level
  3. Hybrid - combining both signals
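A hybrid decision can be sketched in a few lines. This is a toy illustration with made-up thresholds and a made-up three-rung ladder; real players (e.g. dash.js, hls.js) use far more elaborate, tuned heuristics:

```python
# Sketch of a hybrid ABR decision: throughput picks the rung,
# the buffer level acts as a conservative guard.

BITRATES = [1_000_000, 2_500_000, 5_000_000]  # available ladder, bps (assumed)

def choose_bitrate(throughput_bps: float, buffer_seconds: float) -> int:
    """Pick the highest bitrate that fits the measured throughput,
    then drop one rung if the buffer is nearly empty."""
    # Throughput rule: keep a safety margin so downloads outpace playback.
    affordable = [b for b in BITRATES if b <= throughput_bps * 0.8]
    choice = max(affordable) if affordable else BITRATES[0]
    # Buffer rule: close to rebuffering -> step down one rung.
    if buffer_seconds < 5 and choice != BITRATES[0]:
        choice = BITRATES[BITRATES.index(choice) - 1]
    return choice
```

The client re-runs a decision like this before fetching each segment, which is what makes the adaptation continuous.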

Implementations

Dynamic Adaptive Streaming over HTTP - DASH

  • Also known as MPEG-DASH
  • The only adaptive-bitrate HTTP-based streaming solution that is an international standard.
  • The content is broken down into small segments.
  • DASH is codec-agnostic: it can be used with content encoded in any coding format - H.264, H.265, etc.
  • Media Presentation Description (MPD) is an XML based manifest file used to describe how audio and video are structured and delivered.
  • The root contains one or more Periods
  • Periods: A period is a logical segmentation of the media timeline. Use cases for periods include:
    • Ad Insertion
    • Content chaptering/logical units
    • Dynamic content update for live streaming
  • Adaptation Sets: Each period consists of adaptation sets. Adaptation sets group together multiple representations of the same content type:
    • one adaptation set for video
    • one for audio
    • one for subtitles
  • Representation: Representations are the different versions of the same content:
    • Video/audio:
      • Encoded at different bitrates, resolutions, codecs, framerates, languages(audio)
    • Subtitles:
      • Different languages
  • Segments: Segments are the individually downloadable chunks of media that the player requests. Their locations can be described using:
    • SegmentList: A list of all segment URLs
    • SegmentTemplate: URL pattern with variables for generating segment URLs
    • SegmentBase: single-segment representation
    • BaseURL: Base location for segments
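The SegmentTemplate addressing scheme can be illustrated by expanding the template into the concrete URLs a player would request. A sketch, using the same attribute values as the example MPD in this post (timescale 1000, duration 2000, $Number$ substitution):

```python
import math

# Sketch: expand a DASH SegmentTemplate into concrete segment URLs.
# $Number$, startNumber, duration and timescale are standard DASH
# SegmentTemplate attributes; the values used below mirror the example MPD.

def expand_template(base_url: str, media: str, start_number: int,
                    duration: int, timescale: int, total_seconds: float):
    """Yield the URL of every media segment a player would request."""
    segment_seconds = duration / timescale        # e.g. 2000/1000 = 2s
    count = math.ceil(total_seconds / segment_seconds)
    for n in range(start_number, start_number + count):
        # $Number$ is replaced with the running segment index.
        yield base_url + media.replace("$Number$", str(n))
```

For a 10-second clip this yields video_1080p/segment_1.m4s through segment_5.m4s, which is why the manifest stays small even for long videos.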
MPD
└── Period(s)
    └── AdaptationSet(s)
        └── Representation(s)
            └── Segment Information
<?xml version="1.0" encoding="UTF-8"?>
<!-- type="static" for VOD; for a livestream this is set to "dynamic" -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="static"
     mediaPresentationDuration="PT634.566S"
     minBufferTime="PT2.0S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  
  <!-- Single Period for the entire video -->
  <Period id="0" duration="PT634.566S">
    
    <!-- Video Adaptation Set -->
    <AdaptationSet id="0" 
                   contentType="video" 
                   mimeType="video/mp4"
                   codecs="avc1.4d401f"
                   width="1920"
                   height="1080"
                   frameRate="24"
                   segmentAlignment="true"
                   startWithSAP="1">
      
      <!-- High Quality Video (1080p) -->
      <Representation id="video_1080p" 
                      bandwidth="5000000"
                      width="1920"
                      height="1080">
        <BaseURL>video_1080p/</BaseURL>
        <SegmentTemplate timescale="1000"
                        duration="2000"
                        initialization="init.mp4"
                        media="segment_$Number$.m4s"
                        startNumber="1"/>
      </Representation>
      
      <!-- Medium Quality Video (720p) -->
      <Representation id="video_720p" 
                      bandwidth="2500000"
                      width="1280"
                      height="720">
        <BaseURL>video_720p/</BaseURL>
        <SegmentTemplate timescale="1000"
                        duration="2000"
                        initialization="init.mp4"
                        media="segment_$Number$.m4s"
                        startNumber="1"/>
      </Representation>
      
      <!-- Low Quality Video (480p) -->
      <Representation id="video_480p" 
                      bandwidth="1000000"
                      width="854"
                      height="480">
        <BaseURL>video_480p/</BaseURL>
        <SegmentTemplate timescale="1000"
                        duration="2000"
                        initialization="init.mp4"
                        media="segment_$Number$.m4s"
                        startNumber="1"/>
      </Representation>
    </AdaptationSet>
    
    <!-- Audio Adaptation Set -->
    <AdaptationSet id="1" 
                   contentType="audio" 
                   mimeType="audio/mp4"
                   codecs="mp4a.40.2"
                   audioSamplingRate="48000"
                   segmentAlignment="true"
                   startWithSAP="1">
      
      <!-- English Audio -->
      <Representation id="audio_en" 
                      bandwidth="128000"
                      audioSamplingRate="48000">
        <AudioChannelConfiguration 
            schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
            value="2"/>
        <BaseURL>audio_en/</BaseURL>
        <SegmentTemplate timescale="1000"
                        duration="2000"
                        initialization="init.mp4"
                        media="segment_$Number$.m4s"
                        startNumber="1"/>
      </Representation>
      
      <!-- Spanish Audio -->
      <Representation id="audio_es" 
                      bandwidth="128000"
                      audioSamplingRate="48000">
        <AudioChannelConfiguration 
            schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
            value="2"/>
        <BaseURL>audio_es/</BaseURL>
        <SegmentTemplate timescale="1000"
                        duration="2000"
                        initialization="init.mp4"
                        media="segment_$Number$.m4s"
                        startNumber="1"/>
      </Representation>
    </AdaptationSet>
    
  </Period>
</MPD>
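A player's first step is to parse a manifest like the one above. A minimal sketch with Python's stdlib XML parser, pulling out each Representation's id and bandwidth (the fields the ABR logic needs):

```python
import xml.etree.ElementTree as ET

# Sketch: list the representations declared in an MPD, sorted by bandwidth.
# Uses the DASH namespace from the manifest's xmlns declaration.

DASH_NS = "{urn:mpeg:dash:schema:mpd:2011}"

def list_representations(mpd_xml: str) -> list[tuple[str, int]]:
    """Return (id, bandwidth) for every Representation in the MPD."""
    root = ET.fromstring(mpd_xml)
    reps = [(rep.get("id"), int(rep.get("bandwidth")))
            for rep in root.iter(DASH_NS + "Representation")]
    return sorted(reps, key=lambda r: r[1])
```

Against the example manifest this would surface the 480p/720p/1080p video rungs and both audio tracks, which the player then matches against its ABR decision.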

Apple HTTP Live Streaming (HLS)

  • HLS streams can be identified by the playlist URL extension .m3u8 or the MIME type application/vnd.apple.mpegurl.
  • It uses m3u8 playlist files which act as manifests, telling players which segments to download and in what order.
  • Master Playlist: Contains the list of all available quality levels (variant streams):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=842x480
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
high/index.m3u8
  • Media Playlist: Contains the actual media segments for a specific quality level.
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:9.9,
segment0.ts
#EXTINF:9.9,
segment1.ts
#EXTINF:9.9,
segment2.ts
#EXT-X-ENDLIST
  • For a live stream the file is updated dynamically and doesn't contain an EXT-X-ENDLIST tag.

Footnotes

  1. video broken into chunks and delivered

  2. Bitrate