Low Latency UHD Adaptive Video Bitrate Streaming Based on HEVC Encoder Configurations and Http2 Protocol

Streaming 4K (Ultra HD) video in real time over the Internet at low bitrate and low latency is the challenge this paper addresses. Compression technology and the transfer link are the main elements that influence video quality. To deliver video over the Internet or another fixed-capacity medium, it is essential to compress the video to more manageable bitrates (customarily in the 1-20 Mbps range). In this study, video quality is examined using the H.265/HEVC compression standard, and the relationship between video quality and bitrate is investigated using various constant rate factors, GOP patterns, quantization parameters, and RC-lookahead values on video sequences with different types of motion. An ultra-high-definition video source is downsampled and encoded at multiple resolutions: 3840x2160, 1920x1080, 1280x720, 704x576, 352x288, and 176x144. To determine the best H.265 configuration for each resolution, experiments were conducted that resulted in a PSNR of 36 dB at the specified bitrate. The resolution is selected at the delivery side (encoder resource) based on the end-user application, while adaptation of the stream to the available bandwidth is achieved by embedding a controller with the MPEG-DASH protocol at the client side. Adaptive streaming methods deliver content that is encoded at several representations of video quality and bitrate, with each representation divided into time chunks. In this paper, we propose using HTTP/2 to achieve low-latency video streaming, focusing on live streaming and avoiding the problems of HTTP/1.


Introduction
In recent years, the production of digital video has advanced quickly. Video streaming, which transformed the Internet, would have been impossible without video compression [1]. Video compression converts a raw video sequence into a coded video stream, removing unnecessary digital information [2]. A video compression method must include both an encoder, to compress the video, and a decoder, to reconstruct it [3]; together, the encoder and decoder form a codec. Video compression reduces memory usage and transmission costs [4,5]. Modern coding standards such as MPEG-4 Part 10 Advanced Video Coding (H.264/AVC) [6], H.265/High Efficiency Video Coding (HEVC) [7], and H.266/VVC are used to compress video. HEVC compresses more efficiently than H.264, reducing video file size by up to 50% [8], at the cost of higher encoding complexity. The goal of this paper is to investigate video compression for streaming with H.265, and low-latency adaptive streaming using MPEG-DASH and the HTTP/2 protocol, over a channel whose limited bandwidth is shared by an unrestricted number of users. To identify the best quality and bitrate for each representation, we vary the H.265 parameters that directly affect bitrate and quality, such as the quantization parameter, constant rate factor, group of pictures, and RC-lookahead. The client-side controller embedded in the MPEG-DASH protocol selects the appropriate representation based on the channel condition. Most major web browsers now support HTTP/2, a newer version of the Hypertext Transfer Protocol (HTTP) standard. HTTP/2 was created to make the most of network resources and to deliver and receive data as quickly as possible. So far, mainly the push function, which allows the server to push content to the client before the client requests it, has attracted the interest of researchers in the multimedia field. HTTP/2, however, also includes a novel mechanism for multiplexed delivery of structured data, known as HTTP/2 streams [9].
Over-the-top (OTT) platforms increasingly use HTTP Adaptive Streaming (HAS). The video content is encoded at several quality levels, referred to as representations. The client runs a rate adaptation algorithm that determines the best representation to request based on a real-time bandwidth prediction. Inaccurate predictions can occur, however, and they reduce the Quality of Experience (QoE) [10]. The first service considered is live streaming of traditional (non-immersive) video in constrained networks, where the HAS throughput prediction can be inaccurate [11]. In this case, rebuffering events and a growing delay between the original video flow and the displayed video flow may occur, lowering the QoE.
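
As an illustration of the real-time bandwidth prediction mentioned above, the following minimal sketch estimates throughput as the harmonic mean of the most recent segment download rates. The choice of harmonic mean and the window size are assumptions made for illustration; the paper does not specify which predictor the client uses.

```python
def harmonic_mean_throughput(samples_bps, window=5):
    """Estimate throughput (bit/s) as the harmonic mean of the last `window` samples.

    The harmonic mean is dominated by the slowest downloads, so a single
    fast segment does not inflate the estimate. `window` is an assumed
    tuning value and must be at least 1.
    """
    recent = [s for s in samples_bps[-window:] if s > 0]
    if not recent:
        return 0.0
    return len(recent) / sum(1.0 / s for s in recent)


# Hypothetical per-segment download rates in bit/s.
history = [6_000_000, 2_000_000]
estimate = harmonic_mean_throughput(history)
print(f"estimated throughput: {estimate / 1e6:.2f} Mbit/s")
```

A more conservative estimate of this kind is one way to limit the effect of the inaccurate predictions described above, at the price of reacting more slowly when bandwidth improves.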

Application Layer Protocol
With a small number of available video qualities (one Standard Definition (SD) and one High Definition (HD)) and low latency, early Internet streaming used multicast and the Real-Time Streaming Protocol (RTSP). Over the last decade, HAS has become the most important technology for streaming live and VOD content over the Internet. HTTP-based streaming makes it possible to use CDNs to optimize client-server communication. Furthermore, because the streaming technology is built on top of HTTP, packets pass easily through potential barriers such as firewalls and NAT [12]. Additionally, by selecting a suitable representation, the client can optimize video quality for the available bandwidth.

Network video streaming
This system addresses bandwidth reservation for streaming high-quality video representations, particularly when an unusually large number of users on the channel causes the available bandwidth to vary. Adaptive video streaming methods, especially DASH, adapt the video quality dynamically to the channel conditions. The MPEG-DASH protocol was released in April 2012 as ISO/IEC 23009-1 [13]. Its main principles are as follows. The audiovisual content is encoded into several representations, each with its own video quality level and resolution. Each representation is then segmented so that the video sequence is available as a series of web objects. In DASH-based content delivery, the server generates two types of files: the Media Presentation Description (MPD), which contains metadata about the video content, and the video chunks, which carry the media data that viewers retrieve as web objects (with HTTP GET requests). At each new request, a DASH client uses a rate adaptation mechanism to match the bitrate of the requested representation to the network bandwidth. The rate adaptation methods are not part of the standard, so their implementation is left to vendors. Rate adaptation algorithms can take into account inputs such as throughput prediction, buffer fullness [14], and network parameters.
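
Because rate adaptation is left to the implementer, the logic can take many forms. The sketch below combines the two inputs named above, throughput prediction and buffer fullness, in a simple rule. The function name, the safety margin, and the low-buffer threshold are hypothetical values chosen for illustration, not the controller used in this work.

```python
def select_representation(bitrates_bps, throughput_bps, buffer_s,
                          low_buffer_s=2.0, safety=0.8):
    """Return the bitrate (bit/s) of the representation to request next.

    Rule (an assumption for illustration):
      * if the playback buffer is nearly empty, request the lowest bitrate;
      * otherwise request the highest bitrate that fits within a safety
        fraction of the predicted throughput.
    """
    ladder = sorted(bitrates_bps)
    if buffer_s < low_buffer_s:
        return ladder[0]
    budget = safety * throughput_bps
    fitting = [b for b in ladder if b <= budget]
    return fitting[-1] if fitting else ladder[0]


# Hypothetical bitrate ladder, one entry per representation.
ladder = [1_000_000, 3_000_000, 6_000_000, 12_000_000]
print(select_representation(ladder, throughput_bps=8_000_000, buffer_s=5.0))
```

With these example inputs the budget is 6.4 Mbit/s, so the 6 Mbit/s representation is requested; with a buffer below two seconds the same call would fall back to 1 Mbit/s.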

System Model Design
The H.265 encoder features that affect bitrate and PSNR are examined to determine the best value of each parameter for the various representations. The raw 3840x2160 video is subsampled to 1920x1080, 1280x720, 704x576, 352x288, and 176x144, as shown in Figure 1. All representations are encoded with H.265 using the features mentioned in the previous sections. The MPEG-DASH protocol, with a controller embedded at the client side, provides adaptive streaming. As a design step, each representation is optimized for bitrate and PSNR; at the end user, it can be reconstructed to the required resolution by interpolation. Adapting the bitrate sent over the Internet requires changing the network-layer syntax to accommodate the transmitted format.

Implementation procedure
The implementation consists of two parts: encoder configuration and server configuration. The first part finds the optimal operating point of the HEVC encoder at each resolution, one that maintains smooth streaming with good quality. The second part sets up the server based on the results of the first part, so that the MPEG-DASH protocol can use the resolution that works properly under the current channel condition.

Encoder configuration
This work uses libx265 and libavcodec, which provides a large number of codecs, together with the FFmpeg software package to convert, handle, and stream videos. Because of the wide range of user devices and the limited bandwidth available, the video resolution and streaming bitrate must be adapted. FFmpeg produces a layered set of HEVC/H.265 compressed representations from the raw video, using the parameters CRF, GOP, RC-lookahead, and QP to achieve a higher compression ratio according to the video's details. The QP plays a key role in the HEVC encoder's performance, and the Constant Rate Factor (CRF) takes values from 0 to 51. The GOP structure is also tuned for good quality and lower bitrate. HEVC has two kinds of prediction for reference pictures, intra and inter. It uses three slice types, intra (I), predictive (P), and bi-predictive (B); when decoding a P or B slice, the decoder builds lists of reference pictures for that slice.
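
To make the encoder configuration concrete, the following sketch assembles an FFmpeg command line that downscales a raw 4K source and encodes it with libx265, setting CRF, GOP length (keyint), B-frame count, and RC-lookahead. The file names, the default parameter values, and the assumption of an 8-bit 4:2:0 raw input are illustrative only; they are not the settings reported in this paper.

```python
import shlex


def build_x265_command(src, dst, width, height, crf=28, gop=48,
                       bframes=3, rc_lookahead=20,
                       src_size="3840x2160", fps=120):
    """Build (but do not run) an FFmpeg command for one representation.

    All default values are placeholders for illustration.
    """
    x265_params = f"keyint={gop}:bframes={bframes}:rc-lookahead={rc_lookahead}"
    return [
        "ffmpeg", "-y",
        # Raw YUV input has no header, so its format is given explicitly.
        "-f", "rawvideo", "-pixel_format", "yuv420p",
        "-video_size", src_size, "-framerate", str(fps),
        "-i", src,
        # Downscale to the target representation.
        "-vf", f"scale={width}:{height}",
        "-c:v", "libx265", "-crf", str(crf),
        "-x265-params", x265_params,
        dst,
    ]


cmd = build_x265_command("ReadySetGo.yuv", "readysetgo_1080p.mp4", 1920, 1080)
print(shlex.join(cmd))
```

The command is only printed here; passing the list to `subprocess.run` would execute it on a machine where FFmpeg is built with libx265. A fixed-QP run would replace the CRF option with a `qp=...` entry in the x265 parameter string.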

The experiments design of the encoder configuration
In the experiments, three test video sequences were used, each with a different degree of motion. The Beauty sequence features slight movement of one subject with a stationary camera. The Bosphorus sequence features a dynamic scene with the camera moving to the left, and the ReadySetGo sequence features a transition from rest to dynamic movement with the camera following. Six resolutions of each sequence were used, each with 600 frames (120 frames encoded) and a frame rate of 120 frames per second. Table 1 lists the characteristics and compression ratio of each video at these resolutions. The three test sequences are shown in Figure 2.
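
For orientation, the sketch below shows one common way to compute a compression ratio: the bitrate of the uncompressed video divided by the bitrate of the encoded stream. It assumes 8-bit 4:2:0 sampling (1.5 samples per pixel); both that assumption and this definition of the ratio are illustrative, since the exact definition used for Table 1 is not stated.

```python
def raw_bitrate_bps(width, height, fps, bit_depth=8, samples_per_pixel=1.5):
    """Bitrate of uncompressed video in bit/s.

    samples_per_pixel=1.5 corresponds to 4:2:0 chroma subsampling (assumed).
    """
    return width * height * samples_per_pixel * bit_depth * fps


def compression_ratio(width, height, fps, encoded_bps, **kwargs):
    """Raw bitrate divided by encoded bitrate."""
    return raw_bitrate_bps(width, height, fps, **kwargs) / encoded_bps


raw = raw_bitrate_bps(3840, 2160, 120)
print(f"raw 4K at 120 fps: {raw / 1e9:.2f} Gbit/s")
```

Under these assumptions a 3840x2160 source at 120 frames per second needs about 11.94 Gbit/s uncompressed, which illustrates why compression ratios in the thousands are needed to reach the 1-20 Mbps range.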

Constant rate factor
The performance of HEVC was verified at each of the six resolutions using the CRF values {4, 12, 20, 28, 36, 44, 51}. As Tables 2-7 show, both PSNR and bitrate decrease at every resolution as the CRF increases.
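
PSNR is the quality metric reported throughout these experiments. As a reminder of how it is obtained, the sketch below computes PSNR from the mean squared error between reference and decoded samples, assuming 8-bit samples (peak value 255). It operates on flat lists of sample values for simplicity; a real measurement would run over whole frames, typically on the luma plane.

```python
import math


def psnr_db(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every sample is off by one level, so the MSE is 1.
print(round(psnr_db([10, 20, 30, 40], [11, 21, 31, 41]), 2))
```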

Group of pictures (GOP)
The frame type is another important factor that affects video quality. Frames are divided into three categories: I, P, and B. I (intra) frames are coded without reference to other frames, P frames are predicted by forward prediction from a previous frame, and B frames are inter-coded using motion-compensated prediction from two reference frames [16]. Each test video was encoded and decoded with HEVC using the FFmpeg software. The target bitrate ranges from 2 Mbit/s to 10 Mbit/s in 1 Mbit/s increments. The GOP pattern is determined by the coding structures listed in Table 8, which cover five group-of-pictures settings and three B-frame values. The encoder was tested on the three video sequences, with PSNR and processing time as the objective metrics, as shown in Figures 3-5.
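
The relationship between GOP length, B-frame count, and the resulting frame-type pattern can be sketched as follows. This is a simplified display-order model with a fixed number of B-frames between reference frames; a real encoder such as x265 decides frame types adaptively, so the output is only indicative, and the parameter values are examples rather than the entries of Table 8.

```python
def gop_pattern(gop_size, bframes):
    """Return a display-order frame-type string for one GOP.

    The GOP starts with an I frame; afterwards every (bframes + 1)-th
    frame is a P frame and the frames in between are B frames.
    """
    if gop_size < 1 or bframes < 0:
        raise ValueError("gop_size must be >= 1 and bframes >= 0")
    frames = ["I"]
    for i in range(1, gop_size):
        frames.append("P" if i % (bframes + 1) == 0 else "B")
    return "".join(frames)


print(gop_pattern(12, 2))
```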

Quantization parameters (QP)
The performance of HEVC was evaluated with QP values of 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, and 51 to assess its suitability for the three test sequences at six representations, as shown in Figures 6-8.
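
To see why QP has such a strong effect on bitrate, recall that in H.264 and HEVC the quantization step size grows approximately exponentially with QP, doubling for every increase of 6. The sketch below evaluates this commonly cited approximation; it is given as background and is not a formula taken from this paper.

```python
def quantization_step(qp):
    """Approximate quantization step size for a given QP.

    Uses the usual approximation Qstep = 2 ** ((QP - 4) / 6), so the
    step size doubles every 6 QP units.
    """
    return 2.0 ** ((qp - 4) / 6.0)


for qp in (22, 28, 34, 40):
    print(qp, quantization_step(qp))
```

Each 6-unit step in the loop doubles the step size, which is why moving from QP 30 to QP 45 coarsens quantization by a factor of more than five.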

RC lookahead
RC-lookahead is the number of frames used for slice-type decision lookahead, and it is a major determinant of encoder delay. The longer the lookahead buffer, the more accurate the scene-cut decisions and the more effective the tree is at enhancing adaptive quantization. It is not advisable to use a lookahead longer than the maximum keyframe interval. The values used in this work were {5, 10, 15, 20, 25, 30, 40, 50, 60}. The default is 20, and the permitted range lies between the maximum consecutive B-frame count and 250; see Figure 11, which shows the influence of RC-lookahead on PSNR and bitrate for the ReadySetGo video sequence at six representations.

Server configuration
As shown in Figure 1, the FFmpeg package installed on the HTTP server produces the coded video at multiple resolutions and bitrates. It also applies the MPEG-DASH protocol on the server: it splits the video into segments, stores these segments on the server, and creates the MPD file as an XML file, which is then transmitted to the client to enable dynamic adaptive streaming over HTTP. When the client sends an HTTP request, the server uses HTTP/2 to push the appropriate segment to the client as a web object, as selected by the MPEG-DASH controller in the client after it senses the channel condition. As shown in Figure 1, the streamed data includes side information and encoded data, which are managed together by the protocol syntax. Based on the received syntax, the decoder produces the required resolution.
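
The structure of the MPD file created on the server can be illustrated with a small generator. The sketch below writes a deliberately simplified manifest with one adaptation set and one Representation element per encoded resolution. The bandwidth values, segment duration, and file-name templates are hypothetical, and the output omits attributes (for example the presentation duration and codec strings) that a complete, schema-valid MPD would carry.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"


def build_mpd(representations, segment_seconds=2, timescale=1000):
    """Return a simplified MPD as an XML string.

    representations: list of (width, height, bandwidth_bps) tuples.
    """
    mpd = ET.Element("MPD", {
        "xmlns": MPD_NS,
        "profiles": "urn:mpeg:dash:profile:isoff-live:2011",
        "type": "static",
        "minBufferTime": f"PT{segment_seconds}S",
    })
    period = ET.SubElement(mpd, "Period", {"id": "0"})
    aset = ET.SubElement(period, "AdaptationSet", {
        "mimeType": "video/mp4",
        "segmentAlignment": "true",
    })
    # One template shared by all representations; $...$ are DASH identifiers.
    ET.SubElement(aset, "SegmentTemplate", {
        "timescale": str(timescale),
        "duration": str(segment_seconds * timescale),
        "initialization": "init-$RepresentationID$.m4s",
        "media": "chunk-$RepresentationID$-$Number$.m4s",
        "startNumber": "1",
    })
    for rep_id, (width, height, bandwidth) in enumerate(representations):
        ET.SubElement(aset, "Representation", {
            "id": str(rep_id),
            "width": str(width),
            "height": str(height),
            "bandwidth": str(bandwidth),
        })
    return ET.tostring(mpd, encoding="unicode")


# Placeholder bitrates, not measured values.
print(build_mpd([(3840, 2160, 8_000_000), (1920, 1080, 3_000_000)]))
```

In practice the FFmpeg DASH muxer generates both the segments and the manifest, so a hand-written generator like this is useful mainly for understanding what the client reads before it selects a representation.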

Discussion and results analysis
We must determine the ideal encoder settings at the source device for each resolution, applicable to each video sequence delivered over the channel. In the experiments with three videos encoded using the H.265 parameters, the sequence with little motion achieved a larger compression ratio than the other two, and vice versa. QP is the most important parameter in determining bitrate. By varying the QP, we found that the ideal range was 32-45, which kept the video quality satisfactory at 34-39 dB. The same QP configuration was applied to the three test sequences, which differ in motion detail and bitrate. For example, when the ReadySetGo sequence was encoded at 4K with QP values of 30, 35, 40, and 45, the PSNR of the encoded video to be sent across the channel stayed within the acceptable range; at QP 45, which gave the greatest bitrate reduction (a compression ratio of 3473.73), the quality was at the lower end of that range, 34.628 dB. This was tested with all three test sequences to determine the optimal QP. Alternatively, CRF may be used instead of QP to preserve PSNR while reducing bitrate. The bitrate is chosen based on the buffer state: when the buffer is congested, the selected bitrate should be low to adapt to the available bandwidth. Video quality varies directly with bitrate; as the bitrate is reduced, the quality decreases. However, with the right GOP selection, video quality can be improved even at a low bitrate, and the right RC-lookahead setting can likewise improve quality significantly at a low bitrate. Table 9 shows how these tests can be used to determine the optimal parameter setup.
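
The selection step described above, choosing for each resolution the setting that meets the quality target at the lowest bitrate, can be sketched as a simple search over measured results. The measurement values below are illustrative placeholders, not results from this paper, and the fallback rule used when no setting reaches the target is an assumption.

```python
def pick_configuration(measurements, target_psnr_db=36.0):
    """Return the measurement with the lowest bitrate that meets the PSNR target.

    If no measurement reaches the target, fall back to the one with the
    highest PSNR (assumed policy).
    """
    candidates = [m for m in measurements if m["psnr_db"] >= target_psnr_db]
    if not candidates:
        return max(measurements, key=lambda m: m["psnr_db"])
    return min(candidates, key=lambda m: m["bitrate_kbps"])


# Placeholder measurements for one resolution (NOT data from the paper).
measurements = [
    {"qp": 30, "psnr_db": 38.9, "bitrate_kbps": 9000},
    {"qp": 35, "psnr_db": 37.2, "bitrate_kbps": 5200},
    {"qp": 40, "psnr_db": 36.1, "bitrate_kbps": 3100},
    {"qp": 45, "psnr_db": 34.6, "bitrate_kbps": 1900},
]
best = pick_configuration(measurements)
print(best["qp"], best["bitrate_kbps"])
```

The same search can be repeated per resolution and per parameter (CRF, GOP, RC-lookahead) to assemble a table of recommended settings.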

Conclusion and future work
The purpose of this paper was to provide an overview of the latest video coding standards by exploring their implications for multimedia communications. This was done by studying the quality of video compressed with HEVC/H.265 and the adaptation of high-quality, low-latency video transmission across the Internet to end users. Two steps address the limited-bandwidth problem as the number of users on the network grows: first, each representation is streamed as H.265 with its optimal configuration; second, the MPEG-DASH protocol with an embedded controller chooses the most suitable representation. The tests help identify the appropriate setting for each layer to obtain a PSNR of 36 dB. When the system is in operation, the controller embedded with the MPEG-DASH protocol continuously senses the state of the channel, using feedback acknowledgments from the client, to choose a suitable video representation to send over the remaining channel bandwidth.
To transfer video at a resolution below 4K, customers choose a resolution that is compatible with the application on the end device. When the controller detects the channel status, it instructs the HTTP server to apply the optimum configuration, which preserves video quality at a bitrate appropriate for the available channel bandwidth. The downside of this approach is that it requires fast processing and capable devices, because the channel status changes rapidly over time. This study also offers practical guidance on video compression techniques. A hardware implementation of the suggested encoder would increase processing speed and support a wider range of applications.