TY - UNPB

T1 - Improvements of Motion Estimation and Coding using Neural Networks

AU - Birman, Raz

AU - Segal, Yoram

AU - Hadar, Ofer

AU - Benois-Pineau, Jenny

PY - 2020/2/24

Y1 - 2020/2/24

N2 - Inter-Prediction is used effectively in multiple standards, including H.264 and HEVC (also known as H.265). It leverages correlation between blocks of consecutive video frames in order to perform motion compensation and thus predict block pixel values and reduce transmission bandwidth. In order to reduce the magnitude of the transmitted Motion Vector (MV) and thus reduce bandwidth, the encoder utilizes Predicted Motion Vector (PMV), which is derived by taking the median vector of the corresponding MVs of the neighboring blocks. In this research, we propose innovative methods, based on neural networks prediction, for improving the accuracy of the calculated PMV. We begin by showing a straightforward approach of calculating the best matching PMV and signaling its neighbor block index value to the decoder while reducing the number of bits required to represent the result without adding any computation complexity. Then we use a classification Fully Connected Neural Networks (FCNN) to estimate from neighbors the PMV without requiring signaling and show the advantage of the approach when employed for high motion movies. We demonstrate the advantages using fast forward movies. However, the same improvements apply to camera streams of autonomous vehicles, drone cameras, Pan-Tilt-Zoom (PTZ) cameras, and similar applications whereas the MVs magnitudes are expected to be large. We also introduce a regression FCNN to predict the PMV. We calculate Huffman coded streams and demonstrate an order of ~34% reduction in number of bits required to transmit the best matching calculated PMV without reducing the quality, for fast forward movies with high motion.

AB - Inter-Prediction is used effectively in multiple standards, including H.264 and HEVC (also known as H.265). It leverages correlation between blocks of consecutive video frames in order to perform motion compensation and thus predict block pixel values and reduce transmission bandwidth. In order to reduce the magnitude of the transmitted Motion Vector (MV) and thus reduce bandwidth, the encoder utilizes Predicted Motion Vector (PMV), which is derived by taking the median vector of the corresponding MVs of the neighboring blocks. In this research, we propose innovative methods, based on neural networks prediction, for improving the accuracy of the calculated PMV. We begin by showing a straightforward approach of calculating the best matching PMV and signaling its neighbor block index value to the decoder while reducing the number of bits required to represent the result without adding any computation complexity. Then we use a classification Fully Connected Neural Networks (FCNN) to estimate from neighbors the PMV without requiring signaling and show the advantage of the approach when employed for high motion movies. We demonstrate the advantages using fast forward movies. However, the same improvements apply to camera streams of autonomous vehicles, drone cameras, Pan-Tilt-Zoom (PTZ) cameras, and similar applications whereas the MVs magnitudes are expected to be large. We also introduce a regression FCNN to predict the PMV. We calculate Huffman coded streams and demonstrate an order of ~34% reduction in number of bits required to transmit the best matching calculated PMV without reducing the quality, for fast forward movies with high motion.

U2 - https://doi.org/10.48550/arXiv.2002.10439

DO - https://doi.org/10.48550/arXiv.2002.10439

M3 - Preprint

BT - Improvements of Motion Estimation and Coding using Neural Networks

ER -