Implementing H.264 on an FPGA is complex at several levels, because real-time video compression demands both high data throughput and heavy computation. Video encoders generally need a large amount of logic resources as well as fast processing, which becomes a bottleneck in an FPGA design that is constrained in both area and speed. A high-profile implementation of H.264 inter-frame encoding still poses an engineering challenge, even on the latest FPGAs.

Important modules and the complexity estimates

Modules that can exploit data parallelism are referred to as hardware-friendly modules. Intra processing, SAD (sum of absolute differences), and SATD (sum of absolute transformed differences) are examples. Not every module can exploit parallelism: algorithms such as CAVLC, CAVLD, and the reference MB (macroblock) search are highly sequential.
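To make the idea of a hardware-friendly module concrete, here is a minimal Python sketch of SAD over a 4x4 block. The function name `sad_4x4` and the sample pixel values are assumptions chosen for illustration only. The point is that each of the 16 absolute differences is independent of the others, so a hardware design could compute them side by side and reduce them with an adder tree.

```python
def sad_4x4(current, reference):
    """Sum of absolute differences between two 4x4 blocks (lists of rows)."""
    # Every |c - r| term is independent of the others, so hardware could
    # compute all 16 at once and then sum them with an adder tree.
    diffs = [abs(c - r)
             for cur_row, ref_row in zip(current, reference)
             for c, r in zip(cur_row, ref_row)]
    return sum(diffs)


# Illustrative pixel values (assumed, not taken from any real sequence).
current = [[10, 12, 14, 16],
           [10, 12, 14, 16],
           [10, 12, 14, 16],
           [10, 12, 14, 16]]
reference = [[11, 12, 13, 16],
             [10, 14, 14, 15],
             [10, 12, 14, 16],
             [9, 12, 14, 18]]

print(sad_4x4(current, reference))
```

By contrast, an entropy coder such as CAVLC cannot be unrolled this way, because each symbol's coding depends on what was coded before it.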
The implementation complexity increases as the algorithm becomes more sequential or as the number of processes inside it grows.

Motion estimation

In H.264/AVC encoding, motion estimation can take up to 70% of the computational burden of the complete video compression procedure, which makes it a bottleneck in terms of power consumption, computing speed, and hardware cost. Support for variable block sizes is generally dropped in hardware implementations.

Sub-pixel Resolution

Sub-pixel resolution increases the algorithmic and computational complexity significantly. On the decoding side, which performs sub-pixel motion compensation only once per block, it takes about 10 to 20 percent of the decoding pipeline. The bulk of this time is spent interpolating values between pixels to generate the sub-pixel-offset reference blocks. The cost of sub-pixel estimation in the encoder varies with the encoding algorithm, but it may require performing motion compensation more than once.

Interpolation Algorithm

The interpolation algorithm that generates offset reference blocks is defined differently for luma and chroma blocks. For luma, interpolation is performed in two steps: half-pixel and then quarter-pixel interpolation. The half-pixel values are created by filtering horizontally and vertically with the six-tap kernel (1, -5, 20, 20, -5, 1)/32. Quarter-pixel interpolation is then performed by linearly averaging adjacent integer- and half-pixel values. Motion compensation for chroma blocks uses bilinear interpolation with quarter-pixel or eighth-pixel accuracy, depending on the chroma format. Each sub-pixel position is a linear combination of the neighboring pixels.

After interpolating to generate the reference block, the algorithm adds that reference block to the decoded difference information to get the reconstructed block. The encoder executes this step to get reconstructed reference frames, and the decoder executes it to get the output frames.

DCT

Instead of the DCT, H.264 uses an integer transform as its primary transform to translate the difference data between the spatial and frequency domains. The transform is an approximation of the DCT that is both lossless (exactly invertible in integer arithmetic) and computationally simpler. The core transform can be implemented using only shifts and additions.

Quantization

Quantization in H.264 is
arithmetically expressed as a two-stage operation. The first stage multiplies each coefficient in the 4x4 block by a fixed, coefficient-specific value. This stage allows the coefficients to be scaled unequally according to importance or information. The second stage divides by a step size set by an adjustable quantization parameter (QP). This stage provides a single “knob” for adjusting the quality and resultant bitrate of the encoding.

Entropy Coding

Using CABAC can improve compression by around 5-7%, but it requires 30-40% more total processing power.

Rate Distortion Optimization

H.264 video coding is based on the concept of rate distortion optimization (RDO), which means that the encoder has to encode each block using all the mode combinations and choose the one that gives the best rate-distortion performance. Personally, I would not suggest this on an FPGA because of the high memory requirements.
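The mode decision described under Rate Distortion Optimization is commonly formulated as minimizing a Lagrangian cost J = D + λ·R, where D is the distortion of the reconstructed block, R is the number of bits spent, and λ is a weighting factor. The following Python sketch shows that selection step only. The helper `choose_mode`, the mode names, the distortion and rate figures, and the value of λ are all hypothetical and chosen for illustration; the article does not specify a particular cost function.

```python
def choose_mode(candidates, lam):
    """Return (mode, cost) with the smallest Lagrangian cost J = D + lam * R.

    candidates: list of (mode_name, distortion, rate_bits) tuples. Each entry
    is assumed to come from a full trial encode of the same block.
    """
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate_bits in candidates:
        cost = distortion + lam * rate_bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Hypothetical trial-encode results for one macroblock.
candidates = [
    ("inter_16x16", 900, 40),
    ("inter_8x8", 600, 95),
    ("intra_4x4", 500, 160),
]

print(choose_mode(candidates, lam=4.0))
```

Every entry in `candidates` stands for a complete trial encode whose intermediate results have to be held until the comparison is finished, which gives a sense of why the buffering cost grows with the number of modes evaluated.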
You can consult the author, Tony Gladvin George, about implementing the latest video codecs on FPGA.