Identify the features to be implemented
Before starting the implementation, we have
to decide the features of the CODEC, for example the
resolution and the frame rate (frames per second). This
matters more for an RTL project than for a C-code project
because the architecture depends heavily on the finalized
profile. Any change in the requirements often directly
influences architectural decisions at the system level.
The features can vary widely. Some of the
most important ones are listed below. The following
sections also discuss their implementation complexity.
* Resolution
1080p (HD)
VGA
CIF
QCIF
* Interlaced / Non-interlaced
* Frames Per Second (fps)
30 fps
15 fps
7 fps
* Frequency of Operation
90 MHz
100 MHz
160 MHz
220 MHz
* Variable Bit Rate (VBR) or Constant Bit Rate (CBR)
* Types of frame supported
I-frame (key frame)
P-frame (predicted frame)
B-frame (bi-predicted frame)
* Prediction modes for intra16x16
Vertical
Horizontal
DC
Plane
* Search window size for motion estimation
* Block sizes supported
16x16
8x8
4x4
* Prediction modes for intra4x4
Vertical
Horizontal
DC
Diagonal down-left
Diagonal down-right
Vertical right
Horizontal down
Vertical left
Horizontal up
* Pixel accuracy for inter prediction
full pixel
half pixel
quarter pixel
* Deblocking filter
* Number of reference frames
* Memory interface, external and internal
* Memory, area and speed constraints, if any
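To see why these choices drive the architecture, it can help to turn a candidate feature set into a cycle budget per macroblock. The sketch below is only an illustration in Python: the `CodecProfile` class and its method names are hypothetical, and it assumes one macroblock enters the pipeline at a time, so the clock frequency divided by the macroblock rate bounds the cycles each pipeline stage may spend on a macroblock.

```python
import math
from dataclasses import dataclass

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


@dataclass(frozen=True)
class CodecProfile:
    """Hypothetical container for a few of the features listed above."""
    width: int       # luma width in pixels
    height: int      # luma height in pixels
    fps: float       # frames per second
    clock_hz: float  # frequency of operation

    def macroblocks_per_frame(self) -> int:
        # Frame dimensions are padded up to a whole number of macroblocks.
        return math.ceil(self.width / MB_SIZE) * math.ceil(self.height / MB_SIZE)

    def macroblocks_per_second(self) -> float:
        return self.macroblocks_per_frame() * self.fps

    def cycles_per_macroblock(self) -> float:
        # Assumption: stages are pipelined at macroblock granularity, so this
        # is the budget available to every stage for one macroblock.
        return self.clock_hz / self.macroblocks_per_second()


hd = CodecProfile(width=1920, height=1080, fps=30, clock_hz=160e6)
qcif = CodecProfile(width=176, height=144, fps=15, clock_hz=90e6)

for name, profile in (("1080p30 @ 160 MHz", hd), ("QCIF15 @ 90 MHz", qcif)):
    print(f"{name}: {profile.macroblocks_per_frame()} MB/frame, "
          f"{profile.cycles_per_macroblock():.0f} cycles/MB")
```

Under these assumptions, 1080p at 30 fps and 160 MHz leaves roughly 650 cycles per macroblock, while QCIF at 15 fps and 90 MHz leaves tens of thousands, which is why a change in resolution or frame rate can force a different architecture.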
The modules in the architecture should be pipelined. To
pipeline modules such as inter-frame coding, the order in
which the reference macroblocks are fetched has to be
customized.
1080p is referred to as high-definition (HD)
video and is among the highest resolutions an H.264 codec
is commonly required to support. Being progressive, 1080p
is free of the complexity and motion errors associated with
interlaced formats. The suggested way of implementing
codecs this heavy is to use an FPGA together with
high-speed memory interfaces.
Generally, ASICs/FPGAs offer extremely high
memory bandwidth. As a result, one can use custom memory
configurations and/or addressing techniques to exploit data
locality in high-dimensional data.
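One common way to exploit this locality in motion estimation is to keep the search window in on-chip memory and reuse the overlapping part when moving to the next macroblock. The sketch below estimates the saving. It is an illustration under stated assumptions, not a description of a particular design: macroblocks are processed in raster order, the search range is the same in all directions, frame edges are ignored, and the function names are hypothetical.

```python
MB = 16  # macroblock width and height in luma samples


def window_samples(search_range: int) -> int:
    """Samples in a full search window of +/- search_range around a macroblock."""
    side = MB + 2 * search_range
    return side * side


def new_samples_per_step(search_range: int) -> int:
    """Samples to fetch when the window slides one macroblock to the right.

    Only a strip MB columns wide and one window height tall is new; the rest
    of the window is already held in on-chip memory.
    """
    side = MB + 2 * search_range
    return MB * side


for r in (8, 16, 32):
    full = window_samples(r)
    incremental = new_samples_per_step(r)
    print(f"range +/-{r}: full reload {full} samples, "
          f"sliding window {incremental} samples, "
          f"saving factor {full / incremental:.1f}")
```

With a search range of 16, a full reload needs 48 x 48 = 2304 samples per macroblock, whereas the sliding window needs only 16 x 48 = 768, so the external memory traffic for reference data drops by a factor of three.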
First, define the entire processing chain.
For implementing the CODEC,
it is recommended to have a reference C-code. In most cases
the reference code is available for free from the
standardization team itself. The functions in reference
code written in C are not partitioned the way the hardware
needs them to be.
For example, nested function calls, which
are common in C, are not recommended inside a hardware
module. Hence we have to restructure the C-code into the
pipeline stages required by the hardware architecture. This
activity also helps in testing the individual modules,
because test-data extraction points can be defined in the
C-code.
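The idea of extraction points can be sketched as follows. The example uses Python rather than C purely for brevity, and everything in it is hypothetical: the two stage functions are toy placeholders, not H.264 algorithms, and `run_with_taps` is an illustrative helper. What it shows is the structure: once the model is a flat chain of stages, the input and output of every stage can be captured and later replayed as stimulus and expected response in the RTL testbench of the matching module.

```python
from typing import Callable, Dict, List, Tuple

Block = List[int]
Stage = Tuple[str, Callable[[Block], Block]]


def run_with_taps(stages: List[Stage], block: Block) -> Tuple[Block, Dict[str, Dict[str, Block]]]:
    """Run the stages in order and record each stage's input and output."""
    taps: Dict[str, Dict[str, Block]] = {}
    data = block
    for name, stage_fn in stages:
        out = stage_fn(data)
        # Extraction point: copies of the vectors at this module boundary.
        taps[name] = {"in": list(data), "out": list(out)}
        data = out
    return data, taps


# Toy placeholder stages (assumptions for illustration only).
def residual(block: Block) -> Block:
    return [sample - 128 for sample in block]


def quantize(block: Block) -> Block:
    return [value // 4 for value in block]


stages = [("residual", residual), ("quantize", quantize)]
result, taps = run_with_taps(stages, [128, 132, 140, 120])

for name, vectors in taps.items():
    print(name, "in:", vectors["in"], "out:", vectors["out"])
```

In a real flow the captured vectors would be written to files in whatever format the RTL testbench reads.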
One of the important design phases is to
identify the pipeline stages and the memory management. A
clearly pipelined architecture can be represented as a
timing/block diagram. Once the pipeline stages are
identified, they have to be incorporated in the C-model so
that test data can be generated at the defined boundaries
of the modules. The pipeline stages are partitioned based
on the data flow and memory management rather than on the
boundaries of the algorithmic tools.
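A timing diagram of this kind can be generated from a very small model. The sketch below assumes an ideal macroblock-level pipeline in which every stage takes one time slot and never stalls. The stage names are hypothetical examples, not the partition prescribed by this article.

```python
from typing import List, Optional


def pipeline_schedule(n_stages: int, n_macroblocks: int) -> List[List[Optional[int]]]:
    """Return table[t][s]: the macroblock index in stage s during slot t, or None."""
    n_slots = n_macroblocks + n_stages - 1  # fill time plus drain time
    table = []
    for t in range(n_slots):
        row = []
        for s in range(n_stages):
            mb = t - s  # stage s lags stage 0 by s slots
            row.append(mb if 0 <= mb < n_macroblocks else None)
        table.append(row)
    return table


stage_names = ["fetch", "predict", "transform", "entropy"]  # hypothetical stages
schedule = pipeline_schedule(len(stage_names), n_macroblocks=3)

print("slot  " + "  ".join(f"{name:>9}" for name in stage_names))
for t, row in enumerate(schedule):
    cells = "  ".join(f"{('MB' + str(mb)) if mb is not None else '-':>9}" for mb in row)
    print(f"{t:>4}  {cells}")
```

Each row is one time slot and each column one stage, so the diagonal pattern shows directly which macroblock's data every stage must find in memory during a given slot.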
The H.264 standard offers much
improved compression, but the sequential nature of the
algorithm has increased along with it. This introduces more
algorithmic discontinuities into the data flow of the
standard. Modules like the ‘entropy encoder’ are highly
recursive algorithms that demand more computational power
because of their serial nature. This makes the design
considerably more complicated and requires more
optimization for real-time processing. Allocate more
implementation time for these modules.
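A small part of this serial nature can be shown with the unsigned Exp-Golomb code, the simplest variable-length code H.264 uses for syntax elements. This sketch is not CAVLC or CABAC, which are considerably more involved. It only illustrates that with variable-length codes the bit position of each symbol depends on the lengths of all symbols before it, so the bitstream cannot be written for several symbols independently. The function names are hypothetical.

```python
from typing import List


def ue(code_num: int) -> str:
    """Unsigned Exp-Golomb code word for code_num, as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    prefix_len = value.bit_length() - 1  # number of leading zero bits
    return "0" * prefix_len + format(value, "b")


def bit_offsets(symbols: List[int]) -> List[int]:
    """Start position of each code word in the output bitstream."""
    offsets = []
    position = 0
    for symbol in symbols:
        offsets.append(position)
        # Serial dependency: the next offset is known only after this
        # symbol's code length has been determined.
        position += len(ue(symbol))
    return offsets


symbols = [0, 3, 1, 2]
print([ue(s) for s in symbols])
print(bit_offsets(symbols))
```

In hardware this running bit position becomes a feedback path in the bit-packing logic, which is one reason entropy coding is hard to parallelize.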
Video hardware has huge
combinational circuits in its RTL, which create bottlenecks
in frequency or area. So a basic study has to be done to
analyze the behavior of the algorithms used in the
synthesis and place-and-route tools. This study
concentrates on how the tools form hardware structures for
combinational circuits. It is important because the tool
chain and the hardware differ from project to project. It
enables us to determine the optimal trade-off between
‘nested if conditions’, ‘nested case conditions’, ‘large
combinational state machines’ etc.