Title
Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder
Abstract
Intra prediction with rate-distortion constrained mode decision is the most important technology in the H.264/AVC intra frame coder, which is competitive with the latest image coding standard, JPEG2000, in terms of both coding performance and computational complexity. The predictor generation engine for intra prediction and the transform engine for mode decision are critical because these operations require a large amount of memory access and occupy 80% of the computation time of the entire intra compression process. A low-cost general-purpose processor cannot perform these operations in real time. In this paper, we propose two solutions for platform-based design of the H.264/AVC intra frame coder. One solution is a software implementation targeted at low-end applications. Context-based decimation of unlikely candidates, subsampling of matching operations, bit-width truncation to reduce the computations, and an interleaved full-search/partial-search strategy to stop error propagation and maintain image quality are proposed and combined into our fast algorithm. Experimental results show that our method can reduce the computation used for intra prediction and mode decision by 60% while keeping the peak signal-to-noise ratio degradation below 0.3 dB. The other solution is a hardware accelerator targeted at high-end applications. After a comprehensive analysis of instructions and exploration of parallelism, we propose a system architecture with four-parallel intra prediction and mode decision to enhance the processing capability. Hadamard-based mode decision is modified into a discrete cosine transform-based version to reduce memory access by 40%. Two-stage macroblock pipelining is also proposed to double the processing speed and hardware utilization. Other features of our design include a reconfigurable predictor generator supporting all 13 intra prediction modes, a parallel multitransform and inverse transform engine, and a CAVLC bitstream engine.
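To make the transform-based mode decision concrete, here is a minimal Python sketch of 4×4 intra mode selection by sum of absolute transformed differences. It is an illustration under stated assumptions, not the authors' algorithm: only three of the nine 4×4 luma modes (vertical, horizontal, DC) are modelled; the function names and the `lam` weight are hypothetical; `drop_bits` is only an assumed form of the bit-width truncation mentioned above; and the "DCT-based" cost simply swaps the Hadamard matrix for the H.264 4×4 integer core-transform matrix without post-scaling. The rate term uses the H.264 mode-signalling cost of 1 bit when the chosen mode equals the most probable mode and 4 bits otherwise.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]  # 4x4 samples, row-major

# 4x4 Hadamard matrix and the H.264 4x4 integer core-transform matrix.
HADAMARD = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]
INT_DCT = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def _mul(a: Block, b: Block) -> Block:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def _t(a: Block) -> Block:
    """Transpose of a 4x4 matrix."""
    return [list(col) for col in zip(*a)]


def predict_4x4(mode: str, top: List[int], left: List[int]) -> Block:
    """Three of the nine 4x4 luma modes; top/left are reconstructed neighbours."""
    if mode == "vertical":    # mode 0: copy the row above downward
        return [list(top[:4]) for _ in range(4)]
    if mode == "horizontal":  # mode 1: copy the left column rightward
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":          # mode 2: rounded mean of the 8 neighbours
        dc = (sum(top[:4]) + sum(left[:4]) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def transform_cost(cur: Block, pred: Block, kernel: Block = HADAMARD, drop_bits: int = 0) -> int:
    """Sum of absolute transformed residuals (SATD when kernel is HADAMARD).

    drop_bits > 0 discards low-order bits of each sample before subtraction,
    a hypothetical stand-in for bit-width truncation.
    """
    diff = [[(cur[r][c] >> drop_bits) - (pred[r][c] >> drop_bits) for c in range(4)]
            for r in range(4)]
    coeffs = _mul(_mul(kernel, diff), _t(kernel))  # K * D * K^T
    return sum(abs(v) for row in coeffs for v in row)


def decide_mode(cur: Block, top: List[int], left: List[int], lam: int, mpm: str,
                kernel: Block = HADAMARD) -> Tuple[str, int]:
    """Pick the mode minimising distortion + lam * mode-signalling bits."""
    costs: Dict[str, int] = {}
    for mode in ("vertical", "horizontal", "dc"):
        bits = 1 if mode == mpm else 4  # 1-bit MPM flag, else flag + 3-bit remainder
        costs[mode] = transform_cost(cur, predict_4x4(mode, top, left), kernel) + lam * bits
    best = min(costs, key=costs.get)
    return best, costs[best]


# Usage: a block whose rows repeat the row above is best predicted vertically.
top, left = [10, 20, 30, 40], [50, 50, 50, 50]
cur = [[10, 20, 30, 40] for _ in range(4)]
print(decide_mode(cur, top, left, lam=10, mpm="dc"))  # ('vertical', 40)
```

Passing `kernel=INT_DCT` to `decide_mode` gives the DCT-style variant; the selection loop is unchanged, which is the property that lets one transform engine serve both mode decision and residual coding.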
A prototype chip is fabricated in TSMC 0.25-μm CMOS 1P5M technology. Simulation results show that our implementation can process 16 megapixels (4096×4096) within 1 s, which is sufficient for real-time coding of 720×480 4:2:0 30-Hz video, at an operating frequency of 54 MHz. The transistor count is 429K, and the core size is only 1.855×1.885 mm².
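The two throughput figures can be cross-checked with some simple arithmetic. The sketch below assumes each frame is coded as 16×16 macroblocks (in 4:2:0, each macroblock also carries two 8×8 chroma blocks) and divides the 54-MHz clock by the peak macroblock rate. The resulting budget of about 823 cycles per macroblock is derived here and is not a figure reported in the abstract; the names are illustrative only.

```python
CLOCK_HZ = 54_000_000  # operating frequency quoted in the abstract
MB = 16                # macroblock width/height in luma samples


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """Macroblock rate for a frame size divisible by 16."""
    return (width // MB) * (height // MB) * fps


peak = macroblocks_per_second(4096, 4096, 1)  # one 16-Mpixel frame per second
d1 = macroblocks_per_second(720, 480, 30)     # 720x480 at 30 Hz
cycles_per_mb = CLOCK_HZ // peak              # cycle budget at the peak rate

print(peak, d1, cycles_per_mb)  # 65536 40500 823
```

The 720×480 30-Hz case needs 40,500 macroblocks per second, roughly 62% of the 65,536 implied by the 4096×4096 figure, so the two claims are consistent.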
Year: 2005
DOI: 10.1109/TCSVT.2004.842620
Venue: IEEE Trans. Circuits Syst. Video Techn.
Keywords: macroblock pipelining, cmos integrated circuits, vlsi architecture design, avc intra frame coder, predictor generation, image coding, image matching, mode decision, full-search strategy, 0.25 micron, partial-search strategy, discrete cosine transform, image subsampling, vlsi architecture, memory access, intra frame coder, intra prediction mode, parallel multitransform, discrete cosine transforms, computational complexity, image sampling, 54 mhz, four-parallel intra prediction, hadamard transforms, fast algorithm, vlsi, intra prediction, iso/iec 14496-10 avc, joint video team (jvt), prediction theory, cavlc bitstream engine, h.264-avc intra frame coder, rate-distortion, hadamard-based mode decision, entire intra compression process, real time, itu-t rec. h.264, inverse transform, query formulation, rate distortion theory, video codecs, cmos technology, system architecture, very large scale integration, hardware accelerator, chip, peak signal to noise ratio, process capability, computer architecture, engines, error propagation, algorithm design and analysis, hardware, image quality
Field: Macroblock, Pipeline (computing), Decimation, Context-adaptive variable-length coding, Computer science, Discrete cosine transform, Algorithm, JPEG 2000, Bitstream, Hadamard transform
DocType: Journal
Volume: 15
Issue: 3
ISSN: 1051-8215
Citations: 147
PageRank: 17.31
References: 8
Authors: 4
Name | Order | Citations | PageRank
Yu-Wen Huang | 1 | 1116 | 114.02
Bing-Yu Hsieh | 2 | 483 | 54.76
Tung-Chien Chen | 3 | 791 | 78.22
Liang-Gee Chen | 4 | 3637 | 383.22