PNG hardware decoding acceleration design

0 Preface
PNG (Portable Network Graphics) is a lossless bitmap (raster) image file format. PNG files are compressed with two lossless algorithms, LZ77 and Huffman coding, which together give a high compression ratio. The format is designed for transmitting color images over networks. It supports an alpha channel, so transparent areas and multiple levels of transparency can be defined, and it supports progressive display, in which a picture gradually gains detail as it loads.
The core of PNG compression is the Deflate algorithm, the same algorithm used by zip. Deflate first applies LZ77, which removes repeated phrases and produces a sequence of unmatched bytes (literals) and (match length, distance) pairs. It then applies Huffman coding to these symbols to remove the remaining single-symbol redundancy, producing the final compressed stream. PNG decoding is the inverse of this process: the original image data is restored from the code-table information and the compressed stream.
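The lossless round trip described above can be observed with Python's standard `zlib` module, which implements the same Deflate format (LZ77 followed by Huffman coding) that PNG uses for its image data. This is only a software illustration; the sample bytes are made up to resemble a flat-colored image region.

```python
import zlib

# Highly repetitive data, loosely resembling scanlines of a flat-colored image.
raw = bytes([0, 255, 0, 0, 255]) * 200

# zlib.compress wraps a Deflate (LZ77 + Huffman) stream in a zlib header and an
# Adler-32 trailer, the same container PNG uses for its concatenated IDAT data.
compressed = zlib.compress(raw, 9)

# Decompression is the exact inverse: the original bytes come back unchanged.
restored = zlib.decompress(compressed)

print(len(raw), len(compressed))
assert restored == raw
```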
PNG files are usually decoded in software. Software decoding is flexible, but compared with hardware decoding it is slow and consumes more energy, which works against low-power design for mobile devices. This article therefore discusses a hardware implementation of PNG decoding. The target is a dedicated mobile-phone chip with strict requirements on power consumption and decoding speed, and the design addresses fast table lookup, software/hardware coordination, and hardware acceleration of PNG decoding. The main purpose of hardware-accelerated decoding is to relieve the CPU, greatly speed up the display of PNG images, and reduce power consumption to some extent, extending the phone's standby time. The work therefore has considerable research and practical value.

1 Introduction to PNG image decoding principle
1.1 LZ77 algorithm introduction
The LZ77 algorithm is also called "sliding window compression". It uses a virtual window that slides along with the data as its dictionary: if the string to be compressed already appears in the window, the algorithm outputs a (match length, distance) pair in place of the repeated string. The minimum match length is 3 bytes, which ensures that replacing a match actually makes the data smaller than the original.
For example, suppose the window is 15 characters long, the 15 characters just encoded are "byhelloeveryone", and the characters still to be encoded are "hellotoeveryonehi". Some of these strings have appeared before; the strings in parentheses below are matches already present in the sliding window: (hello)to(everyone)hi.
LZ77 replaces each match in this input with a (match length, distance) pair and outputs unmatched bytes unchanged, so the compressed content is: (5,13)to(8,15)hi. To decompress, the LZ77 decoder only needs to maintain the same sliding window: as compressed symbols arrive, each (length, distance) pair is used to copy the matching string from the window to the output, which restores the original data.
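The decompression step can be sketched in a few lines. This is a minimal software model, not the hardware LZ77 module: tokens are either a literal character or a hypothetical `(length, distance)` tuple, and the 15-character window matches the example above. Copying one byte at a time also handles overlapping matches, where the distance is shorter than the length.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # a literal character or (length, distance)


def lz77_decode(history: str, tokens: List[Token], window: int = 15) -> str:
    """Decode LZ77 tokens, given the text already in the sliding window."""
    out = list(history)
    start = len(out)
    for tok in tokens:
        if isinstance(tok, tuple):
            length, distance = tok
            # The distance must point back into the current window.
            if not 1 <= distance <= min(window, len(out)):
                raise ValueError(f"distance {distance} outside window")
            for _ in range(length):
                # Byte-by-byte copy, so overlapping matches work correctly.
                out.append(out[-distance])
        else:
            out.append(tok)  # unmatched byte: emitted as-is
    return "".join(out[start:])


# The example from the text: window "byhelloeveryone", tokens (5,13)to(8,15)hi.
print(lz77_decode("byhelloeveryone", [(5, 13), "t", "o", (8, 15), "h", "i"]))
# prints "hellotoeveryonehi"
```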
1.2 Introduction to Huffman Algorithm
Huffman coding is an entropy-coding compression method. It exploits the fact that different byte values occur with different frequencies and replaces fixed-length codes with variable-length codes: frequently used bytes get shorter codes and rarely used bytes get longer codes, which compresses the data losslessly. The unmatched bytes and (length, distance) pairs produced by LZ77 can therefore be compressed further with Huffman coding, giving high overall compression efficiency.
For example, take the set of characters s = {a, b, c, d, e, f} with occurrence frequencies P = {10, 2, 2, 2, 2, 9}. Figure 1 shows the Huffman tree built from this information, with each element's frequency and character value. After encoding, the code lengths of the elements are L = {1, 4, 4, 4, 4, 2}, so the 27 characters take 10×1 + 8×4 + 9×2 = 60 bits instead of 27×3 = 81 bits with a fixed 3-bit code, a clear reduction in storage space.
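The code lengths in this example can be checked with a standard heap-based Huffman construction. This is a generic software sketch (the function name is ours), not the tree-building hardware of the design; it tracks only each symbol's depth, since Deflate transmits code lengths rather than the tree itself.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code built from freqs."""
    tie = count()  # unique tie-breaker so the heap never compares the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging pushes every leaf of both subtrees one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


freqs = {"a": 10, "b": 2, "c": 2, "d": 2, "e": 2, "f": 9}
lengths = huffman_code_lengths(freqs)
print(lengths)  # {'a': 1, 'b': 4, 'c': 4, 'd': 4, 'e': 4, 'f': 2}
bits = sum(freqs[s] * lengths[s] for s in freqs)
print(bits)  # 60
```

Every merge in this example is forced (the two smallest weights are always unique up to ties between equal values), so any correct Huffman construction yields the same lengths.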

This Huffman tree is built according to the rules of Deflate, the compression method specified for PNG, and has the following characteristics:
(1) Left branches are coded 0 and right branches 1;
(2) The code must be a prefix code, that is, no shorter code may be the prefix of a longer code, which guarantees that decoding is unambiguous;
(3) The nodes of each level are arranged from small to large by frequency, and nodes of the same frequency are arranged by increasing character value. This ordering is the refinement of the Huffman algorithm made by the zip (Deflate) algorithm adopted by PNG: codes of equal length are assigned in increasing order of character value, so the tree can be rebuilt from the code lengths alone. Therefore, during decoding, the code-table information in the compressed stream is extracted first to build the Huffman tree, where each leaf node contains a code length and a character value. The resulting code table is stored in RAM, and the Huffman decoding module looks it up to restore the original image data.
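The rebuilding of codes from code lengths can be sketched with the canonical-code procedure of RFC 1951, section 3.2.2. The function name is ours, and the input is the code-length set from the Figure 1 example; code strings are written root bit first, with 0 for a left branch and 1 for a right branch, as in rule (1).

```python
def canonical_codes(lengths):
    """Assign Deflate-style canonical Huffman codes from code lengths only.

    lengths: {symbol: code length}; a length of 0 means the symbol is unused.
    Returns {symbol: code string}, root (first) bit leftmost.
    """
    max_len = max(lengths.values())
    # Step 1: count how many codes have each length.
    bl_count = [0] * (max_len + 1)
    for n in lengths.values():
        if n:
            bl_count[n] += 1
    # Step 2: compute the smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: hand out consecutive codes in increasing symbol order.
    codes = {}
    for sym in sorted(lengths):
        n = lengths[sym]
        if n:
            codes[sym] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


print(canonical_codes({"a": 1, "b": 4, "c": 4, "d": 4, "e": 4, "f": 2}))
# {'a': '0', 'b': '1100', 'c': '1101', 'd': '1110', 'e': '1111', 'f': '10'}
```

Because only the lengths are needed, a decoder can rebuild exactly the same table the encoder used, which is what makes storing a compact code-length list in the stream sufficient.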

2 PNG decoding hardware/software coordination mechanism
The entire PNG hardware decoding process is scheduled by software. During hardware decoding, if the image data fails verification or decoding is completed, the PNG hardware module sets a dedicated status register for the software to check. When this register flag is set, an interrupt is raised; the software handles it and turns off the PNG hardware decoding module, so that no hardware decoding power is wasted when a data error occurs.
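The interrupt handshake described above can be modeled in software. The sketch is purely illustrative: the text does not give the register layout, so the status bits `DONE` and `DATA_ERROR`, the `PngDecoderModel` class, and the handler name are all hypothetical, and a real driver would access memory-mapped registers rather than Python attributes.

```python
from enum import IntFlag


class PngStatus(IntFlag):
    """Hypothetical status-register bits (layout not given in the text)."""
    DONE = 0x1        # decoding finished normally
    DATA_ERROR = 0x2  # compressed data failed verification


class PngDecoderModel:
    """Toy model of the decoder's enable bit and status register."""

    def __init__(self):
        self.enabled = True
        self.status = PngStatus(0)


def png_irq_handler(dev):
    """Interrupt service routine: inspect status, then shut the block down."""
    status = dev.status
    if status & (PngStatus.DONE | PngStatus.DATA_ERROR):
        dev.enabled = False        # turn the decoder off to save power
        dev.status = PngStatus(0)  # acknowledge (clear) the interrupt source
    if status & PngStatus.DATA_ERROR:
        return "error"
    if status & PngStatus.DONE:
        return "done"
    return "spurious"
```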
Data is moved before and after decoding through a common AVI module, which acts as a FIFO buffer for input and output data. Before decoding, the software configures the AVI module to move the compressed data from memory into the PNG hardware module for decoding; after decoding, the data can be scaled by the Resize module and sent to the VGA display. This software-configured data-movement mechanism solves the software/hardware coordination problem of the design and enables efficient, low-power decoding. The coordination scheme is shown in Figure 2.

3 Overall hardware structure of PNG decoding
The PNG hardware decoding accelerator consists of seven main modules: the Bytesshift character container, the PNG header information processing module, the Inflate table module (which builds the Huffman table), the Inflate fast decoding module, the LZ77 matching-string search module, the Filter module (inverse filtering and de-interlacing), and the Resize module (enlargement and reduction). The hardware flow of PNG decoding is shown in Figure 3.
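The text does not detail what the PNG header information processing module keeps. As a rough software picture, the PNG specification fixes an 8-byte signature followed by a 13-byte IHDR chunk carrying the image width, height, bit depth, color type, and interlace method, which the later Inflate, Filter, and Resize stages depend on. The sketch below parses only those fields; the function name is ours.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_ihdr(data):
    """Parse the PNG signature and the IHDR chunk that must follow it."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk header: 4-byte big-endian length, then the 4-byte chunk type.
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR" or length != 13:
        raise ValueError("first chunk must be a 13-byte IHDR")
    body = data[16:29]
    (crc,) = struct.unpack(">I", data[29:33])
    # The CRC covers the chunk type and data, not the length field.
    if zlib.crc32(ctype + body) != crc:
        raise ValueError("IHDR CRC mismatch")
    width, height, depth, color, comp, filt, interlace = struct.unpack(">IIBBBBB", body)
    return {"width": width, "height": height, "bit_depth": depth,
            "color_type": color, "compression": comp,
            "filter": filt, "interlace": interlace}
```

An `interlace` value of 1 means Adam7 interlacing, which is what the Filter module's de-interlacing step has to undo.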

As Figure 3 shows, the basic PNG decoding flow is as follows. The AVI module takes compressed data from the bus into the Bytesshift character container, which buffers it and converts it into a compressed bit stream. The PNG header information processing module stores the file's header information and directs the Inflate table module to read the code-length information, build the Huffman table, and decode the compressed data. The decoded data is inverse-filtered and de-interlaced by the Filter module, enlarged or reduced by the Resize module, and finally transmitted out through the AVI module. The core decoding module and the Filter module process data in a pipelined fashion, which greatly improves PNG decoding efficiency.
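The inverse filtering performed by the Filter module follows the five per-scanline filter types defined in the PNG specification (None, Sub, Up, Average, Paeth). The following is a byte-at-a-time software sketch of that reconstruction, not the pipelined hardware; de-interlacing is not shown, and the function names are ours.

```python
def paeth(a, b, c):
    """Paeth predictor as defined in the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_scanline(ftype, line, prev, bpp):
    """Undo one PNG scanline filter.

    ftype: filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth)
    line:  filtered bytes of this scanline, filter byte already removed
    prev:  reconstructed previous scanline (all zeros for the first row)
    bpp:   bytes per complete pixel, rounded up to at least 1
    """
    out = bytearray(len(line))
    for i, x in enumerate(line):
        a = out[i - bpp] if i >= bpp else 0   # reconstructed byte to the left
        b = prev[i]                            # byte directly above
        c = prev[i - bpp] if i >= bpp else 0   # byte above and to the left
        if ftype == 0:
            pred = 0
        elif ftype == 1:
            pred = a
        elif ftype == 2:
            pred = b
        elif ftype == 3:
            pred = (a + b) // 2
        elif ftype == 4:
            pred = paeth(a, b, c)
        else:
            raise ValueError(f"bad filter type {ftype}")
        out[i] = (x + pred) & 0xFF  # all arithmetic is modulo 256
    return bytes(out)
```

Each output byte depends on the byte one pixel to its left, which is why a hardware Filter stage can stream a scanline through with a one-pixel delay and a line buffer for the previous row.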

4 Hardware structure of the PNG core decoding module
Because Huffman codes vary in length, looking up the Huffman table by bit-by-bit comparison takes a lot of time during decoding. In this design the longest Huffman code in the PNG data stream is taken to be 9 bits (9 bits is the maximum length of Deflate's fixed literal/length codes; dynamic Huffman codes may be up to 15 bits long). To achieve fast table-lookup decoding, every leaf of the Huffman tree whose code is shorter than 9 bits is treated as a parent node and extended down to level 9, and each extended leaf carries the same information as its parent. The next 9 bits of compressed data are then used as the table address, which guarantees that the corresponding character value is found in every clock cycle and greatly improves hardware decoding efficiency. Taking the earlier Huffman tree as an example (Figure 4), whose longest code is 4 bits, every leaf above level 4 is simply extended down to level 4, filling out the whole binary tree; the code length and character value of each new level-4 leaf are the same as those of its parent.
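Filling out the tree is equivalent to building a lookup table with 2^N entries, in which a code of length L occupies 2^(N−L) consecutive entries. The sketch below uses N = 4 for the Figure 4 example; because the figure itself is not available, it assumes the Deflate canonical codes for the example's lengths (a=0, f=10, b..e=1100..1111). In the 9-bit design the same construction gives 512 entries. The bits are read as a string with the first code bit leftmost, which is a simplification of real Deflate bit packing.

```python
def build_full_table(codes, table_bits):
    """Expand every code to table_bits bits, as in the filled-out tree.

    codes: {symbol: code string}, first (root) bit leftmost.
    Returns a list of (symbol, code_length) indexed by the next table_bits bits.
    """
    table = [None] * (1 << table_bits)
    for sym, code in codes.items():
        n = len(code)
        base = int(code, 2) << (table_bits - n)   # leaf's leftmost descendant
        for tail in range(1 << (table_bits - n)):  # all 2**(table_bits - n) copies
            table[base + tail] = (sym, n)
    return table


def decode_bits(bitstring, table, table_bits, count):
    """Decode `count` symbols with exactly one table lookup per symbol."""
    out, pos = [], 0
    for _ in range(count):
        # Peek a fixed-width window, zero-padded at the end of the stream.
        window = bitstring[pos:pos + table_bits].ljust(table_bits, "0")
        sym, n = table[int(window, 2)]
        out.append(sym)
        pos += n  # consume only the bits that belong to this code
    return "".join(out)


CODES = {"a": "0", "f": "10", "b": "1100", "c": "1101", "d": "1110", "e": "1111"}
table = build_full_table(CODES, 4)
msg = "afbadcef"
bits = "".join(CODES[s] for s in msg)
print(decode_bits(bits, table, 4, len(msg)))  # prints "afbadcef"
```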

Extending the Huffman tree in this way makes Huffman table lookup fast. Each lookup yields either a character value or a match-combination value; the (length, distance) pair is decoded from the latter, and the decoded data is then restored and output according to the LZ77 principle.
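In Deflate, decoding a match-combination value means turning a length symbol (257 to 285) and a distance code (0 to 29), each followed by a few extra bits, into the actual match length and distance. The base and extra-bit tables below are those of RFC 1951, section 3.2.5; the function names are ours. For instance, the pair (5,13) from the LZ77 example would be sent as length symbol 259 and distance code 7 with an extra value of 0.

```python
# Base values and extra-bit counts from RFC 1951, section 3.2.5.
LEN_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LEN_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
             3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
             257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
             8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
              7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]


def match_length(symbol, extra_value):
    """Literal/length symbol 257..285 plus its extra bits -> match length."""
    i = symbol - 257
    if not 0 <= i < len(LEN_BASE) or not 0 <= extra_value < (1 << LEN_EXTRA[i]):
        raise ValueError("invalid length symbol or extra bits")
    return LEN_BASE[i] + extra_value


def match_distance(code, extra_value):
    """Distance code 0..29 plus its extra bits -> backward distance."""
    if not 0 <= code < len(DIST_BASE) or not 0 <= extra_value < (1 << DIST_EXTRA[code]):
        raise ValueError("invalid distance code or extra bits")
    return DIST_BASE[code] + extra_value
```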
The core decoding hardware of this design is shown in Figure 5. The advantage of this structure is that it decodes quickly by using an expanded code table. The basic core-decoding process is: look up the table using a fixed 9 bits of compressed data as the address to find the leaf node containing the code length and character information; according to the code length, remove the used bits from the character container; then combine the bits remaining in the container with newly arriving compressed data to form the next 9-bit lookup address. The lookup is repeated in the next clock cycle, and so on, until Huffman decoding ends.
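The character container's peek-and-consume behavior can be modeled as a bit buffer. This sketch assumes Deflate's bit order, in which the first bit of the stream is bit 0 of the first byte; the text does not say how the design forms its 9-bit address, so note that because Huffman codes are packed starting from their most significant bit, a table indexed by this LSB-first value is usually built with bit-reversed indices (as zlib does). The class name is ours.

```python
class BitContainer:
    """Software model of a Deflate bit buffer (the character-container role).

    Deflate packs data LSB-first: the first bit of the stream is bit 0
    of the first byte.
    """

    def __init__(self, data):
        self.data = data
        self.pos = 0    # next byte to load
        self.buf = 0    # buffered bits, oldest bit in bit 0
        self.count = 0  # number of valid bits in buf

    def peek(self, n):
        """Return the next n bits without consuming them (zero-padded at EOF)."""
        while self.count < n and self.pos < len(self.data):
            self.buf |= self.data[self.pos] << self.count
            self.pos += 1
            self.count += 8
        return self.buf & ((1 << n) - 1)

    def consume(self, n):
        """Drop n bits, e.g. the code length returned by a table lookup."""
        if n > self.count:
            raise ValueError("not enough buffered bits")
        self.buf >>= n
        self.count -= n
```

A decode step is then `entry = table[bc.peek(9)]` followed by `bc.consume(code_length)`, mirroring the one-lookup-per-clock loop described above.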

5 Simulation and synthesis results
The decoded data was extracted from a Modelsim 6.3 simulation and compared with the original image in Matlab; the two images were completely consistent. The decoded data was also compared with the corresponding original image data on the verification platform and again matched exactly, which shows that the hardware design restores PNG image data completely, without distortion.
The PNG decoding core module was synthesized with DC (Synopsys Design Compiler) using the TSMC 90 nm process library at a frequency of 100 MHz. The results are shown in Table 1; the area and power figures exclude the RAM area and the power consumed by RAM reads and writes.

6 Conclusion
This article has discussed a hardware implementation of PNG decoding acceleration. It analyzed the hardware decoding principles of the LZ77 and Huffman algorithms, used a scheme that fills out the Huffman tree to achieve fast table-lookup decoding, and used an efficient software/hardware coordination mechanism to accelerate PNG decoding in hardware while saving power.
