Recently, Vimicro applied its practical security-project experience to the PASCAL VOC dataset and took first place among one-stage algorithms. Zhang Yundong, chairman and general manager of Vimicro Micro-Intelligent Chip Technology Co., Ltd., said: “For the first time, Vimicro has combined its experience in security monitoring applications with the dataset of an international algorithm competition, enabling embedded front-end devices to achieve the same effect as cloud-based intelligence, and has achieved encouraging results. But this is only a beginning; I believe the future will bring more exciting content.”
Target detection is one of the earliest and most important research areas in machine vision, and it is a foundation for other machine vision tasks. International technology giants and research institutions have therefore entered the field one after another, making it a fiercely contested battleground. It is also currently among the most successfully applied technologies in real-world scenarios, including security monitoring and autonomous driving.
On the PASCAL VOC dataset, Vimicro's entry reached 87.2% mAP, ranking first among one-stage algorithms and surpassing most two-stage algorithms. The result also shows that a one-stage algorithm can achieve both accuracy and speed.
What is target detection?
Target detection determines whether objects of interest are present in an image and, if so, gives the category and location of every such object (the "what" and the "where"). PASCAL VOC is a visual recognition competition organized by the University of Leeds, ETH Zurich, the University of Edinburgh, Microsoft, and the University of Oxford. It includes tasks such as object classification, target detection, and image segmentation, and it has had a far-reaching impact on the development of computer vision. Its target detection task covers 20 common categories such as cars, people, cats, and dogs. Training samples are few and scenes vary widely, which makes the task very challenging.
Figure 2
Figure 3
Two-stage and one-stage detection
The current mainstream target detection algorithms are mainly based on deep learning models, which can be divided into two major categories:
1) Two-stage detection algorithms divide the detection problem into two phases: first generating region proposals, then classifying the candidate regions (generally with position refinement as well). Typical representatives are the region-proposal-based R-CNN family, such as R-CNN, Fast R-CNN, and Faster R-CNN.
2) One-stage detection algorithms do not need a region proposal stage; they directly produce the class probabilities and position coordinates of objects. Typical algorithms include YOLO and SSD. The main performance indicators of a target detection model are detection accuracy and speed. For accuracy, target detection must consider how precisely objects are localized, not just classification accuracy.
In general, two-stage algorithms have the advantage in accuracy and one-stage algorithms have the advantage in speed. In industrial applications, however, speed and accuracy must be balanced. In front-end intelligent applications especially, a one-stage algorithm is generally chosen because computing resources are limited, and the base network should be as lightweight as possible, for example MobileNet.
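Localization accuracy is usually measured with intersection over union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch, not the competition's evaluation code: the function names, the corner-coordinate box format, and the default threshold of 0.5 (the overlap level commonly used in PASCAL VOC-style evaluation) are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle; zero if the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_box, pred_label, gt_box, gt_label, threshold=0.5):
    """A detection counts only if the class matches AND the overlap exceeds the threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) > threshold
```

A prediction with the right class but a poorly placed box is therefore still counted as wrong, which is why detection accuracy is harder to achieve than classification accuracy.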
In June 2016, Vimicro successfully developed a Neural Network Processing Unit (NPU). The NPU is integrated into the Starlight Smart VC0758 chip, making it China's first embedded neural network processor SoC for front-end intelligence. Since then, Vimicro has been working on lightweight deep learning algorithms that can be deployed on end devices.
After several years of accumulated R&D, the company applied its practical security-project experience to the PASCAL VOC dataset and took first place among one-stage algorithms. Vimicro's VIM_SSD network is based on the SSD deep learning target detection architecture, incorporates a variety of mechanisms and strategies, and has been heavily optimized. The base network is VGG16 rather than a more complex network such as ResNet-101 or ResNet-152, which keeps the network fast while maintaining accuracy. In addition, the entire network is fully convolutional, so the input resolution and usage scenario can be adjusted freely, which makes it easier to deploy in real systems.
The following table shows the top 10 entries in the PASCAL VOC competition. Most of the listed algorithms use the more complex base networks ResNet-101 and ResNet-152 and take a two-stage approach.
Figure 4
Constraints on front-end intelligence development
“In recent years, with the revival and rapid development of deep learning methods, algorithms have made tremendous progress and breakthroughs, but the development of front-end intelligence is lagging behind,” said Ai Guo, vice president of R&D at Vimicro's micro-intelligent chip technology company. “Constrained by front-end computing resources, deploying lightweight deep neural networks is essential for front-end intelligence. Vimicro has been developing lightweight deep neural networks that integrate tightly with the NPU and can be quickly compiled, ported, and deployed in embedded smart applications.”
Mechanisms and strategies
VIM_SSD uses the following major mechanisms and strategies:
1) FPN to fuse features from multiple layers.
2) An Inception structure to provide a variety of receptive fields.
3) An SE structure to enhance useful features and suppress unwanted ones.
4) Box-based semantic supervision to strengthen the semantic information carried by the extracted features.
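The first mechanism, FPN-style fusion, is commonly realized by upsampling a deeper, lower-resolution feature map and adding it element-wise to a shallower, higher-resolution one. The sketch below illustrates only that idea on single-channel maps stored as nested lists. It is not the VIM_SSD implementation: the function names are hypothetical, nearest-neighbour 2x upsampling is an assumed choice, and the lateral 1x1 convolutions that a real FPN uses to match channel counts are omitted.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fpn_merge(deep, shallow):
    """Add the upsampled deep map to the shallow map, element by element."""
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be exactly twice the size of the deep map")
    return [[u + s for u, s in zip(up_row, sh_row)]
            for up_row, sh_row in zip(up, shallow)]


# Usage: a 1x2 deep map merged into a 2x4 shallow map.
merged = fpn_merge([[1.0, 2.0]], [[0.5] * 4, [0.5] * 4])
```

After the merge, every position of the shallow map also carries the value of the deep position that covers it, which is how the shallow features gain semantic information.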
Figure 5
FPN upsamples deep features and merges them into low-level features to enrich the semantic information of the low-level features. On top of this, we further process the merged features to adapt them to the detection task. We observed that the network's anchor boxes are not all 1:1; there are also 1:2, 1:3, 2:1, and 3:1 boxes, yet the convolution kernels of existing networks are all square (MxM). We therefore introduce an Inception+SE structure. On the one hand, we add 1x3 and 3x1 convolution kernels so that the network can better extract non-square features. On the other hand, we introduce the SE module so that the network automatically selects the appropriate features for each aspect ratio.
Many papers have shown that semantic segmentation helps target detection. However, because segmentation data is hard to label and hard to obtain in real project deployments, we use box-based semantic information to supervise the network's extraction of semantic features.
Yang Min, one of the authors of VIM_SSD, said: “We have been researching lightweight deep neural networks since 2016 and have deployed them in a range of practical projects. This time we mainly wanted to try the experience and methods accumulated in those projects on a public dataset, as a summary of our past work.”
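How an SE (squeeze-and-excitation) module lets a network favour some feature channels over others can be sketched as follows. This is a generic SE computation under stated assumptions, not the VIM_SSD code: the function name, the nested-list tensor layout, and the weight matrices `w1` and `w2` are hypothetical, and bias terms are omitted for brevity.

```python
import math


def se_reweight(features, w1, w2):
    """Apply squeeze-and-excitation gating.

    features: list of C channels, each an H x W list of lists.
    w1: R x C weights (squeeze vector -> hidden layer).
    w2: C x R weights (hidden layer -> one gate per channel).
    Returns (rescaled_features, gates).
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in features]
    # Excitation: fully connected layer + ReLU, then fully connected layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate in (0, 1).
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates


# Usage: two 1x1 channels, one hidden unit; the second channel gets the larger gate.
scaled, gates = se_reweight([[[2.0]], [[4.0]]], w1=[[1.0, 0.0]], w2=[[0.0], [1.0]])
```

If the channels came from square, 1x3, and 3x1 branches, learned gates of this kind would be one way for the network to weight the branch that best fits a given aspect ratio.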
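The article does not describe how the box-based supervision signal is built. One common way to get segmentation-like labels without pixel-level annotation is to fill each ground-truth box with its class id, as in the sketch below. The function name, the box format, and the overwrite rule for overlapping boxes are all assumptions made for illustration.

```python
def boxes_to_mask(height, width, boxes, background=0):
    """Build a coarse semantic mask from bounding boxes.

    boxes: list of (x_min, y_min, x_max, y_max, class_id) in pixel
    coordinates, with x_max and y_max exclusive. Later boxes overwrite
    earlier ones where they overlap (an assumed tie-breaking rule).
    """
    mask = [[background] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max, class_id in boxes:
        # Clip the box to the image so out-of-range coordinates are ignored.
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                mask[y][x] = class_id
    return mask


# Usage: a 3x4 image with one box of class 7 covering columns 1-2, rows 0-1.
mask = boxes_to_mask(3, 4, [(1, 0, 3, 2, 7)])
```

A mask like this needs only the box annotations that a detection dataset already provides, which matches the motivation given above for avoiding pixel-level labeling.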