Book chapter
A system for large-scale automatic traffic sign recognition and mapping
We present a system for large-scale automatic traffic sign recognition and mapping and experimentally justify the design choices made for different components of the system. Our system works with more than 140 classes of traffic signs and does not require labor-intensive labelling of a large amount of training data, since it is trained on synthetically generated images. We evaluated our system on a large dataset of Russian traffic signs and made this dataset publicly available to encourage future comparison.
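As a rough illustration of how such synthetic training images can be produced, the sketch below pastes a randomly distorted sign template onto a background crop with OpenCV. The function name, augmentation ranges, and compositing rule are assumptions made for illustration, not the generation pipeline described in the paper.

```python
# Hypothetical sketch of synthetic training-image generation for a sign classifier.
# Templates and backgrounds are assumed to be loaded elsewhere as BGR images.
import cv2
import numpy as np

def synthesize_sample(template, background, out_size=48):
    """Paste a randomly distorted sign template onto a background crop."""
    h, w = template.shape[:2]
    # Random perspective jitter of the four template corners.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.uniform(-0.1, 0.1, src.shape) * [w, h]).astype(np.float32)
    M = cv2.getPerspectiveTransform(src, src + jitter)
    warped = cv2.warpPerspective(template, M, (w, h))

    # Random brightness/contrast change and blur to mimic camera conditions.
    alpha = np.random.uniform(0.6, 1.4)   # contrast
    beta = np.random.uniform(-30, 30)     # brightness
    warped = cv2.convertScaleAbs(warped, alpha=alpha, beta=beta)
    if np.random.rand() < 0.5:
        warped = cv2.GaussianBlur(warped, (3, 3), 0)

    # Blend the sign into the background and resize to the classifier input size.
    bg = cv2.resize(background, (w, h))
    mask = (warped.sum(axis=2) > 0)[..., None].astype(np.float32)
    composite = (warped * mask + bg * (1 - mask)).astype(np.uint8)
    return cv2.resize(composite, (out_size, out_size))
```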
In book
We investigate a specific machine vision problem, namely video-based detection of a moving forklift truck. It is shown that the detection quality of state-of-the-art local descriptors (SURF, SIFT, etc.) is not satisfactory when the resolution is low and the illumination changes dramatically. In this paper, we propose a simple mathematical morphology algorithm to detect the presence of cargo on the forklift truck. First, the movement direction is estimated by updating the motion history image, and the front part of the moving object is obtained. Next, contours are detected, and morphological operations in front of the moving object are used to estimate simple geometric features of an empty forklift. An experimental study shows that the proposed method has 40% lower FAR and 27% lower FRR compared with conventional matching of local descriptors. Moreover, our algorithm is 7 times faster.
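A minimal sketch of the motion-history-image (MHI) step mentioned above is given below, implemented directly with NumPy/OpenCV; the decay duration and frame-difference threshold are illustrative values, not the parameters used in the paper.

```python
# Minimal motion-history-image update and direction estimate (illustrative values).
import cv2
import numpy as np

MHI_DURATION = 0.5   # seconds a motion trace persists
DIFF_THRESH = 32     # frame-difference threshold

def update_mhi(mhi, prev_gray, gray, timestamp):
    """Update the float32 motion history image with the current frame."""
    motion_mask = cv2.absdiff(gray, prev_gray) > DIFF_THRESH
    mhi[motion_mask] = timestamp                      # stamp fresh motion
    mhi[mhi < timestamp - MHI_DURATION] = 0           # decay old motion
    return mhi

def dominant_motion_direction(mhi, timestamp):
    """Estimate the overall movement direction (degrees) from MHI gradients."""
    gx = cv2.Sobel(mhi, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(mhi, cv2.CV_32F, 0, 1, ksize=3)
    recent = mhi > timestamp - MHI_DURATION / 2       # only recent motion
    angles = np.arctan2(gy[recent], gx[recent])
    return np.degrees(np.median(angles)) if angles.size else None
```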
The sixteen-volume set comprising the LNCS volumes 11205-11220 constitutes the refereed proceedings of the 15th European Conference on Computer Vision, ECCV 2018, held in Munich, Germany, in September 2018.
The 776 revised papers presented were carefully reviewed and selected from 2439 submissions. The papers are organized in topical sections on learning for vision; computational photography; human analysis; human sensing; stereo and reconstruction; optimization; matching and recognition; video attention; and poster sessions.
Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under full variation of camera viewpoints, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent R-CNN object detector, we extend it in two ways. First, we leverage person-scene relations and propose a global CNN model trained to predict positions and scales of heads directly from the full image. Second, we explicitly model pairwise relations among the objects via an energy-based model whose potentials are computed with a CNN framework. Our full combined model complements R-CNN with contextual cues derived from the scene. To train and test our model, we introduce a large dataset with 369,846 human heads annotated in 224,740 movie frames. We evaluate our method and demonstrate improvements in person head detection over several recent baselines on three datasets. We also show improvements in detection speed provided by our model.
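To make the combination of local and global cues concrete, here is a simplified late-fusion sketch: each candidate's R-CNN score is blended with the value of a full-image context score map at the candidate's center. The array names and the linear fusion weight are hypothetical, and the pairwise energy term described in the abstract is omitted here.

```python
# Simplified late fusion of per-candidate detector scores with a global context map.
import numpy as np

def fuse_scores(boxes, rcnn_scores, context_map, weight=0.5):
    """Blend local R-CNN scores with a full-image context score map.

    boxes        : (N, 4) array of [x1, y1, x2, y2] head candidates
    rcnn_scores  : (N,) local detector confidences
    context_map  : (H, W) per-pixel head likelihood predicted from the full image
    """
    fused = np.empty_like(rcnn_scores, dtype=float)
    h, w = context_map.shape
    for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
        cx = np.clip((x1 + x2) // 2, 0, w - 1)
        cy = np.clip((y1 + y2) // 2, 0, h - 1)
        fused[i] = (1 - weight) * rcnn_scores[i] + weight * context_map[cy, cx]
    return fused
```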
We present a novel technique for estimating disk parameters (the center and the radius) from a 2D image. It is based on a maximum likelihood approach utilizing both edge pixel coordinates and the image intensity gradients. We emphasize the following advantages of our likelihood model. It yields closed-form formulae for estimating the parameters and therefore requires less computation than iterative algorithms. The likelihood model naturally distinguishes outer and inner annulus edges. The proposed technique was evaluated on both synthetic and real data.
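For readers unfamiliar with non-iterative circle estimation, the sketch below shows the classical algebraic (Kåsa) least-squares fit of center and radius from edge-pixel coordinates. It is a simpler stand-in for, not a reproduction of, the paper's closed-form maximum-likelihood estimator, which additionally uses intensity gradients.

```python
# Algebraic (Kasa) circle fit from edge-pixel coordinates; a closed-form baseline,
# not the maximum-likelihood estimator proposed in the paper.
import numpy as np

def fit_circle(xs, ys):
    """Return (cx, cy, r) from 1-D arrays of edge-pixel coordinates."""
    # Solve  a*x + b*y + c = -(x^2 + y^2)  in the least-squares sense,
    # then recover the center and radius from a, b, c.
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    rhs = -(xs**2 + ys**2)
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    cx, cy = -a / 2, -b / 2
    r = np.sqrt(cx**2 + cy**2 - c)
    return cx, cy, r
```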
The article describes an approach for extracting user preferences based on the analysis of a gallery of photos and videos on a mobile device. It is proposed to first use fast SSD-based methods to detect objects of interest in offline mode directly on the mobile device. Next, we perform facial analysis of all visual data: we extract feature vectors from detected facial regions, cluster them, and select public photos and videos that do not contain faces from the large clusters belonging to the owner of the mobile device and his or her friends and relatives. At the second stage, these public images are processed on a remote server using very accurate but rather slow object detectors. An experimental study of several contemporary detectors is presented on a specially designed subset of the MS COCO, ImageNet and Open Images datasets.
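The filtering stage can be pictured with the hedged sketch below: face embeddings from the gallery are clustered, and only images whose faces avoid the largest ("owner and close contacts") clusters are forwarded to the server. The embedding extractor, clustering parameters, and cluster-size threshold are assumptions, not the exact models or values used in the paper.

```python
# Sketch of selecting "public" gallery images via face-embedding clustering.
# Embeddings are assumed to be produced by some face recognition model upstream.
import numpy as np
from sklearn.cluster import DBSCAN

def select_public_images(image_ids, embeddings, min_private_cluster=10, eps=0.5):
    """image_ids[i] is the gallery image that face embedding i came from."""
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(embeddings)

    # Clusters with many faces are assumed to belong to the owner and close contacts.
    private_clusters = {
        lab for lab in set(labels)
        if lab != -1 and np.sum(labels == lab) >= min_private_cluster
    }
    private_images = {
        img for img, lab in zip(image_ids, labels) if lab in private_clusters
    }
    # Public images: those whose detected faces all fall outside the large clusters.
    return sorted(set(image_ids) - private_images)
```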
The problem of automatic detection of a moving forklift truck in video data is explored. In computer vision terms, this task is formulated as moving object detection in a noisy environment. It is shown that state-of-the-art local descriptors (SURF, SIFT, FAST, ORB) do not provide satisfactory detection quality when the camera resolution is low, the lighting changes dramatically, and shadows are present. In this paper we propose a simple mathematical morphology algorithm to detect the presence of cargo on the forklift truck. Its first step is the estimation of the movement direction and the front part of the truck by updating the motion history image. The second step is the application of Canny contour detection and binary morphological operations in front of the moving object to estimate simple geometric features of an empty forklift. The algorithm is implemented with the OpenCV library. Our experimental study shows that the best results are achieved when the difference of the widths of bounding rectangles is used as a feature. Namely, the detection accuracy is 78.7% (compared with 40% achieved by the best local descriptor), while the average frame processing time is only 5 ms (compared with 35 ms for the fastest descriptor).
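The second step of the algorithm can be sketched as follows: Canny edges inside the region in front of the forklift, binary closing, and the width of the dominant contour's bounding rectangle as the feature compared against an empty forklift. The Canny thresholds, kernel size, and function name are assumptions rather than the values used in the paper.

```python
# Illustrative contour/morphology step in front of the moving object (OpenCV 4.x).
import cv2
import numpy as np

def front_region_width(frame_gray, front_roi):
    """Return the bounding-rectangle width of the dominant contour in the ROI."""
    x, y, w, h = front_roi                      # region in front of the forklift
    roi = frame_gray[y:y + h, x:x + w]

    edges = cv2.Canny(roi, 50, 150)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    largest = max(contours, key=cv2.contourArea)
    _, _, rect_w, _ = cv2.boundingRect(largest)
    return rect_w   # compared against the empty-forklift width to flag cargo
```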