Deep Part-Based Generative Shape Model with Latent Variables
The Shape Boltzmann Machine (SBM) and its multilabel version, MSBM, have recently been introduced as deep generative models that capture the variations of an object's shape. While more flexible, MSBM requires datasets with labeled object parts for training. In this paper we present an algorithm for training MSBM using binary masks of objects and seeds that approximately correspond to the locations of object parts. The latter can be obtained from part-based detectors in an unsupervised manner. We derive a latent variable model and an EM-like training procedure for adjusting the weights of MSBM using a deep learning framework. We show that the model trained by our method outperforms SBM on tasks related to binary shapes and is very close to the original MSBM in terms of the quality of multilabel shapes.
The paper deals with a linear regression model. The EM algorithm is a popular tool for maximum likelihood estimation of the parameters of a regression model. It provides a method of robust regression under the assumption that the disturbances are independent and have an identical multivariate t distribution. Previous work focused on maximum likelihood estimation via the EM algorithm under the assumption that the degrees of freedom parameter of the t distribution is a scalar. In this paper, a broader assumption is employed, namely, that the disturbances have a multivariate t distribution with a vector of degrees of freedom. The missing data in the EM algorithm are random matrices. The theoretical results are illustrated in a simulation experiment using several distributions for the error process. Robust procedures are shown to be superior to the method of least squares.
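As an illustration of the EM idea mentioned above, the following is a minimal sketch of the simpler, well-known case only: univariate t-distributed errors with a fixed scalar degrees of freedom `nu`. It does not implement the vector degrees-of-freedom model of the abstract. The function name `t_regression_em` and its parameters are assumptions made for this example. The E-step computes a weight for each observation that shrinks as its residual grows, and the M-step is a weighted least squares fit.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, n_iter=200, tol=1e-10):
    """EM for linear regression with t errors (scalar nu, assumed known).

    Hypothetical helper for illustration; returns (beta, sigma2, weights).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Start from the ordinary least squares solution.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    w = np.ones(n)

    for _ in range(n_iter):
        # E-step: expected latent precision of each observation.
        # Large residuals receive small weights.
        w = (nu + 1.0) / (nu + resid**2 / sigma2)

        # M-step: weighted least squares for beta, then the scale update.
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        resid = y - X @ beta_new
        sigma2 = max((w @ resid**2) / n, 1e-12)

        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    return beta, sigma2, w
```

With a design matrix whose first column is all ones, `beta[0]` is the intercept. Observations that end up with very small weights are the ones the fit treats as outliers, which is what makes the estimate less sensitive to them than least squares.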
The article is devoted to the history and problems of creating interfaces. It shows the complexity and importance of effective interfaces and notes that the problem is systemic, multilevel, and interdisciplinary. In new systems, serious attention should be given to the level of human efficiency. The human is still the leading element in determining the efficiency of any ergatic system. The main means of control in ergatic systems that include computers is the graphic manipulator (GM), which is used to operate the on-screen controls. The main styles of user interface are presented. The most popular are the GUI (Graphical User Interface) and, based on it, the WUI (Web User Interface). The development of computer modeling equipment and technology has led to the active introduction of virtual reality technology, which immerses people in artificial worlds. The main features of these worlds are full control over all parameters of their development and the emergence of a sense of presence in the people who inhabit these environments, which are called immersive. Induced-environment technology makes possible a number of new interfaces, not generally applicable at present, that use specially engineered virtual environments. Much attention is paid to creating the most advanced systems, contactless control systems, which consist of a camera and sophisticated software. The drawbacks of modern contactless control are discussed. Work is under way to create a contactless intelligent interface that will be controlled using data from a video camera installed on the computer, have high noise immunity, reliably identify the user, recognize the situational environment, and have an acceptable cost.
Most of today’s machine learning techniques require large amounts of manually labeled data. This problem can be addressed by using synthetic images. Our main contribution is to evaluate methods of traffic sign recognition trained on synthetically generated data and to show that the results are comparable with those of classifiers trained on a real dataset. To obtain a representative synthetic dataset, we model different sign image variations such as intra-class variability, imprecise localization, blur, lighting, and viewpoint changes. We also present a new method for traffic sign segmentation, based on a nearest neighbor search in a large set of synthetically generated samples, which improves current traffic sign recognition algorithms.
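To make the idea of modeled image variations concrete, here is a minimal sketch that perturbs a grayscale sign template with three of the variations listed above: imprecise localization (a small random shift), blur (a box filter), and lighting (a brightness scale). It is not the authors' generator; the function name `synthesize`, its parameters, and the choice of a box filter are assumptions made for this example, and intra-class variability and viewpoint changes are left out.

```python
import numpy as np

def synthesize(template, rng, max_shift=2, max_blur=2, brightness=(0.6, 1.4)):
    """Return one randomly perturbed copy of a 2-D template with values in [0, 1].

    Hypothetical helper for illustration only.
    """
    template = np.asarray(template, dtype=float)
    h, w = template.shape

    # Imprecise localization: shift by up to max_shift pixels,
    # filling the uncovered border by repeating edge pixels.
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    padded = np.pad(template, max_shift, mode="edge")
    top, left = max_shift + dy, max_shift + dx
    img = padded[top:top + h, left:left + w]

    # Blur: box filter with a random radius (radius 0 means no blur).
    r = int(rng.integers(0, max_blur + 1))
    if r > 0:
        p = np.pad(img, r, mode="edge")
        k = 2 * r + 1
        img = sum(p[i:i + h, j:j + w] for i in range(k) for j in range(k)) / k**2

    # Lighting: multiply by a random brightness factor and clip to [0, 1].
    img = img * rng.uniform(*brightness)
    return np.clip(img, 0.0, 1.0)
```

Calling `synthesize(template, np.random.default_rng(0))` repeatedly with the same generator yields a stream of differently perturbed samples from a single template, which is the basic mechanism by which a synthetic training set can be built.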
We present a new click model for processing click logs and predicting relevance and appeal for query–document pairs in search results. Our model is a simplified version of the task-centric click model but outperforms it in an experimental comparison.
We consider the problem of estimating 3-d structure from a single still image of an outdoor urban scene. Our goal is to efficiently create 3-d models that are visually pleasing. We choose an appropriate 3-d model structure and formulate the task of 3-d reconstruction as a model fitting problem. Our 3-d models are composed of a number of vertical walls and a ground plane, where the ground-vertical boundary is a continuous polyline. We achieve computational efficiency through special preprocessing together with a stepwise search for the 3-d model parameters, dividing the problem into two smaller sub-problems on chain graphs. The use of Conditional Random Field models for both sub-problems allows various cues to be incorporated. We infer the orientation of the vertical walls of the 3-d model from vanishing points.