The development of new HPC architectures proceeds faster than the corresponding adjustment of algorithms for such fundamental mathematical models as quantum and classical molecular dynamics. There is a need for clear guiding criteria for the computational efficiency of a particular model on particular hardware. The LINPACK benchmark alone can no longer serve this role. In this work we consider a practical metric: the time-to-solution versus the computational peak performance of a given hardware system. Using this metric we compare different hardware for the CP2K and LAMMPS software packages, which are widely used for atomistic modeling. The metric considered can serve as a universal, unambiguous scale for ranking different types of supercomputers.
The paper focuses on the problem of hair removal in dermatology applications. The proposed hair removal algorithm is based on Gabor filtering and PDE-based image reconstruction. It also includes an edge sharpening stage that uses a new warping algorithm. The idea of warping is to move pixels from the neighborhood of a blurred edge closer to the edge. The proposed technique preserves the overall luminosity and textures of the image, while making the edges sharper and less noisy.
Abstract. Research and development (R&D) involves not only researchers but also many other specialists from different areas. All of them solve a variety of tasks that require comprehensive information and analytical support. This chapter discusses the major tasks arising in R&D: study of the state of the art in a given research area, assessment of the prospects of research fields and forecasting of their development, quality assessment of scientific publications including plagiarism detection, and automated examination of proposed R&D projects. A number of information and analytical systems have been developed to address these tasks. The main goal of this chapter is to review the R&D support functions of well-known and widely used search and analytical systems and to discuss the information retrieval methods behind these functions. Keywords: Full-text search, information retrieval, R&D support, scientific publication, citation databases, scientometrics, exploratory search.
In recent years there have been a number of important improvements in exact color-based maximum clique solvers, which have considerably enhanced their performance. Initial vertex ordering is one strategy known to have a significant impact on the size of the search tree. Typically, a degenerate sorting by minimum degree is used; the literature also reports different tiebreaking strategies. A systematic study of the impact of initial sorting in the light of new cutting-edge ideas (e.g. recoloring, selective coloring, ILS initial lower bound computation [15, 16] or MaxSAT-based pruning) is, however, lacking. This paper presents a new initial sorting procedure and relates its performance to the newly mentioned variants as implemented in the leading solver BBMC [9, 10].
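The degenerate sorting by minimum degree mentioned in the abstract above can be illustrated with a minimal sketch. It is not the new procedure the paper proposes, and it is not taken from BBMC. The function name, the adjacency-dictionary input and the tiebreak by smallest vertex label are assumptions made for illustration only.

```python
def degenerate_min_degree_order(adj):
    """Repeatedly remove a vertex of minimum degree in the remaining graph.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    Returns the vertices in removal order.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while remaining:
        # Minimum current degree; ties broken by vertex label
        # (an illustrative choice, not one prescribed by the abstract).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        order.append(v)
        for w in remaining[v]:
            remaining[w].discard(v)
        del remaining[v]
    return order


# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
order = degenerate_min_degree_order(graph)  # [4, 1, 2, 3]
```

Solvers differ in whether they branch on this sequence as is or reversed, so the direction of the returned list should be treated as a convention to check against the solver at hand.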
In this paper we address the challenging problem of incorporating preferences on possible shapes of an object into a binary image segmentation framework. We extend the well-known conditional random fields model by adding new variables that are responsible for the shape of an object. We describe the shape via a flexible graph augmented with vertex positions and edge widths. We derive exact and approximate algorithms for MAP estimation of label and shape variables given an image. We also present an original learning procedure that tunes the parameters of our model from unlabeled images for which only shape descriptions are given. Experiments confirm that our model improves the segmentation quality in hard-to-segment images by taking into account the knowledge about typical shapes of the object.
The ORD corpus is a representative resource of everyday spoken Russian that contains about 1000 h of long-term audio recordings of daily communication made in real settings by research volunteers. ORD macro episodes are large communication episodes united by the setting or scene of communication, the social roles of the participants and their general activity. The paper describes the annotation principles used for tagging macro episodes, provides current statistics on the communication situations presented in the corpus and reveals their most common types. The annotation codes for communication situations can be used as filters for selecting audio data, which makes it possible to study Russian everyday speech in different communication situations and to determine and describe various registers of spoken Russian. As an example, several high-frequency word lists referring to different communication situations are compared. The annotation of macro episodes made for the ORD corpus is a prerequisite for its further pragmatic annotation.
The open source C++ class library GridMD for distributed computing is reviewed, including its architecture, functionality and use cases. The library is intended to facilitate the development of distributed applications that can be run on contemporary supercomputing clusters and standalone servers managed by Grid or cluster task scheduling middleware. The GridMD library was originally targeted at molecular dynamics and Monte-Carlo simulations, but at present it can serve as a universal tool for developing distributed computing applications as well as for creating task management codes. In both cases the distributed application is represented by a single client-side executable built from compact C++ code. The library is primarily targeted at developing complex applications that contain many computation stages, with possible data dependencies between them, and that can be run efficiently in a distributed environment.
The paper presents a new geometrically motivated method for non-linear regression based on a manifold learning technique. The regression problem is to construct a predictive function which estimates an unknown smooth mapping f from q-dimensional inputs to m-dimensional outputs based on a training data set consisting of given ‘input-output’ pairs. The unknown mapping f determines a q-dimensional manifold M(f) consisting of all the ‘input-output’ vectors, which is embedded in (q+m)-dimensional space and covered by a single chart; the training data set determines a sample from this manifold. Modern manifold learning methods allow constructing an estimator M* from the manifold-valued sample which accurately approximates the manifold. The proposed method, called Manifold Learning Regression (MLR), finds the predictive function fMLR that ensures the equality M(fMLR) = M*. The MLR simultaneously estimates the m×q Jacobian matrix of the mapping f.
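The geometric objects named in the abstract above can be written out explicitly. The following is a sketch under stated assumptions: the symbol X for the input domain and the symbol H(x) are introduced here for illustration and do not appear in the abstract. The tangent-space identity is the standard one for the graph of a smooth mapping, which suggests why an estimate of the manifold and an estimate of the Jacobian go together.

```latex
% Graph of f as a q-dimensional manifold in (q+m)-dimensional space,
% covered by the single chart x -> (x, f(x)).
M(f) = \bigl\{ (x, f(x)) : x \in X \subset \mathbb{R}^{q} \bigr\}
       \subset \mathbb{R}^{q+m}

% Tangent space at the point (x, f(x)): the image (column space) of a
% (q+m) x q matrix built from the identity and the m x q Jacobian of f.
H(x) = \begin{pmatrix} I_{q} \\ J_{f}(x) \end{pmatrix},
\qquad
J_{f}(x) \in \mathbb{R}^{m \times q},
\qquad
T_{(x, f(x))} M(f) = \operatorname{Im} H(x)
```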
In this paper, we analyze a new approach to demand prediction in retail. One of the significant gaps in demand prediction by machine learning methods is that the censorship of sales data is not accounted for. Econometric approaches to modeling censored demand are used to obtain consistent and unbiased estimates of parameters. These approaches can also be transferred to different classes of machine learning models to reduce the prediction error of sales volume. In this study we build two ensemble models to predict demand, with and without accounting for demand censorship, aggregating predictions from machine learning methods such as linear regression, ridge regression, LASSO and random forest. Having estimated the predictive properties of both models, we test whether the model that accounts for the censored nature of demand has the better predictive power.
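To make the notion of sales data censorship in the abstract above concrete, here is a minimal sketch. It assumes, as the abstract does not state explicitly, that observed sales equal latent demand capped by available stock. The function names and all numbers are hypothetical and serve only to show why ignoring censorship biases a naive estimate downward.

```python
from statistics import mean


def observe_sales(demand, stock):
    """Observed sales: latent demand censored from above by stock."""
    return [min(d, s) for d, s in zip(demand, stock)]


def censoring_flags(sales, stock):
    """Mark days on which stock sold out.

    On such days demand is only known to be at least the stock level,
    so the observation is treated as censored.
    """
    return [sold >= available for sold, available in zip(sales, stock)]


# Hypothetical daily latent demand and stock levels for one item.
demand = [3, 8, 5, 12, 7]
stock = [10, 6, 10, 6, 10]

sales = observe_sales(demand, stock)   # [3, 6, 5, 6, 7]
flags = censoring_flags(sales, stock)  # [False, True, False, True, False]

naive_estimate = mean(sales)   # 5.4, understates demand
true_mean = mean(demand)       # 7
```

A model fitted to `sales` alone would learn the lower figure, which is the gap that censorship-aware estimators are meant to close.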
Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training. In this paper we try to overcome this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging semantic image segmentation problem for which a wide variety of annotations can be used. For instance, the popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects, or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.
We propose a prototype of a near-duplicate detection system for web-shop owners. It is typical for these online businesses to buy descriptions of their goods from so-called copywriters. A copywriter may cheat from time to time and provide the owner with almost identical descriptions for different items. In this paper we demonstrate how Formal Concept Analysis (FCA) can be used for fast clustering and for revealing such duplicates in the datasets of a real online perfume shop.
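The use of FCA for finding near-duplicate descriptions, as outlined in the abstract above, can be sketched as follows. This is not the authors' prototype. It assumes a formal context whose objects are item descriptions and whose attributes are word shingles, and it reports the extent of every concept generated by a pair of descriptions that shares enough shingles. The function names, the shingle length `k`, the threshold `min_shared` and the sample texts are all hypothetical.

```python
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word shingles of a text (empty if the text is too short)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def common_intent(context, docs):
    """FCA derivation operator: attributes shared by all given objects."""
    return set.intersection(*(context[d] for d in docs))


def extent(context, attrs):
    """FCA derivation operator: objects that have all given attributes."""
    return frozenset(d for d, have in context.items() if attrs <= have)


def near_duplicate_clusters(texts, k=3, min_shared=3):
    """Extents of concepts generated by pairs sharing >= min_shared shingles."""
    context = {name: shingles(t, k) for name, t in texts.items()}
    clusters = set()
    for a, b in combinations(context, 2):
        intent = common_intent(context, (a, b))
        if len(intent) >= min_shared:
            clusters.add(extent(context, intent))
    return clusters


# Hypothetical item descriptions.
texts = {
    "p1": "fresh floral scent with notes of jasmine and rose",
    "p2": "fresh floral scent with notes of jasmine and musk",
    "p3": "deep woody aroma for evening wear",
}
clusters = near_duplicate_clusters(texts)  # {frozenset({"p1", "p2"})}
```

Enumerating pairs is quadratic in the number of descriptions, so this form only illustrates the idea; a real system would need a faster concept-generation strategy.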