This article is an overview of scalable infrastructure for storage and processing of genome data in genetics problems. The overview covers used technologies descriptions, the organization of unified access to genome processing API of different underlying services. The article also covers methods for scalable and cloud computing technologies support. The first service in virtual genome processing laboratory is provided and presented. The service solves transcription factors bindning sites prediction problem. The main principles of service construction are provided. Basic requirements for underlying comptutaion software in virtual laboratory environments are provided. Overview describes the implemented web-service (https://api.ispras.ru/demo/gen) for transcription factors binding site prediction. Provided solution is based on ISPRAS API project as an API gateway and load-balancer; the middle-ware task-manager software for pool of workers support and for communications with Openstack infrastructure; OpenZFS as an intermediate storage with transparent compression support. The described solution is easy to extend with new services fitting the basic requirements.
Many modern applications (such as large-scale Web-sites, social networks, research projects, business analytics, etc.) have to deal with very large data volumes (also referred to as “big data”) and high read/write loads. These applications require underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latencies and data availability are also very important, as well as data consistency guarantees. Traditional SQL-oriented DBMSs, despite their popularity, ACID transactions and rich features, do not scale well and thus are not suitable in certain cases. A number of new data management systems and approaches have emerged over the last decade intended to resolve scalability issues. This paper reviews several classes of such systems and key problems they are able to solve. A large variety of systems and approaches due to the general trend toward specialization in the field of SMS: every data management system has been adapted to solve a certain class of problems. Thus, the selection of specific solutions due to the specific problem to be solved: the expected load, the intensity ratio of read and write, the form of data storage and query types, the desired level of consistency, reliability requirements, the availability of client libraries for the selected language, etc.
Actual task is protecting programs from reverse engineering. The best choice to implement a resistant obfuscation is to create obfuscating compiler based on one of the existing compiler infrastructures. On the one hand, it will produce obfuscated program, with full information about it at all stages of compilation, and the other allows you to focus on the development of protection, rather than on creating the infrastructure required. In addition, this approach provides support for multiple architectures, as well as introduces watermarks for binary images of the program for each user depending from a unique key. The paper describes the methods for obfuscating C/C++ programs to prevent applying static analyzers to them. Paper observes existing obfuscating compilers. The proposed transformations are based on well-known obfuscation algorithms (including constant string protection, fake cycle insertion, control flow graph flattening, functions merge, function call encapsulation, control flow graph structure obfuscation, opaque predicate insertion and other) and they are specifically improved to resist better to static analysis deobfuscation techniques. The methods are implemented within the LLVM (low level virtual machine) compiler infrastructure. Experimental results presenting resulting program slowdown and used memory growth are given.
In this paper the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark is discussed. Existing methods for Apache Spark clusters creation are described in this work. Also the implemented solution for building Apache Spark clusters and Apache Spark jobs execution in Openstack environment is described. The implemented solution is a modification for OpenStack Sahara project and it was featured in Openstack Liberty release.
Apache Spark is a framework providing fast computations on Big Data using MapReduce model. With cloud environments Big Data processing becomes more flexible since they allow to create virtual clusters on-demand. One of the most powerful open-source cloud environments is Openstack. The main goal of this project is to provide an ability to create virtual clusters with Apache Spark and other Big Data tools in Openstack. There exist three approaches to do it. The first one is to use Openstack REST APIs to create instances and then deploy the environment. This approach is used by Apache Spark core team to create clusters in propriatary Amazon EC2 cloud. Almost the same method has been implemented for Openstack environments. Although since Openstack API changes frequently this solution is deprecated since Kilo release. The second approach is to integrate virtual clusters creation as a built-in service for Openstack. ISP RAS has provided several patches implementing universal Spark Job engine for Openstack Sahara and Openstack Swift integration with Apache Spark as a drop-in replacement for Apache Hadoop. This approach allows to use Spark clusters as a service in PaaS service model. Since Openstack releases are less frequent than Apache Spark this approach may be not convenient for developers using the latest releases. The third solution implemented uses Ansible for orchestration purposes. We implement the solution in loosely coupled way and provide an ability to add any auxiliary tool or even to use another cloud environment. Also, it provides an ability to choose any Apache Spark and Apache Hadoop versions to deploy in virtual clusters. All the listed approaches are available under Apache 2.0 license.
Ensuring the correctness of microprocessors and other microelectronic equipment is a fundamental problem. To deal with it, various tools for functional verification are used. Unlike bugs in software programs which are relatively easy to fix (it does not apply to their consequences), defects in integrated circuits (both design and manufacturing ones) cannot be removed. In spite of continuous development of computer-aided design (CAD) systems, test generation tools and approaches to analysis of circuits, verification remains the bottleneck of the microprocessor design cycle (it accounts for approximately 70 percent of total design resources). The article gives a brief overview of microprocessor verification tools, describes issues that commonly occur in industrial practice and analyzes possible ways to solve them. The main part of the article is dedicated to research in the field of unit- and system-level hardware verification conducted at ISPRAS. It describes such approaches as contract specification of pipeline, event-driven hardware specification, parallel/distributed testing, combinatorial test program generation and template-based test program generation. The article also summarizes the outcomes of accomplished projects, describes the present works and formulates the directions of further research.
High complexity of present-day programs makes it nigh impossible to write a program without a defect. Thus it is increasingly necessary to use tools for defects detection. This article presents Svace, a tool for static program analysis developed in ISP RAS. This instrument allows to automatically find defects and potential vulnerabilities in programs written in C and C++ languages. Main features of the tool are simplicity of usage, deep interprocedural analysis, wide variety of supported warning types, scalability up to programs of millions lines of code and acceptable quality of analysis (30-80% of true positive warnings). In the core of the Svace tool lies an engine for interprocedural data-flow analysis based on function annotations. Each function is analyzed once and independently of the other functions which allows to achieve almost linear scalability (Linux kernel can be analyzed within 10 minutes on a relatively powerful machine and analysis of the whole Android source code takes less than 3 hours). Intraprocedural analysis is performed on source code internal representation derived from LLVM bitcode. It operates with value identifiers that are shared between memory locations with same values (similarly to generations in SSA representation). Special attributes of these value identifiers are calculated over the control-flow graph of the function. When specific combination of attributes is observed a defect warning is issued. Svace analysis engine is accompanied by Clang compiler-based lightweight analysis tool for checking of language-dependent rules which allows to quickly check a number of syntactic, semantic and situational rules. Analysis results can be presented to the user with the help of Eclipse IDE plugin. They can also be imported into analysis results database to trace history of program defects over time.
Nested Petri nets (NP-nets) have proved to be one of the convenient formalisms for distributed multi-agent systems modeling and analysis. It allows representing multi-agent systems structure in a natural way, since tokens in the system net are Petri nets themselves, and have their own behavior. Multi-agent systems are highly concurrent. Verification of such systems with model checking method causes serious difficulties arising from the huge growth of the number of system intermediate states (state-space explosion problem). To solve this problem an approach based on unfolding system behavior was proposed in the literature. Earlier in  the applicability of unfolding for nested Petri nets verification was studied, and the method for constructing unfolding for safe conservative nested Petri nets was proposed. In this work we propose another method for constructing safe conservative nested Petri nets unfoldings, which is based on translation of such nets into classical Petri nets and applying standard method for unfolding construction to them. We discuss also the comparative merits of the two approaches.
Graph partitioning is required for solving tasks on graphs that need to be split across disks or computers. This problem is well studied, but most results are not suitable for processing graphs with billons of nodes on commodity clusters, since they require shared memory or low-latency messaging. One approach suitable for cluster computing is Balanced Label Propagation, based on distributed label propagation algorithm for community detection. In this work we show how multi-level optimization can be used to improve partitioning quality of Balanced Label Propagation. One of major difficulties with distributed multi-level optimization is finding a matching in the graph. The matching is needed to choose pairs of vertices for collapsing in order to produce a smaller graph. As this work shows, simply splitting graph into several parts and finding matching in these parts independently is enough to improve the quality of partitioning generated by Balanced Label Propagation. Proposed algorithm can be implemented within any framework that supports MapReduce. In our experiments, when graphs were partitioned into 32 parts, ratio of edges that don’t cross partitions increased from 54-60% to 66-70%. One of significant problems of our implementation is performance – work time of multi-level algorithm was approximately twice that of the original algorithm. It seems likely that implementation can be improved so that multi-level algorithm would achieve better computational performance as well as partitioning quality.
In October 2013, the eighth meeting of researchers in the field of databases was held. The first such meeting took place in February 1988, so that 25 years passed between them. After each meeting, a report was published containing an overview of the current state of the field and a research program for the nearest future, a kind of set of forecasts for the development of research activities. This paper looks at the most interesting forecasts from the reports of the research meetings, discusses how they proved to be valid, to what extent they were true or not. Among the various problems of database technology under consideration are the following: the role of specialized hardware in building effective DBMS; SQL and database applications; perspectives of object-relational extensions; distributed heterogeneous database systems; databases and Web; databases and data warehouses, OLAP and data mining; component organization of DBMS; query optimization criteria; self-tuning and self-management of DBMS; DBMS architecture and new hardware capabilities: SSD, non-volatile memory, massively multithreaded processors; specialized DBMS; data fusion and data spaces; the Big Data problem and the response to it in the database community; architectural shifts in computing.
Requirements are an integral part of any software and hardware development process. The area where requirements become significantly important is the development of safety-critical systems which usage may cause risks on human lives. So the process of their development is often maintained by certification centers that requires from developers to meet the best practices supporting the safety of end product. This article reveals one possible approach to requirements management that was based on experience of embedded hardware development for civil avionics. This approach is now spread over different areas. Authors list the set of common tasks related to given approach. They also define the set of software features used to reduce the complexity of development and to mitigate risks. Authors review set of existing solutions in requirements management area using the listed features. In this article it is also defined on how given features can be applied within the given approach.