Big Data challenges traditional storage and analysis systems in several new ways. In this paper we try to figure out how these challenges can be overcome and why it is difficult to do so efficiently, and we describe three modern approaches to handling Big Data: NoSQL, MapReduce, and real-time stream processing. The first section of the paper is the introduction. The second section discusses the main issues of Big Data: volume, diversity, velocity, and value. The third section describes different approaches to solving the problem of Big Data. Traditionally, one might use a relational DBMS. The paper proposes some steps that allow one to keep using an RDBMS when its capacity becomes insufficient. Another way is to use a NoSQL approach. The basic ideas of the NoSQL approach are simplification, high throughput, and unlimited scaling out. Different kinds of NoSQL stores make such systems suitable for different Big Data applications. MapReduce and its free implementation Hadoop can be used to scale out Big Data analytics. Finally, several data management products support real-time stream processing of Big Data. The paper briefly overviews these products. The final section of the paper is the conclusion.
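To make the scaling-out idea behind MapReduce concrete, the following minimal sketch (not taken from the paper; written in Python for illustration, with word counting as an assumed example job) expresses an analytics task as a map function and a reduce function. A framework such as Hadoop would run many instances of each in parallel over partitions of the data; here a toy single-process driver stands in for the framework.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Map phase: each mapper sees one chunk of the input and emits (key, value) pairs.
def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    for word in chunk.split():
        yield word.lower(), 1

# Reduce phase: each reducer sees all values emitted for one key and folds them.
def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    return word, sum(counts)

# Toy single-process "framework": shuffle pairs by key, then reduce per key.
def run_job(chunks: Iterable[str]):
    shuffled = defaultdict(list)
    for chunk in chunks:                      # in Hadoop, chunks are HDFS blocks
        for key, value in map_words(chunk):   # mappers run in parallel on cluster nodes
            shuffled[key].append(value)
    return dict(reduce_counts(k, v) for k, v in shuffled.items())

print(run_job(["Big Data needs new tools", "new data new tools"]))
```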
Many experts in the field of data management believe that the emergence of non-volatile byte-addressable main memory (NVM) available for practical use will lead to the development of a new type of ultra-high-speed database management systems (DBMSs) with single-level data storage (native in-NVM DBMSs). However, the number of researchers actively working on architectures of native in-NVM DBMSs has not increased in recent years. The most active researchers are PhD students who are not afraid of the risks that, of course, exist in this new area. The second section of the article discusses the state of the art in NVM hardware. The analysis shows that NVM in the DIMM form factor has already become a reality, and that in the near future we can expect the appearance on the market of NVM DIMMs with the speed of conventional DRAM and endurance close to that of hard drives. The third section is devoted to a review of related work, among which the works of young researchers are the most advanced. In the fourth section, we state and justify that the work performed so far in the field of in-NVM DBMSs has not led to the emergence of a native architecture; it is hampered by a set of limiting factors that we analyze. In this regard, in the fifth section, we present a sketch of a native architecture of an in-NVM DBMS whose design is driven only by the goals of simplicity and efficiency. In conclusion, we summarize the article and argue the need for additional research into many aspects of the native architecture of an in-NVM DBMS.
ARM is a family of microprocessor instruction set architectures developed by the company of the same name. The newest architecture of this family, ARMv8, contains a large number of instructions of various types and is notable for its complex organization of virtual memory, which includes hardware support for multilevel address translation and virtualization. All of this makes functional verification of microprocessors with this architecture an extremely difficult technical task. An integral part of microprocessor verification is generation of test programs, i.e. programs in the assembly language that cause various situations (exceptions, pipeline stalls, branch mispredictions, data evictions in caches, etc.). The article describes the requirements for industrial test program generators and presents a generator for microprocessors with the ARMv8 architecture, which has been developed with the help of MicroTESK (Microprocessor TEsting and Specification Kit). The generator supports an instruction subset typical for mobile applications (about 400 instructions) and consists of two main parts: (1) an architecture-independent core and (2) formal specifications of ARMv8 or, more precisely, a model automatically constructed on the basis of the formal specifications. With such a structure, the development of a test program generator consists mainly in creating the formal specifications, which saves effort by reusing the architecture-independent components. An architecture is described using the nML and mmuSL languages. The first one is used to describe the microprocessor registers and the syntax and semantics of the instructions. The second one is used to specify the memory subsystem organization (address spaces, various buffers and tables, address translation algorithms, etc.). The article describes characteristics of the developed generator and gives a comparison with existing analogs.
Hardware testing is a process aimed at detecting manufacturing faults in integrated circuits. To measure test quality, two main metrics are in use: fault detection ability (fault coverage) and test application time (test length). Many algorithms have been suggested for test generation; however, no scalable solution exists. In this paper, we analyze the applicability of functional tests generated from high-level models to low-level manufacturing testing. A particular test generation method is considered. The input information is an HDL description. The key steps of the method are system model construction and coverage model construction. Both models are automatically extracted from the given description. The system model is a representation of the design in the form of high-level decision diagrams. The coverage model is a set of LTL formulae defining reachability conditions for the transitions of the extended finite state machine. The models are translated into the input format of a model checker. For each coverage model formula, the model checker generates a counterexample, i.e. an execution that violates the formula (and thus makes the corresponding transition fire). The approach is intended to cover all possible execution paths of the input HDL description and to detect dead code. Experimental comparison with existing analogs has shown that the approach produces shorter tests, but they achieve lower stuck-at fault coverage compared with the dedicated approach. An improvement has been proposed to overcome this issue.
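As an illustration of how such reachability conditions are typically encoded (the exact formulation used in the paper may differ; the symbols below are illustrative), a transition t of the extended finite state machine that leaves state s under guard g can be covered by an LTL formula asserting that the transition never fires; any counterexample returned by the model checker is then an execution that reaches and fires it, i.e. a test for that transition:

```latex
% Hypothetical coverage formula for a transition t with source state s and guard g(x).
% The model checker refutes \varphi_t and returns a path on which state = s and g(x)
% hold simultaneously; this path is turned into a functional test.
\varphi_t \;=\; \mathbf{G}\,\neg\bigl(\mathit{state} = s \,\wedge\, g(x)\bigr)
```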
The article proposes methods for supporting the development of efficient programs for modern parallel architectures, including hybrid systems. First, specialized profiling methods designed for programmers tasked with parallelizing existing code are proposed. The first method is loop-based profiling via source-level instrumentation done with the Coccinelle tool. The second method is memory reuse distance estimation via the virtual memory protection mechanism and manual instrumentation. The third method is cache miss and false sharing estimation, which collects a partial trace of memory accesses using compiler instrumentation and estimates cache behavior in postprocessing based on the trace and a cache model. Second, the problem of automatic parallel code generation for hybrid architectures is discussed. Our approach is to generate OpenCL code from parallel loop nests based on the GRAPHITE infrastructure of the GCC compiler. Finally, in cases where achieving high efficiency on hybrid systems requires significant rework of data structures or algorithms, one can employ auto-tuning to specialize for specific input data and hardware at run time. This is demonstrated on the problem of optimizing sparse matrix-vector multiplication for GPUs and its use for accelerating linear system solving in the OpenFOAM CFD package. We propose a variant of the "sliced ELLPACK" sparse matrix storage format with special treatment of small horizontal or diagonal blocks, where the exact parameters of the matrix structure and GPU kernel launch are automatically tuned at run time for the specific matrix and GPU hardware.
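To show the kind of layout being tuned, the following simplified reference sketch (plain Python, not the paper's GPU code; the slice height S and the function names are assumptions, and the paper's special treatment of small horizontal or diagonal blocks is omitted) stores a matrix in a basic sliced-ELLPACK form and multiplies it by a vector. Each slice of S rows is padded only to the longest row within that slice, which is what makes S, together with kernel launch parameters, worth auto-tuning per matrix and per GPU.

```python
import numpy as np

S = 4  # slice height: rows per slice (a tunable parameter on the GPU)

def to_sliced_ellpack(rows):
    """rows: list of lists of (col, val) pairs, one list per matrix row."""
    slices = []
    for base in range(0, len(rows), S):
        chunk = rows[base:base + S]
        width = max((len(r) for r in chunk), default=0)  # pad within the slice only
        cols = np.zeros((len(chunk), width), dtype=np.int64)
        vals = np.zeros((len(chunk), width), dtype=np.float64)
        for i, r in enumerate(chunk):
            for j, (c, v) in enumerate(r):
                cols[i, j], vals[i, j] = c, v            # padding entries stay (0, 0.0)
        slices.append((base, cols, vals))
    return slices

def spmv(slices, x, n_rows):
    y = np.zeros(n_rows)
    for base, cols, vals in slices:                      # on a GPU: one work-group per slice
        y[base:base + cols.shape[0]] = (vals * x[cols]).sum(axis=1)
    return y

rows = [[(0, 2.0)], [(1, 3.0), (2, 1.0)], [(2, 4.0)], [(0, 1.0), (3, 5.0)], [(3, 2.0)]]
print(spmv(to_sliced_ellpack(rows), np.ones(4), len(rows)))
```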
It is intuitively clear that the search for scientific publications often has many characteristics of exploratory search. The purpose of this paper is to formalize this intuitive understanding, to explore which research tasks of scientists can be attributed to exploratory search, what approaches exist to supporting exploratory search in general, and how they are implemented in specialized search engines for scientists. We reviewed existing work on the information-seeking behavior of scientists and on the special variant of search called exploratory search. There are several types of search typical for scientists, and we show that most of them are exploratory. Exploratory search differs from ordinary information retrieval and demands special support from search systems. We analyzed seventeen current search systems for academics (from Google Scholar, Scopus, and Web of Science to ResearchGate) from the viewpoint of exploratory search support. We found that most of them do not go far beyond simple information retrieval, and that there is room for further improvement, especially in collaborative search support.