Scalable and accurate detection of code clones
A method for the detection of code clones is described in detail. The method is based on semantic analysis of programs and on new algorithms that make it scalable without sacrificing accuracy. It involves two phases. In the first phase, the program dependence graph (PDG) is constructed while the program is compiled; LLVM is used as the compilation infrastructure. In the second phase, similar subgraphs of maximum size, which represent code clones, are detected. Before the search for similar subgraphs begins, the PDG is partitioned into subgraphs that are treated as potential clones of one another. To keep the search scalable, a composition of algorithms is used: the first algorithm checks, in linear time, whether a pair of graphs can contain similar subgraphs of the desired size; if the pair cannot be ruled out, a second (approximate) algorithm is run to find similar subgraphs of maximum size. After similar subgraphs have been found, the program code is additionally checked for the positions of the code lines corresponding to the detected clone candidates. Tests showed that the developed tool is more accurate than comparable tools such as MOSS, CCFinder, and CloneDR. Results obtained for the projects Linux 2.6, Mozilla Firefox, LLVM/Clang, and OpenSSL are presented.
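As an illustration of the two-stage composition, consider the following minimal sketch. It assumes each candidate PDG subgraph is summarized by a multiset of node "kinds" (operation labels); the filter, the greedy matcher, and the threshold are illustrative stand-ins, not the paper's actual algorithms:

```python
# Minimal sketch of a two-stage clone search: a cheap linear-time filter
# followed by a (toy) approximate matcher. All names and thresholds are
# illustrative assumptions, not the authors' implementation.
from collections import Counter
from itertools import combinations

MIN_CLONE_SIZE = 4  # assumed smallest clone size of interest

def kind_counts(graph):
    return Counter(kind for _, kind in graph["nodes"])

def cheap_filter(g1, g2, min_size=MIN_CLONE_SIZE):
    """Linear-time necessary condition: if the multisets of node kinds
    overlap in fewer than min_size nodes, the pair cannot contain similar
    subgraphs of the desired size and is discarded immediately."""
    c1, c2 = kind_counts(g1), kind_counts(g2)
    overlap = sum(min(c1[k], c2[k]) for k in c1)
    return overlap >= min_size

def approximate_match_size(g1, g2):
    """Toy stand-in for the approximate maximum-similar-subgraph search:
    greedily pair nodes of equal kind. A real matcher would also have to
    respect the dependence edges between nodes."""
    c1, c2 = kind_counts(g1), kind_counts(g2)
    return sum(min(c1[k], c2[k]) for k in c1)

def find_clone_pairs(pdg_subgraphs):
    clones = []
    for g1, g2 in combinations(pdg_subgraphs, 2):
        if not cheap_filter(g1, g2):
            continue  # pair ruled out in linear time
        size = approximate_match_size(g1, g2)  # expensive step runs rarely
        if size >= MIN_CLONE_SIZE:
            clones.append((g1["name"], g2["name"], size))
    return clones

# Usage: each candidate subgraph is a dict with a name and (id, kind) nodes.
f = {"name": "f", "nodes": [(0, "load"), (1, "add"), (2, "mul"), (3, "store")]}
g = {"name": "g", "nodes": [(0, "load"), (1, "add"), (2, "mul"), (3, "store")]}
print(find_clone_pairs([f, g]))  # -> [('f', 'g', 4)]
```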
This paper describes our approach to document search based on ontological resources and graph models. The approach is applicable in local networks and on local computers. It can be useful for specialists in ontology engineering and in search.
The article describes original software tools for experimentally estimating the computational complexity of software solutions to problems on graph models of systems. The classes of problems solved and the tools for analyzing the results are listed. A method based on selecting graph models by their structural complexity is introduced.
One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for the utilization of paired reads (mate-pairs). While most assemblers use mate-pair information in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly into the assembly graph structure. However, the PDBG approach faces difficulties when the variation in insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms, which allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
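The step from raw mate-pair observations to an edge-pair histogram can be illustrated with a short sketch. The binning and peak-picking below are assumptions for illustration, not the paper's actual procedure: each observed distance between a fixed pair of assembly-graph edges falls into a bin, and the most populated bin yields a distance estimate that is robust to high insert-size variation.

```python
# A minimal, assumed sketch of an edge-pair histogram for one pair of
# assembly-graph edges linked by multiple mate-pairs.
from collections import Counter

def edge_pair_histogram(distances, bin_size=5):
    """Bin the per-mate-pair distance estimates between a fixed pair of
    edges; with many mate-pairs the histogram peaks near the true distance
    even when individual insert sizes vary widely."""
    return Counter((d // bin_size) * bin_size for d in distances)

def estimate_distance(distances, bin_size=5):
    hist = edge_pair_histogram(distances, bin_size)
    # take the centre of the most populated bin as the distance estimate
    best_bin, _ = max(hist.items(), key=lambda kv: kv[1])
    return best_bin + bin_size // 2

# Usage: noisy distance estimates from mate-pairs with varying insert sizes.
obs = [98, 103, 101, 150, 99, 102, 97, 104, 100, 60]
print(estimate_distance(obs))  # -> 102, near the true distance of ~100
```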
The article discusses the strategy of «mixing» methods, which is particularly prevalent in the Western research tradition. It covers methods of text analysis and demonstrates the difference between formal and informal approaches using, as an example, a study of the image of modern Russia in the texts of the American newspaper «New York Times», with particular attention to algorithms for working with texts. It is shown that for studying such phenomena as the image of a country, the combination of formal and informal approaches to text analysis is a necessary and natural research practice.
The book contains the necessary background from algorithm theory, graph theory, and combinatorics. It considers partially recursive functions, Turing machines, and several formalizations of algorithms (associative calculi, substitution systems, grammars, Post productions, Markov normal algorithms, operator algorithms). The main types of graphs are described (multigraphs, pseudographs, Eulerian graphs, Hamiltonian graphs, trees, bipartite graphs, matchings, Petri nets, planar graphs, transport networks), and some graph algorithms frequently used in practice are given. Classical combinatorial configurations, their generating functions, and recurrent sequences are also considered. The book is based on the authors' long-term experience of teaching the discipline «Discrete Mathematics» at the Business Informatics and Computer Science faculties of the National Research University Higher School of Economics and at the Automatics and Computer Engineering faculty of the National Research University Moscow Power Engineering Institute. The book is intended for bachelor's degree students trained at computer science faculties in the directions 09.03.01 Informatics and Computing Technique, 09.03.02 Information Systems and Technologies, 09.03.03 Applied Informatics, and 09.03.04 Software Engineering, as well as for IT experts and software product developers.
In this paper we present some preliminary results on text corpus visualization by means of so-called reference graphs. The nodes of such a graph stand for key words or phrases extracted from the texts, and the edges represent the reference relation: node A refers to node B if the corresponding key word/phrase B is more likely to co-occur with key word/phrase A than to occur on its own. Since reference graphs are directed graphs, we are able to use graph-theoretic algorithms for further analysis of the text corpus. The visualization technique is tested on our own Web-based corpus of Russian-language newspapers.
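The reference relation admits a direct probabilistic reading: A refers to B when P(B | A) > P(B). The sketch below illustrates this under assumed data structures (each text reduced to a set of key phrases); the function name and toy corpus are illustrative, not the authors' implementation:

```python
# Minimal sketch of building a reference graph: add a directed edge (A, B)
# whenever P(B | A) > P(B) over a corpus of phrase sets.

def build_reference_graph(docs, phrases):
    """docs: list of sets of key phrases found in each text.
    Returns directed edges (A, B) meaning 'A refers to B'."""
    n = len(docs)
    contains = {p: sum(1 for d in docs if p in d) for p in phrases}
    edges = []
    for a in phrases:
        docs_with_a = [d for d in docs if a in d]
        if not docs_with_a:
            continue
        for b in phrases:
            if a == b or contains[b] == 0:
                continue
            p_b = contains[b] / n  # P(B): how often B occurs overall
            p_b_given_a = sum(1 for d in docs_with_a if b in d) / len(docs_with_a)
            if p_b_given_a > p_b:  # B co-occurs with A more than expected
                edges.append((a, b))
    return edges

# Usage on a toy corpus of phrase sets.
corpus = [{"economy", "sanctions"}, {"economy", "sanctions"},
          {"economy"}, {"oil"}]
print(build_reference_graph(corpus, ["economy", "sanctions", "oil"]))
# -> [('economy', 'sanctions'), ('sanctions', 'economy')]
```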
Graph Structures for Knowledge Representation and Reasoning 2014: a workshop at IJCAI-2014.
This volume presents new results in the study and optimization of information transmission models in telecommunication networks using different approaches, mainly based on the theories of queueing systems and queueing networks.
The paper provides a number of proposed draft operational guidelines for technology measurement and includes tentative definitions of technology to be used for statistical purposes, principles for the identification and classification of potentially growing technology areas, and suggestions on survey strategies and indicators. These are the key components of an internationally harmonized framework for collecting and interpreting technology data, which would need to be developed further through a broader consultation process. A summary of the definitions of technology already available in OECD manuals, together with the stocktaking results, is provided in the Annex.