Distributed In Situ Processing of Big Raster Data in the Cloud
A raster is the primary data type in Earth science, geology, remote sensing and other fields with tremendous growth of data volumes. An array DBMS is an option to tackle big raster data processing. However, raster data are traditionally stored in files, not in databases. Command line tools have long been developed to process raster files. Most tools are feature-rich and free but optimized for a single machine. This paper proposes new techniques for distributed processing of raster data directly in diverse file formats by delegating considerable portions of work to such tools. An N-dimensional array data model is proposed to maintain independence from the files and the tools. Also, a new scheme named GROUP–APPLY–FINALLY is presented to universally express the majority of raster data processing operations and streamline their distributed execution. The new approaches make it possible to provide a rich collection of raster operations at scale and outperform SciDB over
The 2018 Global Smart Industry Conference is organized in order to exchange experience, promote discussion and presentation of research papers, and summarize results in development of innovative models, methods and technologies for the digital industry in universities, scientific and industrial associations of the Russian Federation as well as in foreign companies, and the experience of their implementation in large transnational and domestic industrial companies.
It will be held in Chelyabinsk, Russian Federation, on November 13-15, 2018.
The aim of the conference is to determine the prospects for the development of Smart Industry technologies, integration of industrial companies, scientific organizations and authorities to create promising technologies for the digital transformation of the industry.
Conference topics: Condition monitoring and control for intelligent manufacturing; Industrial robotics; Components of and sensors; Wireless sensor and actuator networks; Digital Twins technologies; Additive manufacturing technologies; Big data, machine learning and artificial intelligence for Industry 4.0 management; Human-machine interaction in industrial systems; Security and privacy protection in industrial networks; Virtual and augmented realities for Industry 4.0; Cloud and high-performance computing for smart factory; Basic research for Industry 4.0; New educational technologies for Industry 4.0.
We built and evaluated two types of models, sequence-based and structure-based, for recognition of 3’-end stem-loops of human L1s and Alus, and found the most important parameters contributing to recognition: Shift, Tilt and Rise, and also hydrophilicity.
This article examines the expediency of using a graphics processing unit (GPU) for big data processing in the context of digital image processing. It gives a short description of parallel computing technology and its use in different areas, a definition of image noise, and a brief overview of some noise removal algorithms. It also describes basic requirements that a noise removal algorithm should meet when applied to computed tomography. Finally, it compares performance with and without a GPU, as well as with different shares of work assigned to the CPU and the GPU.
The article outlines the definition of the concept of Big Data, presents its applicability for official statistics, and reviews problems and challenges associated with it. The paper introduces international experience in carrying out Big Data projects in statistics, as well as prospects of using this concept in Russian statistics. The authors give a step-by-step account of the interdependence between Big Data and official statistics, which fully agrees with the fundamental principles of official statistics adopted at the 68th General Assembly of the United Nations on January 23, 2014. The paper analyzes the results of monitoring conducted by the Statistics Division and the Economic Commission for Europe, which gathered information on completed, ongoing and potential Big Data projects (as well as organizational conditions for their execution) in selected countries. The authors comment on challenges and problems that have to be overcome in order to use Big Data in official statistics; they specify implementation directions for the concept of Big Data not only to substitute the existing statistical observation practice, but also to use it as an additional source of statistical information and a way to check the validity of the obtained results.
We trained a Random Forest model to recognize patterns of nucleosome and non-B DNA structures, considered as potential nucleosome barriers in the mouse genome. We showed that among four types of structures (Z-DNA, H-DNA, G-Quadruplexes and SIDD regions), recognition of G-Quadruplexes and H-DNA showed the best performance.
This volume contains the refereed proceedings of the 6th International Conference on Analysis of Images, Social Networks, and Texts (AIST 2017). The previous conferences during 2012–2016 attracted a significant number of students, researchers, academics, and engineers working on interdisciplinary data analysis of images, texts, and social networks. The broad scope of AIST made it an event where researchers from different domains, such as image and text processing, exploiting various data analysis techniques, can meet and exchange ideas. We strongly believe that this may lead to cross fertilisation of ideas between researchers relying on modern data analysis machinery. Therefore, AIST brought together all kinds of applications of data mining and machine learning techniques. The conference allowed specialists from different fields to meet each other, present their work, and discuss both theoretical and practical aspects of their data analysis problems. Another important aim of the conference was to stimulate scientists and people from industry to benefit from the knowledge exchange and identify possible grounds for fruitful collaboration. The conference was held during July 27–29, 2017. The conference was organised in Moscow, the capital of Russia, on the campus of Moscow Polytechnic University. This year, the key topics of AIST were grouped into six tracks:
1. General topics of data analysis chaired by Sergei Kuznetsov (Higher School of Economics, Russia) and Amedeo Napoli (LORIA, France)
2. Natural language processing chaired by Natalia Loukachevitch (Lomonosov Moscow State University, Russia) and Alexander Panchenko (University of Hamburg, Germany)
3. Social network analysis chaired by Stanley Wasserman (Indiana University, USA)
4. Analysis of images and video chaired by Victor Lempitsky (Skolkovo Institute of Science and Technology, Russia) and Andrey Savchenko (Higher School of Economics, Russia)
5. Optimisation problems on graphs and network structures chaired by Panos Pardalos (University of Florida, USA) and Michael Khachay (IMM UB RAS and Ural Federal University, Russia)
6. Analysis of dynamic behaviour through event data chaired by Wil van der Aalst (Eindhoven University of Technology, The Netherlands) and Irina Lomazova (Higher School of Economics, Russia)
One of the novelties this year was the introduction of a new specialised track on process mining (Track 6).