SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python
SciPy is an open-source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high-profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
© Published under licence by IOP Publishing Ltd. Modern science clearly demands a higher level of reproducibility and collaboration. To make research fully reproducible, one has to take care of several aspects: research protocol description, data access, environment preservation, workflow pipeline, and analysis script preservation. Version control systems like git help with the workflow and analysis script parts. Virtualization techniques like Docker or Vagrant can help deal with environments. Jupyter notebooks are a powerful platform for conducting research in a collaborative manner. We present the Everware project, which seamlessly integrates git repository management systems such as GitHub or GitLab, Docker, and Jupyter, helping with a) sharing the results of real research and b) boosting educational activities. With the help of Everware one can share not only the final artifacts of research but the full depth of the research process. This has been shown to be extremely helpful during the organization of several data analysis hackathons and machine learning schools. Using Everware, participants could start from an existing solution instead of starting from scratch and could begin contributing immediately. Everware allows its users to make use of their own computational resources to run the workflows they are interested in, which leads to higher scalability of the toolkit.
Data analysis in the fundamental sciences is nowadays an essential process that pushes the frontiers of our knowledge and leads to new discoveries. At the same time, the complexity of these analyses is increasing fast due to a) the enormous volumes of the datasets being analyzed, b) the variety of techniques and algorithms one has to check within a single analysis, and c) the distributed nature of research teams, which requires special communication media for knowledge and information exchange between individual researchers. There is a lot of resemblance between the techniques and problems arising in the areas of industrial information retrieval and particle physics. To address those problems we propose the Reproducible Experiment Platform (REP), a software infrastructure to support a collaborative ecosystem for computational science. It is a Python-based solution for research teams that allows running computational experiments on shared datasets, obtaining repeatable results, and making consistent comparisons of the obtained results. We present some key features of REP based on case studies, which include trigger optimization and physics analysis studies at the LHCb experiment.
In the contemporary economy, work is increasingly becoming freelance-based and is also moving online. Open source software (OSS) communities are rapidly becoming arenas in which individuals identify, co-create, and realize opportunities through shared resources and expertise. Operating in a communal setting, these individuals, whom we label open entrepreneurs, work and collaborate with members of their own open source community. We aim to investigate how open entrepreneurs are connected to other members of the communities in which they are involved and how their networks affect entrepreneurial processes. We are particularly interested in understanding how networked work benefits open entrepreneurs and how they work and collaborate with other community members. Our results suggest that, through different types of networked work, open entrepreneurs can fulfill their profit motive not only in the short term but also in the long term, as these networking activities facilitate the overall functioning of the community.
We consider a model for organizing cargo transportation between two node stations connected by a railway line that contains a certain number of intermediate stations. Cargo moves in one direction only. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw material for a manufacturing industry located in another region, where the other node station is situated. The organization of freight traffic is carried out by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, and the rule for distributing cargo to the final node station. The process of cargo transportation follows a set control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. The model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This makes it necessary to construct a "correct" extension of the solutions of the system of differential equations to a class of quasi-solutions whose distinctive feature is the presence of gaps at a countable number of points. Using the fourth-order Runge–Kutta method, we were able to construct these quasi-solutions numerically and determine their rate of growth. Note that the main technical difficulty lay in obtaining quasi-solutions that satisfy the nonlocal linear restrictions. Furthermore, we investigated how the quasi-solutions and, in particular, the sizes of the gaps (jumps) of the solutions depend on a number of model parameters characterizing the control rule, the technologies for cargo transportation, and the intensity with which cargo is supplied to a node station.
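As a minimal illustration of the fourth-order Runge–Kutta method mentioned above, the following sketch integrates a small linear system of differential equations. The abstract does not give the cargo-transportation equations or the nonlocal restrictions, so the right-hand side `toy_rhs`, the step count, and the helper names `rk4_step` and `rk4_solve` are hypothetical stand-ins chosen only to show the integration scheme, not the authors' model.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the system y' = f(t, y) by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    # Weighted average of the four slopes: (k1 + 2*k2 + 2*k3 + k4) / 6.
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def rk4_solve(f, t0, y0, t_end, n_steps):
    """Integrate from t0 to t_end with n_steps equal RK4 steps."""
    h = (t_end - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def toy_rhs(t, y):
    # Hypothetical two-component system (NOT the model from the paper):
    # y0' = -y0, y1' = y0 - y1, with exact solution
    # y0 = exp(-t), y1 = t * exp(-t) for y(0) = (1, 0).
    return [-y[0], y[0] - y[1]]


result = rk4_solve(toy_rhs, 0.0, [1.0, 0.0], 1.0, 100)
exact = math.exp(-1.0)
print(result, exact)
```

For this toy system both components should agree closely with exp(-1) at t = 1. Handling jumps at a countable set of points, as the quasi-solutions require, would need restarting the integration at each discontinuity, which this sketch does not attempt.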
Event logs collected by modern information and technical systems usually contain enough data for automated discovery of process models. A variety of algorithms have been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc.; nevertheless, quick analysis of ad hoc selected parts of a log has not yet received a full-fledged implementation. This paper describes a ROLAP-based method of multidimensional event log storage for process mining. The result of the log analysis is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by an ad hoc selection of criteria and a value of occurrence probability.
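The idea of a directed graph of event sequences ranked by occurrence probability can be sketched in memory as follows. This is not the ROLAP-based storage described above: the function name `directly_follows_graph`, the sample traces, and the choice to define an edge's probability as its share of the transitions leaving its source event are all assumptions made for illustration.

```python
from collections import Counter


def directly_follows_graph(traces, min_probability=0.0):
    """Build a directed graph of 'event A is directly followed by event B'.

    Each edge is weighted by the fraction of transitions leaving A that
    go to B (an assumed definition of occurrence probability). Edges
    below min_probability are dropped; the rest are returned sorted
    from most to least probable.
    """
    edge_counts = Counter()
    out_totals = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            edge_counts[(src, dst)] += 1
            out_totals[src] += 1

    graph = {}
    for (src, dst), count in edge_counts.items():
        probability = count / out_totals[src]
        if probability >= min_probability:
            graph[(src, dst)] = probability
    return dict(sorted(graph.items(), key=lambda item: -item[1]))


# Hypothetical sublog: each trace is the ordered list of events of one case.
sample_traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
    ["register", "approve"],
]

full_graph = directly_follows_graph(sample_traces)
filtered_graph = directly_follows_graph(sample_traces, min_probability=0.3)
for (src, dst), probability in filtered_graph.items():
    print(f"{src} -> {dst}: {probability:.2f}")
```

Raising `min_probability` prunes rare transitions, which plays the role of the occurrence-probability value the analyst selects. In a ROLAP setting the counting would presumably be done by aggregate queries over the stored log rather than by a Python loop.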
The geographic information system (GIS) is based on the first and only Russian Imperial Census of 1897 and the First All-Union Census of the Soviet Union of 1926. The GIS features vector data (shapefiles) of all provinces of the two states. For the 1897 census, there is information about linguistic, religious, and social estate groups. The part based on the 1926 census features nationality. Both shapefiles include information on gender and on the rural and urban population. The GIS allows users to produce any maps needed for individual studies of the period that require administrative boundaries and demographic information.
Existing approaches suggest that IT strategy should be a reflection of business strategy. In practice, however, organisations often do not follow their business strategy even if it is formally declared. Under these conditions, IT strategy can be viewed not as a plan but as an organisation's shared view of the role of information systems. This approach generally reflects only a top-down perspective of IT strategy. It can therefore be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes that is formed as a result of previous experience) to implement a bottom-up approach. Two components that can help to establish an effective reaction to new initiatives in IT are proposed here: a model of IT-related decision making, and an efficiency measurement metric to estimate the maturity of business processes and the corresponding IT. The use of the proposed tools is demonstrated in practical cases.