MLDev: Data Science Experiment Automation and Reproducibility Software
Citizen science, its features, and its prospects are a highly topical subject today. It seems natural that citizen science and crowdsourcing techniques are spreading into an area as popular as data science. This paper considers questions about teaching data science and about the areas that borrow techniques from data science. A review of the learning outcomes that may be gained from citizen science projects allows us to propose adopting educational data expeditions into educational courses. Moreover, the paper presents the principles of citizen science as a means of making a fully open educational project and of validating it as a learning tool.
In this paper, we present our current research on the information interaction strategies of students taking a minor specialization in Data Science. We employed an online platform, consisting of third-party software and our own software, to provide students with learning tools and to analyse their learning activity. We developed several indicators to estimate their activity: coding activity, friends network size, and Q&A activity. We show that high-achieving and low-achieving students use resources in different ways, with substantial inequality in resource access and use. Based on this research, we propose two features that are expected to encourage students to participate in Q&A activity, thereby decreasing inequality in the use of these resources.
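The abstract does not say how inequality in resource use was measured. As a minimal sketch, assuming a per-student activity count (for example, Q&A posts) and using the Gini coefficient as one common inequality measure, such an indicator could be computed as follows. The function name and the sample counts are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 means perfectly equal use."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical Q&A post counts per student (illustrative data only).
equal_use = [1, 1, 1, 1]
concentrated_use = [0, 0, 0, 10]

print(gini(equal_use), gini(concentrated_use))
```

A value near zero indicates that students use the resource about equally, while larger values indicate that activity is concentrated among a few students.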
Following the great success of DSAA’2014 in Shanghai, the 2015 IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA’2015), to be held on 19-21 October 2015 in Paris, has seen significant growth in the number of submissions, participants, sponsors and key stakeholders. DSAA has been recognized as the first and most influential event in the community focused on data science and analytics. Data-driven scientific discovery, innovation, practical development, applications and economy have been increasingly recognized as the major trend in the future of IT and business. Data science, big data and advanced analytics play the most important role in driving data innovation and the data economy. DSAA thus plays a critical role in substantially promoting and strengthening these trends and results.
We consider a model for organizing cargo transportation between two node stations connected by a railway line that contains a certain number of intermediate stations. Cargo moves in one direction only. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw material for a manufacturing industry located in another region, which is served by the other node station. Freight traffic is organized by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, and the rule for distributing cargo at the final node station. The process of cargo transportation is governed by a given control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. The model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This makes it necessary to extend the solutions of the system of differential equations "correctly" to a class of quasi-solutions whose distinctive feature is gaps (jumps) at a countable number of points. Using the fourth-order Runge–Kutta method, we were able to construct these quasi-solutions numerically and determine their rate of growth. Technically, the main difficulty was obtaining quasi-solutions that satisfy the nonlocal linear restrictions. Furthermore, we investigated how the quasi-solutions, and in particular the sizes of their gaps (jumps), depend on a number of model parameters characterizing the control rule, the cargo transportation technologies, and the intensity of cargo supply at a node station.
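For readers unfamiliar with the numerical method named above, the following is a minimal sketch of the classical fourth-order Runge–Kutta scheme for a system y' = f(t, y). It does not reproduce the paper's model: the right-hand side `toy_rhs` is a hypothetical two-component linear chain chosen only because its exact solution is known, and the nonlocal restrictions and jumps of the quasi-solutions are not modelled here.

```python
import math
from typing import Callable, List, Sequence

Rhs = Callable[[float, Sequence[float]], List[float]]


def rk4_step(f: Rhs, t: float, y: Sequence[float], h: float) -> List[float]:
    """Advance y' = f(t, y) by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f: Rhs, t0: float, y0: Sequence[float],
              t_end: float, steps: int) -> List[float]:
    """Integrate from t0 to t_end with a fixed number of RK4 steps."""
    h = (t_end - t0) / steps
    y = list(y0)
    for i in range(steps):
        y = rk4_step(f, t0 + i * h, y, h)
    return y


def toy_rhs(t: float, y: Sequence[float]) -> List[float]:
    # Hypothetical toy system (NOT the paper's model):
    # component 0 drains at unit rate and feeds component 1.
    return [-y[0], y[0] - y[1]]


# Exact solution for y(0) = (1, 0): y0 = exp(-t), y1 = t * exp(-t).
result = integrate(toy_rhs, 0.0, [1.0, 0.0], 1.0, 100)
print(result, math.exp(-1.0))
```

In a setting with jumps at known points, one plausible design is to integrate piecewise between consecutive jump points and apply the jump condition to the state before continuing; whether the authors proceeded this way is not stated in the abstract.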
Event logs collected by modern information and technical systems usually contain enough data for automated process model discovery. A variety of algorithms have been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc.; nevertheless, quick analysis of ad-hoc selected parts of a log has still not received a full-fledged implementation. This paper describes a ROLAP-based method of multidimensional event log storage for process mining. The result of the log analysis is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by an ad-hoc selection of criteria and a value of occurrence probability.
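The graph described above can be illustrated with a small sketch that builds a directly-follows graph from an event log and ranks its edges. This is an illustration under stated assumptions, not the authors' ROLAP-based implementation: the log is assumed to be a list of (case id, activity, timestamp) tuples, the probability of an edge is assumed to be its share of all transitions leaving the source activity, and the function name and the `min_prob` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]      # (case id, activity, timestamp)
Edge = Tuple[str, str, float]     # (source, target, probability)


def directly_follows(events: Iterable[Event], min_prob: float = 0.0) -> List[Edge]:
    """Return directly-follows edges ranked by descending probability."""
    # Group activities into per-case traces, ordered by timestamp.
    traces: Dict[str, List[str]] = defaultdict(list)
    for case_id, activity, _ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    # Count each pair of consecutive activities and each source's outgoing total.
    edge_counts: Counter = Counter()
    out_totals: Counter = Counter()
    for trace in traces.values():
        for src, dst in zip(trace, trace[1:]):
            edge_counts[(src, dst)] += 1
            out_totals[src] += 1

    edges = [
        (src, dst, count / out_totals[src])
        for (src, dst), count in edge_counts.items()
    ]
    # Keep edges at or above the threshold, most probable first.
    kept = [e for e in edges if e[2] >= min_prob]
    return sorted(kept, key=lambda e: (-e[2], e[0], e[1]))


# Illustrative sublog with three cases (made-up data).
LOG: List[Event] = [
    ("c1", "a", 1), ("c1", "b", 2), ("c1", "c", 3),
    ("c2", "a", 1), ("c2", "b", 2), ("c2", "d", 3),
    ("c3", "a", 1), ("c3", "c", 2),
]

for src, dst, prob in directly_follows(LOG, min_prob=0.5):
    print(f"{src} -> {dst}: {prob:.2f}")
```

Selecting a sublog by ad-hoc criteria then amounts to filtering the event list (or, in a relational store, the underlying query) before calling the function.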
Existing approaches suggest that IT strategy should be a reflection of business strategy. In practice, however, organisations often do not follow their business strategy even if it is formally declared. Under these conditions, IT strategy can be viewed not as a plan, but as a shared organisational view of the role of information systems. This approach generally reflects only a top-down perspective on IT strategy. It can therefore be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes, formed as a result of previous experience) to implement a bottom-up approach. Two components that can help to establish an effective reaction to new IT initiatives are proposed here: a model of IT-related decision making, and an efficiency measurement metric to estimate the maturity of business processes and the corresponding IT. The use of the proposed tools is demonstrated in practical cases.