Application of Rényi and Tsallis entropies to topic modeling optimization
This study proposes to minimize Rényi and Tsallis entropies to find the optimal number of topics T in topic modeling (TM). Although TM is a promising tool for extracting knowledge from large text collections, its properties remain under-researched. In particular, parameter optimization in such models has been hindered by the use of monotonic quality functions with no clear thresholds. In this research, topic models obtained from large text collections are viewed as nonequilibrium complex systems in which the number of topics plays the role of temperature. This makes it possible to calculate the free energy of such systems, a quantity through which both Rényi and Tsallis entropies are easily expressed. Numerical experiments with four TM algorithms and two text collections show that both entropies, as functions of the number of topics, have clear minima in the middle of the range of T. On the labeled dataset, the minima for three of the algorithms correspond to the value of T identified by human annotators. It is concluded that Tsallis and especially Rényi entropy can be used to optimize T instead of Shannon entropy, which keeps decreasing even when T becomes obviously excessive. Additionally, some algorithms are found to be better suited for revealing local entropy minima. Finally, we test whether the overall content of all topics taken together is stable under changes of T and find that this dependence has a quasi-periodic structure that requires further research.
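For reference, the standard definitions of the entropies compared above can be sketched for a discrete probability vector p with deformation parameter q. This is a minimal sketch only. The abstract expresses the entropies through the free energy of a topic model, with the number of topics acting as temperature, and that construction is not reproduced here. The helper `argmin_topics` is hypothetical: it simply picks the T with the smallest entropy from a mapping the user supplies.

```python
import math
from typing import Mapping, Sequence


def shannon(p: Sequence[float]) -> float:
    """Shannon entropy -sum p_i ln p_i (zero-probability terms are skipped)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi(p: Sequence[float], q: float) -> float:
    """Rényi entropy H_q = ln(sum p_i^q) / (1 - q), for q > 0 and q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return math.log(sum(x ** q for x in p if x > 0)) / (1 - q)


def tsallis(p: Sequence[float], q: float) -> float:
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1), for q != 1."""
    if q == 1:
        raise ValueError("q must differ from 1")
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


def argmin_topics(entropy_by_T: Mapping[int, float]) -> int:
    """Hypothetical helper: return the number of topics T with the smallest entropy."""
    return min(entropy_by_T, key=entropy_by_T.__getitem__)


if __name__ == "__main__":
    p = [0.25, 0.25, 0.25, 0.25]
    print(renyi(p, 2.0), math.log(4))  # equal for a uniform distribution
    print(tsallis(p, 2.0))             # (1 - 4 * 0.0625) / 1 = 0.75
```

Both entropies approach the Shannon entropy as q → 1. For a uniform distribution over n outcomes, the Rényi entropy equals ln n for every admissible q.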
The book generalizes the main concepts and models of the modern theory of self-organization of complex systems, also called synergetics, and formulates them as principles of a synergetic world view. These principles are discussed in the context of philosophical studies of holism, teleology and evolutionism, as well as of gestalt psychology, and are compared with some images from the history of human culture. The book sets out original research results, largely unfamiliar to Western readers, of the Moscow synergetic school, which is centered at the Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences. Complicated and paradoxical concepts of synergetics (structure-attractors, bifurcations, blow-up regimes, non-stationary dissipative structures of self-organization, fractals, nonlinearity) are translated into intelligible language and vividly illustrated with material and examples from various fields of knowledge, ranging from laser thermonuclear fusion to mysterious phenomena of human psychology and creativity. The style is close to that of popular-science literature, so the book should be accessible to, and of interest for, students and specialists in the humanities. It is shown that the development of synergetics entails deep changes in the conceptual net through which we comprehend the world. It means a radical paradigm shift: a conceptual transition from being to becoming, from stability to sustainability, from images of order to chaos that generates new ordered, evolving structures, from self-maintaining systems to fast evolution through nonlinear positive feedback, and from evolution to co-evolution, that is, the reciprocal evolution of different complex systems. The new synergetic way of thinking is evolutionary, nonlinear and holistic. It represents a modern stage of development within the traditions of cybernetics and system-structural analysis, although many elements of the latter have undergone important changes since their appearance.
The article reviews some methodological foundations for developing modern strategies of ecological thinking based on theoretical biology and on the theory of complex adaptive systems. Ecology, the science of the interaction of living organisms and their communities with the environment, goes far beyond its original boundaries within biology and becomes a nodal discipline from which the vectors of a promising interdisciplinary synthesis of knowledge diverge. The ecological approach proves fruitful in social and humanitarian research. Ecology of action, ecology of mind, ecology of life, of cognition and of creativity, ecology of thoughts and words, ecology of ideas, ecology of communication and ecology of management: all these conceptual attitudes testify to a bold integration of ecological thinking into wide spheres of humanitarian and social knowledge, where it opens opportunities for fresh approaches. The concept of Umwelt coined by Jakob von Uexküll in 1909 and his study of Umwelt (Umweltlehre) are of great significance for the development of modern ecological universalism and for the elaboration of strategic imperatives of ecological thinking. The concept of Umwelt, a specific environment to which any biological species or individual organism is adapted and which it itself constructs, allows us to build a genuine interdisciplinary platform for developing the theory of ecology and for taking a reasoned position in discussions about sustainable development and sustainable futures, as well as about the role of education for the sustainable development of the world.
The book collects some texts that I wrote together with Sergei P. Kurdyumov (1928–2004), corresponding member of the Russian Academy of Sciences, or under his direct intellectual influence. These texts have been revised, systematized, brought together and supplemented with new material. Sergei P. Kurdyumov possessed a deep metaphysical flair and put forward ideas whose meaning is not fully clear even now. These are, first of all, the idea of co-evolution and the notion of complex structures developing at different tempos as co-existing tempo-worlds. Owing to developments in nonlinear dynamics and synergetics, the classical problem of time and the problems of evolutionary holism reveal new and non-traditional aspects. The book brings to light new notions of the nonlinearity of the course of time in processes of evolution and co-evolution, and of nonlinear links between the different modes of time: the past, the present and the future. Four interconnected aspects of the course of processes in open nonlinear dissipative systems are analyzed: evolutionarity, temporality, emergence and holism. A whole series of paradoxical notions appear in synergetics, such as “the influence of the future upon the present”, “the possibility of touching a remote future in praesenti”, and the irreversibility, with elements of reversibility, of the course of time; a non-traditional, nonlinear notion of time lies at the heart of all of them. It is shown that the best pictorial representation of nonlinear time is apparently the tree of evolution, or the tree of time, which is one of the archetypes of the human psyche. This image is widely used in the myths and religious doctrines of the world's peoples and also underlies scientific constructs (the tree of the evolution of languages from a common parent language, the tree of the evolution of biological species); it is often drawn by children, appears in people's dreams, and so on. The synergetic methodology being developed is applied to the study of cognitive systems. The emergent structures of evolution and self-organization of individual consciousness, their spatiotemporal peculiarities, and the complexity of the human Self are considered in detail. The radical changes in the understanding of time brought about by synergetics are compared with images of time and with notions of the connection between the past, the present and the future in the history of philosophy and culture. The methodological conclusions obtained are of great importance for reforming education systems, for forecasting (building scenarios of future development), for effective management in the modern globalizing world, for developing methods that stimulate creative thinking, and for personal growth and the individual's adequate integration into the social environment.
The article studies the global evolution of the world community as an integrated self-organizing and self-developing system and identifies some of the main features and laws of its evolution. Attention is focused on the cyclic character of evolution, the periods of the global history of mankind, the growth of complexity, and the birth of technological, social and cultural innovations as a result of passing through crises. This interdisciplinary research relies on the results of mathematical modeling. The models developed by Ivan M. Diakonov, Sergey P. Kapitsa, M. Kremer and Sergey P. Kurdyumov, as well as the model of the macroevolution of the World-System developed by L.E. Grinin, A.V. Korotaev and S.Yu. Malkov, are analyzed. The evolution of the world community is considered as a stage of the universal (global) evolutionary process running from the Big Bang through the emergence of life to the appearance of human beings. The development of the World-System in a blow-up regime leads to extreme stratification of society, to growing instability and uneven development of countries and territories, to the disintegration of complex geopolitical structures, and to the threat of the collapse of civilization. As a positive alternative, the authors discuss possibilities for managing the future and choosing a favorable path of development for mankind. These possibilities rest on an understanding of the principles of co-evolution of nature and mankind, as well as of complex social and geopolitical structures.
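In its simplest mathematical form, a blow-up regime is growth that reaches infinity in finite time. The sketch below uses the quadratic law dN/dt = N²/C, whose exact solution N(t) = N₀ / (1 − N₀t/C) diverges at t* = C/N₀. It checks that solution against forward-Euler integration. This is a textbook illustration with hypothetical parameters `n0` and `c`, not a reproduction of any of the models analyzed in the article.

```python
def blow_up_time(n0: float, c: float) -> float:
    """Finite time t* = c / n0 at which the solution of dN/dt = N**2 / c diverges."""
    return c / n0


def hyperbolic_growth(n0: float, c: float, t: float) -> float:
    """Exact solution N(t) = n0 / (1 - t / t*) of dN/dt = N**2 / c with N(0) = n0."""
    t_star = blow_up_time(n0, c)
    if t >= t_star:
        raise ValueError("t must be earlier than the blow-up time c / n0")
    return n0 / (1.0 - t / t_star)


def euler(n0: float, c: float, t_end: float, dt: float) -> float:
    """Forward-Euler integration of dN/dt = N**2 / c from t = 0 to t_end."""
    n, t = n0, 0.0
    while t < t_end:
        h = min(dt, t_end - t)  # shorten the last step to land exactly on t_end
        n += h * n * n / c
        t += h
    return n


if __name__ == "__main__":
    n0, c = 1.0, 10.0  # hypothetical values; blow-up at t* = 10
    for t in (0.0, 5.0, 9.0, 9.9):
        print(t, hyperbolic_growth(n0, c, t))
    print(euler(n0, c, 5.0, 1e-4))  # close to the exact value 2.0
```

The printed values grow ever faster as t approaches t*. This faster-than-exponential rise is what distinguishes a blow-up regime from ordinary exponential growth.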
This proceedings publication is a compilation of selected contributions from the “Third International Conference on the Dynamics of Information Systems” which took place at the University of Florida, Gainesville, February 16–18, 2011. The purpose of this conference was to bring together scientists and engineers from industry, government, and academia in order to exchange new discoveries and results in a broad range of topics relevant to the theory and practice of dynamics of information systems. Dynamics of Information Systems: Mathematical Foundation presents state-of-the-art research and is intended for graduate students and researchers interested in some of the most recent discoveries in information theory and dynamical systems. Scientists in other disciplines may also benefit from the applications of new developments to their own area of study.
Atomism and holism are considered as two approaches, opposite in their implications but complementary, in the modern theory of complex self-organizing systems (the theory of complexity). Atomism is connected with the nesting of complex structures in the world and their fractal organization, in which one can reach elementary, further indivisible structural fragments on whose basis complex, scale-invariant, self-similar spatio-temporal structures grow. Atomism is also connected with the study of the hierarchical organization of being and of the elements, parts and subsystems out of which a whole structure is built. At the same time, the article shows that the whole theory of complexity is permeated by holism, and that this holism is evolutionary in character. Holism in the evolution of complex self-organizing systems is coupled with the appearance of emergent properties of integral structural forms, as well as with discreteness, that is, a definite set of structure-attractors of evolution. Modern atomism can be correlated with the notion of frames of perception in cognitive science. Proceeding from the systemic and evolutionary worldview, some arguments are put forward in favor of the hypothesis that alphabetic writing originated in close connection with the study of atoms in physical nature (Nidem, A.I. Kosyrev, V.G. Lysenko).
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
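The interval constraint can be sketched as a modification of collapsed Gibbs sampling for LDA. Tokens of each seed keyword may only be assigned topics from a fixed interval, which is enforced by zeroing the sampling probabilities of all other topics. This is a minimal sketch under our own assumptions (the Gibbs-sampling formulation, inclusive `(lo, hi)` intervals, the hypothetical function name `interval_lda_gibbs` and its default hyperparameters). It is not the authors' implementation.

```python
import numpy as np


def interval_lda_gibbs(docs, vocab_size, num_topics, intervals,
                       alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA in which selected word ids may only be
    assigned topics from an inclusive interval (lo, hi).

    docs      -- list of documents, each a list of integer word ids
    intervals -- dict mapping a seed word id to its allowed (lo, hi) topic range
    Returns topic assignments z, topic-word matrix phi (V x T, columns sum to 1)
    and document-topic matrix theta (D x T, rows sum to 1).
    """
    rng = np.random.default_rng(seed)

    # Allowed-topic mask: unconstrained words may take any topic.
    mask = np.ones((vocab_size, num_topics), dtype=bool)
    for w, (lo, hi) in intervals.items():
        mask[w] = False
        mask[w, lo:hi + 1] = True

    n_dt = np.zeros((len(docs), num_topics))   # document-topic counts
    n_wt = np.zeros((vocab_size, num_topics))  # word-topic counts
    n_t = np.zeros(num_topics)                 # tokens per topic

    # Random initialization restricted to each word's allowed topics.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.choice(np.flatnonzero(mask[w]))
            z_d.append(t)
            n_dt[d, t] += 1
            n_wt[w, t] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from the counts.
                n_dt[d, t] -= 1
                n_wt[w, t] -= 1
                n_t[t] -= 1
                # Standard LDA full conditional, then zero out disallowed topics.
                p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + vocab_size * beta)
                p = p * mask[w]
                t = rng.choice(num_topics, p=p / p.sum())
                z[d][i] = t
                n_dt[d, t] += 1
                n_wt[w, t] += 1
                n_t[t] += 1

    phi = (n_wt + beta) / (n_t + vocab_size * beta)
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + num_topics * alpha)
    return z, phi, theta


if __name__ == "__main__":
    # Toy corpus of word ids; word 0 is a seed word restricted to topics 0..1.
    docs = [[0, 1, 2, 3], [0, 2, 4, 5], [1, 3, 4, 5, 0]]
    z, phi, theta = interval_lda_gibbs(docs, vocab_size=6, num_topics=4,
                                       intervals={0: (0, 1)}, iters=50)
    print(z)
```

One natural way to use this, assumed here rather than taken from the abstract, is to give each topic of interest its own block of topic indices. The remaining topics then stay unconstrained and can capture further structure among the found texts.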
A formula for an unbiased estimate of the coefficient of determination of a linear regression model is obtained. It is computed from a sample drawn from a multivariate normal distribution. This estimate is proposed as an alternative criterion for selecting regression factors.
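The abstract does not state the derived formula. As a point of comparison, here is a sketch of the classical Olkin–Pratt estimator, which is unbiased for the population squared multiple correlation when the sample comes from a multivariate normal distribution. It has the form ρ̂² = 1 − (n − 3)/(n − p − 1) · (1 − R²) · ₂F₁(1, 1; (n − p + 1)/2; 1 − R²), where n is the sample size and p the number of regressors excluding the intercept. We do not claim it coincides with the estimate obtained in the abstract. The helper names and the plain power-series evaluation of ₂F₁ are our own assumptions.

```python
import numpy as np


def hyp2f1_11(c: float, z: float, tol: float = 1e-14, max_terms: int = 100_000) -> float:
    """Gauss series 2F1(1, 1; c; z) = sum_k k! / (c)_k * z**k for 0 <= z <= 1 (needs c > 2 at z = 1)."""
    total, term = 1.0, 1.0
    for k in range(max_terms):
        term *= (k + 1) / (c + k) * z  # ratio of consecutive series terms
        total += term
        if term <= tol * total:
            break
    return total


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Sample R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Conventional adjusted R^2 (not unbiased in general)."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def olkin_pratt_r2(r2: float, n: int, p: int) -> float:
    """Olkin-Pratt estimate of the population squared multiple correlation."""
    z = 1.0 - r2
    return 1.0 - (n - 3) / (n - p - 1) * z * hyp2f1_11((n - p + 1) / 2, z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 3  # hypothetical sample size and number of regressors
    X = rng.standard_normal((n, p))
    y = X @ np.array([0.5, 0.0, 0.0]) + rng.standard_normal(n)
    r2 = r_squared(X, y)
    print(r2, adjusted_r2(r2, n, p), olkin_pratt_r2(r2, n, p))
```

Truncating the series after its first term gives 1 − (n − 3)/(n − p − 1) · (1 − R²). This is close to, but not the same as, the conventional adjusted R², which uses n − 1 in place of n − 3.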