This study proposes minimizing Rényi and Tsallis entropies to find the optimal number of topics T in topic modeling (TM). A promising tool for obtaining knowledge about large text collections, TM is a method whose properties are underresearched; in particular, parameter optimization in such models has been hindered by the use of monotonic quality functions with no clear thresholds. In this research, topic models obtained from large text collections are viewed as nonequilibrium complex systems in which the number of topics is regarded as an equivalent of temperature. This allows calculating the free energy of such systems, a value through which both Rényi and Tsallis entropies are easily expressed. Numerical experiments with four TM algorithms and two text collections show that both entropies, as functions of the number of topics, yield clear minima in the middle of the range of T. On the marked-up dataset, the minima of three algorithms correspond to the value of T detected by humans. It is concluded that Tsallis and especially Rényi entropy can be used for T optimization instead of Shannon entropy, which keeps decreasing even when T becomes obviously excessive. Additionally, some algorithms are found to be better suited for revealing local entropy minima. Finally, we test whether the overall content of all topics taken together is resistant to changes in T and find that this dependence has a quasi-periodic structure, which demands further research.
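As a point of reference for the entropies named above, the following minimal sketch computes the textbook Shannon, Rényi and Tsallis entropies of a discrete probability distribution. It is only an illustration: it does not reproduce the free-energy formulation of the study, and the function names and the deformation parameter `q` are assumptions introduced here.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p, q):
    """Textbook Renyi entropy of order q; reduces to Shannon as q -> 1."""
    if q == 1:
        return shannon_entropy(p)
    return math.log(sum(x ** q for x in p if x > 0)) / (1 - q)


def tsallis_entropy(p, q):
    """Textbook Tsallis entropy of index q; reduces to Shannon as q -> 1."""
    if q == 1:
        return shannon_entropy(p)
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


if __name__ == "__main__":
    # Hypothetical word distribution inside a single topic.
    topic = [0.5, 0.25, 0.125, 0.125]
    for q in (0.5, 1, 2):
        print(q, renyi_entropy(topic, q), tsallis_entropy(topic, q))
```

For a uniform distribution over n outcomes the Rényi entropy equals ln n for every order, which makes a convenient sanity check.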
A growing number of studies show that financial market crashes can be detected and predicted. The main aim of this research was to develop a technique for crash prediction based on the analysis of durations between successive crashes of a given magnitude in the Dow Jones Industrial Average (DJIA). We find significant autocorrelation in the series of durations between successive crashes and suggest autoregressive conditional duration (ACD) models to forecast the crashes. We apply a rolling-interval technique to a sample of more than 400 DJIA crashes in 1896–2011, repeatedly using the data on 100 successive crashes to estimate a family of ACD models and to forecast the one following crash. The ACD models appear to provide significant predictive power when combined with the inter-event waiting time technique. However, despite the high quality of retrospective predictions, using the technique for real-time forecasting seems rather ineffective, because for any particular crash the ACD specification that would give the best prediction is hard to identify.
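To make the forecasting step concrete, here is a minimal sketch of a one-step-ahead forecast from a standard ACD(1,1) recursion, psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}, where x denotes observed durations and psi their conditional expectation. The parameter values are hypothetical and are passed in rather than estimated; the study itself estimates a whole family of ACD specifications on rolling windows, which this sketch does not attempt.

```python
def acd_forecast(durations, omega, alpha, beta):
    """One-step-ahead expected duration under an ACD(1,1) recursion.

    durations: observed gaps between successive crashes (e.g. in days).
    omega, alpha, beta: hypothetical, already-estimated parameters.
    """
    if not durations:
        raise ValueError("need at least one observed duration")
    # Assumption: start the recursion at the sample mean duration.
    psi = sum(durations) / len(durations)
    for x in durations:
        psi = omega + alpha * x + beta * psi
    return psi


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    gaps = [30.0, 45.0, 12.0, 60.0, 25.0]
    print(acd_forecast(gaps, omega=5.0, alpha=0.2, beta=0.6))
```

When alpha + beta < 1, the recursion has the stationary mean omega / (1 - alpha - beta), so a constant series at that value is a fixed point of the forecast.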
We discuss the possibility of deriving an H-theorem for a nonlinear discrete-time evolution equation that describes random wealth exchanges. In such kinetic models, economic agents exchange wealth in pairwise collisions just as particles in a gas exchange energy. It proves useful to reformulate the problem and represent the dynamics as a combination of two processes. The first is a linear transformation of a two-particle distribution function during the act of exchange, while the second corresponds to a new random pairing of agents and plays the role of a kind of feedback control. This representation leads to a Clausius-type inequality, which suggests a new interpretation of the exchange process as an irreversible relaxation due to contact with a reservoir of a special type. Only in special cases, when the equilibrium distribution is exactly a gamma distribution, does this inequality result in an H-theorem with a monotonically growing ‘entropy’ functional that differs from the Boltzmann entropy by an additional term. For an arbitrary exchange rule, however, the evolution has some features of relaxation to a non-equilibrium steady state, and it is still unclear whether any general H-theorem could exist.
We investigate critical properties of a spatial evolutionary game based on the Prisoner’s Dilemma. Simulations demonstrate a jump in the component densities accompanied by drastic changes in the average sizes of the component clusters. We argue that the cluster boundary is a random fractal. Our simulations are consistent with the fractal dimension of the boundary being equal to 2, and the cluster boundaries are hence asymptotically space filling as the system size increases.
Over the past fifteen years, network analysis has become a powerful tool for studying financial markets. In this work we analyze the stock markets of the USA and Sweden. We study cluster structures of a market network constructed from the correlation matrix of returns of the stocks traded in each of these markets. These cluster structures are obtained by means of the P-Median Problem (PMP), whose objective is to maximize the total correlation between a set of p stocks, called medians, and the other stocks. Every cluster structure is an undirected, disconnected, weighted graph in which every connected component (cluster) is a star, that is, a tree with one central node (the median) and several leaf nodes connected to the median by weighted edges. Our main observation is that in non-crisis periods cluster structures change more chaotically, while during crises they show more stable behavior and fewer changes. Thus, the increasing stability of a market graph cluster structure obtained via the PMP could be used as an indicator of a coming crisis.
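The star-shaped clusters described above can be illustrated with a small brute-force sketch: it tries every set of p medians, attaches each remaining stock to its most correlated median, and keeps the set with the largest total correlation. Exhaustive search is an assumption made here for clarity and is only feasible for tiny inputs; it is not the solution method used in the study, and the function name is hypothetical.

```python
from itertools import combinations


def solve_pmp(corr, p):
    """Brute-force p-median clustering on a correlation matrix.

    corr: square list of lists, corr[i][j] = correlation of stocks i and j.
    Returns (medians, clusters), where clusters maps each median to the
    list of leaf stocks attached to it (a star per cluster).
    """
    n = len(corr)
    best_score, best_medians = float("-inf"), None
    for medians in combinations(range(n), p):
        # Each non-median stock contributes its best correlation to a median.
        score = sum(
            max(corr[i][m] for m in medians)
            for i in range(n)
            if i not in medians
        )
        if score > best_score:
            best_score, best_medians = score, medians
    clusters = {m: [] for m in best_medians}
    for i in range(n):
        if i not in best_medians:
            nearest = max(best_medians, key=lambda m: corr[i][m])
            clusters[nearest].append(i)
    return best_medians, clusters


if __name__ == "__main__":
    # Toy correlation matrix with two obvious groups: {0, 1} and {2, 3}.
    corr = [
        [1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0],
    ]
    print(solve_pmp(corr, p=2))
```

Comparing the clusters returned for consecutive time windows would be one way to quantify how much the structure changes between periods.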
We discuss the efficiency of the quadratic bridge volatility estimator in comparison with the Parkinson, Garman-Klass and Rogers-Satchell estimators. It is shown in particular that point and interval estimates of volatility based on the bridge estimator are considerably more efficient than the analogous estimates based on the Parkinson, Garman-Klass and Rogers-Satchell estimators. © 2012 Elsevier B.V. All rights reserved.
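For orientation, the three benchmark estimators can be written down in a few lines. The sketch below gives their standard single-bar variance formulas in terms of open, high, low and close prices; the quadratic bridge estimator itself is not specified in the abstract and is therefore omitted. Function and argument names are assumptions made for this illustration.

```python
import math


def parkinson(high, low):
    """Parkinson variance estimate from the high-low range of one bar."""
    return math.log(high / low) ** 2 / (4 * math.log(2))


def garman_klass(open_, high, low, close):
    """Garman-Klass variance estimate from one OHLC bar."""
    range_term = 0.5 * math.log(high / low) ** 2
    drift_term = (2 * math.log(2) - 1) * math.log(close / open_) ** 2
    return range_term - drift_term


def rogers_satchell(open_, high, low, close):
    """Rogers-Satchell variance estimate from one OHLC bar."""
    return (
        math.log(high / close) * math.log(high / open_)
        + math.log(low / close) * math.log(low / open_)
    )


if __name__ == "__main__":
    # Illustrative bar, not data from the paper.
    o, h, l, c = 100.0, 104.0, 98.0, 101.0
    print(parkinson(h, l), garman_klass(o, h, l, c), rogers_satchell(o, h, l, c))
```

Each function returns a variance for a single bar; averaging over many bars and taking the square root gives a volatility estimate.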
We present a possible approach to the study of the renormalization group (RG) flow based entirely on information theory. The average information loss under a single step of the Wilsonian RG transformation is evaluated as the conditional entropy of the fast variables, which are integrated out, when the slow ones are held fixed. Its positivity results in the monotonic decrease of the informational entropy under renormalization. This, however, does not necessarily imply the irreversibility of the RG flow, because entropy is an extensive quantity and depends explicitly on the total number of degrees of freedom, which is reduced. Only some size-independent additive part of the entropy could possibly provide the required Lyapunov function. We also introduce the mutual information of fast and slow variables as probably a more adequate quantity to represent the changes in the system under renormalization, and evaluate it for some simple systems. It is shown that for certain real-space decimation transformations the positivity of the mutual information leads directly to the monotonic growth of the entropy per lattice site along the RG flow and hence to its irreversibility.
A general approach to measuring the statistical uncertainty of different filtration techniques for market network analysis is proposed. Two measures of statistical uncertainty are introduced and discussed. One is based on the conditional risk for multiple decision statistical procedures, and the other is based on the average fraction of errors. It is shown that for some important cases the second measure is a particular case of the first. The proposed approach is illustrated by a numerical evaluation of statistical uncertainty for popular network structures (minimum spanning tree, planar maximally filtered graph, market graph, maximum cliques and maximum independent sets) in the framework of a Gaussian network model of the stock market.
In this paper, the impact of lethal mutations on the evolutionary dynamics of asexual populations is analyzed. We suggest distinguishing different definitions of lethality, which lead to different mathematical formalizations of the microscopic model. Most studies focus on polyphasic lethality, meaning that individuals carrying lethal mutations have no offspring but consume common resources. In an alternative problem setting, monophasic lethal mutants die without producing offspring at the first stage of development. In the third case, semi-lethal mutations are considered, in which the mutants survive with some probability. We suggest and investigate mathematical models for these cases, deriving the evolutionary characteristics of the steady state. We find that the peak sequence probability depends drastically on the version of lethality. The results obtained here can be used to solve the error threshold paradox at the origin of life.
In this paper we address the problem of forecasting the target events of a time series given the distribution ξ of time gaps between target events. Strong earthquakes and stock market crashes are the two types of such events that we are focusing on. In the series of earthquakes, as McCann et al. show [W.R. McCann, S.P. Nishenko, L.R. Sykes, J. Krause, Seismic gaps and plate tectonics: seismic potential for major boundaries, Pure and Applied Geophysics 117 (1979) 1082–1147], there are well-defined gaps (called seismic gaps) between strong earthquakes. On the other hand, usually there are no regular gaps in the series of stock market crashes [M. Raberto, E. Scalas, F. Mainardi, Waiting-times and returns in high-frequency financial data: an empirical study, Physica A 314 (2002) 749–755]. For the case of seismic gaps, we analytically derive an upper bound of prediction efficiency given the coefficient of variation of the distribution ξ. For the case of stock market crashes, we develop an algorithm that predicts the next crash within a certain time interval after the previous one. We show that this algorithm outperforms random prediction. The efficiency of our algorithm sets up a lower bound of efficiency for effective prediction of stock market crashes.
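Two quantities mentioned above are easy to illustrate: the coefficient of variation of the gap distribution, and the share of events that would be caught by an alarm raised over a fixed time window after each previous event. The sketch below is a hedged illustration only; the window rule and the function names are assumptions made here and do not reproduce the algorithm or the analytical bound of the paper.

```python
import statistics


def coefficient_of_variation(gaps):
    """Standard deviation of the gaps divided by their mean."""
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def window_hit_rate(gaps, start, end):
    """Fraction of gaps g with start <= g <= end.

    Interpreted as the share of events that fall inside an alarm window
    opened `start` and closed `end` time units after the previous event.
    """
    hits = sum(1 for g in gaps if start <= g <= end)
    return hits / len(gaps)


if __name__ == "__main__":
    # Hypothetical gaps between target events, in arbitrary time units.
    regular = [10, 11, 9, 10, 10]
    irregular = [1, 30, 4, 2, 50]
    for gaps in (regular, irregular):
        print(coefficient_of_variation(gaps), window_hit_rate(gaps, 8, 12))
```

A low coefficient of variation corresponds to regular gaps, for which a narrow window after each event captures most of the following events.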
We define the BTW mechanism on a two-dimensional heterogeneous self-similar lattice. Our model exhibits a power-law distribution of avalanches with the exponent τ=2−2/ν, where ν is the similarity exponent of the lattice. The inequality τ<1, detected in this paper for the first time within a broad class of sand-piles, is ensured by random loading uniformly distributed over the lattice.
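The BTW (Bak-Tang-Wiesenfeld) toppling mechanism referred to above can be sketched on an ordinary square lattice with open boundaries. This is a deliberate simplification: the heterogeneous self-similar lattice of the paper is not reproduced, and the function name and the avalanche-size measure (number of topplings) are assumptions made for this illustration.

```python
import random


def add_grain(grid, row, col, threshold=4):
    """Drop one grain at (row, col) and relax the pile in place.

    A site with at least `threshold` grains topples, sending one grain to
    each of its four neighbours; grains pushed over the edge are lost.
    Returns the avalanche size, counted as the number of topplings.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= threshold
        topplings += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                grid[nr][nc] += 1
                if grid[nr][nc] >= threshold:
                    unstable.append((nr, nc))
    return topplings


if __name__ == "__main__":
    random.seed(0)
    size = 20
    grid = [[0] * size for _ in range(size)]
    # Random loading uniformly distributed over the lattice.
    sizes = [
        add_grain(grid, random.randrange(size), random.randrange(size))
        for _ in range(5000)
    ]
    print(max(sizes), sum(1 for s in sizes if s > 0))
```

Collecting the returned avalanche sizes over many random drops gives the empirical size distribution whose tail exponent is the quantity of interest.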
A society is a medium with a complex structure of one-to-one relations between people. These could be relations between friends, wife-husband relationships, relations between business partners, and so on. At a certain level of analysis, a society can be regarded as a gigantic maze made up of one-to-one relationships between people. From a physical standpoint it can be considered a highly porous medium. Such media are widely known for their outstanding properties and effects, such as self-organized criticality, percolation, and power-law distribution of network cluster sizes. In these media, supercritical events, referred to as dragon-kings, may occur in two cases: when increasing stress is applied to a system (the self-organized criticality scenario) or when increasing conductivity of a system is observed (the percolation scenario). In social applications the first scenario is typical of negative effects: crises, wars, revolutions, financial breakdowns, state collapses, etc. The second scenario is more typical of positive effects such as the emergence of cities, growth of firms, population blow-ups, economic miracles, technology diffusion, and social network formation. If both conditions (increasing stress and increasing conductivity) are observed together, then absolutely miraculous dragon-king effects can occur that involve most of human society. Historical examples of this effect are the emergence of the Mongol Empire, world religions, World War II, and the explosive proliferation of global internet services. This article describes these two scenarios in detail, beginning with an overview of historical dragon-king events and phenomena from early human history to recent decades and concluding with an analysis of their possible near-future consequences for our global society. Thus we demonstrate that in social systems a dragon-king is not a random outlier unexplainable by power-law statistics, but a natural effect. It is a very large cluster in a porous percolation medium. It occurs as a result of changes in external conditions, such as supercritical load, an increase in the sensitivity of system elements, or growth in system connectivity.