Universal Algorithm for Trading in Stock Market Based on the Method of Calibration
We present a universal method for algorithmic trading in the stock market which performs asymptotically at least as well as any stationary trading strategy that computes the investment at each step using a continuous function of the side information. In the process of the game, a trader makes decisions using predictions computed by a randomized well-calibrated algorithm. We use Dawid's notion of calibration with more general checking rules and a modification of Kakade and Foster's randomized rounding algorithm for computing well-calibrated forecasts. The method of randomized calibration is combined with Vovk's method of defensive forecasting in RKHS. Unlike in statistical theory, no stochastic assumptions are made about the stock prices.
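As a loose illustration (not the paper's actual algorithm), the idea of driving trading decisions with calibrated forecasts can be sketched with a toy binning forecaster; the market model, bin count, and staking rule below are all invented for the example:

```python
# Toy stand-in, NOT the paper's algorithm: a binning "forecaster" whose
# per-bin empirical averages become calibrated, driving a simple staking rule.
import numpy as np

rng = np.random.default_rng(4)
K = 10                      # number of forecast bins (arbitrary choice)
sums = np.zeros(K)          # sum of observed outcomes per bin
counts = np.zeros(K)        # number of observations per bin

def forecast(signal):
    """Predicted probability that the price moves up, given side information."""
    b = min(int(signal * K), K - 1)
    return sums[b] / counts[b] if counts[b] > 0 else 0.5

capital = 1.0
for t in range(5000):
    signal = rng.random()               # side information in [0, 1]
    p = forecast(signal)
    up = rng.random() < signal          # toy market: P(up) = signal
    stake = 0.01 * capital              # fixed-fraction staking, invented
    capital += stake if (p > 0.5) == up else -stake
    b = min(int(signal * K), K - 1)     # update the forecaster's bin
    sums[b] += up
    counts[b] += 1
```

Over time each bin's empirical average approaches the true conditional probability, which is the calibration property the trading rule relies on.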
Although user-generated data are widely used in medical informatics in general, and for revealing side effects of various pharmaceuticals in particular, recent studies have focused mainly on methods for extracting information on side effects from unstructured or semi-structured reviews of specific medications, without linking the side effects to any outcomes.
In this study we demonstrate how user-generated online content on side effects experienced by patients while taking a pharmaceutical product can be used for research after the drug has been introduced to the market, thus making it possible to complement the results of official clinical studies and market research. In particular, we concentrate on revealing the contribution of various side effects to reported customer satisfaction with Tamiflu, a popular antiviral drug.
Publicly available data from an online platform with patient reviews are used as input to an analysis that applies statistical and machine learning methods (multivariate logit models and classification trees) to investigate the relationships of side effects to demographic characteristics and to overall satisfaction with the medication.
We prioritized the side effects of Tamiflu based on the significance of their association with patients' ratings published on a well-known drug discussion forum. Among all types of side effects used in our study, neuropsychiatric symptoms and body pain are the most influential, followed by skin problems. We also detected specific combinations of side effects associated with low satisfaction.
The proposed analytical approach can help pharmaceutical companies improve their products and/or the medical guidelines associated with them, and determine which adverse effects should be given priority from the customer satisfaction perspective.
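As a rough sketch of the kind of analysis described above (with entirely synthetic data and hypothetical side-effect groupings, not the study's dataset), a multivariate logit model and a classification tree can be fit to binary side-effect indicators:

```python
# Illustrative only: synthetic data, hypothetical side-effect names.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# binary indicators for three side-effect groups (hypothetical)
X = rng.integers(0, 2, size=(n, 3))  # columns: neuro, pain, skin
# synthetic "dissatisfied" label, more likely with neuro/pain symptoms
logits = -0.5 + 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.4 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

logit = LogisticRegression().fit(X, y)                # multivariate logit model
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)  # classification tree
coef = dict(zip(["neuro", "pain", "skin"], logit.coef_[0]))
```

The logit coefficients rank side-effect groups by their association with dissatisfaction, while the shallow tree surfaces combinations of side effects linked to low ratings.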
Non-B DNA structures have a great potential to form and influence various genomic processes, including transcription. One of the mechanisms of transcription regulation is nucleosome positioning. Even though only B-DNA can be wrapped around a nucleosome, non-B DNA structures can compete with a nucleosome for a genomic location. Here we used permanganate/S1 nuclease footprinting data on non-B DNA structures, such as Z-DNA, H-DNA, G-quadruplexes and stress-induced duplex destabilization (SIDD) sites, together with MNase-seq data on nucleosome positioning in the mouse genome. We found three types of patterns of nucleosome positioning around non-B DNA structures: a structure is flanked by nucleosomes on both sides, flanked on one side only, or lies within a nucleosome-free region. Machine learning models based on the random forest and XGBoost algorithms were constructed to recognize DNA regions of 1 kb length containing a particular pattern of nucleosome positioning for four types of DNA structures (Z-DNA, H-DNA, G-quadruplexes and SIDD sites), based on statistics of di- and tri-nucleotides. The best performance (94% accuracy) was reached for G-quadruplexes, while for the other types of structures the accuracy was under 70%. We conclude that 1 kb regions containing G-quadruplexes have distinct compositional properties; this fact points to preferential locations of such patterns in the genome and requires further investigation. For the other DNA structures, region composition is not a sufficient predictive factor, and one should take other physical and structural DNA properties into account to improve the recognition of nucleosome/DNA-structure patterns.
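A minimal sketch of the compositional approach (synthetic sequences standing in for the real footprinting data; GC content here is only a crude proxy for structure-prone composition):

```python
# Illustrative sketch: synthetic 1 kb sequences classified by di-/tri-nucleotide
# frequencies; GC-rich sequences stand in for G-quadruplex-containing regions.
from itertools import product
import numpy as np
from sklearn.ensemble import RandomForestClassifier

KMERS = {k: ["".join(p) for p in product("ACGT", repeat=k)] for k in (2, 3)}

def kmer_freqs(seq, k):
    """Frequencies of all 4**k k-mers in seq (ACGT alphabet assumed)."""
    counts = {km: 0 for km in KMERS[k]}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)
    return [counts[km] / total for km in KMERS[k]]

def features(seq):
    return kmer_freqs(seq, 2) + kmer_freqs(seq, 3)  # 16 + 64 = 80 features

rng = np.random.default_rng(1)

def random_seq(n, gc):
    p = [(1 - gc) / 2, (1 - gc) / 2, gc / 2, gc / 2]  # A, T, G, C
    return "".join(rng.choice(list("ATGC"), size=n, p=p))

# positives: GC-rich 1 kb windows (toy proxy for structure-containing regions)
X = [features(random_seq(1000, gc)) for gc in [0.6] * 50 + [0.4] * 50]
y = [1] * 50 + [0] * 50
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```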
With the advances in sequencing technology, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have collected data on more than 16,000 genome-wide tumor–normal tissue pairs, providing a valuable resource to study cancer mutations. In this research we focus on a preliminary evaluation of the relationship between cancer breakpoint hotspots and DNA regions potentially forming secondary structures such as stem-loops (cruciforms) and quadruplexes. We performed an analysis of 2,234 samples covering 10 cancer types and built machine learning models predicting the cancer breakpoint distribution over a chromosome based on the density distributions of stem-loops and quadruplexes. We developed a procedure for building and evaluating machine learning models, since the considered data are extremely imbalanced and a reliable estimate of predictive power is needed. We conducted a set of experiments to select the most appropriate resampling scheme, class-balancing technique and machine learning algorithm parameters. The best final models were applied to the cancer breakpoint data. From this analysis we conclude that a relationship between cancer breakpoint hotspots and the studied DNA secondary structures exists; however, it is generally weak for stem-loops and stronger for quadruplexes. We also found differences in model predictive power depending on cancer type. Thus, the stem-loop-based model performs better for pancreatic, prostate, ovarian, uterine, brain and liver cancers, while the quadruplex-based model works better for blood, bone, skin and breast cancers.
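The class-balancing step can be illustrated with simple random oversampling on synthetic data (the features and balancing scheme below are illustrative only; in a real evaluation, resampling must happen inside each cross-validation fold to avoid leakage):

```python
# Illustrative only: random features, ~5% positive "hotspot" class.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))              # e.g. structure-density features
y = (rng.random(1000) < 0.05).astype(int)   # rare positive class

def oversample(X, y, rng):
    """Random oversampling of the minority (positive) class to a 1:1 ratio."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    idx = np.concatenate([neg, rng.choice(pos, size=len(neg), replace=True)])
    return X[idx], y[idx]

# NOTE: oversample inside each CV fold only, never before splitting,
# otherwise the performance estimate is optimistically biased.
Xb, yb = oversample(X, y, rng)
clf = GradientBoostingClassifier(n_estimators=50).fit(Xb, yb)
```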
There are many different methods for computing relevant patterns in sequential data and interpreting the results. In this paper, we compute emerging patterns (EP) in demographic sequences using sequence-based pattern structures, along with different algorithmic solutions. The purpose of this method is to meet the following domain requirement: the obtained patterns must be (closed) frequent contiguous prefixes of the input sequences. This is required in order for demographers to fully understand and interpret the results.
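A toy sketch of the prefix-pattern requirement (the demographic events are invented): frequent contiguous prefixes are counted, and a prefix is kept as closed when no one-step extension has the same support:

```python
# Toy example: event sequences and min-support threshold are invented.
from collections import Counter

def frequent_prefixes(sequences, min_support):
    """Support counts of all contiguous prefixes meeting min_support."""
    counts = Counter()
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def closed_prefixes(freq):
    """Keep prefixes having no one-step extension of equal support."""
    return {p: c for p, c in freq.items()
            if not any(len(q) == len(p) + 1 and q[:len(p)] == p and fc == c
                       for q, fc in freq.items())}

seqs = [["school", "work", "marriage"],
        ["school", "work", "child"],
        ["school", "marriage"]]
freq = frequent_prefixes(seqs, min_support=2)
closed = closed_prefixes(freq)
```

Because every pattern is a prefix of an actual life-course sequence, a demographer can read it directly as "these people started with this exact event order".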
This book constitutes the proceedings of the 23rd International Symposium on Foundations of Intelligent Systems, ISMIS 2017, held in Warsaw, Poland, in June 2017. The 56 regular and 15 short papers presented in this volume were carefully reviewed and selected from 118 submissions. The papers address both theoretical and practical aspects of machine learning, data mining methods, deep learning, bioinformatics and health informatics, intelligent information systems, knowledge-based systems, mining temporal, spatial and spatio-temporal data, and text and Web mining. In addition, four special sessions were organized: the Special Session on Big Data Analytics and Stream Data Mining, the Special Session on Granular and Soft Clustering for Data Science, the Special Session on Knowledge Discovery with Formal Concept Analysis and Related Formalisms, and a Special Session devoted to the ISMIS 2017 Data Mining Competition on Trading Based on Recommendations, which was launched as a part of the conference.
Non-B DNA structures have a great potential to form and influence various genomic processes, including transcription. One of the mechanisms of transcription regulation is nucleosome positioning. Even though only B-DNA can be wrapped around a nucleosome, non-B DNA structures can compete with a nucleosome for a genomic location. Here we used permanganate/S1 nuclease footprinting data on non-B DNA structures, such as Z-DNA, H-DNA, G-quadruplexes and stress-induced duplex destabilization (SIDD) sites, together with MNase-seq data on nucleosome positioning in the mouse genome. We found three types of patterns of nucleosome positioning around non-B DNA structures: a structure is flanked by nucleosomes on both sides, flanked on one side only, or lies within a nucleosome-free region. Machine learning models based on the random forest and XGBoost algorithms were constructed to recognize DNA regions of 1 kb length containing a particular pattern of nucleosome positioning for four types of DNA structures (Z-DNA, H-DNA, G-quadruplexes and SIDD sites), based on statistics of di- and tri-nucleotides. The best performance (94% accuracy) was reached for G-quadruplexes, while for the other types of structures the accuracy was under 70%. We conclude that 1 kb regions containing G-quadruplexes have distinct compositional properties; this fact points to preferential locations of such patterns in the genome and requires further investigation. Gene ontology analysis revealed that the genes intersecting the discovered patterns are enriched in channel and transmembrane activity, transcription factor binding and receptor binding. A direction for further research is to study the distribution of the discovered patterns in different tissues in order to identify well-positioned and dynamic nucleosomes and reveal genes regulated via DNA structures and nucleosome positioning.
Proceedings of the international conference "Neural Information Processing Systems 2018" (NIPS 2018).
This volume is the supplementary volume of the 14th International Conference on Formal Concept Analysis (ICFCA 2017), held from June 13th to 16th, 2017, at IRISA, Rennes. The ICFCA conference series is one of the major venues for researchers from the field of Formal Concept Analysis and related areas to present and discuss their recent work with colleagues from all over the world. Since its start in 2003 in Darmstadt, the ICFCA conference series has been held in Europe, Australia, America and Africa.
The field of Formal Concept Analysis (FCA) originated in the 1980s in Darmstadt as a subfield of mathematical order theory, with prior developments in other research groups. Its original motivation was to consider complete lattices as lattices of concepts, drawing motivation from philosophy and mathematics alike. FCA has since then developed into a wide research area with applications much beyond its original motivation, for example in logic, data mining, learning, and psychology.
The FCA community is mourning the passing of Rudolf Wille on January 22nd, 2017, in Bickenbach, Germany. As one of the leading researchers throughout the history of FCA, he was responsible for inventing and shaping many of the fundamental notions of this area. Indeed, the publication of his article "Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts" is seen by many as the starting point of Formal Concept Analysis as an independent direction of research. He was head of the FCA research group in Darmstadt from 1983 until his retirement in 2003, and remained an active researcher and contributor thereafter. In 2003, he was among the founding members of the ICFCA conference series.
For this supplementary volume, 13 papers were chosen to be published: four papers judged mature enough to be discussed at the conference and nine papers presented in the demonstration and poster session.
One of the most challenging data analysis tasks of modern high energy physics experiments is the identification of particles. In these proceedings we review the new approaches used for particle identification at the LHCb experiment. Machine-learning based techniques are used to identify the species of charged and neutral particles using several observables obtained from the LHCb sub-detectors. We show the performance of various solutions based on neural network and boosted decision tree models.
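A schematic example of a boosted-decision-tree classifier on synthetic detector-like observables (the variables and their distributions are invented, not LHCb data):

```python
# Schematic only: observables and distributions are invented, not LHCb data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 1000
is_kaon = rng.integers(0, 2, n)  # toy truth label: kaon vs. pion
# two detector-like observables with species-dependent means
ring_radius = rng.normal(loc=np.where(is_kaon == 1, 0.8, 1.2), scale=0.2)
energy_dep = rng.normal(loc=np.where(is_kaon == 1, 5.0, 3.0), scale=1.0)
X = np.column_stack([ring_radius, energy_dep])

# a BDT combines the observables into a single discriminant per track
bdt = GradientBoostingClassifier(n_estimators=100).fit(X, is_kaon)
```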