Publications in this section: 9
Working paper
Sergey S. Ketkov, Oleg A. Prokopyev, Evgenii P. Burashnikov. arxiv.org. Computer Science. Cornell University, 2019. No. arXiv:1910.08744.
In this study we consider the shortest path problem, where the arc costs are subject to distributional uncertainty. Basically, the decision-maker attempts to minimize her worst-case expected regret over an ambiguity set (or a family) of candidate distributions that are consistent with the decision-maker's initial information. The ambiguity set is formed by all distributions that satisfy prescribed linear first-order moment constraints with respect to subsets of arcs and individual probability constraints with respect to particular arcs. Our distributional constraints can be constructed in a unified manner from real-life data observations. In particular, the decision-maker may collect some new distributional information and thereby improve her solutions in the subsequent decision epochs. Under some additional assumptions, the resulting distributionally robust shortest path problem (DRSPP) admits equivalent robust and mixed-integer programming (MIP) reformulations. The robust reformulation is shown to be strongly NP-hard, whereas the problem without the first-order moment constraints is proved to be polynomially solvable. We perform numerical experiments to illustrate the advantages of the proposed approach; we also demonstrate that the MIP reformulation of DRSPP can be solved reasonably fast using off-the-shelf solvers.
Added: Oct 23, 2019
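The polynomially solvable special case mentioned in the abstract above (no first-order moment constraints) reduces to an ordinary shortest path computed under the worst-case expected cost of each arc taken separately. A minimal sketch of that reduction, with illustrative function and parameter names (this is not the paper's full MIP reformulation):

```python
import heapq

def robust_shortest_path(arcs, source, target):
    """Dijkstra on worst-case expected arc costs.

    `arcs` maps (u, v) -> the largest mean cost of arc (u, v) that is
    consistent with the per-arc distributional information.  Because the
    worst case decomposes arc by arc in this special case, minimizing
    worst-case expected cost is just a shortest path on these costs.
    """
    graph = {}
    for (u, v), c in arcs.items():
        graph.setdefault(u, []).append((v, c))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            return d
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```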
Working paper
Ivanov D. arxiv.org. Computer Science. Cornell University, 2019
Positive-Unlabeled Learning is an analog of supervised binary classification for the case when the Negative (N) sample in the training set is contaminated with latent instances of the Positive (P) class and hence is Unlabeled (U). We develop DEDPUL, a method able to solve two problems: first, to estimate the proportions of the mixing components (P and N) in U, and then, to classify U. The main steps are to reduce the dimensionality of the data using a Non-Traditional Classifier [11], to estimate densities of the reduced data in both P and U samples, and to apply the Bayes rule to the density ratio. We experimentally show that DEDPUL outperforms current state-of-the-art methods for both problems. Additionally, we improve the current state-of-the-art method for PU Classification [16].
Added: May 31, 2019
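The proportion-estimation step described in the abstract above (score densities plus the mixture identity) can be sketched as follows. All names are illustrative, and the histogram estimator and min-ratio bound are a simplification, not DEDPUL's actual, more refined procedure:

```python
def hist_density(samples, bins=20):
    """Histogram density estimate for scores in [0, 1]."""
    width = 1.0 / bins
    counts = [0] * bins
    for s in samples:
        counts[min(int(s / width), bins - 1)] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]

def estimate_positive_proportion(scores_p, scores_u, bins=20):
    """Bound the proportion of positives in the unlabeled sample.

    Scores come from a classifier trained to separate P from U (the
    'non-traditional' classifier).  Since the unlabeled score density is
    a mixture, f_u = a*f_p + (1-a)*f_n with f_n >= 0, the positive
    proportion a is bounded above by min f_u / f_p over the support of
    f_p -- a standard identity for mixture proportions.
    """
    f_p = hist_density(scores_p, bins)
    f_u = hist_density(scores_u, bins)
    ratios = [u / p for p, u in zip(f_p, f_u) if p > 0]
    return min(1.0, min(ratios)) if ratios else 1.0
```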
Working paper
Ozhegov E. M., Teterina D. arxiv.org. Computer Science. Cornell University, 2018. No. arXiv:1810.09166.
Added: Oct 24, 2018
Working paper
Kondratev A., Ianovski E., Nesterov A. S. arxiv.org. Computer Science. Cornell University, 2019
Individual rankings are often aggregated using scoring rules: each position in each ranking brings a certain score; the total sum of scores determines the aggregate ranking. We study whether scoring rules can be robust to adding or deleting particular candidates, as occurs with spoilers in political elections and with athletes in sports due to doping allegations. In general the result is negative, but weaker robustness criteria pin down a one-parameter family of geometric scoring rules with the scores 0, 1, 1 + p, 1 + p + p^2, and so on. These weaker criteria are independence from deleting a unanimous winner (e.g., after doping allegations) and independence from deleting a unanimous loser (e.g., a spoiler candidate). This family generalises three central rules: the Borda rule, the plurality rule and the antiplurality rule. For illustration we use recent events in biathlon; our results give simple instruments to design scoring rules for a wide range of applications.
Added: Oct 18, 2019
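The geometric scores above admit a direct implementation. A minimal sketch with illustrative names: p = 1 recovers the Borda rule, while (after rescaling the scores) the limits p → ∞ and p → 0 approach the plurality and antiplurality rules, respectively:

```python
def geometric_scores(m, p):
    """Scores 0, 1, 1+p, 1+p+p^2, ... for the worst to best of m positions."""
    scores = [0.0]
    for k in range(1, m):
        scores.append(scores[-1] + p ** (k - 1))
    return scores

def aggregate(rankings, p):
    """Aggregate individual rankings with a geometric scoring rule.

    Each ranking lists candidates from best to worst; candidates are
    returned in decreasing order of total score (ties broken arbitrarily).
    """
    m = len(rankings[0])
    scores = geometric_scores(m, p)  # scores[k] goes to the k-th from the bottom
    total = {}
    for ranking in rankings:
        for pos, cand in enumerate(ranking):
            total[cand] = total.get(cand, 0.0) + scores[m - 1 - pos]
    return sorted(total, key=total.get, reverse=True)
```

With p = 1 the scores are 0, 1, 2, ..., m - 1, i.e. exactly the Borda count.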
Working paper
Kondratev A., Nesterov A. S. arxiv.org. Computer Science. Cornell University, 2019
We study ex-post fairness in the object allocation problem where objects are valuable and commonly owned. A matching is fair from the individual perspective if it has only inevitable envy towards agents who received the most preferred objects -- a minimal envy matching. A matching is fair from the social perspective if it is supported by a majority against any other matching -- a popular matching. Surprisingly, the two perspectives give the same outcome: when a popular matching exists, it is equivalent to a minimal envy matching. We show the equivalence between global and local popularity: a matching is popular if and only if there does not exist a group of up to 3 agents that decides to exchange their objects by majority, keeping the remaining matching fixed. We algorithmically show that an arbitrary matching is path-connected to a popular matching, where along the path groups of up to 3 agents exchange their objects by majority. A market where random groups exchange objects by majority converges to a popular matching, given that such a matching exists. When a popular matching might not exist, we define a most popular matching as a matching that is popular among the largest subset of agents. We show that each minimal envy matching is a most popular matching and propose a polynomial-time algorithm to find them.
Added: Oct 18, 2019
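The pairwise majority comparison underlying popularity in the abstract above can be sketched as a head-to-head vote between two matchings. The data encoding (preference lists and agent-to-object dictionaries) is an illustrative assumption:

```python
def majority_margin(pref, match_a, match_b):
    """Agents preferring their object in `match_a` minus those preferring
    `match_b`.  A matching is popular iff this margin is >= 0 against
    every alternative matching.

    `pref[agent]` lists objects from most to least preferred;
    `match_x[agent]` is the object the agent receives under matching x.
    """
    margin = 0
    for agent, ranking in pref.items():
        a = ranking.index(match_a[agent])
        b = ranking.index(match_b[agent])
        if a < b:
            margin += 1
        elif a > b:
            margin -= 1
    return margin
```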
Working paper
Podkopaev A., Sergey I., Nanevski A. arxiv.org. Computer Science. Cornell University, 2016
In this work, we present a family of operational semantics that gradually approximates the realistic program behaviors in the C/C++11 memory model. Each semantics in our framework is built by elaborating and combining two simple ingredients: viewfronts and operation buffers. Viewfronts allow us to express the spatial aspect of thread interaction, i.e., which values a thread can read, while operation buffers enable manipulation with the temporal execution aspect, i.e., determining the order in which the results of certain operations can be observed by concurrently running threads.  Starting from a simple abstract state machine, through a series of gradual refinements of the abstract state, we capture such language aspects and synchronization primitives as release/acquire atomics, sequentially-consistent and non-atomic memory accesses, also providing a semantics for relaxed atomics, while avoiding the Out-of-Thin-Air problem. To the best of our knowledge, this is the first formal and executable operational semantics of C11 capable of expressing all essential concurrent aspects of the standard.  We illustrate our approach via a number of characteristic examples, relating the observed behaviors to those of standard litmus test programs from the literature. We provide an executable implementation of the semantics in PLT Redex, along with a number of implemented litmus tests and examples, and showcase our prototype on a large case study: randomized testing and debugging of a realistic Read-Copy-Update data structure.
Added: Dec 24, 2018
Working paper
Ben-Porat O., Sandomirskiy F., Tennenholtz M. arxiv.org. Computer Science. Cornell University, 2019
Machine Learning (ML) algorithms shape our lives. Banks use them to determine if we are good borrowers; IT companies delegate recruitment decisions to them; police apply ML to crime prediction, and judges base their verdicts on ML. However, real-world examples show that such automated decisions tend to discriminate against protected groups. This potential discrimination has attracted considerable attention both in the media and in the research community. Quite a few formal notions of fairness have been proposed, which take the form of constraints a "fair" algorithm must satisfy. We focus on scenarios where fairness is imposed on a self-interested party (e.g., a bank that maximizes its revenue). We find that the disadvantaged protected group can be worse off after imposing a fairness constraint. We introduce a family of Welfare-Equalizing fairness constraints that equalize the per-capita welfare of protected groups, and include Demographic Parity and Equal Opportunity as particular cases. In this family, we characterize conditions under which the fairness constraint helps the disadvantaged group. We also characterize the structure of the optimal Welfare-Equalizing classifier for the self-interested party, and provide an algorithm to compute it. Overall, our Welfare-Equalizing fairness approach provides a unified framework for discussing fairness in classification in the presence of a self-interested party.
Added: Jun 10, 2019
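The quantity a Welfare-Equalizing constraint drives to zero can be sketched as a per-capita welfare gap between two groups. Names and the binary-decision encoding are illustrative assumptions; note how a constant welfare function recovers the demographic-parity gap, consistent with the abstract's claim that Demographic Parity is a special case:

```python
def per_capita_welfare(decisions, welfare, members):
    """Average realized welfare over the agents in `members`.

    decisions[i] is 1 if agent i receives the positive outcome, else 0;
    welfare[i] is the welfare agent i derives from a positive outcome.
    """
    return sum(decisions[i] * welfare[i] for i in members) / len(members)

def welfare_equalizing_gap(decisions, welfare, group_a, group_b):
    """Gap that a Welfare-Equalizing constraint requires to be zero.

    With welfare[i] = 1 for all i this is exactly the difference in
    acceptance rates between the groups, i.e. the demographic-parity gap.
    """
    return (per_capita_welfare(decisions, welfare, group_a)
            - per_capita_welfare(decisions, welfare, group_b))
```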
Working paper
Smirnov I. arxiv.org. Computer Science. Cornell University, 2018
The Internet provides students with a unique opportunity to connect and maintain social ties with peers from other schools, irrespective of how far they are from each other. However, little is known about the real structure of such online relationships. In this paper, we investigate the structure of interschool friendship on a popular social networking site. We use data from 36,951 students from 590 schools of a large European city. We find that the probability of a friendship tie between students from neighboring schools is high and that it decreases with the distance between schools following a power law. We also find that students are more likely to be connected if the educational outcomes of their schools are similar. We show that this fact is not a consequence of residential segregation. While high- and low-performing schools are evenly distributed across the city, this is not the case for the digital space, where schools turn out to be segregated by educational outcomes. There is no significant correlation between the educational outcomes of a school and its geographical neighbors; however, there is a strong correlation between the educational outcomes of a school and its digital neighbors. These results challenge the common assumption that the Internet is a borderless space, and may have important implications for the understanding of educational inequality in the digital age.
Added: Nov 6, 2018
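A power-law distance decay like the one reported above, p(d) ~ d^(-alpha), is linear on log-log axes, so the exponent can be recovered with a least-squares fit on the logs. This is a generic sketch, not the paper's estimation procedure:

```python
import math

def power_law_exponent(distances, probabilities):
    """Estimate alpha in p(d) ~ d^(-alpha) by ordinary least squares
    on (log d, log p); the fitted slope is -alpha."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```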
Working paper
Alexey Buzmakov, Daria Semenova, Maria Temirkaeva. arxiv.org. Computer Science. Cornell University, 2019. No. arXiv:1912.01443.
Today, treatment effect estimation at the individual level is a vital problem in many areas of science and business. For example, in marketing, estimates of the treatment effect are used to select the most efficient promo-mechanics; in medicine, individual treatment effects are used to determine the optimal dose of medication for each patient, and so on. At the same time, the question of choosing the best method, i.e., the method that ensures the smallest predictive error (for instance, RMSE) or the highest total (average) value of the effect, remains open. Accordingly, in this paper we compare the effectiveness of machine learning methods for estimating individual treatment effects. The comparison is performed on the Criteo Uplift Modeling Dataset. We show that the combination of the Logistic Regression method with the Difference Score method, as well as the Uplift Random Forest method, yields the most accurate individual treatment effect predictions on the top 30% of observations of the test dataset.
Added: Dec 10, 2019
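The two-model "difference score" scheme the abstract pairs with logistic regression can be sketched in a few lines; the predictors are left as generic callables so the sketch stays model-agnostic (an illustrative simplification, not the paper's full pipeline):

```python
def difference_score_uplift(predict_treated, predict_control, X):
    """Two-model ('difference score') uplift estimate.

    One model is fit on the treated sample, the other on the control
    sample; the individual treatment effect for each observation x is
    estimated as the difference of the predicted response probabilities,
    P(outcome | x, treated) - P(outcome | x, control).
    """
    return [predict_treated(x) - predict_control(x) for x in X]
```

Observations are then ranked by this score, and campaigns target the top fraction (the abstract evaluates the top 30%).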