Gray-box Inference for Structured Gaussian Process Models

Galliani P., Dezfouli A., Bonilla E., Quadrianto N.

P. 353–361.

We develop an automated variational inference method for Bayesian structured prediction problems with Gaussian process (GP) priors and linear-chain likelihoods. Our approach does not need to know the details of the structured likelihood model and can scale up to a large number of observations. Furthermore, we show that the required expected likelihood term and its gradients in the variational objective (ELBO) can be estimated efficiently by using expectations over very low-dimensional Gaussian distributions. Optimization of the ELBO is fully parallelizable over sequences and amenable to stochastic optimization, which we use along with control variate techniques to make our framework useful in practice. Results on a set of natural language processing tasks show that our method can be as good as (and sometimes better than, in particular with respect to expected log-likelihood) hard-coded approaches including SVM-struct and CRFs, and overcomes the scalability limitations of previous inference algorithms based on sampling. Overall, this is a fundamental step toward developing automated inference methods for Bayesian structured prediction.
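The abstract mentions estimating the expected likelihood term and its gradients from low-dimensional Gaussian expectations, with control variates to reduce variance. The sketch below illustrates that general idea in one dimension only; it is not the authors' implementation. It uses a generic score-function (likelihood-ratio) gradient estimator, and the function names, the toy Gaussian log-likelihood, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def gaussian_log_lik(f, y=1.0):
    # Toy stand-in for a black-box log-likelihood log p(y | f) (assumption).
    return -0.5 * (y - f) ** 2


def score_function_gradient(log_lik, mean, std, n_samples=2000,
                            use_control_variate=True, seed=0):
    """Estimate d/d(mean) of E_{f ~ N(mean, std^2)}[log_lik(f)].

    Only evaluations of log_lik are needed, never its derivatives.
    Returns the estimate and an estimate of its sampling variance.
    """
    rng = np.random.default_rng(seed)
    f = rng.normal(mean, std, size=n_samples)
    # Score of the Gaussian w.r.t. its mean; its expectation is zero.
    score = (f - mean) / std ** 2
    terms = log_lik(f) * score
    if use_control_variate:
        # Subtract a * score with the variance-minimizing coefficient a.
        # Estimating a from the same samples adds a small bias.
        a = np.cov(terms, score)[0, 1] / np.var(score, ddof=1)
        terms = terms - a * score
    return terms.mean(), terms.var(ddof=1) / n_samples


grad_cv, var_cv = score_function_gradient(gaussian_log_lik, 0.2, 0.5)
grad_plain, var_plain = score_function_gradient(
    gaussian_log_lik, 0.2, 0.5, use_control_variate=False)
```

For this toy likelihood the exact gradient is y - mean, so both estimates can be checked against 0.8; the control-variate version should show the smaller estimated variance.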

Language: English

Keywords: Gaussian processes; structured prediction

In book

Proceedings of Machine Learning Research. 2017. Volume 54: Artificial Intelligence and Statistics

Vol. 54: Artificial Intelligence and Statistics. [s.n.], 2017.

Surrogate uncertainty estimation for your time series forecasting black-box: learn when to trust

Erlygin L., Zholobov V., Baklanova V. et al., in: 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 1–4 December 2023, Shanghai, China. Shanghai: IEEE Computer Society, 2023. P. 1247–1258.

Machine learning models play a vital role in time series forecasting. These models, however, often overlook an important element: point uncertainty estimates. Incorporating these estimates is crucial for effective risk management, informed model selection, and decision-making. To address this issue, our research introduces a method for uncertainty estimation. We employ a surrogate Gaussian process regression model. ...

Added: March 20, 2024

Uncertainty Estimation in Autoregressive Structured Prediction

Malinin A., Gales M., in: Proceedings of the 9th International Conference on Learning Representations (ICLR 2021). ICLR, 2021. P. 1–31.

Added: November 1, 2021

Gaussian processes with multidimensional distribution inputs via optimal transport and Hilbertian embedding

Bachoc F., Suvorikova A., Ginsbourger D. et al., Electronic Journal of Statistics 2020 Vol. 14 No. 2 P. 2742–2772

In this work, we propose a way to construct Gaussian processes indexed by multidimensional distributions. More precisely, we tackle the problem of defining positive definite kernels between multivariate distributions via notions of optimal transport and appealing to Hilbert space embeddings. Besides presenting a characterization of radial positive definite and strictly positive definite kernels on general ...

Added: October 30, 2020

High extremes of Gaussian chaos processes: a discrete time approximation approach

Zhdanov A. I., Piterbarg V. I., Theory of Probability and its Applications 2018 Vol. 63 No. 1 P. 1–21

Let $\mathbf{\boldsymbol{\xi}}(t)=(\xi_{1}(t),\ldots,\xi_{d}(t))$ be a Gaussian zero mean stationary a.s. continuous vector process. Let $g\colon{\mathbb{R}}^{d}\to {\mathbb{R}}$ be a homogeneous function of positive degree. We study probabilities of high extrema of the Gaussian chaos process $g(\mathbf{\boldsymbol{\xi}}(t))$. Important examples are products of Gaussian processes, $\prod_{i=1}^{d}\xi_{i}(t)$, and quadratic forms $\sum_{i,j=1}^{d}a_{ij}\xi_{i}(t)\xi_{j}(t)$. Methods of our studies include the Laplace saddle point ...

Added: November 14, 2019

On probability of high extremes for product of two Gaussian stationary processes

Zhdanov A. I., Theory of Probability and its Applications 2015 Vol. 60 No. 3 P. 520–527

Let $(X(t),Y(t))$, $t\ge0$, be a zero-mean stationary Gaussian vector process whose components have covariance functions $r_i(t)$ satisfying Pickands' condition $r_i(t)=1-c_i|t|^{\alpha_i}(1+o(1))$, $t\to 0$, $c_i>0$, $0<\alpha_i\le2$, $i=1,2$. Let $r_i(t)<1$, $i=1,2$, $t>0$. Assuming that $r\equiv {\bf E}\,X(t)Y(t)\in(-1,1)$ and that $\lim_{t,s\rightarrow0}({\bf E}\,X(t)Y(s)-r)/|t-s|^{\min(\alpha_1,\alpha_2)}$ exists, we study the behavior of the probability ${\bf P}(\max_{t\in[0,p]}X(t)Y(t)>u)$ as $u\rightarrow\infty$ for any $p$. In particular, we ...

Added: November 14, 2019

On probability of high extremes for product of two independent Gaussian stationary processes

Zhdanov A., Piterbarg V. I., Extremes 2015 Vol. 18 No. 1 P. 99–108

Let X(t), Y(t), t ≥ 0, be two independent zero-mean stationary Gaussian processes whose covariance functions satisfy r_i(t) = 1 − |t|^{a_i} + o(|t|^{a_i}) as t → 0, with 0 < a_i ≤ 2, i = 1, 2, and both functions are less than one for non-zero t. We derive for any p ...

Added: November 14, 2019

Exact small deviation asymptotics in the weighted L_2-norm for some Gaussian processes

Pusev R., Nazarov A. I., Zapiski Nauchnykh Seminarov POMI 2009 Vol. 364 P. 166–199

We find the exact small ball asymptotics under a weighted L_2-norm for a wide class of Gaussian processes which generate boundary-value problems for ordinary differential equations. Sharp constants in the asymptotics are derived for a number of processes connected with special functions. ...

Added: January 28, 2019

Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

Izmailov P., Novikov A., Kropotov D., in: Proceedings of Machine Learning Research. Proceedings of The International Conference on Artificial Intelligence and Statistics (AISTATS 2018). [s.n.], 2018. P. 726–735.

We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs ...

Added: December 10, 2018

Faster variational inducing input Gaussian process classification

Izmailov P., Kropotov D., Journal of Machine Learning and Data Analysis 2017 Vol. 3 No. 1 P. 20–35

Background: Gaussian processes (GP) provide an elegant and effective approach to learning in kernel machines. This approach leads to a highly interpretable model and allows using the Bayesian framework for model adaptation and incorporating the prior knowledge about the problem. The GP framework is successfully applied to regression, classification, and dimensionality reduction problems. Unfortunately, the ...

Added: December 6, 2018

Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

Struminsky K., Lacoste-Julien S., Osokin A., in: Advances in Neural Information Processing Systems 31 (NIPS 2018). [s.n.], 2018. P. 1–9.

We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which ...

Added: October 29, 2018

Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models

Shpakova T., Bach F., Osokin A., in: Proceedings of the international conference on Uncertainty in Artificial Intelligence (UAI 2018). [s.n.], 2018. P. 1–11.

We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications. While in principle our approach is a straightforward marginalization, it requires solving many related MAP inference problems. We show that for log-supermodular pairwise models these operations can be performed efficiently ...

Added: October 29, 2018

SEARNN: Training RNNs with global-local losses

Leblond R., Alayrac J., Osokin A. et al., in: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018). [s.n.], 2018. P. 1–16.

We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an ...

Added: October 29, 2018

Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

Izmailov P., Novikov A., Kropotov D. / Series arXiv "math". 2017.

Added: October 20, 2017

On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Osokin A., Bach F., Lacoste-Julien S., in: Advances in Neural Information Processing Systems 30 (NIPS 2017). Montreal: Curran Associates, 2017. P. 302–313.

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual ...

Added: October 19, 2017

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs

Osokin A., Alayrac J., Lukasewitz I. et al., in: Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2016). Vol. 48. NY: [s.n.], 2016. P. 885–925.

In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by ...

Added: October 19, 2017

Context-Aware CNNs for Person Head Detection

Vu T., Osokin A., Laptev I., in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015). Santiago de Chile: IEEE, 2015. P. 2893–2901.

Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent R-CNN object detector, ...

Added: October 19, 2017

Perceptually Inspired Layout-Aware Losses for Image Segmentation

Osokin A., Kohli P., in: Lecture Notes in Computer Science. Proceedings of the 13th European Conference on Computer Vision (ECCV 2014)* 2. Vol. 8690. Zürich: Springer, 2014. P. 663–678.

Interactive image segmentation is an important computer vision problem that has numerous real world applications. Models for image segmentation are generally trained to minimize the Hamming error in pixel labeling. The Hamming loss does not ensure that the topology/structure of the object being segmented is preserved and therefore is not a strong indicator of the ...

Added: October 19, 2017