Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-MAP Models
We consider the structured-output prediction problem from a probabilistic perspective and generalize the ``perturb-and-MAP'' framework to the more challenging weighted Hamming losses, which are crucial in applications. Although our approach is in principle a straightforward marginalization, it requires solving many related MAP inference problems. We show that for log-supermodular pairwise models these operations can be performed efficiently using the machinery of dynamic graph cuts. We also propose \emph{double} stochastic gradient descent, sampling both over the data and over the perturbations, for efficient learning. Our framework naturally accommodates weak supervision (e.g., partial labels). We conduct experiments on large-scale character recognition and image segmentation that show the benefits of our algorithms.
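To make the perturb-and-MAP idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary pairwise model whose score is the sum of unary potentials plus a nonnegative reward `w` for every edge whose endpoints agree (an attractive, i.e. log-supermodular, coupling). Only the unary potentials are perturbed with independent Gumbel noise. Each perturbed problem is solved by exhaustive enumeration, a stand-in for graph cuts that is only viable for a handful of variables. Per-variable marginals are estimated by averaging the MAP labels over samples. All function names (`map_labeling`, `perturb_and_map_marginals`) are hypothetical, and the sketch does not implement the weighted Hamming objective or dynamic graph cuts.

```python
import itertools
import math
import random


def map_labeling(unary, pairwise, edges):
    """Exhaustive MAP for a tiny binary pairwise model (stand-in for graph cuts).

    unary[i] = (score of y_i = 0, score of y_i = 1).
    pairwise[k] >= 0 is added when edges[k] = (i, j) has y_i == y_j,
    i.e. an attractive (log-supermodular) coupling.
    """
    n = len(unary)
    best_score, best_y = -math.inf, None
    for y in itertools.product((0, 1), repeat=n):
        score = sum(unary[i][y[i]] for i in range(n))
        score += sum(w for (i, j), w in zip(edges, pairwise) if y[i] == y[j])
        if score > best_score:
            best_score, best_y = score, y
    return best_y


def gumbel(rng):
    """Draw one standard Gumbel sample via the inverse CDF."""
    u = max(rng.random(), 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def perturb_and_map_marginals(unary, pairwise, edges, num_samples=2000, seed=0):
    """Estimate P(y_i = 1) under the perturb-and-MAP model by sampling."""
    rng = random.Random(seed)
    counts = [0] * len(unary)
    for _ in range(num_samples):
        # Perturb every unary potential with independent Gumbel noise.
        noisy = [(a + gumbel(rng), b + gumbel(rng)) for a, b in unary]
        y = map_labeling(noisy, pairwise, edges)
        for i, y_i in enumerate(y):
            counts[i] += y_i
    return [c / num_samples for c in counts]
```

With no edges, the perturbed MAP decouples into independent Gumbel-max draws, so the estimate should approach the logistic marginal: for example, `perturb_and_map_marginals([(0.0, 2.0)], [], [])` should be close to `1 / (1 + exp(-2))`, about 0.88. With a very strong coupling, the two endpoints of an edge always receive the same label, so their estimated marginals coincide. The weighted-Hamming setting in the abstract needs many such MAP calls with related potentials, which is why reusing work across them (dynamic graph cuts) matters.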