The 2019 Conference on Neural Information Processing Systems (NeurIPS) kicked off today in Vancouver. In a blog post, the NeurIPS organizing committee announced the conference’s outstanding paper awards and other awards. NeurIPS also introduced an additional **Outstanding New Directions Paper Award** to “highlight work that distinguished itself in setting a novel avenue for future research.”

## Outstanding Paper Award

**Distribution-Independent PAC Learning of Halfspaces with Massart Noise** (arXiv)

**Authors:** Ilias Diakonikolas, Themis Gouleakis, Christos Tzamos

**Institutions:** University of Southern California, Max Planck Institute for Informatics, University of Wisconsin-Madison

**Abstract:** We study the problem of *distribution-independent* PAC learning of halfspaces in the presence of Massart noise. Specifically, we are given a set of labeled examples (x, y) drawn from a distribution D on R^{d+1} such that the marginal distribution on the unlabeled points x is arbitrary and the labels y are generated by an unknown halfspace corrupted with Massart noise at noise rate η < 1/2. The goal is to find a hypothesis h that minimizes the misclassification error Pr_{(x,y)∼D}[h(x) ≠ y].

We give a poly(d,1/ϵ) time algorithm for this problem with misclassification error η+ϵ. We also provide evidence that improving on the error guarantee of our algorithm might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions. The existence of such an algorithm for halfspaces (or even disjunctions) has been posed as an open question in various works, starting with Sloan (1988), Cohen (1997), and was most recently highlighted in Avrim Blum’s FOCS 2003 tutorial.

Highlight comments from NeurIPS: The paper studies the learning of linear threshold functions for binary classification in the presence of unknown, bounded label noise in the training data. It solves a fundamental and long-standing open problem by deriving an efficient algorithm for learning in this case, making tremendous progress on a question at the heart of machine learning: efficiently learning halfspaces under Massart noise.
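To make the noise model concrete, here is a small illustrative NumPy sketch of sampling from the Massart model described in the abstract. The Gaussian marginal, the specific halfspace `w`, and the per-point flip rates are all hypothetical choices for illustration; the model allows an arbitrary marginal and any adversarial flip probabilities bounded by η:

```python
import numpy as np

def sample_massart(n, w, eta, rng):
    """Draw n labeled examples (x, y): y is the sign of w.x, flipped
    independently with a point-dependent probability at most eta."""
    X = rng.standard_normal((n, len(w)))    # illustrative marginal; model allows any
    clean = np.sign(X @ w)
    flip_p = rng.uniform(0.0, eta, size=n)  # adversary picks any rate in [0, eta]
    flip = rng.random(n) < flip_p
    return X, np.where(flip, -clean, clean)
```

Under this model the true halfspace misclassifies each drawn point with probability at most η, which is the error benchmark the paper's poly(d, 1/ϵ)-time algorithm matches up to an additive ϵ.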

## Outstanding New Directions Paper Award

**Uniform convergence may be unable to explain generalization in deep learning** (arXiv)

**Authors:** Vaishnavh Nagarajan, J. Zico Kolter

**Institutions:** Carnegie Mellon University, Bosch Center for Artificial Intelligence

**Abstract:** We cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well. While it is well-known that many existing bounds are numerically large, through a variety of experiments, we first bring to light another crucial and more concerning aspect of these bounds: in practice, these bounds can *increase* with the dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by stochastic gradient descent (SGD) where uniform convergence provably cannot “explain generalization,” even if we take into account implicit regularization *to the fullest extent possible*. More precisely, even if we consider only the set of classifiers output by SGD that have test errors less than some small ϵ, applying (two-sided) uniform convergence on this set of classifiers yields a generalization guarantee that is larger than 1−ϵ and is therefore nearly vacuous.
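For context, a (two-sided) uniform convergence bound controls the generalization gap simultaneously over a whole hypothesis class; in standard notation (the symbols here are generic textbook notation, not taken from the paper):

```latex
\sup_{h \in \mathcal{H}} \left| L_{\mathcal{D}}(h) - L_{S}(h) \right| \;\le\; \epsilon_{\mathrm{unif}}(m, \mathcal{H})
```

where $L_{\mathcal{D}}$ is the population risk and $L_{S}$ the empirical risk on a sample $S$ of size $m$. The paper's point is that even after shrinking $\mathcal{H}$ to just the low-test-error classifiers SGD actually outputs, the tightest bound of this form can remain close to 1.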

Highlight comments from NeurIPS: The paper presents what are essentially negative results showing that many existing (norm-based) bounds on the performance of deep learning algorithms don’t do what they claim. The authors go on to argue that they *can’t* do what they claim so long as they lean on the machinery of two-sided uniform convergence. While the paper does not solve (nor pretend to solve) the question of generalisation in deep neural nets, it is an “instance of the fingerpost” (to use Francis Bacon’s phrase) pointing the community to look in a different place.

## Honorable Mention Outstanding Paper Award

**Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses** (arXiv)

**Authors:** Ananya Uppal, Shashank Singh, Barnabás Póczos

**Institutions:** Carnegie Mellon University

**Abstract:** We study the problem of estimating a nonparametric probability density under a large family of losses called Besov IPMs, which include, for example, L^p distances, total variation distance, and generalizations of both Wasserstein and Kolmogorov-Smirnov distances. For a wide variety of settings, we provide both lower and upper bounds, identifying precisely how the choice of loss function and assumptions on the data interact to determine the minimax optimal convergence rate. We also show that linear distribution estimates, such as the empirical distribution or kernel density estimator, often fail to converge at the optimal rate. Our bounds generalize, unify, or improve several recent and classical results. Moreover, IPMs can be used to formalize a statistical model of generative adversarial networks (GANs). Thus, we show how our results imply bounds on the statistical error of a GAN, showing, for example, that GANs can strictly outperform the best linear estimator.
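An integral probability metric (IPM) measures the distance between two distributions through a discriminator class $\mathcal{F}$; the family of losses in the paper takes $\mathcal{F}$ to be a ball in a Besov space:

```latex
d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right|
```

Choosing $\mathcal{F}$ to be the 1-Lipschitz functions recovers the Wasserstein-1 distance, and functions bounded by 1 recover (up to a constant) total variation, which is why the Besov family subsumes the losses listed in the abstract.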

Highlight comments from NeurIPS: Reviewers felt this paper would have significant impact for researchers working on nonparametric estimation and GANs.

**Fast and Accurate Least-Mean-Squares Solvers** (arXiv)

**Authors:** Alaa Maalouf, Ibrahim Jubran, Dan Feldman

**Institutions:** University of Haifa

**Abstract:** Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.

As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to 100x. Generalization for streaming and distributed (big) data is trivial. Extensive experimental results and complete open-source code are also provided.
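The excerpt above omits the paper's core construction; the booster builds on Carathéodory's theorem, which states that a weighted mean of n points in R^d can be represented exactly by a reweighting of at most d + 1 of them. Below is a minimal, deliberately slow SVD-based sketch of that classical theorem, not the paper's fast recursive booster:

```python
import numpy as np

def caratheodory(P, u):
    """Given points P (n x d) and positive weights u summing to 1, return
    weights w with the same weighted mean but at most d + 1 nonzeros."""
    P, u = np.asarray(P, float), np.asarray(u, float).copy()
    d = P.shape[1]
    while True:
        idx = np.flatnonzero(u > 0)
        if len(idx) <= d + 1:
            return u
        # find v with sum(v) = 0 and sum(v_i * p_i) = 0 over active points
        A = (P[idx[1:]] - P[idx[0]]).T            # d x (k-1) with k-1 > d
        v_tail = np.linalg.svd(A)[2][-1]          # a null-space vector of A
        v = np.concatenate(([-v_tail.sum()], v_tail))
        alpha = np.min(u[idx][v > 0] / v[v > 0])  # largest step keeping u >= 0
        u[idx] -= alpha * v                       # zeroes at least one weight
        u[np.abs(u) < 1e-12] = 0.0
```

Each pass removes at least one point while preserving the weighted mean exactly; the paper's contribution is making this kind of sparsification fast and numerically accurate enough to accelerate practical LMS solvers.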

Highlight comments from NeurIPS: Reviewers emphasized the importance of the approach: for practitioners, because the method can be easily implemented to improve existing algorithms; and for extension to other algorithms, because the recursive partitioning principle of the approach lends itself to generalization.

## Honorable Mention Outstanding New Directions Paper Award

**Putting An End to End-to-End: Gradient-Isolated Learning of Representations** (arXiv)

**Authors:** Sindy Löwe, Peter O’Connor, Bastiaan S. Veeling

**Institutions:** University of Amsterdam

**Abstract:** We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation, but instead exploits the natural order in data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domains. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.

Highlight comments from NeurIPS: As noted by reviewers, such self-organization in perceptual networks might give food for thought at the crossroads of algorithmic perspectives (sidestepping end-to-end optimization, its huge memory footprint and computational issues) and cognitive perspectives (exploiting the notion of so-called slow features and going toward more “biologically plausible” learning processes).

**Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations** (arXiv)

**Authors:** Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein

**Institutions:** Stanford University

**Abstract:** Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
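The ray marcher can be pictured as repeatedly asking a function "how far should I step along this ray?" In SRNs that step length is predicted by a learned network and the whole loop is differentiable; the hypothetical sketch below substitutes a hand-written signed-distance function for the learned component:

```python
import numpy as np

def march_ray(origin, direction, step_fn, n_steps=20):
    """Advance a point along a ray; step_fn(p) gives the step length at p
    (a learned network in SRNs, hand-written here). Returns the end point."""
    p = np.asarray(origin, float)
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    for _ in range(n_steps):
        p = p + step_fn(p) * d
    return p
```

With `step_fn` the signed distance to a unit sphere, a ray cast from (−3, 0, 0) along +x lands on the surface at (−1, 0, 0), which is the kind of geometric grounding that lets SRNs recover shape from only posed 2D images.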

Highlight comments from NeurIPS: The paper presents an elegant synthesis of two broad approaches in computer vision: multiple-view geometry and deep representations.

## Test of Time Award

**Dual Averaging Method for Regularized Stochastic Learning and Online Optimization** (NIPS)

**Author:** Lin Xiao

**Institution:** Microsoft Research

**Abstract:** We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the L1-norm for sparsity. We develop a new online algorithm, the regularized dual averaging method, that can explicitly exploit the regularization structure in an online setting. In particular, at each iteration, the learning variables are adjusted by solving a simple optimization problem that involves the running average of all past subgradients of the loss functions and the whole regularization term, not just its subgradient. This method achieves the optimal convergence rate and often enjoys a low complexity per iteration similar to that of the standard stochastic gradient method. Computational experiments are presented for the special case of sparse online learning using L1-regularization.
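For the L1 special case the per-step problem has a closed-form soft-threshold solution. A minimal sketch of that update for a least-squares loss (the toy loss, data, and constant γ are illustrative; the paper treats general convex losses and step-size schedules):

```python
import numpy as np

def rda_l1(X, y, lam, gamma=1.0):
    """l1-RDA for least squares: each step soft-thresholds the running
    average of all past loss subgradients against the whole l1 term."""
    n, d = X.shape
    w, g_bar = np.zeros(d), np.zeros(d)
    for t in range(1, n + 1):
        x, target = X[t - 1], y[t - 1]
        g = (w @ x - target) * x     # subgradient of the squared loss at w_t
        g_bar += (g - g_bar) / t     # running average of all past subgradients
        # closed-form minimizer of <g_bar, w> + lam*||w||_1 + (gamma/(2*sqrt(t)))*||w||^2
        shrink = np.maximum(np.abs(g_bar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrink * np.sign(g_bar)
    return w
```

Coordinates whose averaged subgradient stays below λ are set exactly to zero, which is how RDA produces much sparser iterates than methods that only see the current subgradient.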

Highlight comments from NeurIPS: Congratulations to Lin Xiao for single-handedly having had such an enduring impact on our community!

This year, there were a record-breaking 6,743 submissions, of which 1,428 were accepted (including 36 orals and 164 spotlights). The 21 percent acceptance rate is the same as last year’s. On May 23, the Microsoft Conference Management Toolkit (CMT) that NeurIPS 2019 uses to process academic paper submissions failed in the countdown to the submission deadline. Conference organizers blamed an overwhelming volume of last-minute submissions and responded by extending the deadline by two hours. NIPS 2017 and NeurIPS 2018 received 3,240 and 4,854 paper submissions respectively.

NeurIPS 2019 runs December 8–14 at the Vancouver Convention Centre in Vancouver, Canada. According to a booklet distributed to attendees, NeurIPS 2021 will take place in **Sydney, Australia**. Synced will be reporting from the conference throughout the week.