We have just presented the following two papers at this year's NeurIPS conference:
F. Hellström and G. Durisi, “A new family of generalization bounds using sample-wise evaluated CMI,” in Conf. Neural Information Processing Systems (NeurIPS), New Orleans, LA, U.S.A., Nov. 2022. arXiv
F. Hellström and G. Durisi, “Evaluated CMI bounds for meta learning: Tightness and expressiveness,” in Conf. Neural Information Processing Systems (NeurIPS), New Orleans, LA, U.S.A., Nov. 2022. arXiv
Here is a short summary of the two papers:
The first paper deals with generalization bounds, i.e., theoretical guarantees on the test-time performance of machine-learning algorithms, which, hopefully, provide insights into algorithm design. The paper focuses on information-theoretic bounds, which, differently from classic minimax bounds in statistical learning theory, have recently been shown to be nonvacuous when applied to modern machine-learning algorithms such as deep neural networks. The reason is that these bounds are typically both data- and algorithm-dependent. Specifically, we present novel information-theoretic bounds in terms of the so-called samplewise evaluated conditional mutual information (eCMI). Through both theoretical investigations and numerical evaluations, we show that the proposed bounds are tighter than previously proposed bounds for the case of low training loss, and that they accurately approximate the empirically evaluated test error in a variety of settings, including the case of randomized labels.
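To make the quantities concrete, here is a brief LaTeX sketch of the standard supersample construction underlying CMI-type bounds, together with a representative square-root bound in terms of the samplewise eCMI. The notation is ours and the statement is schematic; the precise assumptions, constants, and the tighter variants for the low-training-loss regime are those given in the paper.

% Supersample construction: \tilde{Z} \in \mathcal{Z}^{n \times 2} collects
% 2n i.i.d. samples; the membership vector S = (S_1, \dots, S_n), uniform on
% \{0,1\}^n and independent of \tilde{Z}, selects the training half, so that
% the i-th training sample is Z_i = \tilde{Z}_{i, S_i}.
\[
  \Lambda_i = \bigl(\ell(W, \tilde{Z}_{i,0}),\, \ell(W, \tilde{Z}_{i,1})\bigr)
  \qquad \text{(pair of losses on the $i$-th supersample row)}
\]
% A representative square-root bound on the expected generalization gap in
% terms of the samplewise eCMI $I(\Lambda_i; S_i)$, for a loss bounded in [0,1]:
\[
  \bigl\lvert \mathbb{E}[\mathrm{gen}(W, Z^n)] \bigr\rvert
  \le \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\, I(\Lambda_i; S_i)} .
\]

A key point of the samplewise, evaluated formulation is that the mutual information is measured between low-dimensional loss pairs and single membership bits, rather than between the full weight vector and the whole training set, which is what makes these quantities tractable to estimate numerically.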
In the second paper, we extend the bounds proposed in the first paper to the meta-learning setup, where the objective is to improve the learning performance on a new machine-learning task by using knowledge acquired from separate but related tasks. For the case of standard learning, information-theoretic bounds based on the eCMI have been shown to be expressive enough to recover classic minimax bounds from statistical learning theory. In this paper, we obtain novel information-theoretic generalization bounds based on the eCMI that are tighter than previously proposed bounds and that exhibit the convergence rate of the available minimax generalization bounds for meta-learning. This allows us to establish a connection between information-theoretic results and results from classical learning theory for meta-learning, two areas that have so far evolved separately. We also apply the proposed bounds to a representation-learning setting and, also for this case, derive information-theoretic bounds on the excess risk that recover the convergence rate of recently proposed minimax bounds.
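For orientation, the following schematic LaTeX block fixes one common way to set up the meta-learning quantities that such bounds control. The notation is ours, not necessarily the paper's, and the exact definitions and convergence rates are those established in the paper.

% Meta-learning setup (schematic): tasks \tau_1, \dots, \tau_N are drawn
% i.i.d. from a task environment; task j contributes a data set
% D_j = (Z_{j,1}, \dots, Z_{j,n}); the meta-learner maps (D_1, \dots, D_N)
% to hyperparameters U, and the base learner maps (U, D_j) to task-specific
% weights W_j.
\[
  \Delta_{\mathrm{meta}}
  = \mathbb{E}\bigl[ L_{\tau}(W) \bigr]
  - \frac{1}{N} \sum_{j=1}^{N} \frac{1}{n} \sum_{i=1}^{n}
    \ell\bigl(W_j, Z_{j,i}\bigr),
\]
% where the first term is the expected loss of the base learner, trained with
% the learned hyperparameters U, on a fresh task \tau drawn from the
% environment, and the second term is the empirical loss on the meta-training
% data.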