Source author record

Nikolay Malkin

Nikolay Malkin appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher · Unclaimed source record

Machine Learning · Computer Vision · Artificial Intelligence · Computation and Language · math.AG

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

Preprint · 2023 · arXiv

Diffusion models as plug-and-play priors

We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary differentiable constraint $c(\mathbf{x},\mathbf{y})$ on $\mathbf{x}$ given some additional information $\mathbf{y}$. In this paper, the prior is an independently trained denoising diffusion generative model. The auxiliary constraint is expected to have a differentiable form, but can come from diverse sources. The possibility of such inference turns diffusion models into plug-and-play modules, thereby allowing a range of potential applications in adapting models to new domains and tasks, such as conditional generation or image segmentation. The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise at each step. Considering many noised versions of $\mathbf{x}$ in evaluation of its fitness is a novel search mechanism that may lead to new algorithms for solving combinatorial optimization problems.
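As a rough illustration of the inference loop described above, the following is a minimal toy sketch, not the authors' implementation. It assumes a 1-D setting in which the prior is N(0, 1), so the noise predictor has a closed form (toy_eps_hat stands in for the trained denoising network), and it assumes a Gaussian constraint c(x, y) = N(y; x, sigma^2). At each step a noise level and a noise sample are drawn, the current estimate is forward-diffused, and x moves down the gradient of the noise-prediction error plus -log c. All function names, the noise schedule, and the hyperparameters are hypothetical.

```python
import math
import random


def toy_eps_hat(x_t, alpha_bar):
    # Hypothetical stand-in for a trained denoiser: the exact E[eps | x_t]
    # when the data prior is N(0, 1) and x_t = sqrt(ab) * x + sqrt(1 - ab) * eps.
    return math.sqrt(1.0 - alpha_bar) * x_t


def plug_and_play_1d(y, sigma=0.5, steps=4000, lr=0.005, seed=0):
    rng = random.Random(seed)
    # Hypothetical noise schedule: alpha_bar values 1/11, 2/11, ..., 10/11.
    alpha_bars = [k / 11 for k in range(1, 11)]
    x = 0.0
    running_sum, count = 0.0, 0
    for step in range(steps):
        ab = rng.choice(alpha_bars)            # random noise level
        a, s = math.sqrt(ab), math.sqrt(1.0 - ab)
        eps = rng.gauss(0.0, 1.0)              # fresh noise sample
        x_t = a * x + s * eps                  # forward-diffuse the current estimate
        resid = eps - toy_eps_hat(x_t, ab)
        # Gradient of resid**2 w.r.t. x; the toy denoiser is linear and dx_t/dx = a.
        grad_prior = 2.0 * resid * (-s * a)
        # Gradient of -log c(x, y) for the assumed Gaussian constraint.
        grad_constraint = (x - y) / sigma ** 2
        x -= lr * (grad_prior + grad_constraint)
        if step >= steps // 2:                 # average the second half of the iterates
            running_sum += x
            count += 1
    return running_sum / count


print(plug_and_play_1d(1.0))
```

With y = 1.0 the estimate should land somewhat below 1.0 (around 0.9 under these settings), because the prior term pulls x toward the prior mean 0 while the constraint pulls it toward y. A real application would replace toy_eps_hat with a trained network and obtain the gradient by automatic differentiation.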

Preprint · 2022 · arXiv

Coherence boosting: When your pretrained language model is not paying enough attention

Long-range semantic coherence remains a challenge in automatic language generation and understanding. We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present coherence boosting, an inference procedure that increases an LM's focus on a long context. We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. We also find that applying coherence boosting to state-of-the-art models on various zero-shot NLP tasks yields performance gains with no additional training.
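One way to "increase an LM's focus on a long context" is to contrast the model's next-token distribution given the full context with its distribution given only a short, recent piece of that context. The sketch below assumes the log-linear combination (1 + alpha) * log p(token | full context) - alpha * log p(token | short context), renormalized over the vocabulary. The function names, the coefficient alpha, and the toy logits are illustrative assumptions, not a specific library's API.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def coherence_boost(full_logits, short_logits, alpha=0.5):
    # Assumed contrastive combination: amplify what the full context adds
    # beyond the short context, then renormalize to a distribution.
    lp_full = log_softmax(full_logits)
    lp_short = log_softmax(short_logits)
    combined = [(1 + alpha) * f - alpha * s for f, s in zip(lp_full, lp_short)]
    return log_softmax(combined)


# Token 1 is more likely under the full context than under the short context,
# so boosting raises its probability relative to token 0.
boosted = coherence_boost([2.0, 1.0, 0.0], [2.0, 0.0, 0.0], alpha=0.5)
print([round(math.exp(v), 3) for v in boosted])
```

Setting alpha = 0 recovers the ordinary full-context distribution; larger alpha places more weight on tokens whose likelihood depends on the distant context.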

Preprint · 2022 · arXiv

Generative Flow Networks for Discrete Probabilistic Modeling

We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes. We propose a framework to jointly train a GFlowNet with an energy function, so that the GFlowNet learns to sample from the energy distribution, while the energy learns with an approximate MLE objective with negative samples from the GFlowNet. We demonstrate EB-GFN's effectiveness on various probabilistic modeling tasks. Code is publicly available at https://github.com/zdhNarsil/EB_GFN.

Preprint · 2022 · arXiv

Resolving label uncertainty with implicit posterior models

We propose a method for jointly inferring labels across a collection of data samples, where each sample consists of an observation and a prior belief about the label. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs. This formulation unifies various machine learning settings; the weak beliefs can come in the form of noisy or incomplete labels, likelihoods given by a different prediction mechanism on auxiliary input, or common-sense priors reflecting knowledge about the structure of the problem at hand. We demonstrate the proposed algorithms on diverse problems: classification with negative training examples, learning from rankings, weakly and self-supervised aerial imagery segmentation, co-segmentation of video frames, and coarsely supervised text classification.

Preprint · 2021 · arXiv

High-resolution land cover change from low-resolution labels: Simple baselines for the 2021 IEEE GRSS Data Fusion Contest

We present simple algorithms for land cover change detection in the 2021 IEEE GRSS Data Fusion Contest. The task of the contest is to create high-resolution (1m / pixel) land cover change maps of a study area in Maryland, USA, given multi-resolution imagery and label data. We study several baseline models for this task and discuss directions for further research. See https://dfc2021.blob.core.windows.net/competition-data/dfc2021_index.txt for the data and https://github.com/calebrob6/dfc2021-msd-baseline for an implementation of these baselines.

Preprint · 2020 · arXiv

Shuffle relations for Hodge and motivic correlators

The Hodge correlators ${\rm Cor}_{\mathcal H}(z_0,z_1,\dots,z_n)$ are functions of several complex variables, defined by Goncharov (arXiv:0803.0297) by an explicit integral formula. They satisfy some linear relations: dihedral symmetry relations, distribution relations, and shuffle relations. We find a new family of relations, the second shuffle relations. When $z_i\in\{0\}\cup\mu_N$, where $\mu_N$ is the group of $N$-th roots of unity, these relations are expected to give almost all relations. When the $z_i$ run through a finite subset $S$ of $\mathbb C$, the Hodge correlators describe the real mixed Hodge-Tate structure on the pronilpotent completion of the fundamental group $\pi_1^{\rm nil}(\mathbb{CP}^1-(S\cup\{\infty\}),v_\infty)$, a Lie algebra in the category of mixed $\mathbb Q$-Hodge-Tate structures. The Hodge correlators are lifted to canonical elements ${\rm Cor_{Hod}}(z_0,\dots,z_n)$ in the Tannakian Lie coalgebra of this category. We prove that these elements satisfy the second shuffle relations. Let $S\subset\overline{\mathbb Q}$. The pronilpotent fundamental group is the Betti realization of the motivic fundamental group, a Lie algebra in the category of mixed Tate motives over $\overline{\mathbb Q}$. The Hodge correlators are lifted to elements ${\rm Cor_{Mot}}(z_0,\dots,z_n)$ in its Tannakian Lie coalgebra $\rm Lie_{MT}^\vee$. We prove the second shuffle relations for these motivic elements. The universal enveloping algebra of $\rm Lie_{MT}^\vee$ was described by Goncharov via motivic multiple polylogarithms, which obey a similar yet different set of double shuffle relations. Motivic correlators have several advantages: they obey dihedral symmetry relations at all points, not only at roots of unity; they are defined for any curve, and the double shuffle relations admit a generalization to elliptic curves; and they describe elements of the motivic Lie coalgebra rather than its universal enveloping algebra.
