Source author record

Nikita Doikov

Nikita Doikov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Machine Learning, Artificial Intelligence, q-bio.MN, Quantitative Methods

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

PACER: Acyclic Causal Discovery from Large-Scale Interventional Data

Inferring the structure of directed acyclic graphs (DAGs) from data is a central challenge in causal discovery, particularly in modern high-dimensional settings where large-scale interventional data are increasingly available. While interventional data can improve identifiability, existing methods remain limited by soft acyclicity constraints, leading to optimization over invalid cyclic graphs, numerical instability, and reduced scalability. We introduce PACER (Perturbation-driven Acyclic Causal Edge Recovery), a scalable framework for causal discovery that guarantees acyclicity by construction. PACER parameterizes a distribution over DAGs through a joint model of variable permutations and edge probabilities, enabling direct optimization over valid causal structures without surrogate penalties. The framework supports a unified likelihood-based treatment of observational and interventional data, flexible conditional density models, and the incorporation of structural prior knowledge. For linear-Gaussian mechanisms, we derive closed-form expressions for the expected interventional log-likelihood and its gradients, yielding substantial computational gains. Empirically, PACER matches or exceeds state-of-the-art methods on protein signaling and large-scale genetic perturbation benchmarks, while scaling efficiently to networks with thousands of variables and achieving up to two orders of magnitude speedups over penalty-based differentiable approaches. These results demonstrate that exact and scalable causal discovery from high-dimensional perturbation data is achievable through principled search space design.
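The idea of guaranteeing acyclicity by construction can be illustrated with a small sketch. It is not the PACER implementation: the abstract does not specify how the permutation distribution is parameterized, so the Gumbel-perturbed scores (a Plackett-Luce sample) and the names `sample_dag`, `perm_scores` and `edge_probs` are assumptions made for illustration only.

```python
import numpy as np

def sample_dag(perm_scores, edge_probs, rng):
    """Sample a 0/1 adjacency matrix that is acyclic by construction.

    perm_scores: hypothetical per-variable scores defining a distribution over orderings.
    edge_probs:  hypothetical d x d matrix of independent edge probabilities.
    """
    d = len(perm_scores)
    # Sort Gumbel-perturbed scores to draw a random ordering (Plackett-Luce sample).
    order = np.argsort(-(perm_scores + rng.gumbel(size=d)))
    # rank[v] is the position of variable v in the sampled ordering.
    rank = np.empty(d, dtype=int)
    rank[order] = np.arange(d)
    # An edge i -> j is allowed only if i precedes j, so no cycle can form.
    allowed = rank[:, None] < rank[None, :]
    edges = rng.random((d, d)) < edge_probs
    return (allowed & edges).astype(int)

def is_acyclic(adj):
    """A 0/1 adjacency matrix on d nodes is acyclic iff its d-th power is zero."""
    d = adj.shape[0]
    return not np.linalg.matrix_power(adj, d).any()

rng = np.random.default_rng(0)
d = 6
samples = [sample_dag(np.zeros(d), np.full((d, d), 0.5), rng) for _ in range(20)]
```

Every sample is a valid DAG regardless of the edge probabilities, which is the property that removes the need for a surrogate acyclicity penalty.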

Preprint, 2022, arXiv

Lower Complexity Bounds for Minimizing Regularized Functions

In this paper, we establish lower bounds for the oracle complexity of first-order methods minimizing regularized convex functions. We consider the composite representation of the objective. The smooth part has a Hölder continuous gradient of degree $\nu \in [0, 1]$ and is accessible by a black-box local oracle. The composite part is a power of a norm. We prove that the best possible rate for first-order methods in the large-scale setting for Euclidean norms is of the order $O(k^{-p(1 + 3\nu) / (2(p - 1 - \nu))})$ for the functional residual, where $k$ is the iteration counter and $p$ is the power of regularization. Our formulation covers several cases, including computation of the cubically regularized Newton step by first-order gradient methods, in which case the rate becomes $O(k^{-6})$. This rate can be achieved by the Fast Gradient Method; thus, our result proves it to be optimal. We also establish lower complexity bounds for non-Euclidean norms.
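As a quick check of the special case quoted in the abstract, one can substitute the parameters of the cubic Newton subproblem into the general exponent. The reading $p = 3$ (cubic regularizer) and $\nu = 1$ (Lipschitz continuous gradient of the quadratic part) is an assumption inferred from the abstract rather than stated in it.

```latex
\[
\left. \frac{p(1 + 3\nu)}{2(p - 1 - \nu)} \right|_{p = 3,\ \nu = 1}
= \frac{3 \cdot (1 + 3)}{2 \cdot (3 - 1 - 1)}
= \frac{12}{2}
= 6 ,
\]
```

which agrees with the stated $O(k^{-6})$ rate.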

Preprint, 2022, arXiv

Super-Universal Regularized Newton Method

We analyze the performance of a variant of Newton's method with quadratic regularization for solving composite convex minimization problems. At each step of our method, we choose the regularization parameter proportional to a certain power of the gradient norm at the current point. We introduce a family of problem classes characterized by Hölder continuity of either the second or third derivative. Then we present the method with a simple adaptive search procedure allowing an automatic adjustment to the problem class with the best global complexity bounds, without knowing specific parameters of the problem. In particular, for the class of functions with Lipschitz continuous third derivative, we get the global $O(1/k^3)$ rate, which was previously attributed to third-order tensor methods. When the objective function is uniformly convex, we justify an automatic acceleration of our scheme, resulting in a faster global rate and local superlinear convergence. The switching between the different rates (sublinear, linear, and superlinear) is automatic and, again, requires no a priori knowledge of the parameters.
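The basic step described above, a Newton step with quadratic regularization proportional to a power of the gradient norm, can be sketched as follows. This is a simplified illustration, not the method from the paper: the constant `c` and exponent `alpha` are fixed here by assumption, whereas the abstract describes an adaptive search that selects the regularization automatically, and the composite part is omitted.

```python
import numpy as np

def regularized_newton(grad, hess, x0, c=1.0, alpha=0.5, iters=50):
    """Newton iterations with regularization lam = c * ||grad||**alpha.

    c and alpha are hypothetical fixed parameters chosen for illustration.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-12:
            break
        # Regularization shrinks with the gradient, so steps approach pure Newton.
        lam = c * gnorm ** alpha
        x = x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
    return x

# Toy convex objective: f(x) = sum(x**4) / 4 + ||x||^2 / 2 - b.x
b_vec = np.array([1.0, -2.0, 3.0])
grad_toy = lambda x: x**3 + x - b_vec
hess_toy = lambda x: np.diag(3.0 * x**2 + 1.0)
x_newton = regularized_newton(grad_toy, hess_toy, np.zeros(3))
```

Because the regularization vanishes as the gradient does, the iteration behaves like a damped method far from the solution and like Newton's method near it.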

Preprint, 2020, arXiv

Affine-invariant contracting-point methods for Convex Optimization

In this paper, we develop new affine-invariant algorithms for solving composite convex minimization problems with bounded domain. We present a general framework of Contracting-Point methods, which solve at each iteration an auxiliary subproblem restricting the smooth part of the objective function onto a contraction of the initial domain. This framework provides us with a systematic way of developing optimization methods of different order, endowed with global complexity bounds. We show that, using an appropriate affine-invariant smoothness condition, it is possible to implement one iteration of the Contracting-Point method by one step of the pure tensor method of degree $p \geq 1$. The resulting global rate of convergence in functional residual is then $\mathcal{O}(1 / k^p)$, where $k$ is the iteration counter. It is important that all constants in our bounds are affine-invariant. For $p = 1$, our scheme recovers the well-known Frank-Wolfe algorithm, providing it with a new interpretation from the general perspective of tensor methods. Finally, within our framework, we present an efficient implementation and total complexity analysis of the inexact second-order scheme $(p = 2)$, called the Contracting Newton method. It can be seen as a proper implementation of the trust-region idea. Preliminary numerical results confirm its good practical performance both in the number of iterations and in computational time.
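For the $p = 1$ case, the classical Frank-Wolfe algorithm can be sketched on a simple bounded domain. This is the textbook method with the standard step size $2/(k+2)$, not the contracting-point framework itself; the probability simplex as the domain and the quadratic test objective are assumptions chosen for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=2000):
    """Classical Frank-Wolfe over the probability simplex."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(iters):
        g = grad(x)
        # Linear minimization over the simplex picks the vertex with the smallest gradient entry.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Convex combination keeps the iterate inside the domain.
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy objective: f(x) = ||x - target||^2 / 2 with the target inside the simplex.
target = np.array([0.2, 0.5, 0.3])
x_fw = frank_wolfe_simplex(lambda x: x - target, np.array([1.0, 0.0, 0.0]))
```

Each iteration needs only a linear minimization over the domain, and the iterates stay feasible without any projection.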

Preprint, 2020, arXiv

Stochastic Subspace Cubic Newton Method

In this paper, we propose a new randomized second-order optimization algorithm, Stochastic Subspace Cubic Newton (SSCN), for minimizing a high-dimensional convex function $f$. Our method can be seen both as a stochastic extension of the cubically-regularized Newton method of Nesterov and Polyak (2006), and as a second-order enhancement of stochastic subspace descent of Kozak et al. (2019). We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of stochastic coordinate descent (CD) and the rate of cubic regularized Newton, thus giving new insights into the connection between first and second-order methods. Remarkably, the local convergence rate of SSCN matches the rate of stochastic subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$, and hence depends on the properties of $f$ at the optimum only. Our numerical experiments show that SSCN outperforms non-accelerated first-order CD algorithms while being competitive with their accelerated variants.
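A minimal sketch of the idea, under stated assumptions, is given below: at each iteration a random subset of coordinates is drawn and a cubically regularized Newton step is taken in that subspace. The function names, the bisection solver for the cubic subproblem, the uniform coordinate sampling, and the constant `M` (assumed to upper-bound the Lipschitz constant of the Hessian) are illustrative choices, not details taken from the paper. For brevity the full gradient and Hessian are evaluated and then sliced, which a practical implementation would avoid.

```python
import numpy as np

def cubic_step(g, H, M):
    """Minimize g.h + 0.5 * h.H.h + (M / 6) * ||h||^3 for positive semidefinite H."""
    if np.linalg.norm(g) == 0.0:
        return np.zeros_like(g)
    eye = np.eye(g.size)
    step = lambda r: -np.linalg.solve(H + 0.5 * M * r * eye, g)
    # The minimizer satisfies h = step(r) with r = ||h||.
    # ||step(r)|| - r is decreasing in r, so the root can be bracketed and bisected.
    lo, hi = 0.0, 1.0
    while np.linalg.norm(step(hi)) > hi:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return step(hi)

def sscn(grad, hess, x0, M, tau, iters, rng):
    """Cubic Newton steps restricted to tau randomly chosen coordinates."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        S = rng.choice(x.size, size=tau, replace=False)
        g_S = grad(x)[S]               # gradient restricted to the subspace
        H_S = hess(x)[np.ix_(S, S)]    # Hessian block for the subspace
        x[S] += cubic_step(g_S, H_S, M)
    return x

# Toy convex objective: sum(softplus(x)) + 0.5 * x.A.x - b.x
A = np.diag(np.arange(1.0, 7.0)) + 0.5
b = np.ones(6)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
grad_f = lambda x: sigmoid(x) + A @ x - b
hess_f = lambda x: np.diag(sigmoid(x) * (1.0 - sigmoid(x))) + A
x_sscn = sscn(grad_f, hess_f, np.zeros(6), M=1.0, tau=3, iters=2000,
              rng=np.random.default_rng(0))
```

Setting `tau` to the full dimension makes every step a full cubic Newton step, while `tau = 1` gives a coordinate-wise method, which mirrors the interpolation described in the abstract.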
