Source author record

Sejun Park

Sejun Park appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, cond-mat.mtrl-sci, Data Structures and Algorithms, physics.app-ph, Systems and Control

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators


preprint · 2026 · arXiv

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm $D^\mathtt{AD}$. We first show that given a floating-point function $φ$ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network $f$ and $D^\mathtt{AD}(φ\circ f)$, respectively. We further extend this result: given $φ_1,\dots,φ_n$, $D^\mathtt{AD}(φ_i\circ f)$ can simultaneously represent arbitrary gradients while $f$ represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Sigmoid}$, and $\mathrm{tanh}$.
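The premise of the abstract above, that machine arithmetic works on a finite set of numbers with round-off, can be illustrated with a minimal sketch. It assumes Python floats are IEEE 754 binary64 with round-to-nearest-even (true on common platforms); the helper name `sum_left_to_right` is hypothetical and is not taken from the paper.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next larger float
# (2**-52 under the binary64 assumption stated above).
EPS = sys.float_info.epsilon


def sum_left_to_right(values):
    """Add the values strictly in order, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


half_ulp = EPS / 2

# (1 + h) + h: each addition lands halfway between two floats and
# rounds back to 1.0, so both small terms are lost.
lost = sum_left_to_right([1.0, half_ulp, half_ulp])

# (h + h) + 1: the small terms combine exactly first, so they survive.
kept = sum_left_to_right([half_ulp, half_ulp, 1.0])

print(lost == kept)
```

The two sums differ even though they are equal over the reals, which is why results stated for exact real arithmetic do not transfer automatically to floating-point networks.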

preprint · 2021 · arXiv

Learning Bounds for Risk-sensitive Learning

In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss, instead of the standard expected loss. In this paper, we propose to study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents (OCE): our general scheme can handle various known risks, e.g., the entropic risk, mean-variance, and conditional value-at-risk, as special cases. We provide two learning bounds on the performance of the empirical OCE minimizer. The first result gives an OCE guarantee based on the Rademacher average of the hypothesis space, which generalizes and improves existing results on the expected loss and the conditional value-at-risk. The second result, based on a novel variance-based characterization of OCE, gives an expected loss guarantee with a suppressed dependence on the smoothness of the selected OCE. Finally, we demonstrate the practical implications of the proposed bounds via exploratory experiments on neural networks.
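As a rough illustration of the OCE family mentioned above, the sketch below uses the standard form $\mathrm{OCE}_\phi(X) = \inf_\lambda \{\lambda + \mathbb{E}[\phi(X-\lambda)]\}$ on an empirical sample. The function names and the toy losses are assumptions made for illustration; this is not the paper's code. CVaR uses $\phi(t) = \max(t,0)/(1-\alpha)$, and the entropic risk uses $\phi(t) = (e^{\gamma t}-1)/\gamma$.

```python
import math


def empirical_oce(losses, phi, candidates):
    """Minimize lam + mean(phi(loss - lam)) over the given candidate lams."""
    n = len(losses)
    return min(lam + sum(phi(x - lam) for x in losses) / n for lam in candidates)


def cvar(losses, alpha):
    """Empirical conditional value-at-risk at level alpha, with 0 <= alpha < 1.

    The objective is piecewise linear and convex in lam, so searching
    over the sample points is enough to find its minimum.
    """
    return empirical_oce(losses, lambda t: max(t, 0.0) / (1.0 - alpha), losses)


def entropic_risk(losses, gamma):
    """Closed form of the entropic OCE: log(mean(exp(gamma * x))) / gamma."""
    return math.log(sum(math.exp(gamma * x) for x in losses) / len(losses)) / gamma


toy_losses = [1.0, 2.0, 3.0, 4.0]      # hypothetical per-example losses
print(cvar(toy_losses, alpha=0.5))      # mean of the worst half of the losses
print(entropic_risk(toy_losses, gamma=1.0))
```

Both quantities are at least the plain mean of the losses, which is the sense in which they are risk-averse.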

preprint · 2020 · arXiv

Electron spin relaxations of phosphorus donors in bulk silicon under large electric field

Modulation of donor electron wavefunction via electric fields is vital to quantum computing architectures based on donor spins in silicon. For practical and scalable applications, the donor-based qubits must retain sufficiently long coherence times in any realistic experimental conditions. Here, we present pulsed electron spin resonance studies on the longitudinal $(T_1)$ and transverse $(T_2)$ relaxation times of phosphorus donors in bulk silicon with various electric field strengths up to near avalanche breakdown in high magnetic fields of about 1.2 T and low temperatures of about 8 K. We find that the $T_1$ relaxation time is significantly reduced under large electric fields due to electric current, and $T_2$ is affected as the $T_1$ process can dominate decoherence. Furthermore, we show that the magnetoresistance effect in silicon can be exploited as a means to combat the reduction in the coherence times. While qubit coherence times must be much longer than quantum gate times, electrically accelerated $T_1$ can be found useful when qubit state initialization relies on thermal equilibration.

preprint · 2020 · arXiv

Learning with End-Users in Distribution Grids: Topology and Parameter Estimation

Efficient operation of distribution grids in the smart-grid era is hindered by the limited presence of real-time nodal and line meters. In particular, this prevents the easy estimation of grid topology and associated line parameters that are necessary for control and optimization efforts in the grid. This paper studies the problems of topology and parameter estimation in radial balanced distribution grids where measurements are restricted to only the leaf nodes and all intermediate nodes are unobserved/hidden. To this end, we propose two exact learning algorithms that use balanced voltage and injection measured only at the end-users. The first algorithm requires time-stamped voltage samples, statistics of nodal power injections and permissible line impedances to recover the true topology. The second and improved algorithm requires only time-stamped voltage and complex power samples to recover both the true topology and impedances without any additional input (e.g., number of grid nodes, statistics of injections at hidden nodes, permissible line impedances). We prove the correctness of both learning algorithms for grids where unobserved buses/nodes have a degree greater than three and discuss extensions to regimes where that assumption doesn't hold. Further, we present computational and, more importantly, the sample complexity of our proposed algorithm for joint topology and impedance estimation. We illustrate the performance of the designed algorithms through numerical experiments on the IEEE and custom power distribution models.

preprint · 2020 · arXiv

Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning

Magnitude-based pruning is one of the simplest methods for pruning neural networks. Despite its simplicity, magnitude-based pruning and its variants have demonstrated remarkable performance in pruning modern architectures. Based on the observation that magnitude-based pruning indeed minimizes the Frobenius distortion of a linear operator corresponding to a single layer, we develop a simple pruning method, coined lookahead pruning, by extending the single layer optimization to a multi-layer optimization. Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks, including VGG and ResNet, particularly in the high-sparsity regime. See https://github.com/alinlab/lookahead_pruning for the code.
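The single-layer observation in the abstract above can be checked on a toy example. The sketch below is an assumption-laden illustration, not the lookahead method or the repository's code: it treats a layer as a flat list of weights, prunes the smallest magnitudes, and compares the resulting Frobenius distortion against an exhaustive search over all masks of the same size. All names and the example weights are hypothetical.

```python
import itertools
import math


def frobenius_distortion(weights, pruned):
    """Frobenius norm of the difference between original and pruned weights."""
    return math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, pruned)))


def magnitude_prune(weights, num_pruned):
    """Zero out the num_pruned entries with the smallest absolute value."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:num_pruned])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def best_distortion_by_search(weights, num_pruned):
    """Smallest distortion over every possible choice of pruned entries."""
    best = math.inf
    for drop in itertools.combinations(range(len(weights)), num_pruned):
        pruned = [0.0 if i in drop else w for i, w in enumerate(weights)]
        best = min(best, frobenius_distortion(weights, pruned))
    return best


layer = [0.5, -2.0, 0.1, 1.5, -0.3, 0.8]   # hypothetical flattened layer weights
pruned_layer = magnitude_prune(layer, num_pruned=3)
print(pruned_layer)
print(frobenius_distortion(layer, pruned_layer), best_distortion_by_search(layer, 3))
```

The exhaustive search is only feasible for tiny layers; it is there to confirm the optimality claim, not as a practical pruning routine.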

preprint · 2020 · arXiv

Minimum Width for Universal Approximation

The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. However, the critical width enabling the universal approximation has not been exactly characterized in terms of the input dimension $d_x$ and the output dimension $d_y$. In this work, we provide the first definitive result in this direction for networks using the ReLU activation function: the minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$. We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function. Our proof technique can also be used to derive a tighter upper bound on the minimum width required for the universal approximation using networks with general activation functions.

preprint · 2015 · arXiv

Minimum Weight Perfect Matching via Blossom Belief Propagation

Max-product Belief Propagation (BP) is a popular message-passing algorithm for computing a Maximum-A-Posteriori (MAP) assignment over a distribution represented by a Graphical Model (GM). It has been shown that BP can solve a number of combinatorial optimization problems including minimum weight matching, shortest path, network flow and vertex cover under the following common assumption: the respective Linear Programming (LP) relaxation is tight, i.e., no integrality gap is present. However, when LP shows an integrality gap, no model has been known which can be solved systematically via sequential applications of BP. In this paper, we develop the first such algorithm, coined Blossom-BP, for solving the minimum weight matching problem over arbitrary graphs. Each step of the sequential algorithm requires applying BP over a modified graph constructed by contractions and expansions of blossoms, i.e., odd sets of vertices. Our scheme guarantees termination in $O(n^2)$ BP runs, where $n$ is the number of vertices in the original graph. In essence, Blossom-BP offers a distributed version of Edmonds' celebrated Blossom algorithm by jumping at once over many sub-steps with a single BP. Moreover, our result provides an interpretation of Edmonds' algorithm as a sequence of LPs.
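The integrality gap mentioned above can be seen on a small graph. The sketch below is neither Blossom-BP nor Edmonds' algorithm: it uses an exhaustive search as a reference and a hand-picked fractional point, and the graph, weights and function names are all hypothetical. Two triangles are joined by one expensive edge, so any perfect matching must use that edge, while a half-integral point on the triangles satisfies the degree constraints at a lower cost.

```python
def min_weight_perfect_matching(vertices, weight):
    """Exhaustive reference search; returns (cost, edges) or (inf, None).

    vertices must be sorted, and weight maps (u, v) with u < v to a cost.
    """
    if not vertices:
        return 0.0, []
    first, rest = vertices[0], vertices[1:]
    best_cost, best_edges = float("inf"), None
    for partner in rest:
        edge = (first, partner)
        if edge not in weight:
            continue
        remaining = [v for v in rest if v != partner]
        cost, edges = min_weight_perfect_matching(remaining, weight)
        if edges is not None and weight[edge] + cost < best_cost:
            best_cost, best_edges = weight[edge] + cost, [edge] + edges
    return best_cost, best_edges


def satisfies_degree_constraints(vertices, x):
    """Check x >= 0 and that the values on edges at each vertex sum to 1."""
    if any(val < 0 for val in x.values()):
        return False
    return all(
        abs(sum(val for (u, v), val in x.items() if node in (u, v)) - 1.0) < 1e-9
        for node in vertices
    )


nodes = [0, 1, 2, 3, 4, 5]
triangle_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
weights = {edge: 1 for edge in triangle_edges}
weights[(2, 3)] = 5                                  # bridge between the triangles

half_point = {edge: 0.5 for edge in triangle_edges}  # fractional LP-feasible point
half_cost = sum(weights[e] * val for e, val in half_point.items())

integral_cost, matching = min_weight_perfect_matching(nodes, weights)
print(integral_cost, matching, half_cost)
```

Because the fractional point is feasible for the relaxation with only degree constraints, the LP optimum is at most its cost, which is strictly below the best integral matching. Constraints on odd vertex sets (blossoms) are what rule such points out.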
