Researcher profile

Sejun Park

Sejun Park contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
4 topics
4 close collaborators

Published work

6 published items

Preprint, 2026, arXiv

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm $D^\mathtt{AD}$. We first show that given a floating-point function $φ$ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network $f$ and $D^\mathtt{AD}(φ\circ f)$, respectively. We further extend this result: given $φ_1,\dots,φ_n$, $D^\mathtt{AD}(φ_i\circ f)$ can simultaneously represent arbitrary gradients while $f$ represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Sigmoid}$, and $\mathrm{tanh}$.

Preprint, 2021, arXiv

Learning Bounds for Risk-sensitive Learning

In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss, instead of the standard expected loss. In this paper, we propose to study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents (OCE): our general scheme can handle various known risks, e.g., the entropic risk, mean-variance, and conditional value-at-risk, as special cases. We provide two learning bounds on the performance of empirical OCE minimizer. The first result gives an OCE guarantee based on the Rademacher average of the hypothesis space, which generalizes and improves existing results on the expected loss and the conditional value-at-risk. The second result, based on a novel variance-based characterization of OCE, gives an expected loss guarantee with a suppressed dependence on the smoothness of the selected OCE. Finally, we demonstrate the practical implications of the proposed bounds via exploratory experiments on neural networks.
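As a rough illustration of the OCE family mentioned in this abstract, the sketch below assumes the standard definition OCE(X) = min over λ of λ + E[φ(X − λ)], which is not quoted in this listing. The function names and the toy losses are hypothetical. With φ(t) = max(t, 0) / (1 − α) the value is the conditional value-at-risk, and with φ(t) = t it reduces to the ordinary expected loss.

```python
def empirical_oce(losses, phi):
    """Empirical OCE: minimum over lam of lam + mean(phi(loss - lam)).

    Assumption: lam is searched only over the sample points. That is exact
    for the piecewise-linear CVaR objective, but only a coarse grid search
    for a general phi.
    """
    n = len(losses)
    return min(lam + sum(phi(x - lam) for x in losses) / n for lam in losses)


def cvar_phi(alpha):
    """phi for conditional value-at-risk at level alpha in [0, 1)."""
    return lambda t: max(t, 0.0) / (1.0 - alpha)


losses = [1.0, 2.0, 3.0, 4.0]  # hypothetical per-sample losses

cvar_half = empirical_oce(losses, cvar_phi(0.5))   # mean of the worst half
expected = empirical_oce(losses, lambda t: t)      # plain expected loss

print(cvar_half, expected)
```

On this toy sample the CVaR at level 0.5 equals the average of the two largest losses, which is larger than the plain mean, as a risk-averse measure should be.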

Preprint, 2020, arXiv

Electron spin relaxations of phosphorus donors in bulk silicon under large electric field

Modulation of donor electron wavefunction via electric fields is vital to quantum computing architectures based on donor spins in silicon. For practical and scalable applications, the donor-based qubits must retain sufficiently long coherence times in any realistic experimental conditions. Here, we present pulsed electron spin resonance studies on the longitudinal $(T_1)$ and transverse $(T_2)$ relaxation times of phosphorus donors in bulk silicon with various electric field strengths up to near avalanche breakdown in high magnetic fields of about 1.2 T and low temperatures of about 8 K. We find that the $T_1$ relaxation time is significantly reduced under large electric fields due to electric current, and $T_2$ is affected as the $T_1$ process can dominate decoherence. Furthermore, we show that the magnetoresistance effect in silicon can be exploited as a means to combat the reduction in the coherence times. While qubit coherence times must be much longer than quantum gate times, electrically accelerated $T_1$ can be found useful when qubit state initialization relies on thermal equilibration.

Preprint, 2020, arXiv

Learning with End-Users in Distribution Grids: Topology and Parameter Estimation

Efficient operation of distribution grids in the smart-grid era is hindered by the limited presence of real-time nodal and line meters. In particular, this prevents the easy estimation of grid topology and associated line parameters that are necessary for control and optimization efforts in the grid. This paper studies the problems of topology and parameter estimation in radial balanced distribution grids where measurements are restricted to only the leaf nodes and all intermediate nodes are unobserved/hidden. To this end, we propose two exact learning algorithms that use balanced voltage and injection measured only at the end-users. The first algorithm requires time-stamped voltage samples, statistics of nodal power injections and permissible line impedances to recover the true topology. The second and improved algorithm requires only time-stamped voltage and complex power samples to recover both the true topology and impedances without any additional input (e.g., number of grid nodes, statistics of injections at hidden nodes, permissible line impedances). We prove the correctness of both learning algorithms for grids where unobserved buses/nodes have a degree greater than three and discuss extensions to regimes where that assumption doesn't hold. Further, we present computational and, more importantly, the sample complexity of our proposed algorithm for joint topology and impedance estimation. We illustrate the performance of the designed algorithms through numerical experiments on the IEEE and custom power distribution models.

Preprint, 2020, arXiv

Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning

Magnitude-based pruning is one of the simplest methods for pruning neural networks. Despite its simplicity, magnitude-based pruning and its variants have demonstrated remarkable performance in pruning modern architectures. Based on the observation that magnitude-based pruning minimizes the Frobenius distortion of a linear operator corresponding to a single layer, we develop a simple pruning method, coined lookahead pruning, by extending the single-layer optimization to a multi-layer optimization. Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks, including VGG and ResNet, particularly in the high-sparsity regime. See https://github.com/alinlab/lookahead_pruning for the code.
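The single-layer observation in this abstract can be checked on a toy example. The sketch below is not the lookahead method and is not taken from the linked repository. It only compares plain magnitude pruning against a brute-force search over every way of zeroing the same number of weights, using hypothetical helper names and a hypothetical flattened layer.

```python
import itertools


def magnitude_prune(weights, k):
    """Zero out the k entries with the smallest absolute value."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def frobenius_distortion(original, pruned):
    """Squared Frobenius distance between the original and pruned weights."""
    return sum((a - b) ** 2 for a, b in zip(original, pruned))


def brute_force_min_distortion(weights, k):
    """Smallest distortion over every choice of exactly k weights to zero."""
    best = float("inf")
    for drop in itertools.combinations(range(len(weights)), k):
        pruned = [0.0 if i in drop else w for i, w in enumerate(weights)]
        best = min(best, frobenius_distortion(weights, pruned))
    return best


weights = [0.9, -0.1, 0.4, -1.3, 0.05, 0.7]  # hypothetical flattened layer
pruned = magnitude_prune(weights, k=3)

print(pruned)
print(frobenius_distortion(weights, pruned), brute_force_min_distortion(weights, 3))
```

Both distortions agree here because removing the smallest-magnitude entries removes the smallest squared terms from the sum. The brute-force search is exponential in the number of weights and is only usable on toy inputs like this one.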

Preprint, 2020, arXiv

Minimum Width for Universal Approximation

The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. However, the critical width enabling the universal approximation has not been exactly characterized in terms of the input dimension $d_x$ and the output dimension $d_y$. In this work, we provide the first definitive result in this direction for networks using the ReLU activation function: the minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$. We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function. Our proof technique can also be used to derive a tighter upper bound on the minimum width required for the universal approximation using networks with general activation functions.
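To make the width formula in this abstract concrete, the following sketch simply evaluates $\max\{d_x+1,d_y\}$ for a few dimension pairs. The function name and the example dimensions are hypothetical, and the formula covers only the ReLU, $L^p$ setting stated above.

```python
def min_width_lp_relu(d_x: int, d_y: int) -> int:
    """Minimum width max(d_x + 1, d_y) stated in the abstract for ReLU
    networks approximating L^p functions; other settings are not covered."""
    if d_x < 1 or d_y < 1:
        raise ValueError("dimensions must be positive integers")
    return max(d_x + 1, d_y)


# Hypothetical (input dimension, output dimension) pairs.
for d_x, d_y in [(1, 1), (3, 10), (784, 10)]:
    print(d_x, d_y, min_width_lp_relu(d_x, d_y))
```

The input dimension sets the width whenever $d_y \le d_x + 1$. The output dimension matters only when it exceeds $d_x + 1$.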