Source author record

Sungho Shin

Sungho Shin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.OC, eess.SY, Neural and Evolutionary Computing, Systems and Control, Computer Vision, Computation and Language, Robotics, Sound

Catalog footprint

What is connected

21 works

9 topics

4 close collaborators


preprint, 2026, arXiv

Improved Approximation Bounds for Moore-Penrose Inverses of Banded Matrices with Applications to Continuous-Time Linear Quadratic Control

We present improved approximation bounds for the Moore-Penrose inverses of banded matrices, where the bandedness is induced by a metric on the index set. We show that the pseudoinverse of a banded matrix can be approximated by another banded matrix, and that the approximation error is exponentially small in the ratio of the bandwidth of the approximation to that of the original matrix. An intuitive corollary follows: the off-diagonal blocks of the pseudoinverse decay exponentially with the distance, on the given metric space, between the node sets associated with the row and column indices. Our bounds are expressed in terms of bounds on the singular values of the system. For saddle point systems, commonly encountered in optimization, we provide the associated singular value bounds under standard regularity conditions. Remarkably, our bounds improve on previously reported ones and allow us to establish a perturbation bound for continuous-domain optimal control problems by analyzing the asymptotic limit of their finite difference discretization, which has been challenging with previously reported bounds.
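The decay property stated in this abstract can be checked numerically on a toy case. The sketch below inverts a small tridiagonal matrix and looks at how the first row of the inverse shrinks away from the diagonal. The test matrix tridiag(-1, 4, -1), its size, and all helper names are assumptions chosen for illustration; the matrix is invertible, so its pseudoinverse coincides with the ordinary inverse. This is only a sanity check of the qualitative behavior, not the paper's bound or method.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def tridiagonal(n, diag=4.0, off=-1.0):
    """Dense n x n matrix with bandwidth 1 (hypothetical test matrix)."""
    return [
        [diag if i == j else off if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def inverse(a):
    """Invert a by solving against each unit vector."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    # cols[j] is column j of the inverse; transpose into rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


N = 12
INV = inverse(tridiagonal(N))
FIRST_ROW = INV[0]
# Ratio of consecutive entries when moving one step away from the diagonal.
RATIOS = [FIRST_ROW[j + 1] / FIRST_ROW[j] for j in range(N - 1)]
print(RATIOS)
```

For this particular matrix each step away from the diagonal shrinks the entry by a roughly constant factor, which is the geometric (exponential) decay the abstract describes in a far more general setting.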

preprint, 2023, arXiv

Parallel Interior-Point Solver for Block-Structured Nonlinear Programs on SIMD/GPU Architectures

We investigate how to port the standard interior-point method to new exascale architectures for block-structured nonlinear programs with state equations. Computationally, we decompose the interior-point algorithm into two successive operations: the evaluation of the derivatives and the solution of the associated Karush-Kuhn-Tucker (KKT) linear system. Our method accelerates both operations using two levels of parallelism. First, we distribute the computations on multiple processes using coarse parallelism. Second, each process uses a SIMD/GPU accelerator locally to accelerate the operations using fine-grained parallelism. The KKT system is reduced by eliminating the inequalities and the state variables from the corresponding equations, to a dense matrix encoding the sensitivities of the problem's degrees of freedom, drastically minimizing the memory exchange. We demonstrate the method's capability on the supercomputer Polaris, a testbed for the future exascale Aurora system. Each node is equipped with four GPUs, a setup amenable to our two-level approach. Our experiments on the stochastic optimal power flow problem show that the method can achieve a 50x speed-up compared to the state-of-the-art method.

preprint, 2021, arXiv

Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN

Understanding assembly instructions has the potential to enhance a robot's task planning ability and enable advanced robotic applications. To recognize the key components in a 2D assembly instruction image, we mainly focus on segmenting the speech bubble area, which contains much of the information about the instructions. For this, we applied Cascade Mask R-CNN and developed a context-aware data augmentation scheme for speech bubble segmentation, which randomly combines image cuts by considering the context of assembly instructions. We showed that the proposed augmentation scheme achieves better segmentation performance than the existing augmentation algorithm by increasing the diversity of trainable data while considering the distribution of component locations. We also showed that deep learning can be useful for understanding assembly instructions by detecting the essential objects in them, such as tools and parts.

preprint, 2020, arXiv

A Hierarchical Optimization Architecture for Large-Scale Power Networks

We present a hierarchical optimization architecture for large-scale power networks that overcomes limitations of fully centralized and fully decentralized architectures. The architecture leverages principles of multigrid computing schemes, which are widely used in the solution of partial differential equations on massively parallel computers. The top layer of the architecture uses a coarse representation of the entire network while the bottom layer is composed of a family of decentralized optimization agents each operating on a network subdomain at full resolution. We use an alternating direction method of multipliers (ADMM) framework to drive coordination of the decentralized agents. We show that state and dual information obtained from the top layer can be used to accelerate the coordination of the decentralized optimization agents and to recover optimality for the entire system. We demonstrate that the hierarchical architecture can be used to manage large collections of microgrids.

preprint, 2020, arXiv

Characterizing the Predictive Accuracy of Dynamic Mode Decomposition for Data-Driven Control

Dynamic mode decomposition (DMD) is a versatile approach that enables the construction of low-order models from data. Controller design tasks based on such models require estimates of and guarantees on predictive accuracy. In this work, we provide a theoretical analysis of DMD model errors that reveals the impact of model order and data availability. The analysis also establishes conditions under which DMD models can be made asymptotically exact. We verify our results using a 2D diffusion system.
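The regression at the core of DMD can be sketched in a few lines for a two-state system: given snapshots x_0, ..., x_T, fit the matrix A that best maps each snapshot to the next in the least-squares sense, A = Y Xᵀ (X Xᵀ)⁻¹. The example system, the helper names, and the absence of rank truncation are assumptions made for illustration; the paper's error analysis is not reproduced here.

```python
def matmul(a, b):
    """Plain dense matrix product for lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(r) for r in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def simulate(a, x0, steps):
    """Return snapshots x_0..x_steps as the columns of a 2 x (steps + 1) matrix."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append([a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]])
    return transpose(xs)


def fit_linear_model(snapshots):
    """Least-squares fit of A in x_{k+1} = A x_k (full rank, no truncation)."""
    x = [row[:-1] for row in snapshots]  # x_0 .. x_{T-1}
    y = [row[1:] for row in snapshots]   # x_1 .. x_T
    xt = transpose(x)
    return matmul(matmul(y, xt), inv2(matmul(x, xt)))


# Hypothetical linear system used to generate noise-free data.
A_TRUE = [[0.9, 0.1], [0.0, 0.8]]
A_HAT = fit_linear_model(simulate(A_TRUE, [1.0, 1.0], 5))
print(A_HAT)
```

With noise-free data from a linear system and enough independent snapshots, the fitted matrix matches the true one up to rounding, a simple case in which the data-driven model is exact.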

preprint, 2020, arXiv

Computing Economic-Optimal and Stable Equilibria for Droop-Controlled Microgrids

We consider the problem of computing equilibria (steady-states) for droop-controlled, islanded, AC microgrids that are both economic-optimal and dynamically stable. This work is motivated by the observation that classical optimal power flow (OPF) formulations used for economic optimization might provide equilibria that are not reachable by low-level controllers (i.e., closed-loop unstable). This arises because OPF problems only enforce steady-state conditions and do not capture the dynamics. We explain this behavior by using a port-Hamiltonian microgrid representation. To overcome the limitations of OPF, the port-Hamiltonian representation can be exploited to derive a bilevel OPF formulation that seeks to optimize economics while enforcing stability. Unfortunately, bilevel optimization with a nonconvex inner problem is difficult to solve in general. As such, we propose an alternative approach (that we call probing OPF), which identifies an economic-optimal and stable equilibrium by probing a neighborhood of equilibria using random perturbations. The probing OPF is advantageous in that it is formulated as a standard nonlinear program, in that it is compatible with existing OPF frameworks, and in that it is applicable to diverse microgrid models. Experiments with the IEEE 118-bus system reveal that few probing points are required to enforce stability.

preprint, 2020, arXiv

Decentralized Schemes with Overlap for Solving Graph-Structured Optimization Problems

We present a new algorithmic paradigm for the decentralized solution of graph-structured optimization problems that arise in the estimation and control of network systems. A key and novel design concept of the proposed approach is that it uses overlapping subdomains to promote and accelerate convergence. We show that the algorithm converges if the size of the overlap is sufficiently large and that the convergence rate improves exponentially with the size of the overlap. The proposed approach provides a bridge between fully decentralized and centralized architectures and is flexible in that it enables the implementation of asynchronous schemes, handling of constraints, and balancing of computing, communication, and data privacy needs. The proposed scheme is tested in an estimation problem for a 9241-node power network and we show that it outperforms the alternating direction method of multipliers.

preprint, 2020, arXiv

Multi-Grid Schemes for Multi-Scale Coordination of Energy Systems

We discuss how multi-grid computing schemes can be used to design hierarchical coordination architectures for energy systems. These hierarchical architectures can be used to manage multiple temporal and spatial scales and mitigate fundamental limitations of centralized and decentralized architectures. We present the basic elements of a multi-grid scheme, which includes a smoothing operator (a high-resolution decentralized coordination layer that targets phenomena at high frequencies) and a coarsening operator (a low-resolution centralized coordination layer that targets phenomena at low frequencies). For smoothing, we extend existing convergence results for Gauss-Seidel schemes by applying them to systems that cover unstructured domains. This allows us to target problems with multiple timescales and arbitrary networks. The proposed coordination schemes can be used to guide transactions in decentralized electricity markets. We present a storage control example and a power flow diffusion example to illustrate the developments.

preprint, 2020, arXiv

Multiple Classification with Split Learning

Privacy concerns arise when training deep learning models in medical, mobility, and other fields. To address this problem, we present a privacy-preserving distributed deep learning method that allows clients to learn from a variety of data without direct exposure. We divide a single deep learning architecture into a common extractor, a cloud model, and a local classifier for distributed learning. First, the common extractor, which is used by local clients, extracts secure features from the input data. The secure features also allow the cloud model to handle various tasks and diverse types of data, because they contain the most important information needed to carry out those tasks. Second, the cloud model, which includes most of the whole training model, receives the embedded features from the many local clients and performs most of the deep learning operations, which carry a heavy computing cost. After the operations in the cloud model finish, its outputs are sent back to the local clients. Finally, the local classifier determines the classification results and delivers them to the local clients. When clients train models, our model does not directly expose sensitive information to the external network. In testing, the average performance improvement was 2.63% over the existing local training model. However, in a distributed environment, an inversion attack is possible because of the exposed features. For this reason, we experimented with the common extractor to prevent data restoration. The quality of restoration of the original image was tested by adjusting the depth of the common extractor. As a result, we found that as the common extractor became deeper, the restoration score decreased to 89.74.

preprint, 2020, arXiv

On the Convergence of the Dynamic Inner PCA Algorithm

Dynamic inner principal component analysis (DiPCA) is a powerful method for the analysis of time-dependent multivariate data. DiPCA extracts dynamic latent variables that capture the most dominant temporal trends by solving a large-scale, dense, and nonconvex nonlinear program (NLP). A scalable decomposition algorithm has been recently proposed in the literature to solve these challenging NLPs. The decomposition algorithm performs well in practice but its convergence properties are not well understood. In this work, we show that this algorithm is a specialized variant of a coordinate maximization algorithm. This observation allows us to explain why the decomposition algorithm might work (or not) in practice and can guide improvements. We compare the performance of the decomposition strategies with that of the off-the-shelf solver Ipopt. The results show that decomposition is more scalable and, surprisingly, delivers higher quality solutions.

preprint, 2020, arXiv

Overlapping Schwarz Decomposition for Constrained Quadratic Programs

We present an overlapping Schwarz decomposition algorithm for constrained quadratic programs (QPs). Schwarz algorithms have been traditionally used to solve linear algebra systems arising from partial differential equations, but we have recently shown that they are also effective at solving structured optimization problems. In the proposed scheme, we consider QPs whose algebraic structure can be represented by graphs. The graph domain is partitioned into overlapping subdomains (yielding a set of coupled subproblems), solutions for the subproblems are computed in parallel, and convergence is enforced by updating primal-dual information in the overlapping regions. We show that convergence is guaranteed if the overlap is sufficiently large and that the convergence rate improves exponentially with the size of the overlap. Convergence results rely on a key property of graph-structured problems that is known as exponential decay of sensitivity. Here, we establish conditions under which this property holds for constrained QPs (as those found in network optimization and optimal control), thus extending existing work that addresses unconstrained QPs. The numerical behavior of the Schwarz scheme is demonstrated by using a DC optimal power flow problem defined over a network with 9,241 nodes.
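The overlap idea in this abstract can be illustrated on a much simpler problem than the one the paper treats. The sketch below applies a synchronous overlapping Schwarz iteration to an unconstrained QP whose optimality condition is a tridiagonal linear system on a chain graph. The chain is split into two blocks, each block solves its subproblem on an extended (overlapping) index range with outside values frozen, and only the owned part of each local solution is kept. The problem data, the block layout, the function names, and the omission of constraints and dual variables are all assumptions made for illustration.

```python
DIAG, OFF = 4.0, -1.0  # hypothetical chain-structured Hessian tridiag(OFF, DIAG, OFF)


def solve_block(rhs):
    """Thomas algorithm for tridiag(OFF, DIAG, OFF) x = rhs."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = OFF / DIAG, rhs[0] / DIAG
    for i in range(1, n):
        denom = DIAG - OFF * c[i - 1]
        c[i] = OFF / denom
        d[i] = (rhs[i] - OFF * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def schwarz_sweep(x, b, blocks, overlap):
    """One synchronous sweep: every block uses the previous iterate x."""
    n = len(b)
    new_x = x[:]
    for lo, hi in blocks:  # the block owns indices lo .. hi - 1
        elo, ehi = max(0, lo - overlap), min(n, hi + overlap)
        rhs = b[elo:ehi]  # slicing copies, so b is not modified
        # Move the coupling to frozen neighbours onto the right-hand side.
        if elo > 0:
            rhs[0] -= OFF * x[elo - 1]
        if ehi < n:
            rhs[-1] -= OFF * x[ehi]
        local = solve_block(rhs)
        # Keep only the owned part; the overlap region is discarded.
        new_x[lo:hi] = local[lo - elo:hi - elo]
    return new_x


def error_after(sweeps, overlap, n=40):
    """Max error versus the direct solution after a number of sweeps."""
    b = [1.0] * n
    exact = solve_block(b)
    blocks = [(0, n // 2), (n // 2, n)]
    x = [0.0] * n
    for _ in range(sweeps):
        x = schwarz_sweep(x, b, blocks, overlap)
    return max(abs(xi - ei) for xi, ei in zip(x, exact))


for omega in (0, 1, 4):
    print(omega, error_after(3, omega))
```

On this toy problem the error after a fixed number of sweeps drops sharply as the overlap grows, which is consistent with the abstract's statement that the convergence rate improves with the size of the overlap.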

preprint, 2020, arXiv

Quantized Neural Networks: Characterization and Holistic Optimization

Quantized deep neural networks (QDNNs) are necessary for low-power, high throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Therefore, the model selection needs to be a part of the QDNN design process. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects of weight and activation quantization. The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization. This study can provide insight into better optimization of QDNNs.

preprint, 2020, arXiv

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

The stochastic gradient descent (SGD) method is most widely used for deep neural network (DNN) training. However, the method does not always converge to a flat minimum of the loss surface that can demonstrate high generalization capability. Weight noise injection has been extensively studied for finding flat minima using the SGD method. We devise a new weight-noise injection-based SGD method that adds symmetrical noises to the DNN weights. The training with symmetrical noise evaluates the loss surface at two adjacent points, by which convergence to sharp minima can be avoided. Fixed-magnitude symmetric noises are added to minimize training instability. The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification. Particularly, performance improvements in large batch training are demonstrated. This method shows superior performance compared with conventional SGD and weight-noise injection methods regardless of the batch-size and learning rate scheduling algorithms.
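One plausible reading of "evaluates the loss surface at two adjacent points" is sketched below: draw a fixed-magnitude noise vector with random signs, compute the gradient at the weights plus the noise and at the weights minus the noise, and step along the average. This update rule, the toy quadratic loss, and every name in the code are assumptions made for illustration; the paper's exact procedure may differ.

```python
import random


def grad(w, target):
    """Gradient of the toy loss sum((w_i - t_i)^2)."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


def symmetric_noise_step(w, target, lr, magnitude, rng):
    """One update using gradients at w + n and w - n (assumed scheme)."""
    # Fixed-magnitude noise: only the sign of each component is random.
    noise = [magnitude * rng.choice((-1.0, 1.0)) for _ in w]
    g_plus = grad([wi + ni for wi, ni in zip(w, noise)], target)
    g_minus = grad([wi - ni for wi, ni in zip(w, noise)], target)
    g = [0.5 * (gp + gm) for gp, gm in zip(g_plus, g_minus)]
    return [wi - lr * gi for wi, gi in zip(w, g)]


RNG = random.Random(0)
TARGET = [1.0, -2.0, 0.5]  # hypothetical minimizer of the toy loss
W = [0.0, 0.0, 0.0]
for _ in range(200):
    W = symmetric_noise_step(W, TARGET, lr=0.1, magnitude=0.05, rng=RNG)
print(W)
```

On a quadratic loss the two perturbed gradients average to the unperturbed gradient, so the iteration behaves like plain gradient descent. The symmetric evaluation only changes the update on losses with curvature that varies across the two points, which is where sharp minima would be penalized.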

preprint, 2020, arXiv

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Designing a deep neural network (DNN) with good generalization capability is a complex process, especially when the weights are severely quantized. Model averaging is a promising approach for achieving good generalization capability in DNNs, especially when the loss surface for training contains many sharp minima. We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of weights, (3) capturing multiple low-precision models during retraining with cyclical learning rates, (4) averaging the captured models, and (5) re-quantizing the averaged model and fine-tuning it with low learning rates. Additionally, we present a loss-visualization technique on the quantized weight domain to clearly elucidate the behavior of the proposed method. Visualization results indicate that a quantized DNN (QDNN) optimized with the proposed approach is located near the center of the flat minimum in the loss surface. With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets. Although we only employed a uniform quantization scheme for the sake of implementation in VLSI or low-precision neural processing units, the performance achieved exceeded that of previous studies employing non-uniform quantization.
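The quantize, average, and re-quantize steps listed in this abstract, steps (2), (4), and (5), can be sketched without any training loop. In the sketch below, hard-coded weight vectors stand in for the models that would be captured during retraining. The symmetric uniform quantizer, the step size, the weight values, and the function names are all assumptions for illustration, and the training and fine-tuning stages are omitted.

```python
import math


def quantize_uniform(weights, bits, step):
    """Symmetric uniform quantizer: round to a multiple of step, then clip."""
    qmax = 2 ** (bits - 1) - 1  # 2 bits -> integer levels {-1, 0, +1}
    out = []
    for w in weights:
        q = math.floor(w / step + 0.5)  # round half up
        q = max(-qmax, min(qmax, q))
        out.append(q * step)
    return out


def average_models(models):
    """Element-wise mean of several weight vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


# Stand-ins for low-precision models captured during retraining (hypothetical values).
CAPTURED = [
    quantize_uniform([0.62, -0.10, 0.31], bits=2, step=0.5),
    quantize_uniform([0.48, -0.30, 0.20], bits=2, step=0.5),
    quantize_uniform([0.55, 0.05, 0.28], bits=2, step=0.5),
]
AVERAGED = average_models(CAPTURED)  # generally off the quantization grid
REQUANTIZED = quantize_uniform(AVERAGED, bits=2, step=0.5)
print(AVERAGED, REQUANTIZED)
```

Averaging moves the weights off the quantization grid, which is why the abstract lists re-quantization followed by fine-tuning as a separate final step.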

preprint, 2020, arXiv

Unifying Theorems for Subspace Identification and Dynamic Mode Decomposition

This paper presents unifying results for subspace identification (SID) and dynamic mode decomposition (DMD) for autonomous dynamical systems. We observe that SID seeks to solve an optimization problem to estimate an extended observability matrix and a state sequence that minimizes the prediction error for the state-space model. Moreover, we observe that DMD seeks to solve a rank-constrained matrix regression problem that minimizes the prediction error of an extended autoregressive model. We prove that existence conditions for perfect (error-free) state-space and low-rank extended autoregressive models are equivalent and that the SID and DMD optimization problems are equivalent. We exploit these results to propose a SID-DMD algorithm that delivers a provably optimal model and that is easy to implement. We demonstrate our developments using a case study that aims to build dynamical models directly from video data.

preprint, 2019, arXiv

A Parallel Decomposition Scheme for Solving Long-Horizon Optimal Control Problems

We present a temporal decomposition scheme for solving long-horizon optimal control problems. In the proposed scheme, the time domain is decomposed into a set of subdomains with partially overlapping regions. Subproblems associated with the subdomains are solved in parallel to obtain local primal-dual trajectories that are assembled to obtain the global trajectories. We provide a sufficient condition that guarantees convergence of the proposed scheme. This condition states that the effect of perturbations on the boundary conditions (i.e., initial state and terminal dual/adjoint variable) should decay asymptotically as one moves away from the boundaries. This condition also reveals that the scheme converges if the size of the overlap is sufficiently large and that the convergence rate improves with the size of the overlap. We prove that linear quadratic problems satisfy the asymptotic decay condition, and we discuss numerical strategies to determine if the condition holds in more general cases. We draw upon a non-convex optimal control problem to illustrate the performance of the proposed scheme.

preprint, 2016, arXiv

Dynamic Hand Gesture Recognition for Wearable Devices with Low Complexity Recurrent Neural Networks

Gesture recognition is an essential technology for many wearable devices. While previous algorithms are mostly based on statistical methods including the hidden Markov model, we develop two dynamic hand gesture recognition techniques using low-complexity recurrent neural network (RNN) algorithms. One is based on a video signal and employs a combined structure of a convolutional neural network (CNN) and an RNN. The other uses accelerometer data and only requires an RNN. Fixed-point optimization that quantizes most of the weights into two bits is conducted to reduce the memory needed for weight storage and the power consumption in hardware- and software-based implementations.

preprint, 2016, arXiv

Fixed-Point Performance Analysis of Recurrent Neural Networks

Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations. The hardware complexity can be greatly reduced by minimizing the word-length of weights and signals. This work analyzes the fixed-point performance of recurrent neural networks using a retrain-based quantization method. The quantization sensitivity of each layer in RNNs is studied, and overall fixed-point optimization results that minimize the capacity of weights without sacrificing performance are presented. Language modeling and phoneme recognition examples are used.

preprint, 2016, arXiv

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

In this paper, a neural network based real-time speech recognition (SR) system is developed using an FPGA for very low-power operation. The implemented system employs two recurrent neural networks (RNNs); one is a speech-to-character RNN for acoustic modeling (AM) and the other is for character-level language modeling (LM). The system also employs a statistical word-level LM to improve the recognition accuracy. The results of the AM, the character-level LM, and the word-level LM are combined using a fairly simple N-best search algorithm instead of the hidden Markov model (HMM) based network. The RNNs are implemented using massively parallel processing elements (PEs) for low latency and high throughput. The weights are quantized to 6 bits to store all of them in the on-chip memory of an FPGA. The proposed algorithm is implemented on a Xilinx XC7Z045, and the system can operate much faster than real-time.

preprint, 2016, arXiv

Quantized neural network design under weight capacity constraint

The complexity of deep neural network algorithms for hardware implementation can be lowered either by scaling the number of units or by reducing the word-length of weights. Both approaches, however, can be accompanied by performance degradation, although much research has been conducted to relieve this problem. Thus, an important question is which of the two, network size scaling or weight quantization, is more effective for hardware optimization. For this study, the performance of fully-connected deep neural networks (FCDNNs) and convolutional neural networks (CNNs) is evaluated while changing the network complexity and the word-length of weights. Based on these experiments, we present the effective compression ratio (ECR) to guide the trade-off between the network size and the precision of weights when the hardware resource is limited.

preprint, 2016, arXiv

Resiliency of Deep Neural Networks under Quantization

The complexity of deep neural network algorithms for hardware implementation can be greatly reduced by optimizing the word-length of weights and signals. Direct quantization of floating-point weights, however, does not show good performance when the number of bits assigned is small. Retraining of quantized networks has been developed to relieve this problem. In this work, the effects of retraining are analyzed for a feedforward deep neural network (FFDNN) and a convolutional neural network (CNN). The network complexity is varied to examine its effect on the resiliency of quantized networks under retraining. The complexity of the FFDNN is controlled by varying the unit size in each hidden layer and the number of layers, while that of the CNN is controlled by modifying the feature map configuration. We find that the performance gap between the floating-point and the retrain-based ternary (+1, 0, -1) weight neural networks is considerable in 'complexity limited' networks, but the discrepancy almost vanishes in fully complex networks whose capability is limited by the training data rather than by the number of connections. This research shows that highly complex DNNs have the capability of absorbing the effects of severe weight quantization through retraining, but connection-limited networks are less resilient. This paper also presents the effective compression ratio to guide the trade-off between the network size and the precision when the hardware resource is limited.
