Source author record

Bo Shen

Bo Shen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Computer Vision · cond-mat.mes-hall · cond-mat.supr-con · math.OC · Neural and Evolutionary Computing

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint · 2022 · arXiv

On the Provable Generalization of Recurrent Neural Networks

The recurrent neural network (RNN) is a fundamental structure in deep learning. Recently, some works have studied the training process of over-parameterized neural networks and shown that over-parameterized networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization of RNNs with random initialization, and provide the following improvements over recent works: 1) For an RNN with input sequence $x=(X_1,X_2,...,X_L)$, previous works study learning functions that are summations of $f(β^T_lX_l)$ and require the normalization condition $\|X_l\|\leq ε$ for some very small $ε$ depending on the complexity of $f$. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without normalization conditions and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$. 2) Moreover, we prove a novel result on learning N-variable functions of the input sequence of the form $f(β^T[X_{l_1},...,X_{l_N}])$, which do not belong to the "additive" concept class, i.e., the summation of functions $f(X_l)$. We show that when either $N$ or $l_0=\max(l_1,..,l_N)-\min(l_1,..,l_N)$ is small, $f(β^T[X_{l_1},...,X_{l_N}])$ is learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$.
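Written out side by side, the two concept classes contrasted in this abstract look as follows. The symbols are the abstract's own; the names $F_{\mathrm{add}}$ and $F_N$ and the summation range $l=1,\dots,L$ are assumptions added here for readability.

```latex
% Additive concept class (studied in previous works)
F_{\mathrm{add}}(x) = \sum_{l=1}^{L} f\!\left(\beta_l^{T} X_l\right),
\qquad x = (X_1, X_2, \dots, X_L)

% N-variable concept class (the second contribution)
F_{N}(x) = f\!\left(\beta^{T}\,[X_{l_1}, \dots, X_{l_N}]\right),
\qquad l_0 = \max(l_1, \dots, l_N) - \min(l_1, \dots, l_N)
```

In the second form a single function acts jointly on $N$ selected positions of the sequence, so it cannot in general be split into per-position terms.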

preprint · 2022 · arXiv

Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks

Physics-informed Neural Networks (PINNs) are gaining attention in the engineering and scientific literature for solving a range of differential equations with applications in weather modeling, healthcare, manufacturing, etc. Poor scalability is one of the barriers to utilizing PINNs for many real-world problems. To address this, a Self-scalable tanh (Stan) activation function is proposed for the PINNs. The proposed Stan function is smooth, non-saturating, and has a trainable parameter. During training, it can allow easy flow of gradients to compute the required derivatives and also enable systematic scaling of the input-output mapping. It is shown theoretically that the PINNs with the proposed Stan function have no spurious stationary points when using gradient descent algorithms. The proposed Stan is tested on a number of numerical studies involving general regression problems. It is subsequently used for solving multiple forward problems, which involve second-order derivatives and multiple dimensions, and an inverse problem where the thermal diffusivity of a rod is predicted with heat conduction data. These case studies establish empirically that the Stan activation function can achieve better training and more accurate predictions than the existing activation functions in the literature.
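The abstract does not state the Stan formula. A minimal sketch, assuming the form tanh(x) + β·x·tanh(x) with β as the trainable parameter, shows what "smooth and non-saturating" can mean in practice: the derivative of plain tanh vanishes for large inputs, while the derivative of the assumed form approaches β. The function names and the value β = 1.0 are illustrative choices, not taken from the source.

```python
import math


def stan(x: float, beta: float) -> float:
    """Assumed Stan form: tanh(x) + beta * x * tanh(x)."""
    t = math.tanh(x)
    return t + beta * x * t


def stan_grad(x: float, beta: float) -> float:
    """Derivative of the assumed form with respect to x."""
    t = math.tanh(x)
    sech2 = 1.0 - t * t  # d/dx tanh(x)
    return sech2 + beta * (t + x * sech2)


def tanh_grad(x: float) -> float:
    """Derivative of plain tanh, for comparison."""
    t = math.tanh(x)
    return 1.0 - t * t


if __name__ == "__main__":
    beta = 1.0  # illustrative value; in a network this would be learned
    for x in (0.0, 2.0, 10.0):
        # tanh_grad shrinks toward 0 as x grows; stan_grad tends toward beta
        print(x, tanh_grad(x), stan_grad(x, beta))
```

In a real PINN the parameter would be learned per neuron or per layer by the training framework; this scalar version only illustrates the shape of the function.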

preprint · 2022 · arXiv

Smooth Robust Tensor Completion for Background/Foreground Separation with Missing Pixels: Novel Algorithm with Convergence Guarantee

The objective of this study is to address the problem of background/foreground separation with missing pixels by combining video acquisition, video recovery, and background/foreground separation into a single framework. To achieve this, a smooth robust tensor completion (SRTC) model is proposed to recover the data and decompose it into the static background and smooth foreground, respectively. Specifically, the static background is modeled by the low-rank Tucker decomposition and the smooth foreground (moving objects) is modeled by the spatiotemporal continuity, which is enforced by total variation regularization. An efficient algorithm based on tensor proximal alternating minimization (tenPAM) is implemented to solve the proposed model with a global convergence guarantee under very mild conditions. Extensive experiments on real data demonstrate that the proposed method significantly outperforms the state-of-the-art approaches for background/foreground separation with missing pixels.

preprint · 2020 · arXiv

Is the Skip Connection Provable to Reform the Neural Network Loss Landscape?

The residual network is now one of the most effective structures in deep learning; it uses skip connections to "guarantee" that performance will not get worse. However, the non-convexity of the neural network makes it unclear whether skip connections provably improve the learning ability, since the nonlinearity may create many local minima. In some previous works \cite{freeman2016topology}, it is shown that, despite the non-convexity, the loss landscape of the two-layer ReLU network has good properties when the number $m$ of hidden nodes is very large. In this paper, we follow this line to study the topology (sub-level sets) of the loss landscape of deep ReLU neural networks with a skip connection, and theoretically prove that the skip connection network inherits the good properties of the two-layer network and that skip connections can help to control the connectedness of the sub-level sets, such that any local minima worse than the global minima of some two-layer ReLU network will be very "shallow". The "depth" of these local minima is at most $O(m^{(η-1)/n})$, where $n$ is the input dimension and $η<1$. This provides a theoretical explanation for the effectiveness of the skip connection in deep learning.
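To get a feel for the rate inside the bound, the short sketch below evaluates m^((η-1)/n) for a few widths. The constant hidden by the O(.) notation is unknown, so only the trend is meaningful; the function name and the values η = 0.5 and n = 10 are illustrative assumptions.

```python
def depth_scale(m: int, eta: float, n: int) -> float:
    """Return m ** ((eta - 1) / n), the rate inside the O(.) bound (constants omitted)."""
    if not eta < 1:
        raise ValueError("the abstract assumes eta < 1")
    return m ** ((eta - 1.0) / n)


if __name__ == "__main__":
    # Illustrative values only: eta = 0.5, input dimension n = 10
    for m in (10**2, 10**4, 10**6):
        print(m, depth_scale(m, eta=0.5, n=10))
```

Because the exponent is divided by the input dimension n, the rate decreases with the width m but does so slowly when n is large.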

preprint · 2020 · arXiv

Second-Order Convergence of Asynchronous Parallel Stochastic Gradient Descent: When Is the Linear Speedup Achieved?

In machine learning, asynchronous parallel stochastic gradient descent (APSGD) is broadly used to speed up the training process through multiple workers. Meanwhile, the time delay of stale gradients in asynchronous algorithms is generally proportional to the total number of workers, which introduces additional deviation from the accurate gradient due to the use of delayed gradients. This may have a negative influence on the convergence of the algorithm. One may ask: how many workers can we use at most to achieve good convergence and the linear speedup? In this paper, we consider the second-order convergence of asynchronous algorithms in non-convex optimization. We investigate the behavior of APSGD with consistent read near strict saddle points and provide a theoretical guarantee that if the total number of workers is bounded by $\widetilde{O}(K^{1/3}M^{-1/3})$ ($K$ is the total number of steps and $M$ is the mini-batch size), APSGD will converge to good stationary points ($\|\nabla f(x)\|\leq ε$, $\nabla^2 f(x)\succeq -\sqrt{ε}\,\bm{I}$, $ε^2\leq O(\sqrt{\frac{1}{MK}})$) and the linear speedup is achieved. Our work gives the first theoretical guarantee on second-order convergence for asynchronous algorithms. The technique we provide can be generalized to analyze other types of asynchronous algorithms and to understand their behavior in distributed asynchronous parallel training.
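The effect of stale gradients can be seen in a toy setting. The sketch below is not APSGD and not the paper's analysis: it is a deterministic, single-variable illustration on f(x) = 0.5·x², where each update uses the gradient evaluated at an iterate that is `delay` steps old. The function name, step size and delays are hypothetical choices made for the demonstration.

```python
def delayed_gradient_descent(delay: int, steps: int = 200,
                             lr: float = 0.5, x0: float = 1.0) -> list:
    """Run x_{k+1} = x_k - lr * f'(x_{k-delay}) on f(x) = 0.5 * x**2.

    Returns the full list of iterates (including the initial history).
    """
    # Pre-fill the history so that stale reads are defined from the first step
    history = [x0] * (delay + 1)
    for _ in range(steps):
        stale_x = history[-(delay + 1)]  # iterate from `delay` steps ago
        grad = stale_x                   # f'(x) = x for f(x) = 0.5 * x**2
        history.append(history[-1] - lr * grad)
    return history


if __name__ == "__main__":
    for delay in (0, 8):
        traj = delayed_gradient_descent(delay)
        # Report the largest magnitude over the last 30 iterates
        print(delay, max(abs(v) for v in traj[-30:]))
```

With the same step size, the run without delay contracts toward the minimizer while the run with a delay of 8 steps oscillates with growing amplitude, which is the kind of deviation that motivates bounding the number of workers.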

preprint · 2014 · arXiv

Generation and Electric Control of Spin-Coupled Valley Current in WSe2

The valley degree of freedom in layered transition-metal dichalcogenides (MX2) provides the opportunity to extend the functionalities of novel spintronics and valleytronics devices. Due to spin splitting induced by spin-orbit coupling (SOC), the non-equilibrium charge carrier imbalance between two degenerate and inequivalent valleys to realize valley/spin polarization has been successfully demonstrated theoretically and supported by optical experiments. However, the generation of a valley/spin current by the valley polarization in MX2 remains elusive and a great challenge. Here, within an electric-double-layer transistor based on WSe2, we demonstrated a spin-coupled valley photocurrent whose direction and magnitude depend on the degree of circular polarization of the incident radiation and can be further greatly modulated with an external electric field. Such room-temperature generation and electric control of valley/spin photocurrent provides a new property of electrons in MX2 systems, thereby enabling new degrees of control for quantum-confined spintronics devices.

preprint · 2011 · arXiv

Anisotropy of zero-resistance states in InN films under an in-plane magnetic field

We report low-temperature current-voltage measurements on n-type InN films grown by molecular beam epitaxy. The zero-resistance state with a large critical current of around 1 mA has been observed at 0.3 K. Under an in-plane field configuration, the zero-resistance state shows a large anisotropy in critical current for B parallel and perpendicular to the applied current. The ratio of critical current between B parallel and perpendicular to the applied current can be up to 2.5 when B = 0.15 T. The anisotropy is explained by the vortex flow in the context of type II superconductivity. We have thus established an important aspect of the phenomenology of superconductivity in an otherwise typical narrow-gap semiconductor.
