Source author record

Haoran Chen

Haoran Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision · Computation and Language · Machine Learning · cond-mat.mes-hall · cond-mat.mtrl-sci · Numerical Analysis · physics.comp-ph · Robotics

Catalog footprint

9 works · 8 topics · 4 close collaborators

preprint · 2026 · arXiv

Enhancing SignSGD: Small-Batch Convergence Analysis and a Hybrid Switching Strategy

SignSGD compresses each stochastic gradient coordinate to a single bit, offering substantial memory and communication savings, but its 1-bit quantization removes magnitude information and is known to leave a generalization gap relative to well-tuned SGD. We revisit SignSGD from a 1-bit quantization and dithering perspective and contribute three improvements. First, we derive a small-batch convergence rate for SignSGD under unimodal symmetric gradient noise using a signal-to-noise weighted stationarity measure, removing the large-batch assumption of prior analyses. Second, we inject annealed Gaussian noise before the sign operator, which acts as a classical dithering mechanism and probabilistically restores magnitude information lost to hard thresholding. Third, we adapt the SWATS strategy to sign-based updates with a projection-based learning-rate calibration that smoothly transitions from SignSGD to SGD. Single-worker experiments on ResNet-18 isolate optimizer effects from communication aspects: pre-sign dithering surpasses Adam on CIFAR-100, and the calibrated switch reaches 92.18% test accuracy on CIFAR-10, outperforming both pure SGD (91.38%) and pure SignSGD with momentum (90.82%).
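The pre-sign dithering idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the linear annealing schedule, and the convention sign(0) = +1 are assumptions made for illustration. The one fact the sketch relies on is standard: for Gaussian noise ξ ~ N(0, 1), the expectation of sign(g + σξ) is erf(g / (σ√2)), so the average of many dithered signs varies smoothly with the magnitude of g instead of collapsing to ±1.

```python
import math
import random


def dithered_sign(grad, sigma, rng):
    """Add zero-mean Gaussian noise (std `sigma`) to each coordinate, then take the sign.

    Assumption for this sketch: sign(0) is treated as +1.
    """
    return [1.0 if g + rng.gauss(0.0, sigma) >= 0.0 else -1.0 for g in grad]


def annealed_sigma(step, total_steps, sigma0=1.0):
    """Hypothetical linear annealing: sigma0 at step 0, decaying to 0 at the last step."""
    return sigma0 * max(0.0, 1.0 - step / total_steps)


def expected_dithered_sign(g, sigma):
    """Closed form of E[sign(g + sigma * xi)] for xi ~ N(0, 1)."""
    return math.erf(g / (sigma * math.sqrt(2.0)))


def mean_dithered_sign(g, sigma, n, rng):
    """Monte Carlo estimate of the expected dithered sign of a scalar g."""
    total = 0.0
    for _ in range(n):
        total += dithered_sign([g], sigma, rng)[0]
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    for g in (0.1, 0.5, 2.0):
        estimate = mean_dithered_sign(g, 1.0, 20000, rng)
        print(g, round(estimate, 3), round(expected_dithered_sign(g, 1.0), 3))
```

With σ = 0 the operator reduces to the plain sign, which is why annealing the noise toward zero recovers ordinary SignSGD late in training under this assumed schedule.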

preprint · 2026 · arXiv

Soft Responsive Materials Enhance Humanoid Safety

Humanoid robots are envisioned as general-purpose platforms in human-centered environments, yet their deployment is limited by vulnerability to falls and the risks posed by rigid metal-plastic structures to people and surroundings. We introduce a soft-rigid co-design framework that leverages non-Newtonian fluid-based soft responsive materials to enhance humanoid safety. The material remains compliant during normal interaction but rapidly stiffens under impact, absorbing and dissipating fall-induced forces. Physics-based simulations guide protector placement and thickness and enable learning of active fall policies. Applied to a 42 kg life-size humanoid, the protector markedly reduces peak impact and allows repeated falls without hardware damage, including drops from 3 m and tumbles down long staircases. Across diverse scenarios, the approach improves robot robustness and environmental safety. By uniting responsive materials, structural co-design, and learning-based control, this work advances interact-safe, industry-ready humanoid robots.

preprint · 2022 · arXiv

Non-perturbative ab initio approach for calculating the electrical conductivity of a liquid metal

We propose a non-perturbative ab initio approach to calculate the electrical conductivity of a liquid metal. Our approach is based on the Kubo formula and the theory of electron-phonon coupling (EPC), and unlike the conventional empirical approach based on the Kubo-Greenwood formula, fully takes into account the effect of coupling between electrons and moving ions. We show that the electrical conductivity at high temperature is determined by an EPC parameter $\lambda_{\mathrm{tr}}$, which can be inferred, non-perturbatively, from the correlation of electron scattering matrices induced by ions. The latter can be evaluated in a molecular dynamics simulation. Based on the density-functional theory and pseudopotential methods, we implement the approach in an ab initio manner. We apply it to liquid sodium and obtain results in good agreement with experiments. This approach is efficient and based on a rigorous theory, suitable for applying to general metallic liquid systems.

preprint · 2021 · arXiv

Shock-wave-like emission of spin waves induced by interfacial Dzyaloshinskii-Moriya interaction

We investigated spin wave (SW) propagation and emission in thin film systems with strong interfacial Dzyaloshinskii-Moriya interaction (DMI) utilizing micromagnetic simulation. The effect of DMI on SW propagation is analogous to the flow of magnetic medium leading to the spin Doppler effect, and a spin-polarized current can enhance or suppress it. It is demonstrated that, for a Doppler velocity exceeding a critical value, a shock-wave-like emission of SWs with a cone-shape emerges from a magnetically irregular point as the cone apex. The cone angle is quantitatively determined by the DMI-induced Doppler velocity. Combining the interfacial DMI and the spin-polarized current, a constant SW emission by a static source is demonstrated, which provides a promising route to efficiently generate SWs with tunable frequency.

preprint · 2020 · arXiv

A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling

Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations. First, semantic information has been widely applied to boost the performance of video captioning models, but existing networks often fail to provide meaningful semantic features. Second, the Teacher Forcing algorithm is often utilized to optimize video captioning models, but during training and inference, different strategies are applied to guide word generation, leading to poor performance. Third, current video captioning models are prone to generate relatively short captions that express video contents inappropriately. Toward resolving these three problems, we suggest three corresponding improvements. First of all, we propose a metric to compare the quality of semantic features, and utilize appropriate features as input for a semantic detection network (SDN) with adequate complexity in order to generate meaningful semantic features for videos. Then, we apply a scheduled sampling strategy that gradually transfers the training phase from a teacher-guided manner toward a more self-teaching manner. Finally, the ordinary logarithm probability loss function is leveraged by sentence length so that the inclination of generating short sentences is alleviated. Our model achieves better results than previous models on the YouTube2Text dataset and is competitive with the previous best model on the MSR-VTT dataset.
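The scheduled sampling strategy described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the linear decay schedule, its endpoints, and the function names are assumptions. At each decoding step during training, the decoder is fed the ground-truth previous word with probability `p_teacher` and its own previous prediction otherwise, and `p_teacher` is lowered as training proceeds so that training conditions move toward those seen at inference.

```python
import random


def teacher_forcing_prob(epoch, num_epochs, start=1.0, end=0.0):
    """Hypothetical linear decay of the probability of feeding the ground-truth word."""
    if num_epochs <= 1:
        return end
    frac = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return start + (end - start) * frac


def next_decoder_input(ground_truth_word, predicted_word, p_teacher, rng):
    """Pick the decoder's next input: ground truth with probability p_teacher, else the model's own output."""
    return ground_truth_word if rng.random() < p_teacher else predicted_word


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in range(5):
        p = teacher_forcing_prob(epoch, 5)
        choice = next_decoder_input("cat", "dog", p, rng)
        print(epoch, p, choice)
```

With `p_teacher = 1.0` this reduces to plain Teacher Forcing, and with `p_teacher = 0.0` the decoder is trained entirely on its own predictions.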

preprint · 2020 · arXiv

Delving Deeper into the Decoder for Video Captioning

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework has been the most popular paradigm for this task in recent years. However, there exist some problems in the decoder of a video captioning model. We make a thorough investigation into the decoder and adopt three techniques to improve the performance of the model. First, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate the problem of overfitting. Second, a new online method is proposed to evaluate the performance of a model on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called professional learning is proposed which uses the strengths of a captioning model and bypasses its weaknesses. Experiments on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets demonstrate that our model achieves the best results as evaluated by the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 18% on MSVD and 3.5% on MSR-VTT compared with the previous state-of-the-art models.

preprint · 2020 · arXiv

TextScanner: Reading Characters in Order for Robust Scene Text Recognition

Driven by deep learning and the large volume of data, scene text recognition has evolved rapidly in recent years. Previously, RNN-attention based methods dominated this field, but they suffer from the problem of attention drift in certain situations. Lately, semantic segmentation based algorithms have proven effective at recognizing text of different forms (horizontal, oriented and curved). However, these methods may produce spurious characters or miss genuine characters, as they rely heavily on a thresholding procedure operated on segmentation maps. To tackle these challenges, we propose in this paper an alternative approach, called TextScanner, for scene text recognition. TextScanner bears three characteristics: (1) Basically, it belongs to the semantic segmentation family, as it generates pixel-wise, multi-channel segmentation maps for character class, position and order; (2) Meanwhile, akin to RNN-attention based methods, it also adopts RNN for context modeling; (3) Moreover, it performs parallel prediction for character position and class, and ensures that characters are transcribed in the correct order. The experiments on standard benchmark datasets demonstrate that TextScanner outperforms the state-of-the-art methods. Moreover, TextScanner shows its superiority in recognizing more difficult text such as Chinese transcripts and in aligning with target characters.

preprint · 2016 · arXiv

Partial Least Squares Regression on Riemannian Manifolds and Its Application in Classifications

Partial least squares regression (PLSR) has been a popular technique to explore the linear relationship between two datasets. However, most algorithm implementations of PLSR may only achieve a suboptimal solution through an optimization on the Euclidean space. In this paper, we propose several novel PLSR models on Riemannian manifolds and develop optimization algorithms based on the Riemannian geometry of manifolds. These algorithms can calculate all the factors of PLSR globally to avoid suboptimal solutions. In a number of experiments, we have demonstrated the benefits of applying the proposed models and algorithms to a variety of learning tasks in pattern recognition and object classification.

preprint · 2015 · arXiv

Fast Optimization Algorithm on Riemannian Manifolds and Its Application in Low-Rank Representation

The paper addresses the problem of optimizing a class of composite functions on Riemannian manifolds and a new first order optimization algorithm (FOA) with a fast convergence rate is proposed. Through the theoretical analysis for FOA, it has been proved that the algorithm has quadratic convergence. The experiments in the matrix completion task show that FOA has better performance than other first order optimization methods on Riemannian manifolds. A fast subspace pursuit method based on FOA is proposed to solve the low-rank representation model based on augmented Lagrange method on the low rank matrix variety. Experimental results on synthetic and real data sets are presented to demonstrate that both FOA and SP-RPRG(ALM) can achieve superior performance in terms of faster convergence and higher accuracy.
