Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
20 topics
4 close collaborators

Published work

24 published items

preprint 2026 arXiv

Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls

The ability to wield tools was once considered exclusive to human intelligence, but it's now known that many other animals, like crows, possess this capability. Yet, robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our novel method leverages LLMs based on scene information and natural language instructions to enable symbolic task planning for tool-object manipulation. This approach allows the system to convert a human language sentence into a sequence of feasible motion functions. We have developed a novel manoeuvrability-driven controller using a new tool affordance model derived from visual feedback. This controller helps guide the robot's tool utilization and manipulation actions, even within confined areas, using a stepping incremental approach. The proposed methodology is evaluated with experiments to prove its effectiveness under various manipulation scenarios.

preprint 2026 arXiv

UniBiDex: A Unified Teleoperation Framework for Robotic Bimanual Dexterous Manipulation

We present UniBiDex, a unified teleoperation framework for robotic bimanual dexterous manipulation that supports both VR-based and leader-follower input modalities. UniBiDex enables real-time, contact-rich dual-arm teleoperation by integrating heterogeneous input devices into a shared control stack with consistent kinematic treatment and safety guarantees. The framework employs null-space control to optimize bimanual configurations, ensuring smooth, collision-free, and singularity-aware motion across tasks. We validate UniBiDex on a long-horizon kitchen-tidying task involving five sequential manipulation subtasks, demonstrating higher task success rates, smoother trajectories, and improved robustness compared to strong baselines. By releasing all hardware and software components as open source, we aim to lower the barrier to collecting large-scale, high-quality human demonstration datasets and accelerate progress in robot learning.

preprint 2023 arXiv

Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning

The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework, e.g., the discrepancy between classes of different tasks is not well learned (i.e., inter-task confusion, ITC), and certain priority is still given to the latest class batch (i.e., old-new confusion, ONC). We empirically validate the side effects of the two types of confusion. Meanwhile, a novel solution called Task Correlated Incremental Learning (TCIL) is proposed to encourage discriminative and fair feature utilization across tasks. TCIL performs a multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information flow paths at both feature and logit levels, enabling the learning to be aware of old classes. In addition, an attention mechanism and classifier re-scoring are applied to generate fairer classification scores. We conduct extensive experiments on the CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC, while showing advantages in combating catastrophic forgetting even when no rehearsal memory is reserved.

preprint 2022 arXiv

A Dual-Arm Collaborative Framework for Dexterous Manipulation in Unstructured Environments with Contrastive Planning

Most object manipulation strategies for robots are based on the assumption that the object is rigid (i.e., with fixed geometry) and the goal's details have been fully specified (e.g., the exact target pose). However, there are many tasks that involve spatial relations in human environments where these conditions may be hard to satisfy, e.g., bending and placing a cable inside an unknown container. To develop advanced robotic manipulation capabilities in unstructured environments that avoid these assumptions, we propose a novel long-horizon framework that exploits contrastive planning in finding promising collaborative actions. Using simulation data collected by random actions, we learn an embedding model in a contrastive manner that encodes the spatio-temporal information from successful experiences, which facilitates the subgoal planning through clustering in the latent space. Based on the keypoint correspondence-based action parameterization, we design a leader-follower control scheme for the collaboration between dual arms. All models of our policy are automatically trained in simulation and can be directly transferred to real-world environments. To validate the proposed framework, we conduct a detailed experimental study on a complex scenario subject to environmental and reachability constraints in both simulation and real environments.

preprint 2022 arXiv

A Fully Memristive Spiking Neural Network with Unsupervised Learning

We present a fully memristive spiking neural network (MSNN) consisting of physically-realizable memristive neurons and memristive synapses to implement an unsupervised spike-timing-dependent plasticity (STDP) learning rule. The system is fully memristive in that both neuronal and synaptic dynamics can be realized by using memristors. The neuron is implemented using the SPICE-level memristive integrate-and-fire (MIF) model, which consists of a minimal number of circuit elements necessary to achieve distinct depolarization, hyperpolarization, and repolarization voltage waveforms. The proposed MSNN uniquely implements STDP learning by using cumulative weight changes in memristive synapses from the voltage waveform changes across the synapses, which arise from the presynaptic and postsynaptic spiking voltage signals during the training process. Two types of MSNN architectures are investigated: 1) a biologically plausible memory retrieval system, and 2) a multi-class classification system. Our circuit simulation results verify the MSNN's unsupervised learning efficacy by replicating biological memory retrieval mechanisms, and achieving 97.5% accuracy in a 4-pattern recognition problem in a large-scale discriminative MSNN.

preprint 2022 arXiv

A Multi-Sensor Interface to Improve the Learning Experience in Arc Welding Training Tasks

This paper presents the development of a multi-sensor user interface to facilitate the instruction of arc welding tasks. Traditional methods to acquire hand-eye coordination skills are typically conducted through one-to-one instruction where trainees must wear protective helmets and conduct several tests. This approach is inefficient as the harmful light emitted from the electric arc impedes the close monitoring of the process; practitioners can only observe a small bright spot. To tackle these problems, recent training approaches have leveraged virtual reality to safely simulate the process and visualize the geometry of the workpieces. However, the synthetic nature of these types of simulation platforms reduces their effectiveness as they fail to capture actual welding interactions with the environment, which hinders the trainees' learning process. To provide users with a real welding experience, we have developed a new multi-sensor extended reality platform for arc welding training. Our system is composed of: (1) an HDR camera, monitoring the real welding spot in real time; (2) a depth sensor, capturing the 3D geometry of the scene; and (3) a head-mounted VR display, visualizing the process safely. Our innovative platform provides users with a "bot trainer", virtual cues of the seam geometry, automatic spot tracking, and performance scores. To validate the platform's feasibility, we conduct extensive experiments with several welding training tasks. We show that compared with the traditional training practice and recent virtual reality approaches, our automated multi-sensor method achieves better performance in terms of accuracy, learning curve, and effectiveness.

preprint 2022 arXiv

Gradient-based Neuromorphic Learning on Dynamical RRAM Arrays

We present MEMprop, the adoption of gradient-based learning to train fully memristive spiking neural networks (MSNNs). Our approach harnesses intrinsic device dynamics to trigger naturally arising voltage spikes. These spikes emitted by memristive dynamics are analog in nature, and thus fully differentiable, which eliminates the need for surrogate gradient methods that are prevalent in the spiking neural network (SNN) literature. Memristive neural networks typically either integrate memristors as synapses that map offline-trained networks, or otherwise rely on associative learning mechanisms to train networks of memristive neurons. We instead apply the backpropagation through time (BPTT) training algorithm directly on analog SPICE models of memristive neurons and synapses. Our implementation is fully memristive, in that synaptic weights and spiking neurons are both integrated on resistive RAM (RRAM) arrays without the need for additional circuits to implement spiking dynamics, e.g., analog-to-digital converters (ADCs) or thresholded comparators. As a result, higher-order electrophysical effects are fully exploited to use the state-driven dynamics of memristive neurons at run time. By moving towards non-approximate gradient-based learning, we obtain highly competitive accuracy amongst previously reported lightweight dense fully MSNNs on several benchmarks.

preprint 2022 arXiv

Logical and Physical Reversibility of Conservative Skyrmion Logic

Magnetic skyrmions are nanoscale whirls of magnetism that can be propagated with electrical currents. The repulsion between skyrmions inspires their use for reversible computing based on the elastic billiard ball collisions proposed for conservative logic in 1982. Here we evaluate the logical and physical reversibility of this skyrmion logic paradigm, as well as the limitations that must be addressed before dissipation-free computation can be realized.

preprint 2022 arXiv

Model Predictive Manipulation of Compliant Objects with Multi-Objective Optimizer and Adversarial Network for Occlusion Compensation

The robotic manipulation of compliant objects is currently one of the most active problems in robotics due to its potential to automate many important applications. Despite the progress achieved by the robotics community in recent years, the 3D shaping of these types of materials remains an open research problem. In this paper, we propose a new vision-based controller to automatically regulate the shape of compliant objects with robotic arms. Our method uses an efficient online surface/curve fitting algorithm that quantifies the object's geometry with a compact vector of features; this feedback-like vector makes it possible to establish an explicit shape servo-loop. To coordinate the motion of the robot with the computed shape features, we propose a receding-time estimator that approximates the system's sensorimotor model while satisfying various performance criteria. A deep adversarial network is developed to robustly compensate for visual occlusions in the camera's field of view, which makes it possible to guide the shaping task even with partial observations of the object. Model predictive control is utilized to compute the robot's shaping motions subject to workspace and saturation constraints. A detailed experimental study is presented to validate the effectiveness of the proposed control framework.

preprint 2022 arXiv

SPICEprop: Backpropagating Errors Through Memristive Spiking Neural Networks

We present a fully memristive spiking neural network (MSNN) consisting of novel memristive neurons trained using the backpropagation through time (BPTT) learning rule. Gradient descent is applied directly to the memristive integrate-and-fire (MIF) neuron designed using analog SPICE circuit models, which generates distinct depolarization, hyperpolarization, and repolarization voltage waveforms. Synaptic weights are trained by BPTT using the membrane potential of the MIF neuron model and can be processed on memristive crossbars. The natural spiking dynamics of the MIF neuron model are fully differentiable, eliminating the need for gradient approximations that are prevalent in the spiking neural network literature. Despite the added complexity of training directly on SPICE circuit models, we achieve 97.58% accuracy on the MNIST testing dataset and 75.26% on the Fashion-MNIST testing dataset, the highest accuracies among all fully MSNNs.

preprint 2022 arXiv

WE model: A Machine Learning Model Based on Data-Driven Movie Derivatives Market Prediction

The maturing and extension of the industry chain shape the income structure of the film industry. The income of the traditional film industry depends on the box office and also includes movie merchandising, advertisement, home entertainment, book sales, etc. Movie merchandising can even become more profitable than the box office. Therefore, market analysis and forecasting methods for multi-feature merchandising of multi-type films are particularly important. Traditional market research is time-consuming and labour-intensive, and its practical value is restricted. Because these research methods are limited, more effective predictive analysis technology is needed. With the rapid development of machine learning and big data, a large number of machine learning algorithms for predictive regression and classification recognition have been proposed and widely used in product design and industry analysis. This paper proposes a high-precision movie merchandising prediction model based on machine learning technology: the WE model. This model integrates three machine learning algorithms to accurately predict the movie merchandising market. The WE model learns the relationship between the movie merchandising market and movie features by analyzing the main feature information of movies. After testing, the accuracy rate of prediction and evaluation in the merchandising market reaches 72.5%, and it has achieved a strong market control effect.

preprint 2021 arXiv

Deep Video Inpainting Detection

This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally. In particular, we introduce VIDNet, Video Inpainting Detection Network, which contains a two-stream encoder-decoder architecture with attention module. To reveal artifacts encoded in compression, VIDNet additionally takes in Error Level Analysis frames to augment RGB frames, producing multimodal features at different levels with an encoder. Exploring spatial and temporal relationships, these features are further decoded by a Convolutional LSTM to predict masks of inpainted regions. In addition, when detecting whether a pixel is inpainted or not, we present a quad-directional local attention module that borrows information from its surrounding pixels from four directions. Extensive experiments are conducted to validate our approach. We demonstrate, among other things, that VIDNet not only outperforms by clear margins alternative inpainting detection methods but also generalizes well on novel videos that are unseen during training.

preprint 2021 arXiv

On the Coupling Effects between Elastic and Electromagnetic Fields from the Perspective of Conservation of Energy

Coupling effects among different physical fields substantially reflect the conversion of energies from one form into another. For simple physical processes, their governing or constitutive equations all satisfy the law of conservation of energy. Then, analysis is extended to coupling effects. First, it is found for the linear direct and converse piezoelectric and piezomagnetic effects, their constitutive equations guarantee that the total energy is conserved during the process of energy conversion between the elastic and electromagnetic fields; however, energies are converted via work terms, $(β_{ijk} E_i )_{,k} v_j$ and $(γ_{ijk} H_i)_{,k} v_j$, rather than via energy terms, $β_{ijk} E_i e_{jk}$ and $γ_{ijk} H_i e_{jk}$. Second, for the generalized Villari effects, the electromagnetic energy can be treated as an extra contribution to the generalized elastic energy. Third, for electrostriction and magnetostriction, it is argued both effects are induced by the Maxwell stress; moreover, their energy is purely electromagnetic and thus both have no converse effects. During these processes, energy can be converted in three different ways, i.e., via non-potential forces, via cross-dependence of energy terms and directly via the electromagnetic interactions of ions and electrons. In the end, general coupling processes which involve elastic, electromagnetic fields and diffusion are also analyzed. The advantage of using this energy formulation is that it facilitates discussions of the conversion of energies and provides better physical insights into the mechanisms of these coupling effects.

preprint 2020 arXiv

DeepStrip: High Resolution Boundary Refinement

In this paper, we target refining the boundaries in high resolution images given low resolution masks. For memory and computation efficiency, we propose to convert the regions of interest into strip images and compute a boundary prediction in the strip domain. To detect the target boundary, we present a framework with two prediction layers. First, all potential boundaries are predicted as an initial prediction and then a selection layer is used to pick the target boundary and smooth the result. To encourage accurate prediction, a loss which measures the boundary distance in the strip domain is introduced. In addition, we enforce a matching consistency and C0 continuity regularization to the network to reduce false alarms. Extensive experiments on both public and a newly created high resolution dataset strongly validate our approach.

preprint 2020 arXiv

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, since such heavy models can hardly be implemented with limited resources. To improve their efficiency with an assured model performance, we propose a novel speed-tunable FastBERT with adaptive inference time. The speed at inference can be flexibly adjusted under varying demands, while redundant calculation of samples is avoided. Moreover, this model adopts a unique self-distillation mechanism at fine-tuning, further enabling greater computational efficiency with minimal loss in performance. Our model achieves promising results on twelve English and Chinese datasets. It can run from 1 to 12 times faster than BERT, depending on the speedup threshold chosen for the speed-performance tradeoff.

preprint 2020 arXiv

Inclusive GAN: Improving Data and Minority Coverage in Generative Models

Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images. Yet the equitable allocation of their modeling capacity among subgroups has received less attention, which could lead to potential biases against underrepresented minorities if left uncontrolled. In this work, we first formalize the problem of minority inclusion as one of data coverage, and then propose to improve data coverage by harmonizing adversarial training with reconstructive generation. The experiments show that our method outperforms the existing state-of-the-art methods in terms of data coverage on both seen and unseen data. We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include, and validate its effectiveness at little compromise from the overall performance on the entire dataset. Code, models, and supplemental videos are available at GitHub.

preprint 2020 arXiv

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

Incremental learning aims at achieving good performance on new categories without forgetting old ones. Knowledge distillation has been shown to be critical in preserving the performance on old classes. Conventional methods, however, sequentially distill knowledge only from the last model, leading to performance degradation on the old classes in later incremental learning steps. In this paper, we propose a multi-model and multi-level knowledge distillation strategy. Instead of sequentially distilling knowledge only from the last model, we directly leverage all previous model snapshots. In addition, we incorporate an auxiliary distillation to further preserve knowledge encoded at the intermediate feature levels. To make the model more memory efficient, we adapt mask-based pruning to reconstruct all previous models with a small memory footprint. Experiments on standard incremental learning benchmarks show that our method preserves the knowledge on old classes better and improves the overall performance over standard distillation techniques.

preprint 2020 arXiv

Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition

Recognizing the expressions of partially occluded faces is a challenging computer vision problem. Previous expression recognition methods either overlooked this issue or resolved it using extreme assumptions. Motivated by the fact that the human visual system is adept at ignoring the occlusion and focusing on non-occluded facial areas, we propose a landmark-guided attention branch to find and discard corrupted features from occluded regions so that they are not used for recognition. An attention map is first generated to indicate if a specific facial part is occluded and guide our model to attend to non-occluded regions. To further improve robustness, we propose a facial region branch to partition the feature maps into non-overlapping facial blocks and task each block to predict the expression independently. This results in more diverse and discriminative features, enabling the expression recognition system to recover even though the face is partially occluded. Depending on the synergistic effects of the two branches, our occlusion-adaptive deep network significantly outperforms state-of-the-art methods on two challenging in-the-wild benchmark datasets and three real-world occluded expression datasets.

preprint 2020 arXiv

Quasi-parallel X-ray microbeam obtained using a parabolic monocapillary X-ray lens with an embedded square-shaped lead occluder

A parabolic monocapillary X-ray lens (PMXRL) is designed to effectively constrain a laboratory point X-ray source into a parallel beam. A square-shaped lead occluder (SSLO) is used to block direct X-rays in the PMXRL. To design the PMXRL, we use Python to simulate the conic parameter (p = 0.001 mm) of the lens and then use a drawing machine to draw a corresponding lens (p = 0.000939 mm) with a total length of 60.8 mm. We place the SSLO at the lens inlet for optical testing. The results show that the controlled outgoing beam has a divergence of less than 0.4 mrad in the range of 15-45 mm of the lens outlet, which achieves excellent optical performance in X-ray imaging methodology. The design details are reported in this paper.

preprint 2020 arXiv

Reservoir Computing with Planar Nanomagnet Arrays

Reservoir computing is an emerging methodology for neuromorphic computing that is especially well-suited for hardware implementations in size, weight, and power (SWaP) constrained environments. This work proposes a novel hardware implementation of a reservoir computer using a planar nanomagnet array. A small nanomagnet reservoir is demonstrated via micromagnetic simulations to be able to identify simple waveforms with 100% accuracy. Planar nanomagnet reservoirs are a promising new solution to the growing need for dedicated neuromorphic hardware.

preprint 2018 arXiv

Central Limit theorem for toric Kähler manifolds

Associated to the Bergman kernels of a polarized toric Kähler manifold $(M, ω, L, h)$ are sequences of measures $\{μ_k^z\}_{k=1}^{\infty}$ parametrized by the points $z \in M$. For each $z$ in the open orbit, we prove a central limit theorem for $μ_k^z$. The center of mass of $μ_k^z$ is the image of $z$ under the moment map; after re-centering at $0$ and dilating by $\sqrt{k}$, the re-normalized measure tends to a centered Gaussian whose variance is the Hessian of the Kähler potential at $z$. We further give a remainder estimate of Berry-Esseen type. The sequence $\{μ_k^z\}$ is generally not a sequence of convolution powers and the proofs only involve Kähler analysis.

preprint 2018 arXiv

Interface asymptotics of Partial Bergman kernels around a critical level

In a recent series of articles (arXiv:1604.06655, arXiv:1708.09267), the authors have studied the transition behavior of partial Bergman kernels $Π_{k, [E_1, E_2]}(z,w)$ and the associated DOS (density of states) $Π_{k, [E_1, E_2]}(z)$ across the interface $\mathcal{C}$ between the allowed and forbidden regions. Partial Bergman kernels are Toeplitz Hamiltonians quantizing Morse functions $H: M \to \mathbb{R}$ on a Kähler manifold. The allowed region is $H^{-1}([E_1, E_2])$ and the interface $\mathcal{C}$ is its boundary. In prior articles it was assumed that the endpoints $E_j$ were regular values of $H$. This article completes the series by giving parallel results when an endpoint is a critical value of $H$. In place of the Erf scaling asymptotics in a $k^{-\frac{1}{2}}$ tube around $\mathcal{C}$ for regular interfaces, one obtains $δ$-asymptotics in $k^{-\frac{1}{4}}$-tubes around singular points of a critical interface. In $k^{-\frac{1}{2}}$ tubes, the transition law is given by the osculating metaplectic propagator.

preprint 2017 arXiv

Interface asymptotics of partial Bergman kernels on $S^1$-symmetric Kaehler manifolds

This article is concerned with asymptotics of equivariant Bergman kernels and partial Bergman kernels for polarized projective Kähler manifolds invariant under a Hamiltonian holomorphic $S^1$ action. Asymptotics of partial Bergman kernels are obtained in the allowed region $\mathcal{A}$ resp. forbidden region $\mathcal{F}$, generalizing results of Shiffman-Zelditch, Shiffman-Tate-Zelditch and Pokorny-Singer for toric Kähler manifolds. The main result gives scaling asymptotics of equivariant Bergman kernels and partial Bergman kernels in the transition region around the interface $\partial \mathcal{A}$, generalizing recent work of Ross-Singer on partial Bergman kernels, and refining the Ross-Singer transition asymptotics to apply to equivariant Bergman kernels.