Source author record

Wei Gao

Wei Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

65 works

34 topics

4 close collaborators


preprint 2026 arXiv

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-OS semantic gap: agent frameworks see tool calls but not their OS effects; the OS sees state changes but lacks turn-level context to judge recovery relevance. This gap hides massive sparsity: over 75% of agent turns produce no recovery-relevant state, so most checkpoints are unnecessary. Crab (Checkpoint-and-Restore for Agent SandBoxes) is a transparent host-side runtime that bridges this gap without modifying agents or C/R backends. An eBPF-based inspector classifies each turn's OS-visible effects to decide checkpoint granularity; a coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time; and a host-scoped engine schedules checkpoint traffic across co-located sandboxes. On shell-intensive and code-repair workloads, Crab raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.

preprint 2026 arXiv

Lever: Speculative LLM Inference on Smartphones

Large language models (LLMs) are increasingly needed for interactive mobile applications, but high-quality models exceed the limited DRAM available on smartphones. Flash storage can hold larger models, yet flash-backed inference is slow because autoregressive decoding repeatedly invokes the target model and incurs costly I/O. We observe that speculative decoding is a natural fit for this setting: a small draft model can remain in DRAM, while a larger flash-resident target model verifies multiple candidate tokens per invocation. However, existing methods assume server-class accelerators and fail to account for prolonged I/O latency, limited computation parallelism, and irregular speculation execution. We present Lever, an end-to-end system for efficient flash-backed LLM inference on smartphones. Lever jointly optimizes the three stages of speculative decoding under mobile constraints. For drafting, it builds token trees using an I/O- and compute-aware gain-cost objective. For verification, it prunes low-value branches through early-exit prediction to reduce target-model computation. For execution, it maps speculation efficiently across mobile CPU-NPU hardware to improve utilization. Comprehensive evaluations show that Lever reduces inference latency by an average of 2.93x over baseline flash-offloaded inference and 1.50x over conventional speculative decoding, narrowing the latency gap between flash-backed and memory-resident LLM inference.
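The draft-then-verify loop that this abstract builds on can be sketched as follows. This is a minimal illustration of greedy speculative decoding with a linear draft chain, not Lever's token-tree method; the toy `target` and `draft` functions and all names are hypothetical stand-ins for real models. A real system would score all drafted positions in one batched target invocation, whereas this sketch calls the target once per position and only counts verification rounds.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], num_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       num_new: int, draft_len: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the tokens and the number of verification rounds."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    rounds = 0
    while len(tokens) < goal:
        # The small draft model proposes a short chain of candidate tokens.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))
        # The target checks the chain; its own token is always the one kept,
        # and checking stops at the first position where the draft disagrees.
        rounds += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if guess != expected:
                break
        tokens.extend(accepted)
    return tokens[:goal], rounds


def target(tokens: List[int]) -> int:
    # Hypothetical "large" model: a fixed deterministic rule over the last token.
    return (tokens[-1] * 3 + 1) % 7


def draft(tokens: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after token 5.
    return 0 if tokens[-1] == 5 else target(tokens)


spec_tokens, spec_rounds = speculative_decode(target, draft, [1], num_new=12)
```

Because every kept token is the target's own greedy choice, the output matches target-only decoding exactly; the saving comes from needing fewer target rounds than generated tokens.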

preprint 2025 arXiv

DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation

This paper presents an effective solution for view extrapolation in autonomous driving scenarios. Recent approaches focus on generating shifted novel view images from given viewpoints using diffusion models. However, these methods heavily rely on priors such as LiDAR point clouds, 3D bounding boxes, and lane annotations, which demand expensive sensors or labor-intensive labeling, limiting applicability in real-world deployment. In this work, with only images and optional camera poses, we first estimate a global static point cloud and per-frame dynamic point clouds, fusing them into a unified representation. We then employ a deformable 4D Gaussian framework to reconstruct the scene. The initially trained 4D Gaussian model renders degraded and pseudo-images to train a video diffusion model. Subsequently, progressively shifted Gaussian renderings are iteratively refined by the diffusion model, and the enhanced results are incorporated back as training data for 4DGS. This process continues until extrapolation reaches the target viewpoints. Compared with baselines, our method produces higher-quality images at novel extrapolated viewpoints.

preprint 2025 arXiv

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Black holes (BH), one of the most intriguing objects in the universe, can manifest themselves through electromagnetic radiation initiated by the accretion flow. Some stellar-mass BHs drive relativistic jets when accreting matter from their companion stars, forming microquasars. Non-thermal emission from the radio to tera-electronvolt (TeV) gamma-ray band has been observed from microquasars, indicating the acceleration of relativistic particles. Here we report detection of four microquasars (SS 433, V4641 Sgr, GRS 1915+105, MAXI J1820+070) of spectrum extending to the ultrahigh-energy (UHE; photon energy $E>100$ TeV) band and one microquasar (Cygnus X-1) of spectrum approaching 100 TeV, using the Large High Altitude Air Shower Observatory (LHAASO). Notably, the total emission associated with SS 433 cannot be interpreted with a single leptonic component. In the UHE band, its emission is in spatial coincidence with a giant atomic cloud, which is consistent with a hadronic origin. An elongated source is discovered from V4641 Sgr with the spectrum continuing up to 800 TeV. The detection of UHE gamma rays demonstrates that accreting BHs and their environments can operate as extremely efficient accelerators of particles out of 1 peta-electronvolt (PeV), suggesting microquasars to be important contributors to Galactic cosmic rays especially around the 'knee' region.

preprint 2023 arXiv

Unravelling the deterministic effect of the solid-state diffusion energy barrier for charge carrier on the self-discharge of supercapacitors

The further development of fast electrochemical devices is hindered by self-discharge. Current strategies for suppressing self-discharge are mainly focused on the extrinsic and general mechanisms including faradaic reactions, charge redistribution, and ohmic leakage. However, the self-discharge process is still severe for conventional supercapacitors. Herein, we unravel the deterministic effect of solid-state diffusion energy barrier by constructing conjugately configured supercapacitors based on pairs of pre-lithiated niobium oxides with similar intercalation pseudocapacitive process but different phases. This device works with a single type of charge carrier while materials with various diffusion barriers can be implanted, thus serving as an ideal platform to illustrate the influence of the diffusion barrier. The results show that the comprehensive effect of solid-state diffusion energy barrier and extrinsic effects drives the self-discharge process. Noteworthy, the diffusion barrier presents with an exponential form, which governs the self-discharge of supercapacitors. This work is expected to unravel the deterministic effect of the solid-state diffusion energy barrier and provide a general guidance for suppressing self-discharge for supercapacitors.

preprint 2022 arXiv

A Rate Control Algorithm for Video-based Point Cloud Compression

Video-based point cloud compression (V-PCC) has been an emerging compression technology that projects the 3D point cloud into a 2D plane and uses high efficiency video coding (HEVC) to encode the projected 2D videos (geometry video and color video). In this work, we propose a rate control algorithm for the all-intra (AI) configuration of V-PCC. Specifically, based on the quality-dependency existing in the projected videos, we develop an optimization formulation to allocate target bits between the geometry video and the color video. Furthermore, we design a two-pass method for HEVC to adapt to the new characteristics of projected videos, which significantly improves the accuracy of rate control. Experimental results demonstrate that our algorithm outperforms V-PCC without rate control in R-D performance with just 0.43% bitrate error.

preprint 2022 arXiv

A Weakly Supervised Propagation Model for Rumor Verification and Stance Detection with Multiple Instance Learning

The diffusion of rumors on microblogs generally follows a propagation tree structure, that provides valuable clues on how an original message is transmitted and responded by users over time. Recent studies reveal that rumor detection and stance detection are two different but relevant tasks which can jointly enhance each other, e.g., rumors can be debunked by cross-checking the stances conveyed by their relevant microblog posts, and stances are also conditioned on the nature of the rumor. However, most stance detection methods require enormous post-level stance labels for training, which are labor-intensive given a large number of posts. Enlightened by Multiple Instance Learning (MIL) scheme, we first represent the diffusion of claims with bottom-up and top-down trees, then propose two tree-structured weakly supervised frameworks to jointly classify rumors and stances, where only the bag-level labels concerning claim's veracity are needed. Specifically, we convert the multi-class problem into a multiple MIL-based binary classification problem where each binary model focuses on differentiating a target stance or rumor type and other types. Finally, we propose a hierarchical attention mechanism to aggregate the binary predictions, including (1) a bottom-up or top-down tree attention layer to aggregate binary stances into binary veracity; and (2) a discriminative attention layer to aggregate the binary class into finer-grained classes. Extensive experiments conducted on three Twitter-based datasets demonstrate promising performance of our model on both claim-level rumor detection and post-level stance classification compared with state-of-the-art methods.

preprint 2022 arXiv

Adaptive Random Fourier Features Kernel LMS

We propose the adaptive random Fourier features Gaussian kernel LMS (ARFF-GKLMS). Like most kernel adaptive filters based on stochastic gradient descent, this algorithm uses a preset number of random Fourier features to save computation cost. However, as an extra flexibility, it can adapt the inherent kernel bandwidth in the random Fourier features in an online manner. This adaptation mechanism allows to alleviate the problem of selecting the kernel bandwidth beforehand for the benefit of an improved tracking in non-stationary circumstances. Simulation results confirm that the proposed algorithm achieves a performance improvement in terms of convergence rate, error at steady-state and tracking ability over other kernel adaptive filters with preset kernel bandwidth.
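As background for the abstract above, here is a minimal sketch of a kernel LMS filter built on random Fourier features with a fixed, preset Gaussian bandwidth. It deliberately does not reproduce the paper's online bandwidth adaptation; the class name, default parameters, and the sine-regression demo are all assumptions made for illustration.

```python
import math
import random


class RFFKernelLMS:
    """LMS filter on random Fourier features of a Gaussian kernel (fixed bandwidth)."""

    def __init__(self, input_dim, num_features=100, bandwidth=1.0, step_size=0.5, seed=0):
        rng = random.Random(seed)
        # For exp(-||x - y||^2 / (2 * bandwidth^2)), frequencies follow N(0, 1 / bandwidth^2).
        self.omegas = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(input_dim)]
                       for _ in range(num_features)]
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
        self.scale = math.sqrt(2.0 / num_features)
        self.weights = [0.0] * num_features
        self.step_size = step_size

    def features(self, x):
        """Map an input vector to its random Fourier feature vector."""
        return [self.scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + phase)
                for omega, phase in zip(self.omegas, self.phases)]

    def predict(self, x):
        return sum(w * z for w, z in zip(self.weights, self.features(x)))

    def update(self, x, target):
        """One stochastic-gradient (LMS) step; returns the a priori error."""
        z = self.features(x)
        error = target - sum(w * zi for w, zi in zip(self.weights, z))
        self.weights = [w + self.step_size * error * zi for w, zi in zip(self.weights, z)]
        return error


def run_demo(num_samples=2000, seed=1):
    """Track y = sin(x) online and compare early vs. late mean squared error."""
    rng = random.Random(seed)
    model = RFFKernelLMS(input_dim=1)
    squared_errors = []
    for _ in range(num_samples):
        x = [rng.uniform(-3.0, 3.0)]
        squared_errors.append(model.update(x, math.sin(x[0])) ** 2)
    early = sum(squared_errors[:200]) / 200
    late = sum(squared_errors[-200:]) / 200
    return early, late
```

The feature count is fixed up front, so the cost per update stays constant as samples arrive. The `bandwidth` argument is the quantity the abstract proposes to adapt online rather than preset.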

preprint 2022 arXiv

BCS-Net: Boundary, Context and Semantic for Automatic COVID-19 Lung Infection Segmentation from CT Images

The spread of COVID-19 has brought a huge disaster to the world, and the automatic segmentation of infection regions can help doctors to make diagnosis quickly and reduce workload. However, there are several challenges for the accurate and complete segmentation, such as the scattered infection area distribution, complex background noises, and blurred segmentation boundaries. To this end, in this paper, we propose a novel network for automatic COVID-19 lung infection segmentation from CT images, named BCS-Net, which considers the boundary, context, and semantic attributes. The BCS-Net follows an encoder-decoder architecture, and more designs focus on the decoder stage that includes three progressively Boundary-Context-Semantic Reconstruction (BCSR) blocks. In each BCSR block, the attention-guided global context (AGGC) module is designed to learn the most valuable encoder features for decoder by highlighting the important spatial and boundary locations and modeling the global context dependence. Besides, a semantic guidance (SG) unit generates the semantic guidance map to refine the decoder features by aggregating multi-scale high-level features at the intermediate resolution. Extensive experiments demonstrate that our proposed framework outperforms the existing competitors both qualitatively and quantitatively.

preprint 2022 arXiv

Consistent Quality Oriented Rate Control in HEVC via Balancing Intra and Inter Frame Coding

Consistent quality oriented rate control in video coding has attracted much more attention. However, the existing efforts only focus on decreasing variations between every two adjacent frames, but neglect coding trade-off problem between intra and inter frames. In this paper, we deal with it from a new perspective, where intra frame quantization parameter (IQP) and rate control are optimized for balanced coding. First, due to the importance of intra frames, a new framework is proposed for consistent quality oriented IQP prediction, and then we remove unqualified IQP candidates using the proposed penalty term. Second, we extensively evaluate possible features, and select target bits per pixel for all remaining frames, average and standard variance of frame QPs, where equivalent acquisition methods for QP features are given. Third, predicted IQPs are clipped effectively according to bandwidth and previous information for better bit rate accuracy. Compared with High Efficiency Video Coding (HEVC) reference baseline, experiments demonstrate that our method reduces quality fluctuation greatly by 37.2% on frame-level standard variance of peak-signal-noise-ratio (PSNR) and 45.1% on that of structural similarity (SSIM). Moreover, it also can have satisfactory results on Rate-Distortion (R-D) performance, bit accuracy and buffer control.

preprint 2022 arXiv

Context-Hierarchy Inverse Reinforcement Learning

An inverse reinforcement learning (IRL) agent learns to act intelligently by observing expert demonstrations and learning the expert's underlying reward function. Although learning the reward functions from demonstrations has achieved great success in various tasks, several other challenges are mostly ignored. Firstly, existing IRL methods try to learn the reward function from scratch without relying on any prior knowledge. Secondly, traditional IRL methods assume the reward functions are homogeneous across all the demonstrations. Some existing IRL methods managed to extend to the heterogeneous demonstrations. However, they still assume one hidden variable that affects the behavior and learn the underlying hidden variable together with the reward from demonstrations. To solve these issues, we present Context Hierarchy IRL (CHIRL), a new IRL algorithm that exploits the context to scale up IRL and learn reward functions of complex behaviors. CHIRL models the context hierarchically as a directed acyclic graph; it represents the reward function as a corresponding modular deep neural network that associates each network module with a node of the context hierarchy. The context hierarchy and the modular reward representation enable data sharing across multiple contexts and state abstraction, significantly improving the learning performance. CHIRL has a natural connection with hierarchical task planning when the context hierarchy represents subtask decomposition. It enables to incorporate the prior knowledge of causal dependencies of subtasks and make it capable of solving large complex tasks by decoupling it into several subtasks and conquering each subtask to solve the original task. Experiments on benchmark tasks, including a large scale autonomous driving task in the CARLA simulator, show promising results in scaling up IRL for tasks with complex reward functions.

preprint 2022 arXiv

Deep Geometry Post-Processing for Decompressed Point Clouds

Point cloud compression plays a crucial role in reducing the huge cost of data storage and transmission. However, distortions can be introduced into the decompressed point clouds due to quantization. In this paper, we propose a novel learning-based post-processing method to enhance the decompressed point clouds. Specifically, a voxelized point cloud is first divided into small cubes. Then, a 3D convolutional network is proposed to predict the occupancy probability for each location of a cube. We leverage both local and global contexts by generating multi-scale probabilities. These probabilities are progressively summed to predict the results in a coarse-to-fine manner. Finally, we obtain the geometry-refined point clouds based on the predicted probabilities. Different from previous methods, we deal with decompressed point clouds with huge variety of distortions using a single model. Experimental results show that the proposed method can significantly improve the quality of the decompressed point clouds, achieving 9.30dB BDPSNR gain on three representative datasets on average.

preprint 2022 arXiv

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision

Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU datacenter. An efficient scheduler design for such GPU datacenter is crucially important to reduce the operational cost and improve resource utilization. However, traditional approaches designed for big data or high performance computing workloads can not support DL workloads to fully utilize the GPU resources. Recently, substantial schedulers are proposed to tailor for DL workloads in GPU datacenters. This paper surveys existing research efforts for both training and inference workloads. We primarily present how existing schedulers facilitate the respective workloads from the scheduling objectives and resource consumption features. Finally, we prospect several promising future research directions. More detailed summary with the surveyed paper and code links can be found at our project website: https://github.com/S-Lab-System-Group/Awesome-DL-Scheduling-Papers

preprint 2022 arXiv

Deep Visual Navigation under Partial Observability

How can a robot navigate successfully in rich and diverse environments, indoors or outdoors, along office corridors or trails on the grassland, on the flat ground or the staircase? To this end, this work aims to address three challenges: (i) complex visual observations, (ii) partial observability of local visual sensing, and (iii) multimodal robot behaviors conditioned on both the local environment and the global navigation objective. We propose to train a neural network (NN) controller for local navigation via imitation learning. To tackle complex visual observations, we extract multi-scale spatial representations through CNNs. To tackle partial observability, we aggregate multi-scale spatial information over time and encode it in LSTMs. To learn multimodal behaviors, we use a separate memory module for each behavior mode. Importantly, we integrate the multiple neural network modules into a unified controller that achieves robust performance for visual navigation in complex, partially observable environments. We implemented the controller on the quadrupedal Spot robot and evaluated it on three challenging tasks: adversarial pedestrian avoidance, blind-spot obstacle avoidance, and elevator riding. The experiments show that the proposed NN architecture significantly improves navigation performance.

preprint 2022 arXiv

DialMed: A Dataset for Dialogue-based Medication Recommendation

Medication recommendation is a crucial task for intelligent healthcare systems. Previous studies mainly recommend medications with electronic health records (EHRs). However, some details of interactions between doctors and patients may be ignored or omitted in EHRs, which are essential for automatic medication recommendation. Therefore, we make the first attempt to recommend medications with the conversations between doctors and patients. In this work, we construct DIALMED, the first high-quality dataset for medical dialogue-based medication recommendation task. It contains 11,996 medical dialogues related to 16 common diseases from 3 departments and 70 corresponding common medications. Furthermore, we propose a Dialogue structure and Disease knowledge aware Network (DDN), where a QA Dialogue Graph mechanism is designed to model the dialogue structure and the knowledge graph is used to introduce external disease knowledge. The extensive experimental results demonstrate that the proposed method is a promising solution to recommend medications with medical dialogues. The dataset and code are available at https://github.com/f-window/DialMed.

preprint 2022 arXiv

End-to-end lossless compression of high precision depth maps guided by pseudo-residual

As a fundamental data format representing spatial information, depth map is widely used in signal processing and computer vision fields. Massive amount of high precision depth maps are produced with the rapid development of equipment like laser scanner or LiDAR. Therefore, it is urgent to explore a new compression method with better compression ratio for high precision depth maps. Utilizing the wide spread deep learning environment, we propose an end-to-end learning-based lossless compression method for high precision depth maps. The whole process is comprised of two sub-processes, named pre-processing of depth maps and deep lossless compression of processed depth maps. The deep lossless compression network consists of two sub-networks, named lossy compression network and lossless compression network. We leverage the concept of pseudo-residual to guide the generation of distribution for residual and avoid introducing context models. Our end-to-end lossless compression network achieves competitive performance over engineered codecs and has low computational cost.

preprint 2022 arXiv

Exploiting Robust Unsupervised Video Person Re-identification

Unsupervised video person re-identification (reID) methods usually depend on global-level features. And many supervised reID methods employed local-level features and achieved significant performance improvements. However, applying local-level features to unsupervised methods may introduce an unstable performance. To improve the performance stability for unsupervised video reID, this paper introduces a general scheme fusing part models and unsupervised learning. In this scheme, the global-level feature is divided into equal local-level feature. A local-aware module is employed to explore the potentials of local-level feature for unsupervised learning. A global-aware module is proposed to overcome the disadvantages of local-level features. Features from these two modules are fused to form a robust feature representation for each input image. This feature representation has the advantages of local-level feature without suffering from its disadvantages. Comprehensive experiments are conducted on three benchmarks, including PRID2011, iLIDS-VID, and DukeMTMC-VideoReID, and the results demonstrate that the proposed approach achieves state-of-the-art performance. Extensive ablation studies demonstrate the effectiveness and robustness of proposed scheme, local-aware module and global-aware module. The code and generated features are available at https://github.com/deropty/uPMnet.

preprint 2022 arXiv

Frequency conversion of abruptly autofocusing waves

Abruptly autofocusing waves and associated ring-Airy (RA) beams are attracting increasing interest owing to their fascinating properties such as their ability of abruptly autofocusing to small F-number. Optical frequency conversion via nonlinear interactions can further expand their applications to new area, yet are rarely studied. In this work, we report the frequency conversion of RA beams via sum-frequency generation using perfect flattop and common Gauss beams as the pump beams. The nonlinear transformation of the spatial complex amplitude of the signal and associated influences on autofocusing behavior, under different conditions of interaction location (i.e., original, autofocusing, and Fourier planes) and pump structure, were systematically studied and experimentally investigated. This proof-of-principle demonstration provides a general guideline to build the frequency interface for abruptly autofocusing waves and a reference for relevant studies involving nonlinear transformation of abruptly autofocusing waves.

preprint 2022 arXiv

Learning to Disentangle Scenes for Person Re-identification

There are many challenging problems in the person re-identification (ReID) task, such as the occlusion and scale variation. Existing works usually tried to solve them by employing a one-branch network. This one-branch network needs to be robust to various challenging problems, which makes this network overburdened. This paper proposes to divide-and-conquer the ReID task. For this purpose, we employ several self-supervision operations to simulate different challenging problems and handle each challenging problem using different networks. Concretely, we use the random erasing operation and propose a novel random scaling operation to generate new images with controllable characteristics. A general multi-branch network, including one master branch and two servant branches, is introduced to handle different scenes. These branches learn collaboratively and achieve different perceptive abilities. In this way, the complex scenes in the ReID task are effectively disentangled, and the burden of each branch is relieved. The results from extensive experiments demonstrate that the proposed method achieves state-of-the-art performances on three ReID benchmarks and two occluded ReID benchmarks. Ablation study also shows that the proposed scheme and operations significantly improve the performance in various scenes. The code is available at https://git.openi.org.cn/zangxh/LDS.git.

preprint 2022 arXiv

Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task. This task aims to retrieve the pedestrian of interest from non-overlapping cameras. Recently, transformer-based models have achieved significant progress for this task. However, these models still suffer from ignoring fine-grained, part-informed information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to solve this problem. In transformer-based architecture, each pedestrian image is split into many patches. Then, these patches are fed to transformer layers to obtain the feature representation of this image. To explore the fine-grained information, this paper proposes to apply vertical division and horizontal division on these patches to generate different-direction human parts. These parts provide more fine-grained information. To fuse multi-scale feature representation, this paper presents a pyramid structure containing global-level information and many pieces of local-level information from different scales. The feature pyramids of all the pedestrian images from the same video are fused to form the final multi-direction and multi-scale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show the proposed PiT achieves state-of-the-art performance. Extensive ablation studies demonstrate the superiority of the proposed pyramid structure. The code is available at https://git.openi.org.cn/zangxh/PiT.git.

preprint 2022 arXiv

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

preprint 2022 arXiv

OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression

In point cloud compression, sufficient contexts are significant for modeling the point cloud distribution. However, the contexts gathered by the previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention employing the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences in a lossless way by gathering the information of sibling and ancestor nodes. Expressly, we first represent point clouds with octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among the neighboring nodes and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to make a trade-off between encoding time and performance. Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g. SemanticKITTI) and object point cloud dataset (e.g. MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
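The octree symbol sequence mentioned in this abstract can be illustrated with a small sketch. It shows only the standard serialization of a voxelized point cloud into 8-bit occupancy codes in breadth-first order, and the matching lossless decoder. It does not implement OctAttention's attention-based entropy model, and the function names and bit ordering are assumptions chosen for this example.

```python
def octree_occupancy_codes(points, depth):
    """Serialize integer points in [0, 2**depth)^3 into breadth-first occupancy codes."""
    level = [sorted(set(points))]  # the root node holds every distinct point
    codes = []
    for shift in range(depth - 1, -1, -1):
        next_level = []
        for node_points in level:
            children = [[] for _ in range(8)]
            for x, y, z in node_points:
                # One bit per axis selects which of the 8 child octants the point falls in.
                idx = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
                children[idx].append((x, y, z))
            # Bit i of the code is set when child octant i contains at least one point.
            codes.append(sum(1 << i for i, child in enumerate(children) if child))
            next_level.extend(child for child in children if child)
        level = next_level
    return codes


def decode_occupancy_codes(codes, depth):
    """Rebuild the sorted list of points from breadth-first occupancy codes."""
    level = [(0, 0, 0)]
    stream = iter(codes)
    for shift in range(depth - 1, -1, -1):
        next_level = []
        for x, y, z in level:
            code = next(stream)
            for idx in range(8):
                if (code >> idx) & 1:
                    next_level.append((x | (((idx >> 2) & 1) << shift),
                                       y | (((idx >> 1) & 1) << shift),
                                       z | ((idx & 1) << shift)))
        level = next_level
    return sorted(level)


example_points = [(0, 0, 0), (3, 3, 3), (1, 0, 2)]
example_codes = octree_occupancy_codes(example_points, depth=2)
```

Each code is one symbol in the range 1 to 255. A learned entropy model such as the one the abstract describes would predict the distribution of each symbol from sibling and ancestor context before arithmetic coding.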

preprint 2022 arXiv

On the conjecture about the exponential reduced Sombor index

Let $G=(V(G),E(G))$ be a graph and $d(v)$ be the degree of the vertex $v\in V(G)$. The exponential reduced Sombor index of $G$, denoted by $e^{SO_{red}}(G)$, is defined as $$e^{SO_{red}}(G)=\sum_{uv\in E(G)}e^{\sqrt{(d(u)-1)^2+(d(v)-1)^2}}.$$ We obtain a characterization of extremal trees with the maximal exponential reduced Sombor index among all chemical trees of order $n$. This result shows the conjecture on the exponential reduced Sombor index proposed by Liu, You, Tang and Liu [On the reduced Sombor index and its applications, MATCH Commun. Math. Comput. Chem. 86 (2021) 729--753] is negative.
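The definition above translates directly into a few lines of code. The following sketch evaluates the index for a graph given as an edge list; the function name and the two small example trees are chosen here for illustration and do not come from the paper.

```python
import math
from collections import defaultdict


def exp_reduced_sombor(edges):
    """Exponential reduced Sombor index of a simple graph given as a list of edges (u, v)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sum over edges of exp(sqrt((d(u) - 1)^2 + (d(v) - 1)^2)).
    return sum(
        math.exp(math.sqrt((degree[u] - 1) ** 2 + (degree[v] - 1) ** 2))
        for u, v in edges
    )


# Two chemical trees (maximum degree at most 4) of order 4.
path4 = [(0, 1), (1, 2), (2, 3)]
star4 = [(0, 1), (0, 2), (0, 3)]

path_value = exp_reduced_sombor(path4)  # equals 2e + e^sqrt(2)
star_value = exp_reduced_sombor(star4)  # equals 3e^2
```

For these two trees the star gives the larger value, since each of its edges joins a degree-3 vertex to a leaf and contributes $e^{2}$.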

preprint 2022 arXiv

On the Optimization of Margin Distribution

Margin has played an important role on the design and analysis of learning algorithms during the past years, mostly working with the maximization of the minimum margin. Recent years have witnessed the increasing empirical studies on the optimization of margin distribution according to different statistics such as medium margin, average margin, margin variance, etc., whereas there is a relative paucity of theoretical understanding. In this work, we take one step on this direction by providing a new generalization error bound, which is heavily relevant to margin distribution by incorporating ingredients such as average margin and semi-variance, a new margin statistics for the characterization of margin distribution. Inspired by the theoretical findings, we propose the MSVMAv, an efficient approach to achieve better performance by optimizing margin distribution in terms of its empirical average margin and semi-variance. We finally conduct extensive experiments to show the superiority of the proposed MSVMAv approach.

preprint2022arXiv

OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, have been included in the toolbox with PyTorch and/or MindSpore implementations under heterogeneous NVIDIA and Huawei Ascend computing systems. To our best knowledge, OpenMedIA is the first open-source algorithm library providing compared PyTorch and MindSpore implementations and results on several benchmark datasets. The source codes and models are available at https://git.openi.org.cn/OpenMedIA.

preprint2022arXiv

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

Point clouds upsampling is a challenging issue to generate dense and uniform point clouds from the given sparse input. Most existing methods either take the end-to-end supervised learning based manner, where large amounts of pairs of sparse input and dense ground-truth are exploited as supervision information; or treat up-scaling of different scale factors as independent tasks, and have to build multiple networks to handle upsampling with varying factors. In this paper, we propose a novel approach that achieves self-supervised and magnification-flexible point clouds upsampling simultaneously. We formulate point clouds upsampling as the task of seeking nearest projection points on the implicit surface for seed points. To this end, we define two implicit neural functions to estimate projection direction and distance respectively, which can be trained by two pretext learning tasks. Experimental results demonstrate that our self-supervised learning based scheme achieves competitive or even better performance than supervised learning based state-of-the-art methods. The source code is publicly available at https://github.com/xnowbzhao/sapcu.

preprint2021arXiv

Assessing Individual and Community Vulnerability to Fake News in Social Networks

The plague of false information, popularly called fake news, has affected the lives of news consumers ever since social media became prevalent. Understanding the spread of false information in social networks has therefore gained a lot of attention in the literature. While most proposed models analyze the content of the information, not much work has explored the community structures that also play an important role in determining how people get exposed to it. In this paper we base our idea on Computational Trust in social networks to propose a novel Community Health Assessment model against fake news. Based on the concepts of neighbor, boundary and core nodes of a community, we propose novel evaluation metrics to quantify the vulnerability of nodes (individual-level) and communities (group-level) to spreading false information. Our model hypothesizes that if the boundary nodes trust the neighbor nodes of a community who are spreaders, the densely-connected core nodes of the community are highly likely to become spreaders. We test our model with communities generated using three popular community detection algorithms based on two new datasets of information spreading networks collected from Twitter. Our experimental results show that the proposed metrics perform clearly better on the networks spreading false information than on those spreading true information, indicating that our community health assessment model is effective.

preprint2021arXiv

Bidirectional Trajectory Computation for Odometer-Aided Visual-Inertial SLAM

Odometer-aided visual-inertial SLAM systems typically have a good performance for navigation of wheeled platforms, while they usually suffer from degenerate cases before the first turning. In this paper, firstly we perform an observability analysis w.r.t. the extrinsic parameters before the first turning, which is a complement of the existing results of observability analyses. Secondly, inspired by the above observability analyses, we propose a bidirectional trajectory computation method, by which the poses before the first turning are refined in the backward computation thread, and the real-time trajectory is adjusted accordingly. Experimental results prove that our proposed method not only solves the problem of the unobservability of accelerometer bias and extrinsic parameters before the first turning, but also results in more accurate trajectories in comparison with the state-of-the-art approaches.

preprint2021arXiv

Conformal frequency conversion for arbitrary vectorial structured light

Vectorial structured light with spatially varying amplitude, phase, and polarization is reshaping many areas of modern optics, including nonlinear optics, as diverse parametric processes can be used to explore interactions between such complex vector fields, extending the frontiers of optics to new physical phenomena. However, the most basic nonlinear application, i.e., frequency conversion, still remains challenging for vectorial structured light since parametric processes are polarization dependent, leading to a change in the spatial topological structure of signals. In this work, to break this fundamental limit, we propose a novel conformal frequency conversion scheme that allows to maintain the full spatial structure of vectorial structured light in the conversion; and systematically examine its spatial polarization independence based on non-degenerate sum-frequency generation with type-0 phase matching. This proof-of-principle demonstration paves the way for a wide range of applications requiring conformal frequency conversion, and, particularly, to implement frequency interfaces with multimodal communication channels, high-dimensional quantum states, and polarization-resolved upconversion imaging.

preprint2021arXiv

Efficient computational algorithms for approximate optimal designs

In this paper, we propose two simple yet efficient computational algorithms to obtain approximate optimal designs for multi-dimensional linear regression on a large variety of design spaces. We focus on the two commonly used optimal criteria, $D$- and $A$-optimal criteria. For $D$-optimality, we provide an alternative proof for the monotonic convergence for $D$-optimal criterion and propose an efficient computational algorithm to obtain the approximate $D$-optimal design. We further show that the proposed algorithm converges to the $D$-optimal design, and then prove that the approximate $D$-optimal design converges to the continuous $D$-optimal design under certain conditions. For $A$-optimality, we provide an efficient algorithm to obtain approximate $A$-optimal design and conjecture the monotonicity of the proposed algorithm. Numerical comparisons suggest that the proposed algorithms perform well and they are comparable or superior to some existing algorithms.

preprint2021arXiv

kPAM 2.0: Feedback Control for Category-Level Robotic Manipulation

In this paper, we explore generalizable, perception-to-action robotic manipulation for precise, contact-rich tasks. In particular, we contribute a framework for closed-loop robotic manipulation that automatically handles a category of objects, despite potentially unseen object instances and significant intra-category variations in shape, size and appearance. Previous approaches typically build a feedback loop on top of a real-time 6-DOF pose estimator. However, representing an object with a parameterized transformation from a fixed geometric template does not capture large intra-category shape variation. Hence we adopt the keypoint-based object representation proposed in kPAM for category-level pick-and-place, and extend it to closed-loop manipulation policies with contact-rich tasks. We first augment keypoints with local orientation information. Using the oriented keypoints, we propose a novel object-centric action representation in terms of regulating the linear/angular velocity or force/torque of these oriented keypoints. This formulation is surprisingly versatile -- we demonstrate that it can accomplish contact-rich manipulation tasks that require precision and dexterity for a category of objects with different shapes, sizes and appearances, such as peg-hole insertion for pegs and holes with significant shape variation and tight clearance. With the proposed object and action representation, our framework is also agnostic to the robot grasp pose and initial object configuration, making it flexible for integration and deployment.

preprint2021arXiv

Subspace Clustering for Panel Data with Interactive Effects

In this paper, we consider a statistical model for panel data with unobservable grouped factor structures that are correlated with the regressors, where the group membership can be unknown. The factor loadings are assumed to lie in different subspaces, and subspace clustering of the factor loadings is considered. A method called least squares subspace clustering estimate (LSSC) is proposed to estimate the model parameters by minimizing the least-square criterion and to perform the subspace clustering simultaneously. The consistency of the proposed subspace clustering is proved and the asymptotic properties of the estimation procedure are studied under certain conditions. A Monte Carlo simulation study is used to illustrate the advantages of the proposed method. Further considerations for situations in which the number of subspaces for factors, the dimension of factors and the dimension of subspaces are unknown are also discussed. For illustrative purposes, the proposed method is applied to study the linkage between income and democracy across countries while allowing for subspace patterns of unobserved factors and factor loadings.

preprint2021arXiv

Towards Understanding Theoretical Advantages of Complex-Reaction Networks

Complex-valued neural networks have attracted increasing attention in recent years, yet their advantages over real-valued networks remain an open question. This work takes one step in this direction by introducing the complex-reaction network with fully-connected feed-forward architecture. We prove the universal approximation property for complex-reaction networks, and show that a class of radial functions can be approximated by a complex-reaction network using a polynomial number of parameters, whereas real-valued networks need at least exponentially many parameters to reach the same approximation level. For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks, which may offer some insight into why optimal solutions can be found more easily for complex-reaction networks.

preprint2021arXiv

Transient Performance Analysis of the $\ell_1$-RLS

The recursive least-squares algorithm with $\ell_1$-norm regularization ($\ell_1$-RLS) exhibits excellent performance in terms of convergence rate and steady-state error in identification of sparse systems. Nevertheless few works have studied its stochastic behavior, in particular its transient performance. In this letter, we derive analytical models of the transient behavior of the $\ell_1$-RLS in the mean and mean-square sense. Simulation results illustrate the accuracy of these models.

preprint2020arXiv

Classically-entangled Ince-Gaussian modes

Complex vector light modes, classically-entangled in their spatial and polarisation degrees of freedom (DoF), have become ubiquitous in a vast diversity of research fields. Crucially, while polarisation is limited to a bi-dimensional space, the spatial mode is unbounded, it can be specified by any of the sets of solutions the wave equation can support in the different coordinate systems. Here we report on a class of vector beams with elliptical symmetry where the spatial DoF is encoded in the Ince-Gaussian modes of the cylindrical elliptical coordinates. We outline their geometric representation on the Higher-Order Poincaré Sphere, demonstrate their experimental generation and analyse the quality of the generated modes via Stokes polarimetry. We anticipate that such vector modes will be of great relevance in applications, such as, optical manipulations, laser material processing and optical communications amongst others.

preprint2020arXiv

LIPSS-Sticks: Laser induced double self organization enhances the broadband light harvesting of TiO2 nanotube arrays

Sub-wavelength laser induced periodic surface structures (LIPSS-Sticks) created by ultrashort pulsed laser irradiation on the surface of titanium are used for the first time to template the electrochemical growth of titanium dioxide nanotube arrays. This is an example of a double self-organized process, as both LIPSS formation and electrochemical anodization involve spontaneous generation of order from initially non-ordered precursors. LIPSS-Sticks have a 2x greater visible to near infrared light (400 - 1400 nm) collection efficiency compared to flat titanium dioxide due to the enhanced light scattering from grating-like structures. The growth of nanostructures with time was modelled electrostatically to explain the features of a templated anodization process that differ from the usual anodization of flat surfaces. This new templated growth method is general and can also be applied to Cu, W, Fe, Ti alloys and Al for the fabrication of hierarchically nanostructured surfaces using two complementary fabrication techniques: ultrashort pulsed laser ablation and electrochemical anodization.

preprint2020arXiv

Measuring the non-separability of vector modes with digital micromirror devices

The non-separability between the spatial and polarisation Degrees of Freedom (DoFs) of complex vector light fields has drawn significant attention in recent time. Key to this are its remarkable similarities with quantum entanglement, with quantum-like effects observed at the classical level. Crucially, this parallelism enables the use of quantum tools to quantify the coupling between the spatial and polarisation DoFs, usually implemented with polarisation-dependent spatial light modulators, which requires the splitting of the vector mode into two orthogonal polarisation components. Here we put forward a novel approach that relies on the use of Digital Micromirror Devices (DMDs) for fast, cheap and robust measurement, while the polarisation-independent nature of DMDs enables a reduction in the number of required measurements by 25%. We tested our approach experimentally on cylindrical vector modes with arbitrary degrees of non-separability, of great relevance in a wide variety of applications. Our technique provides a reliable way to measure in real time the purity of vector modes, paving the way to novel applications where the degree of non-separability can be used as an optical sensor.

preprint2020arXiv

Mostar index of graph operations

Very recently, a bond-additive topological descriptor, known as the Mostar index, has been proposed as a measure of peripherality in graphs and networks. In this article, we compute the Mostar index of corona product, Cartesian product, join, lexicographic product, Indu-Bala product and subdivision vertex-edge join of graphs and apply these results to find the Mostar index of various classes of chemical graphs and nanostructures.
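
The abstract does not restate the definition, so the sketch below assumes the usual one: $Mo(G)=\sum_{uv\in E(G)}|n_u-n_v|$, where $n_u$ counts the vertices strictly closer to $u$ than to $v$. The function names and the example graphs are hypothetical, and the graph is assumed to be connected and unweighted.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def mostar_index(edges):
    """Mostar index of a connected graph given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {x: bfs_distances(adj, x) for x in adj}
    total = 0
    for u, v in edges:
        n_u = sum(1 for w in adj if dist[u][w] < dist[v][w])
        n_v = sum(1 for w in adj if dist[v][w] < dist[u][w])
        total += abs(n_u - n_v)
    return total


path4 = [(0, 1), (1, 2), (2, 3)]
star5 = [(0, i) for i in range(1, 5)]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(mostar_index(path4), mostar_index(star5), mostar_index(cycle4))
```

The all-pairs breadth-first search keeps the sketch short; it is not tuned for the large product graphs the paper treats, where closed-form expressions are the point.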

preprint2020arXiv

Parametric upconversion of Ince-Gaussian modes

Ince-Gaussian (IG) mode, a recently discovered type of structured Gaussian beam, corresponds to eigenfunctions of the paraxial wave equation in elliptical coordinates. This propagation-invariant mode is of significance in various domains, and in particular, its nonlinear transformation; however, there have been few relevant studies to date. In this work, we report the parametric upconversion of IG modes and associated full-field selection rule for the first time. We demonstrate that IG signals can be perfectly upconverted by a flattop-beam pump; in contrast, significant mode distortion occurred when using the most common Gaussian pump. Particular attention was given to the origin of the distortion, i.e., radial-mode degeneration induced by the sum-frequency generation excited by Gaussian pump. This proof-of-principle demonstration has great significance in relevant areas, such as high-dimension quantum frequency interfacing and upconversion imaging.

preprint2020arXiv

Radial modal transitions of Laguerre-Gauss modes during parametric upconversion: towards the full-field selection rule of spatial modes

Optical orbital angular momentum transformation and corresponding azimuthal-mode selection rules have been studied exhaustively for various nonlinear optical interactions. However, nonlinear transformation of radial mode has not been systematically studied since the pioneering work [Phys. Rev. A 56, 4193, 1997]. In this paper, we theoretically study and experimentally verify the radial modal transitions of Laguerre-Gauss (LG) modes in parametric upconversion. Specifically, we provide a general solution that describes the sum-frequency generation (SFG) field excited by two arbitrary LG modes. Based on the solution, one can predict the full spatial complex amplitude of SFG fields upon propagation precisely and readily obtain the associated full-field selection rule including both azimuthal and radial modes. This work provides a theoretical basis for quantum and nonlinear optical research involving parametric upconversion of complex structured light, and paves the way for future work on full-field transformation of spatial modes in other nonlinear interactions.

preprint2020arXiv

Spatial polarization independent parametric upconversion of vectorially structured light

Spatial polarization independent (SPI) parametric conversion is the basis of many optical applications, such as SPI frequency interface for communication channels carried by vector modes and upconversion detection for polarization-resolved imaging. However, realizing such conversion remains a challenge. In this proof-of-principle work, we demonstrated SPI parametric upconversion using a polarization Sagnac nonlinear interferometer based on type-II second-harmonic generation (SHG). Our results show that the vector (including both polarization and intensity) profile and associated SOC state of the vector signal beam could be transferred to the SHG beam with a high fidelity. The principle lays a foundation of SPI frequency interface for quantum/classical channels based on vector modes and also paves the way for upconversion detection of polarization-resolved imaging in Mid-/far-infrared region.

preprint2020arXiv

Training a U-Net based on a random mode-coupling matrix model to recover acoustic interference striations

A U-Net is trained to recover acoustic interference striations (AISs) from distorted ones. A random mode-coupling matrix model is introduced to generate a large number of training data quickly, which are used to train the U-Net. The performance of AIS recovery of the U-Net is tested in range-dependent waveguides with nonlinear internal waves (NLIWs). Although the random mode-coupling matrix model is not an accurate physical model, the test results show that the U-Net successfully recovers AISs under different signal-to-noise ratios (SNRs) and different amplitudes and widths of NLIWs for different shapes.

preprint2018arXiv

Unorganized Malicious Attacks Detection

Recommender systems have attracted much attention during the past decade. Many attack detection algorithms have been developed for better recommendations, mostly focusing on shilling attacks, where an attack organizer produces a large number of user profiles by the same strategy to promote or demote an item. This work considers a different attack style: unorganized malicious attacks, where attackers individually utilize a small number of user profiles to attack different items without any organizer. This attack style occurs in many real applications, yet it remains largely unstudied. We first formulate unorganized malicious attack detection as a matrix completion problem, and propose the Unorganized Malicious Attacks detection (UMA) approach, a proximal alternating splitting augmented Lagrangian method. We verify, both theoretically and empirically, the effectiveness of our proposed approach.

preprint2016arXiv

Cavitation of Water by Volume-Controlled Stretching

A liquid subjected to negative pressure is thermodynamically metastable. Confined within a small volume, negative pressure can build up until cavities form spontaneously. The critical negative pressure for cavitation in water has been theoretically predicted to be in the range of -100 to -200 MPa at room temperature, whereas values around -30 MPa have been obtained by many experiments. The discrepancy has yet to be resolved. In this study we perform molecular dynamics simulations to study cavitation of water under volume controlled stretching. It is found that liquid water exhibits a nonlinear elastic compressibility (or stretchability) under hydrostatic tension and remains stable within the confined volume until spontaneous cavitation occurs at a critical strain. Subsequently, as the volume-controlled stretching continues, the cavity grows stably and the hydrostatic tension decreases continuously until the box volume is large enough for another transition to form a water droplet. A modified nucleation theory is proposed to predict the critical condition for cavitation. In particular, a strong dependence of the critical strain and stress for cavitation on the initial liquid volume is predicted by the modified nucleation theory, which may offer a possible explanation for the discrepancies in the values of the critical negative pressure obtained from experiments.

preprint2016arXiv

Modern Physiognomy: An Investigation on Predicting Personality Traits and Intelligence from the Human Face

The human behavior of evaluating other individuals with respect to their personality traits and intelligence by evaluating their faces plays a crucial role in human relations. These trait judgments might influence important social outcomes in our lives such as elections and court sentences. Previous studies have reported that humans can make valid inferences for at least four personality traits. In addition, some studies have demonstrated that facial trait evaluation can be learned accurately using machine learning methods. In this work, we experimentally explore whether self-reported personality traits and intelligence can be predicted reliably from a facial image. More specifically, the prediction problem is cast separately in two parts: a classification task and a regression task. A facial structural feature is constructed from the relations among facial salient points, and an appearance feature is built from five texture descriptors. In addition, a minutia-based fingerprint feature from a fingerprint image is also explored. The classification results show that the personality traits "Rule-consciousness" and "Vigilance" can be predicted reliably, and that the traits of females can be predicted more accurately than those of males. However, the regression experiments show that it is difficult to predict scores for individual personality traits and intelligence. The residual plots and the correlation results indicate no evident linear correlation between the measured scores and the predicted scores. Both the classification and the regression results reveal that "Rule-consciousness" and "Tension" can be reliably predicted from the facial features, while "Social boldness" gets the worst prediction results. The experimental results show that it is difficult to predict intelligence from either the facial features or the fingerprint feature, a finding that is in agreement with previous studies.

preprint2016arXiv

Observation of reversible orbital angular momentum transfer based on photon-phonon coupling

Orbital angular momentum (OAM) has gained great interest due to its most attractive feature of high dimensionality, and several ground-breaking demonstrations in communication based on OAM multiplexing have been carried out. Accordingly, a rapid data-density growth from OAM multiplexing has posed a great challenge to the signal-processing layer. Meanwhile, in another area, optical signal-processing circuit based on photon-phonon conversion has received considerable attention and made rapid progress. Here, with an aim of finding the intersection between OAM multiplexing and photon-phonon conversion, we report on the observation of reversible OAM photon-phonon conversion. A specific OAM state can be flexibly and controllably interconverted between photonic and phononic domains via Brillouin photon-phonon coupling within the decay time of acoustic signal, in which OAM and spin angular momentum are independently conserved. Our result demonstrates the controllable OAM transfer between photons and phonons, shows the potential of using OAM multiplexing to extend the capacity of photon-phonon conversion based signal-processing scheme, and may trigger the development of OAM-multiplexed photon-phonon circuit.

preprint2016arXiv

On the Irregularity of Some Molecular Structures

Measures of the irregularity of chemical graphs could be helpful for QSAR/QSPR studies and for the descriptive purposes of biological and chemical properties, such as melting and boiling points, toxicity and resistance. Here we consider the following four established irregularity measures: the irregularity index by Albertson, the total irregularity, the variance of vertex degrees and the Collatz-Sinogowitz index. Through the means of graph structural analysis and derivation, we study the above-mentioned irregularity measures of several chemical molecular graphs which frequently appear in chemical, medical and material engineering, as well as the nanotubes: $TUC_4 C_8(S)$, $TUC_4 C_8(R)$, Zig-Zag $TUHC_{6}$, $TUC_4$, Armchair $TUVC_{6}$, then dendrimers $T_{k,d}$ and the circumcoronene series of benzenoid $H_k$. In addition, the irregularities of Mycielski's constructions of cycle and path graphs are analyzed.
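
Three of the four measures named above depend only on vertex degrees, so they can be sketched in a few lines. The definitions used here are the standard ones and are assumed rather than quoted from the paper: Albertson's index sums $|d(u)-d(v)|$ over edges, the total irregularity sums it over all unordered vertex pairs, and the degree variance is the population variance of the degree sequence. The Collatz-Sinogowitz index is left out because it needs the largest adjacency eigenvalue. All function names are hypothetical, and isolated vertices are not represented in an edge list.

```python
from itertools import combinations


def degrees(edges):
    """Vertex degrees of a simple graph given as a list of (u, v) edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def albertson_irregularity(edges):
    """Sum of |d(u) - d(v)| over all edges uv."""
    deg = degrees(edges)
    return sum(abs(deg[u] - deg[v]) for u, v in edges)


def total_irregularity(edges):
    """Sum of |d(u) - d(v)| over all unordered pairs of vertices."""
    deg = degrees(edges)
    return sum(abs(a - b) for a, b in combinations(deg.values(), 2))


def degree_variance(edges):
    """Population variance of the degree sequence."""
    deg = list(degrees(edges).values())
    mean = sum(deg) / len(deg)
    return sum((d - mean) ** 2 for d in deg) / len(deg)


star5 = [(0, i) for i in range(1, 5)]            # degrees 4, 1, 1, 1, 1
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]        # regular, so every measure is 0

print(albertson_irregularity(star5), total_irregularity(star5), degree_variance(star5))
```

Any regular graph, such as the 4-cycle above, scores zero on all three measures, which is the sense in which they quantify irregularity.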

preprint2016arXiv

Roadmap for gravitational wave detection in space - a preliminary study

Part of a review paper entitled "Gravitational wave astronomy: the current status," which appeared in Science China Physics, Mechanics & Astronomy 58.12 (2015): 1-41.

preprint2015arXiv

Entropic Effects of Thermal Rippling on van der Waals Interactions between Monolayer Graphene and a Rigid Substrate

Graphene monolayer, with extremely low flexural stiffness, displays spontaneous rippling due to thermal fluctuations at a finite temperature. When a graphene membrane is placed on a solid substrate, the adhesive interactions between graphene and the substrate could considerably suppress thermal rippling. On the other hand, the statistical nature of thermal rippling adds an entropic contribution to the graphene-substrate interactions. In this paper we present a statistical mechanics analysis on thermal rippling of monolayer graphene supported on a rigid substrate, assuming a generic form of van der Waals interactions between graphene and substrate at T = 0 K. The rippling amplitude, the equilibrium average separation, and the average interaction energy are predicted simultaneously and compared with molecular dynamics (MD) simulations. While the amplitude of thermal rippling is reduced by adhesive interactions, the entropic contribution leads to an effective repulsion. As a result, the equilibrium average separation increases and the effective adhesion energy decreases with increasing temperature. Moreover, the effect of a biaxial pre-strain in graphene is considered, and a buckling instability is predicted at a critical compressive strain that depends on both the temperature and the adhesive interactions. Limited by the harmonic approximations, the theoretical predictions agree with MD simulations only for relatively small rippling amplitudes but can be extended to account for the anharmonic effects.

preprint2015arXiv

Minimum codegree threshold for $C_6^3$-factors in $3$-uniform Hypergraphs

Let $C_6^3$ be the 3-uniform hypergraph on $\{1,\dots, 6\}$ with edges $123, 345,561$, which can be seen as the triangle in 3-uniform hypergraphs. For sufficiently large $n$ divisible by 6, we show that every $n$-vertex 3-uniform hypergraph $H$ with minimum codegree at least $n/3$ contains a $C_6^3$-factor, i.e., a spanning subhypergraph consisting of vertex-disjoint copies of $C_6^3$. The minimum codegree condition is best possible. This improves the asymptotical result obtained by Mycroft and answers a question of Rödl and Ruciński exactly.
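
To make the terms concrete, the sketch below encodes $C_6^3$ with the edges listed above and computes a minimum codegree, taken here in the usual sense as the smallest number of edges containing any given pair of vertices. The helper `min_codegree` and the complete-hypergraph example are hypothetical illustrations; nothing here reproduces the paper's proof.

```python
from itertools import combinations

# C_6^3 on {1, ..., 6} with edges 123, 345, 561, as in the abstract.
C63_EDGES = [frozenset({1, 2, 3}), frozenset({3, 4, 5}), frozenset({5, 6, 1})]


def min_codegree(vertices, edges):
    """Minimum, over all vertex pairs, of the number of edges containing the pair.

    Assumes at least two vertices and that every edge is a 3-subset of `vertices`.
    """
    codegree = {frozenset(pair): 0 for pair in combinations(vertices, 2)}
    for edge in edges:
        for pair in combinations(sorted(edge), 2):
            codegree[frozenset(pair)] += 1
    return min(codegree.values())


vertices = range(1, 7)
complete = [frozenset(t) for t in combinations(vertices, 3)]

print(min_codegree(vertices, C63_EDGES))  # the pair {1, 4} lies in no edge
print(min_codegree(vertices, complete))   # every pair lies in n - 2 edges
```

For $n$ vertices the theorem's hypothesis would read `min_codegree(vertices, edges) >= n / 3`, with $n$ sufficiently large and divisible by 6.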

preprint2015arXiv

Mouse Pose Estimation From Depth Images

We focus on the challenging problem of efficient mouse 3D pose estimation based on static images, and especially single depth images. We introduce an approach to discriminatively train the split nodes of trees in random forest to improve their performance on estimation of 3D joint positions of mouse. Our algorithm is capable of working with different types of rodents and with different types of depth cameras and imaging setups. In particular, it is demonstrated in this paper that when a top-mounted depth camera is combined with a bottom-mounted color camera, the final system is capable of delivering full-body pose estimation including four limbs and the paws. Empirical examinations on synthesized and real-world depth images confirm the applicability of our approach on mouse pose estimation, as well as the closely related task of part-based labeling of mouse.

preprint2015arXiv

On the distance spectra of graphs

The distance matrix of a graph $G$ is the matrix containing the pairwise distances between vertices. The distance eigenvalues of $G$ are the eigenvalues of its distance matrix and they form the distance spectrum of $G$. We determine the distance spectra of halved cubes, double odd graphs, and Doob graphs, completing the determination of distance spectra of distance regular graphs having exactly one positive distance eigenvalue. We characterize strongly regular graphs having more positive than negative distance eigenvalues. We give examples of graphs with few distinct distance eigenvalues but lacking regularity properties. We also determine the determinant and inertia of the distance matrices of lollipop and barbell graphs.
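
The distance matrix defined above can be built with one breadth-first search per vertex. This is a minimal sketch assuming a connected, unweighted graph on vertices `0..n-1`; the function name and the 4-cycle example are hypothetical. The distance eigenvalues would then come from any symmetric eigenvalue routine, which is not included here.

```python
from collections import deque


def distance_matrix(n, edges):
    """Pairwise hop distances of a connected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    matrix = []
    for source in range(n):
        dist = [None] * n          # None marks a vertex not reached yet
        dist[source] = 0
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if dist[y] is None:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        matrix.append(dist)
    return matrix


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
D = distance_matrix(4, cycle4)
for row in D:
    print(row)
```

In the 4-cycle every row of `D` sums to 4, so the all-ones vector is an eigenvector and 4 is a distance eigenvalue of this graph.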

preprint2015arXiv

Parametric amplification of orbital angular momentum beams based on light-acoustic interaction

High-fidelity amplification of beams carrying orbital angular momentum (OAM) is crucial for OAM multiplexing and other OAM-based applications. Here, we report the first study of stimulated Brillouin amplification (SBA) for OAM beams. The energy conversion efficiency of photon-phonon coupling and the phase structure of amplified signals are investigated in collinear and noncollinear frame systems, respectively. Our results demonstrate that the OAM signals can be efficiently amplified without obvious noise being introduced, and that the modes of the output signal are independent of the pump modes and the geometrical frames. Meanwhile, an OAM state depending on the optical modes and the geometrical frames is loaded into phonons by coherent light-acoustic interaction, which is of fundamental significance and shows great application potential in OAM multiplexing.

preprint2014arXiv

Descope of the ALIA mission

The present work reports on a feasibility study commissioned by the Chinese Academy of Sciences to explore various possible mission options to detect gravitational waves in space, as alternatives to the eLISA/LISA mission concept. Based on the relative merits assigned to science and technological viability, a few representative mission options descoped from the ALIA mission are considered. A semi-analytic Monte Carlo simulation is carried out to understand the cosmic black hole merger histories starting from intermediate mass black holes at high redshift, as well as the possible scientific merits of the mission options considered in probing the light seed black holes and their coevolution with galaxies in the early Universe. The study indicates that, by choosing the armlength of the interferometer to be three million kilometers and shifting the sensitivity floor to around one-hundredth Hz, together with a very moderate improvement on the position noise budget, there are certain mission options capable of exploring light seed, intermediate mass black hole binaries at high redshift that are not readily accessible to eLISA/LISA, and yet the technological requirements seem to be within reach in the next few decades for China.

preprint2014arXiv

Dropout Rademacher Complexity of Deep Neural Networks

Great successes of deep neural networks have been witnessed in various real applications. Many algorithmic and implementation techniques have been developed; however, theoretical understanding of many aspects of deep neural networks is far from clear. A particularly interesting issue is the usefulness of dropout, which was motivated by the intuition of preventing complex co-adaptation of feature detectors. In this paper, we study the Rademacher complexity of different types of dropout, and our theoretical results disclose that for shallow neural networks (with one or no hidden layer) dropout is able to reduce the Rademacher complexity polynomially, whereas for deep neural networks it can lead to an exponential reduction of the Rademacher complexity.

preprint2014arXiv

Estimation for Dynamic and Static Panel Probit Models with Large Individual Effects

For discrete panel data, the dynamic relationship between successive observations is often of interest. We consider a dynamic probit model for short panel data. A problem with estimating the dynamic parameter of interest is that the model contains a large number of nuisance parameters, one for each individual. Heckman proposed to use maximum likelihood estimation of the dynamic parameter, which, however, does not perform well if the individual effects are large. We suggest new estimators for the dynamic parameter, based on the assumption that the individual parameters are random and possibly large. Theoretical properties of our estimators are derived and a simulation study shows they have some advantages compared to Heckman's estimator.

preprint2014arXiv

Interfacial adhesion between graphene and silicon dioxide by density functional theory with van der Waals corrections

Interfacial adhesion between graphene and a SiO2 substrate is studied by density functional theory (DFT) with dispersion corrections. The results demonstrate that the van der Waals (vdW) interaction is the predominant mechanism for the graphene/SiO2 interface. It is found that the interaction strength is strongly influenced by changes of the SiO2 surface structures due to surface reactions with water. The adhesion energy is reduced when the reconstructed SiO2 surface is hydroxylated, and further reduced when covered by a monolayer of adsorbed water molecules. Thus, the effect of humidity may help explain the wide variation of adhesion energies measured in recent experiments between graphene and SiO2. Moreover, it is noted that vdW forces are required to accurately model the graphene/SiO2 interface with DFT and that the adhesion energy is underestimated by empirical force fields commonly used in atomistic simulations.

preprint2014arXiv

On the Consistency of AUC Pairwise Optimization

AUC (area under the ROC curve) is an important evaluation criterion that has been widely used in many learning tasks such as class-imbalance learning, cost-sensitive learning, learning to rank, etc. Many learning approaches try to optimize AUC, but owing to the non-convexity and discontinuity of AUC, almost all approaches work with surrogate loss functions. Thus, the consistency of AUC is crucial; however, it has been almost untouched before. In this paper, we provide a sufficient condition for the asymptotic consistency of learning approaches based on surrogate loss functions. Based on this result, we prove that exponential loss and logistic loss are consistent with AUC, but hinge loss is inconsistent. Then, we derive the $q$-norm hinge loss and general hinge loss that are consistent with AUC. We also derive the consistent bounds for exponential loss and logistic loss, and obtain the consistent bounds for many surrogate loss functions under the non-noise setting. Further, we disclose an equivalence between the exponential surrogate loss of AUC and the exponential surrogate loss of accuracy, and one straightforward consequence of this finding is that AdaBoost and RankBoost are equivalent.
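As a rough illustration of the quantities this abstract refers to, the sketch below computes the empirical AUC and three pairwise surrogate risks (exponential, logistic, hinge) for a handful of scores. The function names, the `SURROGATES` table and the toy scores are assumptions made for this example; it does not reproduce the paper's consistency analysis.

```python
import math

def auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

# Convex surrogates phi(t) applied to the pairwise score difference t = f(x+) - f(x-).
SURROGATES = {
    "exponential": lambda t: math.exp(-t),
    "logistic": lambda t: math.log(1.0 + math.exp(-t)),
    "hinge": lambda t: max(0.0, 1.0 - t),
}

def pairwise_surrogate_risk(scores_pos, scores_neg, name):
    """Average surrogate loss over all (positive, negative) pairs."""
    phi = SURROGATES[name]
    total = sum(phi(sp - sn) for sp in scores_pos for sn in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))

# Toy scores (hypothetical): 4 of the 6 pairs are ranked correctly.
pos = [2.0, 1.0, 0.5]
neg = [1.5, 0.0]
print("AUC:", auc(pos, neg))
for name in SURROGATES:
    print(name, pairwise_surrogate_risk(pos, neg, name))
```

The AUC itself is a step function of the scores, which is why the optimization is done on the smooth or convex pairwise surrogates instead.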

preprint2013arXiv

Convergence analysis of kernel LMS algorithm with pre-tuned dictionary

The kernel least-mean-square (KLMS) algorithm is an appealing tool for online identification of nonlinear systems due to its simplicity and robustness. In addition to choosing a reproducing kernel and setting filter parameters, designing a KLMS adaptive filter requires selecting a so-called dictionary in order to get a finite-order model. This dictionary has a significant impact on performance, and requires careful consideration. Theoretical analysis of KLMS as a function of dictionary setting has rarely, if ever, been addressed in the literature. In an analysis previously published by the authors, the dictionary elements were assumed to be governed by the same probability density function as the input data. In this paper, we modify this study by considering the dictionary as part of the filter parameters to be set. This theoretical analysis paves the way for future investigations on KLMS dictionary design.
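To make the setting concrete, here is a minimal sketch of KLMS with a fixed, pre-chosen dictionary and a Gaussian kernel. The dictionary grid, bandwidth, step size and the toy target `sin(3u)` are all assumptions chosen for illustration, not values from the paper.

```python
import math
import random

def gaussian_kernel(u, c, bandwidth):
    """Gaussian kernel between a scalar input u and a dictionary element c."""
    return math.exp(-((u - c) ** 2) / (2.0 * bandwidth ** 2))

def klms_fixed_dictionary(inputs, targets, dictionary, bandwidth=0.3, step=0.2):
    """LMS update on the kernelized input; the filter order equals len(dictionary)."""
    alpha = [0.0] * len(dictionary)
    errors = []
    for u, d in zip(inputs, targets):
        k = [gaussian_kernel(u, c, bandwidth) for c in dictionary]
        e = d - sum(a * ki for a, ki in zip(alpha, k))  # a priori estimation error
        alpha = [a + step * e * ki for a, ki in zip(alpha, k)]
        errors.append(e)
    return alpha, errors

# Hypothetical nonlinear system and a dictionary set before filtering starts.
rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
targets = [math.sin(3.0 * u) for u in inputs]
dictionary = [-1.0 + 0.2 * m for m in range(11)]

alpha, errors = klms_fixed_dictionary(inputs, targets, dictionary)
early = sum(e * e for e in errors[:200]) / 200
late = sum(e * e for e in errors[-200:]) / 200
print("MSE over first 200 samples:", early)
print("MSE over last 200 samples:", late)
```

Because the dictionary never grows, memory and per-sample cost stay constant, which is the property that makes the dictionary choice a design parameter in its own right.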

preprint2013arXiv

Efficient Computational Algorithm for Optimal Allocation in Regression Models

In this article, we discuss the optimal allocation problem in an experiment when a regression model is used for statistical analysis. Monotonic convergence for a general class of multiplicative algorithms for $D$-optimality has been discussed in the literature. Here, we provide an alternative proof of the monotonic convergence for the $D$-criterion with a simple computational algorithm, and furthermore show that it converges to $D$-optimality. We also discuss an algorithm, as well as a conjecture on its monotonic convergence, for the $A$-criterion. Monte Carlo simulations are used to demonstrate the reliability, efficiency and usefulness of the proposed algorithms.
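For readers unfamiliar with multiplicative algorithms, the sketch below shows the classical update $w_i \leftarrow w_i \, d_i(w)/p$ for $D$-optimality, where $d_i(w) = f(x_i)^\top M(w)^{-1} f(x_i)$ and $p$ is the number of parameters. It assumes simple linear regression with $f(x) = (1, x)$ on a five-point candidate grid; the grid and function names are hypothetical, and this is the textbook update rather than necessarily the algorithm proposed in the paper.

```python
def information_matrix(points, weights):
    """Entries (a, b, c) of the 2x2 matrix M(w) = sum_i w_i f(x_i) f(x_i)^T, with f(x) = (1, x)."""
    a = sum(weights)
    b = sum(w * x for x, w in zip(points, weights))
    c = sum(w * x * x for x, w in zip(points, weights))
    return a, b, c

def det2(a, b, c):
    """Determinant of the symmetric matrix [[a, b], [b, c]]."""
    return a * c - b * b

def variance_function(x, a, b, c):
    """d(x) = f(x)^T M^{-1} f(x), using the closed-form 2x2 inverse."""
    return (c - 2.0 * b * x + a * x * x) / det2(a, b, c)

def multiplicative_d_optimal(points, iterations=200):
    p = 2  # number of regression parameters
    weights = [1.0 / len(points)] * len(points)
    dets = []
    for _ in range(iterations):
        a, b, c = information_matrix(points, weights)
        dets.append(det2(a, b, c))
        # The weights stay normalized because sum_i w_i d_i = trace(M^{-1} M) = p.
        weights = [w * variance_function(x, a, b, c) / p
                   for x, w in zip(points, weights)]
    return weights, dets

points = [-1.0, -0.5, 0.0, 0.5, 1.0]
weights, dets = multiplicative_d_optimal(points)
print("weights:", [round(w, 6) for w in weights])
print("first and last determinant:", dets[0], dets[-1])
```

On this grid the weights concentrate on the two endpoints, the known $D$-optimal design for a straight-line fit on an interval, and the recorded determinants never decrease from one iteration to the next.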

preprint2013arXiv

On the Doubt about Margin Explanation of Boosting

Margin theory provides one of the most popular explanations for the success of AdaBoost, where the central point lies in the recognition that the margin is the key to characterizing the performance of AdaBoost. This theory has been very influential, e.g., it has been used to argue that AdaBoost usually does not overfit since it tends to enlarge the margin even after the training error reaches zero. Previously the minimum margin bound was established for AdaBoost; however, Breiman (1999) pointed out that maximizing the minimum margin does not necessarily lead to better generalization. Later, Reyzin and Schapire (2006) emphasized that the margin distribution rather than the minimum margin is crucial to the performance of AdaBoost. In this paper, we first present the $k$th margin bound and further study its relationship to previous work such as the minimum margin bound and the Emargin bound. Then, we improve the previous empirical Bernstein bounds (Maurer and Pontil, 2009; Audibert, Munos and Szepesvari, 2009), and based on such findings, we defend the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as Schapire, Freund, Bartlett and Lee (1998) but is sharper than the minimum margin bound of Breiman (1999). By incorporating factors such as average margin and variance, we present a generalization error bound that is heavily related to the whole margin distribution. We also provide margin distribution bounds for the generalization error of voting classifiers in finite VC-dimension space.

preprint2013arXiv

One-Pass AUC Optimization

AUC is an important performance measure and many algorithms have been devoted to AUC optimization, mostly by minimizing a surrogate convex loss on a training data set. In this work, we focus on one-pass AUC optimization, which requires going through the training data only once without storing the entire training dataset, and where conventional online learning algorithms cannot be applied directly because AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop a regression-based algorithm which only needs to maintain the first- and second-order statistics of the training data in memory, resulting in a storage requirement independent of the size of the training data. To efficiently handle high-dimensional data, we develop a randomized algorithm that approximates the covariance matrices by low-rank matrices. We verify, both theoretically and empirically, the effectiveness of the proposed algorithm.
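The storage claim can be illustrated with a small sketch: for a linear scorer $w$, the pairwise square loss averaged over all positive-negative pairs equals $(1 - w^\top(\mu_+ - \mu_-))^2 + w^\top(S_+ + S_-)w$, where $\mu_\pm$ and $S_\pm$ are the per-class means and (biased) covariances. The code below keeps only those running statistics and checks the identity against a brute-force pass over all pairs. The `ClassStats` class, the toy stream and the fixed `w` are assumptions for this example; the paper's actual update rule and its low-rank approximation are not reproduced here.

```python
class ClassStats:
    """Running first- and second-order statistics for one class; O(dim^2) memory."""

    def __init__(self, dim):
        self.n = 0
        self.sum = [0.0] * dim
        self.sum_outer = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            self.sum[i] += xi
            for j, xj in enumerate(x):
                self.sum_outer[i][j] += xi * xj

    def mean(self):
        return [s / self.n for s in self.sum]

    def quad_form_cov(self, w):
        """w^T S w, with S the biased (1/n) covariance of the class."""
        mu_w = sum(wi * mi for wi, mi in zip(w, self.mean()))
        dim = len(w)
        second = sum(w[i] * self.sum_outer[i][j] * w[j]
                     for i in range(dim) for j in range(dim)) / self.n
        return second - mu_w * mu_w

def square_loss_from_stats(w, pos, neg):
    """Pairwise square loss computed from the stored statistics only."""
    m = sum(wi * (p - q) for wi, p, q in zip(w, pos.mean(), neg.mean()))
    return (1.0 - m) ** 2 + pos.quad_form_cov(w) + neg.quad_form_cov(w)

def square_loss_brute_force(w, xs_pos, xs_neg):
    """Same loss, computed by visiting every (positive, negative) pair."""
    total = 0.0
    for xp in xs_pos:
        for xn in xs_neg:
            margin = sum(wi * (a - b) for wi, a, b in zip(w, xp, xn))
            total += (1.0 - margin) ** 2
    return total / (len(xs_pos) * len(xs_neg))

# Hypothetical data stream of (features, label) pairs, seen once.
stream = [([1.0, 2.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1),
          ([0.5, -1.0], 0), ([1.5, 1.5], 1)]
w = [0.3, -0.2]

pos_stats, neg_stats = ClassStats(2), ClassStats(2)
for x, y in stream:
    (pos_stats if y == 1 else neg_stats).update(x)

from_stats = square_loss_from_stats(w, pos_stats, neg_stats)
brute = square_loss_brute_force(
    w, [x for x, y in stream if y == 1], [x for x, y in stream if y == 0])
print(from_stats, brute)
```

The statistics occupy memory that depends on the feature dimension only, not on the number of examples, which matches the storage property described in the abstract.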

preprint2013arXiv

Online dictionary learning for kernel LMS. Analysis and forward-backward splitting algorithm

Adaptive filtering algorithms operating in reproducing kernel Hilbert spaces have demonstrated superiority over their linear counterparts for nonlinear system identification. Unfortunately, an undesirable characteristic of these methods is that the order of the filters grows linearly with the number of input data. This dramatically increases the computational burden and memory requirement. A variety of strategies based on dictionary learning have been proposed to overcome this severe drawback. Few, if any, of these works analyze the problem of updating the dictionary in a time-varying environment. In this paper, we present an analytical study of the convergence behavior of the Gaussian least-mean-square algorithm in the case where the statistics of the dictionary elements only partially match the statistics of the input data. This allows us to emphasize the need for updating the dictionary in an online way, by discarding the obsolete elements and adding appropriate ones. We introduce a kernel least-mean-square algorithm with L1-norm regularization to automatically perform this task. The stability in the mean of this method is analyzed, and its performance is tested with experiments.

preprint2013arXiv

Sign patterns with minimum rank 3 and point-line configurations

A sign pattern (matrix) is a matrix whose entries are from the set $\{+, -, 0\}$. The minimum rank (respectively, rational minimum rank) of a sign pattern matrix $\mathcal{A}$ is the minimum of the ranks of the real (respectively, rational) matrices whose entries have signs equal to the corresponding entries of $\mathcal{A}$. A sign pattern $\mathcal{A}$ is said to be condensed if $\mathcal{A}$ has no zero row or column and no two rows or columns are identical or negatives of each other. In this paper, a new direct connection between condensed $m \times n$ sign patterns with minimum rank $r$ and $m$ point-$n$ hyperplane configurations in $\mathbb{R}^{r-1}$ is established. In particular, condensed sign patterns with minimum rank 3 are closely related to point-line configurations on the plane. It is proved that for any sign pattern $\mathcal{A}$ with minimum rank $r \geq 3$, if the number of zero entries in each column of $\mathcal{A}$ is at most $r-1$, then the rational minimum rank of $\mathcal{A}$ is also $r$. Furthermore, we construct the smallest known sign pattern whose minimum rank is 3 but whose rational minimum rank is greater than 3.

preprint2011arXiv

High-Speed Propulsion of Flexible Nanowire Motors: Theory and Experiments

Micro/nano-scale propulsion has attracted considerable recent attention due to its promise for biomedical applications such as targeted drug delivery. In this paper, we report on a new experimental design and theoretical modelling of high-speed, fuel-free, magnetically driven propellers which exploit the flexibility of nanowires for propulsion. These readily prepared nanomotors display high propulsion speeds in both dimensional terms (up to ~21 micrometers per second) and dimensionless terms (body lengths per revolution) when compared with natural microorganisms and other artificial propellers. Their propulsion characteristics are studied theoretically using an elastohydrodynamic model which takes into account the elasticity of the nanowire and its hydrodynamic interaction with the fluid medium. The critical role of flexibility in this mode of propulsion is illustrated by simple physical arguments, and is quantitatively investigated with the help of an asymptotic analysis for small-amplitude swimming. The theoretical predictions are then compared with experimental measurements and we obtain good agreement. Finally, we demonstrate the operation of these nanomotors in a real biological environment (human serum), emphasizing the robustness of their propulsion performance and their promise for biomedical applications.

65 published item(s)

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Lever: Speculative LLM Inference on Smartphones

DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Unravelling the deterministic effect of the solid-state diffusion energy barrier for charge carrier on the self-discharge of supercapacitors

A Rate Control Algorithm for Video-based Point Cloud Compression

A Weakly Supervised Propagation Model for Rumor Verification and Stance Detection with Multiple Instance Learning

Adaptive Random Fourier Features Kernel LMS

BCS-Net: Boundary, Context and Semantic for Automatic COVID-19 Lung Infection Segmentation from CT Images

Consistent Quality Oriented Rate Control in HEVC via Balancing Intra and Inter Frame Coding

Context-Hierarchy Inverse Reinforcement Learning

Deep Geometry Post-Processing for Decompressed Point Clouds

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision

Deep Visual Navigation under Partial Observability

DialMed: A Dataset for Dialogue-based Medication Recommendation

End-to-end lossless compression of high precision depth maps guided by pseudo-residual

Exploiting Robust Unsupervised Video Person Re-identification

Frequency conversion of abruptly autofocusing waves

Learning to Disentangle Scenes for Person Re-identification

Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression

On the conjecture about the exponential reduced Sombor index

On the Optimization of Margin Distribution

OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

Assessing Individual and Community Vulnerability to Fake News in Social Networks

Bidirectional Trajectory Computation for Odometer-Aided Visual-Inertial SLAM

Conformal frequency conversion for arbitrary vectorial structured light

Efficient computational algorithms for approximate optimal designs

kPAM 2.0: Feedback Control for Category-Level Robotic Manipulation

Subspace Clustering for Panel Data with Interactive Effects

Towards Understanding Theoretical Advantages of Complex-Reaction Networks

Transient Performance Analysis of the $\ell_1$-RLS

Classically-entangled Ince-Gaussian modes

LIPSS-Sticks: Laser induced double self organization enhances the broadband light harvesting of TiO2 nanotube arrays

Measuring the non-separability of vector modes with digital micromirror devices

Mostar index of graph operations

Parametric upconversion of Ince-Gaussian modes

Radial modal transitions of Laguerre-Gauss modes during parametric upconversion: towards the full-field selection rule of spatial modes

Spatial polarization independent parametric upconversion of vectorially structured light

Training a U-Net based on a random mode-coupling matrix model to recover acoustic interference striations

Unorganized Malicious Attacks Detection

Cavitation of Water by Volume-Controlled Stretching

Modern Physiognomy: An Investigation on Predicting Personality Traits and Intelligence from the Human Face

Observation of reversible orbital angular momentum transfer based on photon-phonon coupling

On the Irregularity of Some Molecular Structures

Roadmap for gravitational wave detection in space - a preliminary study

Entropic Effects of Thermal Rippling on van der Waals Interactions between Monolayer Graphene and a Rigid Substrate

Minimum codegree threshold for $C_6^3$-factors in $3$-uniform Hypergraphs

Mouse Pose Estimation From Depth Images

On the distance spectra of graphs

Parametric amplification of orbital angular momentum beams based on light-acoustic interaction

Descope of the ALIA mission

Dropout Rademacher Complexity of Deep Neural Networks

Estimation for Dynamic and Static Panel Probit Models with Large Individual Effects

Interfacial adhesion between graphene and silicon dioxide by density functional theory with van der Waals corrections

On the Consistency of AUC Pairwise Optimization

Convergence analysis of kernel LMS algorithm with pre-tuned dictionary

Efficient Computational Algorithm for Optimal Allocation in Regression Models

On the Doubt about Margin Explanation of Boosting

One-Pass AUC Optimization

Online dictionary learning for kernel LMS. Analysis and forward-backward splitting algorithm

Sign patterns with minimum rank 3 and point-line configurations

High-Speed Propulsion of Flexible Nanowire Motors: Theory and Experiments