Source author record

Yang Zhou

Yang Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

74 works

39 topics

4 close collaborators

preprint · 2026 · arXiv

A Forward Simulation-Based Hierarchy of Linearizable Concurrent Objects

In this paper, we systematically investigate the connection between linearizable objects and forward simulation. We prove that the sets of linearizable objects satisfying wait-freedom (resp., lock-freedom or obstruction-freedom) form a bounded join-semilattice under the forward simulation relation, and that the sets of linearizable objects without liveness constraints form a bounded lattice under the same relation. As part of our lattice result, we propose an equivalent characterization of linearizability by reducing checking linearizability w.r.t. sequential specification $Spec$ into checking forward simulation by an object $\mathcal{U}_{Spec}$. To demonstrate the forward simulation relation between linearizable objects, we prove that the objects that are strongly linearizable w.r.t. the same sequential specification and are wait-free (resp., lock-free, obstruction-free) simulate each other, and we prove that the time-stamped queue simulates the Herlihy-Wing queue. We also prove that the Herlihy-Wing queue is simulated by $\mathcal{U}_{Spec}$, and thus, our equivalent characterization of linearizability can be used in the verification of linearizability.

preprint · 2026 · arXiv

A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation

Nasotracheal intubation (NTI) is a critical clinical procedure for establishing and maintaining patient airway patency. Machine-assisted NTI has emerged as a pivotal approach for optimizing procedural efficiency and minimizing manual intervention. However, visual detection algorithms employed for NTI navigation encounter significant challenges, including complex anatomical environments and suboptimal illumination conditions surrounding the glottis. Additionally, the glottis presents considerable scale variability throughout the procedure, initially appearing as a small, difficult-to-capture structure before expanding to occupy nearly the entire field of view. Moreover, traditional visual detection methods often have high computational costs, making real-time, high-precision detection on portable devices challenging. To enhance NTI efficacy and address these challenges, this paper proposes a novel glottis segmentation framework optimized for vision-assisted NTI applications. First, we designed a lightweight, multi-receptive field feature extraction module to reduce intra-class differences, achieving robustness to scale variations of the glottis. This module was then stacked to form the backbone and neck of our network. Subsequently, we developed an advanced label assignment method and redefined the number of samples to further reduce intra-class differences and enhance accuracy in the complex NTI environment. Experiments on three distinct datasets demonstrate that our network surpasses state-of-the-art algorithms, achieving a segmentation mDice of 92.9% with a compact model size of 19 MB and an inference speed exceeding 170 frames per second. Our code and datasets are available at https://github.com/HBUT-CV/GlottisNet.
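The mDice figure quoted in this abstract is an overlap score between predicted and reference masks. The sketch below shows one common way such a score is computed; it is not the evaluation code from the GlottisNet repository. It assumes binary masks flattened into lists of 0/1 values and takes mDice to be the mean of per-image Dice scores. The function names `dice` and `mean_dice` are hypothetical.

```python
def dice(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs):
    """Average the per-image Dice score over (prediction, target) pairs."""
    return sum(dice(p, t) for p, t in pairs) / len(pairs)
```

Under these assumptions, a prediction that covers the true glottis pixel plus one extra pixel scores 2/3, which illustrates how the metric penalizes over-segmentation of a small structure.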

preprint · 2026 · arXiv

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work exists on studying the trustworthiness of DriveVLMs -- a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives -- including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, spanning from generalist to specialist, from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that the general VLMs like LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs like DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. We release all the codes and datasets in https://github.com/taco-group/AutoTrust.

preprint · 2026 · arXiv

CTIS-QA: Clinical Template-Informed Slide-level Question Answering for Pathology

In this paper, we introduce a clinical diagnosis template-based pipeline to systematically collect and structure pathological information. In collaboration with pathologists and guided by the College of American Pathologists (CAP) Cancer Protocols, we design a Clinical Pathology Report Template (CPRT) that ensures comprehensive and standardized extraction of diagnostic elements from pathology reports. We validate the effectiveness of our pipeline on TCGA-BRCA. First, we extract pathological features from reports using CPRT. These features are then used to build CTIS-Align, a dataset of 80k slide-description pairs from 804 WSIs for vision-language alignment training, and CTIS-Bench, a rigorously curated VQA benchmark comprising 977 WSIs and 14,879 question-answer pairs. CTIS-Bench emphasizes clinically grounded, closed-ended questions (e.g., tumor grade, receptor status) that reflect real diagnostic workflows, minimize non-visual reasoning, and require genuine slide understanding. We further propose CTIS-QA, a Slide-level Question Answering model, featuring a dual-stream architecture that mimics pathologists' diagnostic approach. One stream captures global slide-level context via clustering-based feature aggregation, while the other focuses on salient local regions through an attention-guided patch perception module. Extensive experiments on WSI-VQA, CTIS-Bench, and slide-level diagnostic tasks show that CTIS-QA consistently outperforms existing state-of-the-art models across multiple metrics. Code and data are available at https://github.com/HLSvois/CTIS-QA.

preprint · 2026 · arXiv

CUE: Concept-Aware Multi-Label Expansion to Mitigate Concept Confusion in Long-Tailed Learning

Long-tailed distributions are common in real-world recognition tasks, where a few head classes have many samples while most tail classes have very few. Recently, fine-tuning foundation models for long-tailed learning has gained attention due to their excellent performance. However, most existing methods focus solely on mitigating long-tailed distribution bias while overlooking concept confusion caused by the long-tailed distribution. In this paper, we study this problem and attribute it to the mutual exclusivity of single-label supervision under long-tailed distributions, which suppresses feature sharing among related classes and amplifies the dominance of head classes, leading to disrupted inter-class discriminability. To address this, we propose CUE, Concept-aware mUlti-label Expansion, which introduces multi-label concept signals to preserve disrupted inter-class relationships. Specifically, CUE constructs concept sets by (i) extracting instance-level visual cues from zero-shot CLIP and (ii) generating class-level semantic cues with an LLM; the two cues are incorporated via separately weighted Binary Logit-Adjustment (BLA) auxiliary losses and jointly optimized with the baseline Logit-Adjustment (LA) loss. In experiments on several long-tailed benchmarks, CUE achieves balanced and strong performance, surpassing recent state-of-the-art methods. Code is available at: https://github.com/zhangruichi/CUE.
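The baseline Logit-Adjustment (LA) loss mentioned in this abstract is commonly formulated as cross-entropy on logits shifted by the log of the class priors. The sketch below illustrates that standard formulation only; it is not taken from the CUE repository, it does not reproduce the BLA auxiliary losses, and the paper's exact form may differ. The function name `logit_adjusted_loss` and the temperature parameter `tau` are assumptions.

```python
import math


def logit_adjusted_loss(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class."""
    # Head classes (large prior) receive a larger additive shift, so the
    # model has to produce a wider margin before a tail class wins.
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[label]
```

With uniform priors the shift is the same for every class and the loss reduces to ordinary cross-entropy. With skewed priors and identical raw logits, the loss is larger when the true label is a tail class, which is the pressure that counteracts head-class dominance.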

preprint · 2026 · arXiv

Cyclic Modulation Control of Multi-Conflict Connected Automated Traffic

Multi-conflict traffic is ubiquitous. Connected Automated Vehicles (CAVs) offer unprecedented opportunities to enhance safety, reduce emissions, and increase throughput through precise coordination and automation. However, existing CAV strategies remain confined to specialized scenarios, such as highway on-ramp merging or single-lane roundabouts, and traditional traffic signals sacrifice efficiency for safety via rigid phasing and all-red intervals. In this paper, we present Cyclic Modulation Control of Multi-Conflict Connected Automated Traffic (CMAT), a unified, geometry-agnostic framework that embeds each conflict point into a repeating sequence of "micro-phases". Vehicles dynamically form platoons with demand-responsive sizes and negotiate time slots for occupying conflict points, enabling collision-free traversal and high intersection utilization. CMAT aims to minimize delay, guarantee safety, and accommodate arbitrary merging, diverging, and crossing patterns without manual retuning. We formalize CMAT as a mixed-integer linear programming model constructed on a directed graph abstracted from the physical intersection layout. The performance of CMAT is evaluated across a suite of multi-conflict tests, including simple two-way crossings, four-leg intersections, and complex connected intersections. The results demonstrate substantial reductions in delay and significant throughput improvements compared with state-of-the-art CAV coordination methods and traditional signal timing strategies.

preprint · 2026 · arXiv

DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by prioritizing moderately difficult prompts, yet our analysis reveals three limitations: difficulty estimates become inaccurate under policy drift, data selection alone yields limited final-performance gains, and inference efficiency remains largely unchanged. These findings suggest that efficient and effective RL requires more than filtering by difficulty: the policy should learn to solve hard tasks while producing concise responses for easy ones. To this end, we propose Dare, a unified framework that co-evolves difficulty estimation with the policy via self-normalized importance sampling, maintains diverse difficulty coverage through a symmetric Beta sampling distribution, and applies tailored training strategies across difficulty tiers with adaptive compute allocation. Extensive experiments across multiple models and domains demonstrate that Dare consistently outperforms existing methods in training efficiency, final effectiveness, and inference efficiency, producing more concise responses on easy tasks while improving correctness on hard ones. Code is available at https://github.com/EtaYang10th/DARE.

preprint · 2026 · arXiv

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

Agent skills can remarkably improve task success rates by using human-written procedural documents, but their quality is difficult to assess without environment-grounded verification. Existing skill generation methods heavily rely on preference logs rather than direct environment interaction, often yielding negligible or even degraded gains. We identify the cause as a fundamental timing bottleneck: robust skills should be posterior-based, distilled from empirical environment interaction rather than prior plans. In this study, we introduce the Posterior Distillation Index (PDI), a trajectory-level metric that quantifies how well a distilled skill is grounded in the task-environment evidence. To operationalize PDI, we present SPARK (Structured Pipelines for Autonomous Runnable tasKs and sKill generation) for preserving task execution evidence towards full trajectory-level analysis. SPARK generates environment-verified trajectories used to compute PDI, and it applies PDI as an online diagnostic and intervention signal to ensure posterior skill formation. Across 86 runnable tasks, SPARK-generated skills consistently surpass no-skill baselines and outperform human-written skills on student models (inference cost up to 1,000x cheaper than teacher models). These findings show that PDI-guided distillation produces efficient and transferable skills grounded in the task-environment interaction. We release our code at https://github.com/EtaYang10th/spark-skills.

preprint · 2026 · arXiv

Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks

The growing reliance on deep learning models in safety-critical domains such as healthcare and autonomous navigation underscores the need for defenses that are both robust to adversarial perturbations and transparent in their decision-making. In this paper, we identify a connection between interpretability and robustness that can be directly leveraged during training. Specifically, we observe that spurious, unstable, or semantically irrelevant features identified through Local Interpretable Model-Agnostic Explanations (LIME) contribute disproportionately to adversarial vulnerability. Building on this insight, we introduce an attribution-guided refinement framework that transforms LIME from a passive diagnostic into an active training signal. Our method systematically suppresses spurious features using feature masking, sensitivity-aware regularization, and adversarial augmentation in a closed-loop refinement pipeline. This approach does not require additional datasets or model architectures and integrates seamlessly into standard adversarial training. Theoretically, we derive an attribution-aware lower bound on adversarial distortion that formalizes the link between explanation alignment and robustness. Empirical evaluations on CIFAR-10, CIFAR-10-C, and CIFAR-100 demonstrate substantial improvements in adversarial robustness and out-of-distribution generalization.

preprint · 2026 · arXiv

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations

Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated training data. Fortunately, vision-equipped robots deployed across various domains already produce abundant vision-action pairs that can be leveraged to scale up VLA training more efficiently. However, these raw data cannot be centrally aggregated due to various constraints and also exhibit severe heterogeneity. To address these challenges, in this paper, we propose ForgeVLA, a federated VLA training framework that learns VLA models from distributed vision-action pairs without centralizing raw data or requiring manual annotations. Specifically, each client in ForgeVLA is equipped with an embodied instruction classifier that maps vision-action pairs to a predefined instruction set, recovering the missing language modality and forming complete vision-language-action triplets. Beyond triplet construction, we also identify vision-language feature collapse as a critical challenge that has been largely overlooked in prior federated VLA research. To mitigate this issue, ForgeVLA combines a client-side contrastive planning loss with a server-side adaptive aggregation strategy to learn task-discriminative representations efficiently. Extensive experiments across multiple benchmarks show that ForgeVLA significantly outperforms other baselines, and ablation studies further validate the contribution of each component.

preprint · 2026 · arXiv

Large Dimensional Kernel Ridge Regression: Extending to Product Kernels

Recent studies have reported saturation effects and multiple descent behavior in large dimensional kernel ridge regression (KRR). However, these findings are predominantly derived under restrictive settings, such as inner product kernels on the sphere or strong eigenfunction assumptions like hypercontractivity. Whether such behaviors hold for other kernels remains an open question. In this paper, we establish a broad, new family of large dimensional kernels and derive the corresponding convergence rates of the generalization error. As a result, we recover key phenomena previously associated with inner product kernels on the sphere, including: (i) the minimax optimality when the source condition $s\le 1$; (ii) the saturation effect when $s>1$; (iii) a periodic plateau phenomenon in the convergence rate and a multiple-descent behavior with respect to the sample size $n$.

preprint · 2026 · arXiv

Learning Higher-Order Structure from Incomplete Spatiotemporal Data: Multi-Scale Hypergraph Laplacians with Neural Refinement

Sensor networks increasingly govern modern infrastructure, yet the data they lose are rarely missing in the uniform-random patterns assumed by standard imputation benchmarks. Loop detectors go offline during calibration, roadside cabinets silence clusters of nearby sensors, and newly installed instruments provide no history. Such failures create structured absences whose values are constrained by higher-order relations among groups of sensors, not merely by pairwise proximity. Existing low-rank and graph-based methods often miss this collective structure and can fail when missingness becomes coherent. We introduce Multi-Scale Hypergraph Laplacians (MSHL), a two-stage framework for learning higher-order structure from incomplete spatiotemporal observations. The Discovery stage builds a multi-scale hypergraph from complementary topology and residual-correlation evidence, with an observation-only selector that adapts to the supported interaction scale. The Refinement stage adds a small hypergraph-conditioned residual network that is safe by construction: it learns nonlinear corrections where informative residual features exist and defers to the linear estimate where they do not. We prove that MSHL represents group-conservation patterns inaccessible to pairwise graph priors, adapts to the best fixed scale up to a logarithmic factor, transfers this advantage to held-out imputation error, and admits a one-sided refinement guarantee. On two real traffic networks evaluated across scattered cell missingness, contiguous block outages, and whole-sensor blackouts at five rates, MSHL improves over a pairwise-graph baseline whenever higher-order structure is identifiable and otherwise matches it within sampling noise. The results point to a broader principle for reliable infrastructure learning: missing data should be treated not as isolated entries to fill, but as evidence of structure to discover.

preprint · 2026 · arXiv

MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.

preprint · 2026 · arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.

preprint · 2026 · arXiv

Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits

This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicle (AV) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories; subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
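As background for the first-harmonic approximation described in this abstract, the sketch below evaluates the textbook describing function of an ideal symmetric saturation driven by a sinusoid of amplitude A. It is a generic illustration and not the paper's framework: the incremental-input DF and the closed-loop balancing step are not shown. The function name `saturation_df` and its parameters are hypothetical, and `limit` is assumed to be the input amplitude at which the element starts to saturate.

```python
import math


def saturation_df(amplitude, limit, slope=1.0):
    """First-harmonic gain N(A) of an ideal symmetric saturation.

    For A <= limit the element behaves linearly, so N(A) = slope.
    For A > limit, N(A) = slope * (2/pi) * (asin(r) + r*sqrt(1 - r^2))
    with r = limit / A.
    """
    if amplitude <= 0 or limit <= 0:
        raise ValueError("amplitude and limit must be positive")
    if amplitude <= limit:
        return slope
    r = limit / amplitude
    return slope * (2.0 / math.pi) * (math.asin(r) + r * math.sqrt(1.0 - r * r))
```

The gain is real-valued, so an ideal saturation on its own attenuates the first harmonic without adding phase shift, and the attenuation grows as the oscillation amplitude exceeds the limit by a wider margin.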

preprint · 2026 · arXiv

Reflected multi-entropy and its holographic dual

We introduce a mixed-state generalization of the multi-entropy through the canonical purification, which we call reflected multi-entropy. We propose the holographic dual of this measure. For the tripartite case, a field-theoretical calculation is performed using a six-point function of twist operators in the large-$c$ limit. At both zero and finite temperature, the field-theoretical results match the holographic results, supporting our holographic conjecture of this new measure.

preprint · 2023 · arXiv

High precision atom interferometer-based dynamic gravimeter measurement by eliminating the cross-coupling effect

A dynamic gravimeter with an atomic interferometer (AI) can perform absolute gravity measurements with high precision. AI-based dynamic gravity measurement is a type of joint measurement that uses AI sensors and a classical accelerometer. The coupling of the two sensors may degrade the measurement precision. In this study, we analyzed the cross-coupling effect and introduced a recovery vector to suppress this effect. We improved the phase noise of the interference fringe by a factor of 1.9 by performing marine gravity measurements using an AI-based gravimeter and optimizing the recovery vector. Marine gravity measurements were performed, and high gravity measurement precision was achieved. The external and inner coincidence accuracies of the gravity measurement are 0.42 mGal and 0.46 mGal, which were improved by factors of 4.18 and 4.21 by optimizing the cross-coupling effect.

preprint · 2023 · arXiv

Understanding Heterogeneity of Automated Vehicles and Its Traffic-level Impact: A Stochastic Behavioral Perspective

This paper develops a stochastic and unifying framework to examine variability in car-following (CF) dynamics of commercial automated vehicles (AVs) and its direct relation to traffic-level dynamics. The asymmetric behavior (AB) model by Chen et al. (2012a) is extended to accommodate a range of CF behaviors by AVs and compare with the baseline of human-driven vehicles (HDVs). The parameters of the extended AB (EAB) model are calibrated using an adaptive sequential Monte Carlo method for Approximate Bayesian Computation (ABC-ASMC) to stochastically capture various uncertainties including model mismatch resulting from unknown AV CF logic. The estimated posterior distributions of the parameters reveal significant differences in CF behavior (1) between AVs and HDVs, and (2) across AV developers, engine modes, and speed ranges, albeit to a lesser degree. The estimated behavioral patterns and simulation experiments further reveal mixed platoon dynamics in terms of traffic throughput reduction and hysteresis.

preprint · 2022 · arXiv

A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains

Real-time vision-based system of fault detection (RVBS-FD) for freight trains is an essential part of ensuring railway transportation safety. Most existing vision-based methods still have high computational costs based on convolutional neural networks. The computational cost is mainly reflected in the backbone, neck, and post-processing, i.e., non-maximum suppression (NMS). In this paper, we propose a lightweight NMS-free framework to achieve real-time detection and high accuracy simultaneously. First, we use a lightweight backbone for feature extraction and design a fault detection pyramid to process features. This fault detection pyramid includes three novel individual modules using attention mechanism, bottleneck, and dilated convolution for feature enhancement and computation reduction. Instead of using NMS, we calculate different loss functions, including classification and location costs in the detection head, to further reduce computation. Experimental results show that our framework achieves over 83 frames per second speed with a smaller model size and higher accuracy than the state-of-the-art detectors. Meanwhile, the hardware resource requirements of our method are low during the training and testing process.

preprint · 2022 · arXiv

Accelerated Federated Learning with Decoupled Adaptive Optimization

The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping privacy of training data on clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings for improving convergence and accuracy. However, there is still a paucity of theoretical principles on where to and how to design and utilize adaptive optimization methods in federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs). First, an analytic framework is established to build a connection between federated optimization methods and decompositions of ODEs of corresponding centralized optimizers. Second, based on this analytic framework, a momentum decoupling adaptive optimization method, FedDA, is developed to fully utilize the global momentum on each local iteration and accelerate the training convergence. Last but not least, full batch gradients are utilized to mimic centralized optimization at the end of the training process to ensure the convergence and overcome the possible inconsistency caused by adaptive optimization methods.

preprint · 2022 · arXiv

An improved approximation algorithm for maximizing a DR-submodular function over a convex set

Maximizing a DR-submodular function subject to a general convex set is an NP-hard problem arising from many applications in combinatorial optimization and machine learning. While it is highly desirable to design efficient approximation algorithms under this general setting where neither the objective function is monotonic nor the feasible set is down-closed, our main contribution is to present a 0.25-approximation Frank-Wolfe type of algorithm with a sub-exponential time-complexity under the value oracle model.

preprint · 2022 · arXiv

APES: Articulated Part Extraction from Sprite Sheets

Rigged puppets are one of the most prevalent representations to create 2D character animations. Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. Our method is trained to infer articulated parts, e.g. head, torso and limbs, that can be re-assembled to best reconstruct the given poses. Our results demonstrate significantly better performance than alternatives qualitatively and quantitatively. Our project page https://zhan-xu.github.io/parts/ includes our code and data.

preprint · 2022 · arXiv

Connected and Automated Vehicle Distributed Control for On-ramp Merging Scenario: A Virtual Rotation Approach

In this study, we propose a rotation-based connected automated vehicle (CAV) distributed cooperative control strategy for an on-ramp merging scenario. By assuming the mainline and ramp line are straight, we firstly design a virtual rotation approach that transfers the merging problem to a virtual car following (CF) problem to reduce the complexity and dimension of the cooperative CAVs merging control. Based on this concept, a multiple-predecessor virtual CF model and a unidirectional multi-leader communication topology are developed to determine the longitudinal behavior of each CAV. Specifically, we exploit a distributed feedback and feedforward longitudinal controller in preparation for actively generating gaps for merging CAVs, reducing the voids caused by merging, and ensuring safety and traffic efficiency during the process. To ensure the disturbance attenuation property of this system, practical string stability is mathematically proved for the virtual CF controllers to prohibit the traffic oscillation amplification through the traffic stream. Moreover, as a provision for extending the virtual CF application scenarios of any curvy ramp geometry, we utilize a curvilinear coordinate to model the two-dimensional merging control, and further design a local lateral controller based on an extended linear-quadratic regulator to regulate the position deviation and angular deviation of the lane centerlines. For the purpose of systematically evaluating the control performance of the proposed methods, numerical simulation experiments are conducted. As the results indicate, the proposed controllers can actively reduce the void and meanwhile guarantee the damping of traffic oscillations in the merging control area.

preprint2022arXiv

Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection

As an inherently ill-posed problem, depth estimation from single images is the most challenging part of monocular 3D object detection (M3OD). Many existing methods rely on preconceived assumptions to bridge the missing spatial information in monocular images, and predict a sole depth value for every object of interest. However, these assumptions do not always hold in practical applications. To tackle this problem, we propose a depth solving system that fully explores the visual clues from the subtasks in M3OD and generates multiple estimations for the depth of each target. Since the depth estimations rely on different assumptions in essence, they present diverse distributions. Even if some assumptions collapse, the estimations established on the remaining assumptions are still reliable. In addition, we develop a depth selection and combination strategy. This strategy is able to remove abnormal estimations caused by collapsed assumptions, and adaptively combine the remaining estimations into a single one. In this way, our depth solving system becomes more precise and robust. Exploiting the clues from multiple subtasks of M3OD and without introducing any extra information, our method surpasses the current best method by more than 20% relatively on the Moderate level of test split in the KITTI 3D object detection benchmark, while still maintaining real-time efficiency.
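The selection-and-combination idea described above can be illustrated with a minimal sketch. The abstract does not specify the paper's actual rule, so everything here is an assumption made for illustration: the function name `combine_depths`, the `(depth, uncertainty)` input format, the median-based outlier test, and the inverse-uncertainty weighting.

```python
from statistics import median


def combine_depths(estimates, max_rel_dev=0.2):
    """Combine several depth estimates for one object into a single value.

    estimates: list of (depth, uncertainty) pairs with uncertainty > 0.
    This input format and the 20% deviation threshold are hypothetical.
    """
    if not estimates:
        raise ValueError("need at least one depth estimate")
    # Use the median as a robust reference for spotting abnormal estimates.
    center = median(d for d, _ in estimates)
    # Selection: drop estimates that deviate too far from the reference.
    kept = [(d, u) for d, u in estimates if abs(d - center) <= max_rel_dev * center]
    if not kept:
        # Every estimate was rejected; fall back to the reference itself.
        return center
    # Combination: weight each remaining estimate by inverse uncertainty.
    weights = [1.0 / u for _, u in kept]
    return sum(w * d for w, (d, _) in zip(weights, kept)) / sum(weights)
```

For example, `combine_depths([(10.0, 1.0), (10.4, 1.0), (30.0, 1.0)])` discards the 30.0 estimate and averages the other two.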

preprint2022arXiv

FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server

Despite achieving remarkable performance, Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency. In this paper, we propose a novel FL framework, i.e., FedDUAP, with two original contributions, to exploit the insensitive data on the server and the decentralized data in edge devices to further improve the training efficiency. First, a dynamic server update algorithm is designed to exploit the insensitive data on the server, in order to dynamically determine the optimal steps of the server update for improving the convergence and accuracy of the global model. Second, a layer-adaptive model pruning method is developed to perform unique pruning operations adapted to the different dimensions and importance of multiple layers, to achieve a good balance between efficiency and effectiveness. By integrating the two original techniques together, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller).

preprint2022arXiv

From Distributed Machine Learning to Federated Learning: A Survey

In recent years, data and computing resources are typically distributed in the devices of end users, various regions or organizations. Because of laws or regulations, the distributed data and computing resources cannot be directly shared among different regions or organizations for machine learning tasks. Federated learning emerges as an efficient approach to exploit distributed data and computing resources, so as to collaboratively train machine learning models, while obeying the laws and regulations and ensuring data security and data privacy. In this paper, we provide a comprehensive survey of existing works for federated learning. We propose a functional architecture of federated learning systems and a taxonomy of related techniques. Furthermore, we present the distributed training, data communication, and security of FL systems. Finally, we analyze their limitations and propose future research directions.

preprint2022arXiv

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

Recently, much progress has been made for self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for existing pre-training methods: 1) the learned representation is neutral and not informative for a specific task; 2) multi-task learning-based pre-training sometimes leads to sub-optimal solutions due to inconsistent domains of different tasks. To address the above issues, we propose a novel action recognition pre-training framework, which exploits human-centered prior knowledge that generates more informative representation, and avoids the conflict between multiple tasks by using task-dependent representations. Specifically, we distill knowledge from a human parsing model to enrich the semantic capability of representation. In addition, we combine knowledge distillation with contrastive learning to constitute a task-dependent multi-task framework. We achieve state-of-the-art performance on two popular benchmarks for action recognition task, i.e., UCF101 and HMDB51, verifying the effectiveness of our method.

preprint2022arXiv

Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing

Only recently have researchers attempted to provide classification algorithms with provable group fairness guarantees. Most of these algorithms are hampered by the requirement that the training and deployment data follow the same distribution. This paper proposes an input-agnostic certified group fairness algorithm, FairSmooth, for improving the fairness of classification models while maintaining remarkable prediction accuracy. A Gaussian parameter smoothing method is developed to transform base classifiers into their smooth versions. An optimal individual smooth classifier is learnt for each group with only the data regarding the group, and an overall smooth classifier for all groups is generated by averaging the parameters of all the individual smooth ones. By leveraging the theory of nonlinear functional analysis, the smooth classifiers are reformulated as output functions of a Nemytskii operator. Theoretical analysis is conducted to derive that the Nemytskii operator is smooth and induces a Fréchet differentiable smooth manifold. We theoretically demonstrate that the smooth manifold has a global Lipschitz constant that is independent of the domain of the input data, which yields the input-agnostic certified group fairness.

preprint2022arXiv

Learning Visibility for Robust Dense Human Body Estimation

Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to the limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in the x and y axes helps distinguish out-of-frame cases, and the visibility in the depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.

preprint2022arXiv

Lesion Localization in OCT by Semi-Supervised Object Detection

Over 300 million people worldwide are affected by various retinal diseases. By noninvasive Optical Coherence Tomography (OCT) scans, a number of abnormal structural changes in the retina, namely retinal lesions, can be identified. Automated lesion localization in OCT is thus important for detecting retinal diseases at their early stage. To conquer the lack of manual annotation for deep supervised learning, this paper presents a first study on utilizing semi-supervised object detection (SSOD) for lesion localization in OCT images. To that end, we develop a taxonomy to provide a unified and structured viewpoint of the current SSOD methods, and consequently identify key modules in these methods. To evaluate the influence of these modules in the new task, we build OCT-SS, a new dataset consisting of over 1k expert-labeled OCT B-scan images and over 13k unlabeled B-scans. Extensive experiments on OCT-SS identify Unbiased Teacher (UnT) as the best current SSOD method for lesion localization. Moreover, we improve over this strong baseline, with mAP increased from 49.34 to 50.86.

preprint2022arXiv

Maximizing Modular plus Non-monotone Submodular Functions

The research problem in this work is the relaxation of maximizing non-negative submodular plus modular with the entire real number domain as its value range over a family of down-closed sets. We seek a feasible point $\mathbf{x}^*$ in the polytope of the given constraint such that $\mathbf{x}^*\in\arg\max_{\mathbf{x}\in\mathcal{P}\subseteq[0,1]^n}F(\mathbf{x})+L(\mathbf{x})$, where $F$, $L$ denote the extensions of the underlying submodular function $f$ and modular function $\ell$. We provide an approximation algorithm named \textsc{Measured Continuous Greedy with Adaptive Weights}, which yields a guarantee $F(\mathbf{x})+L(\mathbf{x})\geq \left(1/e-\mathcal{O}(ε)\right)\cdot f(OPT)+\left(\frac{β-e}{e(β-1)}-\mathcal{O}(ε)\right)\cdot\ell(OPT)$ under the assumption that the ratio of the non-negative part within $\ell(OPT)$ to the absolute value of its negative part is described by a parameter $β\in[0, \infty]$, where $OPT$ is the optimal integral solution for the discrete problem. It is obvious that the factor of $\ell(OPT)$ is $1$ when $β=0$, which means the negative part is completely dominant at this time; otherwise the factor is close to $1/e$ when $β\rightarrow\infty$. Our work first breaks the restriction on the specific value range of the modular function without assuming non-positivity or non-negativity as previous results and quantifies the relative variation of the approximation guarantee for optimal solutions with arbitrary structure. Moreover, we also give an analysis for the inapproximability of the problem we consider. We show a hardness result that there exists no polynomial algorithm whose output $S$ satisfies $f(S)+\ell(S)\geq0.478\cdot f(OPT)+\ell(OPT)$.
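The two limiting values of the $\ell(OPT)$ factor quoted above follow directly from the stated expression. As a quick check, using nothing beyond the formula given in the abstract and ignoring the $\mathcal{O}(ε)$ term:

```latex
\[
\left.\frac{\beta-e}{e(\beta-1)}\right|_{\beta=0} = \frac{-e}{-e} = 1,
\qquad
\lim_{\beta\to\infty}\frac{\beta-e}{e(\beta-1)}
  = \lim_{\beta\to\infty}\frac{1-e/\beta}{e\,(1-1/\beta)} = \frac{1}{e}.
\]
```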

preprint2022arXiv

Multi-Robot Collaborative Perception with Graph Neural Networks

Multi-robot systems such as swarms of aerial robots are naturally suited to offer additional flexibility, resilience, and robustness in several tasks compared to a single robot by enabling cooperation among the agents. To enhance the autonomous robot decision-making process and situational awareness, multi-robot systems have to coordinate their perception capabilities to collect, share, and fuse environment information among the agents in an efficient and meaningful way, so as to accurately obtain context-appropriate information or gain resilience to sensor noise or failures. In this paper, we propose a general-purpose Graph Neural Network (GNN) with the main goal of increasing, in multi-robot perception tasks, single robots' inference perception accuracy as well as resilience to sensor failures and disturbances. We show that the proposed framework can address multi-view visual perception problems such as monocular depth estimation and semantic segmentation. Several experiments using both photo-realistic and real data gathered from multiple aerial robots' viewpoints show the effectiveness of the proposed approach in challenging inference conditions, including images corrupted by heavy noise and camera occlusions or failures.

preprint2022arXiv

Play It Cool: Dynamic Shifting Prevents Thermal Throttling

Machine learning (ML) has entered the mobile era where an enormous number of ML models are deployed on edge devices. However, running common ML models on edge devices continuously may generate excessive heat from the computation, forcing the device to "slow down" to prevent overheating, a phenomenon called thermal throttling. This paper studies the impact of thermal throttling on mobile phones: when it occurs, the CPU clock frequency is reduced, and the model inference latency may increase dramatically. This unpleasant inconsistent behavior has a substantial negative effect on user experience, but it has been overlooked for a long time. To counter thermal throttling, we propose to utilize dynamic networks with shared weights and dynamically shift between large and small ML models seamlessly according to their thermal profile, i.e., shifting to a small model when the system is about to throttle. With the proposed dynamic shifting, the application runs consistently without experiencing CPU clock frequency degradation and latency increase. In addition, we also study the resulting accuracy when dynamic shifting is deployed and show that our approach provides a reasonable trade-off between model latency and model accuracy.
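The shifting policy described above, moving to a small model when the system is about to throttle, can be sketched as a simple temperature rule. This is only an illustrative sketch: the abstract does not give the paper's thermal profile or decision logic, so the function name `pick_model`, the temperature thresholds, and the hysteresis band are all assumptions.

```python
def pick_model(temp_c, current, throttle_temp_c=45.0, margin_c=3.0):
    """Choose which model variant to run next: "small" or "large".

    temp_c: current device temperature in degrees Celsius.
    current: the variant currently running ("small" or "large").
    throttle_temp_c and margin_c are hypothetical values, not from the paper.
    """
    if temp_c >= throttle_temp_c - margin_c:
        # Close to the throttling point: shift to the small model to cool down.
        return "small"
    if temp_c <= throttle_temp_c - 2 * margin_c:
        # Comfortably cool: it is safe to run the large model again.
        return "large"
    # In between: keep the current variant to avoid rapid back-and-forth shifts.
    return current
```

The middle band is a design choice added here so that a device hovering around one threshold does not switch models on every inference.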

preprint2022arXiv

Skeleton-free Pose Transfer for Stylized 3D Characters

We present the first method that automatically transfers poses between stylized 3D characters without skeletal rigging. In contrast to previous attempts to learn pose transformations on fixed or topology-equivalent skeleton templates, our method focuses on a novel scenario to handle skeleton-free characters with diverse shapes, topologies, and mesh connectivities. The key idea of our method is to represent the characters in a unified articulation model so that the pose can be transferred through the correspondent parts. To achieve this, we propose a novel pose transfer network that predicts the character skinning weights and deformation transformations jointly to articulate the target character to match the desired pose. Our method is trained in a semi-supervised manner absorbing all existing character data with paired/unpaired poses and stylized shapes. It generalizes well to unseen stylized characters and inanimate objects. We conduct extensive experiments and demonstrate the effectiveness of our method on this novel task.

preprint2022arXiv

Vision-based Relative Detection and Tracking for Teams of Micro Aerial Vehicles

In this paper, we address the vision-based detection and tracking problems of multiple aerial vehicles using a single camera and Inertial Measurement Unit (IMU) as well as the corresponding perception consensus problem (i.e., uniqueness and identical IDs across all observing agents). We design several vision-based decentralized Bayesian multi-tracking filtering strategies to resolve the association between the incoming unsorted measurements obtained by a visual detector algorithm and the tracked agents. We compare their accuracy in different operating conditions as well as their scalability according to the number of agents in the team. This analysis provides useful insights about the most appropriate design choice for the given task. We further show that the proposed perception and inference pipeline, which includes a Deep Neural Network (DNN) as visual target detector, is lightweight and capable of running concurrently with control and planning on-board Size, Weight, and Power (SWaP) constrained robots. Experimental results show the effective tracking of multiple drones in various challenging scenarios such as heavy occlusions.

preprint2021arXiv

CogNet: Bridging Linguistic Knowledge, World Knowledge and Commonsense Knowledge

In this paper, we present CogNet, a knowledge base (KB) dedicated to integrating three types of knowledge: (1) linguistic knowledge from FrameNet, which schematically describes situations, objects and events; (2) world knowledge from YAGO, Freebase, DBpedia and Wikidata, which provides explicit knowledge about specific instances; and (3) commonsense knowledge from ConceptNet, which describes implicit general facts. To model these different types of knowledge consistently, we introduce a three-level unified frame-styled representation architecture. To integrate free-form commonsense knowledge with other structured knowledge, we propose a strategy that combines automated labeling and crowdsourced annotation. At present, CogNet integrates 1,000+ semantic frames from linguistic KBs, 20,000,000+ frame instances from world KBs, as well as 90,000+ commonsense assertions from commonsense KBs. All these data can be easily queried and explored on our online platform, and are free to download in RDF format for utilization under a CC-BY-SA 4.0 license. The demo and data are available at http://cognet.top/.

preprint2021arXiv

Defect Extremal Surface for Reflected Entropy

Defect extremal surface is defined by extremizing the Ryu-Takayanagi formula corrected by the quantum defect theory. This is interesting when the AdS bulk contains a defect brane (or string). We introduce a defect extremal surface formula for reflected entropy, which is a mixed state generalization of entanglement entropy measure. Based on a decomposition procedure of an AdS bulk with a brane, we demonstrate the equivalence between defect extremal surface formula and island formula for reflected entropy in AdS$_3$/BCFT$_2$. We also compute the evolution of reflected entropy in evaporating black hole model and find that defect extremal surface formula agrees with island formula.

preprint2021arXiv

MakeItTalk: Speaker-Aware Talking-Head Animation

We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking head dynamics. Another key component of our method is the prediction of facial landmarks reflecting speaker-aware dynamics. Based on this intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with full range of motion and also animate artistic paintings, sketches, 2D cartoon characters, Japanese mangas, stylized caricatures in a single unified framework. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking heads of significantly higher quality compared to prior state-of-the-art.

preprint2021arXiv

Optimal Dynamic Futures Portfolios Under a Multiscale Central Tendency Ornstein-Uhlenbeck Model

We study the problem of dynamically trading multiple futures whose underlying asset price follows a multiscale central tendency Ornstein-Uhlenbeck (MCTOU) model. Under this model, we derive the closed-form no-arbitrage prices for the futures contracts. Applying a utility maximization approach, we solve for the optimal trading strategies under different portfolio configurations by examining the associated system of Hamilton-Jacobi-Bellman (HJB) equations. The optimal strategies depend on not only the parameters of the underlying asset price process but also the risk premia embedded in the futures prices. Numerical examples are provided to illustrate the investor's optimal positions and optimal wealth over time.

preprint2021arXiv

Partial Reduction and Cosmology at Defect Brane

Partial reduction is a Randall-Sundrum reduction for only part of the AdS region between a finite tension brane and a zero tension brane. This is interesting in AdS/BCFT where the AdS bulk contains a defect brane. We employ partial reduction for an AdS bulk with a brane evolving as a $2d$ Friedmann-Robertson-Walker (FRW) cosmology and demonstrate the equivalence between defect extremal surface and island formula for a large subregion fine grained entropy in boundary CFT. We then move to higher dimensions and demonstrate the existence of $4d$ massless graviton on AdS$_4$ brane in partial reduction. We also propose a partial reduction for a $4d$ FRW cosmology at defect brane and obtain the Newton constant by computing boundary entropy.

preprint2020arXiv

A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

Textual patterns (e.g., Country's president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: A major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model which formulates fact extraction in a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specially for temporal facts: (1) it models pattern reliability on two types of time signals, including temporal tag in text and text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.

preprint2020arXiv

A Video Analysis Method on Wanfang Dataset via Deep Neural Network

Object detection has improved greatly in recent years, especially with the development of convolutional neural networks. However, many challenging cases remain, such as small objects and compact, dense, or highly overlapping objects. Existing methods can detect multiple objects well, but because of slight changes between frames, the detection output of the model can become unstable, and objects may be dropped from or spuriously added to the detection results. In the pedestrian flow detection task, this phenomenon prevents the flow from being calculated accurately. To solve this problem, in this paper, we describe a new function for real-time multi-object detection in sports competitions and pedestrian flow detection in public places based on deep learning. Our work is to extract a video clip and process the frames of this clip efficiently. More specifically, our algorithm includes two stages: a judge method and an optimization method. The judge method sets a maximum threshold for better results under the model; the threshold value corresponds to the upper limit of the algorithm with better detection results. The optimization method solves the detection jitter problem: frame hopping in the video causes discontinuity in the generated video fragments. We use an optimization algorithm to obtain a key value, and then the detection result value at the index is replaced by the key value to stabilize changes in the detection result sequence. Based on the proposed algorithm, we adopt the Wanfang sports competition dataset as the main test dataset, together with our own test dataset, for YOLOv3-Abnormal Number Version (YOLOv3-ANV), which achieves a 5.4% average improvement compared with existing methods. Also, video above the threshold value can be obtained for further analysis. Our work can also be used for pedestrian flow detection and pedestrian alarm tasks.

preprint2020arXiv

Cutoff $\rm AdS_3$ versus $\rm T\bar{T}$ $\rm CFT_2$ in the large central charge sector: correlators of energy-momentum tensor

In this article we probe the proposed holographic duality between $T\bar{T}$ deformed two dimensional conformal field theory and the gravity theory of $\rm AdS_3$ with a Dirichlet cutoff by computing correlators of energy-momentum tensor. We focus on the large central charge sector of the $T\bar{T}$ CFT in a Euclidean plane and in a sphere, and compute the correlators of energy-momentum tensor using an operator identity promoted from the classical trace relation. The result agrees with a computation of classical pure gravity in $\rm AdS_3$ with the corresponding cutoff surface, given a holographic dictionary which identifies gravity parameters with $T\bar{T}$ CFT parameters.

preprint2020arXiv

Defect extremal surface as the holographic counterpart of Island formula

We propose defect extremal surface as the holographic counterpart of boundary quantum extremal surface. The defect extremal surface is defined by minimizing the Ryu-Takayanagi surface corrected by the defect theory. This is particularly interesting when the RT surface crosses or terminates on the defect. In a simple setup of AdS/BCFT, we find that the defect extremal surface formula gives precisely the same results as the boundary quantum extremal surface. We provide a decomposition procedure of an AdS bulk with a defect brane to see clearly how Island formula emerges from a brane world system with gravity glued to a flat space quantum field theory.

preprint2020arXiv

End-To-End Trainable Video Super-Resolution Based on a New Mechanism for Implicit Motion Estimation and Compensation

Video super-resolution aims at generating a high-resolution video from its low-resolution counterpart. With the rapid rise of deep learning, many recently proposed video super-resolution methods use convolutional neural networks in conjunction with explicit motion compensation to capitalize on statistical dependencies within and across low-resolution frames. Two common issues of such methods are noteworthy. Firstly, the quality of the final reconstructed HR video is often very sensitive to the accuracy of motion estimation. Secondly, the warp grid needed for motion compensation, which is specified by the two flow maps delineating pixel displacements in horizontal and vertical directions, tends to introduce additional errors and jeopardize the temporal consistency across video frames. To address these issues, we propose a novel dynamic local filter network to perform implicit motion estimation and compensation by employing, via locally connected layers, sample-specific and position-specific dynamic local filters that are tailored to the target pixels. We also propose a global refinement network based on ResBlock and autoencoder structures to exploit non-local correlations and enhance the spatial consistency of super-resolved frames. The experimental results demonstrate that the proposed method outperforms the state-of-the-art, and validate its strength in terms of local transformation handling, temporal consistency as well as edge sharpness.

preprint2020arXiv

High-sensitivity bio-sensor based on the real-splitting indirectly coupled Anti-Parity time symmetric WGMs

Detecting the size of a single nanoparticle with high precision is crucial to understanding the characteristics of the nanoparticle. In this paper, we study single-particle detection based on anti-parity-time (APT) symmetric indirectly coupled WGMs. The results show that the APT-symmetric WGM nanoparticle sensor exhibits a giant enhancement in frequency splitting compared with a single WGM sensor when the system operates at an exceptional point (EP). Compared with the parity-time symmetric nanoparticle sensor, our sensor exhibits a real eigenfrequency splitting, which can be directly detected.

preprint2020arXiv

Multi-boundary entanglement in Chern-Simons theory with finite gauge groups

We study the multi-boundary entanglement structure of the states prepared in (1+1) and (2+1) dimensional Chern-Simons theory with finite discrete gauge group $G$. The states in (1+1)-$d$ are associated with Riemann surfaces of genus $g$ with multiple $S^1$ boundaries and we use replica trick to compute the entanglement entropy for such states. In (2+1)-$d$, we focus on the states associated with torus link complements which live in the tensor product of Hilbert spaces associated with multiple $T^2$. We present a quantitative analysis of the entanglement structure for both abelian and non-abelian groups. For all the states considered in this work, we find that the entanglement entropy for direct product of groups is the sum of entropy for individual groups, i.e. $\text{EE}(G_1 \times G_2) = \text{EE}(G_1)+\text{EE}(G_2)$. Moreover, the reduced density matrix obtained by tracing out a subset of the total Hilbert space has a positive semidefinite partial transpose on any bi-partition of the remaining Hilbert space.

preprint2020arXiv

Quasimap wall-crossing for GIT quotients

In this paper, we prove a wall-crossing formula for $ε$-stable quasimaps to GIT quotients conjectured by Ciocan-Fontanine and Kim, for all targets in all genera, including the orbifold case. We prove that stability conditions in adjacent chambers give equivalent invariants, provided that both chambers are stable. In the case of genus-zero quasimaps with one marked point, we compute the invariants in the left-most stable chamber in terms of the small $I$-function. Using this we prove that the quasimap $J$-functions are on the Lagrangian cone of the Gromov--Witten theory. The proofs are based on virtual localization on a master space, obtained via some universal construction on the moduli of weighted curves. The fixed-point loci are in one-to-one correspondence with the terms in the wall-crossing formula.

preprint2020arXiv

Reflected Entropy for an Evaporating Black Hole

We study reflected entropy as a correlation measure in black hole evaporation. As a measure for bipartite mixed states, reflected entropy can be computed between black hole and radiation, radiation and radiation. We compute reflected entropy curves in three different models: 3-side wormhole model, End-of-the-World (EOW) brane model in three dimensions and two-dimensional eternal black hole plus CFT model. For 3-side wormhole model, we find that reflected entropy is dual to island cross sections. The reflected entropy between radiation and black hole increases at early time and then decreases to zero, similar to Page curve, but with a later transition time. The reflected entropy between radiation and radiation first increases and then saturates. For the EOW brane model, similar behaviors of reflected entropy are found. We propose a quantum extremal surface for reflected entropy, which we call quantum extremal cross section. In the eternal black hole plus CFT model, we find a generalized formula for reflected entropy with island cross section as its area term by considering the right half as the canonical purification of the left. Interestingly, the reflected entropy curve between the left black hole and the left radiation is nothing but the Page curve. We also find that reflected entropy between the left black hole and the right black hole decreases and goes to zero at late time. The reflected entropy between radiation and radiation increases at early time and saturates at late time.

preprint2020arXiv

RigNet: Neural Rigging for Articulated Characters

We present RigNet, an end-to-end automated method for producing animation rigs from input character models. Given an input 3D model representing an articulated character, RigNet predicts a skeleton that matches the animator expectations in joint placement and topology. It also estimates surface skin weights based on the predicted skeleton. Our method is based on a deep architecture that directly operates on the mesh representation without making assumptions on shape class and structure. The architecture is trained on a large and diverse collection of rigged models, including their mesh, skeletons and corresponding skin weights. Our evaluation is three-fold: we show better results than prior art when quantitatively compared to animator rigs; qualitatively we show that our rigs can be expressively posed and animated at multiple levels of detail; and finally, we evaluate the impact of various algorithm choices on our output rigs.

preprint2020arXiv

The Cooperative Sorting Strategy for Connected and Automated Vehicle Platoons

This paper presents a "cooperative vehicle sorting" strategy that seeks to optimally sort connected and automated vehicles (CAVs) in a multi-lane platoon to reach an ideally organized platoon. In the proposed method, a CAV platoon is first discretized into a grid system, where a CAV moves from one cell to another in the discrete time-space domain. Then, the cooperative sorting problem is modeled as a path-finding problem in the graphic domain. The problem is solved by the deterministic Astar algorithm with a stepwise strategy, where only one vehicle can move within a movement step. The resultant shortest path is further optimized with an integer linear programming algorithm to minimize the sorting time by allowing multiple movements within a step. To improve the algorithm running time and address multiple shortest paths, a distributed stochastic Astar algorithm (DSA) is developed by introducing random disturbances to the edge costs to break uniform paths (with equal path cost). Numerical experiments are conducted to demonstrate the effectiveness of the proposed DSA method. The results show shorter sorting time and significantly improved algorithm running time due to the use of DSA. In addition, we find that the optimization performance can be further improved by increasing the number of processes in the distributed computing system.
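The tie-breaking idea in this abstract (random disturbances on edge costs so that equal-cost shortest paths no longer tie) can be illustrated with a minimal single-process sketch. This is not the paper's DSA: the grid model, the function name `perturbed_astar`, and the parameters `eps` and `seed` are assumptions made for illustration only.

```python
import heapq
import random


def perturbed_astar(width, height, start, goal, eps=1e-3, seed=None):
    """A* on a 4-connected grid whose unit edge costs get a small random bump.

    Hypothetical illustration: each edge costs 1 + U(0, eps). As long as
    eps times the number of steps stays below 1, the bump cannot make a
    longer path cheaper; it only picks one of the equally short paths.
    """
    rng = random.Random(seed)

    def heuristic(cell):
        # Manhattan distance never overestimates, since every edge costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}

    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cost > best_cost[cell]:
            continue  # stale queue entry
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            new_cost = cost + 1.0 + rng.uniform(0.0, eps)
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Calling `perturbed_astar(3, 3, (0, 0), (2, 2), seed=0)` returns one of the several four-move paths between opposite corners; different seeds may select different ones, which is the property a set of parallel workers could exploit.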

preprint2020arXiv

Traffic Performance Score for Measuring the Impact of COVID-19 on Urban Mobility

Measuring traffic performance is critical for public agencies who manage traffic and individuals who plan trips, especially when special events happen. The COVID-19 pandemic has significantly influenced almost every aspect of daily life, including urban traffic patterns. Thus, it is important to measure the impact of COVID-19 on transportation to further guide agencies and residents to properly respond to changes in traffic patterns. However, most existing traffic performance metrics incorporate only a single traffic parameter and measure only the performance of individual corridors. To overcome these challenges, in this study, a Traffic Performance Score (TPS) is proposed that incorporates multiple parameters for measuring network-wide traffic performance. An interactive web-based TPS platform that provides real-time and historical spatial-temporal traffic performance analysis is developed by the STAR Lab at the University of Washington. Based on data from this platform, this study analyzes the impact of COVID-19 on different road segments and the traffic network as a whole. Considering this pandemic has greatly reshaped social and economic operations, this study also evaluates how COVID-19 is changing the urban mobility from both travel demand and driving behavior perspectives.

preprint2019arXiv

Generalizations of Reflected Entropy and the Holographic Dual

We introduce a new class of quantum and classical correlation measures by generalizing the reflected entropy to multipartite states. We define the new measures for quantum systems in one spatial dimension. For quantum systems having gravity duals, we show that the holographic duals of these new measures are various types of minimal surfaces consisting of different entanglement wedge cross sections. One special generalized reflected entropy is $Δ_R$, whose holographic dual is proportional to the so-called multipartite entanglement wedge cross section $Δ_W$ defined previously. We then perform a large-$c$ computation of $Δ_R$ and find precise agreement with the holographic computation of $2Δ_W$. This agreement establishes $Δ_R$ as another candidate for the dual of $Δ_W$ and also supports our holographic conjecture for the new class of generalized reflected entropies.

preprint2018arXiv

Non-Stationary Texture Synthesis by Adversarial Expansion

The real world exhibits an abundance of non-stationary textures. Examples include textures with large-scale structures, as well as spatially variant and inhomogeneous textures. While existing example-based texture synthesis methods can cope well with stationary textures, non-stationary textures still pose a considerable challenge, which remains unresolved. In this paper, we propose a new approach for example-based non-stationary texture synthesis. Our approach uses a generative adversarial network (GAN), trained to double the spatial extent of texture blocks extracted from a specific texture exemplar. Once trained, the fully convolutional generator is able to expand the size of the entire exemplar, as well as of any of its sub-blocks. We demonstrate that this conceptually simple approach is highly effective for capturing large-scale structures, as well as other non-stationary attributes of the input exemplar. As a result, it can cope with challenging textures, which, to our knowledge, no other existing method can handle.

preprint2016arXiv

A Large-scale Distributed Video Parsing and Evaluation Platform

Visual surveillance systems have become one of the largest sources of Big Visual Data in the real world. However, existing systems for video analysis still struggle with scalability, extensibility and error-proneness, even though great advances have been achieved in a number of visual recognition tasks and surveillance applications, e.g., pedestrian/vehicle detection and people/vehicle counting. Moreover, few algorithms explore the specific values/characteristics of large-scale surveillance videos. To address these problems in large-scale video analysis, we develop a scalable video parsing and evaluation platform by combining advanced techniques for Big Data processing, including Spark Streaming, Kafka and Hadoop Distributed Filesystem (HDFS). In addition, a Web User Interface is designed in the system to collect users' degrees of satisfaction with the recognition tasks so as to evaluate the performance of the whole system. Furthermore, the highly extensible platform running on long-term surveillance videos makes it possible to develop more intelligent incremental algorithms to enhance the performance of various visual recognition tasks.

preprint2016arXiv

A Loop-philic Pseudoscalar

We construct a weakly-coupled renormalizable model to explain the $750\,\mbox{GeV}$ diphoton excess. The $750\,\mbox{GeV}$ resonance (denoted as $X(750)$) is interpreted as a pseudoscalar coming from a complex singlet. The model also naturally provides a dark matter candidate. One of the most attractive features of the model is that decays of $X(750)$ are all loop-induced, so the diphoton rate is not diluted by unwanted tree-level branching fractions. The relevant Yukawa interactions need not be tuned close to the non-perturbative region to explain the rate. The model is highly predictive, including the pseudoscalar nature of $X(750)$ and two nearly mass-degenerate exotic quarks carrying electric charge $5/3$ and $2/3$, respectively. Rich phenomenology is expected with respect to collider searches, flavor physics and dark matter detection, if $X(750)$ can be pinned down by future LHC experiments.

preprint2016arXiv

Oxygen vacancy induced room temperature metal-insulator transition in nickelates films and its potential application in photovoltaics

Oxygen vacancy is intrinsically coupled with magnetic, electronic and transport properties of transition-metal oxide materials and directly determines their multifunctionality. Here, we demonstrate reversible control of oxygen content by post-annealing at temperatures below 300 °C and realize a reversible metal-insulator transition in epitaxial NdNiO3 films. Importantly, a resistance modulation of over six orders of magnitude and a large change in optical band gap are demonstrated at room temperature without destroying the parent framework or changing the p-type conductive mechanism. Further study revealed that the stabilization of the insulating phase at room temperature by oxygen vacancies is universal for perovskite nickelate films. Acting as electron donors, oxygen vacancies not only stabilize the insulating phase at room temperature, but also induce a large magnetization of ~50 emu/cm$^3$ due to the formation of strongly correlated Ni$^{2+}$ $t_{2g}^6 e_g^2$ states. The band gap opening is an order of magnitude larger than that of the thermally driven metal-insulator transition and is continuously tunable. Potential application of the newly found insulating phase in photovoltaics has been demonstrated in nickelate-based heterojunctions. Our discovery opens up new possibilities for strongly correlated perovskite nickelates.

preprint2015arXiv

Oxygen Vacancy Induced Flat Phonon Mode at FeSe/SrTiO3 interface

A high-frequency optical phonon mode of SrTiO3 (STO) was found to assist the high-temperature superconductivity observed recently at the interface between monolayer FeSe and the STO substrate. However, the origin of this mode is not clear. Through first-principles calculations, we find that there is a novel polar phonon mode on the surface layers of the STO substrate, which does not exist in STO crystals. The oxygen vacancies near the FeSe/STO interface drive the dispersion of this phonon mode to be flat and lower its energy, whereas the charge transfer between the STO substrate and the FeSe monolayer further reduces its energy to 81 meV. This energy is in good agreement with the experimental value fitted by Lee et al. for the phonon mode responsible for the observed replica band separations and the increased superconducting gap. The oxygen-vacancy-induced flat and polar phonon mode provides clues for understanding the origin of high Tc superconductivity at the FeSe/STO interface.

preprint2015arXiv

Renyi Entropy of Free (2,0) Tensor Multiplet and its Supersymmetric Counterpart

We compute the Renyi entropy and the supersymmetric Renyi entropy for the six-dimensional free (2,0) tensor multiplet. We make various checks on our results, and they are consistent with the previous results about the (2,0) tensor multiplet. As a by-product, we have established a canonical way to compute the Renyi entropy for p-form fields in d-dimensions.
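For orientation, the standard textbook definition of the Renyi entropy of a reduced density matrix $\rho_A$ is recalled here. It is not quoted from the paper, and the supersymmetric variant studied there modifies it in ways not reproduced in this sketch.

```latex
S_q \;=\; \frac{1}{1-q}\,\log \operatorname{Tr}\rho_A^{\,q},
\qquad
\lim_{q\to 1} S_q \;=\; -\operatorname{Tr}\,\rho_A \log \rho_A .
```

The $q \to 1$ limit recovers the entanglement (von Neumann) entropy, which is why $S_q$ is treated as a one-parameter refinement of it.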

preprint2015arXiv

Spherical $t_ε$-Designs for Approximations on the Sphere

A spherical $t$-design is a set of points on the sphere that are nodes of a positive equal weight quadrature rule having algebraic accuracy $t$ for all spherical polynomials with degrees $\le t$. Spherical $t$-designs have many distinguished properties in approximations on the sphere and receive remarkable attention. Although the existence of a spherical $t$-design is known for any $t\ge 0$, a spherical design is only known in a set of interval enclosures on the sphere \cite{chen2011computational} for $t\le 100$. It is unknown how to choose a set of points from the set of interval enclosures to obtain a spherical $t$-design. In this paper we investigate a new concept of point sets on the sphere named spherical $t_ε$-design ($0<ε<1$), which are nodes of a positive weight quadrature rule with algebraic accuracy $t$. The sum of the weights is equal to the area of the sphere and the mean value of the weights is equal to the weight of the quadrature rule defined by the spherical $t$-design. A spherical $t_ε$-design is a spherical $t$-design when $ε=0,$ and a spherical $t$-design is a spherical $t_ε$-design for any $0<ε<1$. We show that any point set chosen from the set of interval enclosures \cite{chen2011computational} is a spherical $t_ε$-design. We then study the worst-case errors of quadrature rules using spherical $t_ε$-designs in a Sobolev space, and investigate a model of polynomial approximation with the $l_1$-regularization using spherical $t_ε$-designs. Numerical results illustrate good performance of spherical $t_ε$-designs for numerical integration and function approximation on the sphere.
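The defining property of a spherical $t$-design (an equal-weight average over the points reproduces the sphere average of every polynomial of degree at most $t$) can be checked exactly for a small, well-known case. The sketch below is an illustration added here, not code from the paper; the helper names are hypothetical, and the octahedron vertices are used as the example point set on the unit sphere in three dimensions.

```python
from fractions import Fraction
from itertools import product


def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def sphere_mean(a, b, c):
    """Exact mean of x^a y^b z^c over the unit sphere S^2."""
    if a % 2 or b % 2 or c % 2:
        return Fraction(0)  # odd powers average to zero by symmetry
    numerator = (double_factorial(a - 1)
                 * double_factorial(b - 1)
                 * double_factorial(c - 1))
    denominator = 1
    for j in range((a + b + c) // 2):
        denominator *= 3 + 2 * j
    return Fraction(numerator, denominator)


def is_t_design(points, t):
    """True if the equal-weight rule on `points` is exact up to degree t."""
    n = len(points)
    for a, b, c in product(range(t + 1), repeat=3):
        if a + b + c > t:
            continue
        average = sum(
            Fraction(x) ** a * Fraction(y) ** b * Fraction(z) ** c
            for x, y, z in points
        ) / n
        if average != sphere_mean(a, b, c):
            return False
    return True


# The six octahedron vertices, a classical example point set.
OCTAHEDRON = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
```

With these definitions, `is_t_design(OCTAHEDRON, 3)` holds while `is_t_design(OCTAHEDRON, 4)` fails, because the point average of $x^4$ is $1/3$ whereas the sphere average is $1/5$. Exact rational arithmetic is used so the check involves no floating-point tolerance.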

preprint2015arXiv

Universal Features of Four-Dimensional Superconformal Field Theory on Conic Space

Following the set up in arXiv:1408.3393, we study 4d N=1 superconformal field theories in conic spaces. We show that the universal part of supersymmetric Rényi entropy S_q across a spherical entangling surface in the limit q goes to 0 is proportional to a linear combination of central charges, 3c-2a. This is equivalent to a similar statement about the free energy of SCFTs on conic space or hyperbolic space S^1_q*H^3 in the corresponding limit. We first derive the asymptotic formula by the free field computation in the presence of a U(1) R-symmetry background and then provide an independent derivation by studying N=1 theories on a primary Hopf surface S^1_β*S^3_b with a particular scaling β~1/\sqrt{q} and b=\sqrt{q}, which thus confirms the validity of the formula for general interacting N=1 SCFTs. Finally we revisit the supersymmetric Rényi entropy of general N=2 SCFTs and find a simple formula for it in terms of central charges a and c.

preprint2014arXiv

Ghost-in-the-Wireless: Energy Depletion Attack on ZigBee

ZigBee has recently been drawing a lot of attention as a promising solution for ubiquitous computing. ZigBee devices are normally resource-limited, making the network susceptible to a variety of security threats. This paper presents a severe attack on ZigBee networks, termed ghost, which leverages the underlying vulnerabilities of the IEEE 802.15.4 security suites to deplete the energy of the devices. We show that the impact of ghost is severe, as it can reduce the lifetime of devices from years to days and facilitate a variety of threats including denial of service and replay attacks. We highlight that merely deploying a standard suite of advanced security techniques does not necessarily guarantee improved security, but instead might be leveraged by adversaries to cause severe disruption in the network. We propose several recommendations on how to localize and withstand the ghost and other related attacks in ZigBee networks. Extensive simulations are provided to show the impact of the ghost and the performance of the proposed recommendations. Moreover, physical experiments have also been conducted, and the observations confirm the severity of the impact of the ghost attack. We believe that the presented work will help researchers further improve the security of ZigBee.

preprint2014arXiv

N = 4 Super-Yang-Mills on Conic Space as Hologram of STU Topological Black Hole

We construct four-dimensional N=4 super-Yang-Mills theories on a conic sphere with various background R-symmetry gauge fields. We study the free energy and supersymmetric Renyi entropy using the heat kernel method as well as the localization technique. We find that the universal contribution to the partition function in the free field limit is the same as that in the strong coupling limit, which implies that it may be protected by supersymmetry. Based on the fact that the conic sphere can be conformally mapped to $S^1\times H^3$ and that the R-symmetry background fields can be supported by the R-charges of the black hole, we propose that the holographic duals of these theories are five-dimensional, supersymmetric STU topological black holes. We demonstrate perfect agreement between N=4 super-Yang-Mills theories in the planar limit and the STU topological black holes.

preprint2014arXiv

Three-Dimensional Superconformal Field Theory on Conic Space as Hologram of Charged Topological Black Hole

We construct three-dimensional N=2 supersymmetric conformal field theories on conic spaces. Building upon the fact that the partition function depends solely on the Reeb vector of the Killing vector, we propose that the holographic duals of these theories are four-dimensional, supersymmetric charged topological black holes. With the supersymmetry localization technique, we study conserved supercharges, free energy, and Renyi entropy. In the planar large N limit, we demonstrate perfect agreement between the superconformal field theories and the supersymmetric charged topological black holes.

preprint2013arXiv

Mechanism of Polarization Fatigue in BiFeO3: the Role of Schottky Barrier

By using piezoelectric force microscopy and scanning Kelvin probe microscopy, we have investigated the domain evolution and space charge distribution in planar BiFeO3 capacitors with different electrodes. It is observed that charge injection at the film/electrode interface leads to domain pinning and polarization fatigue in BiFeO3. Furthermore, the Schottky barrier at the interface is crucial for the charge injection process. Lowering the Schottky barrier by using low work function metals as the electrodes can also improve the fatigue property of the device, similar to what oxide electrodes can achieve.

preprint2013arXiv

Self-energy of Strongly Interacting Fermions in Medium: a Holographic Approach

We consider the self-energy of strongly interacting fermions in the medium using the gauge/gravity duality of the $D4/D8$ system. We study the mass generation of the thermal and/or dense medium and the collective excitation called the plasmino by considering the spectral function of the fermion and its dispersion relation. Our results are very different from those of the hard thermal loop method: for zero density, there is no thermal mass or plasmino in any phase. A plasmino in the deconfined phase is not allowed in the $D4/D8$ setup. In the confined phase, there are plasmino modes only for a window of density.

preprint2013arXiv

Thermal Mass and Plasmino for Strongly Interacting Fermions

We investigate the fermion self-energy problem in a strongly coupled dense medium using a holographic approach. By working out bottom-up models as well as top-down ones, we show that vanishing thermal mass and the non-existence of a temperature-generated plasmino mode are universal features of the strongly interacting fermion system. We identify the dual of the bulk Rashba effect, which was recently found by Herzog et al., as the presence of the plasmino mode generated by the density.

preprint2012arXiv

Holographic RG Flow and Sound Modes of sQGP

We consider the hydrodynamics of strongly interacting quark-gluon plasma at finite temperature and density using the holographic duality of a charged black hole in anti-de Sitter space. We calculate the transport coefficients at arbitrary energy scale by considering the holographic screen at finite radial position. We first calculate the flow of the sound velocity with this method and check its consistency with previous results. Then we calculate the diffusion constant of charge and find that the Einstein relation between susceptibility, conductivity and diffusion constant holds at an arbitrary slice.

preprint2012arXiv

Multiple Kernel Learning from Noisy Labels by Stochastic Programming

We study the problem of multiple kernel learning from noisy labels. This is in contrast to most of the previous studies on multiple kernel learning that mainly focus on developing efficient algorithms and assume perfectly labeled training examples. Directly applying the existing multiple kernel learning algorithms to noisily labeled examples often leads to suboptimal performance due to the incorrect class assignments. We address this challenge by casting multiple kernel learning from noisy labels into a stochastic programming problem, and presenting a minimax formulation. We develop an efficient algorithm for solving the related convex-concave optimization problem with a fast convergence rate of $O(1/T)$ where $T$ is the number of iterations. Empirical studies on UCI data sets verify both the effectiveness of the proposed framework and the efficiency of the proposed optimization algorithm.

preprint2012arXiv

The Impact of Visual Appearance on User Response in Online Display Advertising

Display advertising has been a significant source of revenue for publishers and ad networks in the online advertising ecosystem. One of the main goals in display advertising is to maximize user response rate for advertising campaigns, such as click-through rates (CTR) or conversion rates. Although in the online advertising industry we believe that the visual appearance of ads (creatives) matters for propensity of user response, there is no published work so far that addresses this topic via a systematic data-driven approach. In this paper we quantitatively study the relationship between the visual appearance and performance of creatives using large-scale data in the world's largest display ads exchange system, RightMedia. We designed a set of 43 visual features, some of which are novel and some of which are inspired by related work. We extracted these features from real creatives served on RightMedia. We also designed and conducted a series of experiments to evaluate the effectiveness of visual features for CTR prediction, ranking and performance classification. Based on the evaluation results, we selected a subset of features that have the most important impact on CTR. We believe that the findings presented in this paper will be very useful for the online advertising industry in designing high-performance creatives. The paper also provides the research community with the first ever data set, initial insights into visual appearance's effect on user response propensity, and evaluation benchmarks for further study.

preprint2011arXiv

Holographic Superconductor for a Lifshitz fixed point

We consider the gravity dual of a strongly coupled system at a Lifshitz fixed point and finite temperature, which was constructed in the recent work arXiv:0909.0263. We construct an Abelian Higgs model in that background and calculate condensation and conductivity using holographic techniques. We find that condensation occurs and the DC conductivity diverges when the temperature drops below a critical value.

preprint2011arXiv

Holographic Wilsonian RG Flow and Sliding Membrane Paradigm

We study the relations between two different approaches to the holographic Renormalization Group (RG) flow at the dual gravity level: one is the radial evolution of the classical equation of motion, and the other is the flow equation given by the holographic Wilsonian RG coming from cutoff independence. At first sight, the two flows look different. We give general proofs that the two flows are actually equivalent. The role of the momentum continuity (MC) is essential. We show that MC together with cutoff independence gives the evolution equation of the boundary values. The equivalence of conductivity flows in the two paradigms is shown as an explicit example. We also obtain a formula connecting Green functions and AC conductivity at an arbitrary slice to their values at the horizon for various geometry backgrounds.

preprint2011arXiv

Mixed RG Flows and Hydrodynamics at Finite Holographic Screen

We consider quark-gluon plasma with chemical potential and study renormalization group flows of transport coefficients in the framework of gauge/gravity duality. We first study them using the flow equations and compare the results with hydrodynamic results by calculating the Green functions on an arbitrary slice. The two results match exactly. Transport coefficients at arbitrary scale are obtained by calculating hydrodynamic Green functions. When either momentum or charge vanishes, transport coefficients decouple from each other.

preprint2011arXiv

Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey

Probabilistic graphical models combine graph theory and probability theory to provide a framework for multivariate statistical modeling. They provide a unified description of uncertainty, using probability, and of complexity, using the graphical model. In particular, graphical models have several useful properties:
- Graphical models provide a simple and intuitive interpretation of the structures of probabilistic models. They can also be used to design and motivate new models.
- Graphical models provide additional insights into the properties of the model, including the conditional independence properties.
- Complex computations which are required to perform inference and learning in sophisticated models can be expressed in terms of graphical manipulations, in which the underlying mathematical expressions are carried along implicitly.
Graphical models have been applied to a large number of fields, including bioinformatics, social science, control theory, image processing, and marketing analysis, among others. However, structure learning for graphical models remains an open challenge, since one must cope with a combinatorial search over the space of all possible structures. In this paper, we present a comprehensive survey of the existing structure learning algorithms.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2412.15206:author:11:yang-zhou

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.09188:author:1:yang-zhou

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.17316:author:6:yang-zhou

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.14524:author:1:yang-zhou

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2604.27383:author:1:yang-zhou