Source author record

Yang Zhang

Yang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

hep-ph hep-th Computer Vision Machine Learning cond-mat.mtrl-sci hep-ex Artificial Intelligence Computation and Language Cryptography and Security gr-qc astro-ph.CO cond-mat.str-el quant-ph cond-mat.mes-hall eess.IV Social and Information Networks eess.AS physics.flu-dyn Sound Information Retrieval astro-ph math.AP cond-mat.stat-mech math-ph math.MP math.RA physics.comp-ph cond-mat.soft cond-mat.supr-con Distributed, Parallel, and Cluster Computing eess.SP Networking and Internet Architecture physics.app-ph physics.optics astro-ph.HE cond-mat.other Digital Libraries math.AG physics.ins-det physics.soc-ph Robotics astro-ph.GA astro-ph.IM Computer Science and Game Theory cond-mat.dis-nn eess.SY Hardware Architecture Human-Computer Interaction Information Theory math.IT math.NT math.OC math.QA math.RT Molecular Networks nlin.SI physics.acc-ph physics.ao-ph physics.chem-ph physics.geo-ph physics.med-ph Systems and Control

Catalog footprint

What is connected

300 works

62 topics

4 close collaborators

preprint · 2026 · arXiv

A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation

Nasotracheal intubation (NTI) is a critical clinical procedure for establishing and maintaining patient airway patency. Machine-assisted NTI has emerged as a pivotal approach for optimizing procedural efficiency and minimizing manual intervention. However, visual detection algorithms employed for NTI navigation encounter significant challenges, including complex anatomical environments and suboptimal illumination conditions surrounding the glottis. Additionally, the glottis presents considerable scale variability throughout the procedure, initially appearing as a small, difficult-to-capture structure before expanding to occupy nearly the entire field of view. Moreover, traditional visual detection methods often have high computational costs, making real-time, high-precision detection on portable devices challenging. To enhance NTI efficacy and address these challenges, this paper proposes a novel glottis segmentation framework optimized for vision-assisted NTI applications. First, we designed a lightweight, multi-receptive-field feature extraction module to reduce intra-class differences, achieving robustness to scale variations of the glottis. This module was then stacked to form the backbone and neck of our network. Subsequently, we developed an advanced label assignment method and redefined the number of samples to further reduce intra-class differences and enhance accuracy in the complex NTI environment. Experiments on three distinct datasets demonstrate that our network surpasses state-of-the-art algorithms, achieving a segmentation mDice of 92.9% with a compact model size of 19 MB and an inference speed exceeding 170 frames per second. Our code and datasets are available at https://github.com/HBUT-CV/GlottisNet.
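The headline metric above is mDice. As a rough sketch, assuming mDice means the Dice coefficient 2|P∩T| / (|P| + |T|) averaged over images (the abstract does not say whether the mean is over images or over classes), it can be computed from flat binary masks as follows; the function names are hypothetical.

```python
from typing import Sequence

def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice coefficient 2|P∩T| / (|P| + |T|) for two flat 0/1 masks.
    `eps` keeps the score defined (and equal to 1.0) when both masks are empty."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

def mean_dice(preds: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]) -> float:
    """Average per-image Dice over a dataset (one common reading of 'mDice')."""
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

For example, a prediction that marks two pixels when only one of them is glottis scores 2·1 / (2 + 1) ≈ 0.667.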

preprint · 2026 · arXiv

Automorphisms of odd dimensional $(2,2)$-complete intersections in characteristic $2$

We compute the automorphism scheme of a generic odd dimensional $(2,2)$-complete intersection in characteristic $2$. Apart from quadric hypersurfaces and genus $1$ curves, this is the only case in which a complete intersection has a non-trivial identity component in its automorphism scheme.

preprint · 2026 · arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.
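RL on verifiable tasks needs a reward that a program can check without human labels. A minimal sketch of such a rule-based reward, assuming (this is not stated in the abstract) that completions wrap their reasoning in `<think>` tags and their final answer in `<answer>` tags, and that answers are compared by exact string match; the names and the format weight are hypothetical.

```python
import re

# Assumed output layout: <think>...</think><answer>...</answer>
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FORMAT_RE = re.compile(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL)

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted final answer exactly matches the reference, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

def format_reward(completion: str) -> float:
    """1.0 if the whole completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.fullmatch(completion) else 0.0

def total_reward(completion: str, reference: str, format_weight: float = 0.5) -> float:
    # Hypothetical weighting: correctness dominates, format adds a smaller bonus.
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)
```

Exact match is brittle for mathematical answers; a practical verifier would normalize expressions, or run test cases for coding tasks.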

preprint · 2026 · arXiv

Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

Personalization in Large Language Models (LLMs) often relies on user-specific soft prompts. However, these prompts become obsolete when the foundation model is upgraded, necessitating costly, full-scale retraining. To overcome this limitation, we propose the Prompt-level User Migration Adapter (PUMA), a lightweight framework to efficiently migrate personalized prompts across incompatible models. PUMA utilizes a parameter-efficient adapter to bridge the semantic gap, combined with a group-based user selection strategy to significantly reduce training costs. Experiments on three large-scale datasets show our method matches or even surpasses the performance of retraining from scratch, reducing computational cost by up to 98%. The framework demonstrates strong generalization across diverse model architectures and robustness in advanced scenarios like chained and aggregated migrations, offering a practical path for the sustainable evolution of personalized AI by decoupling user assets from the underlying models.

preprint · 2026 · arXiv

FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting

Deep Research (DR) Agents powered by advanced Large Language Models (LLMs) have fundamentally shifted the paradigm for completing complex research tasks. Yet, a comprehensive and live evaluation of their forecasting performance on real-world, research-oriented tasks in high-stakes domains (e.g., finance) remains underexplored. We introduce FinDeepForecast, the first live, end-to-end multi-agent system for automatically evaluating DR agents by continuously generating research-oriented financial forecasting tasks. This system is equipped with a dual-track taxonomy, enabling the dynamic generation of recurrent and non-recurrent forecasting tasks at both corporate and macro levels. With this system, we generate FinDeepForecastBench, a weekly evaluation benchmark over a ten-week horizon, encompassing 8 global economies and 1,314 listed companies, and evaluate 13 representative methods. Extensive experiments show that, while DR agents consistently outperform strong baselines, their performance still falls short of genuine forward-looking financial reasoning. We expect the proposed FinDeepForecast system to consistently facilitate future advancements of DR agents in research-oriented financial forecasting tasks. The benchmark and leaderboard are publicly available on the OpenFinArena Platform.

preprint · 2026 · arXiv

FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis

Deep Research (DR) agents, powered by advanced Large Language Models (LLMs), have recently garnered increasing attention for their capability in conducting complex research tasks. However, existing literature lacks a rigorous and systematic evaluation of DR agents' capabilities in critical research analysis. To address this gap, we first propose HisRubric, a novel evaluation framework with a hierarchical analytical structure and a fine-grained grading rubric for rigorously assessing DR agents' capabilities in corporate financial analysis. This framework mirrors the professional analyst's workflow, progressing from data recognition to metric calculation, and finally to strategic summarization and interpretation. Built on this framework, we construct the FinDeepResearch benchmark, which comprises 64 listed companies from 8 financial markets across 4 languages, encompassing a total of 15,808 grading items. We further conduct extensive experiments on FinDeepResearch using 16 representative methods, including 6 DR agents, 5 LLMs equipped with both deep reasoning and search capabilities, and 5 LLMs with deep reasoning capabilities only. The results reveal the strengths and limitations of these approaches across diverse capabilities, financial markets, and languages, offering valuable insights for future research and development. The benchmark and evaluation code are publicly available at https://OpenFinArena.com/.

preprint · 2026 · arXiv

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
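To make compile-time design space exploration concrete, here is a small sketch that enumerates every combination of per-sub-agent configurations, composes the profiled numbers into workflow-level estimates, and keeps the Pareto-optimal accuracy-latency set. The profile numbers are invented, and the proxy (accuracies multiply and latencies add along a sequential chain) is an assumption standing in for FlowCompile's structure-aware proxy.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: (config name, accuracy, latency in seconds).
PROFILES = {
    "planner": [("small", 0.90, 0.4), ("large", 0.97, 2.5)],
    "solver": [("small", 0.80, 0.6), ("large", 0.92, 2.0)],
}

def compose(choices):
    """Sequential-pipeline proxy (assumption): accuracies multiply, latencies add."""
    acc, lat = 1.0, 0.0
    for _, a, l in choices:
        acc *= a
        lat += l
    return acc, lat

def pareto_set(profiles):
    """Enumerate all workflow configurations and keep the non-dominated ones,
    sorted by estimated latency."""
    names = list(profiles)
    candidates = []
    for choices in product(*(profiles[n] for n in names)):
        acc, lat = compose(choices)
        config = {n: c[0] for n, c in zip(names, choices)}
        candidates.append((config, acc, lat))
    front = []
    for cfg, acc, lat in candidates:
        # Dominated if another candidate is at least as good on both axes and better on one.
        dominated = any(
            a2 >= acc and l2 <= lat and (a2 > acc or l2 < lat)
            for _, a2, l2 in candidates
        )
        if not dominated:
            front.append((cfg, acc, lat))
    return sorted(front, key=lambda item: item[2])
```

Here the large-planner/small-solver combination drops out because small-planner/large-solver is both faster and more accurate. Exhaustive enumeration grows exponentially with the number of sub-agents, which is why a cheap proxy matters; the sketch only shows the shape of the result, a reusable trade-off set ordered by latency.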

preprint · 2026 · arXiv

GIFT: Games as Informal Training for Generalizable LLMs

While Large Language Models (LLMs) have achieved remarkable success in formal learning tasks such as mathematics and code generation, they still struggle to exhibit the "practical wisdom" and generalizable intelligence, such as strategic creativity and social reasoning, that characterize human cognition. This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction. In this paper, we propose treating Games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity to cultivate diverse competencies. To address the performance degradation observed in multi-task learning, we introduce a Nested Training Framework. Unlike naive task mixing, which optimizes an implicit "OR" objective, our framework employs sequential task composition to enforce an explicit "AND" objective, compelling the model to master multiple abilities simultaneously to achieve maximal rewards. Using GRPO-based reinforcement learning across Matrix Games, TicTacToe, and Who's the Spy games, we demonstrate that integrating game-based informal learning not only prevents task interference but also significantly bolsters the model's generalization across broad ability-oriented benchmarks. The framework and implementation are publicly available.
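The OR/AND distinction can be stated in a few lines. A toy sketch, assuming binary win/loss outcomes per game; `or_reward`, `and_reward`, and `nested_episode` are hypothetical names, and the games are stand-in callables rather than the actual Matrix Games, TicTacToe, or Who's the Spy environments.

```python
from typing import Callable, Sequence

def or_reward(successes: Sequence[bool]) -> float:
    """Implicit objective attributed to naive mixing: solving any one task earns reward."""
    return 1.0 if any(successes) else 0.0

def and_reward(successes: Sequence[bool]) -> float:
    """Explicit conjunctive objective: reward only when every task is solved."""
    return 1.0 if all(successes) else 0.0

def nested_episode(games: Sequence[Callable[[str], bool]], policy: str) -> float:
    """Sequential composition: the next game is reached only after winning the
    previous one, so the episode reward enforces the AND objective."""
    for play in games:
        if not play(policy):
            return 0.0
    return 1.0
```

Under the nested episode, a policy that wins only the first game receives the same reward as one that wins nothing, so the only way to maximize reward is to master every game in the chain.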

preprint · 2026 · arXiv

GPS-Synchronized Monitoring of Core-collapse Supernova Bursts with PandaX-4T via Coherent Elastic Neutrino Nuclear Scattering

The landmark detection of neutrinos from SN1987A marked the dawn of neutrino astrophysics. The neutrino burst provided essential insights into fundamental properties of neutrinos and served as a key probe of stellar evolution and supernova dynamics. Recent advances in coherent elastic neutrino-nucleus scattering enable the detection of core-collapse supernova burst neutrinos using tonne-scale liquid xenon detectors originally designed for dark matter direct detection. Leveraging this capability, we developed and deployed an online supernova monitoring system for the PandaX-4T experiment. This system features a GPS module with millisecond-level timing precision, a low false-alarm rate, and high sensitivity to galactic core-collapse supernova explosion events. The methodology is robust, directly scalable, and planned for implementation in the next-generation PandaX-20T experiment.
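A common way to build a low-false-alarm burst trigger is to count events in a sliding time window and compare the count with a threshold chosen from the background rate. The sketch below assumes exactly that; the abstract does not describe PandaX-4T's actual trigger logic, and the window, threshold, and rates are placeholders. Timestamps stand for GPS-synchronized event times in seconds.

```python
import math
from bisect import bisect_left
from typing import List, Sequence

def burst_triggers(timestamps: Sequence[float], window: float, threshold: int) -> List[float]:
    """Return each event time t where at least `threshold` events fall in [t, t + window)."""
    ts = sorted(timestamps)
    hits = []
    for i, t in enumerate(ts):
        j = bisect_left(ts, t + window)  # index of the first event at or after t + window
        if j - i >= threshold:
            hits.append(t)
    return hits

def poisson_tail(mean: float, k: int) -> float:
    """P(N >= k) for N ~ Poisson(mean): the chance that background alone reaches k events."""
    return 1.0 - sum(math.exp(-mean) * mean**n / math.factorial(n) for n in range(k))
```

For instance, with an assumed background of 0.1 events per second and a 10 s window (mean 1 event), `poisson_tail(1.0, k)` is the probability that background alone fires one window at threshold k; raising k lowers the false-alarm rate at the cost of sensitivity.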

preprint · 2026 · arXiv

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Soft boundaries, like thin hairs, are commonly observed in natural and computer-generated imagery, but they remain challenging for 3D vision due to the ambiguous mixing of foreground and background cues. This paper introduces Guardians of the Hair (HairGuard), a framework designed to recover fine-grained soft boundary details in 3D vision tasks. Specifically, we first propose a novel data curation pipeline that leverages image matting datasets for training and design a depth fixer network to automatically identify soft boundary regions. With a gated residual module, the depth fixer refines depth precisely around soft boundaries while maintaining global depth quality, allowing plug-and-play integration with state-of-the-art depth models. For view synthesis, we perform depth-based forward warping to retain high-fidelity textures, followed by a generative scene painter that fills disoccluded regions and eliminates redundant background artifacts within soft boundaries. Finally, a color fuser adaptively combines warped and inpainted results to produce novel views with consistent geometry and fine-grained details. Extensive experiments demonstrate that HairGuard achieves state-of-the-art performance across monocular depth estimation, stereo image/video conversion, and novel view synthesis, with significant improvements in soft boundary regions.
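A gated residual update can be sketched per pixel as refined = depth + sigmoid(g) · residual: where the gate is near zero the base model's depth passes through unchanged, and where it is near one the correction is applied in full. This is a generic formulation assumed for illustration on flattened depth maps; the abstract does not give HairGuard's exact gating equation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gated_residual_refine(depth: Sequence[float], residual: Sequence[float],
                          gate_logits: Sequence[float]) -> List[float]:
    """Per-pixel refinement: depth + sigmoid(gate) * residual.
    Pixels with strongly negative gate logits (away from soft boundaries) keep
    the base depth; pixels with strongly positive logits take the full correction."""
    return [d + sigmoid(g) * r for d, r, g in zip(depth, residual, gate_logits)]
```

The gate is what makes plug-and-play integration plausible: it confines changes to the regions it opens, so global depth quality elsewhere is left to the base model.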

preprint · 2026 · arXiv

Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization

Generating feasible Pareto fronts for constrained bi-objective continuous optimization is central to multi-criteria decision-making. Existing methods usually rely on iterative scalarization, evolutionary search, or problem-specific solvers, requiring repeated optimization for each instance. We introduce DIPS, an end-to-end framework that fine-tunes large language models as amortized Pareto-front generators for constrained bi-objective convex optimization. Given a textual problem description, DIPS directly outputs an ordered set of feasible continuous decision vectors approximating the Pareto front. To make continuous optimization compatible with autoregressive language modeling, DIPS combines a compact discretization scheme, Numerically Grounded Token Initialization for new numerical tokens, and Three-Phase Curriculum Optimization, which progressively aligns structural validity, feasibility, and Pareto-front quality. Across five families of constrained bi-objective convex problems, a fine-tuned 7B-parameter model achieves normalized hypervolume ratios of 95.29% to 98.18% relative to reference fronts. With vLLM-accelerated inference, DIPS solves one instance in as little as 0.16 seconds and outperforms general-purpose and reasoning LLM baselines under the evaluated setting. These results suggest that LLMs can serve as effective amortized generators for continuous Pareto-front approximation.
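The reported metric is a hypervolume ratio. For two minimized objectives, the hypervolume is the area dominated by a front and bounded by a reference point, which a single sweep over the points sorted by the first objective computes. A minimal sketch, assuming minimization and a user-chosen reference point (the abstract does not state DIPS's exact normalization); the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def hypervolume_2d(front: Sequence[Point], ref: Point) -> float:
    """Area dominated by `front` (both objectives minimized) and bounded by `ref`."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # points with f2 >= best_f2 are dominated and add nothing
            # Horizontal strip between f2 and the previous best f2, extending to ref[0].
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

def hv_ratio(front: Sequence[Point], reference_front: Sequence[Point], ref: Point) -> float:
    """Hypervolume of an approximate front relative to a reference front."""
    return hypervolume_2d(front, ref) / hypervolume_2d(reference_front, ref)
```

With reference point (4, 4), the front {(1, 3), (2, 1)} dominates an area of 7, while the single point (2, 1) dominates 6, giving a ratio of 6/7 ≈ 0.857.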

preprint · 2026 · arXiv

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation

Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across heterogeneous domains and document lengths. We conduct a comprehensive meta-evaluation of 14 automatic summarization metrics and LLM-based evaluators across seven datasets spanning five domains, covering documents from short news articles to long scientific, governmental, and legal texts (2K-27K words) with over 1,500 human-annotated summaries. Our results show that traditional lexical overlap metrics (e.g., ROUGE, BLEU) exhibit weak or negative correlation with human judgments, while task-specific neural metrics and LLM-based evaluators achieve substantially higher alignment, especially for linguistic quality assessment. Leveraging these findings, we propose LLM-ReSum, a self-reflective summarization framework that integrates LLM-based evaluation and generation in a closed feedback loop without model finetuning. Across three domains, LLM-ReSum improves low-quality summaries by up to 33% in factual accuracy and 39% in coverage, with human evaluators preferring refined summaries in 89% of cases. We additionally introduce PatentSumEval, a new human-annotated benchmark for legal document summarization comprising 180 expert-evaluated summaries. All code and datasets will be released on GitHub.
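Meta-evaluation of this kind comes down to correlating metric scores with human scores over the same summaries. A stdlib sketch of Spearman's rank correlation with average ranks for ties; the abstract does not say which correlation coefficient was used, so Spearman is an assumption here (Kendall's tau and Pearson are also common choices).

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5

def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A value near zero or below, as the abstract reports for ROUGE and BLEU, means the metric's ordering of summaries barely tracks, or even inverts, the human ordering.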

preprint · 2026 · arXiv

MASH: A Multiplatform and Multimodal Annotated Dataset for Societal Impact of Hurricane

Natural disasters pose multidimensional threats to human societies, with hurricanes exemplifying one of the most disruptive events that not only cause severe physical damage but also spark widespread discussion on social media platforms. Existing datasets for studying the societal impacts of hurricanes often focus on older hurricanes and are limited to a single social media platform, failing to capture the broader societal impact in today's diverse social media environment. Moreover, existing datasets annotate the visual and textual content of a post separately, failing to account for the multimodal nature of social media posts. To address these gaps, we present a Multiplatform and Multimodal Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 59,607 relevant social media posts from Reddit, TikTok, and YouTube. All relevant posts are annotated using a multimodal approach that considers both textual and visual content along three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes. To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated dataset centered on hurricane disasters. In addition, we introduce an online platform that supports interactive data exploration, provides preliminary analytical results, and allows users to share their insights regarding the societal impacts of hurricanes. We envision that MASH can contribute to the study of hurricanes' impact on society, such as disaster response, disaster severity classification, public sentiment analysis, disaster policy making, and bias identification. The dataset is publicly available at https://huggingface.co/datasets/YRC10/MASH under the Creative Commons Attribution 4.0 (CC BY 4.0) license.

preprint · 2026 · arXiv

Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

The preservation and interpretation of cultural heritage increasingly rely on digital technologies, among which Knowledge Graphs (KGs) stand out for their ability to structure vast amounts of data. However, the construction and expansion of these KGs often face challenges due to the diverse and complex nature of cultural heritage information. In this paper, we propose a novel approach for extending KG resources in the domain of cultural heritage, which we applied to French data. First, we introduce a new knowledge graph in the domain of French cultural heritage, WJoconde, which is distinguished by its multimodality, as it integrates both textual and image information about the entities. We further introduce three variants of WJoconde to facilitate downstream research, such as Knowledge Graph Completion (KGC), and we built a comprehensive benchmark for KGC methods on our dataset. Second, we propose a new framework for extending cultural heritage KGs using multimodal approaches that leverage Large Language Models (LLMs) and Vision-Language Models (VLMs); it combines automated data extraction from unstructured resources with a dedicated validation pipeline for grounding the output of both models, to further extend WJoconde. Our results show that by integrating the rich text and image information in cultural heritage data, we can efficiently enhance KGs with high reliability. We open-source all code and benchmark datasets with text and images, as well as the original data with an interactive access point.
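Knowledge Graph Completion benchmarks are usually reported with rank-based metrics: for each test triple the model scores every candidate entity, and the rank of the true entity feeds Mean Reciprocal Rank (MRR) and Hits@k. A minimal sketch, assuming these standard metrics (the abstract does not list which ones the WJoconde benchmark reports) and a pessimistic tie policy; the function names are hypothetical.

```python
from typing import Dict, Iterable, Sequence

def rank_of_true(scores: Sequence[float], true_idx: int) -> int:
    """1-based rank of the true candidate (higher score is better).
    Ties are counted against the true candidate, a pessimistic convention."""
    s = scores[true_idx]
    return 1 + sum(1 for i, v in enumerate(scores) if i != true_idx and v >= s)

def ranking_metrics(ranks: Sequence[int], ks: Iterable[int] = (1, 3, 10)) -> Dict[str, float]:
    """MRR and Hits@k from the 1-based rank of the true entity for each test triple."""
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics
```

In the common filtered setting, other known true triples are removed from the candidate list before the rank is computed, so a model is not penalized for ranking another correct answer first.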

preprint · 2026 · arXiv

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-context budgets: storing or revisiting raw trajectories is prohibitively token-expensive, while summarization and text-only retrieval trade token savings for information loss and fragmented evidence. To address this limitation, we propose Optical Context Retrieval Memory (OCR-Memory), a memory framework that leverages the visual modality as a high-density representation of agent experience, enabling retention of arbitrarily long histories with minimal prompt overhead at retrieval time. Specifically, OCR-Memory renders historical trajectories into images annotated with unique visual identifiers. OCR-Memory retrieves stored experience via a locate-and-transcribe paradigm that selects relevant regions through visual anchors and retrieves the corresponding verbatim text, avoiding free-form generation and reducing hallucination. Experiments on long-horizon agent benchmarks show consistent gains under strict context limits, demonstrating that optical encoding increases effective memory capacity while preserving faithful evidence recovery.
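The key property of locate-and-transcribe is that retrieval returns stored text by identifier instead of asking a model to regenerate it. A text-only toy sketch of that contract, assuming word overlap as a stand-in for the visual-anchor selection step (the real system renders trajectories to images and locates regions visually); `AnchoredMemory` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AnchoredMemory:
    """Toy store: each trajectory segment gets a unique identifier such as "M0003"."""
    segments: Dict[str, str] = field(default_factory=dict)

    def add(self, text: str) -> str:
        anchor = f"M{len(self.segments):04d}"
        self.segments[anchor] = text
        return anchor

    def locate(self, query: str, k: int = 1) -> List[str]:
        """Stand-in for the model selecting anchors: rank segments by word overlap
        with the query (assumption); ties keep insertion order."""
        q = set(query.lower().split())
        return sorted(
            self.segments,
            key=lambda a: -len(q & set(self.segments[a].lower().split())),
        )[:k]

    def transcribe(self, anchors: List[str]) -> List[str]:
        """Return the stored verbatim text for each anchor; nothing is paraphrased."""
        return [self.segments[a] for a in anchors]
```

Because `transcribe` only copies what was stored, any error can come from choosing the wrong anchor, never from inventing content, which is the hallucination-reduction argument in the abstract.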

preprint · 2026 · arXiv

PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models

Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. While Reinforcement Learning (RL) has been used to enhance the empathy of LLMs, existing reward models typically evaluate empathy from a single perspective, overlooking the inherently bidirectional nature of the empathic interaction between supporter and seeker as defined by Empathy Cycle theory. To address this limitation, we propose Psychology-grounded Empathetic Reward Modeling (PERM). PERM operationalizes empathy evaluation through a bidirectional decomposition: 1) the supporter perspective, assessing internal resonation and communicative expression; 2) the seeker perspective, evaluating emotional reception. Additionally, it incorporates a bystander perspective to monitor overall interaction quality. Extensive experiments on a widely used emotional intelligence benchmark and an industrial daily conversation dataset demonstrate that PERM outperforms state-of-the-art baselines by over 10%. Furthermore, a blinded user study reveals a 70% preference for our approach, highlighting its efficacy in generating more empathetic responses. Our code, dataset, and models are available at https://github.com/ZhengWwwq/PERM.

preprint · 2026 · arXiv

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Reinforcement Learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space where the inner product of state-action and goal embeddings approximates the log-discounted goal occupancy, the probability of reaching the language-specified goal from the current state-action, quantitatively assessing physical feasibility beyond static semantic matching. PRTS draws this dense goal-reachability supervision directly from offline trajectories without reward annotations, and folds it into the VLM backbone via a role-aware causal mask, incurring negligible overhead over vanilla behavior cloning. This paradigm endows the high-level reasoning system with intrinsic goal-reachability awareness, bridging semantic reasoning and temporal task progress, and further benefits goal-conditioned action prediction. Pretrained on 167B tokens of diverse manipulation and embodied-reasoning data, PRTS reaches state-of-the-art performance on LIBERO, LIBERO-Pro, LIBERO-Plus, SimplerEnv, and a real-world suite of 14 complex tasks, with particularly substantial gains on long-horizon, contact-rich, and zero-shot novel-instruction settings, confirming that injecting goal-reachability awareness significantly improves both execution success and long-horizon planning of general-purpose robotic foundation policies.
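Contrastive RL of this kind is typically trained with an InfoNCE-style loss: the inner product of a state-action embedding and a goal embedding acts as a logit, the goal actually reached later in the same trajectory is the positive, and the other goals in the batch serve as negatives. A stdlib sketch of that generic batch loss, which is an assumption for illustration and not PRTS's exact objective or its role-aware masking inside the VLM; the function names are hypothetical.

```python
import math
from typing import Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def infonce_loss(sa_embs: Sequence[Sequence[float]],
                 goal_embs: Sequence[Sequence[float]]) -> float:
    """Batch InfoNCE: row i pairs state-action i with goal i (its positive);
    every other goal in the batch is a negative. Returns the mean cross-entropy."""
    loss = 0.0
    for i, phi in enumerate(sa_embs):
        logits = [dot(phi, psi) for psi in goal_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(sa_embs)
```

At the optimum of this loss, the logit for a (state-action, goal) pair tracks the log-ratio of how likely that goal is to be reached versus a random goal, which is what lets the inner product serve as a reachability score.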

preprint · 2026 · arXiv

Quantization Commutes with Reduction of Chern-Simons Gauge Theory

We prove an infinite-dimensional version of "quantization commutes with reduction" in the framework of geometric quantization of Chern-Simons gauge theory, focusing on the genus one case. The proof is complex-analytic and relies on the Atiyah-Bott stack and the Chern-Simons line bundle.

preprint · 2026 · arXiv

Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes

Multimodal large language models (MLLMs) have demonstrated strong capabilities on vision-and-language tasks. However, recent findings reveal an imbalance in their reasoning capabilities across visual and textual modalities. Specifically, current MLLMs often over-rely on textual cues while under-attending to visual content, resulting in suboptimal performance on tasks that require genuine visual reasoning. We refer to this phenomenon as the modality gap, defined as the performance disparity between text-centric and vision-centric inputs. In this paper, we analyze the modality gap through the lens of training recipes. We first show that existing training recipes tend to amplify this gap. Then, we systematically explore strategies to bridge it from two complementary perspectives: data and loss design. Our findings provide insights into developing training recipes that mitigate the modality gap and promote more balanced multimodal reasoning. Our code is publicly available at https://github.com/UCSB-NLP-Chang/Bridging-Modality-Gap.
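Written as a formula, with accuracy as the performance measure and the sign convention both taken as assumptions (the abstract only says "performance disparity"), the modality gap on paired text-centric and vision-centric versions of the same tasks is:

```latex
\Delta_{\mathrm{modality}} = \mathrm{Acc}_{\mathrm{text}} - \mathrm{Acc}_{\mathrm{vision}}
```

Under this convention, a positive value means the model does better when the same information arrives as text, which matches the over-reliance on textual cues described in the abstract, and a training recipe that narrows the gap drives the value toward zero.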

preprint · 2026 · arXiv

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelligence. Safactory integrates three tightly coupled platforms: a Parallel Simulation Platform for trajectory generation, a Trustworthy Data Platform for trajectory storage and experience extraction, and an Autonomous Evolution Platform for asynchronous reinforcement learning and on-policy distillation. As far as we know, Safactory is the first framework to propose a unified evolutionary pipeline for next-generation trustworthy autonomous intelligence.

preprint · 2026 · arXiv

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling

Music generative artificial intelligence (AI) is rapidly expanding music content, necessitating automated song aesthetics evaluation. However, existing studies largely focus on speech, audio, or singing quality, leaving song aesthetics underexplored. Moreover, conventional approaches often predict a precise Mean Opinion Score (MOS) value directly, which struggles to capture the nuances of human perception in song aesthetics evaluation. This paper proposes a song-oriented aesthetics evaluation framework featuring two novel modules: 1) Multi-Stem Attention Fusion (MSAF) builds bidirectional cross-attention between mixture-vocal and mixture-accompaniment pairs, fusing them to capture complex musical features; 2) Hierarchical Granularity-Aware Interval Aggregation (HiGIA) learns multi-granularity score probability distributions, aggregates them into a score interval, and applies a regression within the interval to produce the final score. We evaluated the framework on two datasets of full-length songs, the SongEval dataset (AI-generated) and an internal aesthetics dataset (human-created), and compared it with two state-of-the-art (SOTA) models. Results show that the proposed method achieves stronger performance for multi-dimensional song aesthetics evaluation.
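One way to read "aggregate distributions into a score interval, then regress within it" is sketched below. Everything here is an assumption for illustration and not the paper's HiGIA module: two granularities of equal-width bins over a 1 to 5 MOS range, a fusion rule that weights each fine bin by its parent coarse bin, and an `offset` in [0, 1] standing in for the regression head's output.

```python
from typing import Sequence, Tuple

def aggregate_interval(coarse: Sequence[float], fine: Sequence[float],
                       lo: float = 1.0, hi: float = 5.0) -> Tuple[float, float]:
    """Pick a score interval from two granularities of bin probabilities.

    coarse: probabilities over len(coarse) equal-width bins covering [lo, hi].
    fine: probabilities over len(fine) equal-width bins; len(fine) must be a
    multiple of len(coarse). Each fine bin is weighted by its parent coarse bin
    (assumed fusion rule) and the bounds of the best fine bin are returned.
    """
    ratio = len(fine) // len(coarse)
    joint = [fine[j] * coarse[j // ratio] for j in range(len(fine))]
    best = max(range(len(fine)), key=joint.__getitem__)
    width = (hi - lo) / len(fine)
    return lo + best * width, lo + (best + 1) * width

def score_in_interval(interval: Tuple[float, float], offset: float) -> float:
    """Map a regression output `offset` (clipped to [0, 1]) into the chosen interval."""
    a, b = interval
    offset = min(max(offset, 0.0), 1.0)
    return a + offset * (b - a)
```

The design point this illustrates is that the classifier only has to commit to a range, and the regression only has to be accurate within that range, rather than predicting one exact MOS value directly.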

preprint · 2026 · arXiv

Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing

Robotic laser profiling is widely used for dimensional verification and surface inspection, yet measurement fidelity is often dominated by sensor configuration rather than robot motion. Industrial profilers expose multiple coupled parameters, including sampling frequency, measurement range, exposure time, receiver dynamic range, and illumination, that are still tuned by trial and error; mismatches can cause saturation, clipping, or missing returns that cannot be recovered downstream. We formulate the task of instruction-conditioned sensing parameter recommendation: given a pre-scan RGB observation and a natural-language inspection instruction, infer a discrete configuration over key parameters of a robot-mounted profiler. To benchmark this problem, we develop Instruct-Obs2Param, a real-world multimodal dataset linking inspection intents and multi-view pose and illumination variation across 16 objects to canonical parameter regimes. We then propose ScanHD, a hyperdimensional computing framework that binds instruction and observation into a task-aware code and performs parameter-wise associative reasoning with compact memories, matching discrete scanner regimes while yielding stable, interpretable, low-latency decisions. On Instruct-Obs2Param, ScanHD achieves 92.7% average exact accuracy and 98.1% average Win@1 accuracy across the five parameters, with strong cross-split generalization and low-latency inference suitable for deployment, outperforming rule-based heuristics, conventional multimodal models, and multimodal large language models. This work enables autonomous, instruction-conditioned sensing configuration from task intent and scene context, eliminating manual tuning and elevating sensor configuration from a static setting to an adaptive decision variable.
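Hyperdimensional computing represents everything as long random bipolar vectors: binding (element-wise multiplication) combines two codes into one that is dissimilar to both, bundling (element-wise majority) superimposes codes into a prototype, and recall is a nearest-prototype search by normalized dot product. A stdlib sketch of that generic machinery, assuming the instruction and the RGB observation have already been encoded as hypervectors (the abstract does not describe how ScanHD encodes them); the dimensionality and the regime labels are invented.

```python
import random
from typing import Dict, List, Sequence

D = 10_000  # hypervector dimensionality (a typical HDC choice; assumption)

def random_hv(rng: random.Random) -> List[int]:
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]

def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Element-wise product; self-inverse, so bind(bind(a, b), b) == a."""
    return [x * y for x, y in zip(a, b)]

def bundle(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Element-wise majority (sign of the sum); ties are broken toward +1."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized dot product in [-1, 1]; about 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / D

def classify(query: Sequence[int], prototypes: Dict[str, Sequence[int]]) -> str:
    """Associative recall: the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: similarity(query, prototypes[label]))
```

In a parameter-wise setup, one such prototype memory per scanner parameter would be built by bundling the task codes `bind(instruction, observation)` of training examples that share a regime, and a new task code is matched against each memory independently.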

preprint2026arXiv

UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis

With the recent surge of generative models, diffusion-based approaches have become mainstream for view synthesis tasks, either in an explicit depth-warp-inpaint or in an implicit end-to-end manner. Despite their success, both paradigms often suffer from noticeable quality degradation, e.g., blurred details and distorted structures, caused by pixel-to-latent compression and diffusion hallucination. In this paper, we investigate diffusion degradation from three key dimensions (i.e., spatial, temporal, and backbone-related) and propose UniFixer, a universal reference-guided framework that fixes diverse degradation artifacts via a coarse-to-fine strategy. Specifically, a reference pre-alignment module is first designed to perform coarse alignment between the reference view and the degraded novel view. A global structure anchoring mechanism then rectifies geometric distortions to ensure structural fidelity, followed by a local detail injection module that recovers fine-grained texture details for high-quality view synthesis. Our UniFixer serves as a plug-and-play refiner that achieves zero-shot fixing across different types of diffusion degradation, and extensive experiments demonstrate its state-of-the-art performance on novel view synthesis and stereo conversion.

preprint2026arXiv

WOW-Seg: A Word-free Open World Segmentation Model

Open world image segmentation aims to achieve precise segmentation and semantic understanding of targets within images by addressing the infinitely open set of object categories encountered in the real world. However, traditional closed-set segmentation approaches struggle to adapt to complex open world scenarios, while foundation segmentation models such as SAM exhibit notable discrepancies between their strong segmentation capabilities and relatively weaker semantic understanding. To bridge these discrepancies, we propose WOW-Seg, a Word-free Open World Segmentation model for segmenting and recognizing objects from open-set categories. Specifically, WOW-Seg introduces a novel visual prompt module, Mask2Token, which transforms image masks into visual tokens and ensures their alignment with the VLLM feature space. Moreover, we introduce the Cascade Attention Mask to decouple information across different instances. This approach mitigates inter-instance interference, leading to a significant improvement in model performance. We further construct an open world region recognition test benchmark, the Region Recognition Dataset (RR-7K). With 7,662 classes, it is the most category-rich region recognition dataset to date. WOW-Seg attains strong results on the LVIS dataset, achieving a semantic similarity of 89.7 and a semantic IoU of 82.4. This performance surpasses the previous SOTA while using only one-eighth of the parameter count. These results underscore the strong open world generalization capabilities of WOW-Seg. The code and related resources are available at https://github.com/AAwcAA/WOW-Seg-Meta.

preprint2025arXiv

Chiral superconductivity from spin polarized Chern band in twisted MoTe$_2$

Superconductivity has been observed in twisted MoTe2 within the anomalous Hall metal parent state. Key signatures, including a fully spin/valley polarized normal state, anomalous Hall resistivity hysteresis, a superconducting phase adjacent to the fractional Chern insulating state, and a narrow superconducting dome at zero gating field, collectively indicate chiral superconductivity driven by intravalley pairing of electrons. Within the Kohn-Luttinger mechanism, we compute the superconducting phase diagram via the random phase approximation, incorporating Coulomb repulsion in a realistic continuum model. Our results identify a dominant intravalley pairing with a narrow superconducting dome of $p+ip$ type at zero gating field. This chiral phase contrasts sharply with the much weaker time-reversal-symmetric intervalley pairing at finite gating field. Our work highlights the role of band topology in achieving robust topological superconductivity and supports the chiral and topological nature of the superconductivity observed in twisted MoTe2.

preprint2024arXiv

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge in sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller model sizes (<100B). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner into a medical expert (called AntGLM-Med-10B) using a 3-stage optimization procedure: general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially to a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question answering, medical reasoning, multiple-choice questions, and medical conversations. (3) Specifically for multiple-choice questions in the medical domain, we propose a novel Verification-of-Choice approach to prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model outperforms most LLMs on PubMedQA, including both general and medical LLMs, even those with larger model sizes.

preprint2024arXiv

Multiple Chern bands in twisted MoTe$_2$ and possible non-Abelian states

We investigate the moiré band structures and a possible even-denominator fractional quantum Hall state in small-angle twisted bilayer MoTe$_2$, combining large-scale local-basis density functional theory calculations with continuum-model exact diagonalization. Via large-scale first-principles calculations at $θ=1.89^{\circ}$, we find a sequence of $C=1$ (Chern number in the K valley) moiré Chern bands, in analogy to Landau levels. By constructing a continuum model with multiple Chern bands, we perform band-projected exact diagonalization using unscreened Coulomb repulsion to identify possible non-Abelian states near twist angle $θ=1.89^{\circ}$ at half filling of the second moiré band.

preprint2023arXiv

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, optimizing LLMs for recommendation purposes (referred to as LLM4Rec) plays a crucial role in enhancing their recommendation effectiveness. However, existing approaches for LLM4Rec often assess performance using restricted sets of candidates, which may not accurately reflect the models' overall ranking capabilities. In this paper, our objective is to investigate the comprehensive ranking capacity of LLMs, and we propose a two-step grounding framework known as BIGRec (Bi-step Grounding Paradigm for Recommendation). It first grounds LLMs to the recommendation space by fine-tuning them to generate meaningful tokens for items, and then identifies the actual items that correspond to the generated tokens. Through extensive experiments on two datasets, we substantiate the superior performance, capacity for handling few-shot scenarios, and versatility across multiple domains exhibited by BIGRec. Furthermore, we observe that the marginal benefits of increasing the quantity of training samples are modest for BIGRec, implying that LLMs have a limited capability to assimilate statistical information, such as popularity and collaborative filtering, due to their robust semantic priors. These findings also underline the efficacy of integrating diverse statistical information into the LLM4Rec framework, thereby pointing towards a potential avenue for future research. Our code and data are available at https://github.com/SAI990323/Grounding4Rec.
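
The second grounding step (mapping free-form generated tokens to actual catalog items) can be sketched as a nearest-neighbour lookup in an embedding space. This is a minimal sketch under stated assumptions: it presumes the generated text and every item have already been embedded into the same vector space by some encoder (not shown, hypothetical), and it ranks by L2 distance; the function names and toy vectors are illustrative, not BIGRec's released code.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ground_to_items(generated: Sequence[float],
                    item_embeddings: Dict[str, Sequence[float]],
                    k: int = 1) -> List[str]:
    """Rank catalog items by L2 distance to the embedding of the LLM's generated tokens
    and return the k closest actual items."""
    ranked = sorted(item_embeddings, key=lambda item: l2_distance(generated, item_embeddings[item]))
    return ranked[:k]
```

The point of this step is that the generated text need not match any catalog title exactly; ranking every item by distance also yields a full-catalog ranking rather than a choice among a restricted candidate set.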

preprint2023arXiv

Backdoor Attacks Against Dataset Distillation

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain performance comparable to a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation methods in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks operate. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

preprint2023arXiv

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image generation models. We then attribute these fake images to their source models, so that model owners can be held responsible for their models' misuse. We further investigate how the prompts that generate fake images affect detection and attribution. We conduct extensive experiments on four popular text-to-image generation models, DALL$\cdot$E 2, Stable Diffusion, GLIDE, and Latent Diffusion, and two benchmark prompt-image datasets. Empirical results show that (1) fake images generated by various models can be distinguished from real ones, as there exists a common artifact shared by fake images from different models; (2) fake images can be effectively attributed to their source models, as different models leave unique fingerprints in their generated images; (3) prompts on the ``person'' topic or with a length between 25 and 75 enable models to generate fake images with higher authenticity. All findings contribute to the community's understanding of the threats posed by text-to-image generation models. We call on the community to consider countermeasures, such as ours, against rapidly evolving fake image generation.

preprint2023arXiv

Implications of Nano-Hertz Gravitational Waves on Electroweak Phase Transition in the Singlet Dark Matter Model

Inspired by the recent evidence for nano-Hertz stochastic gravitational waves observed by the pulsar timing array collaborations, we explore the supercooled electroweak phase transition they imply in the singlet extension of the Standard Model. We find that by tuning the model parameter at the per-mille level, the corresponding percolation temperature can be continuously lowered to 1 GeV. With such a low percolation temperature, the singlet dark matter may freeze out before the electroweak phase transition, and consequently the entropy generated during the transition can significantly affect the dark matter relic density. This alleviates the tension between the requirement of a strong electroweak phase transition and the constraints imposed by dark matter direct detection, and can be tested in future experiments.

preprint2022arXiv

A first look at the function space for planar two-loop six-particle Feynman integrals

Two-loop corrections to scattering amplitudes are crucial theoretical input for collider physics. Recent years have seen tremendous advances in computing Feynman integrals, scattering amplitudes, and cross sections for five-particle processes. In this paper, we initiate the study of the function space for planar two-loop six-particle processes. We study all genuine six-particle Feynman integrals and derive the differential equations they satisfy on maximal cuts. Performing a leading singularity analysis in momentum space and in the Baikov representation, we find an integral basis that puts the differential equations into canonical form. The corresponding differential equation in the eight independent kinematic variables is derived with the finite-field reconstruction method, and the symbol letters are identified. We identify the dual conformally invariant hexagon alphabet known from maximally supersymmetric Yang-Mills theory as a subset of our alphabet. This paper constitutes an important step in the analytic calculation of planar two-loop six-particle Feynman integrals.
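
For orientation, the "canonical form" mentioned above is usually written as follows. This is the standard textbook notation, not copied from the paper: $\vec{f}$ is the vector of basis integrals, $\vec{x}$ the eight independent kinematic variables, $\epsilon$ the dimensional-regularization parameter (e.g. $D = 4 - 2\epsilon$), $A_i$ constant matrices of rational numbers, and $W_i(\vec{x})$ the symbol letters.

```latex
\mathrm{d}\,\vec{f}(\vec{x};\epsilon) = \epsilon\,\mathrm{d}\tilde{A}(\vec{x})\,\vec{f}(\vec{x};\epsilon),
\qquad
\tilde{A}(\vec{x}) = \sum_{i} A_i \log W_i(\vec{x}),
\\[4pt]
\vec{f}(\vec{x};\epsilon) = \sum_{k \ge 0} \epsilon^{k}\,\vec{f}^{(k)}(\vec{x}),
\qquad
\vec{f}^{(k)}(\vec{x}) = \int_{\gamma} \mathrm{d}\tilde{A}\;\vec{f}^{(k-1)} + \vec{f}^{(k)}\big|_{\text{boundary}} .
```

Because the $\epsilon$-dependence factorizes, each order is an iterated integral over $\mathrm{d}\log W_i$, which is why identifying the letters $W_i$ determines the alphabet of the function space.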

preprint2022arXiv

A joint explanation of W-mass and muon g-2 in 2HDM

Since both the $W$ mass and the muon $g-2$ can be affected by the mass splittings among the extra Higgs bosons $(H,~A,~H^\pm)$ in a 2HDM, we take a model with $μ$-$τ$ LFV interactions to examine the two anomalies reported by CDF II and FNAL, respectively. We obtain the following observations: (i) Combined with theoretical constraints, the CDF $W$-mass measurement disfavors $H$ or $A$ being degenerate in mass with $H^\pm$, but allows $H$ and $A$ to be degenerate. The mass splitting between $H^\pm$ and $H/A$ is required to be larger than 10 GeV. $m_{H^\pm}$ and $m_{A}$ are favored to be smaller than 650 GeV for $m_H<120$ GeV, and are allowed to take larger values as $m_H$ increases. (ii) After imposing other relevant experimental constraints, there are parameter spaces that simultaneously satisfy (at the $2σ$ level) the CDF $W$-mass, the FNAL muon $g-2$, and the lepton universality data in $τ$ decays, but the mass splittings among the extra Higgs bosons are strictly constrained.

preprint2022arXiv

A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains

A real-time vision-based system of fault detection (RVBS-FD) for freight trains is an essential part of ensuring railway transportation safety. Most existing vision-based methods, built on convolutional neural networks, still have high computational costs. The computational cost lies mainly in the backbone, the neck, and post-processing, i.e., non-maximum suppression (NMS). In this paper, we propose a lightweight NMS-free framework that achieves real-time detection and high accuracy simultaneously. First, we use a lightweight backbone for feature extraction and design a fault detection pyramid to process features. This fault detection pyramid includes three novel individual modules using an attention mechanism, a bottleneck, and dilated convolution for feature enhancement and computation reduction. Instead of using NMS, we calculate different loss functions, including classification and location costs, in the detection head to further reduce computation. Experimental results show that our framework runs at over 83 frames per second with a smaller model size and higher accuracy than state-of-the-art detectors. Meanwhile, the hardware resource requirements of our method are low during both training and testing.
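
One common way to remove NMS is to supervise the detector with a one-to-one assignment between predictions and ground-truth objects, chosen by a matching cost that combines a classification term and a location term (as in DETR-style detectors). The sketch below assumes that reading of the abstract; the cost weights, the L1 box term, and the `missing_bolt` fault label are hypothetical. The exhaustive search stands in for the Hungarian algorithm (for example SciPy's `linear_sum_assignment`) that would be used in practice.

```python
from itertools import permutations
from typing import Dict, List


def match_cost(pred: Dict, target: Dict, w_cls: float = 1.0, w_loc: float = 1.0) -> float:
    """Lower is better: reward confidence in the target's class, penalize L1 box distance.
    pred = {"box": (x1, y1, x2, y2), "probs": {label: p}}; target = {"box": ..., "label": ...}."""
    cls_cost = -pred["probs"].get(target["label"], 0.0)
    loc_cost = sum(abs(a - b) for a, b in zip(pred["box"], target["box"]))
    return w_cls * cls_cost + w_loc * loc_cost


def one_to_one_assign(preds: List[Dict], targets: List[Dict]) -> Dict[int, int]:
    """Exhaustively find the minimum-cost mapping target index -> prediction index.
    Each target gets exactly one prediction, so no duplicate boxes need suppressing.
    Assumes len(preds) >= len(targets); exponential time, for illustration only."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return {t: p for t, p in enumerate(best)}
```

Unmatched predictions are trained toward "background", so at inference the network itself learns to emit one box per object and no NMS pass is needed.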

preprint2022arXiv

Addressing Confounding Feature Issue for Causal Recommendation

In recommender systems, some features directly affect whether an interaction happens, so observed interactions do not necessarily indicate user preference. For instance, short videos are objectively easier to finish even if the user does not like them. We term such a feature a confounding feature; video length is a confounding feature in video recommendation. If we fit a model on such interaction data, as most data-driven recommender systems do, the model will be biased toward recommending short videos and deviate from users' actual needs. This work formulates and addresses the problem from a causal perspective. Assuming that some factors, e.g., the video creator, affect both the confounding feature and other item features, we find that the confounding feature opens a backdoor path behind user-item matching and introduces spurious correlation. To remove the effect of the backdoor path, we propose a framework named Deconfounding Causal Recommendation (DCR), which performs intervened inference with do-calculus. Nevertheless, evaluating the do-calculus requires summing the prediction over all possible values of the confounding feature, which significantly increases the time cost. To address this efficiency challenge, we further propose a mixture-of-experts (MoE) model architecture that models each value of the confounding feature with a separate expert module. In this way, we retain model expressiveness at little additional cost. We demonstrate DCR on the backbone model of the neural factorization machine (NFM), showing that DCR leads to more accurate prediction of user preference with a small inference time cost.
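
The "sum over all possible values of the confounding feature" is the backdoor adjustment, $P(y \mid do(u, i)) = \sum_c P(y \mid u, i, c)\,P(c)$. The sketch below shows that textbook form with a marginal prior estimated from data, plus a toy one-expert-per-value predictor; it is an illustration under these assumptions, and DCR's exact weighting and its batched MoE computation may differ. All names and numbers are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable

Predictor = Callable[[str, str, Hashable], float]


def estimate_prior(confounder_values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Empirical P(c) of the confounding feature (e.g., a video-length bucket) in the data."""
    counts = Counter(confounder_values)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def deconfounded_score(user: str, item: str, predict: Predictor,
                       prior: Dict[Hashable, float]) -> float:
    """Backdoor adjustment: sum_c P(y | u, i, c) * P(c), i.e. the prediction with the
    confounding feature intervened on and averaged out instead of read off the item."""
    return sum(predict(user, item, c) * p_c for c, p_c in prior.items())


def moe_predictor(experts: Dict[Hashable, Callable[[str, str], float]]) -> Predictor:
    """Mixture-of-experts view: one expert per confounder value, selected by c."""
    return lambda user, item, c: experts[c](user, item)
```

With a short-video expert that always predicts 0.9, a long-video expert that predicts 0.3, and a prior of 25% short / 75% long, the adjusted score is 0.9 × 0.25 + 0.3 × 0.75 = 0.45, regardless of the item's own length.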

preprint2022arXiv

Adversarial Support Alignment

We study the problem of aligning the supports of distributions. Compared with existing work on distribution alignment, support alignment does not require the densities to be matched. We propose symmetric support difference as a divergence measure to quantify the mismatch between supports. We show that select discriminators (e.g., a discriminator trained for the Jensen-Shannon divergence) are able to map support differences as support differences in their one-dimensional output space. Following this result, our method aligns supports by minimizing a symmetrized relaxed optimal transport cost in the discriminator 1D space via an adversarial process. Furthermore, we show that our approach can be viewed as a limit of existing notions of alignment by increasing transportation assignment tolerance. We quantitatively evaluate the method across domain adaptation tasks with shifts in label distributions. Our experiments show that the proposed method is more robust against these shifts than other alignment-based baselines.
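
A minimal sketch of a symmetric support discrepancy on one-dimensional samples (standing in for discriminator outputs) is shown below. It assumes the "relaxed" transport assigns each point to the nearest point of the other sample, which is one reading of the abstract rather than the paper's exact objective; the function names are illustrative.

```python
from bisect import bisect_left
from typing import List, Sequence


def distance_to_support(x: float, sorted_points: List[float]) -> float:
    """Distance from x to the nearest point of a sorted 1-D sample (its empirical support)."""
    i = bisect_left(sorted_points, x)
    candidates = []
    if i < len(sorted_points):
        candidates.append(abs(sorted_points[i] - x))
    if i > 0:
        candidates.append(abs(sorted_points[i - 1] - x))
    return min(candidates)


def symmetric_support_difference(p: Sequence[float], q: Sequence[float]) -> float:
    """Average distance from each p-sample to supp(q), plus the same from q to supp(p).
    Zero whenever the supports coincide, regardless of how the mass is distributed."""
    sp, sq = sorted(p), sorted(q)
    p_to_q = sum(distance_to_support(x, sq) for x in p) / len(p)
    q_to_p = sum(distance_to_support(y, sp) for y in q) / len(q)
    return p_to_q + q_to_p
```

For example, samples `[0.0, 1.0]` and `[0.0, 1.0, 1.0, 1.0]` have different densities but the same support, so the discrepancy is 0, whereas a full optimal-transport cost would be positive. This is the property that makes support alignment tolerant of label-distribution shift.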

preprint2022arXiv

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, e.g., reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with n strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool to address the complexity of this domain. To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using RICE-N. This framework underpins AI for Global Climate Cooperation, a working group collaboration and competition on climate negotiation and agreement design. Here, we invite the scientific community to design and evaluate their solutions using RICE-N, machine learning, economic intuition, and other domain knowledge. More information can be found on www.ai4climatecoop.org.

preprint2022arXiv

Algorithms of Real-Time Navigation and Control of Autonomous Unmanned Vehicles

The rapid development of robotics has benefited from the growing attention it receives. As the demand for robots that fulfill tasks in place of humans grows, how to better control robots is becoming a hot topic. For obstacle avoidance, we propose algorithms for both 2D planar environments and 3D space environments. The example cases we raise are ones that need to be addressed but have long been ignored. In addition, we work on trajectory planning for robots. The two scenarios we set are self-driving cars on the road and reconnaissance and surveillance by drones. Looking ahead, there are several possible directions. How to combine traditional navigation algorithms with more advanced algorithms so as to fulfill tasks well while keeping the computational cost low is a worthy topic. Another direction is extending the obstacle avoidance algorithms to more competitive situations. Moreover, cooperation among multiple robots deserves researchers' attention. All in all, there is still a long way to go in the development of navigation and control of mobile robots. Despite this, we believe we will not have to wait long to see a revolution in robotics.

preprint2022arXiv

An inverse boundary value problem arising in nonlinear acoustics

We consider an inverse problem arising in nonlinear ultrasound imaging. The propagation of ultrasound waves is modeled by a quasilinear wave equation. We make measurements at the boundary of the medium encoded in the Dirichlet-to-Neumann map, and we show that these measurements determine the nonlinearity.

preprint2022arXiv

Analytical Equation of Three-point Correlation Function of Galaxies: to Third Order of Density Perturbation

Applying functional differentiation to the density field with Newtonian gravity, we obtain the static, nonlinear equation of the three-point correlation function $ζ$ of galaxies up to third order in the density perturbation. We close the equation and perform renormalization of the mass and the Jeans wavenumber. Using the boundary condition inferred from observations, we obtain the third-order solution $ζ(r, u, θ)$ at fixed $u=2$, which is positive, exhibits a $U$-shape along the angle $θ$, and decreases monotonically along the radial coordinate $r$ over the range $r \leq 30\, h^{-1}$Mpc covered by our computation. The corresponding reduced $Q(r, u, θ)$ deviates from the Gaussian-case value of 1, has a deeper $U$-shape along $θ$, and varies non-monotonically along $r$. The third-order solution agrees with the SDSS galaxy data and is quite close to the previous second-order solution, especially at large scales. This indicates that the equations of correlation functions with increasing orders of density perturbation provide a stable description of the nonlinear galaxy system.
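
For reference, the reduced three-point function is conventionally obtained by normalizing $ζ$ with the hierarchical combination of two-point functions $ξ$. The triangle parameterization below (two sides $r$ and $u\,r$ with opening angle $θ$ between them) is the common convention and is assumed here rather than taken from the paper; the third side follows from the law of cosines.

```latex
Q(r, u, \theta) = \frac{\zeta(r_{12}, r_{23}, r_{31})}
{\xi(r_{12})\,\xi(r_{23}) + \xi(r_{23})\,\xi(r_{31}) + \xi(r_{31})\,\xi(r_{12})},
\qquad
r_{12} = r,\quad r_{31} = u\,r,\quad r_{23} = r\sqrt{1 + u^{2} - 2u\cos\theta}.
```

At fixed $u = 2$, one side is twice the other, and sweeping $θ$ from $0$ to $π$ moves from a collinear, squeezed configuration to an elongated one, which is what the $U$-shape along $θ$ refers to.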

preprint2022arXiv

Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

One challenging property lurking in medical datasets is imbalanced data distribution, where the frequency of samples differs across classes. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem, as the model becomes biased toward the most frequent class. Many methods have been proposed to tackle distributional differences and the imbalance problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of convolutional neural networks from multiple perspectives: the alignment of salient features with pathologies and the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and that emerge only when looking inside the models.
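
As a concrete example of a cost-sensitive approach, a widely used scheme weights each class by its inverse frequency and scales the per-sample cross-entropy loss accordingly. The sketch below is one such scheme under that assumption, not necessarily one of the specific methods the paper studies; the labels and helper names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w_c = N / (K * n_c): rare classes get proportionally larger weights,
    and a perfectly balanced dataset gets weight 1 for every class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs: Dict[Hashable, float], label: Hashable,
                           weights: Dict[Hashable, float]) -> float:
    """Cost-sensitive negative log-likelihood for a single sample."""
    return -weights[label] * math.log(probs[label])
```

With 8 normal and 2 pathology samples, the weights are 0.625 and 2.5, so a misclassified pathology costs four times as much as an equally misclassified normal case. The paper's question is what this reweighting does to the learned features, beyond its effect on AUROC and AP.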

preprint2022arXiv

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) have recently been used to enable each network entity, as an agent, to learn an optimal decision-making policy adaptively by interacting with unknown environments. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may result in the non-stationarity issue. Multi-agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we thus review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of applications of MARL in the next-generation Internet. Specifically, we first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.

preprint2022arXiv

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed as two-stream networks. In general, jointly reasoning over the color and geometric information from RGBD is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively utilize multimodal information in both the encoder and the decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and deeply fuse multi-level paired complementary information. To learn more robust deep representations and rich multimodal information, we introduce a dual-branch decoder to effectively leverage the correlations and complementary cues of different tasks. Extensive experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance against state-of-the-art methods.

preprint2022arXiv

Auditing Membership Leakages of Multi-Exit Networks

Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, multi-exit networks are gaining attention as a prominent approach for pushing the limits of efficient deployment. Multi-exit networks endow a backbone model with early exits, allowing predictions to be obtained at intermediate layers of the model and thus saving computation time and/or energy. However, current designs of multi-exit networks consider only the best trade-off between resource usage efficiency and prediction accuracy; the privacy risks stemming from them have never been explored. This prompts the need for a comprehensive investigation of privacy risks in multi-exit networks. In this paper, we perform the first privacy analysis of multi-exit networks through the lens of membership leakage. In particular, we first leverage existing attack methodologies to quantify the multi-exit networks' vulnerability to membership leakage. Our experimental results show that multi-exit networks are less vulnerable to membership leakage, and that the exits (number and depth) attached to the backbone model are highly correlated with attack performance. Furthermore, we propose a hybrid attack that exploits the exit information to improve the performance of existing attacks. We evaluate the membership leakage threat caused by our hybrid attack under three different adversarial setups, ultimately arriving at a model-free and data-free adversary. These results clearly demonstrate that our hybrid attacks are very broadly applicable, and thus the corresponding risks are much more severe than shown by existing membership inference attacks. We further present a defense mechanism called TimeGuard specifically for multi-exit networks and show that TimeGuard mitigates the newly proposed attacks perfectly.

preprint2022arXiv

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

Self-supervised learning (SSL) in speech involves training a speech representation network on a large-scale unannotated speech corpus and then applying the learned representations to downstream tasks. Since most downstream tasks of SSL in speech focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing speaker information can easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks and observe a consistent and notable performance advantage of our speaker-disentangled representations.

preprint2022arXiv

Crystal growth engineering and origin of the weak ferromagnetism in antiferromagnetic matrix of orthochromates from $t$-$e$ orbital hybridization

We report a combined experimental and theoretical study on the intriguing magnetic properties of quasiferroelectric orthochromates. Large single crystals of the RECrO$_3$ family (RE = Y, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu) were successfully grown. A neutron Laue study indicates good quality of the obtained single crystals. Magnetization measurements as a function of applied magnetic field and temperature reveal their intrinsic magnetic properties, especially the antiferromagnetic (AFM) transition temperatures. Density functional theory studies of the electronic structures were carried out using the Perdew-Burke-Ernzerhof functional plus the Hubbard $U$ method. Crystallographic information and magnetism were systematically optimized theoretically. When the RE$^{3+}$ cations vary from Y$^{3+}$ and Eu$^{3+}$ to Lu$^{3+}$, the calculated $t$-$e$ orbital hybridization degree and Néel temperature vary with cationic radius in the same way as the experimentally determined AFM transition temperature. We found that the $t$-$e$ hybridization is anisotropic, causing a magnetic anisotropy of the Cr$^{3+}$ sublattices. This was evaluated with the nearest-neighbour $J_1$-$J_2$ model. Our research provides a picture of the electronic structures during the $t$-$e$ hybridization process as the RE ions change and sheds light on the nature of the weak ferromagnetism coexisting with predominant antiferromagnetism. The available large RECrO$_3$ single crystals provide a platform for further studies of orthochromates.

preprint2022arXiv

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Pre-training serves as a broadly adopted starting point for transfer learning on various downstream tasks. Recent investigations of the lottery ticket hypothesis (LTH) demonstrate that such enormous pre-trained models can be replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without sacrificing transferability. However, practical security-critical applications usually pose more challenging requirements beyond standard transfer, which also demand that these subnetworks overcome adversarial vulnerability. In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a subnetwork located in a pre-trained model can be independently transferred to diverse downstream tasks and reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model. We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts. For example, on downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with the standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. Furthermore, we observe that the obtained double-win lottery tickets can be more data-efficient to transfer under practical data-limited (e.g., 1% and 10%) downstream schemes. Our results show that the benefits from robust pre-training are amplified by the lottery ticket scheme, as well as by the data-limited transfer setting. Code is available at https://github.com/VITA-Group/Double-Win-LTH.
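The sparsity levels quoted above match iterative magnitude pruning (IMP) that removes 20% of the surviving weights per round, the schedule commonly used in LTH work: after k rounds the sparsity is 1 - 0.8^k (k = 10 gives 89.26%). The abstract does not state the schedule, so treat this as an assumption. The sketch shows that arithmetic plus one round of magnitude pruning on a flat weight vector; it is not the authors' code.

```python
import numpy as np

def sparsity_after_rounds(k, rate=0.2):
    """Fraction of weights removed after k rounds, each pruning `rate` of the survivors."""
    return 1.0 - (1.0 - rate) ** k

def magnitude_prune(weights, mask, rate=0.2):
    """One IMP round: drop the smallest-magnitude `rate` fraction of surviving weights.

    weights: 1-D array; mask: boolean array of the same shape (True = kept).
    """
    alive = np.abs(weights[mask])
    n_prune = int(round(rate * alive.size))
    if n_prune == 0:
        return mask.copy()
    threshold = np.sort(alive)[n_prune - 1]
    return mask & (np.abs(weights) > threshold)

# Prints 73.79%, 79.03%, 83.22%, 89.26%, 91.41% for k = 6, 7, 8, 10, 11.
for k in (6, 7, 8, 10, 11):
    print(k, f"{100 * sparsity_after_rounds(k):.2f}%")
```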

preprint2022arXiv

Decoupled Pyramid Correlation Network for Liver Tumor Segmentation from CT images

Purpose: Automated liver tumor segmentation from Computed Tomography (CT) images is a necessary prerequisite for interventions on hepatic abnormalities and for surgery planning. However, accurate liver tumor segmentation remains challenging due to the large variability of tumor sizes and inhomogeneous texture. Recent advances based on Fully Convolutional Networks (FCNs) for medical image segmentation draw on the success of learning discriminative pyramid features. In this paper, we propose a Decoupled Pyramid Correlation Network (DPC-Net) that exploits attention mechanisms to fully leverage both low- and high-level features embedded in an FCN to segment liver tumors. Methods: We first design a powerful Pyramid Feature Encoder (PFE) to extract multi-level features from input images. Then we decouple the characteristics of features concerning the spatial dimension (i.e., height, width, depth) and the semantic dimension (i.e., channel). On top of that, we present two types of attention modules, Spatial Correlation (SpaCor) and Semantic Correlation (SemCor) modules, to recursively measure the correlation of multi-level features. The former selectively emphasizes global semantic information in low-level features with the guidance of high-level ones. The latter adaptively enhances spatial details in high-level features with the guidance of low-level ones. Results: We evaluate DPC-Net on the MICCAI 2017 Liver Tumor Segmentation (LiTS) challenge dataset. The Dice Similarity Coefficient (DSC) and Average Symmetric Surface Distance (ASSD) are employed for evaluation. The proposed method obtains a DSC of 76.4% and an ASSD of 0.838 mm for liver tumor segmentation, outperforming the state-of-the-art methods. It also achieves competitive results, with a DSC of 96.0% and an ASSD of 1.636 mm, for liver segmentation.
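The Dice Similarity Coefficient compares a predicted mask P with a ground-truth mask G as DSC = 2|P ∩ G| / (|P| + |G|). A minimal NumPy sketch for binary masks or volumes follows; returning 1.0 when both masks are empty is our own convention, and ASSD is not shown.

```python
import numpy as np

def dice_coefficient(pred, target):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks of any (matching) shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / denom)
```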

preprint2022arXiv

Delayed Impact of Interdisciplinary Research

Interdisciplinary research increasingly fuels innovation and is considered a key to tomorrow's breakthroughs. Yet little is known about whether interdisciplinary research manifests delayed impact. Here, we use the time to reach the citation peak to quantify the time of highest impact and the citation dynamics, and examine its relationship with interdisciplinarity. Using large-scale publication datasets, our results suggest that interdisciplinary papers show significantly delayed impact both microscopically (per paper) and macroscopically (collectively), as interdisciplinary papers take longer to reach their citation peak. Furthermore, we study the underlying forces of such delayed impact, finding that the effect goes beyond the Matthew effect (i.e., the rich-get-richer effect). Finally, we find that team size and content conventionality only partly account for this effect. Overall, our results suggest that governments, research administrators, and funding agencies should be aware of this general feature of interdisciplinary science, which may have broad policy implications.
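A minimal sketch of the time-to-citation-peak measure, assuming yearly citation counts indexed from the publication year (index 0) and breaking ties by the earliest peak year; the paper's exact counting window and tie rule are not given in the abstract.

```python
def years_to_citation_peak(citations_per_year):
    """Years from publication (index 0) to the year with the most citations.

    Ties are broken by taking the earliest peak year (an assumption).
    """
    if not citations_per_year:
        raise ValueError("need at least one year of citation counts")
    peak = max(citations_per_year)
    return citations_per_year.index(peak)
```

Under this measure, a paper with yearly counts [1, 5, 9, 4, 2] peaks two years after publication, so a later peak index corresponds to a more delayed impact.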

preprint2022arXiv

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out tokens of the original sentence and then sampling from a masked language model. We show that DiffCSE is an instance of equivariant contrastive learning (Dangovski et al., 2021), which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other "harmful" types of augmentations. Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods, outperforming unsupervised SimCSE by 2.3 absolute points on semantic textual similarity tasks.
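DiffCSE builds on unsupervised SimCSE, whose core is an in-batch InfoNCE loss: each sentence's two views should be closer to each other than to any other sentence in the batch. The NumPy sketch below covers only that contrastive term; the temperature 0.05 is SimCSE's usual default and an assumption here, and DiffCSE's additional objective on the edited sentence is not shown.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.05):
    """In-batch contrastive loss: row i of z1 should match row i of z2.

    z1, z2: (batch, dim) embeddings of two views of the same sentences.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching row as the positive for each sentence.
    return float(-np.mean(np.diag(log_probs)))
```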

preprint2022arXiv

Direct observation of moiré flat-band breakdown at the edge of magic-angle twisted bilayer graphene

Low-energy moiré flat bands in magic-angle twisted bilayer graphene (tBG) have demonstrated remarkable potential for exhibiting rich exotic quantum phenomena. Theoretically, the moiré flat bands of tBG are based on extended structures, i.e., moiré patterns with periodic boundary conditions. However, a fundamental question, whether the flat bands can exist in graphene moiré patterns with reduced structural symmetry, such as at sample edges, remains unanswered. Here, using scanning tunneling microscopy and spectroscopy, we study the local electronic properties of magic-angle tBG near a terminated sample edge and report a direct observation of the breakdown of the moiré flat bands. We show that the moiré electronic structures, including the low-energy flat bands, persist in a complete moiré spot, i.e., a moiré supercell, right at the edge, even though the translational symmetry of the moiré pattern is broken in one direction. However, the flat-band characteristic is clearly absent in incomplete moiré spots that are partly terminated by the edge. Our results indicate that a whole moiré spot is both sufficient and indispensable for generating effective moiré flat bands in tBG.

preprint2022arXiv

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data

We propose to harness the potential of simulation for the semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any data from the target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with the styles of real images in terms of visual appearance using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on the generalization from GTA and SYNTHIA to Cityscapes, BDDS, and Mapillary, and our method outperforms the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.

preprint2022arXiv

Dynamic Backdoor Attacks Against Machine Learning Models

Machine learning (ML) has made tremendous progress during the past decade and is being adopted in various critical real-world applications. However, recent research has shown that ML models are vulnerable to multiple security and privacy attacks. In particular, backdoor attacks against ML models have recently attracted considerable attention. A successful backdoor attack can cause severe consequences, such as allowing an adversary to bypass critical authentication systems. Current backdooring techniques rely on adding static triggers (with fixed patterns and locations) to ML model inputs, which makes them prone to detection by current backdoor detection mechanisms. In this paper, we propose the first class of dynamic backdooring techniques against deep neural networks (DNNs), namely Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques can have random patterns and locations, which reduces the efficacy of current backdoor detection mechanisms. In particular, BaN and c-BaN, based on a novel generative network, are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique that, given a target label, can generate a target-specific trigger. Both BaN and c-BaN are essentially general frameworks that give the adversary the flexibility to further customize backdoor attacks. We extensively evaluate our techniques on three benchmark datasets: MNIST, CelebA, and CIFAR-10. Our techniques achieve almost perfect attack performance on backdoored data with a negligible utility loss. We further show that our techniques can bypass current state-of-the-art defense mechanisms against backdoor attacks, including ABS, Februus, MNTD, Neural Cleanse, and STRIP.
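To make "random patterns and locations" concrete, the sketch below stamps a randomly drawn binary patch at a random position in an image, in the spirit of the Random Backdoor technique. Binary pixel values, a square patch, and allowing any in-bounds position are our assumptions; BaN and c-BaN instead generate triggers with a trained network, which is not shown. In a poisoning attack the stamped copies would be relabelled with the target class.

```python
import numpy as np

def apply_random_trigger(image, trigger_size=4, rng=None):
    """Stamp a random binary patch at a random location (Random Backdoor-style sketch).

    image: (H, W) or (H, W, C) array with values in [0, 1].
    Returns a poisoned copy and the (row, col) of the patch's top-left corner.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Random pattern: each trigger pixel (and channel) is independently 0 or 1.
    pattern_shape = (trigger_size, trigger_size) + image.shape[2:]
    pattern = rng.integers(0, 2, size=pattern_shape).astype(image.dtype)
    # Random location: anywhere the patch fits entirely inside the image.
    row = int(rng.integers(0, h - trigger_size + 1))
    col = int(rng.integers(0, w - trigger_size + 1))
    poisoned = image.copy()
    poisoned[row:row + trigger_size, col:col + trigger_size] = pattern
    return poisoned, (row, col)
```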

preprint2022arXiv

Effective Tensor Completion via Element-wise Weighted Low-rank Tensor Train with Overlapping Ket Augmentation

In recent years, there has been an increasing number of applications of tensor completion based on the tensor train (TT) format because of its efficiency and effectiveness in dealing with higher-order tensor data. However, existing tensor completion methods using TT decomposition have two obvious drawbacks. One is that they only consider mode weights according to the degree of mode balance, even though some elements are recovered better in an unbalanced mode. The other is that serious blocking artifacts appear when the missing-element rate is relatively large. To remedy these two issues, in this work, we propose a novel tensor completion approach based on an element-wise weighting technique. Accordingly, we propose a novel formulation for tensor completion and an effective optimization algorithm, called tensor completion by parallel weighted matrix factorization via tensor train (TWMac-TT). In addition, we specifically consider the recovery quality of edge elements from adjacent blocks. Unlike traditional reshaping and ket augmentation, we utilize a new tensor augmentation technique called overlapping ket augmentation, which can further avoid blocking artifacts. We then conduct extensive performance evaluations on synthetic data and several real image data sets. Our experimental results demonstrate that the proposed algorithm TWMac-TT outperforms several other competing tensor completion methods.
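In the TT format, a d-way tensor is written as a chain of 3-way cores G_k of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1. The standard TT-SVD algorithm below computes such cores from a full tensor by sequential SVDs of unfoldings. It is only a sketch of the format TWMac-TT works in, with a relative singular-value cutoff `eps` chosen for illustration; it is not the paper's completion algorithm.

```python
import numpy as np

def tt_svd(x, eps=1e-10):
    """Decompose a full tensor into TT cores of shape (r_{k-1}, n_k, r_k)."""
    shape = x.shape
    cores, r_prev, c = [], 1, x
    for k in range(len(shape) - 1):
        # Unfold: rows index (previous rank, current mode), columns the rest.
        c = c.reshape(r_prev * shape[k], -1)
        u, s, vt = np.linalg.svd(c, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))  # drop negligible singular values
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        c = np.diag(s[:r]) @ vt[:r, :]           # carry the remainder forward
        r_prev = r
    cores.append(c.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([core.shape[1] for core in cores])
```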

preprint2022arXiv

Electronic structure, magnetic properties and pairing tendencies of the copper-based honeycomb lattice Na$_2$Cu$_2$TeO$_6$

Spin-$1/2$ chains with alternating antiferromagnetic and ferromagnetic couplings have attracted considerable interest due to the topological character of their spin excitations. Here, using density functional theory and density matrix renormalization group methods, we have systematically studied the dimerized chain system Na$_2$Cu$_2$TeO$_6$. Near the Fermi level, the dominant states are mainly contributed by the Cu $3d_{x^2-y^2}$ orbitals, highly hybridized with the O $2p$ orbitals in the nonmagnetic phase, leading to an "effective" single-orbital low-energy model. Furthermore, the bandwidth of the Cu $3d_{x^2-y^2}$ states is small ($\sim 0.8$ eV), suggesting that electronic correlations will strongly affect this system. By introducing such electronic correlations, we found that this system is a Mott insulator. Moreover, by calculating the magnetic exchange interactions ($J_1$, $J_2$ and $J_3$), we explained the size and sign of the exchange interactions in Na$_2$Cu$_2$TeO$_6$, in agreement with neutron experiments. In addition, we constructed a single-orbital Hubbard model for this dimerized chain system, where quantum fluctuations are taken into account. Both AFM and FM couplings ($\uparrow$-$\downarrow$-$\downarrow$-$\uparrow$) along the chain were found in our DMRG and Lanczos calculations, in agreement with DFT and neutron results. We also calculated the hole pair binding energy $\Delta E$, which becomes negative at Hubbard $U \sim 11$ eV, indicating incipient pairing tendencies. Finally, we also examined various cases of hole doping, which always exhibit tight pairs. Thus, we believe our results for Na$_2$Cu$_2$TeO$_6$ could provide guidance to experimentalists and theorists working on this dimerized chain system, on aspects such as short-range magnetic coupling, doping effects, and possible pairing tendencies.
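The abstract does not spell out how $\Delta E$ is defined. A definition commonly used in DMRG and Lanczos pairing studies, assumed here, compares ground-state energies $E(N)$ with $N$ electrons relative to the undoped filling $N_0$:

```latex
\Delta E = E(N_0 - 2) + E(N_0) - 2\,E(N_0 - 1)
```

With this convention, $\Delta E < 0$ means two doped holes lower the energy by binding together rather than residing in separate one-hole states, which is the sense in which a negative $\Delta E$ signals incipient pairing.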

preprint2022arXiv

Evolution of barchan dune interactions investigated by a downscaled water tunnel experiment: the temporal characteristics and a soliton-like behavior

This paper reports a downscaled water tunnel experiment to study the temporal characteristics of a double-dune interaction system and a new pattern of dune interaction that arises when the initial mass ratio of the two dunes is large. These topics are useful for a comprehensive understanding of the dune interaction system but have rarely been covered before. We define the turnover time scale under dune interaction and find that its time-averaged value has a nonmonotonic relationship with the initial mass ratio. A nonmonotonic relationship is also found between the convexity of the downstream dune tip and the initial mass ratio. The stationary points of these two nonmonotonic curves correspond to the same dune interaction pattern, named 'exchange-chasing', which is considered indispensable in the classification map of dune interactions. The upstream dune acts as an energy transmitter between the fluid flow and the downstream dune. A soliton-like behavior occurs when the downstream dune enlarges: a small dune detaches from the downstream dune tip and is passed by the upstream dune with approximately no mass exchange. The activity of such a temporary soliton is found to be negatively related to the initial dune spacing and positively related to the initial mass ratio.

preprint2022arXiv

Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

In this paper, we study the disentanglement of speaker and language representations in non-autoregressive cross-lingual TTS models from various aspects. We propose a phoneme length regulator that solves the length mismatch problem between the IPA input sequence and monolingual alignment results. Using the phoneme length regulator, we present a FastPitch-based cross-lingual model with IPA symbols as input representations. Our experiments show that language-independent input representations (e.g., IPA symbols), an increasing number of training speakers, and explicit modeling of speech variance information all encourage non-autoregressive cross-lingual TTS models to disentangle speaker and language representations. The subjective evaluation shows that our proposed model can achieve decent naturalness and speaker similarity in cross-language voice cloning.
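FastPitch, like FastSpeech, upsamples phoneme-level encoder outputs to the mel-frame rate with a length regulator that repeats each phoneme's hidden state for its (predicted or aligned) duration in frames. The sketch shows only that standard expansion step; how the proposed phoneme length regulator reconciles IPA symbols with monolingual alignments is not described in the abstract and is not reproduced here.

```python
import numpy as np

def length_regulate(phoneme_states, durations):
    """Expand phoneme-level states to frame level by repeating each state.

    phoneme_states: (num_phonemes, dim) encoder outputs.
    durations: (num_phonemes,) non-negative integer frame counts per phoneme.
    Returns a (sum(durations), dim) array of frame-level states.
    """
    durations = np.asarray(durations, dtype=int)
    if durations.shape[0] != phoneme_states.shape[0]:
        raise ValueError("one duration per phoneme is required")
    return np.repeat(phoneme_states, durations, axis=0)
```

A phoneme with duration 0 simply contributes no frames, so the output length always equals the sum of the durations.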

preprint2022arXiv

Finding MNEMON: Reviving Memories of Node Embeddings

Previous security research on graphs has focused exclusively on either (de-)anonymizing graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understanding the privacy risks of integrating the output of graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy with access only to the node embedding matrix of the original graph, without interacting with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.
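Why node embeddings leak structure: many embedding methods place adjacent nodes close together, so simply ranking node pairs by embedding similarity already recovers some edges. The sketch below is that naive baseline, assuming cosine similarity and a known edge budget `num_edges`; it is not the MNEMON attack itself, which the abstract describes only at a high level.

```python
import numpy as np

def recover_edges_by_similarity(embeddings, num_edges):
    """Guess the `num_edges` most likely edges from node embeddings alone.

    Naive baseline: rank every unordered node pair by cosine similarity.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    rows, cols = np.triu_indices(len(z), k=1)  # each unordered pair once
    order = np.argsort(-sim[rows, cols], kind="stable")[:num_edges]
    return [(int(rows[i]), int(cols[i])) for i in order]
```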

preprint2022arXiv

Freedom to Choose: Understanding Input Modality Preferences of People with Upper-body Motor Impairments for Activities of Daily Living

Many people with upper-body motor impairments encounter challenges while performing Activities of Daily Living (ADLs) and Instrumental Activities of Daily Living (IADLs), such as toileting, grooming, and managing finances, which affect their Quality of Life (QOL). Although existing assistive technologies enable people with upper-body motor impairments to use different input modalities to interact with computing devices independently (e.g., using voice to interact with a computer), many people still require Personal Care Assistants (PCAs) to perform ADLs. Multimodal input has the potential to enable users to perform ADLs without human assistance. We conducted 12 semi-structured interviews with people who have upper-body motor impairments to capture their existing practices and challenges in performing ADLs, identify opportunities to expand the input possibilities for assistive devices, and understand user preferences for multimodal interaction during everyday tasks. Finally, we discuss implications for the design and use of multimodal input solutions to support user independence and collaborative experiences when performing daily living tasks.

preprint2022arXiv

Geometric Algebra and Algebraic Geometry of Loop and Potts Models

We uncover a connection between two seemingly separate subjects in integrable models: the representation theory of the affine Temperley-Lieb algebra, and the algebraic structure of solutions to the Bethe equations of the XXZ spin chain. We study the solutions of the Bethe equations analytically using computational algebraic geometry and find that the solution space encodes rich information about the representation theory of the Temperley-Lieb algebra. Using these connections, we compute the partition functions of the completely packed loop model and of the closely related random-cluster Potts model on medium-size lattices with toroidal boundary conditions, by two quite different methods. We consider the partial thermodynamic limit of infinitely long tori and analyze the corresponding condensation curves of the partition function zeros. Two components of these curves are obtained analytically in the full thermodynamic limit.

preprint2022arXiv

Global fits of SUSY at future Higgs factories

In this work, we study the impact of electroweak and Higgs precision measurements at future electron-positron colliders on several typical supersymmetric models, including the Constrained Minimal Supersymmetric Standard Model (CMSSM), Non-Universal Higgs Mass generalisations (NUHM1, NUHM2), and the 7-dimensional Minimal Supersymmetric Standard Model (MSSM7). Using publicly available data from the GAMBIT community, we post-process previous SUSY global fits with additional likelihoods to explore the discovery potential of Higgs factories such as the Circular Electron Positron Collider (CEPC), the Future Circular Collider (FCC), and the International Linear Collider (ILC). We show that the currently allowed parameter space of these models will be further tested by future precision measurements. In particular, dark matter annihilation mechanisms may be distinguished by precise measurements of Higgs observables.

preprint2022arXiv

High-throughput study of the anomalous Hall effect

Despite being known for a long time, the anomalous Hall effect still attracts attention because of its complex origins, its connection to topology, and its usefulness as a probe of magnetic order. Here we study the anomalous Hall effect using an automated high-throughput calculation scheme. We calculate the intrinsic anomalous Hall effect in 2871 ferromagnetic materials. We use these results to study general properties of the anomalous Hall effect, such as its dependence on the strength of the spin-orbit coupling or on the magnetization. We also examine the origin of the anomalous Hall effect in the materials with the largest effect and show that a large anomalous Hall effect is usually associated with symmetry-protected band degeneracies in the non-relativistic electronic structure, typically mirror-symmetry-protected nodal lines. Additionally, we study the dependence of the anomalous Hall effect on the magnetization direction, showing that in many materials it differs significantly from the commonly assumed expression $\mathbf{j}^\text{AHE} \sim \mathbf{M} \times \mathbf{E}$.

preprint2022arXiv

Learning Rich Features for Gait Recognition by Integrating Skeletons and Silhouettes

Gait recognition captures gait patterns from an individual's walking sequence for identification. Most existing gait recognition methods learn features from silhouettes or skeletons for robustness to clothing, carrying, and other exterior factors. The combination of the two data modalities, however, is not fully exploited. Previous multimodal gait recognition methods mainly employ the skeleton to assist local feature extraction, ignoring the intrinsic discriminative power of skeleton data. This paper proposes a simple yet effective Bimodal Fusion (BiFusion) network, which mines discriminative gait patterns in skeletons and integrates them with silhouette representations to learn rich features for identification. In particular, the inherent hierarchical semantics of body joints in a skeleton are leveraged to design a novel Multi-Scale Gait Graph (MSGG) network for skeleton feature extraction. Extensive experiments on CASIA-B and OUMVLP demonstrate both the superiority of the proposed MSGG network in modeling skeletons and the effectiveness of bimodal fusion for gait recognition. Under the most challenging condition of walking in different clothes on CASIA-B, our method achieves a rank-1 accuracy of 92.1%.
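Rank-1 accuracy, the metric quoted above, is the fraction of probe sequences whose single nearest gallery sequence has the correct identity. A minimal sketch, assuming Euclidean distance between fixed-length feature vectors; benchmark protocols such as CASIA-B's usually also exclude same-view gallery matches, which this sketch does not handle.

```python
import numpy as np

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery sample (Euclidean) has the same identity."""
    # Pairwise squared distances, shape (num_probe, num_gallery).
    d = ((probe_feats[:, None, :] - gallery_feats[None, :, :]) ** 2).sum(-1)
    nearest = np.asarray(gallery_ids)[d.argmin(axis=1)]
    return float(np.mean(nearest == np.asarray(probe_ids)))
```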

preprint2022arXiv

Linking Emergent and Natural Languages via Corpus Transfer

The study of language emergence aims to understand how human languages are shaped by perceptual grounding and communicative intent. Computational approaches to emergent communication (EC) predominantly consider referential games in limited domains and analyze the learned protocol within the game framework. As a result, it remains unclear how the emergent languages from these settings connect to natural languages or provide benefits in real-world language processing tasks, where statistical models trained on large text corpora dominate. In this work, we propose a novel way to establish such a link by corpus transfer, i.e. pretraining on a corpus of emergent language for downstream natural language tasks, which is in contrast to prior work that directly transfers speaker and listener parameters. Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning. For example, in a low-resource setup (modeling 2 million natural language tokens), pre-training on an emergent language corpus with just 2 million tokens reduces model perplexity by $24.6\%$ on average across ten natural languages. We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images. We find that our translation-based metric highly correlates with the downstream performance on modeling natural languages (for instance $ρ=0.83$ on Hebrew), while topographic similarity, a popular metric in previous work, shows surprisingly low correlation ($ρ=0.003$), hinting that simple properties like attribute disentanglement from synthetic domains might not capture the full complexities of natural language. Our findings also indicate potential benefits of moving language emergence forward with natural language resources and models.
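Topographic similarity is typically computed as the Spearman correlation between pairwise distances in meaning space and in message space. A minimal sketch, assuming attribute-tuple meanings, fixed-length messages, and Hamming distance on both sides (edit distance is a common alternative for variable-length messages); ties in the distances receive average ranks, as in the usual Spearman definition.

```python
import numpy as np
from itertools import combinations

def average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(u != v for u, v in zip(a, b))

def topographic_similarity(meanings, messages):
    """Spearman correlation between pairwise meaning and message distances."""
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return float(np.corrcoef(average_ranks(d_meaning), average_ranks(d_message))[0, 1])
```

A perfectly compositional language, where each attribute value maps to its own symbol, scores 1.0.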

preprint2022arXiv

Low energy supersymmetry confronted with current experiments: an overview

This study provides a brief overview of low-energy supersymmetry (SUSY) in light of current experimental constraints, such as collider searches, dark matter searches, and muon $g-2$ measurements. In addition, we survey a variety of low-energy supersymmetric models: the phenomenological minimal supersymmetric model (MSSM); the supersymmetric models with cut-off-scale boundary conditions, i.e., minimal supergravity (mSUGRA) or the constrained MSSM (CMSSM), gauge mediation of SUSY breaking (GMSB), and anomaly mediation of SUSY breaking (AMSB); as well as their extensions. We conclude that low-energy SUSY can survive all current experimental constraints and remains compelling, albeit with a mild fine-tuning problem. Popular models such as mSUGRA, GMSB, and AMSB need to be extended if the muon $g-2$ anomaly comes from new physics.

preprint2022arXiv

Low energy SUSY confronted with new measurements of W-boson mass and muon g-2

The new CDF II measurement of the $W$-boson mass shows a 7$σ$ deviation from the Standard Model (SM) prediction, while the recent FNAL measurement of the muon $g-2$ shows a 4.2$σ$ deviation from the SM (when combined with the BNL result). Both strongly indicate new physics beyond the SM. In this work, we study the implications of both measurements for low-energy supersymmetry. Through an extensive exploration of the parameter space of the minimal supersymmetric standard model (MSSM), we find that, in the parameter space allowed by current experimental constraints from colliders and dark matter detection, the MSSM can simultaneously explain both measurements at the edge of the $2σ$ level, taking theoretical uncertainties into consideration. The favored parameter space, characterized by a compressed spectrum among the bino, wino, and stau, with the stop around 1 TeV, may be covered by LHC searches in the near future.

preprint2022arXiv

m-Order Time Optimal Control Synthesis Function of Discrete System

In this paper, we first introduce the basic concepts of generating functions in combinatorics and some combinatorial identities. To facilitate understanding of the m-order time-optimal control synthesis function of a discrete system (referred to as the m-order synthesis function), we next introduce the derivation process and control ideas of the 2nd-order synthesis function, and then derive the m-order synthesis function in detail by means of generating functions. Using the m-order tracking-form synthesis function with a filter factor, we present methods of signal extraction and its predictive compensation, and verify their immunity and effectiveness by numerical simulation.
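For concreteness, the sketch below implements Han Jingqing's discrete second-order time-optimal synthesis function fhan, as widely used in ADRC tracking differentiators, and runs it in tracking form so that x1 follows a set-point and x2 estimates its derivative. We assume, but the abstract does not confirm, that this corresponds to the 2nd-order case discussed above; the m-order generalization is not reproduced, and the values of r (acceleration bound) and h (step and filter factor) are illustrative.

```python
import math

def fhan(x1, x2, r, h):
    """Han's discrete 2nd-order time-optimal synthesis function (ADRC form).

    x1: position error, x2: velocity, r: acceleration bound, h: step / filter factor.
    """
    d = r * h
    d0 = h * d
    y = x1 + h * x2
    if abs(y) > d0:
        a0 = math.sqrt(d * d + 8.0 * r * abs(y))
        a = x2 + 0.5 * (a0 - d) * math.copysign(1.0, y)
    else:
        a = x2 + y / h
    # Saturated feedback: bang-bang outside the boundary layer, linear inside.
    if abs(a) > d:
        return -r * math.copysign(1.0, a)
    return -r * a / d

def track(v, r=10.0, h=0.01, steps=2000):
    """Tracking differentiator: x1 follows the set-point v, x2 estimates its derivative."""
    x1, x2 = 0.0, 0.0
    for _ in range(steps):
        u = fhan(x1 - v, x2, r, h)
        x1, x2 = x1 + h * x2, x2 + h * u
    return x1, x2
```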

preprint2022arXiv

Maxwell field with gauge fixing term in de Sitter space: exact solution and stress tensor

The Maxwell field with a general gauge-fixing (GF) term is nontrivial: not only are the longitudinal and temporal modes mixed in the field equations, but unwanted consequences might also arise from the GF term. We derive the complete set of solutions in de Sitter space and implement covariant canonical quantization, which restricts the residual gauge transformation to a quantum residual gauge transformation. Then, in the Gupta-Bleuler (GB) physical state, we calculate the stress tensor, which is remarkably independent of the gauge-fixing constant and is also invariant under the quantum residual gauge transformation. The transverse components are the same as those in Minkowski spacetime, and the transverse vacuum stress tensor has only one UV divergent term ($\propto k^4$), which vanishes after 0th-order adiabatic regularization. The longitudinal-temporal stress tensor in the GB state is zero due to a cancellation between the longitudinal and temporal parts. More interesting is the stress tensor of the GF term. Its particle contribution is zero due to the cancellation in the GB state, and its vacuum contribution is twice that of a minimally coupled massless scalar field, containing $k^4$ and $k^2$ divergences. After 2nd-order adiabatic regularization, the GF vacuum stress tensor also becomes zero, so there is no need to introduce a ghost field, and the zero GF vacuum stress tensor cannot be a candidate for the cosmological constant. Thus, all the physics predicted by the Maxwell field with the GF term is the same as that without the GF term. We also carry out an analogous calculation in Minkowski spacetime, where the stress tensor is similar to, but simpler than, that in de Sitter space.

preprint2022arXiv

Membership Inference Attacks by Exploiting Loss Trajectory

Machine learning models are vulnerable to membership inference attacks in which an adversary aims to predict whether or not a particular sample was contained in the target model's training dataset. Existing attack methods commonly exploit only the output information (mostly losses) of the given target model. As a result, in practical scenarios where both member and non-member samples yield similarly small losses, these methods are naturally unable to differentiate between them. To address this limitation, we propose a new attack method that exploits membership information from the whole training process of the target model to improve attack performance. To mount the attack in the common black-box setting, we leverage knowledge distillation and represent the membership information by the losses evaluated on a sequence of intermediate models at different distillation epochs, namely the "distilled loss trajectory", together with the loss from the given target model. Experimental results over different datasets and model architectures demonstrate the substantial advantage of our attack in terms of different metrics. For example, on CINIC-10, our attack achieves at least a 6$\times$ higher true-positive rate at a low false-positive rate of 0.1% than existing methods. Further analysis demonstrates the general effectiveness of our attack in stricter scenarios.
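The headline metric above, true-positive rate at a fixed low false-positive rate, can be computed from per-sample attack scores as below. This is a generic sketch, assuming higher scores mean "more likely a member" (for example, negative loss); it is independent of how the distilled loss trajectory produces those scores.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.001):
    """True-positive rate at a fixed false-positive rate.

    The threshold is chosen so that at most `target_fpr` of non-members
    score strictly above it.
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    k = int(np.floor(target_fpr * len(nonmembers)))  # non-members allowed above threshold
    threshold = nonmembers[k] if k < len(nonmembers) else -np.inf
    return float(np.mean(members > threshold))
```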

preprint2022arXiv

Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to infer whether an input sample was used to train the model. Over the past few years, researchers have produced many membership inference attacks and defenses. However, these attacks and defenses employ a variety of strategies and are evaluated on different models and datasets. Without a comprehensive benchmark, we do not understand the strengths and weaknesses of existing attacks and defenses. We fill this gap by presenting a large-scale measurement of different membership inference attacks and defenses. We systematize membership inference through the study of nine attacks and six defenses and measure their performance in a holistic evaluation. We then quantify the impact of the threat model on the results of these attacks. We find that some assumptions of the threat model, such as same-architecture and same-distribution between shadow and target models, are unnecessary. We are also the first to execute attacks on real-world data collected from the Internet, instead of laboratory datasets. We further investigate what determines the performance of membership inference attacks and reveal that the commonly believed overfitting level is not sufficient for the success of the attacks. Instead, the Jensen-Shannon distance of entropy/cross-entropy between member and non-member samples correlates with attack performance much better. This gives us a new way to accurately predict membership inference risks without running the attack. Finally, we find that data augmentation degrades the performance of existing attacks to a large extent, and we propose an adaptive attack that uses augmentation to train shadow and attack models and improves attack performance.
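The Jensen-Shannon distance mentioned above can be estimated from samples of a per-example statistic such as prediction entropy for members and non-members. A minimal sketch, assuming histogram density estimates on shared bins and base-2 logarithms (so the distance lies in [0, 1]); the paper's exact estimator is not specified in the abstract.

```python
import numpy as np

def js_distance(member_values, nonmember_values, bins=20):
    """Jensen-Shannon distance (base 2, in [0, 1]) between two empirical distributions."""
    lo = min(np.min(member_values), np.min(nonmember_values))
    hi = max(np.max(member_values), np.max(nonmember_values))
    if hi == lo:
        return 0.0  # all values identical: the distributions coincide
    p, _ = np.histogram(member_values, bins=bins, range=(lo, hi))
    q, _ = np.histogram(nonmember_values, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))
```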

preprint2022arXiv

mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation

Accurate brain tumor segmentation from Magnetic Resonance Imaging (MRI) calls for joint learning of multimodal images. However, in clinical practice, it is not always possible to acquire a complete set of MRIs, and missing modalities cause severe performance degradation in existing multimodal segmentation methods. In this work, we present the first attempt to exploit the Transformer for multimodal brain tumor segmentation that is robust to any combinatorial subset of available modalities. Concretely, we propose a novel multimodal Medical Transformer (mmFormer) for incomplete multimodal learning with three main components: hybrid modality-specific encoders that bridge a convolutional encoder and an intra-modal Transformer for both local and global context modeling within each modality; an inter-modal Transformer that builds and aligns long-range correlations across modalities to obtain modality-invariant features with global semantics corresponding to the tumor region; and a decoder that performs progressive up-sampling and fusion with the modality-invariant features to generate a robust segmentation. In addition, auxiliary regularizers are introduced in both the encoder and the decoder to further enhance the model's robustness to incomplete modalities. We conduct extensive experiments on the public BraTS 2018 dataset for brain tumor segmentation. The results demonstrate that mmFormer outperforms state-of-the-art methods for incomplete multimodal brain tumor segmentation on almost all subsets of incomplete modalities, with an especially large average Dice improvement of 19.07% on tumor segmentation when only one modality is available. The code is available at https://github.com/YaoZhang93/mmFormer.
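The Dice score used to report these results measures overlap between a predicted mask and the ground truth, 2|P ∩ G| / (|P| + |G|). A minimal sketch for binary masks, assuming NumPy arrays of the same shape; the small `eps` term is an illustrative convention so that two empty masks score 1.0, not something the paper specifies:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined (and equal to 1) when both masks are empty
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```

For multi-region tumor labels, one would typically compute this per region (for example, per sub-region mask) and average.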

preprint2022arXiv

On Xing Tian and the Perseverance of Anti-China Sentiment Online

Sinophobia, or anti-Chinese sentiment, has existed on the Web for a long time. The outbreak of COVID-19 and the extended quarantine have further amplified it. However, we lack a quantitative understanding of the causes of Sinophobia and of how it evolves over time. In this paper, we conduct a large-scale longitudinal measurement of Sinophobia between 2016 and 2021 on two Web communities, one mainstream and one fringe. By analyzing 8B posts from Reddit and 206M posts from 4chan's /pol/, we investigate the origins, evolution, and content of Sinophobia. We find that anti-Chinese content may be evoked by political events not directly related to China, e.g., the U.S. withdrawal from the Paris Agreement. During the COVID-19 pandemic, daily usage of Sinophobic slurs increased significantly despite the hate-speech ban policy. We also show that the semantic meaning of the words "China" and "Chinese" shifted towards Sinophobic slurs with the rise of COVID-19 and remained so during the pandemic period. We further use topic modeling to show that the topics of Sinophobic discussion are diverse and broad. Both Web communities share some common Sinophobic topics, such as ethnicity, economics and commerce, weapons and military, and foreign relations. However, compared to 4chan's /pol/, more daily life-related topics, including food, games, and stocks, are found on Reddit. Our findings also reveal that topics related to COVID-19 and blaming the Chinese government are more prevalent in the pandemic period. To the best of our knowledge, this paper presents the longest quantitative measurement of Sinophobia.

preprint2022arXiv

Point-splitting regularization of the stress tensor of a coupling scalar field in de Sitter space

We perform point-splitting regularization of the vacuum stress tensor of a coupling scalar field in de Sitter space, guided by the adiabatically regularized Green's function. For the massive scalar field with minimal coupling $ξ=0$, the 2nd-order point-splitting regularization yields a finite vacuum stress tensor with a positive, constant energy density, which can be identified as the cosmological constant that drives de Sitter inflation. For the coupling $ξ\ne 0$, we find that, even if the regularized Green's function is continuous and UV and IR convergent, the point-splitting regularization does not automatically lead to an appropriate stress tensor. The coupling $ξR$ causes logarithmically divergent terms, as well as higher-order finite terms that depend on the path of the coincidence limit. After these unwanted terms are removed by additional treatment, the 2nd-order regularization for small couplings $ξ\in(0,\frac{1}{7.04})$ and the 0th-order regularization for the conformal coupling $ξ=\frac16$ each yield a finite, constant vacuum stress tensor, in analogy to the case $ξ=0$. For the massless field with $ξ=0$ or $ξ=\frac16$, the point-splitting regularization yields a vanishing vacuum stress tensor, and there is no conformal trace anomaly for $ξ=\frac16$. If 4th-order regularization were used, the regularized energy density for general $ξ$ would be negative, which is inconsistent with de Sitter inflation, and the regularized Green's function would be singular at zero mass, which is unphysical. In all these cases, the stress tensor from the point-splitting regularization equals that from the adiabatic regularization.

preprint2022arXiv

Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails

In this paper, we study the task of hallucinating an authentic high-resolution (HR) face from an occluded thumbnail. We propose a multi-stage Progressive Upsampling and Inpainting Generative Adversarial Network, dubbed Pro-UIGAN, which exploits facial geometry priors to replenish and upsample (8×) occluded, tiny faces (16×16 pixels). Pro-UIGAN iteratively (1) estimates facial geometry priors for low-resolution (LR) faces and (2) acquires non-occluded HR face images under the guidance of the estimated priors. Our multi-stage hallucination network super-resolves and inpaints occluded LR faces in a coarse-to-fine manner, significantly reducing unwanted blurriness and artifacts. Specifically, we design a novel cross-modal transformer module for facial prior estimation, in which an input face and its landmark features are formulated as queries and keys, respectively. This design encourages joint feature learning across the input facial and landmark features, and attention discovers deep feature correspondences between them. Thus, facial appearance features and facial geometry priors are learned in a mutually reinforcing manner. Extensive experiments demonstrate that Pro-UIGAN produces visually pleasing HR faces and achieves superior performance in downstream tasks, i.e., face alignment, face parsing, face recognition, and expression classification, compared with other state-of-the-art (SotA) methods.

preprint2022arXiv

Pseudogap metal and magnetization plateau from doping moiré Mott insulator

The problem of doping Mott insulators is of fundamental importance and long-standing interest in the study of strongly correlated electron systems. The advent of semiconductor-based moiré materials opens new ground for simulating the Hubbard model on the triangular lattice and exploring the rich phase diagram of doped Mott insulators as a function of doping and external magnetic field. Building on our recent identification of spin polaron quasiparticles in Mott insulators, in this work we predict that a new metallic state emerges at small doping and in an intermediate field range: a pseudogap metal that exhibits a single-particle gap and a doping-dependent magnetization plateau.

preprint2022arXiv

Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning

Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models. State-of-the-art SSL methods can achieve performance comparable to supervised learning while using far fewer labeled samples. However, most existing works focus on improving the performance of SSL. In this work, we take a different angle by studying the training data privacy of SSL. Specifically, we propose the first data augmentation-based membership inference attacks against ML models trained by SSL. Given a data sample and black-box access to a model, the goal of a membership inference attack is to determine whether the data sample belongs to the model's training dataset. Our evaluation shows that the proposed attack consistently outperforms existing membership inference attacks and achieves the best performance against models trained by SSL. Moreover, we uncover that the reason for membership leakage in SSL differs from the commonly believed one in supervised learning, i.e., overfitting (the gap between training and testing accuracy). We observe that the SSL model generalizes well to the testing data (with almost zero overfitting) but "memorizes" the training data by giving more confident predictions regardless of their correctness. We also explore early stopping as a countermeasure against membership inference attacks on SSL. The results show that early stopping can mitigate the attack, but at the cost of degraded model utility.
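To make the idea of an augmentation-based membership signal concrete, here is a minimal sketch of one plausible scoring rule: query the black-box model on several randomly augmented views of a sample and measure how often its predicted label agrees with the prediction on the clean sample. This rule, the function names, the number of views, and the threshold are all hypothetical illustrations, not the attack proposed in this work; `model` is assumed to be any callable returning a probability vector and `augment` any callable producing a random augmented copy.

```python
import numpy as np

def augmentation_agreement_score(model, x, augment, n_views=8, seed=0):
    """Hypothetical membership score: fraction of augmented views of x whose
    predicted label matches the prediction on x itself (higher -> "member")."""
    rng = np.random.default_rng(seed)
    base_label = int(np.argmax(model(x)))
    agree = 0
    for _ in range(n_views):
        view = augment(x, rng)  # one randomly augmented copy of x
        agree += int(np.argmax(model(view)) == base_label)
    return agree / n_views

def guess_member(score, threshold=0.9):
    """Hypothetical decision rule; in practice the threshold would be calibrated."""
    return score >= threshold
```

The intuition, following the observation that SSL models give more confident predictions on their training data, is that members may yield more stable predictions under augmentation; an actual attack would calibrate the threshold, for example with shadow models.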

preprint2022arXiv

Semi-supervised Cardiac Image Segmentation via Label Propagation and Style Transfer

Accurate segmentation of cardiac structures can assist doctors in diagnosing diseases and improving treatment planning, and is in high demand in clinical practice. However, the shortage of annotations and the variance of data across vendors and medical centers restrict the performance of advanced deep learning methods. In this work, we present a fully automatic method to segment cardiac structures, including the left ventricle (LV) and right ventricle (RV) blood pools as well as the left ventricular myocardium (MYO), in MRI volumes. Specifically, we design a semi-supervised learning method that leverages unlabelled MRI sequence timeframes through label propagation. We then exploit style transfer to reduce the variance among centers and vendors for more robust cardiac image segmentation. We evaluated our method in the M&Ms challenge, where it ranked 2nd among 14 competing teams.

preprint2022arXiv

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization

Text normalization (TN) systems in production are largely rule-based, using weighted finite-state transducers (WFST). However, WFST-based systems struggle with ambiguous input when the normalized form is context-dependent. On the other hand, neural text normalization systems can take context into account, but they suffer from unrecoverable errors and require labeled normalization datasets, which are hard to collect. We propose a new hybrid approach that combines the benefits of rule-based and neural systems. First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one, similar to shallow fusion for automatic speech recognition. While the WFST prevents unrecoverable errors, the language model resolves contextual ambiguity. The approach is easy to extend, and we show that it is effective: it achieves comparable or better results than existing state-of-the-art TN models.
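The candidate-then-rescore pipeline described above can be sketched with a toy stand-in for each component. In this sketch the WFST output is assumed to be a list of `(candidate_text, wfst_cost)` pairs, and the neural language model is replaced by a small add-one-smoothed bigram model purely for illustration; the log-linear combination and `lm_weight` are assumptions in the spirit of shallow fusion, not the paper's exact scoring.

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Toy add-one-smoothed bigram LM (a stand-in for a neural LM)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab_size = len({w for s in corpus for w in s.split()}) + 2  # + <s>, </s>

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )

    return logprob

def pick_normalization(candidates, lm_logprob, lm_weight=1.0):
    """Choose the candidate maximizing lm_weight * LM log-prob minus WFST cost."""
    return max(candidates, key=lambda c: lm_weight * lm_logprob(c[0]) - c[1])[0]

corpus = ["saint louis is a city", "saint louis is in missouri", "the street is long"]
lm = train_bigram_lm(corpus)
# Hypothetical WFST candidates for the written input "St. Louis is a city"
candidates = [("street louis is a city", 0.0), ("saint louis is a city", 0.0)]
print(pick_normalization(candidates, lm))
```

Because every candidate comes from the WFST, the LM can only choose among rule-licensed outputs, which is how this design avoids the unrecoverable errors of a purely neural system.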

preprint2022arXiv

Simple and statistically sound recommendations for analysing physical theories

Physical theories that depend on many parameters or are tested against data from many different experiments pose unique challenges to statistical inference. Many models in particle physics, astrophysics and cosmology fall into one or both of these categories. These issues are often sidestepped with statistically unsound ad hoc methods, involving intersection of parameter intervals estimated by multiple experiments, and random or grid sampling of model parameters. Whilst these methods are easy to apply, they exhibit pathologies even in low-dimensional parameter spaces, and quickly become problematic to use and interpret in higher dimensions. In this article we give clear guidance for going beyond these procedures, suggesting, where possible, simple methods for performing statistically sound inference, and recommending readily available software tools and standards that can assist in doing so. Our aim is to provide physicists lacking comprehensive statistical training with recommendations for reaching correct scientific conclusions, with only a modest increase in analysis burden. Our examples can be reproduced with the code publicly available at https://doi.org/10.5281/zenodo.4322283.
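One pathology of intersecting intervals can be seen with just two measurements. The sketch below assumes two independent Gaussian measurements `(mean, sigma)` of the same parameter and compares the naive 1σ interval intersection with the maximum of the joint likelihood (inverse-variance weighting). This is a standard textbook combination chosen for illustration, not necessarily the specific procedure the article recommends, and the numbers are made up.

```python
import math

def combine_gaussian(measurements):
    """Maximum-likelihood combination of independent Gaussian measurements
    (mean, sigma) of one parameter, i.e. inverse-variance weighting."""
    weights = [1.0 / s ** 2 for _, s in measurements]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)

def intersect_intervals(measurements, k=1.0):
    """Naive ad hoc method: intersect the k-sigma intervals (None if empty)."""
    lo = max(m - k * s for m, s in measurements)
    hi = min(m + k * s for m, s in measurements)
    return (lo, hi) if lo <= hi else None

data = [(1.0, 0.5), (2.2, 0.5)]  # hypothetical measurements
print(intersect_intervals(data))  # None: the 1-sigma intervals do not overlap
print(combine_gaussian(data))     # approximately (1.6, 0.354)
```

The intersection reports "no allowed region" for two measurements that differ by only about 1.7σ, whereas the joint likelihood gives a well-defined estimate whose uncertainty shrinks as data are added.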

preprint2022arXiv

SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks

SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner. However, SpeechSplit requires careful tuning of the autoencoder bottlenecks, which can be time-consuming and less robust. This paper proposes SpeechSplit 2.0, which constrains the information flow of the speech component to be disentangled on the autoencoder input using efficient signal processing methods instead of bottleneck tuning. Evaluation results show that SpeechSplit 2.0 achieves comparable performance to SpeechSplit in speech disentanglement and superior robustness to the bottleneck size variations. Our code is available at https://github.com/biggytruck/SpeechSplit2.

preprint2022arXiv

SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders

Self-supervised learning is an emerging machine learning paradigm. Compared to supervised learning, which leverages high-quality labeled datasets, self-supervised learning relies on unlabeled datasets to pre-train powerful encoders, which can then be used as feature extractors for various downstream tasks. The huge amounts of data and computational resources these encoders consume make them valuable intellectual property of the model owner. Recent research has shown that the copyright of machine learning models is threatened by model stealing attacks, which aim to train a surrogate model that mimics the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most current copyright protection efforts, such as watermarking, concentrate on classifiers, and the intrinsic challenges of protecting the copyright of pre-trained encoders remain largely unstudied. We fill this gap by proposing SSLGuard, the first watermarking scheme for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. A shadow training technique is also applied to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and that it is robust against model stealing and other watermark removal attacks, such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.

preprint2022arXiv

Stability and low-energy orientations of interphase boundaries in multiaxial ferroelectrics: Phase-field simulations

The coexistence of different ferroelectric phases enables tuning of macroscopic properties and a wide range of applications, from piezoelectric transducers to nonvolatile memories. Here we develop a thermodynamic model to predict the stability and low-energy orientations of boundaries between different phases in ferroelectrics. Taking lead zirconate titanate and bismuth ferrite as two examples, we demonstrate that the low-energy orientations of interphase boundaries are largely determined by minimizing the electrostatic and elastic energies. Phase-field simulations are employed to analyze the competition between the interfacial energy and the electrostatic and elastic energies. Our simulation results show that crystal symmetry could be lowered by the electrical and mechanical incompatibilities between the two phases, which can explain the low-symmetry phases observed experimentally near morphotropic phase boundaries. Our work provides theoretical foundations for understanding and controlling interphase boundaries in ferroelectric materials for multifunctional applications.

preprint2022arXiv

Strongly anisotropic electronic and magnetic structures in oxide dichlorides RuOCl$_2$ and OsOCl$_2$

Here, using density functional theory and density matrix renormalization group methods, we investigate the electronic and magnetic properties of RuOCl$_2$ and OsOCl$_2$ with $d^4$ electronic configurations. Unlike the previously studied VOI$_2$ with its $d^1$ configuration, these $4d^4$ and $5d^4$ systems do not exhibit a ferroelectric instability along the $a$-axis. Because the $d_{xy}$ orbital is fully occupied in RuOCl$_2$ and OsOCl$_2$, the Peierls distortion disappears along the $b$-axis, leading to an undistorted I${\rm mmm}$ phase (No. 71). Furthermore, we observe strongly anisotropic electronic and magnetic structures along the $a$-axis. The large crystal-field splitting (between the $d_{xz/yz}$ and $d_{xy}$ orbitals) and the large hopping between nearest-neighbor Ru or Os atoms suppress the spin-orbit coupling effect in $M$OCl$_2$ ($M$ = Ru or Os) with electronic density $n = 4$, resulting in a spin-1 system instead of a $J = 0$ singlet ground state. Moreover, we find staggered antiferromagnetic (AFM) order with $π$ wavevector along the $M$-O chain direction ($a$-axis), while the magnetic coupling along the $b$-axis is weak. Based on Wannier functions from first-principles calculations, we calculated the relevant hopping amplitudes and crystal-field splitting energies of the $t_{2g}$ orbitals of the Os atoms to construct a multi-orbital Hubbard model for the $M$-O chains. A staggered AFM $\uparrow$-$\downarrow$-$\uparrow$-$\downarrow$ spin structure dominates in our DMRG calculations, in agreement with the DFT calculations.

preprint2022arXiv

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

Recent years have witnessed extraordinary development in automatic speaker verification (ASV). However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, while recently proposed high-performance spoofing countermeasure (CM) models focus solely on standalone anti-spoofing tasks and ignore the subsequent speaker verification process. How to integrate CM and ASV remains an open question. A spoofing-aware speaker verification (SASV) challenge recently took place, premised on the argument that better performance can be delivered when both CM and ASV subsystems are optimized jointly. Under the challenge's scenario, the integrated systems proposed by participants are required to reject both impostor speakers and spoofing attacks from target speakers, which intuitively and effectively matches the expectations of a reliable, spoofing-robust ASV system. This work focuses on fusion-based SASV solutions and proposes a multi-model fusion framework to leverage the power of multiple state-of-the-art ASV and CM models. The proposed framework improves the SASV-EER from 8.75% to 1.17%, an 86% relative improvement over the best baseline system in the SASV challenge.
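The equal error rate (EER) reported above is the operating point where the false-rejection rate of target trials equals the false-acceptance rate of non-target trials (in SASV, non-targets include both impostors and spoofed target speech). A minimal sketch, assuming higher scores mean "accept" and using a simple threshold sweep over observed scores (production toolkits usually interpolate on the ROC curve instead), plus the relative-improvement arithmetic behind the 86% figure:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([tgt, non])):
        frr = np.mean(tgt < t)   # target trials wrongly rejected
        far = np.mean(non >= t)  # non-target trials wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return float(eer)

def relative_improvement(baseline, new):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - new) / baseline

print(round(relative_improvement(8.75, 1.17), 3))  # 0.866
```

The exact ratio, (8.75 - 1.17) / 8.75 ≈ 0.866, is consistent with the "86% relative improvement" stated in the abstract.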

preprint2022arXiv

Teacher Model Fingerprinting Attacks Against Transfer Learning

Transfer learning has become a common solution to training data scarcity in practice. It trains a specified student model by reusing or fine-tuning early layers of a well-trained teacher model that is usually publicly available. However, besides improving utility, the transferred public knowledge also poses potential threats to model confidentiality and can further raise other security and privacy issues. In this paper, we present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context, aiming to gain deeper insight into the tension between public knowledge and model confidentiality. To this end, we propose a teacher model fingerprinting attack to infer the origin of a student model, i.e., the teacher model it transfers from. Specifically, we propose a novel optimization-based method that carefully generates queries to probe the student model. Unlike existing model reverse engineering approaches, our fingerprinting method relies neither on fine-grained model outputs, e.g., posteriors, nor on auxiliary information about the model architecture or training dataset. We systematically evaluate the effectiveness of the proposed attack. The empirical results demonstrate that our attack can accurately identify the model origin with few probing queries. Moreover, we show that the proposed attack can serve as a stepping stone for other attacks against machine learning models, such as model stealing.

preprint2022arXiv

The X-ray transform on a generic family of smooth curves

We study the X-ray transform over a generic family of smooth curves in $\mathbb{R}^2$ with a Riemannian metric $g$. We show that the singularities cannot be recovered from local data in the presence of conjugate points, and therefore artifacts may arise in the reconstruction. We perform numerical experiments to illustrate the results.

preprint2022arXiv

Theoretical study of the crystal and electronic properties of $α$-RuI$_3$

The material $α$-RuCl$_3$, with a two-dimensional Ru-honeycomb sublattice, has attracted considerable attention because it may be a realization of the Kitaev quantum spin liquid (QSL). Recently, a new honeycomb material, $α$-RuI$_3$, was prepared under moderately high pressure, and it is stable under ambient conditions. However, unlike $α$-RuCl$_3$, $α$-RuI$_3$ was reported to be a paramagnetic metal without long-range magnetic order down to $0.35$ K. Here, the structural and electronic properties of quasi-two-dimensional $α$-RuI$_3$ are studied theoretically. First, based on first-principles density functional theory (DFT) calculations, the ABC-stacked honeycomb-layer $R\overline{3}$ (No. 148) structure is found to be the most likely stacking order for $α$-RuI$_3$ along the $c$-axis. Furthermore, both the $R\overline{3}$ and $P\overline{3}1c$ structures are dynamically stable, because no imaginary-frequency modes were obtained in the phonon dispersion spectrum. Moreover, the physical behavior of $α$-RuI$_3$, which differs from that of $α$-RuCl$_3$, can be understood naturally. The strong hybridization between Ru $4d$ and I $5p$ orbitals decreases the effective atomic Hubbard repulsion $U$, making the electrons of RuI$_3$ less localized than those of RuCl$_3$. This reduction of the effective repulsion $U$ from Cl to I leads to the metallic nature of $α$-RuI$_3$. Based on DFT+$U$ ($U_{\rm eff} = 2$ eV) plus spin-orbit coupling (SOC), we obtained spin-orbit Mott insulating behavior for $α$-RuCl$_3$ and, by the same procedure, metallic behavior for $α$-RuI$_3$, in good agreement with experimental results. Furthermore, when a large (unrealistic) $U_{\rm eff} = 6$ eV is introduced, the spin-orbit Mott gap opens in $α$-RuI$_3$ as well, supporting the physical picture we propose.

preprint2022arXiv

Thermodynamics of the Reissner-Nordström-de Sitter Spacetime with Quintessence

For anti-de Sitter (AdS) black holes, the isochoric heat capacity of the system vanishes, while the isobaric heat capacity does not. However, this does not hold for de Sitter (dS) black holes. In this work, by introducing the interaction between the black hole horizon and the cosmological horizon of the Reissner-Nordström-de Sitter (RNdS) spacetime with quintessence, we discuss the phase transition of this system. The results show that the spacetime not only exhibits phase transition behavior similar to that of a Van der Waals (VdW) system, but also has a non-vanishing isochoric heat capacity that completes the thermodynamic description of the whole system. Through a discussion of the entropic force between the two horizons, we identify the role of the entropic force in the evolution of the spacetime. In addition, we study the influence of various parameters on the phase transition and the entropic force, which may provide a new method for exploring the interaction among black hole molecules from a microscopic perspective.

preprint2022arXiv

Towards Realistic Visual Dubbing with Heterogeneous Sources

The task of few-shot visual dubbing focuses on synchronizing lip movements with arbitrary speech input for any talking-head video. Although current approaches have made moderate improvements, they commonly require high-quality homologous video and audio sources and thus fail to leverage heterogeneous data sufficiently. In practice, collecting perfectly homologous data may be intractable in some cases, for example, audio-corrupted or picture-blurry videos. To exploit such data and support high-fidelity few-shot visual dubbing, in this paper we propose a simple yet efficient two-stage framework that mines heterogeneous data more flexibly. Specifically, our two-stage paradigm employs facial landmarks as an intermediate prior of latent representations and disentangles lip movement prediction from the core task of realistic talking-head generation. In this way, the two sub-networks can be trained independently, each on more readily available heterogeneous data. Moreover, thanks to the disentanglement, our framework allows further fine-tuning for a given talking head, leading to better preservation of speaker identity in the final synthesized results. The proposed method can also transfer appearance features from other speakers to the target speaker. Extensive experimental results demonstrate that our method outperforms the state of the art in generating highly realistic videos synchronized with speech.

preprint2022arXiv

Transcranial photoacoustic computed tomography of human brain function

Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving transcranial functional imaging.

preprint2022arXiv

Trust It or Not: Confidence-Guided Automatic Radiology Report Generation

Medical imaging plays a pivotal role in diagnosis and treatment in clinical practice. Inspired by significant progress in automatic image captioning, various deep learning (DL)-based methods have been proposed to generate radiology reports for medical images. Despite promising results, previous works overlook the uncertainties of their models and are thus unable to provide clinicians with the reliability/confidence of the generated radiology reports to assist their decision-making. In this paper, we propose a novel method to explicitly quantify both the visual uncertainty and the textual uncertainty for DL-based radiology report generation. These multi-modal uncertainties capture the model's confidence at both the report level and the sentence level, and they are further used to weight the losses for more comprehensive model optimization. Experimental results demonstrate that the proposed method for model uncertainty characterization and estimation produces more reliable confidence scores for radiology report generation, and that the modified loss function, which takes the uncertainties into account, leads to better model performance on two public radiology report datasets. In addition, human raters manually evaluated the quality of the automatically generated reports, and the results also indicate that the proposed uncertainties can reflect the variance of clinical diagnosis.

preprint2022arXiv

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Developing such a system can significantly improve the availability of speech technology to languages without a large amount of parallel speech and text data. This paper proposes an unsupervised TTS system based on an alignment module that outputs pseudo-text and a synthesis module that uses pseudo-text for training and real text for inference. Our unsupervised system can achieve comparable performance to the supervised system in seven languages with about 10-20 hours of speech each. A careful study of the effect of text units and vocoders has also been conducted to better understand what factors may affect unsupervised TTS performance. The samples generated by our models can be found at https://cactuswiththoughts.github.io/UnsupTTS-Demo, and our code can be found at https://github.com/lwang114/UnsupTTS.

preprint2022arXiv

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

Large-scale auto-regressive language models pretrained on massive text have demonstrated an impressive ability to perform new natural language tasks with only a few text examples, without the need for fine-tuning. Recent studies further show that such few-shot learning ability can be extended to the text-image setting by training an encoder to encode images into embeddings that function like the text embeddings of the language model. Interested in exploring the possibility of transferring the few-shot learning ability to the audio-text setting, we propose a novel speech understanding framework, WavPrompt, in which we fine-tune a wav2vec model to generate a sequence of audio embeddings understood by the language model. We show that WavPrompt is a few-shot learner that can perform speech understanding tasks better than a naive text baseline. We conduct detailed ablation studies on different components and hyperparameters to empirically identify the best model configuration. In addition, we conduct a non-speech understanding experiment to show that WavPrompt can extract more information than just the transcriptions. Code is available at https://github.com/Hertin/WavPrompt.

preprint2022arXiv

Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Chatbots are used in many applications, e.g., automated agents, smart home assistants, and interactive characters in online games. Therefore, it is crucial to ensure that they do not behave in undesired ways, such as providing offensive or toxic responses to users. This is not a trivial task, as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots. We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries. Even more worryingly, some non-toxic queries can trigger toxic responses too. We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries that make chatbots respond in a toxic manner. Our extensive experimental evaluation demonstrates that our attack is effective against public chatbot models and outperforms manually-crafted malicious queries proposed by previous work. We also evaluate three defense mechanisms against ToxicBuddy, showing that they either reduce the attack performance at the cost of affecting the chatbot's utility or are only effective at mitigating a portion of the attack. This highlights the need for more research from the computer security and online safety communities to ensure that chatbot models do not hurt their users. Overall, we are confident that ToxicBuddy can be used as an auditing tool and that our work will pave the way toward designing more effective defenses for chatbot safety.

preprint2022arXiv

Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019

This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors, and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks with limited time and computational resources, favoring solutions that produce results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matching the data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes a lasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".
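
The interplay of model selector, model/learner, evaluator, and ensembling under a time limit can be pictured as a small loop. The sketch below assumes greedy forward ensemble selection (a learner is kept only if it improves the validation score) and omits the meta-learner and data ingestor; the function names and that selection rule are illustrative, not taken from any winning solution.

```python
import time
from typing import Callable, List, Optional, Sequence

# Multi-label scores: one list of per-label probabilities per example.
Predictions = List[List[float]]


def average_ensemble(members: Sequence[Predictions]) -> Predictions:
    """Average per-label scores over the kept learners."""
    n = len(members)
    n_examples = len(members[0])
    n_labels = len(members[0][0])
    return [
        [sum(m[i][j] for m in members) / n for j in range(n_labels)]
        for i in range(n_examples)
    ]


def run_pipeline(
    learners: Sequence[Callable[[], Predictions]],
    evaluate: Callable[[Predictions], float],
    time_budget_s: float,
) -> Optional[Predictions]:
    """Greedy ensemble selection under a wall-clock budget (hypothetical).

    `learners` stands in for the model selector's ranked candidates; each call
    trains one model/learner and returns its validation predictions.
    `evaluate` plays the evaluator role (higher is better).
    """
    start = time.monotonic()
    kept: List[Predictions] = []
    best_score = float("-inf")
    best: Optional[Predictions] = None
    for train_and_predict in learners:
        if time.monotonic() - start > time_budget_s:
            break  # respect the hard time limit
        kept.append(train_and_predict())
        candidate = average_ensemble(kept)
        score = evaluate(candidate)
        if score > best_score:
            best_score, best = score, candidate
        else:
            kept.pop()  # drop learners that do not improve the ensemble
    return best
```

Because the loop returns the best ensemble seen so far, it always has a usable answer when the budget runs out, which matches the challenge's emphasis on producing results quickly.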

preprint2021arXiv

"Go eat a bat, Chang!": On the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19

The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, many countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people, since COVID-19 is believed to have originated in China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a period of approximately five months and analyze them to investigate whether there is a rise in, or important differences in, the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/ and, to a lesser extent, on mainstream ones like Twitter. Also, using word embeddings over time, we characterize the evolution and emergence of new Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs.

preprint2021arXiv

$t$-$k$-means: A Robust and Stable $k$-means Variant

The $k$-means algorithm is one of the most classical clustering methods and has been widely and successfully used in signal processing. However, because the Gaussian distribution is thin-tailed, $k$-means performs relatively poorly on datasets containing heavy-tailed data or outliers. Besides, the standard $k$-means algorithm has relatively weak stability, i.e., its results have a large variance, which reduces its credibility. In this paper, we propose a robust and stable $k$-means variant, dubbed $t$-$k$-means, as well as a fast version of it, to alleviate these problems. Theoretically, we derive $t$-$k$-means and analyze its robustness and stability from the perspective of the loss function and of the expression for the cluster center, respectively. Extensive experiments verify the effectiveness and efficiency of the proposed method. The code for reproducing the main results is available at https://github.com/THUYimingLi/t-k-means.
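
To see why a heavy-tailed model helps, the sketch below keeps the standard nearest-center assignment but replaces the plain mean with a weighted mean, using the per-point weight $(\nu + d)/(\nu + \lVert x-\mu\rVert^2/\sigma^2)$ from the EM update of a Student-t mixture. This is an illustrative, assumption-laden variant (fixed isotropic scale, fixed $\nu$, hypothetical function names), not the authors' derivation of $t$-$k$-means or its fast version.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def t_weighted_kmeans(
    points: Sequence[Point],
    init_centers: Sequence[Point],
    nu: float = 1.0,
    scale: float = 1.0,
    iters: int = 20,
) -> Tuple[List[Point], List[int]]:
    """k-means with Student-t weights in the center update (illustrative).

    nu: degrees of freedom (smaller = heavier tails = stronger down-weighting).
    scale: assumed shared isotropic variance of each cluster.
    """
    dim = len(points[0])
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest center, exactly as in standard k-means.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centers.append(c)  # keep an empty cluster's center
                continue
            # Student-t EM weight: (nu + d) / (nu + squared distance / scale).
            # Distant points (likely outliers) receive weights close to zero.
            weights = [(nu + dim) / (nu + sq_dist(p, c) / scale) for p in members]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[j] for w, p in zip(weights, members)) / total for j in range(dim)]
            )
        centers = new_centers
    return centers, labels
```

For example, with points 0, 0.2, -0.2, -50 near the first initial center 0, plain $k$-means would move that center to the mean -12.5 on its first update, whereas the weighted update stays close to 0 because the outlier at -50 receives a weight of about 0.0008.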

preprint2021arXiv

A Unified Light Framework for Real-time Fault Detection of Freight Train Images

Real-time fault detection for freight trains plays a vital role in guaranteeing the security and optimal operation of railway transportation under stringent resource requirements. Despite the promising results of deep learning based approaches, the performance of these fault detectors on freight train images is far from satisfactory in both accuracy and efficiency. This paper proposes a unified light framework that improves detection accuracy while supporting real-time operation with low resource requirements. We first design a novel lightweight backbone (RFDNet) to improve accuracy and reduce computational cost. Then, we propose a multi-region proposal network that uses multi-scale feature maps generated by RFDNet to improve detection performance. Finally, we present multi-level position-sensitive score maps and region of interest pooling to further improve accuracy with few redundant computations. Extensive experimental results on public benchmark datasets suggest that RFDNet significantly improves on the baseline network in both accuracy and efficiency. Experiments on six fault datasets show that our method runs in real time at over 38 frames per second and achieves competitive accuracy with lower computation than state-of-the-art detectors.

preprint2021arXiv

Affinity Fusion Graph-based Framework for Natural Image Segmentation

This paper proposes an affinity fusion graph framework that effectively connects different graphs with high discriminating power and nonlinearity for natural image segmentation. The proposed framework combines adjacency-graphs and kernel spectral clustering based graphs (KSC-graphs) according to a newly defined notion of affinity nodes of multi-scale superpixels. These affinity nodes are selected based on a better affiliation of superpixels, namely a subspace-preserving representation generated by sparse subspace clustering based on subspace pursuit. Then a KSC-graph is built via a novel kernel spectral clustering to explore the nonlinear relationships among these affinity nodes. Moreover, an adjacency-graph is constructed at each scale and used to update the proposed KSC-graph at the affinity nodes. The fusion graph is built across different scales and partitioned to obtain the final segmentation result. Experimental results on the Berkeley segmentation dataset and the Microsoft Research Cambridge dataset show the superiority of our framework in comparison with state-of-the-art methods. The code is available at https://github.com/Yangzhangcst/AF-graph.

preprint2021arXiv

AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21

Acronym identification focuses on finding acronyms and the phrases they abbreviate, which is crucial for scientific document understanding tasks. However, the limited size of manually annotated datasets hinders further progress on the problem. Recent breakthroughs in language models pre-trained on large corpora clearly show that unsupervised pre-training can vastly improve performance on downstream tasks. In this paper, we present an Adversarial Training BERT method named AT-BERT, our winning solution to the acronym identification task of the Scientific Document Understanding (SDU) Challenge at AAAI 2021. Specifically, pre-trained BERT is adopted to capture better semantic representations. Then we incorporate the FGM (Fast Gradient Method) adversarial training strategy into the fine-tuning of BERT, which makes the model more robust and better at generalizing. Furthermore, an ensemble mechanism is devised to combine the representations learned by multiple BERT variants. With all these components assembled, experimental results on the SciAI dataset show that our proposed approach outperforms all other competitive state-of-the-art methods.
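
FGM perturbs the input embeddings along the L2-normalized gradient of the loss, $r_{\mathrm{adv}} = \epsilon\, g / \lVert g \rVert_2$. In a typical FGM fine-tuning step one computes the clean loss and backpropagates, adds $r_{\mathrm{adv}}$ to the embedding layer, computes and backpropagates the adversarial loss, restores the embeddings, and then takes the optimizer step. The pure-Python sketch below shows only the perturbation itself; the function names are hypothetical, and the $\epsilon$ value used by AT-BERT is not stated here.

```python
import math
from typing import List, Sequence


def fgm_perturbation(grad: Sequence[float], epsilon: float = 1.0) -> List[float]:
    """Fast Gradient Method: r_adv = epsilon * g / ||g||_2.

    `grad` is the gradient of the loss w.r.t. a (flattened) embedding vector.
    A zero gradient yields a zero perturbation instead of dividing by zero.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(
    embedding: Sequence[float], grad: Sequence[float], epsilon: float = 1.0
) -> List[float]:
    """Embedding used for the extra adversarial forward/backward pass."""
    return [e + r for e, r in zip(embedding, fgm_perturbation(grad, epsilon))]
```

Normalizing by the L2 norm fixes the perturbation size at $\epsilon$ regardless of how large the raw gradient is, so $\epsilon$ directly controls the strength of the adversarial regularization.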

preprint2021arXiv

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration allocated Tacotron-like acoustic models and WaveRNN neural vocoders. Unlike conventional SVS models, ByteSing employs Tacotron-like encoder-decoder structures as its acoustic models, in which CBHG models and recurrent neural networks (RNNs) serve as the encoders and decoders, respectively. Meanwhile, an auxiliary phoneme duration prediction model is used to expand the input sequence, which improves controllability, model stability, and tempo prediction accuracy. WaveRNN neural vocoders are adopted to further improve the voice quality of the synthesized songs. Both objective and subjective experimental results show that the proposed SVS method can produce natural, expressive, and high-fidelity songs by improving pitch and spectrogram prediction accuracy, and that the models using an attention mechanism achieve the best performance.
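
Expanding the input sequence with predicted durations is commonly done with a "length regulator" that repeats each phoneme's encoding once per output frame. The sketch below shows that generic step under stated assumptions; the function names and the 12.5 ms frame shift are hypothetical and not taken from ByteSing.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_by_duration(encodings: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each phoneme encoding for its (predicted) number of frames."""
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative frame counts")
    frames: List[T] = []
    for enc, d in zip(encodings, durations):
        frames.extend([enc] * d)
    return frames


def seconds_to_frames(seconds: Sequence[float], frame_shift_s: float = 0.0125) -> List[int]:
    """Convert durations in seconds to frame counts (frame shift is an assumed value)."""
    return [max(0, round(s / frame_shift_s)) for s in seconds]
```

After expansion, the decoder sees one input per acoustic frame, so the tempo is fixed by the durations rather than learned implicitly through attention.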

preprint2021arXiv

Discovery of carbon-based strongest and hardest amorphous material

Carbon is likely the most fascinating element of the periodic table because of the diversity of its allotropes stemming from its variable (sp, sp2, and sp3) bonding motifs. Exploration of new forms of carbon has been a perennial theme of contemporary scientific research. Here we report on novel amorphous carbon phases containing a high fraction of sp3-bonded atoms, recovered after compressing fullerene C60 at previously unexplored high pressures and temperatures. The synthesized carbons are the hardest and strongest amorphous materials known to date: they can scratch diamond crystal and approach its strength, as evidenced by complementary mechanical tests. Photoluminescence and absorption spectra show that the materials are semiconductors with tunable bandgaps in the range of 1.5-2.2 eV, comparable to that of amorphous silicon. This remarkable combination of outstanding mechanical and electronic properties makes this class of amorphous carbons an excellent candidate for photovoltaic applications demanding ultrahigh strength and wear resistance.

preprint2021arXiv

Does Non-COVID19 Lung Lesion Help? Investigating Transferability in COVID-19 CT Image Segmentation

Coronavirus disease 2019 (COVID-19) is a highly contagious disease that has spread all around the world. Deep learning has been adopted as an effective technique to aid COVID-19 detection and segmentation from computed tomography (CT) images. The major challenge lies in the shortage of public COVID-19 datasets. Recently, transfer learning has become a widely used technique that leverages the knowledge gained while solving one problem and applies it to a different but related problem. However, it remains unclear whether various non-COVID19 lung lesions could contribute to segmenting COVID-19 infection areas and how best to conduct this transfer procedure. This paper provides a way to understand the transferability of non-COVID19 lung lesions. Based on a publicly available COVID-19 CT dataset and three public non-COVID19 datasets, we evaluate four transfer learning methods using 3D U-Net as a standard encoder-decoder method. The results reveal the benefits of transferring knowledge from non-COVID19 lung lesions and show that learning from multiple lung lesion datasets can extract more general features, leading to accurate and robust pre-trained models. We further show the capability of the encoder to learn feature representations of lung lesions, which improves segmentation accuracy and facilitates training convergence. In addition, our proposed Hybrid-encoder learning method effectively incorporates transferred lung lesion features from non-COVID19 datasets and achieves significant improvement. These findings offer new insights into transfer learning for COVID-19 CT image segmentation, which can be further generalized to other medical tasks.

preprint2021arXiv

Dye-Encapsulated Zeolitic Imidazolate Framework (ZIF-71) for Fluorochromic Sensing of Pressure, Temperature, and Volatile Solvents

Luminescent metal-organic frameworks (MOFs) offer a multifunctional platform for creating non-invasive sensors and tuneable optoelectronics. However, fluorochromic materials that are photophysically resilient and show high sensitivity towards different physical and chemical stimuli are scarce. We report a facile host-guest nanoconfinement strategy to construct a fluorescent hybrid material with multiple sensing capabilities. We design and fabricate a new Guest@MOF material comprising a zeolitic MOF (ZIF-71) as a nanoporous host for encapsulating rhodamine B (RhB dye) guest molecules, resulting in the RhB@ZIF-71 system with mechanochromic, thermochromic, and solvatochromic sensing responses. The fluorochromic sensing properties stem from the nanoconfinement effect that ZIF-71 imposes on RhB monomers, yielding H- or J-type aggregates with tuneable photophysical and photochemical properties. For mechanochromism, external pressure causes a linear red shift of the emission, switching RhB guests from H-type to J-type aggregates through a shear deformation. For thermochromism, we demonstrate a linear scaling as a function of temperature due to the spatial restriction imposed on J-type aggregates incarcerated in ZIF-71 pores. Harnessing the solvatochromism of RhB@ZIF-71, we identified three diverse groups of volatile organic compounds. The multimodal sensing response paves the way to smart applications such as photonic pressure sensors, non-invasive thermometers, and ultrasensitive chemosensors.

preprint2021arXiv

Evidence for $Z_{c}^{\pm}$ decays into the $ρ^{\pm} η_{c}$ final state

We study $e^{+}e^{-}$ collisions with a $\pi^{+}\pi^{-}\pi^{0}\eta_{c}$ final state using data samples collected with the BESIII detector at center-of-mass energies $\sqrt{s}=4.226$, $4.258$, $4.358$, $4.416$, and $4.600$ GeV. Evidence for the decay $Z_c(3900)^{\pm}\to\rho^{\pm}\eta_c$ is reported with a statistical significance of $3.9\sigma$, with various systematic uncertainties taken into account, at $\sqrt{s} = 4.226$ GeV, and the Born cross section times branching fraction $\sigma^{B}(e^{+}e^{-}\to\pi^{\mp}Z_c(3900)^{\pm})\times\mathcal{B}(Z_c(3900)^{\pm}\to\rho^{\pm}\eta_c)$ is measured to be $(48 \pm 11 \pm 11)\,\mathrm{pb}$. The $Z_c(3900)^{\pm}\to\rho^{\pm}\eta_c$ signal is not significant at the other center-of-mass energies, and the corresponding upper limits are determined. In addition, no significant signal is observed in a search for $Z_c(4020)^{\pm}\to\rho^{\pm}\eta_c$ with the same data samples. The ratios $R_{Z_c(3900)}=\mathcal{B}(Z_c(3900)^{\pm}\to\rho^{\pm}\eta_c)/\mathcal{B}(Z_c(3900)^{\pm}\to\pi^{\pm}J/\psi)$ and $R_{Z_c(4020)}=\mathcal{B}(Z_c(4020)^{\pm}\to\rho^{\pm}\eta_c)/\mathcal{B}(Z_c(4020)^{\pm}\to\pi^{\pm}h_c)$ are obtained and used to discriminate between different theoretical interpretations of the $Z_c(3900)^{\pm}$ and $Z_c(4020)^{\pm}$.

preprint2021arXiv

Explicit generators and relations for the centre of the quantum group

For the standard Drinfeld-Jimbo quantum group ${\rm U}_q(\mathfrak{g})$ associated with a simple Lie algebra $\mathfrak{g}$, we construct explicit generators of the centre $Z({\rm U}_q(\mathfrak{g}))$ and determine the relations satisfied by the generators. For $\mathfrak{g}$ of type $A_n\ (n\geq 2)$, $D_{2k+1}\ (k\geq 2)$, or $E_6$, the centre $Z({\rm U}_q(\mathfrak{g}))$ is isomorphic to a quotient of a polynomial algebra in multiple variables, which is described in a uniform manner for all cases. For $\mathfrak{g}$ of any other type, $Z({\rm U}_q(\mathfrak{g}))$ is generated by $n=\operatorname{rank}(\mathfrak{g})$ algebraically independent elements.

preprint2021arXiv

Fuzzing Based on Function Importance by Interprocedural Control Flow Graph

Coverage-based graybox fuzzers (CGFs) such as AFL have gained great success in vulnerability detection thanks to their ease of use and bug-finding power. Since some code fragments, such as memory allocation, are more vulnerable than others, various techniques have been proposed to explore the more vulnerable areas by collecting extra information from the program under test or its executions. However, these improvements consider only limited types of information sources and ignore the fact that the priority of a seed input for fuzzing may be influenced by all the code it covers. Based on these observations, we propose a fuzzing method based on the importance of functions. First, a data structure called the Attributed Interprocedural Control Flow Graph (AICFG) is devised to combine different features of code fragments. Second, the importance of each node in the AICFG is calculated with an improved PageRank algorithm, which also models the influence between connected nodes. During fuzzing, node importance is updated periodically by a propagation algorithm. The seed selection and energy scheduling of a seed input are then determined by the importance of its execution trace. We implement this approach on top of AFL in a tool named FunAFL and evaluate it on 14 real-world programs against AFL and two of its improvements. After 72 hours, FunAFL achieves 17% higher branch coverage than the others on average and finds 13 bugs, 3 of which have been confirmed as CVEs.
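
The abstract does not spell out FunAFL's "improved PageRank" or the AICFG attributes, so the sketch below rests on two assumptions: node attributes (for example, whether a function allocates memory) have already been collapsed into a non-negative prior score, which serves as the teleport (personalization) vector of a standard PageRank; and a seed's priority is the summed importance of the distinct nodes on its trace. The names `weighted_pagerank` and `seed_importance` are hypothetical.

```python
from typing import Dict, Iterable, List


def weighted_pagerank(
    edges: Dict[str, List[str]],
    prior: Dict[str, float],
    damping: float = 0.85,
    iters: int = 100,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """PageRank whose teleport vector is a per-node attribute prior."""
    nodes = set(edges) | {m for outs in edges.values() for m in outs} | set(prior)
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total > 0:
        teleport = {n: prior.get(n, 0.0) / total for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            outs = edges.get(n, [])
            if outs:
                # Importance flows along call/control edges to connected nodes.
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
            else:
                # Dangling node (e.g. a leaf function): spread its mass via the prior.
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def seed_importance(trace: Iterable[str], rank: Dict[str, float]) -> float:
    """Score a seed by the summed importance of the distinct nodes it covers."""
    return sum(rank.get(n, 0.0) for n in set(trace))
```

A fuzzer could then pick seeds, or assign them more energy, in decreasing order of `seed_importance`; recomputing `weighted_pagerank` periodically would loosely mirror the periodic importance updates described above.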

preprint2021arXiv

Gluino-SUGRA scenarios in light of FNAL muon g-2 anomaly

Gluino-SUGRA ($\tilde{g}$SUGRA), an economical extension of the predictive mSUGRA, adopts a gluino mass parameter much heavier than the other gaugino mass parameters and the universal scalar mass parameter at the unification scale. It can elegantly reconcile the experimental results on the Higgs boson mass, the muon $g-2$, the null results of searches for supersymmetry at the LHC, and the results from B-physics. In this work, we propose several new ways to generate a large gaugino hierarchy (i.e., $M_3\gg M_1,M_2$) for $\tilde{g}$SUGRA model building and then discuss in detail the implications of the new muon $g-2$ results, together with the updated LHC constraints, for such $\tilde{g}$SUGRA scenarios. We obtain the following observations: (i) for the most interesting $M_1=M_2$ case at the GUT scale with a viable bino-like dark matter, the $\tilde{g}$SUGRA model can explain the muon $g-2$ anomaly at the $1\sigma$ level and be consistent with the updated LHC constraints for $6\leq M_3/M_1 \leq 9$ at the GUT scale; (ii) for $M_1:M_2=5:1$ at the GUT scale with wino-like dark matter, the $\tilde{g}$SUGRA model can explain the muon $g-2$ anomaly at the $2\sigma$ level and be consistent with the updated LHC constraints for $3\leq M_3/M_1 \leq 3.2$ at the GUT scale; (iii) for $M_1:M_2=3:2$ at the GUT scale with mixed bino-wino dark matter, the $\tilde{g}$SUGRA model can explain the muon $g-2$ anomaly at the $1\sigma$ level and be consistent with the updated LHC constraints for $6.9\leq M_3/M_1 \leq 7.5$ at the GUT scale. Although a heavy gluino always increases the fine-tuning (FT) involved, some of the points surviving at $1\sigma/2\sigma$ for $\Delta a_\mu^{\mathrm{combine}}$ still allow a low electroweak fine-tuning (EWFT) of order a few hundred and are fairly natural. Constraints from (dimension-five operator induced) proton decay are also discussed.

preprint2021arXiv

Interfacial ferroelectricity in rhombohedral-stacked bilayer transition metal dichalcogenides

Van der Waals (vdW) materials have greatly expanded our design space of heterostructures by allowing individual layers to be stacked at non-equilibrium configurations, for example via control of the twist angle. Such heterostructures not only combine characteristics of the individual building blocks but can also exhibit emergent physical properties, absent in the parent compounds, through interlayer interactions. Here we report on a new family of emergent, nanometer-thick, semiconducting 2D ferroelectrics whose individual constituents are well-studied non-ferroelectric monolayer transition metal dichalcogenides (TMDs), namely WSe2, MoSe2, WS2, and MoS2. By stacking two identical monolayer TMDs in parallel, we obtain electrically switchable rhombohedral-stacking configurations whose out-of-plane polarization is flipped by in-plane sliding motion. Fabricating nearly parallel stacked bilayers enables the visualization of moiré ferroelectric domains, as well as electric-field-induced domain wall motion, with piezoelectric force microscopy (PFM). Furthermore, using a nearby graphene electronic sensor in a ferroelectric field-effect transistor geometry, we quantify the ferroelectric built-in interlayer potential, in good agreement with first-principles calculations. The novel semiconducting ferroelectric properties of these four TMD bilayers open up the possibility of studying the interplay between ferroelectricity and their rich electronic and optical properties.

preprint2021arXiv

Node-Level Membership Inference Attacks Against Graph Neural Networks

Much real-world data comes in the form of graphs, such as social networks and protein structures. To fully utilize the information contained in graph data, a new family of machine learning (ML) models, namely graph neural networks (GNNs), has been introduced. Previous studies have shown that machine learning models are vulnerable to privacy attacks. However, most current efforts concentrate on ML models trained on data from Euclidean space, like images and texts, while privacy risks stemming from GNNs remain largely unstudied. In this paper, we fill the gap by performing the first comprehensive analysis of node-level membership inference attacks against GNNs. We systematically define the threat models and propose three node-level membership inference attacks based on an adversary's background knowledge. Our evaluation on three GNN structures and four benchmark datasets shows that GNNs are vulnerable to node-level membership inference even when the adversary has minimal background knowledge. We also show that graph density and feature similarity have a major impact on the attack's success. We further investigate two defense mechanisms, and the empirical results indicate that these defenses can reduce attack performance, but at the cost of moderate utility loss.
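
The simplest attack in this family thresholds the target model's confidence: nodes seen during training tend to receive more confident posteriors. The sketch below is that generic baseline, not one of the paper's three attacks. It assumes the adversary can query each node's posterior and can calibrate the threshold on shadow data with known membership; the function names are hypothetical.

```python
from typing import List, Sequence


def calibrate_threshold(
    member_scores: Sequence[float], nonmember_scores: Sequence[float]
) -> float:
    """Choose the threshold maximizing balanced accuracy on shadow data."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        tnr = sum(s < t for s in nonmember_scores) / len(nonmember_scores)
        acc = (tpr + tnr) / 2
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def infer_membership(posteriors: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """Predict 'member' when the top class probability reaches the threshold."""
    return [max(p) >= threshold for p in posteriors]
```

Balanced accuracy is used here so that unequal numbers of shadow members and non-members do not bias the chosen threshold.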

preprint2021arXiv

Orbital ordering in the layered perovskite material CsVF$_4$

In strongly correlated electronic systems, several novel physical properties are induced by the orbital degree of freedom. In particular, orbital degeneracy near the Fermi level leads to spontaneous symmetry breaking, such as the nematic state in FeSe and orbital ordering in several perovskite systems. Here, the novel layered perovskite material CsVF$_4$, with a $3d^2$ electronic configuration, was systematically studied using density functional theory and a multiorbital Hubbard model within the Hartree-Fock approximation. Our results show that CsVF$_4$ should be magnetic, with a G-type antiferromagnetic arrangement in the $ab$ plane and weak antiferromagnetic exchange along the $c$-axis, in agreement with experimental results. Driven by the Jahn-Teller distortion of the VF$_6$ octahedra, which shortens the $c$-axis, the system displays an interesting electron occupancy $d_{xy}^1(d_{xz}d_{yz})^1$: the lower, nondegenerate $d_{xy}$ orbital is half-filled, and the two degenerate $d_{yz}$ and $d_{xz}$ orbitals share one electron per site. We show that this degeneracy is lifted, and both the first-principles and Hubbard model calculations predict a novel $d_{yz}$/$d_{xz}$ staggered orbital pattern. This orbital ordering is driven by the electronic instability associated with removing the degeneracy to lower the energy.

preprint2021arXiv

Q-dependent Collective Relaxation Dynamics of Glass-Forming Liquid Ca0.4K0.6(NO3)1.4 Investigated by Wide-Angle Neutron Spin-Echo

Employing wide-angle neutron spin echo spectroscopy, we measured the Q-dependent coherent intermediate scattering function of the prototypical ionic glass former Ca0.4K0.6(NO3)1.4 in the equilibrium and supercooled liquid states beyond the hydrodynamic regime. The data reveal a clear two-step relaxation: a fast exponential process and a slow stretched-exponential alpha process. De Gennes narrowing is observed in all characteristic variables of the alpha process: the relaxation time, amplitude, and stretching exponent. At all length scales probed, the relative amplitude of the alpha-relaxation decreases with increasing temperature and levels off in the normal liquid state. The temperature dependence of the stretching exponent and the relaxation time at different Q values indicates that modifications of the relaxation mechanisms at local length scales, manifested as temperature-independent dynamic heterogeneity and smaller deviations from Arrhenius behavior, occur even above the alpha-beta (Johari-Goldstein) bifurcation temperature.
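
A common way to write such a two-step decay, assumed here for illustration (the paper's exact fit function and normalization may differ), is the sum of an exponential fast term and a Kohlrausch-Williams-Watts (stretched-exponential) alpha term. The relative alpha amplitude $f_\alpha$, relaxation time $\tau_\alpha$, and stretching exponent $\beta$ are the characteristic variables of the alpha process named above:

```latex
\frac{F(Q,t)}{F(Q,0)} = \bigl[1 - f_\alpha(Q,T)\bigr]\, e^{-t/\tau_{\mathrm{f}}(Q,T)}
  + f_\alpha(Q,T)\, \exp\!\left[-\left(\frac{t}{\tau_\alpha(Q,T)}\right)^{\beta(Q,T)}\right],
  \qquad 0 < \beta \le 1 .
```

Setting $\beta = 1$ turns the alpha term into a simple exponential, so smaller $\beta$ signals a broader distribution of relaxation times.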

preprint2021arXiv

Quantum versus Classical Regime in Circuit Quantum Acoustodynamics

We experimentally study a circuit quantum acoustodynamics system consisting of a superconducting artificial atom coupled to both a two-dimensional surface acoustic wave resonator and a one-dimensional microwave transmission line. The strong coupling between the artificial atom and the acoustic wave resonator is confirmed by the observation of vacuum Rabi splitting at the base temperature of a dilution refrigerator. We show that the propagation of microwave photons in the transmission line can be controlled by a few phonons in the acoustic wave resonator. Furthermore, we demonstrate the effect of temperature on the measured Rabi splitting, including temperature-induced transitions from highly excited dressed states. We find that the two-peak spectrum of the Rabi splitting evolves into a multi-peak structure and gradually disappears as the environmental temperature $T$ increases. The quantum-to-classical transition is observed around the crossover temperature $T_{c}$, which is determined by comparing the thermal fluctuation energy $k_{B}T$ with the characteristic energy level spacing of the coupled system. Experimental results agree well with theoretical simulations based on the master equation of the coupled system at different effective temperatures.
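
As an order-of-magnitude check on that criterion, setting $k_B T_c = h f$ for a level spacing $f$ gives $T_c = h f / k_B$. The 1 GHz spacing below is a hypothetical value, not the device parameter reported in the paper, and the paper's criterion may include a numerical prefactor.

```python
# Exact SI values of the defining constants
H = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23  # Boltzmann constant, J/K


def crossover_temperature(level_spacing_hz: float) -> float:
    """Temperature (K) at which k_B*T equals h*f for a level spacing f in Hz."""
    return H * level_spacing_hz / K_B


# Hypothetical 1 GHz level spacing
print(f"{crossover_temperature(1e9) * 1e3:.1f} mK")  # prints "48.0 mK"
```

The relation is linear, so doubling the level spacing doubles the estimated crossover temperature.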

Yang Zhang

300 published item(s)

A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation

Automorphisms of odd dimensional $(2,2)$-complete intersections in characteristic $2$

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting

FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

GIFT: Games as Informal Training for Generalizable LLMs

GPS-Synchronized Monitoring of Core-collapse Supernova Bursts with PandaX-4T via Coherent Elastic Neutrino Nuclear Scattering

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation

MASH: A Multiplatform and Multimodal Annotated Dataset for Societal Impact of Hurricane

Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Quantization Commutes with Reduction of Chern-Simons Gauge Theory

Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling

Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing

UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis

WOW-Seg: A Word-free Open World Segmentation Model

Chiral superconductivity from spin polarized Chern band in twisted MoTe$_2$

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

Multiple Chern bands in twisted MoTe$_2$ and possible non-Abelian states

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

Backdoor Attacks Against Dataset Distillation

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models

Implications of Nano-Hertz Gravitational Waves on Electroweak Phase Transition in the Singlet Dark Matter Model

A first look at the function space for planar two-loop six-particle Feynman integrals

A joint explanation of W-mass and muon g-2 in 2HDM

A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains

Addressing Confounding Feature Issue for Causal Recommendation

Adversarial Support Alignment

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Algorithms of Real-Time Navigation and Control of Autonomous Unmanned Vehicles

An inverse boundary value problem arising in nonlinear acoustics

Analytical Equation of Three-point Correlation Function of Galaxies: to Third Order of Density Perturbation

Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

Auditing Membership Leakages of Multi-Exit Networks

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

Crystal growth engineering and origin of the weak ferromagnetism in antiferromagnetic matrix of orthochromates from $t$-$e$ orbital hybridization

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Decoupled Pyramid Correlation Network for Liver Tumor Segmentation from CT images

Delayed Impact of Interdisciplinary Research

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

Direct observation of moiré flat-band breakdown at the edge of magic-angle twisted bilayer graphene

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data

Dynamic Backdoor Attacks Against Machine Learning Models

Effective Tensor Completion via Element-wise Weighted Low-rank Tensor Train with Overlapping Ket Augmentation

Electronic structure, magnetic properties and pairing tendencies of the copper-based honeycomb lattice Na$_2$Cu$_2$TeO$_6$

Evolution of barchan dune interactions investigated by a downscaled water tunnel experiment: the temporal characteristics and a soliton-like behavior

Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

Finding MNEMON: Reviving Memories of Node Embeddings

Freedom to Choose: Understanding Input Modality Preferences of People with Upper-body Motor Impairments for Activities of Daily Living

Geometric Algebra and Algebraic Geometry of Loop and Potts Models

Global fits of SUSY at future Higgs factories

High-throughput study of the anomalous Hall effect

Learning Rich Features for Gait Recognition by Integrating Skeletons and Silhouettes

Linking Emergent and Natural Languages via Corpus Transfer

Low energy supersymmetry confronted with current experiments: an overview

Low energy SUSY confronted with new measurements of W-boson mass and muon g-2

m-Order Time Optimal Control Synthesis Function of Discrete System

Maxwell field with gauge fixing term in de Sitter space: exact solution and stress tensor

Membership Inference Attacks by Exploiting Loss Trajectory

Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models

mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation

On Xing Tian and the Perseverance of Anti-China Sentiment Online

Point-splitting regularization of the stress tensor of a coupling scalar field in de Sitter space

Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails