Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
109 works
0 followers
52 topics
4 close collaborators

Published work

109 published items

preprint 2026 arXiv

A solution to the S8 tension through neutrino-dark matter interactions

Neutrinos and dark matter (DM) are two of the least understood components of the Universe, yet both play crucial roles in cosmic evolution. Clues about their fundamental properties may emerge from discrepancies in cosmological measurements across different epochs of cosmic history. Possible interactions between them could leave distinctive imprints on cosmological observables, offering a rare window into dark sector physics beyond the standard $Λ$CDM framework. We present compelling evidence that DM-neutrino interactions can resolve the persistent structure growth parameter discrepancy, $S_8 = σ_8\,\sqrt{Ω_m/0.3}$, between early and late universe observations. By incorporating cosmic shear measurements from current weak lensing surveys, we demonstrate that an interaction strength of $u \sim 10^{-4}$ not only provides a coherent explanation for the high-multipole observations from the Atacama Cosmology Telescope (ACT), but also alleviates the $S_8$ discrepancy. Combining early universe constraints with DES Y3 cosmic shear data yields a nearly $3σ$ preference for non-zero DM-neutrino interactions. This strengthens previous observational claims and provides a clear path toward a significant breakthrough in cosmological research. Our findings challenge the standard $Λ$CDM paradigm and highlight the potential of future large-scale structure surveys, which can rigorously test this interaction and unveil the fundamental properties of DM.
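
For readers unfamiliar with the structure growth parameter, the definition quoted in the abstract can be evaluated directly. The sketch below is only a worked arithmetic example: the function name `s8` and the input values are illustrative assumptions, not numbers reported by the paper.

```python
import math


def s8(sigma8: float, omega_m: float) -> float:
    """S_8 = sigma_8 * sqrt(Omega_m / 0.3), as defined in the abstract."""
    return sigma8 * math.sqrt(omega_m / 0.3)


# Illustrative inputs only (assumed, not taken from the paper).
reference = s8(0.81, 0.30)  # Omega_m = 0.3 makes the square root 1, so S_8 = sigma_8
lower = s8(0.81, 0.27)      # a smaller Omega_m lowers S_8 at fixed sigma_8
print(round(reference, 3), round(lower, 3))
```

Because $S_8$ combines both parameters, a shift in either the clustering amplitude or the matter density moves it, which is why the discrepancy is usually quoted on $S_8$ rather than on $σ_8$ alone.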

preprint 2026 arXiv

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Large Language Model (LLM) training often optimizes for preference alignment, rewarding outputs that are perceived as helpful and interaction-friendly. However, this preference-oriented objective can be exploited: manipulative prompts can steer responses toward user-appeasing agreement and away from truth-oriented correction. In this work, we investigate whether aligned models are vulnerable to Preference-Undermining Attacks (PUA), a class of manipulative prompting strategies designed to exploit the model's desire to please user preferences at the expense of truthfulness. We propose a diagnostic methodology that provides a finer-grained and more directive analysis than aggregate benchmark scores, using a factorial evaluation framework to decompose prompt-induced shifts into interpretable effects of system objectives (truth- vs. preference-oriented) and PUA-style dialogue factors (directive control, personal derogation, conditional approval, reality denial) within a controlled $2 \times 2^4$ design. Surprisingly, more advanced models are sometimes more susceptible to manipulative prompts. Beyond the dominant reality-denial factor, we observe model-specific sign reversals and interactions with PUA-style factors, suggesting tailored defenses rather than uniform robustness. These findings offer a novel, reproducible factorial evaluation methodology that provides finer-grained diagnostics for post-training processes like RLHF, enabling better trade-offs in the product iteration of LLMs by offering a more nuanced understanding of preference alignment risks and the impact of manipulative prompts.
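
The $2 \times 2^4$ design mentioned above can be made concrete by enumerating its cells. This is a minimal sketch: the factor names come from the abstract, while the on/off coding of each dialogue factor and the dictionary layout are assumptions made for illustration.

```python
from itertools import product

# Factor names follow the abstract; the boolean (absent/present) coding is an assumption.
SYSTEM_OBJECTIVES = ("truth-oriented", "preference-oriented")
PUA_FACTORS = (
    "directive_control",
    "personal_derogation",
    "conditional_approval",
    "reality_denial",
)


def factorial_conditions():
    """Enumerate every cell of the 2 x 2^4 design as a dict."""
    cells = []
    factor_levels = product((False, True), repeat=len(PUA_FACTORS))
    for objective, levels in product(SYSTEM_OBJECTIVES, factor_levels):
        cell = {"system_objective": objective}
        cell.update(zip(PUA_FACTORS, levels))
        cells.append(cell)
    return cells


conditions = factorial_conditions()
print(len(conditions))  # 2 * 2**4 = 32 prompt conditions
```

Crossing every factor with every other is what allows main effects and interactions to be separated, instead of reporting a single aggregate score per model.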

preprint 2026 arXiv

Credible Plan-Driven RAG Method for Multi-Hop Question Answering

Retrieval-augmented generation (RAG) has demonstrated strong performance in single-hop question answering (QA) by integrating external knowledge into large language models (LLMs). However, its effectiveness remains limited in multi-hop QA, which demands both stable reasoning and factual consistency. Existing approaches often provide partial solutions, addressing either reasoning trajectory stability or factual verification, but rarely achieving both simultaneously. To bridge this gap, we propose PAR-RAG, a three-stage Plan-then-Act-and-Review framework inspired by the PDCA cycle. PAR-RAG incorporates semantic complexity as a unifying principle through three key components: (i) complexity-aware exemplar selection guides plan generation by aligning decomposition granularity with question difficulty, thereby stabilizing reasoning trajectories; (ii) execution follows a structured retrieve-then-read process; and (iii) dual verification identifies and corrects intermediate errors while dynamically adjusting verification strength based on question complexity: emphasizing accuracy for simple queries and multi-evidence consistency for complex ones. This cognitively inspired framework integrates theoretical grounding with practical robustness. Experiments across diverse benchmarks demonstrate that PAR-RAG consistently outperforms competitive baselines, while ablation studies confirm the complementary roles of complexity-aware planning and dual verification. Collectively, these results establish PAR-RAG as a robust and generalizable framework for reliable multi-hop reasoning.

preprint 2026 arXiv

Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

Text-to-Image (T2I) generation has achieved remarkable progress in recent years. Meanwhile, reinforcement learning methods, particularly those based on Group Relative Policy Optimization (GRPO), have attracted widespread attention and been successfully applied to T2I tasks. However, the uniform sampling strategy commonly used during training often ignores the match between sample difficulty and the model's current learning capability, leading to low training efficiency. We argue that improving training efficiency requires continuously prioritizing prompts that match the model's evolving capability and remain actively learnable. To this end, we propose Curriculum Group Policy Optimization (CGPO), an adaptive curriculum training framework. During training, each prompt produces a group of images scored by a reward model. We use the variance of group rewards as an online proxy for prompt inconsistency. A higher variance suggests that the model has partially captured the prompt requirements but has not yet achieved stable mastery. Such prompts are more likely to provide useful learning signals, so we increase their sampling probabilities accordingly. Additionally, to address data imbalance in multi-category datasets, we design a category calibration method based on proportional fairness optimization, which balances training difficulty across categories. Experiments on GenEval, T2I-CompBench++, and DPG Bench demonstrate that our framework effectively improves generation performance.
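
The variance-as-learnability idea can be sketched in a few lines. The abstract states only that higher group-reward variance leads to a higher sampling probability; the proportional normalization, the `floor` smoothing term, and the example rewards below are assumptions, and the category calibration step is not modeled.

```python
from statistics import pvariance


def sampling_probabilities(group_rewards, floor=1e-3):
    """Map each prompt's group-reward variance to a sampling probability.

    group_rewards: dict mapping a prompt to the reward scores of its image group.
    floor: hypothetical smoothing term so that zero-variance prompts keep a
           small chance of being sampled (an assumption, not from the abstract).
    """
    weights = {p: pvariance(r) + floor for p, r in group_rewards.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Made-up rewards for three hypothetical prompts.
rewards = {
    "mastered": [0.90, 0.91, 0.90, 0.89],   # consistently good: low variance
    "too_hard": [0.05, 0.04, 0.05, 0.06],   # consistently bad: low variance
    "learnable": [0.10, 0.90, 0.40, 0.80],  # mixed outcomes: high variance
}
probs = sampling_probabilities(rewards)
print(max(probs, key=probs.get))
```

Both the mastered and the too-hard prompt receive little weight here, which matches the abstract's argument that only partially solved prompts provide a useful learning signal.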

preprint 2026 arXiv

Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation

Text-to-image (T2I) diffusion models can produce high-quality pictures from text prompts, but they pose safety concerns because they can generate offensive or disturbing imagery when provided with harmful inputs. Existing safety filters typically rely on text-based classifiers or image-based checkers that completely block the output upon detecting a threat, issuing an explicit allow/block feedback signal to the user. This binary strategy leaves models vulnerable to adversarial attacks that alter keywords to bypass detection, and it causes high false-alarm rates that degrade the experience for benign users. To address such vulnerabilities, we propose Disciplined Diffusion (DDiffusion), a novel, robust text-to-image diffusion model that counters Not Safe For Work (NSFW) generation by uncovering implicit malicious semantics in prompt embeddings. DDiffusion leverages a semantic retrieval mechanism to evaluate prompts against concept distributions rather than relying on brittle pairwise similarity. Furthermore, it employs a localization method during the diffusion process to selectively edit only the harmful regions of the generated image. By returning locally sanitized images instead of applying uniform blocking, DDiffusion suppresses malicious content while preserving generation fidelity for benign prompts and avoiding the binary allow-deny signal on which existing probing attacks rely.

preprint 2026 arXiv

Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of SOTA models deteriorate sharply as resolution degrades and that, surprisingly, this deterioration persists even when the text remains legible. We attribute this to "Cognitive Overload", hypothesizing that the effort required to decipher degraded inputs diverts attentional resources from safety auditing. This phenomenon is consistent across various visual perturbations, including noise and geometric distortion. To address this, we propose a simple "Structured Cognitive Offloading" strategy that mitigates these risks by enforcing a serialized pipeline to decouple visual transcription from safety assessment. Our work exposes a significant risk in vision-based compression and provides critical insights for the secure design of future MLLMs.

preprint 2026 arXiv

Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

Autoregressive video diffusion models support real-time synthesis but suffer from error accumulation and context loss over long horizons. We discover that attention heads in AR video diffusion transformers serve functionally distinct roles as local heads for detail refinement, anchor heads for structural stabilization, and memory heads for long-range context aggregation, yet existing methods treat them uniformly, leading to suboptimal KV cache allocation. We propose Head Forcing, a training-free framework that assigns each head type a tailored KV cache strategy: local and anchor heads retain only essential tokens, while memory heads employ a hierarchical memory system with dynamic episodic updates for long-range consistency. A head-wise RoPE re-encoding scheme further ensures positional encodings remain within the pretrained range. Without additional training, Head Forcing extends generation from 5 seconds to minute-level duration, supports multi-prompt interactive synthesis, and consistently outperforms existing baselines. Project Page: https://jiahaotian-sjtu.github.io/headforcing.github.io/.

preprint 2026 arXiv

Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances

Generative models have made significant progress in synthesizing visual content, including images, videos, and 3D/4D structures. However, they are typically trained with surrogate objectives such as likelihood or reconstruction loss, which often misalign with perceptual quality, semantic accuracy, or physical realism. Reinforcement learning (RL) offers a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives. Recent advances demonstrate its effectiveness in enhancing controllability, consistency, and human alignment across generative tasks. This survey provides a systematic overview of RL-based methods for visual content generation. We review the evolution of RL from classical control to its role as a general-purpose optimization tool, and examine its integration into image, video, and 3D/4D generation. Across these domains, RL serves not only as a fine-tuning mechanism but also as a structural component for aligning generation with complex, high-level goals. We conclude with open challenges and future research directions at the intersection of RL and generative modeling.

preprint 2026 arXiv

Loupe: A Generalizable and Adaptive Framework for Image Forgery Detection

The proliferation of generative models has raised serious concerns about visual content forgery. Existing deepfake detection methods primarily target either image-level classification or pixel-wise localization. While some achieve high accuracy, they often suffer from limited generalization across manipulation types or rely on complex architectures. In this paper, we propose Loupe, a lightweight yet effective framework for joint deepfake detection and localization. Loupe integrates a patch-aware classifier and a segmentation module with conditional queries, allowing simultaneous global authenticity classification and fine-grained mask prediction. To enhance robustness against distribution shifts in the test set, Loupe introduces a pseudo-label-guided test-time adaptation mechanism that leverages patch-level predictions to supervise the segmentation head. Extensive experiments on the DDL dataset demonstrate that Loupe achieves state-of-the-art performance, securing first place in the IJCAI 2025 Deepfake Detection and Localization Challenge with an overall score of 0.846. Our results validate the effectiveness of the proposed patch-level fusion and conditional query design in improving both classification accuracy and spatial localization under diverse forgery patterns. The code is available at https://github.com/Kamichanw/Loupe.

preprint 2026 arXiv

Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization

Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing approaches often rely on static heuristics or stochastic search, rendering them brittle against advanced safety alignment. To address this, we introduce Metis, a framework that reformulates jailbreaking as inference-time policy optimization within an adversarial Partially Observable Markov Decision Process (POMDP). Metis employs a self-evolving metacognitive loop to perform causal diagnosis of a target's defense logic and leverages structured feedback as a semantic gradient to refine its policy, offering enhanced interpretability through transparent reasoning traces. Extensive evaluations across 10 diverse models demonstrate that Metis achieves the strongest average Attack Success Rate (ASR) among compared methods at 89.2%, maintaining high efficacy on resilient frontier models (e.g., 76.0% on O1 and 78.0% on GPT-5-chat) where traditional baselines exhibit substantial performance degradation. By replacing redundant exploration with directed optimization, Metis reduces token costs by an average of 8.2x and up to 11.4x. Our analysis reveals that current defenses remain vulnerable to internally-steered, closed-loop reasoning trajectories under the tested settings, highlighting a critical need for next-generation defenses capable of reasoning about safety dynamically during inference.

preprint 2026 arXiv

Mitigating Long-Tailed Anomaly Score Distributions with Importance-Weighted Loss

Anomaly detection is crucial in industrial applications for identifying rare and unseen patterns to ensure system reliability. Traditional models, trained on a single class of normal data, struggle with real-world distributions where normal data exhibit diverse patterns, leading to class imbalance and long-tailed anomaly score distributions (LTD). This imbalance skews model training and degrades detection performance, especially for minority instances. To address this issue, we propose a novel importance-weighted loss designed specifically for anomaly detection. Compared to the previous method for LTD in classification, our method does not require prior knowledge of normal data classes. Instead, we introduce a weighted loss function that incorporates importance sampling to align the distribution of anomaly scores with a target Gaussian, ensuring a balanced representation of normal data. Extensive experiments on three benchmark image datasets and three real-world hyperspectral imaging datasets demonstrate the robustness of our approach in mitigating LTD-induced bias. Our method improves anomaly detection performance by 0.043, highlighting its effectiveness in real-world applications.
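
One way to picture an importance-weighted loss of this kind is to weight each sample by the ratio of a target Gaussian density to the empirical density of its anomaly score. The sketch below is an assumption-laden illustration, not the paper's estimator: the histogram density estimate, the mean-one normalization, the function names, and the example scores are all hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the target Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def importance_weights(scores, mu, sigma, bins=4):
    """w_i = target_density(s_i) / empirical_density(s_i), rescaled to mean 1.

    The empirical density is a plain histogram estimate (an assumption).
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all scores are equal
    idx = [min(int((s - lo) / width), bins - 1) for s in scores]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    n = len(scores)
    raw = [gaussian_pdf(s, mu, sigma) / (counts[i] / (n * width))
           for s, i in zip(scores, idx)]
    mean_raw = sum(raw) / n
    return [w / mean_raw for w in raw]


def weighted_loss(losses, weights):
    """Importance-weighted average of per-sample losses."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Made-up long-tailed scores: most samples cluster low, one sits in the tail.
scores = [0.1] * 8 + [0.2, 0.9]
weights = importance_weights(scores, mu=0.5, sigma=0.3)
print(weights[-1] > weights[0])  # the tail sample is up-weighted
```

Samples in sparsely populated score regions receive larger weights, so rare normal patterns contribute more to training without requiring class labels for the normal data.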

preprint 2026 arXiv

MobileGeo: Exploring Hierarchical Knowledge Distillation for Resource-Efficient Cross-view Drone Geo-Localization

Cross-view geo-localization (CVGL) plays a vital role in drone-based multimedia applications, enabling precise localization by matching drone-captured aerial images against geo-tagged satellite databases in GNSS-denied environments. However, existing methods rely on resource-intensive feature alignment and multi-branch architectures, incurring high inference costs that limit their deployment on edge devices. We propose MobileGeo, a mobile-friendly framework designed for efficient on-device CVGL: 1) During training, a Hierarchical Distillation (HD-CVGL) paradigm, coupled with Uncertainty-Aware Prediction Alignment (UAPA), distills essential information into a compact model without incurring inference overhead. 2) During inference, an efficient Multi-view Selection Refinement Module (MSRM) leverages mutual information to filter redundant views and reduce computational load. Extensive experiments demonstrate that MobileGeo outperforms previous state-of-the-art methods, achieving a 4.19% improvement in AP on the University1652 dataset while being over 5 times more efficient in FLOPs and 3 times faster. Crucially, MobileGeo runs at 251.5 FPS on an NVIDIA AGX Orin edge device, demonstrating its practical viability for real-time on-device drone geo-localization. The code is available at https://github.com/SkyEyeLoc/MobileGeo.

preprint 2026 arXiv

Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

How people narrate their experiences offers a window into how the mind organizes them. Computational approaches to therapeutic writing have evolved from lexical counting to neural methods, yet remain fragmented: dictionary tools miss discourse structure, while embeddings conflate local coherence with global organization. No existing framework maps these techniques onto the hierarchical processes through which narratives are constructed. Here we introduce a three-level framework - micro-level lexical features, meso-level semantic embeddings, and macro-level LLM narrative evaluation - and show, across 830 Chinese therapeutic texts spanning depression, anxiety, and trauma, that macro-level evaluation substantially outperforms lexical and embedding features for mental health prediction. This challenges the field's emphasis on word-counting: formal structural features (Labov's story grammar, RST coherence, propositional composition) demonstrate that narrative organization per se carries predictive signal, while clinically-grounded narrative dimensions capture how psychological states are expressed through discourse. Semantic embeddings add minimal independent value but yield incremental gains in multi-level classification. By grounding computational levels in discourse processing theory, this framework identifies macro-structural organization as the primary locus of clinical signal and generates testable hypotheses for intervention design and longitudinal research.

preprint 2026 arXiv

Na-IRSTD: Enhancing Infrared Small Target Detection via Native-Resolution Feature Selection and Fusion

Infrared small target detection (IRSTD) faces the inherent challenge of precisely localizing dim targets amid complex background clutter. While progress has been made, existing methods usually follow conventional strategies that downsample features and discard the details of small targets, resulting in suboptimal performance. In this paper, we present Na-IRSTD, a native-resolution feature extraction and fusion framework for IRSTD. This framework incorporates native-resolution features to preserve subtle target cues, overcoming the resolution limitations of existing infrared approaches and significantly improving the model's ability to localize small targets. We also introduce an effective token reduction and selection strategy, which selects target patches with high accuracy and confidence, boosting the low-level details of the feature while reducing the number of native-resolution patch tokens compared to dense processing, thereby avoiding an excessive computational burden. Extensive experiments demonstrate the robustness and effectiveness of our token reduction and selection strategy across multiple public datasets. Ultimately, our Na-IRSTD model achieves state-of-the-art performance on four benchmarks.

preprint 2026 arXiv

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Reinforcement Learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space where the inner product of state-action and goal embeddings approximates the log-discounted goal occupancy, the probability of reaching the language-specified goal from the current state-action, quantitatively assessing physical feasibility beyond static semantic matching. PRTS draws this dense goal-reachability supervision directly from offline trajectories without reward annotations, and folds it into the VLM backbone via a role-aware causal mask, incurring negligible overhead over vanilla behavior cloning. This paradigm endows the high-level reasoning system with intrinsic goal reachability awareness, bridging semantic reasoning and temporal task progress, and further benefits goal-conditioned action prediction. Pretrained on 167B tokens of diverse manipulation and embodied-reasoning data, PRTS reaches state-of-the-art performance on LIBERO, LIBERO-Pro, LIBERO-Plus, SimplerEnv, and a real-world suite of 14 complex tasks, with particularly substantial gains on long-horizon, contact-rich, and zero-shot novel-instruction settings, confirming that injecting goal-reachability awareness significantly improves both execution success and long-horizon planning of general-purpose robotic foundation policies.
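
The inner-product relation described in the abstract can be written out as follows. This is a sketch in assumed notation: the encoder symbols $\phi$ and $\psi$, the discount $\gamma$, and the policy $\pi$ are not named in the abstract, and the second expression is the discounted occupancy as it is commonly defined in goal-conditioned RL, not necessarily the exact form used by the authors.

```latex
% Assumed notation: \phi encodes the state-action pair, \psi encodes the
% language goal g, \gamma is the discount factor, \pi is the behavior policy.
\phi(s,a)^{\top}\psi(g) \;\approx\; \log p^{\pi}_{\gamma}(g \mid s,a),
\qquad
p^{\pi}_{\gamma}(g \mid s,a) \;=\; (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\,
  p^{\pi}\!\left(s_{t+1}=g \mid s_{0}=s,\ a_{0}=a\right)
```

Read this way, a larger inner product indicates that the goal is more likely to be reached soon from the current state-action, which is the sense in which the embedding space scores reachability instead of static semantic similarity.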

preprint 2026 arXiv

QwenStyle: Content-Preserving Style Transfer with Qwen-Image-Edit

Content-preserving style transfer, given content and style references, remains challenging for Diffusion Transformers (DiTs) because their content and style features are internally entangled. In this technical report, we propose the first content-preserving style transfer model trained on Qwen-Image-Edit, which activates Qwen-Image-Edit's strong content preservation and style customization capability. We collected and filtered high-quality data for a limited set of specific styles and synthesized triplets from in-the-wild style images spanning thousands of categories. We introduce the Curriculum Continual Learning framework to train QwenStyle on this mixture of clean and noisy triplets, which enables QwenStyle to generalize to unseen styles without degrading its precise content preservation capability. Our QwenStyle V1 achieves state-of-the-art performance in three core metrics: style similarity, content consistency, and aesthetic quality.

preprint 2026 arXiv

Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction

Dense 3D reconstruction from continuous image streams requires both accurate geometric aggregation and stable long-term memory management. Recent feed-forward reconstruction frameworks integrate observations through persistent memory representations, yet most rely primarily on appearance-based similarity when updating memory. Such appearance-driven integration often leads to redundant accumulation of observations and unstable geometry when viewpoint changes occur. In this work, we propose a ray-aware pointer memory for streaming 3D reconstruction that explicitly models both spatial location and viewing direction within a unified memory representation. Each memory pointer stores its 3D position, associated ray direction, and feature embedding, allowing the system to reason jointly about geometric proximity and viewpoint consistency. Based on this representation, we introduce an adaptive pointer update strategy that replaces traditional fusion-based memory compression with a retain-or-replace mechanism. Instead of averaging nearby observations, the system selectively retains informative pointers while discarding redundant ones, preserving distinctive geometric structures while maintaining bounded memory growth. Furthermore, the joint reasoning over spatial distance and ray-direction discrepancy enables the system to distinguish between local redundancy, novel observations, and potential loop revisits in a unified manner. When loop candidates are detected, pose refinement is triggered to enforce global geometric consistency across the reconstruction. Extensive experiments demonstrate that the proposed ray-aware memory design significantly improves long-term reconstruction stability and camera pose accuracy while maintaining efficient streaming inference. Our approach provides a principled framework for scalable and drift-resistant online 3D reconstruction from image streams.
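
The retain-or-replace update can be illustrated with a toy pointer memory. Everything beyond the three stored fields named in the abstract is an assumption: the `score` field, the distance and angle thresholds, and the rule that a closer, better-scored observation replaces its neighbor are hypothetical, and loop detection with pose refinement is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Pointer:
    position: tuple  # 3D position (x, y, z)
    ray: tuple       # unit viewing direction
    feature: tuple   # feature embedding
    score: float     # hypothetical informativeness score (not from the abstract)


def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


def is_redundant(new, old, dist_thresh=0.05, angle_thresh=15.0):
    """Redundant means close in space and seen from a similar direction."""
    close = math.dist(new.position, old.position) < dist_thresh
    return close and angle_deg(new.ray, old.ray) < angle_thresh


def update(memory, new):
    """Retain-or-replace: never average, keep the more informative pointer."""
    for i, old in enumerate(memory):
        if is_redundant(new, old):
            if new.score > old.score:
                memory[i] = new
                return "replaced"
            return "retained"
    memory.append(new)  # novel location or novel viewpoint
    return "added"


memory = []
log = [
    update(memory, Pointer((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0,), 0.5)),
    update(memory, Pointer((0.01, 0.0, 0.0), (0.0, 0.0, 1.0), (2.0,), 0.9)),
    update(memory, Pointer((0.01, 0.0, 0.0), (0.0, 0.0, 1.0), (3.0,), 0.1)),
    update(memory, Pointer((0.01, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0,), 0.1)),
]
print(log, len(memory))
```

Because redundant observations overwrite or are dropped instead of being fused, memory grows only when a new location or viewpoint appears, which is the bounded-growth property the abstract emphasizes.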

preprint 2026 arXiv

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with powerful pretrained vision-language models (VLMs), recent approaches still cannot accurately understand complex 3D scenes or dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e.g., UR5, G1 humanoid) in cluttered real-world scenes.

preprint 2026 arXiv

Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning

We present Seirênes, a self-play RL framework that transforms contextual interference from a failure mode of LLM reasoning into an internal training signal for co-evolving more resilient reasoners. While RL with verifiable rewards has significantly advanced reasoning capabilities, models can still exhibit fragility when encountering non-idealized contexts: scenarios characterized by superfluous information, tangential instructions, or incidental correlations that differ from the clean distributions typical of standard benchmarks. Seirênes harnesses this vulnerability through a parameter-shared and adversarial self-play loop. Within this framework, a single model is trained to both construct plausible yet distracting contexts that expose its own reasoning blind spots, and solve problems by discerning the essential task from these perturbations to recover the core underlying logic. By pitting these competing objectives against each other, Seirênes compels the model to move beyond superficial pattern matching and anchors its capabilities in robust underlying reasoning. This continuous interaction sustains an informative co-evolutionary curriculum as the model improves. Across seven mathematical reasoning benchmarks and model scales from 4B to 30B, Seirênes achieves average gains of +10.2, +9.1, and +7.2 points. In addition, distracting contexts produced by the 4B Seirênes model reduce the accuracy of top-tier closed-source models (GPT and Gemini) by roughly 4-5 points, revealing Seirênes' general ability to uncover reasoning models' blind spots.

preprint 2026 arXiv

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Autonomous systems are increasingly deployed in open and dynamic environments, from city streets to aerial and indoor spaces, where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2025 Challenge is designed to advance robustness and adaptability in robot perception across diverse sensing scenarios. It unifies five complementary research tracks spanning language-grounded decision making, socially compliant navigation, sensor configuration generalization, cross-view and cross-modal correspondence, and cross-platform 3D perception. Together, these tasks form a comprehensive benchmark for evaluating real-world sensing reliability under domain shifts, sensor failures, and platform discrepancies. RoboSense 2025 provides standardized datasets, baseline models, and unified evaluation protocols, enabling large-scale and reproducible comparison of robust perception methods. The challenge attracted 143 teams from 85 institutions across 16 countries, reflecting broad community engagement. By consolidating insights from 23 winning solutions, this report highlights emerging methodological trends, shared design principles, and open challenges across all tracks, marking a step toward building robots that can sense reliably, act robustly, and adapt across platforms in real-world environments.

preprint 2026 arXiv

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

Humor is a fundamental cognitive phenomenon in which humans derive pleasure from expectation violations and their resolution, exemplifying the brain's dynamic capacity for predictive processing. Classical humor theories emphasize semantic incongruity as the primary driver of amusement, yet overlook temporal dynamics despite comedians' intuition that "timing is everything." The extent to which temporal structure contributes to humor appreciation, and how it interacts with semantic content, remains poorly understood. Here, we propose the Dual Prediction Violation (DPV) framework to capture the interplay between content and timing. By analyzing 828 professional Chinese stand-up performances, we show that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Specifically, we find that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines, a strategic coupling that distinguishes successful from unsuccessful performances. These findings reframe humor as temporally scaffolded, where timing and semantic content operate in strategic coordination rather than independently. Our DPV framework bridges humor theory with predictive processing, demonstrating that temporal structure plays a central role in naturalistic humor appreciation, with implications for understanding multi-scale prediction integration in linguistic processing.

preprint2026arXiv

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation

Motion, speech, and sound effects are fundamental elements of human-centric videos, yet their heterogeneous temporal characteristics make joint generation highly challenging. Existing audio-video generation models often fail to maintain consistent alignment across these modalities, leading to noticeable mismatches between motion, speech, and environmental sounds. We present Unison, a unified framework that explicitly promotes coherence across the motion, speech, and sound modalities. Within the audio stream, Unison employs a semantic-guided harmonization strategy that decouples the generation of speech and sound-effect components. Leveraging bidirectional audio cross-attention and semantic-conditioned gating for semantic-driven adaptive recomposition, this approach effectively mitigates speech dominance and enhances acoustic clarity. For audio-motion synchronization, we propose a bidirectional cross-modal forcing strategy where the cleaner modality guides the noisier one through decoupled denoising schedules, reinforced by a progressive stabilization strategy. Extensive experiments demonstrate that Unison achieves state-of-the-art performance in both audio perceptual quality and cross-modal synchronization, highlighting the importance of explicit multimodal harmonization in human-centric video generation.

preprint2026arXiv

Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings

Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe some of these differences, but they cannot fully represent the complexity of natural environments. This study examines whether AlphaEarth Foundation embeddings, which are learned from large collections of satellite images rather than designed by experts, offer a more informative way to describe basin characteristics. These embeddings summarize patterns in vegetation, land surface properties, and long-term environmental dynamics. We find that models using them achieve higher accuracy when predicting flows in basins not used for training, suggesting that they capture key physical differences more effectively than traditional attributes. We further investigate how selecting appropriate donor basins influences prediction in ungauged regions. Similarity based on the embeddings helps identify basins with comparable environmental and hydrological behavior, improving performance, whereas adding many dissimilar basins can reduce accuracy. The results show that satellite-informed environmental representations can strengthen hydrological forecasting and support the development of models that adapt more easily to different landscapes.

preprint2025arXiv

Revisiting Mars' Induced Magnetic Field and Clock Angle Departures under Real-Time Upstream Solar Wind Conditions

Mars lacks a global intrinsic dipole magnetic field, but its interaction with the solar wind generates a global induced magnetosphere. Until now, most studies have relied on single-spacecraft measurements, which could not simultaneously capture upstream solar wind conditions and the induced magnetic fields, thereby limiting our understanding of the system. Here, we statistically re-examine the properties of Mars' induced magnetic field by incorporating, for the first time, real-time upstream solar wind conditions from the coordinated MAVEN and Tianwen-1 observations. Our results show that both solar wind dynamic pressure and the interplanetary magnetic field (IMF) magnitude enhance the strength of the induced magnetic field, but they exert opposite effects on the compression ratio: higher dynamic pressure strengthens compression, while stronger IMF weakens it. The induced field is stronger under quasi-perpendicular IMF conditions compared with quasi-parallel IMF, reflecting a stronger mass-loading effect. We further investigate the clock angle departures of the induced fields. They remain relatively small in the magnetosheath near the bow shock, increase gradually toward the induced magnetosphere, and become significantly larger within the induced magnetosphere. In addition, clock angle departures are strongly enhanced under quasi-parallel IMF conditions. Their dependence on upstream drivers further shows that, within the magnetosheath, clock angle departures are minimized under low dynamic pressure, high IMF magnitude, and low Alfven Mach number conditions. These results may enhance our understanding of solar wind interaction with Mars, and highlight the critical role of multi-point observations.

preprint2025arXiv

TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model

World models aim to endow AI systems with the ability to represent, generate, and interact with dynamic environments in a coherent and temporally consistent manner. While recent video generation models have demonstrated impressive visual quality, they remain limited in real-time interaction, long-horizon consistency, and persistent memory of dynamic scenes, hindering their evolution into practical world models. In this report, we present TeleWorld, a real-time multimodal 4D world modeling framework that unifies video generation, dynamic scene reconstruction, and long-term world memory within a closed-loop system. TeleWorld introduces a novel generation-reconstruction-guidance paradigm, where generated video streams are continuously reconstructed into a dynamic 4D spatio-temporal representation, which in turn guides subsequent generation to maintain spatial, temporal, and physical consistency. To support long-horizon generation with low latency, we employ an autoregressive diffusion-based video model enhanced with Macro-from-Micro Planning (MMPL)--a hierarchical planning method that reduces error accumulation from frame-level to segment-level--alongside efficient Distribution Matching Distillation (DMD), enabling real-time synthesis under practical computational budgets. Our approach achieves seamless integration of dynamic object modeling and static scene representation within a unified 4D framework, advancing world models toward practical, interactive, and computationally accessible systems. Extensive experiments demonstrate that TeleWorld achieves strong performance in both static and dynamic world understanding, long-term consistency, and real-time generation efficiency, positioning it as a practical step toward interactive, memory-enabled world models for multimodal generation and embodied intelligence.

preprint2024arXiv

Quantum Dueling: an Efficient Solution for Combinatorial Optimization

In this paper, we present a new algorithm for generic combinatorial optimization, which we term quantum dueling. Traditionally, potential solutions to the given optimization problems were encoded in a ``register'' of qubits. Various techniques are used to increase the probability of finding the best solution upon measurement. Quantum dueling innovates by integrating an additional qubit register, effectively creating a ``dueling'' scenario where two sets of solutions compete. This dual-register setup allows for a dynamic amplification process: in each iteration, one register is designated as the 'opponent', against which the other register's more favorable solutions are enhanced through a controlled quantum search. This iterative process gradually steers the quantum state within both registers toward the optimal solution. With a quantitative contraction for the evolution of the state vector, classical simulation under a broad range of scenarios and hyper-parameter selection schemes shows that a quadratic speedup is achieved, which is further tested in more real-world situations. In addition, quantum dueling can be generalized to incorporate arbitrary quantum search techniques and as a quantum subroutine within a higher-level algorithm. Our work demonstrates that increasing the number of qubits allows the development of previously unthought-of algorithms, paving the way for advancement of efficient quantum algorithm design.

preprint2024arXiv

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges to optimize recommender systems on the video-sharing platforms. Users sequentially interact with the system and provide complex and multi-faceted responses, including watch time and various types of interactions with multiple videos. On the one hand, the platforms aim at optimizing the users' cumulative watch time (main goal) in the long term, which can be effectively optimized by Reinforcement Learning. On the other hand, the platforms also need to satisfy the constraint of accommodating the responses of multiple user interactions (auxiliary goals) such as like, follow, share, etc. In this paper, we formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP). We find that traditional constrained reinforcement learning algorithms cannot work well in this setting. We propose a novel two-stage constrained actor-critic method: At stage one, we learn individual policies to optimize each auxiliary signal. At stage two, we learn a policy to (i) optimize the main signal and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive offline evaluations, we demonstrate the effectiveness of our method over alternatives in both optimizing the main goal and balancing the others. We further show the advantage of our method in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of both watch time and interactions. Our approach has been fully launched in the production system to optimize user experiences on the platform.

preprint2023arXiv

Probing temperature-responsivity of microgels and its interplay with a solid surface by superresolution microscopy and numerical simulations

Superresolution microscopy has become a powerful tool to investigate the internal structure of complex colloidal and polymeric systems, such as microgels, at the nanometer scale. The ability to monitor the response of microgels to temperature changes in situ opens new and exciting opportunities to design and precisely control their behaviour for various applications. When performing advanced microscopy experiments, interactions between the particle and the environment can be important. Often microgels are deposited on a substrate since they have to remain still for several minutes during the experiment. This study uses dSTORM microscopy and advanced coarse-grained molecular dynamics simulations to investigate, for the first time, how individual microgels anchored on hydrophilic and hydrophobic surfaces undergo their volume phase transition in temperature. We find that, in the presence of a hydrophilic substrate, the structure of the microgel is unperturbed and the resulting density profiles quantitatively agree with simulations performed in bulk conditions. Instead, when a hydrophobic surface is used, the microgel spreads at the interface and an interesting competition between the two hydrophobic strengths -- monomer-monomer vs monomer-surface -- comes into play at high temperatures. The remarkable agreement between experiments and simulations makes the present study a fundamental step to establish this high-resolution monitoring technique as a platform for investigating more complex systems, whether these are macromolecules with peculiar internal structure or nanocomplexes where molecules of interest can be encapsulated in the microgel network and controllably released with temperature.

preprint2022arXiv

A nice two-loop next-to-next-to-MHV amplitude in ${\cal N}=4$ super-Yang-Mills

We study a scalar component of the 8-point next-to-next-to-maximally-helicity-violating (N${}^2$MHV) amplitude at two-loop level in ${\cal N}=4$ super-Yang-Mills theory; it has a leading singularity proportional to the inverse of the four-mass-box square root and receives contributions from only two types of non-trivial integrals with one-loop infrared (IR) divergences. We compute such two-loop 8-point integrals by taking (double-)collinear limits of certain finite, dual-conformal-invariant integrals, and they nicely give the IR-safe ratio function after subtracting divergences. As the first genuine two-loop N${}^2$MHV amplitude computed explicitly, we find remarkable structures in its symbol and alphabet: similar to the next-to-MHV (NMHV) case, there are still 9 algebraic letters associated with the square root, and the latter also becomes a letter for the first time; unlike the NMHV case, such algebraic letters appear at either one or all of the second, third and last entry, and the part with three odd letters is particularly simple.

preprint2022arXiv

A Structured Method for Compilation of QAOA Circuits in Quantum Computing

The Quantum Approximate Optimization Algorithm (QAOA) is a highly advocated variational algorithm for solving combinatorial optimization problems. One critical feature of the QAOA quantum circuit is that it consists of two-qubit operators that commute. The flexibility in reordering the two-qubit gates allows compiler optimizations to generate circuits with better depths, gate count, and fidelity. However, it also imposes significant challenges due to additional freedom exposed in the compilation. Prior studies lack the following: (1) Performance guarantee, (2) Scalability, and (3) Awareness of regularity in scalable hardware. We propose a structured method that ensures linear depth for any compiled QAOA circuit on multi-dimensional quantum architectures. We also demonstrate how our method runs on Google Sycamore and IBM Non-linear architectures in a scalable manner and in linear time. Overall, we can compile a circuit with up to 1024 qubits in 10 seconds with a 3.8X speedup in depth, 17% reduction in gate count, and 18X improvement for circuit ESP.

preprint2022arXiv

Automatic Generation of Product-Image Sequence in E-commerce

Product images are essential for providing desirable user experience in an e-commerce platform. For a platform with billions of products, it is extremely time-costly and labor-expensive to manually pick and organize qualified images. Furthermore, there are numerous and complicated image rules that a product image needs to comply with in order to be generated/selected. To address these challenges, in this paper, we present a new learning framework in order to achieve Automatic Generation of Product-Image Sequence (AGPIS) in e-commerce. To this end, we propose a Multi-modality Unified Image-sequence Classifier (MUIsC), which is able to simultaneously detect all categories of rule violations through learning. MUIsC leverages textual review feedback as the additional training target and utilizes product textual description to provide extra semantic information. Based on offline evaluations, we show that the proposed MUIsC significantly outperforms various baselines. Besides MUIsC, we also integrate some other important modules in the proposed framework, such as primary image selection, noncompliant content detection, and image deduplication. With all these modules, our framework works effectively and efficiently in JD.com recommendation platform. By Dec 2021, our AGPIS framework has generated high-standard images for about 1.5 million products and achieves 13.6% in reject rate.

preprint2022arXiv

Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning

In this extended abstract, we propose a new technique for query scheduling with the explicit goal of reducing disk reads and thus implicitly increasing query performance. We introduce SmartQueue, a learned scheduler that leverages overlapping data reads among incoming queries and learns a scheduling strategy that improves cache hits. SmartQueue relies on deep reinforcement learning to produce workload-specific scheduling strategies that focus on long-term performance benefits while being adaptive to previously-unseen data access patterns. We present results from a proof-of-concept prototype, demonstrating that learned schedulers can offer significant performance improvements over hand-crafted scheduling heuristics. Ultimately, we make the case that this is a promising research direction at the intersection of machine learning and databases.

preprint2022arXiv

Constrained Reinforcement Learning for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges to optimize recommender systems on the video-sharing platforms. Users provide complex and multi-faceted responses towards recommendations, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that concern a single objective are not adequate to meet this new demand of optimizing comprehensive user experiences. In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos. To solve the constrained MDP, we propose a two-stage reinforcement learning approach based on actor-critic framework. At stage one, we learn individual policies to optimize each auxiliary response. At stage two, we learn a policy to (i) optimize the main response and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive simulations, we demonstrate effectiveness of our approach over alternatives in both optimizing the main goal as well as balancing the others. We further show the advantage of our approach in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of watch time and interactions from video views. Our approach has been fully launched in the production system to optimize user experiences on the platform.

preprint2022arXiv

CRCNet: Few-shot Segmentation with Cross-Reference and Region-Global Conditional Networks

Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose a Cross-Reference and Local-Global Conditional Networks (CRCNet) for few-shot segmentation. Unlike previous works that only predict the query image's mask, our proposed model concurrently makes predictions for both the support image and the query image. Our network can better find the co-occurrent objects in the two images with a cross-reference mechanism, thus helping the few-shot segmentation task. To further improve feature comparison, we develop a local-global conditional module to capture both global and local relations. We also develop a mask refinement module to refine the prediction of the foreground regions recurrently. Experiments on the PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art performance.

preprint2022arXiv

DETR++: Taming Your Multi-Scale Detection Transformer

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12]. With the sweeping reform of Transformers [27] in natural language processing, Carion et al. [2] introduce the Transformer-based detection method, i.e., DETR. However, due to the quadratic complexity in the self-attention mechanism in the Transformer, DETR is never able to incorporate multi-scale features as performed in existing CNN-based detectors, leading to inferior results in small object detection. To mitigate this issue and further improve performance of DETR, in this work, we investigate different methods to incorporate multi-scale features and find that a Bi-directional Feature Pyramid (BiFPN) works best with DETR in further raising the detection precision. With this discovery, we propose DETR++, a new architecture that improves detection results by 1.9% AP on MS COCO 2017, 11.5% AP on RICO icon detection, and 9.1% AP on RICO layout extraction over existing baselines.

preprint2022arXiv

Double-End Queues with Non-Poisson Inputs and Their Effective Algorithms

It is interesting and challenging to study double-ended queues with First-Come-First-Match discipline under customers' impatient behavior and non-Poisson inputs. The system stability can be guaranteed by the customers' impatient behavior, while the existence of impatient customers makes analysis of such double-ended queues more difficult or even impossible to find an explicitly analytic solution, thus it becomes more and more important to develop effective numerical methods in a variety of practical matching problems. This paper studies a block-structured double-ended queue, whose block structure comes from two independent Markovian arrival processes (MAPs), which are non-Poisson inputs. We show that such a queue can be expressed as a new bilateral quasi birth-and-death (QBD) process which has its own interest. Based on this, we provide a detailed analysis for both the bilateral QBD process and the double-ended queue, including the system stability, the queue size distributions, the average stationary queue lengths, and the sojourn time of any arriving customers. Furthermore, we develop three effective algorithms for computing the performance measures (i.e., the probabilities of stationary queue lengths, the average stationary queue lengths, and the average sojourn times) of the double-ended queue with non-Poisson inputs. Finally, we use some numerical examples in tabular and graphical form to illustrate how the performance measures are influenced by some key system parameters. We believe that the methodology and results described in this paper can be applied to more general double-ended queues in practice, and can support the development of effective algorithms for many practical uses.

preprint2022arXiv

Energy Efficient Federated Learning over Heterogeneous Mobile Devices via Joint Design of Weight Quantization and Wireless Transmission

Federated learning (FL) is a popular collaborative distributed machine learning paradigm across mobile devices. However, practical FL over resource constrained mobile devices confronts multiple challenges, e.g., the local on-device training and model updates in FL are power hungry and radio resource intensive for mobile devices. To address these challenges, in this paper, we attempt to take FL into the design of future wireless networks and develop a novel joint design of wireless transmission and weight quantization for energy efficient FL over mobile devices. Specifically, we develop flexible weight quantization schemes to facilitate on-device local training over heterogeneous mobile devices. Based on the observation that the energy consumption of local computing is comparable to that of model updates, we formulate the energy efficient FL problem into a mixed-integer programming problem where the quantization and spectrum resource allocation strategies are jointly determined for heterogeneous mobile devices to minimize the overall FL energy consumption (computation + transmissions) while guaranteeing model performance and training latency. Since the optimization variables of the problem are strongly coupled, an efficient iterative algorithm is proposed, where the bandwidth allocation and weight quantization levels are derived. Extensive simulations are conducted to verify the effectiveness of the proposed scheme.

preprint2022arXiv

Few-shot Open-set Recognition Using Background as Unknowns

Few-shot open-set recognition aims to classify both seen and novel images given only limited training data of seen classes. The challenge of this task is that the model is required not only to learn a discriminative classifier to classify the pre-defined classes with few training data but also to reject inputs from unseen classes that never appear at training time. In this paper, we propose to solve the problem from two novel aspects. First, instead of learning the decision boundaries between seen classes, as is done in standard close-set classification, we reserve space for unseen classes, such that images located in these areas are recognized as the unseen classes. Second, to effectively learn such decision boundaries, we propose to utilize the background features from seen classes. As these background regions do not significantly contribute to the decision of close-set classification, it is natural to use them as the pseudo unseen classes for classifier learning. Our extensive experiments show that our proposed method not only outperforms multiple baselines but also sets new state-of-the-art results on three popular benchmarks, namely tieredImageNet, miniImageNet, and Caltech-UCSD Birds-200-2011 (CUB).

preprint2022arXiv

Few-shot Segmentation with Optimal Transport Matching and Message Flow

We tackle the challenging task of few-shot segmentation in this work. It is essential for few-shot semantic segmentation to fully utilize the support information. Previous methods typically adopt masked average pooling over the support feature to extract the support clues as a global vector, which is usually dominated by the salient part and loses certain essential clues. In this work, we argue that every support pixel's information is desired to be transferred to all query pixels and propose a Correspondence Matching Network (CMNet) with an Optimal Transport Matching module to mine out the correspondence between the query and support images. Besides, it is critical to fully utilize both local and global information from the annotated support images. To this end, we propose a Message Flow module to propagate the message along the inner-flow inside the same image and cross-flow between support and query images, which greatly helps enhance the local feature representations. Experiments on PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art few-shot segmentation performance.

preprint2022arXiv

Four-Family ${\cal N}=1$ Supersymmetric Pati-Salam Models from Intersecting D6-Branes

We investigate the construction of four-family ${\cal N}=1$ supersymmetric Pati-Salam models from Type IIA $\mathbb{T}^6/(\mathbb{Z}_2 \times \mathbb{Z}_2)$ orientifold with intersecting D6-branes. Utilizing the deterministic algorithm introduced in Ref. \cite{heCompleteSearchSupersymmetric2021}, we obtain $274$ types of models with three rectangular tori and distinct gauge coupling relations at string scale, while $6$ types of models with two rectangular tori and one tilted torus. In both cases, there exists a class of models with gauge coupling unification at string scale. In particular, for the models with two rectangular tori, one tilted torus and gauge coupling unification, the gaugino condensations are allowed, and thus supersymmetry breaking and moduli stabilization are possible for further phenomenological study.

preprint2022arXiv

Functions Beyond Multiple Polylogarithms for Precision Collider Physics

Feynman diagrams constitute one of the essential ingredients for making precision predictions for collider experiments. Yet, while the simplest Feynman diagrams can be evaluated in terms of multiple polylogarithms -- whose properties as special functions are well understood -- more complex diagrams often involve integrals over complicated algebraic manifolds. Such diagrams already contribute at NNLO to the self-energy of the electron, $t \bar{t}$ production, $γγ$ production, and Higgs decay, and appear at two loops in the planar limit of maximally supersymmetric Yang-Mills theory. This makes the study of these more complicated types of integrals of phenomenological as well as conceptual importance. In this white paper contribution to the Snowmass community planning exercise, we provide an overview of the state of research on Feynman diagrams that involve special functions beyond multiple polylogarithms, and highlight a number of research directions that constitute essential avenues for future investigation.

preprint2022arXiv

Inner-shell excitation in the YbF molecule and its impact on laser cooling

The YbF molecule is a sensitive system for measuring the electron's electric dipole moment. The precision of this measurement can be improved by direct laser cooling of the molecules to ultracold temperature. However, low-lying electronic states arising from excitation of a 4f electron may hinder laser cooling. One set of these "4f hole" states lies below the $A^2Π_{1/2}$ excited state used for laser cooling, and radiative decay to these intermediate levels, even with branching ratios as small as $10^{-5}$, can be a hindrance. Other 4f hole states lie very close to the $A^2Π_{1/2}$ state, and a perturbation results in states of mixed character that are involved in the laser cooling cycle. This perturbation may enhance the loss of molecules to states outside of the laser cooling cycle. We model the perturbation of the $A^2Π_{1/2}$ state to determine the strength of the coupling between the states, the de-perturbed potential energy curves, and the radiative branching ratios to various vibrational levels of the ground state, $X ^{2}Σ^+$. We use electronic structure calculations to characterise the 4f hole states and the strengths of transitions between these states and the $A^2Π_{1/2}$ and $X ^{2}Σ^+$ states. We identify a leak out of the cooling cycle with a branching ratio of roughly $5 \times 10^{-4}$, dominated by the contribution of the ground state configuration in a 4f hole state. Finally, we assess the impact of these results for laser cooling of YbF and molecules with similar structure.

preprint2022arXiv

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

Is intelligence realized by connectionist or classicist? While connectionist approaches have achieved superhuman performance, there has been growing evidence that such task-specific superiority is particularly fragile in systematic generalization. This observation lies in the central debate between connectionist and classicist, wherein the latter continually advocates an algebraic treatment in cognitive architectures. In this work, we follow the classicist's call and propose a hybrid approach to improve systematic generalization in reasoning. Specifically, we showcase a prototype with algebraic representation for the abstract spatial-temporal reasoning task of Raven's Progressive Matrices (RPM) and present the ALgebra-Aware Neuro-Semi-Symbolic (ALANS) learner. The ALANS learner is motivated by abstract algebra and the representation theory. It consists of a neural visual perception frontend and an algebraic abstract reasoning backend: the frontend summarizes the visual information from object-based representation, while the backend transforms it into an algebraic structure and induces the hidden operator on the fly. The induced operator is later executed to predict the answer's representation, and the choice most similar to the prediction is selected as the solution. Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization. We further show the generative nature of the learned algebraic representation; it can be decoded by isomorphism to generate an answer.

preprint2022arXiv

Multi-agent Databases via Independent Learning

Machine learning is rapidly being used in database research to improve the effectiveness of numerous tasks including but not limited to query optimization, workload scheduling, physical design, etc. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?". To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query latency. Preliminary results demonstrate that MADB can outperform the non-cooperative integration of learned components.

preprint2022arXiv

Optimization and implementation of a surface-electrode ion trap junction

We describe the design of a surface-electrode ion trap junction, which is a key element for large-scale ion trap arrays. A bi-objective optimization method is used for designing the electrodes, which maintains the total pseudo-potential curvature while minimizing the axial pseudo-potential gradient along the ion transport path. To facilitate the laser beam delivery for parallel operations in multiple trap zones, we implemented integrated optics on each arm of this X-junction trap. The layout of the trap chip for commercial foundry fabrication is presented. This work suggests routes to improving ion trap junction performance in scalable implementations. Together with integrated optical addressing, this contributes to modular trapped-ion quantum computing in interconnected 2-dimensional arrays.

preprint2022arXiv

Origin of Nonlinear Damping due to Mode Coupling in Auto-Oscillatory Modes Strongly Driven by Spin-Orbit Torque

We investigate the physical origin of nonlinear damping due to mode coupling between several auto-oscillatory modes driven by spin-orbit torque in constricted Py/Pt heterostructures by examining the dependence of auto-oscillation on temperature and applied field orientation. We observe a transition in the nonlinear damping of the auto-oscillation modes extracted from the total oscillation power as a function of drive current, which coincides with the onset of power redistribution amongst several modes and the crossover from linewidth narrowing to linewidth broadening in all individual modes. This indicates the activation of another relaxation process by nonlinear magnon-magnon scattering within the modes. We also find that both nonlinear damping and threshold current in the mode-interaction damping regime at high drive current after transition are temperature independent, suggesting that the mode coupling occurs dominantly through a non-thermal magnon scattering process via a dipole or exchange interaction rather than thermally excited magnon-mediated scattering. This finding presents a promising pathway to overcome the current limitations of efficiently controlling the interaction between two highly nonlinear magnetic oscillators to prevent mode crosstalk or inter-mode energy transfer and deepens understanding of complex nonlinear spin dynamics in multimode spin wave systems.

preprint2022arXiv

Quantum computation in a hybrid array of molecules and Rydberg atoms

We show that an array of polar molecules interacting with Rydberg atoms is a promising hybrid system for scalable quantum computation. Quantum information is stored in long-lived hyperfine or rotational states of molecules which interact indirectly through resonant dipole-dipole interactions with Rydberg atoms. A two-qubit gate based on this interaction has a duration of 1 $μ$s and an achievable fidelity of 99.9%. The gate has little sensitivity to the motional states of the particles -- the molecules can be in thermal states, the atoms do not need to be trapped during Rydberg excitation, the gate does not heat the molecules, and heating of the atoms has a negligible effect. Within a large, static array, the gate can be applied to arbitrary pairs of molecules separated by tens of micrometres, making the scheme highly scalable. The molecule-atom interaction can also be used for rapid qubit initialization and efficient, non-destructive qubit readout, without driving any molecular transitions. Single qubit gates are driven using microwave pulses alone, exploiting the strong electric dipole transitions between rotational states. Thus, all operations required for large scale quantum computation can be done without moving the molecules or exciting them out of their ground electronic states.

preprint2022arXiv

Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce

In this paper, we propose an automatic Scenario-based Multi-product Advertising Copywriting Generation system (SMPACG) for e-commerce, which has been deployed on a leading Chinese e-commerce platform. The proposed SMPACG consists of two main components: 1) an automatic multi-product combination selection module, which itself consists of a topic prediction model, a pattern and attribute-based selection model, and an arbitrator model; and 2) an automatic multi-product advertising copywriting generation module, which combines our proposed domain-specific pretrained language model and a knowledge-based data enhancement model. The SMPACG is the first system to realize automatic scenario-based multi-product advertising content generation, and it achieves significant improvements over other state-of-the-art methods. The SMPACG has not only been developed to directly serve our e-commerce recommendation system, but is also used as a real-time writing assistant tool for merchants.

preprint2022arXiv

Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives

Recently, Implicit Neural Representations (INRs) parameterized by neural networks have emerged as a powerful and promising tool for representing different kinds of signals due to their continuous, differentiable properties, showing advantages over classical discretized representations. However, the training of neural networks for INRs only utilizes input-output pairs, and the derivatives of the target output with respect to the input, which can be accessed in some cases, are usually ignored. In this paper, we propose a training paradigm for INRs whose target output is image pixels, to encode image derivatives in addition to image values in the neural network. Specifically, we use finite differences to approximate image derivatives. We show how the training paradigm can be leveraged to solve typical INR problems, i.e., image regression and inverse rendering, and demonstrate that this training paradigm can improve the data efficiency and generalization capabilities of INRs. The code of our method is available at https://github.com/megvii-research/Sobolev_INRs.
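The training signal described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the weight `lam`, and the choice of central differences with one-sided borders are assumptions. For simplicity the prediction is also treated as a pixel grid, so both derivatives are approximated by finite differences; in an actual INR the network's derivative would typically come from automatic differentiation.

```python
def finite_diff(img):
    """Approximate d/dx and d/dy of a 2D image (list of rows).

    Interior pixels use central differences; border pixels fall back to
    one-sided differences so the outputs keep the input shape.
    """
    h, w = len(img), len(img[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dx[y][x] = (img[y][x1] - img[y][x0]) / (x1 - x0) if x1 > x0 else 0.0
            dy[y][x] = (img[y1][x] - img[y0][x]) / (y1 - y0) if y1 > y0 else 0.0
    return dx, dy


def mse(a, b):
    """Mean squared error between two equally shaped 2D grids."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def sobolev_loss(pred, target, lam=1.0):
    """Value error plus lam times the error of the approximated derivatives."""
    pdx, pdy = finite_diff(pred)
    tdx, tdy = finite_diff(target)
    return mse(pred, target) + lam * (mse(pdx, tdx) + mse(pdy, tdy))
```

A prediction that differs from the target by a constant offset has the same derivatives, so only the value term contributes; a prediction with the wrong slope is penalized by both terms.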

preprint2022arXiv

TENET: Transformer Encoding Network for Effective Temporal Flow on Motion Prediction

This technical report presents an effective method for motion prediction in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. In addition, we propose the Temporal Flow Header to enhance the trajectory encoding. Finally, an efficient K-means ensemble method is used. Using our Transformer network and ensemble method, we won first place in the Argoverse 2 Motion Forecasting Challenge with a state-of-the-art brier-minFDE score of 1.90.

preprint2022arXiv

The elliptic double box and symbology beyond polylogarithms

We study the elliptic double-box integral, which contributes to generic massless QFTs and is the only contribution to a particular 10-point scattering amplitude in N=4 SYM theory. Based on a Feynman parametrization, we express this integral in terms of elliptic polylogarithms. We then study its symbol, finding a rich structure and remarkable similarity with the non-elliptic case. In particular, the first entry of the symbol is expressible in terms of logarithms of dual-conformal cross-ratios, and elliptic letters only occur in the last two entries. Moreover, the symbol makes manifest a differential equation relating the double-box integral to a 6D hexagon integral, suggesting that it can be bootstrapped based on the latter integral alone.

preprint2022arXiv

Tree Representation, Growth Rate of Blockchain and Reward Allocation in Ethereum with Multiple Mining Pools

Studying Ethereum with multiple mining pools is interesting but challenging. The main difficulties are how to represent a general tree with multiple block branches (or sub-chains) associated with the multiple mining pools, and how to analyze the multi-dimensional stochastic system that arises from the mining competition among the pools. In this paper, we first set up a mathematical representation for the tree with multiple block branches. Then we provide a block classification of Ethereum: regular blocks (in the main chain), orphan blocks, uncle blocks, stale blocks, and nephew blocks, and give some key probabilities of generating the different types of blocks by applying the law of large numbers. Based on this, we further discuss the growth rate of the blockchain and the reward allocation among the multiple mining pools by applying the renewal reward theorem. Finally, we use simulation experiments to verify our theoretical results, and show that the approximate computation approaches developed for the key probabilities, the long-term growth rate of the blockchain, and the long-term reward allocation (rate) among the multiple mining pools can achieve faster convergence. We therefore provide a powerful tool for observing and understanding the influence of selfish mining attacks on the performance of Ethereum with multiple mining pools. We believe that the methodology and results developed in this paper will shed light on the study of Ethereum with multiple mining pools and may inspire a series of promising follow-up research.

preprint2022arXiv

Weight-dependent Gates for Network Pruning

In this paper, a simple yet effective network pruning framework is proposed to simultaneously address the problems of pruning indicator, pruning ratio, and efficiency constraint. This paper argues that the pruning decision should depend on the convolutional weights, and thus proposes novel weight-dependent gates (W-Gates) to learn the information from filter weights and obtain binary gates to prune or keep the filters automatically. To prune the network under efficiency constraints, a switchable Efficiency Module is constructed to predict the hardware latency or FLOPs of candidate pruned networks. Combined with the proposed Efficiency Module, W-Gates can perform filter pruning in an efficiency-aware manner and achieve a compact network with a better accuracy-efficiency trade-off. We have demonstrated the effectiveness of the proposed method on ResNet34, ResNet50, and MobileNet V2, respectively achieving up to 1.33/1.28/1.1 higher Top-1 accuracy with lower hardware latency on ImageNet. Compared with state-of-the-art methods, W-Gates also achieves superior performance.

preprint2021arXiv

A simple artificial damping method for total Lagrangian smoothed particle hydrodynamics

In this paper, we present a simple artificial damping method to enhance the robustness of total Lagrangian smoothed particle hydrodynamics (TL-SPH). Specifically, an artificial damping stress based on a Kelvin-Voigt type damper, with a scaling factor imitating a von Neumann-Richtmyer type artificial viscosity, is introduced in the constitutive equation to alleviate spurious oscillations in the vicinity of sharp spatial gradients. After validating the robustness and accuracy of the present method with a set of benchmark tests that include very challenging cases, we demonstrate its potential in the field of biomechanics by simulating the deformation of complex stent structures.

preprint2021arXiv

Conditional Gaussian Distribution Learning for Open Set Recognition

Deep neural networks have achieved state-of-the-art performance in a wide range of recognition/classification tasks. However, when applying deep learning to real-world applications, there are still multiple challenges. A typical challenge is that unknown samples may be fed into the system during the testing phase and traditional deep neural networks will wrongly recognize the unknown sample as one of the known classes. Open set recognition is a potential solution to overcome this problem, where the open set classifier should have the ability to reject unknown samples as well as maintain high classification accuracy on known classes. The variational auto-encoder (VAE) is a popular model to detect unknowns, but it cannot provide discriminative representations for known classification. In this paper, we propose a novel method, Conditional Gaussian Distribution Learning (CGDL), for open set recognition. In addition to detecting unknown samples, this method can also classify known samples by forcing different latent features to approximate different Gaussian models. Meanwhile, to avoid information hidden in the input vanishing in the middle layers, we also adopt the probabilistic ladder architecture to extract high-level abstract features. Experiments on several standard image datasets reveal that the proposed method significantly outperforms the baseline method and achieves new state-of-the-art results.

preprint2021arXiv

End-to-End Human Object Interaction Detection with HOI Transformer

We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, our method, named HOI Transformer, streamlines the HOI pipeline by eliminating the need for many hand-designed components. HOI Transformer reasons about the relations of objects and humans from global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to supervise HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves $26.61\%$ AP on HICO-DET and $52.9\%$ $AP_{role}$ on V-COCO, surpassing previous methods with the advantage of being much simpler. We hope our approach will serve as a simple and effective alternative for HOI tasks. Code is available at https://github.com/bbepoch/HoiTransformer.

preprint2021arXiv

Experimental Side-Channel-Free Quantum Key Distribution

Quantum key distribution can, in theory, provide unconditionally secure key exchange for remote users. In practice, however, in most quantum key distribution systems, quantum hackers might steal the secure keys by listening to side channels in the source, such as the photon frequency spectrum, emission time, propagation direction, spatial angular momentum, and so on. It is hard to prevent such attacks because side channels may exist in any of the encoding spaces, whether or not the designers have accounted for them. Here we report an experimental realization of a side-channel-free quantum key distribution protocol which is not only measurement-device-independent, but also immune to all side-channel attacks in the source. We achieve a secure key rate of $4.80 \times 10^{-7}$ per pulse through 50 km fiber spools.

preprint2021arXiv

Generalised and efficient wall boundary condition treatment in GPU-accelerated smoothed particle hydrodynamics

This paper presents a generalised and efficient wall boundary treatment in the smoothed particle hydrodynamics (SPH) method for 3-D complex and arbitrary geometries with single- and multi-phase flows, to be executed on graphics processing units (GPUs). Using a force balance between the wall and fluid particles with a novel penalty method, a pressure boundary condition is applied on the wall dummy particles, which effectively prevents non-physical particle penetration into the wall boundaries, even in highly violent impacts and multi-phase flows with high density ratios. A new density reinitialisation scheme is also presented to enhance the accuracy. The proposed method is very simple in comparison with previous wall boundary formulations on GPUs: it requires no additional memory caching and is thus ideally suited for heterogeneous GPU architectures. The method is validated in various test cases involving violent single- and multi-phase flows in arbitrary geometries and demonstrates very good robustness, accuracy and performance. The new wall boundary condition treatment improves on the already high accuracy of its previous version (Adami et al., 2012), also in complex 3-D and multi-phase problems, while it is efficiently executable on GPUs with single-precision floating-point arithmetic, which makes it suitable for a wide range of GPUs, including consumer graphics cards. Therefore, the method is a reliable solution for the long-standing challenge of the wall boundary condition in the SPH method for a broad range of natural and industrial applications.

preprint2021arXiv

Infections Forecasting and Intervention Effect Evaluation for COVID-19 via a Data-Driven Markov Process and Heterogeneous Simulation

The Coronavirus Disease 2019 (COVID-19) pandemic has caused a tremendous number of deaths and has had a devastating impact on economic development all over the world. Thus, it is paramount to control its further transmission, for which purpose it is necessary to find the mechanism of its transmission process and evaluate the effect of different control strategies. To deal with these issues, we describe the transmission of COVID-19 as an explosive Markov process with four parameters. The state transitions of the proposed Markov process clearly reveal the explosive growth and complex heterogeneity of COVID-19. Based on this, we further propose a simulation approach with heterogeneous infections. Experiments show that our approach can closely track the real transmission process of COVID-19, disclose its transmission mechanism, and forecast the transmission under different non-drug intervention strategies. More importantly, our approach can help develop effective strategies for controlling COVID-19 and appropriately compare their control effect in different countries/cities.

preprint2021arXiv

Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification

Although deep convolutional neural networks (DCNNs) have achieved significant accuracy in skin lesion classification comparable or even superior to those of dermatologists, practical implementation of these models for skin cancer screening in low resource settings is hindered by their limitations in computational cost and training dataset. To overcome these limitations, we propose a low-cost and high-performance data augmentation strategy that includes two consecutive stages of augmentation search and network search. At the augmentation search stage, the augmentation strategy is optimized in the search space of Low-Cost-Augment (LCA) under the criteria of balanced accuracy (BACC) with 5-fold cross validation. At the network search stage, the DCNNs are fine-tuned with the full training set in order to select the model with the highest BACC. The efficiency of the proposed data augmentation strategy is verified on the HAM10000 dataset using EfficientNets as a baseline. With the proposed strategy, we are able to reduce the search space to 60 and achieve a high BACC of 0.853 by using a single DCNN model without external database, suitable to be implemented in mobile devices for DCNN-based skin lesion detection in low resource settings.
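The selection criterion named in this abstract, balanced accuracy (BACC), is commonly defined as the mean of the per-class recalls, which keeps a majority class from dominating the score on an imbalanced dataset. A minimal sketch of that standard definition follows; the function name and the two toy labels are illustrative assumptions, and the paper's exact evaluation code is not shown in the abstract.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy imbalanced example: predicting the majority class everywhere gives
# 80% plain accuracy but only 50% balanced accuracy.
y_true = ["nv"] * 8 + ["mel"] * 2
y_pred = ["nv"] * 10
print(balanced_accuracy(y_true, y_pred))
```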

preprint2021arXiv

Non-Hermitian dynamics and $\mathcal{PT}$-symmetry breaking in interacting mesoscopic Rydberg platforms

We simulate the dissipative dynamics of a mesoscopic system of long-range interacting particles which can be mapped into non-Hermitian spin models with a $\mathcal{PT}$ symmetry. We find rich $\mathcal{PT}$-phase diagrams with $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken phases. The dynamical regimes can be further enriched by modulating tunable parameters of the system. We outline how the $\mathcal{PT}$ symmetries of such systems may be probed by studying their dynamics. We note that systems of Rydberg atoms and systems of Rydberg ions with strong dipolar interactions are particularly well suited for such studies. We show that for realistic parameters, long-range interactions allow the emergence of new $\mathcal{PT}$-symmetric regions, generating new $\mathcal{PT}$-phase transitions. In addition, such $\mathcal{PT}$-symmetry phase transitions are found by changing the Rydberg atoms configurations. We monitor the transitions by accessing the populations of the Rydberg states. Their dynamics display oscillatory or exponential dependence in each phase.

preprint2021arXiv

On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adversarial Perturbations

Although deep learning (DL) has received much attention in accelerated MRI, recent studies suggest that small perturbations may lead to instabilities in DL-based reconstructions, raising concerns about their clinical application. However, these works focus on single-coil acquisitions, which are not used in practice. We investigate instabilities caused by small adversarial attacks for multi-coil acquisitions. Our results suggest that parallel imaging and multi-coil compressed sensing (CS) exhibit considerable instabilities against small adversarial perturbations.

preprint2021arXiv

Open Set Recognition with Conditional Probabilistic Generative Models

Deep neural networks have made breakthroughs in a wide range of visual understanding tasks. A typical challenge that hinders their real-world applications is that unknown samples may be fed into the system during the testing phase, but traditional deep neural networks will wrongly recognize these unknown samples as one of the known classes. Open set recognition (OSR) is a potential solution to overcome this problem, where the open set classifier should have the flexibility to reject unknown samples and meanwhile maintain high classification accuracy in known classes. Probabilistic generative models, such as Variational Autoencoders (VAE) and Adversarial Autoencoders (AAE), are popular methods to detect unknowns, but they cannot provide discriminative representations for known classification. In this paper, we propose a novel framework, called Conditional Probabilistic Generative Models (CPGM), for open set recognition. The core insight of our work is to add discriminative information into the probabilistic generative models, such that the proposed models can not only detect unknown samples but also classify known classes by forcing different latent features to approximate conditional Gaussian distributions. We discuss many model variants and provide comprehensive experiments to study their characteristics. Experiment results on multiple benchmark datasets reveal that the proposed method significantly outperforms the baselines and achieves new state-of-the-art performance.

preprint2021arXiv

Quantum key distribution over 658 km fiber with distributed vibration sensing

Twin-field quantum key distribution (TF-QKD) promises ultra-long secure key distribution which surpasses the rate-distance limit and can reduce the number of trusted nodes in long-haul quantum networks. Tremendous efforts have been made towards the implementation of TF-QKD, among which secure keys with finite-size analysis have been distributed over more than 500 km in the lab and in the field. Here, we demonstrate sending-or-not-sending TF-QKD experimentally, achieving secure key distribution with finite-size analysis over 658 km of ultra-low-loss optical fiber and improving the secure distance record by around 100 km. Meanwhile, in a TF-QKD system, any phase fluctuation due to temperature variation and ambient variation along the channel must be recorded and compensated, and all of this phase information can then be utilized to sense vibration perturbations of the channel. With our QKD system, we recovered the external vibrational perturbations on the fiber generated by an artificial vibroseis and successfully located the perturbation position with a resolution better than 1 km. Our results not only set a new distance record for QKD, but also demonstrate that the redundant information of TF-QKD can be used for remote sensing of channel vibration, which can find applications in earthquake detection and landslide monitoring in addition to secure communication.

preprint2021arXiv

The Three-loop MHV Octagon from $\bar{Q}$ equations

The $\bar{Q}$ equations, rooted in the dual superconformal anomalies, are a powerful tool for computing amplitudes in planar $\mathcal{N}=4$ supersymmetric Yang-Mills theory. By using the $\bar{Q}$ equations, we compute the symbol of the first MHV amplitude with algebraic letters -- the three-loop 8-point amplitude (or the octagon remainder function) -- in this theory. The symbol alphabet for this amplitude consists of 204 independent rational letters and shares the same 18 algebraic letters with the two-loop 8-point NMHV amplitude.

preprint2020arXiv

A Depth-Aware Swap Insertion Scheme for the Qubit Mapping Problem

The rapid progress in the physical implementation of quantum computers has paved the way for designing tools that help users write quantum programs for any given quantum device. The physical constraints inherent to current NISQ architectures prevent most quantum algorithms from being directly executed on quantum devices. To enable two-qubit gates in the algorithm, existing works focus on inserting SWAP gates to dynamically remap logical qubits to physical qubits. However, their schemes do not consider the depth of the generated quantum circuits. In this work, we propose a depth-aware SWAP insertion scheme for the qubit mapping problem in the NISQ era.

preprint2020arXiv

BigGAN-based Bayesian reconstruction of natural images from human brain activity

In the visual decoding domain, visually reconstructing presented images from the corresponding human brain activity monitored by functional magnetic resonance imaging (fMRI) is difficult, especially when reconstructing viewed natural images. Visual reconstruction is a conditional image generation task on fMRI data, and thus generative adversarial networks (GANs) for natural image generation have recently been introduced for this task. Although GAN-based methods have greatly improved, the fidelity and naturalness of reconstruction are still unsatisfactory due to the small number of fMRI data samples and the instability of GAN training. In this study, we propose a new GAN-based Bayesian visual reconstruction method (GAN-BVRM) that includes a classifier to decode categories from fMRI data, a pre-trained conditional generator to generate natural images of specified categories, and a set of encoding models and an evaluator to evaluate generated images. GAN-BVRM employs the pre-trained generator of the prevailing BigGAN to generate masses of natural images, and selects the images that best match the corresponding brain activity through the encoding models as the reconstruction of the image stimuli. In this process, the semantic and detailed contents of the reconstruction are controlled by the decoded categories and the encoding models, respectively. GAN-BVRM uses a Bayesian approach to avoid the conflict between naturalness and fidelity in current GAN-based methods and can thus better exploit the advantages of GANs. Experimental results reveal that GAN-BVRM improves fidelity and naturalness, that is, the reconstruction is natural and similar to the presented image stimuli.

preprint2020arXiv

Circle Loss: A Unified Perspective of Pair Similarity Optimization

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$. We find a majority of loss functions, including the triplet loss and the softmax plus cross-entropy loss, embed $s_n$ and $s_p$ into similarity pairs and seek to reduce $(s_n-s_p)$. Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal. Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized. To this end, we simply re-weight each similarity to highlight the less-optimized similarity scores. It results in a Circle loss, which is named due to its circular decision boundary. The Circle loss has a unified formula for two elemental deep feature learning approaches, i.e. learning with class-level labels and pair-wise labels. Analytically, we show that the Circle loss offers a more flexible optimization approach towards a more definite convergence target, compared with the loss functions optimizing $(s_n-s_p)$. Experimentally, we demonstrate the superiority of the Circle loss on a variety of deep feature learning tasks. On face recognition, person re-identification, as well as several fine-grained image retrieval datasets, the achieved performance is on par with the state of the art.
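The re-weighting described in this abstract can be sketched in a few lines. The abstract does not state the exact formula, so the form below follows the commonly cited version of the loss and should be read as an assumption: each similarity is weighted by its clipped distance from an optimum (`O_p = 1 + m` for $s_p$, `O_n = -m` for $s_n$), with margins `1 - m` and `m`. The default values of `m` and `gamma` are illustrative only.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(sp, sn, m=0.25, gamma=32.0):
    """Circle-style loss for within-class (sp) and between-class (sn) similarities.

    Assumed form: log(1 + sum_j exp(gamma * a_n * (s_n - m))
                        * sum_i exp(-gamma * a_p * (s_p - (1 - m))))
    where a_p = max(1 + m - s_p, 0) and a_n = max(s_n + m, 0).
    """
    op, on = 1.0 + m, -m   # assumed optima for s_p and s_n
    dp, dn = 1.0 - m, m    # assumed margins
    # A similarity far from its optimum gets a larger weight, so it is emphasized.
    pos = [-gamma * max(op - s, 0.0) * (s - dp) for s in sp]
    neg = [gamma * max(s - on, 0.0) * (s - dn) for s in sn]
    return softplus(logsumexp(neg) + logsumexp(pos))
```

With these weights, a well-separated pair such as `sp=[0.9], sn=[0.1]` yields a loss close to zero, while a poorly separated pair such as `sp=[0.3], sn=[0.7]` is penalized heavily.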

preprint2020arXiv

Collaborative Inference for Efficient Remote Monitoring

While current machine learning models have impressive performance over a wide range of applications, their large size and complexity render them unsuitable for tasks such as remote monitoring on edge devices with limited storage and computational power. A naive approach to resolve this on the model level is to use simpler architectures, but this sacrifices prediction accuracy and is unsuitable for monitoring applications requiring accurate detection of the onset of adverse events. In this paper, we propose an alternative solution to this problem by decomposing the predictive model as the sum of a simple function which serves as a local monitoring tool, and a complex correction term to be evaluated on the server. A sign requirement is imposed on the latter to ensure that the local monitoring function is safe, in the sense that it can effectively serve as an early warning system. Our analysis quantifies the trade-offs between model complexity and performance, and serves as a guidance for architecture design. We validate our proposed framework on a series of monitoring experiments, where we succeed at learning monitoring models with significantly reduced complexity that minimally violate the safety requirement. More broadly, our framework is useful for learning classifiers in applications where false negatives are significantly more costly compared to false positives.
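The decomposition in this abstract can be illustrated with a toy example. Everything concrete here is an assumption made for illustration: the two functions, the threshold, the convention that an alarm is raised when the score reaches the threshold, and the choice to constrain the correction term to be non-positive so that the local score never underestimates the full score.

```python
import math


def local_score(x):
    """Hypothetical simple function evaluated on the edge device."""
    return 0.8 * x + 0.3


def correction(x):
    """Hypothetical complex correction evaluated on the server.

    The sign requirement is modelled here as correction(x) <= 0 for all x.
    """
    return -0.3 * abs(math.sin(3.0 * x))


def full_score(x):
    """Full predictive model: simple local function plus correction term."""
    return local_score(x) + correction(x)


def monitor(x, threshold=1.0):
    """Return True if an alarm should be raised for input x."""
    if local_score(x) < threshold:
        # Safe to skip the server: full_score(x) <= local_score(x) < threshold.
        return False
    # The local early warning fired; ask the server for the corrected score.
    return full_score(x) >= threshold
```

Under these assumptions the device contacts the server only when the local score crosses the threshold, and the final decision always matches the one the full model would have made.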

preprint2020arXiv

Creating Efficient Blockchains for the Internet of Things by Coordinated Satellite-Terrestrial Networks

Blockchain has emerged as a promising technology that can guarantee data consistency and integrity among distributed participants. It has been used in many applications of the Internet of Things (IoT). However, since IoT applications often introduce a massive number of devices into blockchain systems, the efficiency of the blockchain becomes a serious problem. In this article, we analyze the key factors affecting the efficiency of blockchain. Unlike most existing solutions that handle this from the computing perspective, we consider the problem from the communication perspective. Particularly, we propose a coordinated satellite-terrestrial network to create efficient blockchains. We also derive a network scheduling strategy for the proposed architecture. Simulation results demonstrate that the proposed system can support blockchains for higher efficiency. Moreover, several open research issues and design challenges will be discussed.

preprint2020arXiv

CRNet: Cross-Reference Networks for Few-Shot Segmentation

Over the past few years, state-of-the-art image segmentation algorithms have been based on deep convolutional neural networks. To give a deep network the ability to understand a concept, humans need to collect a large amount of pixel-level annotated data to train the models, which is time-consuming and tedious. Recently, few-shot segmentation has been proposed to solve this problem. Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation. Unlike previous works, which only predict the mask in the query image, our proposed model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images, thus helping the few-shot segmentation task. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. For $k$-shot learning, we propose to finetune parts of the network to take advantage of multiple labeled support images. Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.

preprint2020arXiv

Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification

Due to its wide potential applications in video surveillance and other computer vision tasks like tracking, person re-identification (ReID) has become popular and has been widely investigated. However, conventional person re-identification can only handle RGB color images, and therefore fails in dark conditions. Thus RGB-infrared ReID (also known as Infrared-Visible ReID or Visible-Thermal ReID) has been proposed. Apart from the appearance discrepancy in traditional ReID caused by illumination, pose variations and viewpoint changes, a modality discrepancy produced by cameras of different spectra also exists, which makes RGB-infrared ReID more difficult. To address this problem, we focus on extracting the shared cross-spectrum features of different modalities. In this paper, a novel multi-spectrum image generation method is proposed, and the generated samples are utilized to help the network find discriminative information for re-identifying the same person across modalities. Another challenge of RGB-infrared ReID is that the intra-person (images from the same person) discrepancy is often larger than the inter-person (images from different persons) discrepancy, so a dual-subspace pairing strategy is proposed to alleviate this problem. Combining these two parts, we design a one-stream neural network, called the Cross-spectrum Dual-subspace Pairing (CDP) model, to extract compact representations of person images. Furthermore, during the training process, we propose a Dynamic Hard Spectrum Mining method to automatically mine more hard samples from hard spectra based on the current model state to further boost the performance. Extensive experimental results on two public datasets, SYSU-MM01 with RGB + near-infrared images and RegDB with RGB + far-infrared images, demonstrate the efficiency and generality of our proposed method.

preprint2020arXiv

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

Recent progress in deep learning is essentially based on a "big data for small tasks" paradigm, under which massive amounts of data are used to train a classifier for a single narrow task. In this paper, we call for a shift that flips this paradigm upside down. Specifically, we propose a "small data for big tasks" paradigm, wherein a single artificial intelligence (AI) system is challenged to develop "common sense", enabling it to solve a wide range of tasks with little training data. We illustrate the potential power of this new paradigm by reviewing models of common sense that synthesize recent breakthroughs in both machine and human vision. We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense. When taken as a unified concept, FPICU is concerned with the questions of "why" and "how", beyond the dominant "what" and "where" framework for understanding vision. They are invisible in terms of pixels but nevertheless drive the creation, maintenance, and development of visual scenes. We therefore coin them the "dark matter" of vision. Just as our universe cannot be understood by merely studying observable matter, we argue that vision cannot be understood without studying FPICU. We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning. In summary, we argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.

preprint2020arXiv

Denoising individual bias for a fairer binary submatrix detection

Low-rank representation of binary matrices is powerful in disentangling sparse individual-attribute associations, and has been widely applied. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d. background noise. However, this assumption can easily be violated in real data, where heterogeneous row- or column-wise probabilities of binary entries result in disparate element-wise background distributions and undermine the rationale of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely to come from the background. BIND is supported by thoroughly derived mathematical properties of the row- and column-wise mixture distributions. Our experiments on synthetic and real-world data demonstrated that BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.

preprint2020arXiv

Digital personal health libraries: a systematic literature review

Objective: This paper gives context on recent literature regarding the development of digital personal health libraries (PHL) and provides insights into the potential application of consumer health informatics in diverse clinical specialties. Materials and Methods: A systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. Here, 2,850 records were retrieved from PubMed and EMBASE in March 2020 using the search terms personal, health, and library. Information related to the health topic, target population, study purpose, library function, data source, data science method, evaluation measure, and status was extracted from each eligible study. In addition, knowledge discovery methods, including co-occurrence analysis and multiple correspondence analysis, were used to explore research trends of PHL. Results: After screening, this systematic review focused on a dozen articles related to PHL. These encompassed health topics such as infectious diseases, congestive heart failure, and electronic prescribing. Data science methods included relational databases, information retrieval technology, and ontology construction technology. Evaluation measures were heterogeneous regarding PHL functions and settings. At the time of writing, only one of the PHLs described in these articles is available to the public, while the others are either prototypes or in the pilot stage. Discussion: Although PHL research has used different methods to address problems in diverse health domains, there is a lack of an effective PHL to meet the needs of older adults. Conclusion: The development of PHLs may create an unprecedented opportunity for promoting the health of older consumers by providing diverse health information.

preprint2020arXiv

Dissociable neural representations of adversarially perturbed images in convolutional neural networks and the human brain

Despite the remarkable similarities between convolutional neural networks (CNN) and the human brain, CNNs still fall behind humans in many visual tasks, indicating that there still exist considerable differences between the two systems. Here, we leverage adversarial noise (AN) and adversarial interference (AI) images to quantify the consistency between neural representations and perceptual outcomes in the two systems. Humans can successfully recognize AI images as corresponding categories but perceive AN images as meaningless noise. In contrast, CNNs can correctly recognize AN images but mistakenly classify AI images into wrong categories with surprisingly high confidence. We use functional magnetic resonance imaging to measure brain activity evoked by regular and adversarial images in the human brain, and compare it to the activity of artificial neurons in a prototypical CNN, AlexNet. In the human brain, we find that the representational similarity between regular and adversarial images largely echoes their perceptual similarity in all early visual areas. In AlexNet, however, the neural representations of adversarial images are inconsistent with network outputs in all intermediate processing layers, providing no neural foundations for perceptual similarity. Furthermore, we show that voxel-encoding models trained on regular images can successfully generalize to the neural responses to AI images but not AN images. These remarkable differences between the human brain and AlexNet in the representation-perception relation suggest that future CNNs should emulate both the behavior and the internal neural representations of the human brain.

preprint2020arXiv

Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

Dynamic dispatching is one of the core problems for operation optimization in traditional industries such as mining, as it is about how to smartly allocate the right resources to the right place at the right time. Conventionally, the industry relies on heuristics or even human intuition, which often yield short-sighted and sub-optimal solutions. Leveraging the power of AI and the Internet of Things (IoT), data-driven automation is reshaping this area. However, facing its own challenges, such as large-scale and heterogeneous trucks running in a highly dynamic environment, the industry can barely adopt methods developed in other domains (e.g., ride-sharing). In this paper, we propose a novel Deep Reinforcement Learning approach to solve the dynamic dispatching problem in mining. We first develop an event-based mining simulator with parameters calibrated in real mines. Then we propose an experience-sharing Deep Q Network with a novel abstract state/action representation to learn memories from heterogeneous agents altogether and realize learning in a centralized way. We demonstrate that the proposed methods significantly outperform the most widely adopted approaches in the industry by $5.56\%$ in terms of productivity. The proposed approach has great potential in a broader range of industries (e.g., manufacturing, logistics) that have large-scale heterogeneous equipment working in a highly dynamic environment, as a general framework for dynamic resource allocation.

preprint2020arXiv

Extension of elementary $p$-groups and its application in classification of groups of prime exponent

Let $p$ be a prime number and $\mathbb{Z}_p=\mathbb{Z}/p\mathbb{Z}$. We study finite groups with abelian derived subgroup and exponent $p$ in terms of group extension data and their matrix presentations. We show a one-to-one correspondence between the following two sets: (i) the isoclasses of class 2 groups of exponent $p$ and order $p^{m+n}$ and with derived subgroup $\mathbb{Z}_p^n$, and (ii) the set $\text{Gr}(n,\text{AS}_m(\mathbb{Z}_p))/\text{GL}_m(\mathbb{Z}_p)$ of orbits of $\text{Gr}(n,\text{AS}_m(\mathbb{Z}_p))$ under the congruence action by $\text{GL}_m(\mathbb{Z}_p)$, where $\text{Gr}(n,\text{AS}_m(\mathbb{Z}_p))$ is the set of $n$-dimensional subspaces of anti-symmetric matrices of order $m$ over $\mathbb{Z}_p$. We give a description of the orbit spaces $\text{Gr}(2, \text{AS}_m(\mathbb{Z}_p))/\text{GL}_m(\mathbb{Z}_p)$ for all $m$ and $p$ by applying the theory of pencils of anti-symmetric matrices. Based on this, we show complete sets of representatives of orbits of $\text{Gr}(3,\text{AS}_4(\mathbb{Z}_3))/\text{GL}_4(\mathbb{Z}_3)$, $\text{Gr}(4, \text{AS}_4(\mathbb{Z}_3))/\text{GL}_4(\mathbb{Z}_3)$ and $\text{Gr}(3, \text{AS}_5(\mathbb{Z}_3))/\text{GL}_5(\mathbb{Z}_3)$. As a consequence, we obtain a classification of corresponding class 2 groups of exponent $p$. In particular, we recover the classification of groups with exponent 3 and order $\le 3^8$.

preprint2020arXiv

Fast And Efficient Boolean Matrix Factorization By Geometric Segmentation

Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low-rank Boolean matrices, which can generate a vast amount of information on the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopts a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's lies in the upper triangular area of the permuted matrix, and its location is determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different sparsity and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate sparse patterns than popular methods such as ASSO, PANDA and MP. We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieval and data denoising.
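
To make the objective in the abstract above concrete, here is a minimal sketch of the Boolean product and the reconstruction error that BMF methods try to keep small. It does not implement MEBF itself: the matrix `X` and the factors `A` and `B` are hypothetical values chosen by hand, and the function names are illustrative only.

```python
def boolean_product(a, b):
    """Boolean matrix product: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [int(any(a[i][k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(rows)
    ]


def reconstruction_error(x, approx):
    """Count the entries where the approximation disagrees with x."""
    return sum(
        xv != av
        for x_row, a_row in zip(x, approx)
        for xv, av in zip(x_row, a_row)
    )


# Hypothetical 4x4 binary matrix with two overlapping dense blocks
# plus one isolated 1 in the bottom-right corner.
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

# Rank-2 Boolean factors, picked by hand for illustration (not MEBF output).
A = [[1, 0], [1, 1], [0, 1], [0, 0]]
B = [[1, 1, 0, 0], [0, 1, 1, 0]]

approx = boolean_product(A, B)
error = reconstruction_error(X, approx)
```

In this toy case the two factors recover both dense blocks and miss only the isolated entry, which is the kind of trade-off between rank and error that a BMF heuristic has to manage.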

preprint2020arXiv

High-Throughput Production of Cheap Mineral-Based 2D Electrocatalysts for High-Current-Density Hydrogen Evolution

The high-throughput scalable production of cheap, efficient and durable electrocatalysts that work well at the high current densities demanded by industry is a great challenge for the large-scale implementation of electrochemical technologies. Here we report the production of a 2D MoS2-based ink-type electrocatalyst by a scalable top-down exfoliation technique followed by a simple heat treatment. The catalyst shows a high current density of 1000 mA cm^-2 at an overpotential of 454 mV for the hydrogen evolution reaction (HER) without the need for iR correction, as well as good stability over 24 hours. Using the same method, we have, for the first time, produced a cheap MoS2 mineral-based catalyst and found that it had an excellent performance for high-current-density HER. Notably, the production rate of this MoS2-based catalyst is one to two orders of magnitude higher than those previously reported. In addition, the price of the MoS2 mineral is five orders of magnitude lower than that of commercial Pt catalysts, making the MoS2 mineral-based catalyst cheap, and the ink-type catalyst dispersions can be easily integrated with other technologies for large-scale catalyst electrode preparation. These advantages indicate the huge potential of this method and of cheap, abundant mineral-based natural resources as catalysts in electrochemical technologies.

preprint2020arXiv

Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration

In this paper, we propose a novel learning-based pipeline for partially overlapping 3D point cloud registration. The proposed model includes an iterative distance-aware similarity matrix convolution module to incorporate information from both the feature and Euclidean space into the pairwise point matching process. These convolution layers learn to match points based on joint information of the entire geometric features and Euclidean offset for each point pair, overcoming the disadvantage of matching by simply taking the inner product of feature vectors. Furthermore, a two-stage learnable point elimination technique is presented to improve computational efficiency and reduce false positive correspondence pairs. A novel mutual-supervision loss is proposed to train the model without extra annotations of keypoints. The pipeline can be easily integrated with both traditional (e.g. FPFH) and learning-based features. Experiments on partially overlapping and noisy point cloud registration show that our method outperforms the current state-of-the-art, while being more computationally efficient. Code is publicly available at https://github.com/jiahaowork/idam.

preprint2020arXiv

Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene 2011) bridges the induction of symbolic concepts and the competence of problem-solving. To endow such a crucial cognitive ability to machine intelligence, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model--And-Or Graph (AOG). These visual arithmetic problems are in the form of geometric figures: each problem has a set of geometric shapes as its context and embedded number symbols. Solving such problems is not trivial; the machine not only has to recognize the number, but also to interpret the number with its contexts, shapes, and relations (e.g., symmetry) together with proper operations. We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task. Comprehensive experiments show that current neural-network-based models still struggle to understand number concepts and relational operations. We show that a simple brute-force search algorithm could work out some of the problems without context information. Crucially, taking geometric context into account by an additional perception module would provide a sharp performance gain with fewer search steps. Altogether, we call for attention in fusing the classic search-based algorithms with modern neural networks to discover the essential number concepts in future research.

preprint2020arXiv

Magic trapping of a Rydberg ion with a diminished static polarizability

Highly excited Rydberg states are usually extremely polarizable and exceedingly sensitive to electric fields. Because of this, Rydberg ions confined in electric fields have state-dependent trapping potentials. We engineer a Rydberg state that is insensitive to electric fields by coupling two Rydberg states with static polarizabilities of opposite sign; in this way we achieve state-independent magic trapping. We show that the magically-trapped ion can be coherently excited to the Rydberg state without the need for control of the ion's motion.

preprint2020arXiv

Mass Production of Two-Dimensional Materials by Intermediate-Assisted Grinding Exfoliation

The scalable and high-efficiency production of two-dimensional (2D) materials is a prerequisite to their commercial use. Currently, only graphene and graphene oxide can be produced on a ton scale, and the inability to produce other 2D materials on such a large scale hinders their technological applications. Here we report a grinding exfoliation method that uses micro-particles as force intermediates to resolve applied compressive forces into a multitude of small shear forces, inducing the highly-efficient exfoliation of layer materials. The method, referred to as intermediate-assisted grinding exfoliation (iMAGE), can be used for the large-scale production of many 2D materials. As an example, we have exfoliated bulk h-BN into 2D h-BN with large flake sizes, high quality and structural integrity, with a high exfoliation yield of 67%, a high production rate of 0.3 g h^-1 and a low energy consumption of 3.01x10^6 J g^-1. The production rate and energy consumption are one to two orders of magnitude better than previous results. Besides h-BN, this iMAGE technology has been used to exfoliate various layer materials such as graphite, black phosphorus, transition metal dichalcogenides, and metal oxides, proving its universality. Molybdenite concentrate, a natural low-cost and abundant mineral, was used as a demo for the large-scale exfoliation production of 2D MoS2 flakes. Our work indicates the huge potential of the iMAGE method to produce large amounts of various 2D materials, which paves the way for their commercial application.

preprint2020arXiv

Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

Model usage is the central challenge of model-based reinforcement learning. Although dynamics models based on deep neural networks provide good generalization for single-step prediction, this ability is over-exploited when they are used to predict long-horizon trajectories, due to compounding errors. In this work, we propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR). To eliminate the compounding errors, we only use our model to generate single-step rollouts. Furthermore, we propose to generate \emph{diverse} model rollouts by non-uniform sampling of the environment states such that the entropy of the model rollouts is maximized. We mathematically derive the maximum entropy sampling criterion for the one-data case under a Gaussian prior. To satisfy this criterion, we propose to utilize prioritized experience replay. Our preliminary experiments on challenging locomotion benchmarks show that our approach achieves the same sample efficiency as the best model-based algorithms, matches the asymptotic performance of the best model-free algorithms, and significantly reduces the computation requirements of other model-based methods.
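
The two ingredients named in the abstract above, non-uniform state sampling and single-step model rollouts, can be sketched roughly as follows. This is only an illustration under stated assumptions: the priorities are placeholder numbers rather than the paper's maximum entropy criterion, and `toy_policy` and `toy_model` are hypothetical stand-ins for a learned policy and dynamics model.

```python
import random


def sample_states(states, priorities, k, rng):
    """Draw k states with probability proportional to their priority."""
    return rng.choices(states, weights=priorities, k=k)


def one_step_rollouts(model, policy, states):
    """Generate single-step transitions only, so model error is never fed back in."""
    transitions = []
    for s in states:
        a = policy(s)
        next_s, reward = model(s, a)
        transitions.append((s, a, next_s, reward))
    return transitions


def toy_policy(s):
    """Hypothetical policy: push the scalar state halfway toward zero."""
    return -0.5 * s


def toy_model(s, a):
    """Hypothetical dynamics model: additive update, reward is negative distance to zero."""
    next_s = s + a
    return next_s, -abs(next_s)


# Placeholder replay buffer of scalar states with made-up priorities.
states = [0.0, 1.0, 2.0, 3.0]
priorities = [0.0, 1.0, 1.0, 2.0]  # a zero priority means the state is never drawn

rng = random.Random(0)
batch = sample_states(states, priorities, k=8, rng=rng)
rollouts = one_step_rollouts(toy_model, toy_policy, batch)
```

Each transition starts from a real stored state and queries the model exactly once, which is what keeps prediction errors from accumulating along a trajectory.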

preprint2020arXiv

Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features

Using functional magnetic resonance imaging (fMRI), researchers design visual encoding models to predict human neural activity in response to presented image stimuli and to analyze the inner mechanisms of the human visual cortices. Deep network models, built from hierarchical processing layers, learn features of data for a specific task from a large dataset. They provide powerful hierarchical representations of data and have brought about breakthroughs in visual encoding, while revealing a hierarchical structural similarity with information processing in the human visual cortices. However, previous studies almost exclusively used image features from deep network models pre-trained on a classification task to construct visual encoding models. Besides the network structure, the task and its corresponding dataset are also important for deep network models, but they have been neglected by previous studies. Because image classification is a relatively fundamental task, it is difficult to guide deep network models to master high-level semantic representations of data, which limits encoding performance for high-level visual cortices. In this study, we introduced a higher-level vision task, the image caption (IC) task, and proposed a visual encoding model based on IC features (ICFVEM) to encode voxels of high-level visual cortices. Experiments demonstrated that ICFVEM obtained better encoding performance than previous deep network models pre-trained on a classification task. In addition, voxels were interpreted through the visualization of semantic words to explore their detailed characteristics, and comparative analysis implied that high-level visual cortices exhibit correlated representations of image content.

preprint2020arXiv

On a generalisation of finite $T$-groups

Let $σ=\{σ_i |i\in I\}$ be some partition of all primes $\mathbb{P}$ and $G$ a finite group. A subgroup $H$ of $G$ is said to be $σ$-subnormal in $G$ if there exists a subgroup chain $H=H_0\leq H_1\leq \cdots \leq H_n=G$ such that either $H_{i-1}$ is normal in $H_i$ or $H_i/(H_{i-1})_{H_i}$ is a finite $σ_j$-group for some $j \in I$ for $i = 1, \ldots, n$. We call a finite group $G$ a $T_σ$-group if every $σ$-subnormal subgroup is normal in $G$. In this paper, we analyse the structure of the $T_σ$-groups and give some characterisations of the $T_σ$-groups.

preprint2020arXiv

On the fundamental group of open Richardson varieties

We compute the fundamental group of an open Richardson variety in the manifold of complete flags that corresponds to a partial flag manifold. Rietsch showed that these log Calabi-Yau varieties underlie a Landau-Ginzburg mirror for the Langlands dual partial flag manifold, and our computation verifies a prediction of Hori for this mirror. It is log Calabi-Yau as it is isomorphic to the complement of the Knutson-Lam-Speyer anti-canonical divisor for the partial flag manifold. We also determine explicit defining equations for this divisor.

preprint2020arXiv

SlackQ : Approaching the Qubit Mapping Problem with A Slack-aware Swap Insertion Scheme

The rapid progress in the physical implementation of quantum computers has paved the way for the design of tools to help users write quantum programs for any given quantum device. The physical constraints inherent in current NISQ architectures prevent most quantum algorithms from being directly executed on quantum devices. To enable two-qubit gates in the algorithm, existing works focus on inserting SWAP gates to dynamically remap logical qubits to physical qubits. However, their schemes lack consideration of the execution time of the generated quantum circuits. In this work, we propose a slack-aware SWAP insertion scheme for the qubit mapping problem in the NISQ era. Our experiments show a performance improvement of up to 2.36X, and 1.62X on average, over 106 representative benchmarks from RevLib, IBM Qiskit, and ScaffCC.

preprint2020arXiv

Soliton Distillation in Fiber Lasers

Pure solitons are, for the first time, distilled from the resonant continuous wave (CW) background in a fiber laser by utilizing the nonlinear Fourier transform (NFT). It is identified that the soliton and the resonant CW background have different eigenvalue distributions in the nonlinear frequency domain. By analogy with water distillation, we propose the approach of soliton distillation: applying NFT to a steady pulse generated from a fiber laser, then filtering out the eigenvalues of the resonant CW background in the nonlinear frequency domain, and finally recovering the soliton by inverse NFT (INFT). Simulation results verify that the soliton can be distinguished from the resonant CW background in the nonlinear frequency domain and that pure solitons can be obtained by INFT.

preprint2020arXiv

STNReID : Deep Convolutional Networks with Pairwise Spatial Transformer Networks for Partial Person Re-identification

Partial person re-identification (ReID) is a challenging task because only partial information of person images is available for matching target persons. Few studies, especially on deep learning, have focused on matching partial person images with holistic person images. This study presents a novel deep partial ReID framework based on pairwise spatial transformer networks (STNReID), which can be trained on existing holistic person datasets. STNReID includes a spatial transformer network (STN) module and a ReID module. The STN module samples an affined image (a semantically corresponding patch) from the holistic image to match the partial image. The ReID module extracts the features of the holistic, partial, and affined images. Competition (or confrontation) is observed between the STN module and the ReID module, and two-stage training is applied to acquire a strong STNReID for partial ReID. Experimental results show that our STNReID obtains 66.7% and 54.6% rank-1 accuracies on the partial ReID and partial iLIDS datasets, respectively. These values are on par with those obtained with state-of-the-art methods.

preprint2020arXiv

Targeted Maximum Likelihood Estimation of Community-based Causal Effect of Community-Level Stochastic Interventions

Unlike commonly used parametric regression models such as mixed models, which can easily violate the required statistical assumptions and result in invalid statistical inference, targeted maximum likelihood estimation allows more realistic data-generative models and provides double-robust, semi-parametric and efficient estimators. Targeted maximum likelihood estimators (TMLEs) for the causal effect of a community-level static exposure were previously proposed by Balzer et al. In this manuscript, we build on this work, present identifiability results, and develop two semi-parametric efficient TMLEs for the estimation of the causal effect of a single time-point community-level stochastic intervention whose assignment mechanism can depend on measured and unmeasured environmental factors and its individual-level covariates. The first, community-level TMLE is developed under a general hierarchical non-parametric structural equation model, which can incorporate pooled individual-level regressions for estimating the outcome mechanism. The second, individual-level TMLE is developed under a restricted hierarchical model in which the additional assumption of no covariate interference within communities holds. The proposed TMLEs have several crucial advantages. First, both TMLEs can make use of individual-level data in the hierarchical setting, and potentially reduce finite sample bias and improve estimator efficiency. Second, the stochastic intervention framework provides a natural way of defining and estimating causal effects where the exposure variables are continuous or discrete with multiple levels, or even cannot be directly intervened on. Also, the positivity assumption needed for our proposed causal parameters can be weaker than the version of positivity required for other causal parameters.

preprint2020arXiv

tmleCommunity: A R Package Implementing Target Maximum Likelihood Estimation for Community-level Data

In recent years, many applications have aimed to assess the causal effect of treatments assigned at the community level, while data are still collected at the individual level among individuals of the community. In many cases, one wants to evaluate the effect of a stochastic intervention on the community, where all communities in the target population receive probabilistically assigned treatments based on a known specified mechanism (e.g., implementing a community-level intervention policy that targets stochastic changes in the behavior of a target population of communities). The tmleCommunity package was recently developed to implement targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of inverse-probability-of-treatment-weighting (IPTW) and the G-computation formula (GCOMP) are also available. The package supports multivariate arbitrary (i.e., static, dynamic or stochastic) interventions with a binary or continuous outcome. It also allows user-specified data-adaptive machine learning algorithms through the SuperLearner, sl3 and h2oEnsemble packages. The usage of the tmleCommunity package, along with a few examples, is described in this paper.

preprint2020arXiv

Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry

In autonomous driving, monocular sequences contain a great deal of information. Monocular depth estimation, camera ego-motion estimation and optical flow estimation in consecutive frames have recently attracted considerable attention. By analyzing these tasks, pixels in the middle frame are modeled into three parts: the rigid region, the non-rigid region, and the occluded region. In joint unsupervised training of depth and pose, we can segment the occluded region explicitly. The occlusion information is used in unsupervised learning of depth, pose and optical flow, as the image reconstructed by depth-pose and optical flow will be invalid in occluded regions. A less-than-mean mask is designed to further exclude the mismatched pixels interfered with by motion or illumination change in the training of the depth and pose networks. This method is also used to exclude some trivial mismatched pixels in the training of the optical flow network. Maximum normalization is proposed for the depth smoothness term to restrain depth degradation in textureless regions. In the occluded region, as depth and camera motion can provide more reliable motion estimation, they can be used to instruct unsupervised learning of optical flow. Our experiments on the KITTI dataset demonstrate that the model based on three regions, with full and explicit segmentation of the occluded region, the rigid region, and the non-rigid region and with corresponding unsupervised losses, can improve performance on the three tasks significantly. The source code is available at: https://github.com/guangmingw/DOPlearning.

preprint2020arXiv

Value of Information Analysis via Active Learning and Knowledge Sharing in Error-Controlled Adaptive Kriging

Large uncertainties in many phenomena have challenged decision making. Collecting additional information to better characterize reducible uncertainties is among the decision alternatives. Value of information (VoI) analysis is a mathematical decision framework that quantifies expected potential benefits of new data and assists with optimal allocation of resources for information collection. However, analysis of VoI is computationally very costly because of the underlying Bayesian inference, especially for equality-type information. This paper proposes the first surrogate-based framework for VoI analysis. Instead of modeling the limit state functions describing events of interest for decision making, which is commonly pursued in surrogate model-based reliability methods, the proposed framework models system responses. This approach affords sharing equality-type information from observations among surrogate models to update likelihoods of multiple events of interest. Moreover, two knowledge sharing schemes, called model sharing and training points sharing, are proposed to most effectively take advantage of the knowledge offered by costly model evaluations. Both schemes are integrated with an error rate-based adaptive training approach to efficiently generate accurate Kriging surrogate models. The proposed VoI analysis framework is applied to an optimal decision-making problem involving load testing of a truss bridge. While state-of-the-art methods based on importance sampling and adaptive Kriging Monte Carlo simulation are unable to solve this problem, the proposed method is shown to offer accurate and robust estimates of VoI with a limited number of model evaluations. Therefore, the proposed method facilitates the application of VoI for complex decision problems.

preprint2019arXiv

A weakly compressible SPH method for violent multi-phase flows with high density ratio

The weakly compressible SPH (WCSPH) method is known to suffer from low computational efficiency, or from unnatural voids and unrealistic phase separation, when it is applied to simulate highly violent multi-phase flows with high density ratio, such as that between water and air. In this paper, to remedy these issues, we propose a multi-phase WCSPH method based on a low-dissipation Riemann solver and the transport-velocity formulation. The two-phase Riemann problem is first constructed to handle the pairwise interaction between fluid particles, then modified for the fluid-wall interaction to impose the solid wall boundary condition. Since the method uses the same artificial speed of sound for both heavy and light phases, the computational efficiency increases greatly. Furthermore, due to the transport-velocity formulation employed for the light phase and application of the two-phase Riemann problem, the unnatural voids and unrealistic phase separation are effectively eliminated. The method is validated with several 2D and 3D cases involving violent water-air flows. The results are compared with existing experimental data and with previous numerical and analytical solutions; the proposed method demonstrates good robustness, and improved or comparable accuracy, respectively, compared with previous methods using the same choice of sound speed or those with much lower computational efficiency.

preprint2019arXiv

Accelerated Coronary MRI with sRAKI: A Database-Free Self-Consistent Neural Network k-space Reconstruction for Arbitrary Undersampling

This study aims to accelerate coronary MRI using a novel reconstruction algorithm, called self-consistent robust artificial-neural-networks for k-space interpolation (sRAKI). sRAKI performs iterative parallel imaging reconstruction by enforcing coil self-consistency using subject-specific neural networks. This approach extends the linear convolutions in SPIRiT to nonlinear interpolation using convolutional neural networks (CNNs). These CNNs are trained individually for each scan using the scan-specific autocalibrating signal (ACS) data. Reconstruction is performed by imposing the learned self-consistency and data-consistency enabling sRAKI to support random undersampling patterns. Fully-sampled targeted right coronary artery MRI was acquired in six healthy subjects for evaluation. The data were retrospectively undersampled, and reconstructed using SPIRiT, $\ell_1$-SPIRiT and sRAKI for acceleration rates of 2 to 5. Additionally, prospectively undersampled whole-heart coronary MRI was acquired to further evaluate performance. The results indicate that sRAKI reduces noise amplification and blurring artifacts compared with SPIRiT and $\ell_1$-SPIRiT, especially at high acceleration rates in targeted data. Quantitative analysis shows that sRAKI improves normalized mean-squared-error (~44% and ~21% over SPIRiT and $\ell_1$-SPIRiT at rate 5) and vessel sharpness (~10% and ~20% over SPIRiT and $\ell_1$-SPIRiT at rate 5). In addition, whole-heart data shows the sharpest coronary arteries when resolved using sRAKI, with 11% and 15% improvement in vessel sharpness over SPIRiT and $\ell_1$-SPIRiT, respectively. Thus, sRAKI is a database-free neural network-based reconstruction technique that may further accelerate coronary MRI with arbitrary undersampling patterns, while improving noise resilience over linear parallel imaging and image sharpness over $\ell_1$ regularization techniques.

preprint2019arXiv

Consistency of Binary Segmentation For Multiple Change-Points Estimation With Functional Data

For sequentially observed functional data exhibiting multiple change points in the mean function, we establish consistency results for the estimated number and locations of the change points based on the norm of the functional CUSUM process and standard binary segmentation. In addition to extending similar results in Venkatraman (1992) and Fryzlewicz (2014) for scalar data to the general Hilbert space setting, our main results are established without assuming the Gaussianity of the data, and under general linear process conditions on the model errors.
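
The procedure described in this abstract can be sketched in a few lines of Python: compute the norm of the functional CUSUM process on a segment, split at its maximizer if the maximum exceeds a threshold, and recurse on both halves. This is a minimal sketch under stated assumptions, not the authors' implementation: curves are assumed to be observed on a common grid, the L2 norm is approximated by a grid average, and the `threshold` and `min_len` parameters are hypothetical tuning choices.

```python
import math


def cusum_norms(curves, start, end):
    """Norms of the CUSUM process on curves[start:end] for k = 1, ..., n - 1.

    Each curve is a list of values on a common grid of d points.
    """
    n = end - start
    d = len(curves[0])
    total = [sum(curves[i][j] for i in range(start, end)) for j in range(d)]
    partial = [0.0] * d
    norms = []
    for k in range(1, n):
        row = curves[start + k - 1]
        for j in range(d):
            partial[j] += row[j]
        # Squared L2 norm of S_k - (k / n) * S_n, approximated on the grid.
        sq = sum((partial[j] - k / n * total[j]) ** 2 for j in range(d)) / d
        norms.append(math.sqrt(sq / n))
    return norms


def binary_segmentation(curves, threshold, start=0, end=None, min_len=5):
    """Return the sorted list of estimated change-point indices."""
    if end is None:
        end = len(curves)
    n = end - start
    if n < 2 * min_len:
        return []
    norms = cusum_norms(curves, start, end)
    # Only consider splits that leave at least min_len curves on each side.
    best_k = max(range(min_len, n - min_len + 1), key=lambda k: norms[k - 1])
    if norms[best_k - 1] <= threshold:
        return []
    split = start + best_k
    return (binary_segmentation(curves, threshold, start, split, min_len)
            + [split]
            + binary_segmentation(curves, threshold, split, end, min_len))
```

A returned index `s` means that the mean function is estimated to change between curve `s - 1` and curve `s`. The consistency results in the paper concern how the threshold should scale with the sample size; here it is simply a user-supplied constant.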

preprint2019arXiv

Dual-criteria time stepping for weakly compressible smoothed particle hydrodynamics

Implementing the particle-interaction configuration and time integration are performance-intensive essentials of particle-based methods. In this paper, a dual-criteria time-stepping method is proposed to improve the computational efficiency of the weakly-compressible smoothed particle hydrodynamics (WCSPH) method for modeling incompressible flows. The key idea is to introduce an advection time criterion, which is based on the fluid velocity field, for recreating the particle-interaction configuration. Within this time criterion, several steps of pressure relaxation determined by the acoustic time criterion, which is based on the artificial speed of sound, can be carried out without updating the particle-interaction configuration and with much larger time-step sizes than the conventional counterpart. A CPU cost analysis shows improved computational performance. Good accuracy and performance are obtained for the presented benchmarks, implying promising potential of the proposed method for incompressible flow and fluid-structure interaction simulations.
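
The two-level structure described in this abstract can be illustrated with a small Python sketch that only counts operations: the particle-interaction configuration is rebuilt once per advection step, while several acoustic (pressure relaxation) sub-steps run inside it. The specific formulas, the CFL constants 0.25 and 0.6, and all function names are assumptions made for illustration; viscous and body-force contributions to the time-step criteria are omitted.

```python
import math


def advection_dt(h, u_max, cfl_ad=0.25):
    """Advection criterion based on the maximum fluid speed (assumed form)."""
    return cfl_ad * h / max(u_max, 1e-12)


def acoustic_dt(h, u_max, c, cfl_ac=0.6):
    """Acoustic criterion based on the artificial speed of sound c (assumed form)."""
    return cfl_ac * h / (c + u_max)


def count_steps(h, u_max, c, n_outer):
    """Count neighbor-list rebuilds and pressure relaxation steps."""
    neighbor_updates = 0
    relaxation_steps = 0
    for _ in range(n_outer):
        dt_ad = advection_dt(h, u_max)
        neighbor_updates += 1  # rebuild the particle-interaction configuration once
        # Number of acoustic sub-steps needed to cover one advection step.
        n_inner = max(1, math.ceil(dt_ad / acoustic_dt(h, u_max, c)))
        relaxation_steps += n_inner  # relaxation reuses the frozen configuration
    return neighbor_updates, relaxation_steps


# Hypothetical values: smoothing length 0.01, max speed 1, sound speed 10.
updates, relaxations = count_steps(h=0.01, u_max=1.0, c=10.0, n_outer=4)
```

With these illustrative values each advection step contains five acoustic sub-steps, so the neighbor search runs five times less often than it would if it were repeated at every acoustic step.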

preprint2019arXiv

Fundamental Spin Interactions Underlying the Magnetic Anisotropy in the Kitaev Ferromagnet CrI$_3$

We lay the foundation for determining the microscopic spin interactions in two-dimensional (2D) ferromagnets by combining angle-dependent ferromagnetic resonance (FMR) experiments on high quality CrI$_3$ single crystals with theoretical modeling based on symmetries. We discover that the Kitaev interaction is the strongest in this material with $K \sim -5.2$ meV, 25 times larger than the Heisenberg exchange $J \sim -0.2$ meV, and responsible for opening the $\sim$5 meV gap at the Dirac points in the spin-wave dispersion. Furthermore, we find that the symmetric off-diagonal anisotropy $Γ\sim -67.5$ $μ$eV, though small, is crucial for opening a $\sim$0.3 meV gap in the magnon spectrum at the zone center and stabilizing ferromagnetism in the 2D limit. The high resolution of the FMR data further reveals a $μ$eV-scale quadrupolar contribution to the $S=3/2$ magnetism. Our identification of the underlying exchange anisotropies opens paths toward 2D ferromagnets with higher $T_\text{C}$ as well as magnetically frustrated quantum spin liquids based on Kitaev physics.

preprint2019arXiv

Neo: A Learned Query Optimizer

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpasses them.

preprint2019arXiv

Plasmon-enhanced Stimulated Raman Scattering Microscopy with Single-molecule Detection Sensitivity

Stimulated Raman scattering (SRS) microscopy allows for high-speed label-free chemical imaging of biomedical systems. The imaging sensitivity of SRS microscopy is limited to ~10 mM for endogenous biomolecules. Electronic pre-resonant SRS allows detection of sub-micromolar chromophores. However, label-free SRS detection of single biomolecules having extremely small Raman cross-sections ($\sim 10^{-30}$ cm$^{2}$ sr$^{-1}$) remains unreachable. Here, we demonstrate plasmon-enhanced stimulated Raman scattering (PESRS) microscopy with single-molecule detection sensitivity. Incorporating pico-Joule laser excitation, background subtraction, and a denoising algorithm, we obtained robust single-pixel SRS spectra exhibiting the statistics of single-molecule events. Single-molecule detection was verified by using two isotopologues of adenine. We further demonstrated the capability of applying PESRS for biological applications and utilized PESRS to map adenine released from bacteria due to starvation stress. PESRS microscopy holds the promise for ultrasensitive detection of molecular events in chemical and biomedical systems.

preprint2019arXiv

Sending-or-Not-Sending with Independent Lasers: Secure Twin-Field Quantum Key Distribution Over 509 km

Twin-field quantum key distribution (TF-QKD) promises high key rates at long distances, beating the rate-distance limit. Here, applying the sending-or-not-sending TF-QKD protocol, we experimentally demonstrate secure key distribution breaking the absolute key rate limit of repeaterless QKD over 509 km, 408 km ultra-low-loss optical fibre and 350 km standard optical fibre. Two independent lasers are used as the source, with a remote frequency-locking technique over 500 km of fibre; practical optical fibres are used as the optical path with appropriate noise filtering; and finite-key effects are considered in the key rate analysis. The secure key rates obtained at different distances are more than 5 times higher than the conditional limit of repeaterless QKD, a bound value assuming the same detection loss in the comparison. The achieved secure key rate is also higher than that of a traditional QKD protocol running with a perfect repeaterless QKD device, even with an infinite number of sent pulses. Our result shows that the protocol and technologies applied in this experiment enable TF-QKD to achieve high secure key rates at long distribution distances, and are hence practically useful for field implementation of intercity QKD.
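
To give a sense of scale for the repeaterless limit mentioned in this abstract, the Python sketch below evaluates the PLOB bound, $-\log_2(1-\eta)$ secret bits per channel use for a lossy channel with transmittance $\eta$. Identifying the abstract's "absolute key rate limit" with this bound is an assumption, and the default fibre loss of 0.2 dB/km is a typical illustrative figure rather than a value taken from the abstract; the function names are hypothetical.

```python
import math


def transmittance(length_km, loss_db_per_km=0.2):
    """Channel transmittance of a fibre with the given attenuation (illustrative)."""
    return 10 ** (-loss_db_per_km * length_km / 10)


def plob_bound(eta):
    """PLOB bound, -log2(1 - eta), in bits per channel use.

    log1p keeps precision when eta is tiny, as it is over hundreds of km.
    """
    return -math.log1p(-eta) / math.log(2)


# Compare the bound at the three distances mentioned in the abstract.
for length in (350.0, 408.0, 509.0):
    print(length, plob_bound(transmittance(length)))
```

For small transmittance the bound is approximately $1.44\,\eta$, so it falls off exponentially with distance; twin-field protocols are of interest because their key rate scales roughly as $\sqrt{\eta}$ instead.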

preprint2019arXiv

Sub-microsecond entangling gate between trapped ions via Rydberg interaction

Generating quantum entanglement in large systems on time scales much shorter than the coherence time is key to powerful quantum simulation and computation. Trapped ions are among the most accurately controlled and best isolated quantum systems, with low-error entanglement gates operated via the vibrational motion of a few-ion crystal within tens of microseconds. To exceed the level of complexity tractable by classical computers, the main challenge is to realise fast entanglement operations in large ion crystals. The strong dipole-dipole interactions in polar molecule and Rydberg atom systems allow much faster entangling gates, yet stable state-independent confinement comparable with trapped ions needs to be demonstrated in these systems. Here, we combine the benefits of these approaches: we report a $700\,\mathrm{ns}$ two-ion entangling gate which utilises the strong dipolar interaction between trapped Rydberg ions and produces a Bell state with $78\%$ fidelity. The sources of gate error are identified and a total error below $0.2\%$ is predicted for experimentally-achievable parameters. Furthermore, we predict that residual coupling to motional modes contributes $\sim 10^{-4}$ gate error in a large ion crystal of 100 ions. This provides a new avenue to significantly speed up and scale up trapped ion quantum computers and simulators.

preprint2019arXiv

Tracking the dynamics of an ideal quantum measurement

The existence of ideal quantum measurements is one of the fundamental predictions of quantum mechanics. In theory the measurement projects onto the eigenbasis of the measurement observable while preserving all coherences of degenerate eigenstates. The question arises whether there are dynamical processes in nature that correspond to such ideal quantum measurements. Here we address this question and present experimental results monitoring the dynamics of a naturally occurring measurement process: the coupling of a trapped ion qutrit to the photon environment. By taking tomographic snapshots during the detection process, we show with an average fidelity of $94\%$ that the process develops in agreement with the model of an ideal quantum measurement.

preprint2019arXiv

Two-loop Octagons, Algebraic Letters and $\bar{Q}$ Equations

We compute the symbol of the first two-loop amplitudes in planar ${\cal N}=4$ SYM with algebraic letters, the eight-point NMHV amplitude (or the dual octagon Wilson loops). We show how to apply $\bar{Q}$ equations for computing the differential of two-loop $n$-point NMHV amplitudes and present the result for $n=8$ explicitly. The symbol alphabet for the octagon consists of 180 independent rational letters and 18 algebraic ones involving Gram-determinant square roots. We comment on all-loop predictions for final entries and aspects of the result valid for all multiplicities.

preprint2016arXiv

Quantum capacitance anomalies of two-dimensional non-equilibrium states under microwave irradiation

We report a direct study of the compressibility of an ultrahigh-mobility two-dimensional electron system ($μ_{e} \sim 1 \times 10^{7}$ cm$^{2}$/Vs) in GaAs/AlGaAs quantum wells under microwave (MW) irradiation. The field penetration current results show that the quantum capacitance oscillates with the microwave-induced resistance oscillations (MIRO); however, the trend is opposite to that expected from the compressibility of usual equilibrium states in previous theoretical explanations. These anomalous phenomena provide a platform for studying non-equilibrium systems under microwave irradiation, and point to current domains and inhomogeneity induced by radiation. Moreover, a quantum capacitance signature of a multi-photon process around $j = 1/2$ is detected under intense microwave irradiation below 30 GHz.