Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
182 works
0 followers
50 topics
4 close collaborators

Published work

182 published item(s)

preprint 2026 arXiv

$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

The rise of personal assistant agents, e.g., OpenClaw, highlights the growing potential of large language models to support users across everyday life and work. A core challenge in these settings is proactive assistance, since users often begin with underspecified requests and leave important needs, constraints, or preferences unstated. However, existing benchmarks rarely evaluate whether agents can identify and act on such hidden intents before they are explicitly stated, especially in sustained multi-turn interactions where user needs emerge gradually. To address this gap, we introduce $π$-Bench, a benchmark for proactive assistance comprising 100 multi-turn tasks across 5 domain-specific user personas. By incorporating hidden user intents, inter-task dependencies, and cross-session continuity, $π$-Bench evaluates agents' ability to anticipate and address user needs over extended interactions, jointly measuring proactivity and task completion in long-horizon trajectories that better reflect real-world use. Experiments show that (1) proactive assistance remains challenging, (2) task completion and proactivity are clearly distinct, and (3) prior interaction is valuable for proactive intent resolution in later tasks.

preprint 2026 arXiv

ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images. Such supervision, while effective for fidelity, may be insufficient in terms of subjective visual perception quality and text-image semantic consistency. In this work, we investigate the problem of incorporating no-reference perceptual quality into diffusion training. A key challenge is that directly optimizing perceptual signals, such as those provided by no-reference image quality assessment (NR-IQA) models, introduces a mismatch with the original diffusion objective, leading to training instability and distributional drift during fine-tuning. To address this issue, we propose an anchor-constrained optimization framework that enables stable perceptual adaptation. Specifically, we leverage a learned NR-IQA model as a perceptual guidance signal, while introducing an anchor-based regularization that enforces consistency with the base diffusion model in terms of noise prediction. This design effectively balances perceptual quality improvement and generative fidelity, allowing controlled adaptation toward perceptually favorable outputs without compromising the original generative behavior. Extensive experiments demonstrate that our method consistently enhances perceptual quality while preserving generation diversity and training stability, highlighting the effectiveness of anchor-constrained perceptual optimization for diffusion models.

preprint 2026 arXiv

Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures

This paper explores a new cyber-attack vector targeting Industrial Control Systems (ICS), particularly focusing on water treatment facilities. Developing a new multi-agent Deep Reinforcement Learning (DRL) approach, adversaries craft stealthy, strategically timed, wear-out attacks designed to subtly degrade product quality and reduce the lifespan of field actuators. This sophisticated method leverages DRL methodology not only to execute precise and detrimental impacts on targeted infrastructure but also to evade detection by contemporary AI-driven defence systems. By developing and implementing tailored policies, the attackers ensure their hostile actions blend seamlessly with normal operational patterns, circumventing integrated security measures. Our research reveals the robustness of this attack strategy, shedding light on the potential for DRL models to be manipulated for adversarial purposes. Our research has been validated through testing and analysis in an industry-level setup. For reproducibility and further study, all related materials, including datasets and documentation, are publicly accessible.

preprint 2026 arXiv

Biosignal Fingerprinting: A Cross-Modal PPG-ECG Foundation Model

Cardiovascular disease remains the leading cause of global mortality, yet scalable cardiac monitoring is hindered by the gap between diagnostic-rich ECG and ubiquitous wearable PPG. Bridging this gap requires representations that are compact, transferable across modalities and devices, and deployable without task-specific retraining. Here we introduce biosignal fingerprints: compact latent representations of cardiovascular state derived from a cross-modal foundation model, the Multi-modal Masked Autoencoder (M2AE), trained on over 3.4 million paired ECG and PPG signals. M2AE integrates modality-specific encoders with a shared bottleneck and dual decoders, jointly optimized using reconstruction and cross-modal contrastive objectives, yielding generalizable fingerprints that retain intra- and inter-modality features. Like a biometric fingerprint, these representations uniquely encode an individual's cardiovascular state in a modality-agnostic, privacy-preserving form reusable across clinical tasks without exposing raw waveform data or requiring model retraining. Across 7 downstream tasks, spanning cross-modal reconstruction, cardiovascular disease classification, hypertension detection, mortality prediction, and demographic inference, biosignal fingerprints achieve competitive or superior performance compared to leading domain-specialist foundation models in frozen settings, including an AUROC of 0.974 for five-class CVD classification and 0.877 for hypertension detection, with a maximum improvement of 27.7% in AUROC across 5 classification tasks. Critically, strong performance is maintained with only a single modality, enabling deployment in resource-constrained, single-sensor environments typical of real-world wearable monitoring, with direct implications for continuous cardiovascular monitoring across clinical and consumer health settings.

preprint 2026 arXiv

Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models

In recent years, the rapid advancement of large language models (LLMs) in natural language processing has sparked significant interest among researchers to understand their mechanisms and functional characteristics. Although prior studies have attempted to explain LLM functionalities by identifying and interpreting specific neurons, these efforts mostly focus on individual neuron contributions, neglecting the fact that human brain functions are realized through intricate interaction networks. Inspired by research on functional brain networks (FBNs) in the field of neuroscience, we utilize similar methodologies established in FBN analysis to explore the "functional networks" within LLMs in this study. Experimental results highlight that, much like the human brain, LLMs exhibit certain functional networks that recur frequently during their operation. Further investigation reveals that these functional networks are indispensable for LLM performance. Inhibiting key functional networks severely impairs the model's capabilities. Conversely, amplifying the activity of neurons within these networks can enhance either the model's overall performance or its performance on specific tasks. This suggests that these functional networks are strongly associated with either specific tasks or the overall performance of the LLM. Code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.

preprint 2026 arXiv

CARD: Non-Uniform Quantization of Visual Semantic Unit for Generative Recommendation

Generative recommendation frameworks typically represent items as discrete Semantic IDs (SIDs). While existing studies have sought to enhance SID construction by incorporating multimodal content, collaborative signals, or more advanced quantization techniques, learning high-quality SIDs still faces two key challenges: (1) The two-stage generative recommendation paradigm (SID construction and autoregressive generation) provides insufficient supervision for heterogeneous fusion, which hinders learning high-quality SIDs, and (2) non-uniform embeddings lead to codeword imbalance and generation bias. To address these challenges, we propose a novel generative recommendation framework, called CARD. CARD introduces a visual semantic unit that unifies textual, visual, and collaborative signals into a structured visual representation prior to encoding, enabling holistic semantic modeling and effectively alleviating the semantic gap, thereby reducing the reliance on supervision signals during SID learning. Furthermore, to deal with the highly non-uniform distribution of item semantic embeddings in recommendation scenarios, we develop a non-uniform quantization framework (NU-RQ-VAE), which incorporates a learnable and invertible non-uniform transformation into the quantization process to map skewed semantic distributions into a more balanced latent space, thereby significantly improving codebook utilization and quantization accuracy. Experiments on multiple datasets show that CARD consistently outperforms baseline methods under various settings; meanwhile, the proposed non-uniform transformation module is plug-and-play and remains robust across different quantization schemes. Code is available at https://github.com/HAI-UESTC/CARD.

preprint 2026 arXiv

Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.

preprint 2026 arXiv

Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation

Multimedia recommendation systems leverage user-item interactions and multimodal information to capture user preferences, enabling more accurate and personalized recommendations. Despite notable advancements, existing approaches still face two critical limitations: first, shallow modality fusion often relies on simple concatenation, failing to exploit rich synergic intra- and inter-modal relationships; second, asymmetric feature treatment, where users are only characterized by interaction IDs while items benefit from rich multimodal content, hinders the learning of a shared semantic space. To address these issues, we propose a Cross-modal Recursive Attention Network with dual graph Embedding (CRANE). To tackle shallow fusion, we design a core Recursive Cross-Modal Attention (RCA) mechanism that iteratively refines modality features based on cross-correlations in a joint latent space, effectively capturing high-order intra- and inter-modal dependencies. For symmetric multimodal learning, we explicitly construct users' multimodal profiles by aggregating features of their interacted items. Furthermore, CRANE integrates a symmetric dual-graph framework, comprising a heterogeneous user-item interaction graph and a homogeneous item-item semantic graph, unified by a self-supervised contrastive learning objective to fuse behavioral and semantic signals. Despite these complex modeling capabilities, CRANE maintains high computational efficiency. Theoretical and empirical analyses confirm its scalability and high practical efficiency, achieving faster convergence on small datasets and superior performance ceilings on large-scale ones. Comprehensive experiments on four public real-world datasets validate an average 5% improvement in key metrics over state-of-the-art baselines.

preprint 2026 arXiv

DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

Modeling uncertainty in heavy-tailed time series remains a critical challenge for deep probabilistic forecasting models, which often struggle to capture abrupt, extreme events. While Lévy stable distributions offer a natural framework for modeling such non-Gaussian behaviors, the intractability of their probability density functions severely limits conventional likelihood-based inference. To address this, we introduce DeepLévy, a neural framework that learns mixtures of Lévy stable distributions by minimizing the discrepancy between empirical and parametric characteristic functions. DeepLévy incorporates a mixture mechanism that adaptively learns context-dependent weights and parameters over multiple Lévy components, enabling flexible multi-horizon uncertainty modeling. Evaluations on both real and synthetic datasets demonstrate that DeepLévy outperforms state-of-the-art deep probabilistic forecasting approaches in tail risk metrics, especially under extreme volatility.
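
The abstract's idea of fitting by comparing empirical and parametric characteristic functions can be illustrated with a minimal sketch. This is not the DeepLévy implementation: it assumes a single symmetric (skewness zero) stable component, a fixed frequency grid, and a plain squared-error discrepancy, and the function names `stable_cf`, `empirical_cf`, and `cf_discrepancy` are hypothetical.

```python
import cmath
import math
import random


def stable_cf(t, alpha, gamma, delta):
    """Characteristic function of a symmetric alpha-stable law.

    Assumption: skewness beta = 0, so phi(t) = exp(i*delta*t - |gamma*t|**alpha).
    """
    return cmath.exp(1j * delta * t - abs(gamma * t) ** alpha)


def empirical_cf(samples, t):
    """Empirical characteristic function: the mean of exp(i*t*x) over the samples."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def cf_discrepancy(samples, alpha, gamma, delta, grid):
    """Mean squared modulus of (empirical CF - parametric CF) over a frequency grid."""
    total = 0.0
    for t in grid:
        total += abs(empirical_cf(samples, t) - stable_cf(t, alpha, gamma, delta)) ** 2
    return total / len(grid)


# A Gaussian with variance 2 is the alpha = 2, gamma = 1 stable law (phi(t) = exp(-t**2)).
rng = random.Random(0)
samples = [rng.gauss(0.0, math.sqrt(2.0)) for _ in range(2000)]
grid = [0.1 * k for k in range(1, 21)]

good = cf_discrepancy(samples, alpha=2.0, gamma=1.0, delta=0.0, grid=grid)
bad = cf_discrepancy(samples, alpha=1.0, gamma=1.0, delta=0.0, grid=grid)
print(f"matched parameters: {good:.5f}, mismatched parameters: {bad:.5f}")
```

Because the discrepancy needs only the characteristic function, which has a closed form for stable laws, it sidesteps the intractable density that the abstract mentions. A learned model would minimize such a quantity with respect to its predicted parameters.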

preprint 2026 arXiv

Enhancing Self-Supervised Talking Head Forgery Detection via a Training-Free Dual-System Framework

Supervised talking head forgery detection faces severe generalization challenges due to the continuous evolution of generators. By reducing reliance on generator-specific forgery patterns, self-supervised detectors offer stronger cross-generator robustness. However, existing research has mainly focused on building stronger detectors, while the discriminative capacity of trained detectors remains insufficiently exploited. In particular, for score-based self-supervised detectors, the limited discriminative ability on hard cases is often reflected in unreliable anomaly ordering, leaving room for further refinement. Motivated by this observation, we draw inspiration from the dual-system theory of human cognition and propose a Training-Free Dual-System (TFDS) framework to further exploit the latent discriminative capacity of existing score-based self-supervised detectors. TFDS treats anomaly-like scores as the basis of System-1, using lightweight threshold-based routing to partition samples into confident and uncertain subsets. System-2 then revisits only the uncertain subset, performing fine-grained evidence-guided reasoning to refine the relative ordering of ambiguous samples within the original score distribution. Extensive experiments demonstrate consistent improvements across datasets and perturbation settings, with the gains arising mainly from corrected ordering within the uncertain subset. These findings show that existing self-supervised talking head forgery detectors still contain underexploited discriminative cues that can be effectively unlocked through training-free dual-system reasoning.
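
The System-1 routing step described above can be sketched as a simple two-threshold partition. This is an illustrative assumption, not the TFDS code: the thresholds `low` and `high`, the convention that scores between them are "uncertain", and the function name `route_by_threshold` are all hypothetical.

```python
def route_by_threshold(scores, low, high):
    """Split sample indices into confident and uncertain subsets.

    Assumption: a score inside [low, high] is ambiguous and would be passed
    on for a second look; scores outside that band are kept as they are.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    confident, uncertain = [], []
    for index, score in enumerate(scores):
        if low <= score <= high:
            uncertain.append(index)
        else:
            confident.append(index)
    return confident, uncertain


# Hypothetical anomaly-like scores from an existing detector.
scores = [0.05, 0.45, 0.90, 0.60]
confident, uncertain = route_by_threshold(scores, low=0.3, high=0.7)
print(confident, uncertain)
```

Only the indices in `uncertain` would be revisited by the slower second stage, which is what keeps the extra cost limited to hard cases.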

preprint 2026 arXiv

Experimental study on an S-band near-field microwave magnetron power transmission system on hundred-watt level

A multi-magnetron microwave source, a metamaterial transmitting antenna, and a large power rectenna array are presented to build a near-field 2.45 GHz microwave power transmission system. The square 1 m$^2$ rectenna array consists of sixteen rectennas with 2048 Schottky diodes for large power microwave rectifying. It receives microwave power and converts it into DC power. The design, structure, and measured performance of a unit rectenna as well as the entire rectenna array are presented in detail. The multi-magnetron microwave power source switches between half and full output power levels, i.e. the half-wave and full-wave modes. The transmission antenna is formed by a double-layer metallic hole array, which is applied to combine the output power of each magnetron. The rectenna array DC output power reaches 67.3 W on a 1.2 ohm DC load at a distance of 5.5 m from the transmission antenna. DC output power is affected by the distance, DC load, and the mode of microwave power source. It shows that conventional low power Schottky diodes can be applied to a microwave power transmission system with simple magnetrons to realise large power microwave rectifying.
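
As a rough check on the reported figures, the short calculation below derives the load voltage, load current, and average DC power per rectenna and per diode from the numbers in the abstract. It assumes a purely resistive load and, for the averages only, equal power sharing across the sixteen rectennas and 2048 diodes; neither assumption is stated in the source.

```python
import math

# Figures quoted in the abstract.
dc_power_w = 67.3
load_ohm = 1.2
rectenna_count = 16
diode_count = 2048

# P = V**2 / R = I**2 * R for a resistive load (assumption).
load_voltage_v = math.sqrt(dc_power_w * load_ohm)
load_current_a = math.sqrt(dc_power_w / load_ohm)

# Averages under the equal-sharing assumption.
power_per_rectenna_w = dc_power_w / rectenna_count
power_per_diode_mw = 1000.0 * dc_power_w / diode_count

print(f"load voltage  ~ {load_voltage_v:.2f} V")
print(f"load current  ~ {load_current_a:.2f} A")
print(f"per rectenna  ~ {power_per_rectenna_w:.2f} W")
print(f"per diode     ~ {power_per_diode_mw:.1f} mW")
```

The per-diode average of a few tens of milliwatts is consistent with the abstract's point that conventional low-power Schottky diodes suffice when many are combined.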

preprint 2026 arXiv

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combine these modalities still suffer from limited generalization, with notable performance degradation on unseen generative models. We attribute this limitation to two key factors: frequency shortcut bias toward easily distinguishable cues associated with specific generators and cross-domain representation conflict between high-level semantics and low-level frequency patterns. To address these issues, we propose a Frequency-aware Gated Injection Network (FGINet) to improve generalization. Specifically, we design a Band-Masked Frequency Encoder (BMFE) that applies cross-band masking in the frequency domain to reduce reliance on generator-specific patterns and encourage more diverse and generalizable representations. We further introduce a Layer-wise Gated Frequency Injection (LGFI) mechanism to progressively inject frequency cues into the VFM backbone with adaptive gating, aligning with its hierarchical abstraction and alleviating representation conflict. Moreover, we propose a Hyperspherical Compactness Learning (HCL) framework with a cosine margin objective to learn compact and well-separated representations. Extensive experiments demonstrate that FGINet achieves state-of-the-art performance and strong generalization across multiple challenging datasets.

preprint 2026 arXiv

From Ecological Connectivity to Outbreak Risk: A Heterogeneous Graph Network for Epidemiological Reasoning under Sparse Spatiotemporal Data

Estimating population-level prevalence and transmission dynamics of wildlife pathogens can be challenging, partly because surveillance data is sparse, detection-driven, and unevenly sequenced. Using highly pathogenic avian influenza A/H5 clade 2.3.4.4b as a case study, we develop zooNet, a graph-based epidemiological framework that integrates mechanistic transmission simulation, metadata-driven genetic distance imputation, and spatiotemporal graph learning to reconstruct outbreak dynamics from incomplete observations. Applied to wild bird surveillance data from the United States during 2022, zooNet recovered coherent spatiotemporal structure despite intermittent detections, revealing sustained regional circulation across multiple migratory flyways. The framework consistently identified counties with ongoing transmission weeks to months before confirmed detections, including persistent activity in northeastern regions prior to documented re-emergence. These signals were detectable even in areas with sparse sequencing and irregular reporting. These results show that explicitly representing ecological processes and inferred genomic connectivity within a unified graph structure allows persistence and spatial risk structure to be inferred from detection-driven wildlife surveillance data.

preprint 2026 arXiv

FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains

Building upon FutureX, which established a live benchmark for general-purpose future prediction, this report introduces FutureX-Pro, including FutureX-Finance, FutureX-Retail, FutureX-PublicHealth, FutureX-NaturalDisaster, and FutureX-Search. These together form a specialized framework extending agentic future prediction to high-value vertical domains. While generalist agents demonstrate proficiency in open-domain search, their reliability in capital-intensive and safety-critical sectors remains under-explored. FutureX-Pro targets four economically and socially pivotal verticals: Finance, Retail, Public Health, and Natural Disaster. We benchmark agentic Large Language Models (LLMs) on entry-level yet foundational prediction tasks -- ranging from forecasting market indicators and supply chain demands to tracking epidemic trends and natural disasters. By adapting the contamination-free, live-evaluation pipeline of FutureX, we assess whether current State-of-the-Art (SOTA) agentic LLMs possess the domain grounding necessary for industrial deployment. Our findings reveal the performance gap between generalist reasoning and the precision required for high-value vertical applications.

preprint 2026 arXiv

Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance

While Vision-Language-Action (VLA) models show strong generalizability in various tasks, real-world deployment of robotic policy still requires large-scale, high-quality human expert demonstrations. However, data collection via human teleoperation requires continuous operator attention, which is costly and hard to scale. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance, enabling robots' interactive learning in deployment. GCENT starts at an imperfect policy and improves over time. When robot execution failures occur, GCENT allows robots to revert to a previous state with a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. This framework supports a one-human-to-many-robots supervision scheme with a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods, and reaches comparable performance using less than half the data in long-horizon and precise tasks. We also quantify the data yield-to-effort ratio under multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.

preprint 2026 arXiv

High-Performance KV$_3$Sb$_5$/WSe$_2$ van der Waals Photodetectors

Kagome metals AV$_3$Sb$_5$ (A = K, Rb, Cs) have recently emerged as a promising platform for exploring correlated and topological quantum states, yet their potential for optoelectronic applications remains largely unexplored. Here, we report high-performance photodetectors based on van der Waals KV$_3$Sb$_5$/WSe$_2$ heterojunctions. A high-quality Schottky interface readily forms between KV$_3$Sb$_5$ and WSe$_2$, enabling efficient separation and transport of photoinduced carriers. Under 520 nm illumination, the device achieves an open-circuit voltage up to 0.6 V, a responsivity of 809 mA/W, and a fast response time of 18.3 μs. This work demonstrates the promising optoelectronic applications of Kagome metals and highlights the potential of KV$_3$Sb$_5$-based van der Waals heterostructures for high-performance photodetection.

preprint 2026 arXiv

HiMix: Hierarchical Artifact-aware Mixup for Generalized Synthetic Image Detection

The rapid evolution of generative models has enabled the creation of highly realistic and diverse synthetic images, posing significant challenges to reliable and generalizable Synthetic Image Detection (SID). However, existing detectors are typically trained on limited and biased datasets, resulting in poor generalization to unseen generators. To address this issue, we propose HiMix, a unified framework that enhances generalization by expanding the training distribution and promoting artifact-aware representations. Specifically, the Mixup-driven Distributional Augmentation (MDA) module constructs continuous transitional samples between real and fake images, improving coverage of low-confidence regions and exposing the model to more challenging samples, while the pixel-wise mixup operation smoothly perturbs semantics to enhance sensitivity to low-level artifacts. Moreover, the Hierarchical Artifact-aware Representation (HAR) module aggregates artifact information from both global and local levels through cross-layer integration and coarse-to-fine feature fusion, enabling the extraction of discriminative forgery representations under diverse distributions. Extensive experiments across multiple benchmarks demonstrate that HiMix achieves state-of-the-art performance, establishing well-separated logits for improved generalization to unseen forgeries.
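
The pixel-wise mixup between real and fake images mentioned above can be sketched as a convex combination of the two inputs. This is a generic mixup illustration, not the HiMix MDA module: images are flattened to lists of floats for brevity, and the soft label (the fraction of fake content) and the name `pixel_mixup` are assumptions.

```python
def pixel_mixup(real_pixels, fake_pixels, lam):
    """Blend a real and a fake image pixel by pixel.

    lam is the weight of the real image, in [0, 1]. The returned soft label
    is the fraction of fake content (an assumption for illustration).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(real_pixels) != len(fake_pixels):
        raise ValueError("images must have the same number of pixels")
    mixed = [lam * r + (1.0 - lam) * f for r, f in zip(real_pixels, fake_pixels)]
    soft_label = 1.0 - lam
    return mixed, soft_label


# Two tiny hypothetical "images" with two pixels each.
mixed, soft_label = pixel_mixup([0.0, 1.0], [1.0, 0.0], lam=0.25)
print(mixed, soft_label)
```

Sweeping `lam` from 0 to 1 yields a continuous path of transitional samples between the fake and the real image, which is the sense in which such augmentation fills in low-confidence regions.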

preprint 2026 arXiv

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Large Language Models (LLMs) perform well on standard reasoning and question-answering benchmarks, yet such evaluations often fail to capture their ability to handle long-tail, expertise-intensive knowledge in real-world professional scenarios. We introduce LPFQA, a long-tail knowledge benchmark derived from authentic professional forum discussions, covering 7 academic and industrial domains with 430 curated tasks grounded in practical expertise. LPFQA evaluates specialized reasoning, domain-specific terminology understanding, and contextual interpretation, and adopts a hierarchical difficulty structure to ensure semantic clarity and uniquely identifiable answers. Experiments on multiple mainstream LLMs reveal substantial performance gaps, particularly on tasks requiring deep domain reasoning, exposing limitations overlooked by existing benchmarks. Overall, LPFQA provides an authentic and discriminative evaluation framework that complements prior benchmarks and informs future LLM development.

preprint 2026 arXiv

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme--utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision--and an MoE-augmented loss that adaptively routes features through specialized subnetworks. On industrial-scale benchmarks, it achieves +5.04% AUC and +8.28% KS over prior art under rigorous scaling. Moreover, its representations distill effectively into lightweight models, yielding +2.55% AUC and +4.85% KS under strict latency and interpretability constraints, while improving robustness to distribution shifts. Our work demonstrates that tabular data admits a foundation-model treatment--when its structural idiosyncrasies are respected.

preprint 2026 arXiv

MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding

Video Temporal Grounding (VTG) faces a cross-modal semantic gap that often leads to background features being incorrectly aligned with the query, while directly matching the query to moments results in insufficient discriminability and consistency of temporal semantics. To address this issue, we propose MLLM-Assisted Semantic-Relational Consistent Alignment (MASRA), a training-time MLLM-based optimization framework for VTG. MASRA leverages an MLLM during training to produce two forms of textual priors, namely event-level descriptions with temporal spans and clip-level captions, and instantiates two MLLM-assisted alignments. Event Semantic Temporal Alignment (ESTA) aligns temporal context with event semantics to explicitly strengthen the correspondence between semantics and temporal events and improve span-level separability. Local Relational Consistency Alignment (LRCA) constructs a textual relation matrix derived from clip-level captions and aligns it with the temporal feature similarity matrix in the model, enhancing temporal consistency while capturing local structural information. MASRA includes two simple supporting modules, semantic-guided enhancement and second-order relational attention, to better utilize the learned semantic context and relational structure. Moreover, we introduce Decoupled Alignment Interaction (DAI) with a context-aware codebook to adaptively absorb query-irrelevant semantics and alleviate the cross-modal gap. The MLLM is only invoked during training and is not used at inference. Extensive experiments show that MASRA outperforms existing methods, and ablation studies validate its effectiveness.

preprint 2026 arXiv

MatPhys: Learning Material-Aware Physics Parameters for Deformable Object Simulation from Videos

Reconstructing simulation-ready deformable objects is important for vision, graphics, and robotics. Existing physics-driven methods can recover physical digital twins from videos, but they suffer from two fundamental limitations: they typically assume a homogeneous material across the whole object, and their scene-specific inverse optimization, combined with the inherent ambiguity of monocular observation, yields inconsistent parameters for the same material across different scenes or interactions. We propose MatPhys, a material-aware feed-forward framework that predicts spring-mass parameters from a single-view video, addressing these two issues with two coupled designs. To relax the homogeneous material assumption, we use DINO features to decompose the object into semantically meaningful parts and to query a part-level material prior, assigning each part its own physical behavior. To enforce cross-scene consistency, we introduce a learned material codebook of shared material embeddings as the bridge between appearance and physics, and further use the part-level prior as a reference distribution that constrains the decoder so that the same material yields consistent parameters across scenes and interactions. Together, these designs turn an under-constrained monocular problem into feed-forward inference grounded on shared, reusable material concepts. Experiments show that our method matches per-scene optimization baselines in reconstruction and future prediction, while achieving stronger generalization to unseen interactions and objects with more consistent physical parameters.

preprint2026arXiv

MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation

Recent progress in medical vision-language models (VLMs) has achieved strong performance on image-level text-centric tasks such as report generation and visual question answering (VQA). However, achieving fine-grained visual grounding and volumetric spatial reasoning in 3D medical VLMs remains challenging, particularly when aiming to unify these capabilities within a single, generalizable framework. To address this challenge, we propose MedVL-SAM2, a unified 3D medical multimodal model that concurrently supports report generation, VQA, and multi-paradigm segmentation, including semantic, referring, and interactive segmentation. MedVL-SAM2 integrates image-level reasoning and pixel-level perception through a cohesive architecture tailored for 3D medical imaging, and incorporates a SAM2-based volumetric segmentation module to enable precise multi-granular spatial reasoning. The model is trained in a multi-stage pipeline: it is first pre-trained on a large-scale corpus of 3D CT image-text pairs to align volumetric visual features with radiology-language embeddings. It is then jointly optimized with both language-understanding and segmentation objectives using a comprehensive 3D CT segmentation dataset. This joint training enables flexible interaction via language, point, or box prompts, thereby unifying high-level visual reasoning with spatially precise localization. Our unified architecture delivers state-of-the-art performance across report generation, VQA, and multiple 3D segmentation tasks. Extensive analyses further show that the model provides reliable 3D visual grounding, controllable interactive segmentation, and robust cross-modal reasoning, demonstrating that high-level semantic reasoning and precise 3D localization can be jointly achieved within a unified 3D medical VLM.

preprint2026arXiv

Pseudospin Formulation of Quench Dynamics in the Semiclassical Holstein Model

We present a pseudospin formulation for the post-quench dynamics of charge-density-wave (CDW) order in the half-filled spinless Holstein model on a square lattice, assuming spatially homogeneous evolution. This Anderson pseudospin description captures the coherent nonequilibrium dynamics of the coupled electron-lattice system. Numerical simulations reveal three distinct dynamical regimes of the CDW order parameter following a quench (locked oscillations, Landau-damped dynamics, and overdamped relaxation), closely paralleling quench dynamics in BCS superconductors and other electronically driven symmetry-breaking phases. Crucially, however, the presence of dynamical lattice degrees of freedom leads to qualitatively different long-time behavior. In particular, while the oscillation amplitude is reduced in the damped regimes, CDW oscillations do not fully decay but instead persist indefinitely due to feedback from the lattice field. We further show that these persistent oscillations are characterized by a nonequilibrium electronic distribution, which provides an intuitive understanding of both their amplitude and the renormalization of the oscillation frequency relative to the bare Holstein phonon frequency. Our results highlight the essential role of lattice dynamics in nonequilibrium ordered phases and establish a clear distinction between electron-lattice-driven CDW dynamics and their purely electronic counterparts.

preprint2026arXiv

RAVE: Re-Allocating Visual Attention in Large Multimodal Models

Large multimodal models (LMMs) inherit the self-attention mechanism of pretrained language backbones, yet standard attention can exhibit suboptimal allocation, including cross-modal misallocation between textual and visual evidence and intra-visual imbalance among visual tokens. We propose RAVE (Re-Allocating Visual Attention), a lightweight pair-gating mechanism that adds a learned query-key bias to pre-softmax attention scores over visual keys, derived from pre-RoPE query and key features. RAVE requires no architectural modification to the backbone and can be trained end-to-end with the rest of the model. Across a suite of multimodal benchmarks, RAVE improves over standard attention by an average of 3 points, with the largest gains on perception-intensive tasks, including multilingual OCR, chart understanding, document VQA, and scene text VQA, where accurate visual grounding is critical.
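
The core idea in this abstract, an additive bias applied to pre-softmax scores at visual key positions only, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the bias values are hard-coded here, whereas the abstract describes them as learned from pre-RoPE query and key features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reallocate_attention(scores, pair_bias, is_visual):
    """Add a per-(query, key) bias before the softmax, at visual keys only.

    scores:    pre-softmax attention scores of one query over all keys
    pair_bias: one bias per key (hypothetical values; learned in the paper)
    is_visual: True where the key is a visual token
    """
    biased = [s + (b if v else 0.0) for s, b, v in zip(scores, pair_bias, is_visual)]
    return softmax(biased)


# One query attending to four keys; keys 1 and 2 are visual tokens.
scores = [1.0, 1.0, 1.0, 1.0]
pair_bias = [0.5, 2.0, -1.0, 0.3]  # illustrative numbers only
is_visual = [False, True, True, False]

baseline = softmax(scores)
weights = reallocate_attention(scores, pair_bias, is_visual)
```

In this toy case the text keys keep equal weight relative to each other, while attention mass shifts toward the visual key with the positive bias and away from the one with the negative bias.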

preprint2026arXiv

Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection

As the misuse of AI-generated images grows, generalizable image detection techniques are urgently needed. Recent state-of-the-art (SOTA) methods adopt aligned training datasets to reduce content, size, and format biases, empowering models to capture robust forgery cues. A common strategy is to employ reconstruction techniques, e.g., VAE and DDIM, which show remarkable results in diffusion-based methods. However, such reconstruction-based approaches typically introduce limited and homogeneous artifacts, which cannot fully capture diverse generative patterns, such as those of GAN-based methods. To complement reconstruction-based fake images with aligned yet diverse artifact patterns, we propose a GAN-based upsampling approach that mimics GAN-generated fake patterns while preserving content, size, and format alignment. This naturally results in two aligned but distinct types of fake images. However, due to the domain shift between reconstruction-based and upsampling-based fake images, direct mixed training causes suboptimal results, where one domain disrupts feature learning of the other. Accordingly, we propose a Separate Expert Fusion (SEF) framework to extract complementary artifact information and reduce inter-domain interference. We first train domain-specific experts via LoRA adaptation on a frozen foundational model, then conduct decoupled fusion with a gating network to adaptively combine expert features while retaining their specialized knowledge. Rather than merely benefiting GAN-generated image detection, this design introduces diverse and complementary artifact patterns that enable SEF to learn a more robust decision boundary and improve generalization across broader generative methods. Extensive experiments demonstrate that our method yields strong results across 13 diverse benchmarks. Code is released at: https://github.com/liyih/SEF_AIGC_detection.

preprint2026arXiv

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

RLVR has become a widely adopted paradigm for improving LLMs' reasoning capabilities, and GRPO is one of its most representative algorithms. In this paper, we first show that GRPO admits an equivalent discriminative reformulation as a weighted positive-negative score difference. Under this view, GRPO increases sequence-level scores of verified positive rollouts and decreases those of negative rollouts, where the scores are averages of clipped token-level importance sampling ratios. This reformulation reveals two structural limitations of GRPO: likelihood-misaligned scoring, where clipped ratio-based surrogate scores are optimized instead of generation likelihoods, and score-insensitive credit assignment, where rollout-level credit is assigned without accounting for relative score gaps between positive and negative rollouts in the same group. To address these limitations, we propose ConSPO, a framework for Contrastive Sequence-level Policy Optimization in RLVR. ConSPO replaces GRPO's clipped ratio-based scores with length-normalized sequence log-probabilities, aligning the optimized rollout scores with the likelihoods used in autoregressive generation. It then optimizes a group-wise InfoNCE-style objective that contrasts each positive rollout against negative distractors from the same group, enabling credit assignment to depend on their relative scores. This contrastive formulation amplifies updates for poorly separated positives while concentrating suppressive updates on high-scoring negatives. Moreover, ConSPO introduces a curriculum-scheduled margin, guiding optimization from coarse positive-negative ordering in early training toward stronger separation in later stages. Extensive evaluations across diverse backbone models, parameter scales, and training datasets show that ConSPO consistently outperforms several strong RLVR baselines on challenging mathematical reasoning benchmarks.
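
The two ingredients named in this abstract, length-normalized sequence log-probabilities as rollout scores and a group-wise InfoNCE-style contrast between positives and negatives, can be sketched as follows. This is a hedged illustration, not ConSPO itself: the temperature `tau`, the way `margin` enters (added to the negative scores), and the function names are all assumptions.

```python
import math


def length_normalized_logprob(token_logprobs):
    """Sequence score: mean of the per-token log-probabilities."""
    return sum(token_logprobs) / len(token_logprobs)


def group_infonce_loss(pos_scores, neg_scores, tau=1.0, margin=0.0):
    """InfoNCE-style loss contrasting each positive against all negatives.

    For each positive score s_p the loss is
        -log( exp(s_p / tau) / (exp(s_p / tau) + sum_n exp((s_n + margin) / tau)) ).
    Where the margin enters is an assumption of this sketch.
    """
    losses = []
    for s_pos in pos_scores:
        logits = [s_pos / tau] + [(s_neg + margin) / tau for s_neg in neg_scores]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[0])
    return sum(losses) / len(losses)


# Hypothetical per-token log-probs for one verified-correct rollout.
pos_score = length_normalized_logprob([-0.1, -0.2, -0.3])

# A poorly separated negative yields a larger loss than a well separated one.
loss_hard = group_infonce_loss([pos_score], [-0.3])
loss_easy = group_infonce_loss([pos_score], [-3.0])
loss_with_margin = group_infonce_loss([pos_score], [-3.0], margin=1.0)
```

Because the loss grows as a negative's score approaches the positive's, the gradient concentrates on poorly separated pairs, which matches the behavior the abstract describes; raising the margin tightens the required separation.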

preprint2026arXiv

SR$^2$-LoRA: Self-Rectifying Inter-layer Relations in Low-Rank Adaptation for Class-Incremental Learning

Pre-trained models with parameter-efficient fine-tuning (PEFT) have demonstrated promising potential for class-incremental learning (CIL), yet catastrophic forgetting still persists when adapting models to new tasks. In this paper, we present a novel perspective on catastrophic forgetting through the analysis of inter-layer relation drift, i.e., the progressive disruption of relationships among layer-wise representations during the learning of new tasks. We theoretically show that the increase of such drift reduces the classification margins of previously learned tasks, thereby degrading overall model performance. To address this issue, we propose Self-Rectifying inter-layer Relation Low-Rank Adaptation (SR$^2$-LoRA), a simple yet effective method that mitigates catastrophic forgetting by constraining inter-layer relation drift. Specifically, SR$^2$-LoRA constructs the relation matrices induced by the previous and current models on current-task samples, and aligns the corresponding singular values. We further theoretically show that this alignment exhibits greater robustness to estimation perturbations than direct entry-wise alignment. Extensive experiments on standard CIL benchmarks demonstrate that SR$^2$-LoRA effectively mitigates catastrophic forgetting, with its advantages becoming more pronounced as the number of tasks increases. Code is available at https://github.com/FqWan24/SR-2-LoRA.

preprint2026arXiv

Surface-Form Neural Sparse Retrieval: Robust Fuzzy Matching for Industrial Music Search

Music search at the scale of Amazon Music presents a unique challenge: queries frequently deviate from indexed metadata due to misspellings, transpositions, and phonetic variations, yet the retrieval system must operate under strict millisecond-level latency constraints. Our existing learning-to-retrieve system, the High Confidence Index (HCI), learns query-entity associations from customer behavior, relying on continual "exploration" to choose candidates. Traditional n-gram matching enables this exploration but suffers from poor semantic robustness and high noise, limiting the system's ability to learn from long-tail queries. In this work, we present a robust neural sparse retrieval system designed to maximize exploration efficiency. We adapt a state-of-the-art inference-free sparse retrieval architecture to the music domain, combining it with an effective domain-specific granular subword tokenization strategy. Our approach utilizes short-length token constraints (max 3 chars) to enforce the learning of surface-form robustness over lexical memorization. By pre-computing the neural embeddings and term expansions during the offline indexing phase, online processing is reduced to minimal tokenization and IDF weighting, achieving effectively zero latency overhead for query encoding. Evaluations on a 6M-document production corpus show an aggregate 91.4% recall@10 (vs. 57.7% for trigrams) at comparable throughput. Simulation of the HCI feedback loop demonstrates improved exploration efficiency, with +0.8% higher stabilized recall than production trigrams. Ablation studies indicate that our sparse training methodology drives the performance gains, while domain-specific pretraining provides a cost-effective alternative to large-scale general-purpose pretraining.
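
The inference-free query path described here (tokenize into short tokens, look up IDF weights, score against a precomputed index) can be sketched in a few lines. This is a toy stand-in under stated assumptions: character n-grams of length 1 to 3 replace the paper's learned subword vocabulary, document-side weights are plain token membership rather than neural term expansions, and the smoothed IDF formula and all function names are hypothetical.

```python
import math
from collections import Counter


def short_tokens(text, max_len=3):
    """Set of character n-grams of length 1..max_len, taken per word."""
    tokens = set()
    for word in text.lower().split():
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                tokens.add(word[i:i + n])
    return tokens


def build_index(docs):
    """Offline step: tokenize every document and compute smoothed IDF."""
    doc_tokens = {doc: short_tokens(doc) for doc in docs}
    df = Counter(tok for toks in doc_tokens.values() for tok in toks)
    n_docs = len(docs)
    idf = {tok: math.log((n_docs + 1) / (count + 1)) + 1.0 for tok, count in df.items()}
    return doc_tokens, idf


def search(query, doc_tokens, idf):
    """Online step: tokenization plus an IDF lookup, no model inference."""
    q_weights = {tok: idf[tok] for tok in short_tokens(query) if tok in idf}
    scores = {
        doc: sum(w for tok, w in q_weights.items() if tok in toks)
        for doc, toks in doc_tokens.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = ["the beatles", "beach boys", "the rolling stones"]
doc_tokens, idf = build_index(docs)
ranking = search("beatels", doc_tokens, idf)  # transposed letters in the query
```

Even with the transposition, the misspelled query still shares most of its short tokens with the intended entity, which is the surface-form robustness that the short-token constraint aims for.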

preprint2026arXiv

The First Controllable Bokeh Rendering Challenge at NTIRE 2026

This study presents the outcomes of the first Controllable Bokeh Rendering Challenge at NTIRE and highlights the most effective submitted methodologies. In total, 44 participants registered for the competition, of which 8 teams submitted valid solutions after the conclusion of the final test phase. All submissions were evaluated on unseen images, focusing on portraits and intricate subjects with complex and visually appealing bokeh phenomena. In addition to the first track focusing on established quantitative fidelity metrics, we conducted a qualitative user study with a panel of experts for a second track focusing on perceptual assessment. As this was the inaugural challenge on this topic, most of the participants focused on refining and extending the Bokehlicious baseline method.

preprint2026arXiv

UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model

Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.

preprint2026arXiv

Unleashing the Potential of Neighbors: Diffusion-based Latent Neighbor Generation for Session-based Recommendation

Session-based recommendation aims to predict the next item that anonymous users may be interested in, based on their current session interactions. Recent studies have demonstrated that retrieving neighbor sessions to augment the current session can effectively alleviate the data sparsity issue and improve recommendation performance. However, existing methods typically rely on explicitly observed session data, neglecting latent neighbors - not directly observed but potentially relevant within the interest space - thereby failing to fully exploit the potential of neighbor sessions in recommendation. To address the above limitation, we propose a novel model of diffusion-based latent neighbor generation for session-based recommendation, named DiffSBR. Specifically, DiffSBR leverages two diffusion modules, including retrieval-augmented diffusion and self-augmented diffusion, to generate high-quality latent neighbors. In the retrieval-augmented diffusion module, we leverage retrieved neighbors as guiding signals to constrain and reconstruct the distribution of latent neighbors. Meanwhile, we adopt a training strategy that enables the retriever to learn from the feedback provided by the generator. In the self-augmented diffusion module, we explicitly guide the generation of latent neighbors by injecting the current session's multi-modal signals through contrastive learning. After obtaining the generated latent neighbors, we utilize them to enhance session representations for improving session-based recommendation. Extensive experiments on four public datasets show that DiffSBR generates effective latent neighbors and improves recommendation performance against state-of-the-art baselines.

preprint2025arXiv

Evidence of anisotropic three-dimensional weak-localization in TiSe$_{2}$ nanoflakes

TiSe$_2$ is a typical transition-metal dichalcogenide known for its charge-density wave order. In this study, we report the observation of an unusual anisotropic negative magnetoresistance in exfoliated TiSe$_2$ nanoflakes at low temperatures. Unlike the negative magnetoresistance reported in most other transition-metal dichalcogenides, our results cannot be explained by either the conventional two-dimensional weak localization effect or the Kondo effect. A comprehensive analysis of the data suggests that the observed anisotropic negative magnetoresistance in TiSe$_2$ flakes is most likely caused by the three-dimensional weak localization effect. Our findings contribute to a deeper understanding of the phase-coherent transport processes in TiSe$_2$.

preprint2024arXiv

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers, relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large models that are computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE task using a microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction.

preprint2024arXiv

Distilling Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection

Striking a balance between precision and efficiency presents a prominent challenge in bird's-eye-view (BEV) 3D object detection. Although previous camera-based BEV methods achieved remarkable performance by incorporating long-term temporal information, most of them still face the problem of low efficiency. One potential solution is knowledge distillation. Existing distillation methods only focus on reconstructing spatial features, while overlooking temporal knowledge. To this end, we propose TempDistiller, a Temporal knowledge Distiller, to acquire long-term memory from a teacher detector when provided with a limited number of frames. Specifically, a reconstruction target is formulated by integrating long-term temporal knowledge through a self-attention operation applied to feature teachers. Subsequently, novel features are generated for masked student features via a generator. Ultimately, we utilize this reconstruction target to reconstruct the student features. In addition, we also explore temporal relational knowledge when inputting full frames for the student model. We verify the effectiveness of the proposed method on the nuScenes benchmark. The experimental results show that our method obtains an enhancement of +1.6 mAP and +1.1 NDS compared to the baseline, a speed improvement of approximately 6 FPS after compressing temporal knowledge, and the most accurate velocity estimation.

preprint2024arXiv

Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design

Proteins are macromolecules responsible for essential functions in almost all living organisms. Designing reasonable proteins with desired functions is crucial. A protein's sequence and structure are strongly correlated and they together determine its function. In this paper, we propose NAEPro, a model to jointly design protein sequence and structure based on automatically detected functional sites. NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence and local influence from nearest amino acids in three-dimensional (3D) space. Such an architecture facilitates effective yet economic message passing at two levels. We evaluate our model and several strong baselines on two protein datasets, $β$-lactamase and myoglobin. Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors. These findings prove the capability of our model to design protein sequences and structures that closely resemble their natural counterparts. Furthermore, in-depth analysis confirms our model's ability to generate highly effective proteins capable of binding to their target metallocofactors. We provide code, data and models on GitHub.

preprint2024arXiv

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS as its abbreviation). The strategy sets up generation objectives by grouping body joints of detailed skeletons in close semantic proximity together and then replacing each such joint group with a single body-part node. Such an operation recursively abstracts a human pose to coarser and coarser skeletons at multiple granularity levels. As the abstraction level gradually increases, human motion becomes more and more concise and stable, significantly benefiting the cross-modal motion synthesis task. The whole text-driven human motion synthesis problem is then divided into multiple abstraction levels and solved with a multi-stage generation framework with a cascaded latent diffusion model: an initial generator first generates the coarsest human motion guess from a given text description; then, a series of successive generators gradually enrich the motion details based on the textual description and the previously synthesized results. Notably, we further integrate GUESS with the proposed dynamic multi-condition fusion mechanism to dynamically balance the cooperative effects of the given textual condition and synthesized coarse motion prompt in different generation stages. Extensive experiments on large-scale datasets verify that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realisticness, and diversity. Code is available at https://github.com/Xuehao-Gao/GUESS.
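
The recursive skeleton abstraction described in this abstract, replacing each group of semantically close joints with a single body-part node, can be illustrated with a small sketch. The joint names, the groupings, and the choice of the centroid as the merged node's position are all assumptions made for illustration; the abstract does not specify how the merged node is computed.

```python
def coarsen_pose(joints, groups):
    """Replace each joint group with one node placed at the group's centroid.

    joints: dict mapping joint name -> (x, y, z)
    groups: dict mapping coarse node name -> list of joint names to merge
    """
    coarse = {}
    for part, names in groups.items():
        points = [joints[name] for name in names]
        coarse[part] = tuple(sum(axis) / len(points) for axis in zip(*points))
    return coarse


# Hypothetical fine-grained pose (one side of the body only, for brevity).
fine_pose = {
    "left_shoulder": (1.0, 4.0, 0.0),
    "left_elbow": (1.0, 3.0, 0.0),
    "left_wrist": (1.0, 2.0, 0.0),
    "left_hip": (1.0, 2.0, 0.0),
    "left_knee": (1.0, 1.0, 0.0),
    "left_ankle": (1.0, 0.0, 0.0),
}

# Level 1: joints -> body parts. Level 2: body parts -> one side node.
part_pose = coarsen_pose(fine_pose, {
    "left_arm": ["left_shoulder", "left_elbow", "left_wrist"],
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
})
side_pose = coarsen_pose(part_pose, {"left_side": ["left_arm", "left_leg"]})
```

Applying the same function repeatedly yields the sequence of progressively coarser skeletons; a cascaded generator would then synthesize motion from the coarsest level back toward the finest.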

preprint2024arXiv

Hierarchical speaker representation for target speaker extraction

Target speaker extraction aims to isolate a specific speaker's voice from a composite of multiple sound sources, guided by an enrollment utterance, also called an anchor. Current methods predominantly derive speaker embeddings from the anchor and integrate them into the separation network to separate the voice of the target speaker. However, the representation of the speaker embedding is too simplistic, often being merely a 1×1024 vector. This dense information is difficult for the separation network to harness effectively. To address this limitation, we introduce a pioneering methodology called Hierarchical Representation (HR) that seamlessly fuses anchor data across 5 layers of the separation network, from granular to overarching, enhancing the precision of target extraction. HR amplifies the efficacy of anchors to improve target speaker isolation. On the Libri-2talker dataset, HR substantially outperforms state-of-the-art time-frequency domain techniques. Further demonstrating HR's capabilities, we achieved first place in the prestigious ICASSP 2023 Deep Noise Suppression Challenge. The proposed HR methodology shows great promise for advancing target speaker extraction through enhanced anchor utilization.

preprint2024arXiv

OFDM-Based Digital Semantic Communication with Importance Awareness

Semantic communication (SemCom) has received considerable attention for its ability to reduce data transmission size while maintaining task performance. However, existing works mainly focus on analog SemCom with simple channel models, which may limit its practical application. To narrow this gap, we propose an orthogonal frequency division multiplexing (OFDM)-based SemCom system that is compatible with existing digital communication infrastructures. In the considered system, the extracted semantics are quantized by scalar quantizers, transformed into an OFDM signal, and then transmitted over the frequency-selective channel. Moreover, we propose a semantic importance measurement method to build the relationship between target task and semantic features. Based on semantic importance, we formulate a sub-carrier and bit allocation problem to maximize communication performance. However, the optimization objective function cannot be accurately characterized using a mathematical expression due to the neural network-based semantic codec. Given the complex nature of the problem, we first propose a low-complexity sub-carrier allocation method that assigns sub-carriers with better channel conditions to more critical semantics. Then, we propose a deep reinforcement learning-based bit allocation algorithm with dynamic action space. Simulation results demonstrate that the proposed system achieves 9.7% and 28.7% performance gains compared to analog SemCom and conventional bit-based communication systems, respectively.
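
The low-complexity sub-carrier allocation rule stated in this abstract (better channels go to more critical semantics) amounts to a sort-and-match step. The sketch below assumes one sub-carrier per semantic feature and uses hypothetical importance scores and channel gains; the paper's actual importance measure and allocation granularity may differ.

```python
def allocate_subcarriers(importance, channel_gain):
    """Pair the most important features with the strongest sub-carriers.

    importance:   importance score per semantic feature
    channel_gain: gain per sub-carrier (higher means a better channel)
    Returns a dict mapping feature index -> sub-carrier index.
    Assumes one sub-carrier per feature, which is a simplification.
    """
    if len(importance) > len(channel_gain):
        raise ValueError("need at least one sub-carrier per feature")
    feature_order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    carrier_order = sorted(range(len(channel_gain)), key=lambda k: channel_gain[k], reverse=True)
    return dict(zip(feature_order, carrier_order))


importance = [0.2, 0.9, 0.5]          # hypothetical importance scores
channel_gain = [1.1, 0.3, 2.4, 0.8]   # hypothetical sub-carrier gains
allocation = allocate_subcarriers(importance, channel_gain)
```

Here feature 1 (most important) lands on sub-carrier 2 (strongest), and the weakest sub-carrier is left unused. The bit allocation step, which the abstract handles with deep reinforcement learning, is outside the scope of this sketch.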

preprint2024arXiv

StreamVC: Real-Time Low-Latency Voice Conversion

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.

preprint2023arXiv

High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined the Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MRF), to achieve high-fidelity whole-brain MWF and T1/T2/PD mapping on a clinical 3T scanner. To achieve fast acquisition and memory-efficient reconstruction, the ViSTa-MRF sequence leverages an optimized 3D tiny-golden-angle-shuffling spiral-projection acquisition and joint spatial-temporal subspace reconstruction with an optimized preconditioning algorithm. With the proposed ViSTa-MRF approach, high-fidelity direct MWF mapping was achieved without a need for multi-compartment fitting that could introduce bias and/or noise from additional assumptions or priors. Results: The in-vivo results demonstrate the effectiveness of the proposed acquisition and reconstruction framework to provide fast multi-parametric mapping with high SNR and good quality. The in-vivo results of 1mm- and 0.66mm-iso datasets indicate that the MWF values measured by the proposed method are consistent with standard ViSTa results that are 30x slower with lower SNR. Furthermore, we applied the proposed method to enable 5-minute whole-brain 1mm-iso assessment of MWF and T1/T2/PD mappings for infant brain development and for post-mortem brain samples. Conclusions: In this work, we have developed a 3D ViSTa-MRF technique that enables the acquisition of whole-brain MWF, quantitative T1, T2, and PD maps at 1mm and 0.66mm isotropic resolution in 5 and 15 minutes, respectively. This advancement allows for quantitative investigations of myelination changes in the brain.

preprint2023arXiv

Incorporating Nuclear Quantum Effects in Molecular Dynamics with a Constrained Minimized Energy Surface

The accurate incorporation of nuclear quantum effects in large-scale molecular dynamics (MD) simulations remains a significant challenge. Recently, we combined constrained nuclear-electronic orbital (CNEO) theory with classical MD and obtained a new approach (CNEO-MD) that can accurately and efficiently incorporate nuclear quantum effects into classical simulations. In this Letter, we provide the theoretical foundation for CNEO-MD by developing an alternative formulation of the equations of motion for MD. In this new formulation, the expectation values of quantum nuclear positions evolve classically on an effective energy surface that is obtained from a constrained energy minimization procedure when solving for the quantum nuclear wave function, thus enabling the incorporation of nuclear quantum effects in classical MD simulations. For comparison with other existing approaches, we examined a series of model systems and found that this new MD approach is significantly more accurate than the conventional way of performing classical MD, and it also generally outperforms centroid MD and ring-polymer MD in describing vibrations in these model systems.

preprint2023arXiv

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It is an amalgamation of a myriad of architectural designs and techniques that are mobile-oriented, which comprises a set of language models at the scale of 1.4B and 2.7B parameters, trained from scratch, a multimodal vision model that is pre-trained in the CLIP fashion, and cross-modality interaction via an efficient projector. We evaluate MobileVLM on several typical VLM benchmarks. Our models demonstrate on par performance compared with a few much larger models. More importantly, we measure the inference speed on both a Qualcomm Snapdragon 888 CPU and an NVIDIA Jetson Orin GPU, and we obtain state-of-the-art performance of 21.5 tokens and 65.3 tokens per second, respectively. Our code will be made available at: https://github.com/Meituan-AutoML/MobileVLM.

preprint2023arXiv

Model-based cross-correlation search for gravitational waves from the low-mass X-ray binary Scorpius X-1 in LIGO O3 data

We present the results of a model-based search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1 using LIGO detector data from the third observing run of Advanced LIGO, Advanced Virgo and KAGRA. This is a semicoherent search which uses details of the signal model to coherently combine data separated by less than a specified coherence time, which can be adjusted to balance sensitivity with computing cost. The search covered a range of gravitational-wave frequencies from 25Hz to 1600Hz, as well as ranges in orbital speed, frequency and phase determined from observational constraints. No significant detection candidates were found, and upper limits were set as a function of frequency. The most stringent limits, between 100Hz and 200Hz, correspond to an amplitude h0 of about 1e-25 when marginalized isotropically over the unknown inclination angle of the neutron star's rotation axis, or less than 4e-26 assuming the optimal orientation. The sensitivity of this search is now probing amplitudes predicted by models of torque balance equilibrium. For the usual conservative model assuming accretion at the surface of the neutron star, our isotropically-marginalized upper limits are close to the predicted amplitude from about 70Hz to 100Hz; the limits assuming the neutron star spin is aligned with the most likely orbital angular momentum are below the conservative torque balance predictions from 40Hz to 200Hz. Assuming a broader range of accretion models, our direct limits on gravitational-wave amplitude delve into the relevant parameter space over a wide range of frequencies, to 500Hz or more.

preprint2023arXiv

Secure Semantic Communications: Fundamentals and Challenges

Semantic communication allows the receiver to know the intention instead of the bit information itself, which is an emerging technique to support real-time human-machine and machine-to-machine interactions for future wireless communications. In semantic communications, both transmitter and receiver share some common knowledge, which can be used to extract small-size information at the transmitter and recover the original information at the receiver. Due to different design purposes, security issues in semantic communications have two unique features compared to standard bit-wise communications. First, an attacker in semantic communications considers not only the amount of stolen data but also the meanings of stolen data. Second, an attacker in semantic communication systems can attack not only semantic information transmission, as done in standard communication systems, but also the machine learning (ML) models used for semantic information extraction, since most semantic information is generated using ML-based methods. Due to these unique features, in this paper, we present an overview of the fundamentals and key challenges in the design of secure semantic communication. We first provide various methods to define and extract semantic information. Then, we focus on secure semantic communication techniques in two areas: information security and semantic ML model security. For each area, we identify the main problems and challenges. Then, we provide a comprehensive treatment of these problems. In a nutshell, this article provides a holistic set of guidelines on how to design secure semantic communication systems over real-world wireless communication networks.

preprint2023arXiv

Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search

Weakly supervised person search aims to jointly detect and match persons with only bounding box annotations. Existing approaches typically focus on improving the features by exploring relations of persons. However, the scale variation problem, in which a person often appears in images at different scales (resolutions), is a more severe and under-studied obstacle. On the one hand, small-scale images contain less information about a person, thus affecting the accuracy of the generated pseudo labels. On the other hand, the similarity of cross-scale images is often smaller than that of images with the same scale for a person, which increases the difficulty of matching. In this paper, we address this problem by proposing a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL). Scale invariance can be explored based on the self-similarity prior that an image shows the same statistical properties at different scales. To this end, we introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features by hard exemplar mining. To enhance the discriminative power of the features in an unsupervised manner, we introduce a dynamic multi-label prediction which progressively seeks true labels for training. It is adaptable to different types of unlabeled data and serves as a compensation for the clustering-based strategy. Experiments on the PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.

preprint2023arXiv

Shape-programmable Adaptive Multi-material Microrobots for Biomedical Applications

Flagellated microorganisms can swim at low Reynolds numbers and adapt to changes in their environment. Specifically, the flagella can switch their shapes or modes through gene expression. In the past decade, efforts have been made to fabricate and investigate rigid types of microrobots without any adaptation to the environments. More recently, obtaining adaptive microrobots mimicking real microorganisms is getting more attention. However, even though some adaptive microrobots achieved by hydrogels have emerged, the swimming behaviors of the microrobots before and after the environment-induced deformations are not predicted in a systematic standardized way. In this work, experiments, finite element analysis, and dynamic modeling are presented together to realize a complete understanding of these adaptive microrobots. The above three parts are cross-verified proving the success of using such methods, facilitating the bio-applications with shape-programmable and even swimming performance-programmable microrobots. Moreover, an application of targeted object delivery using the proposed microrobot has been successfully demonstrated. Finally, cytotoxicity tests are performed to prove the potential for using the proposed microrobot for biomedical applications.

preprint2023arXiv

TI-CNN: Convolutional Neural Networks for Fake News Detection

With the development of social networks, fake news for various commercial and political purposes has been appearing in large numbers and has become widespread in the online world. With deceptive words, people can get infected by the fake news very easily and will share it without any fact-checking. For instance, during the 2016 US presidential election, various kinds of fake news about the candidates spread widely through both official news media and online social networks. This fake news is usually released either to smear the opponents or to support the candidate on their side. The erroneous information in the fake news is usually written to stir up the voters' irrational emotion and enthusiasm. Such kinds of fake news can sometimes bring about devastating effects, and an important goal in improving the credibility of online social networks is to identify fake news in a timely manner. In this paper, we propose to study the fake news detection problem. Automatic fake news identification is extremely hard, since pure model-based fact-checking for news is still an open problem, and few existing models can be applied to solve the problem. Through a thorough investigation of fake news data, many useful explicit features are identified from both the text words and images used in the fake news. Besides the explicit features, there also exist some hidden patterns in the words and images used in fake news, which can be captured with a set of latent features extracted via the multiple convolutional layers in our model. A model named TI-CNN (Text and Image information based Convolutional Neural Network) is proposed in this paper. By projecting the explicit and latent features into a unified feature space, TI-CNN is trained with both the text and image information simultaneously. Extensive experiments carried out on real-world fake news datasets have demonstrated the effectiveness of TI-CNN.

preprint2023arXiv

Towards Net-Zero Carbon Emissions in Network AI for 6G and Beyond

A global effort has been initiated to reduce the worldwide greenhouse gas (GHG) emissions, primarily carbon emissions, by half by 2030 and reach net-zero by 2050. The development of 6G must also be compliant with this goal. Unfortunately, developing sustainable, net-zero emission systems to meet the users' fast-growing demands on mobile services, especially smart services and applications, may be much more challenging than expected. Particularly, despite the energy efficiency improvement in both hardware and software designs, the overall energy consumption and carbon emission of mobile networks are still increasing at a tremendous speed. The growing penetration of resource-demanding AI algorithms and solutions further exacerbates this challenge. In this article, we identify the major emission sources and introduce an evaluation framework for analyzing the lifecycle of network AI implementations. A novel joint dynamic energy trading and task allocation optimization framework, called DETA, has been introduced to reduce the overall carbon emissions. We consider a federated edge intelligence-based network AI system as a case study to verify the effectiveness of our proposed solution. Experimental results based on a hardware prototype suggest that our proposed solution can reduce carbon emissions of network AI systems by up to 74.9%. Finally, open problems and future directions are discussed.

preprint2023arXiv

VQNet 2.0: A New Generation Machine Learning Framework that Unifies Classical and Quantum

With the rapid development of classical and quantum machine learning, a large number of machine learning frameworks have been proposed. However, existing machine learning frameworks usually focus only on classical or quantum, rather than both. Therefore, based on VQNet 1.0, we further propose VQNet 2.0, a new generation of unified classical and quantum machine learning framework that supports hybrid optimization. The core library of the framework is implemented in C++, the user level is implemented in Python, and it supports deployment on quantum and classical hardware. In this article, we analyze the development trend of the new generation machine learning framework and introduce the design principles of VQNet 2.0 in detail: unity, practicality, efficiency, and compatibility, as well as full particulars of implementation. We illustrate the functions of VQNet 2.0 through several basic applications, including classical convolutional neural networks, quantum autoencoders, hybrid classical-quantum networks, etc. After that, through extensive experiments, we demonstrate that the operation speed of VQNet 2.0 is higher than that of the comparison method. Finally, through extensive experiments, we demonstrate that VQNet 2.0 can be deployed on different hardware platforms and that its overall calculation speed is faster than that of the comparison method. It can also be mixed and optimized with quantum circuits composed of multiple quantum computing libraries.

preprint2022arXiv

A Bottom-Up End-User Intelligent Assistant Approach to Empower Gig Workers against AI Inequality

The growing inequality in gig work between workers and platforms has become a critical social issue as gig work plays an increasingly prominent role in the future of work. The AI inequality is caused by (1) the technology divide in who has access to AI technologies in gig work; and (2) the data divide in who owns the data in gig work. These divides lead to unfair working conditions, a growing pay gap, neglect of workers' diverse preferences, and workers' lack of trust in the platforms. In this position paper, we argue that a bottom-up approach that empowers individual workers to access AI-enabled work planning support and share data among a group of workers through a network of end-user-programmable intelligent assistants is a practical way to bridge AI inequality in gig work under the current paradigm of privately owned platforms. This position paper articulates a set of research challenges, potential approaches, and community engagement opportunities, seeking to start a dialogue on this important research topic in the interdisciplinary CHIWORK community.

preprint2022arXiv

A Catalog of Molecular Clumps and Cores with Infall Signatures

Studying infall motion is a common means of investigating molecular cloud dynamics and the early process of star formation. Many in-depth studies of infall have been carried out. We searched the literature on infall in molecular clouds since 1994 and summarized the infall sources identified by the authors. A total of 456 infall sources are catalogued. We classify them into high-mass and low-mass sources, in which the high-mass sources are divided into three evolutionary stages: prestellar, protostellar and HII region. We divide the sources into clumps and cores according to their sizes. The H$_2$ column density values range from 1.21 $\times$ 10$^{21}$ to 9.75 $\times$ 10$^{24}$ cm$^{-2}$, with a median value of 4.17 $\times$ 10$^{22}$ cm$^{-2}$. The H$_2$ column densities of high-mass and low-mass sources are significantly separated. The median value of infall velocity for high-mass clumps is 1.12 km s$^{-1}$, and the infall velocities of low-mass cores are virtually all less than 0.5 km s$^{-1}$. There is no obvious difference between different stages of evolution. The mass infall rates of low-mass cores are between 10$^{-7}$ and 10$^{-4}$ M$_{\odot} \text{yr}^{-1}$, and those of high-mass clumps are between 10$^{-4}$ and 10$^{-1}$ M$_{\odot} \text{yr}^{-1}$ with only one exception. We do not find that the mass infall rates vary with evolutionary stages.

preprint2022arXiv

A Coarse-to-fine Morphological Approach With Knowledge-based Rules and Self-adapting Correction for Lung Nodules Segmentation

The segmentation module, which precisely outlines the nodules, is a crucial step in a computer-aided diagnosis (CAD) system. The most challenging part of such a module is how to achieve high segmentation accuracy, especially for juxtapleural, non-solid and small nodules. In this research, we present a coarse-to-fine methodology that greatly improves the thresholding method performance with a novel self-adapting correction algorithm and effectively removes noisy pixels with well-defined knowledge-based principles. Compared with recent strong morphological baselines, our algorithm, by combining dataset features, achieves state-of-the-art performance on both the public LIDC-IDRI dataset (DSC 0.699) and our private LC015 dataset (DSC 0.760), which closely approaches the performance of SOTA deep learning-based models. Furthermore, unlike most available morphological methods that can only segment the isolated and well-circumscribed nodules accurately, the precision of our method is totally independent of the nodule type or diameter, proving its applicability and generality.

preprint2022arXiv

Adaptable Semantic Compression and Resource Allocation for Task-Oriented Communications

Task-oriented communication is a new paradigm that aims at providing efficient connectivity for accomplishing intelligent tasks rather than the reception of every transmitted bit. In this paper, a deep learning-based task-oriented communication architecture is proposed where the user extracts, compresses and transmits semantics in an end-to-end (E2E) manner. Furthermore, an approach is proposed to compress the semantics according to their importance to the task, namely, adaptable semantic compression (ASC). In a delay-intolerant system, supporting multiple users raises a trade-off: a higher compression ratio requires fewer channel resources but distorts the semantics, while a lower compression ratio requires more channel resources and thus may lead to a transmission failure due to the delay constraint. To solve the problem, both compression ratio and resource allocation are optimized for the task-oriented communication system to maximize the success probability of tasks. Specifically, due to the nonconvexity of the problem, we propose a compression ratio and resource allocation (CRRA) algorithm that separates the problem into two subproblems and solves them iteratively to obtain a convergent solution. Furthermore, considering the scenarios where users have various service levels, a compression ratio, resource allocation, and user selection (CRRAUS) algorithm is proposed to deal with the problem. In CRRAUS, users are adaptively selected to complete the corresponding intelligent tasks based on the branch and bound method, at the expense of higher algorithm complexity compared with CRRA. Simulation results show that the proposed CRRA and CRRAUS algorithms can obtain at least 15% and 10% success gains over baseline algorithms, respectively.

preprint2022arXiv

All-sky search for gravitational wave emission from scalar boson clouds around spinning black holes in LIGO O3 data

This paper describes the first all-sky search for long-duration, quasi-monochromatic gravitational-wave signals emitted by ultralight scalar boson clouds around spinning black holes using data from the third observing run of Advanced LIGO. We analyze the frequency range from 20~Hz to 610~Hz, over a small frequency derivative range around zero, and use multiple frequency resolutions to be robust towards possible signal frequency wanderings. Outliers from this search are followed up using two different methods, one more suitable for nearly monochromatic signals, and the other more robust towards frequency fluctuations. We do not find any evidence for such signals and set upper limits on the signal strain amplitude, the most stringent being $\approx10^{-25}$ at around 130~Hz. We interpret these upper limits as both an "exclusion region" in the boson mass/black hole mass plane and the maximum detectable distance for a given boson mass, based on an assumption of the age of the black hole/boson cloud system.

preprint2022arXiv

An Exploratory Study of Stock Price Movements from Earnings Calls

Financial market analysis has focused primarily on extracting signals from accounting, stock price, and other numerical hard data reported in P&L statements or earnings per share reports. Yet, it is well-known that the decision-makers routinely use soft text-based documents that interpret the hard data they narrate. Recent advances in computational methods for analyzing unstructured and soft text-based data at scale offer possibilities for understanding financial market behavior that could improve investments and market equity. A critical and ubiquitous form of soft data are earnings calls. Earnings calls are periodic (often quarterly) statements usually by CEOs who attempt to influence investors' expectations of a company's past and future performance. Here, we study the statistical relationship between earnings calls, company sales, stock performance, and analysts' recommendations. Our study covers a decade of observations with approximately 100,000 transcripts of earnings calls from 6,300 public companies from January 2010 to December 2019. In this study, we report three novel findings. First, the buy, sell and hold recommendations from professional analysts made prior to the earnings have low correlation with stock price movements after the earnings call. Second, using our graph neural network based method that processes the semantic features of earnings calls, we reliably and accurately predict stock price movements in five major areas of the economy. Third, the semantic features of transcripts are more predictive of stock price movements than sales and earnings per share, i.e., traditional hard data in most of the cases.

preprint2022arXiv

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed via a two-stream network. In general, jointly reasoning the color and geometric information from RGBD is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively utilize multimodal information in both the encoder and decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deeply multi-level paired complementary information. To learn more robust deep representations and rich multi-modal information, we introduce a dual-branch decoder to effectively leverage the correlations and complementary cues of different tasks. Extensive experiments on NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance against the state-of-the-art methods.

preprint2022arXiv

Bandwidth and Power Allocation for Task-Oriented Semantic Communication

Deep learning enabled semantic communication has been studied to improve communication efficiency while guaranteeing intelligent task performance. Different from conventional communication systems, resource allocation in semantic communications no longer just pursues the bit transmission rate, but focuses on how to better compress and transmit semantics to complete subsequent intelligent tasks. This paper aims to appropriately allocate the bandwidth and power for artificial intelligence (AI) task-oriented semantic communication and proposes a joint compression ratio and resource allocation (CRRA) algorithm. We first analyze the relationship between the AI task's performance and the semantic information. Then, to optimize the AI task's performance under resource constraints, a bandwidth and power allocation problem is formulated. The problem is first separated into two subproblems due to the non-convexity. The first subproblem is a compression ratio optimization problem with a given resource allocation scheme, which is solved by an enumeration algorithm. The second subproblem is to find the optimal resource allocation scheme, which is transformed into a convex problem by the successive convex approximation method and solved by a convex optimization method. The optimal semantic compression ratio and resource allocation scheme are obtained by iteratively solving these two subproblems. Simulation results show that the proposed algorithm can efficiently improve the AI task's performance by up to 30% compared with baselines.

preprint2022arXiv

BcMON: Blockchain Middleware for Offline Networks

Blockchain is becoming a new generation of information infrastructure. However, current blockchain solutions rely on a continuously connected network to query and modify the state of the blockchain. The emerging satellite technology seems to be a good catalyst to forward offline transactions to the blockchain. However, this approach suffers from high costs, difficult interoperability, and limited computation. Therefore, we propose BcMON, the first blockchain middleware for offline networks. BcMON incorporates three innovative designs: 1) it reduces the costs of offline transactions accessing the blockchain through Short Message Service (SMS), 2) it validates the authenticity of offline cross-chain transactions by two-phase consensus, 3) it supports offline clients to perform complex queries and computations on the blockchains. A prototype of BcMON has been implemented to evaluate the performance of the proposed middleware and to show its stability, efficiency, and scalability.

preprint2022arXiv

Capturing Evolution Genes for Time Series Data

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve good prediction results (e.g., +10.56% on average in terms of F1), but is also able to provide explanations of the results.

preprint2022arXiv

CHANG-ES XXIX: The Sub-kpc Nuclear Bubble of NGC 4438

AGN bubbles could play an important role in accelerating high-energy CRs and galactic feedback. Only in nearby galaxies could we have high enough angular resolution in multi-wavelengths to study the sub-kpc environment of the AGN, where the bubbles are produced and strongly interact with the surrounding ISM. In this paper, we present the latest Chandra observations of the Virgo cluster galaxy NGC 4438, which hosts multi-scale bubbles detected in various bands. The galaxy also has low current star formation activity, so these bubbles are evidently produced by the AGN rather than a starburst. We present spatially resolved spectral analysis of the Chandra data of the $\sim3^{\prime\prime}\times5^{\prime\prime}$ ($\sim200{\rm~pc}\times350\rm~pc$) nuclear bubble of NGC 4438. The power law tail in the X-ray spectra can be most naturally explained as synchrotron emission from high-energy CR leptons. The hot gas temperature increases, while the overall contribution of the non-thermal X-ray emission decreases with the vertical distance from the galactic plane. We calculate the synchrotron cooling timescale of the CR leptons responsible for the non-thermal hard X-ray emission to be only a few tens to a few hundreds of years. The thermal pressure of the hot gas is about three times the magnetic pressure, but the current data cannot rule out the possibility that they are still in pressure balance. The spatially resolved spectroscopy presented in this paper may have important constraints on how the AGN accelerates CRs and drives outflows. We also discover a transient X-ray source only $\sim5^{\prime\prime}$ from the nucleus of NGC 4438. The source was not detected in 2002 and 2008, but became quite X-ray bright in March 2020, with an average 0.5-7 keV luminosity of $\sim10^{39}\rm~ergs~s^{-1}$.

preprint2022arXiv

CHANG-ES. XXIV. First Detection of A Radio Nuclear Ring and Potential LLAGN in NGC 5792

We report the discoveries of a nuclear ring of diameter 10$\arcsec$ ($\sim$1.5 kpc) and a potential low luminosity active galactic nucleus (LLAGN) in the radio continuum emission map of the edge-on barred spiral galaxy NGC~5792. These discoveries are based on the Continuum Halos in Nearby Galaxies - an Expanded Very Large Array (VLA) Survey, as well as subsequent VLA observations of sub-arcsecond resolution. Using a mixture of H$α$ and 24 $μ$m calibration, we disentangle the thermal and non-thermal radio emission of the nuclear region, and derive a star formation rate (SFR) of $\sim 0.4~M_{\sun}$ yr$^{-1}$. We find that the nuclear ring is dominated by non-thermal synchrotron emission. The synchrotron-based SFR is about three times the mixture-based SFR. This result indicates that the nuclear ring underwent more intense star-forming activity in the past, and that its star formation is now in a low state. The sub-arcsecond VLA images resolve six individual knots on the nuclear ring. The equipartition magnetic field strength $B_{\rm eq}$ of the knots varies from 77 to 88 $μ$G. The radio ring surrounds a point-like faint radio core of $S_{\rm 6GHz}=(16\pm4)$ $μ$Jy with polarized lobes at the center of NGC~5792, which suggests an LLAGN with an Eddington ratio $\sim10^{-5}$. This radio nuclear ring is reminiscent of the Central Molecular Zone (CMZ) of the Galaxy. Both of them consist of a nuclear ring and LLAGN.

preprint2022arXiv

Completing Partial Point Clouds with Outliers by Collaborative Completion and Segmentation

Most existing point cloud completion methods are only applicable to partial point clouds without any noise or outliers, an assumption that does not always hold in practice. We propose in this paper an end-to-end network, named CS-Net, to complete point clouds contaminated by noise or containing outliers. In our CS-Net, the completion and segmentation modules work collaboratively to promote each other, benefiting from our specifically designed cascaded structure. With the help of segmentation, a cleaner point cloud is fed into the completion module. We design a novel completion decoder which harnesses the labels obtained by segmentation together with FPS to purify the point cloud and leverages KNN-grouping for better generation. The completion and segmentation modules work alternately, sharing useful information with each other to gradually improve the quality of prediction. To train our network, we build a dataset to simulate the real case where incomplete point clouds contain outliers. Our comprehensive experiments and comparisons against state-of-the-art completion methods demonstrate our superiority. We also compare with the scheme of segmentation followed by completion and their end-to-end fusion, which also proves our efficacy.

preprint2022arXiv

Conditional Generative Data-free Knowledge Distillation

Knowledge distillation has made remarkable achievements in model compression. However, most existing methods require the original training data, which is usually unavailable due to privacy and security issues. In this paper, we propose a conditional generative data-free knowledge distillation (CGDD) framework for training lightweight networks without any training data. This method realizes efficient knowledge distillation based on conditional image generation. Specifically, we treat the preset labels as ground truth to train a conditional generator in a semi-supervised manner. The trained generator can produce specified classes of training images. For training the student network, we force it to extract the knowledge hidden in teacher feature maps, which provide crucial cues for the learning process. Moreover, an adversarial training framework for promoting distillation performance is constructed by designing several loss functions. This framework helps the student model to explore a larger data space. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on different datasets. Compared with other data-free works, our work obtains state-of-the-art results on CIFAR100, Caltech101, and different versions of ImageNet datasets. The code will be released.

preprint2022arXiv

Constraints from LIGO O3 data on gravitational-wave emission due to r-modes in the glitching pulsar PSR J0537-6910

We present a search for continuous gravitational-wave emission due to r-modes in the pulsar PSR J0537-6910 using data from the LIGO-Virgo Collaboration observing run O3. PSR J0537-6910 is a young energetic X-ray pulsar and is the most frequent glitcher known. The inter-glitch braking index of the pulsar suggests that gravitational-wave emission due to r-mode oscillations may play an important role in the spin evolution of this pulsar. Theoretical models confirm this possibility and predict emission at a level that can be probed by ground-based detectors. In order to explore this scenario, we search for r-mode emission in the epochs between glitches by using a contemporaneous timing ephemeris obtained from NICER data. We do not detect any signals in the theoretically expected band of 86-97 Hz, and report upper limits on the amplitude of the gravitational waves. Our results improve on previous amplitude upper limits from r-modes in J0537-6910 by a factor of up to 3 and place stringent constraints on theoretical models for r-mode driven spin-down in PSR J0537-6910, especially for higher frequencies at which our results reach below the spin-down limit defined by energy conservation.

preprint2022arXiv

Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor

Deep learning has shown its human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can be additionally well applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches which even keep some images of old classes during continual learning of new classes.

preprint2022arXiv

Continual Unsupervised Domain Adaptation for Semantic Segmentation using a Class-Specific Transfer

In recent years, there has been tremendous progress in the field of semantic segmentation. However, one remaining challenging problem is that segmentation models do not generalize to unseen domains. To overcome this problem, one either has to label lots of data covering the whole variety of domains, which is often infeasible in practice, or apply unsupervised domain adaptation (UDA), only requiring labeled source data. In this work, we focus on UDA and additionally address the case of adapting not only to a single domain, but to a sequence of target domains. This requires mechanisms preventing the model from forgetting its previously learned knowledge. To adapt a segmentation model to a target domain, we follow the idea of utilizing light-weight style transfer to convert the style of labeled source images into the style of the target domain, while retaining the source content. To mitigate the distributional shift between the source and the target domain, the model is fine-tuned on the transferred source images in a second step. Existing light-weight style transfer approaches relying on adaptive instance normalization (AdaIN) or Fourier transformation still lack performance and do not substantially improve upon common data augmentation, such as color jittering. The reason for this is that these methods do not focus on region- or class-specific differences, but mainly capture the most salient style. Therefore, we propose a simple and light-weight framework that incorporates two class-conditional AdaIN layers. To extract the class-specific target moments needed for the transfer layers, we use unfiltered pseudo-labels, which we show to be an effective approximation compared to real labels. We extensively validate our approach (CACE) on a synthetic sequence and further propose a challenging sequence consisting of real domains. CACE outperforms existing methods visually and quantitatively.
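
The class-conditional AdaIN step described in this abstract can be sketched as follows. This is an assumption-laden illustration, not the CACE implementation: it operates on plain NumPy arrays, not on layers inside a network, and the function names `class_moments` and `class_conditional_adain` are hypothetical. In the setting of the abstract, the target label map would come from unfiltered pseudo-labels.

```python
import numpy as np

def class_moments(feat, labels):
    """Per-class channel mean and std. feat: (C, H, W); labels: (H, W) ints."""
    moments = {}
    for cls in np.unique(labels):
        region = feat[:, labels == cls]          # (C, n_pixels_of_class)
        moments[int(cls)] = (region.mean(axis=1), region.std(axis=1))
    return moments

def class_conditional_adain(src_feat, src_labels, tgt_moments, eps=1e-5):
    """Re-normalize each class region of the source to the target class moments."""
    out = src_feat.astype(float).copy()
    for cls, (t_mean, t_std) in tgt_moments.items():
        mask = src_labels == cls
        if not mask.any():
            continue  # class absent from the source image, nothing to transfer
        region = out[:, mask]
        mean = region.mean(axis=1, keepdims=True)
        std = region.std(axis=1, keepdims=True) + eps
        # Standard AdaIN formula, applied only inside this class region.
        out[:, mask] = (region - mean) / std * t_std[:, None] + t_mean[:, None]
    return out
```

Unlike global AdaIN, which matches one mean and standard deviation per channel for the whole image, this version matches them separately per class region.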

preprint2022arXiv

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.

preprint2022arXiv

Deep Joint Source-Channel Coding Based on Semantics of Pixels

The semantic information of the image for intelligent tasks is hidden behind the pixels, and slight changes in the pixels will affect the performance of intelligent tasks. In order to preserve semantic information behind pixels for intelligent tasks during wireless image transmission, we propose a joint source-channel coding method based on semantics of pixels, which can improve the performance of intelligent tasks for images at the receiver by retaining semantic information. Specifically, we first utilize gradients of the intelligent task's perception results with respect to pixels to represent the semantic importance of pixels. Then, we extract the semantic distortion, and train the deep joint source-channel coding network with the goal of minimizing semantic distortion rather than pixel distortion. Experimental results demonstrate that the proposed method improves the performance of the intelligent classification task by 1.38% and 66% compared with the SOTA deep joint source-channel coding method and the traditional separate source-channel coding method, respectively, at the same transmission rate and signal-to-noise ratio.

preprint2022arXiv

Emergence of Self-dual Patterns in Active Colloids with Periodical Feedback to Local Density

The central task in the study of self-organization is to explore the general mechanism of emergence. However, this is hindered by the lack of full knowledge of the microscopic dynamics of emergence. In this study, the microscopic dynamics of self-organization for patterns is investigated and quantified in a periodically propelled Quincke system. The periodic coupling between propulsion and repulsion at the particle level leads to local directed oscillating particle flows and provides a loop of positive feedback to density fluctuations. Nevertheless, the global evolution of the resulting cluster phase is dominated by a global dual transformation. As stable attractors of the dual transformation, self-dual patterns including stripe patterns and square lattices can be achieved by tuning the strength and the frequency of propulsion. However, stripes are possible only under strong propulsion, where boundary particle flows can form. The findings in this study show that the dynamics of emergence on different length scales are controlled by different mechanisms. The competition and the interplay between different microscopic dynamic processes play a central role in determining the product of emergence. Moreover, the periodically oscillating self-dual patterns demonstrate a classical approach to time crystals.

preprint2022arXiv

First Identification of New X-Ray Spectra of Mo39+, Mo40+, W43+, W44+ and W45+ on EAST

New high-resolution x-ray spectra of Mo39+, Mo40+, W43+, W44+ and W45+ have been carefully confirmed for the first time using the x-ray imaging crystal spectrometer (XCS) in the Experimental Advanced Superconducting Tokamak (EAST) under various combined auxiliary heating plasma conditions. The wavelengths of these new x-ray spectra range from 3.895 Å to 3.986 Å. When the core electron temperature (Te0) reaches 6.0 keV, Mo39+ and Mo40+ lines at 3.9727, 3.9294 and 3.9480 Å can be effectively detected on XCS for EAST; meanwhile, the line-integrated brightness of these spectral lines of Mo39+ and Mo40+ is considerable when the electron temperature reaches 12.9 keV. Multi-component spectral lines for W43+, W44+ and W45+ have also been identified when Te0 reaches 6 keV. Some spectral lines, such as the Zn-1, Cu-2, Cu-4a, Cu-4d and Cu-5 lines of tungsten, are observed experimentally for the first time. When the electron temperature reaches 12.9 keV, the line-integrated intensity of some of these spectral lines of W43+, W44+ and W45+ is considerable. These experimental results and theoretical predictions from the FAC and FLYCHK codes are in good general agreement. These new spectral lines, obtained on XCS for EAST, are vital for uncovering the mechanisms of ion and electron thermal, high-Z impurity and momentum (anomalous) transport to achieve the advanced steady-state operation scenarios for ITER and CFETR.

preprint2022arXiv

First joint observation by the underground gravitational-wave detector, KAGRA, with GEO600

We report the results of the first joint observation of the KAGRA detector with GEO600. KAGRA is a cryogenic and underground gravitational-wave detector consisting of a laser interferometer with three-kilometer arms, and located in Kamioka, Gifu, Japan. GEO600 is a British-German laser interferometer with 600 m arms, and located near Hannover, Germany. GEO600 and KAGRA performed a joint observing run from April 7 to 20, 2020. We present the results of the joint analysis of the GEO-KAGRA data for transient gravitational-wave signals, including the coalescence of neutron-star binaries and generic unmodeled transients. We also perform dedicated searches for binary coalescence signals and generic transients associated with gamma-ray burst events observed during the joint run. No gravitational-wave events were identified. We evaluate the minimum detectable amplitude for various types of transient signals and the spacetime volume for which the network is sensitive to binary neutron-star coalescences. We also place lower limits on the distances to the gamma-ray bursts analysed based on the non-detection of an associated gravitational-wave signal for several signal models, including binary coalescences. These analyses demonstrate the feasibility and utility of KAGRA as a member of the global gravitational-wave detector network.

preprint2022arXiv

FORCE: A Framework of Rule-Based Conversational Recommender System

The conversational recommender systems (CRSs) have received extensive attention in recent years. However, most of the existing works focus on various deep learning models, which are largely limited by the requirement of large-scale human-annotated datasets. Such methods are not able to deal with the cold-start scenarios in industrial products. To alleviate the problem, we propose FORCE, a Framework Of Rule-based Conversational Recommender system that helps developers to quickly build CRS bots by simple configuration. We conduct experiments on two datasets in different languages and domains to verify its effectiveness and usability.

preprint2022arXiv

GWTC-2.1: Deep Extended Catalog of Compact Binary Coalescences Observed by LIGO and Virgo During the First Half of the Third Observing Run

The second Gravitational-Wave Transient Catalog reported on 39 compact binary coalescences observed by the Advanced LIGO and Advanced Virgo detectors between 1 April 2019 15:00 UTC and 1 October 2019 15:00 UTC. We present GWTC-2.1, which reports on a deeper list of candidate events observed over the same period. We analyze the final version of the strain data over this period with improved calibration and better subtraction of excess noise, which has been publicly released. We employ three matched-filter search pipelines for candidate identification, and estimate the astrophysical probability for each candidate event. While GWTC-2 used a false alarm rate threshold of 2 per year, we include in GWTC-2.1, 1201 candidates that pass a false alarm rate threshold of 2 per day. We calculate the source properties of a subset of 44 high-significance candidates that have an astrophysical probability greater than 0.5. Of these candidates, 36 have been reported in GWTC-2. If the 8 additional high-significance candidates presented here are astrophysical, the mass range of events that are unambiguously identified as binary black holes (both objects $\geq 3M_\odot$) is increased compared to GWTC-2, with total masses from $\sim 14 M_\odot$ for GW190924_021846 to $\sim 182 M_\odot$ for GW190426_190642. The primary components of two new candidate events (GW190403_051519 and GW190426_190642) fall in the mass gap predicted by pair instability supernova theory. We also expand the population of binaries with significantly asymmetric mass ratios reported in GWTC-2 by an additional two events (the mass ratio is less than $0.65$ and $0.44$ at $90\%$ probability for GW190403_051519 and GW190917_114630 respectively), and find that 2 of the 8 new events have effective inspiral spins $χ_\mathrm{eff} > 0$ (at $90\%$ credibility), while no binary is consistent with $χ_\mathrm{eff} < 0$ at the same significance.

preprint2022arXiv

Interactive Robotic Grasping with Attribute-Guided Disambiguation

Interactive robotic grasping using natural language is one of the most fundamental tasks in human-robot interaction. However, language can be a source of ambiguity, particularly when there are ambiguous visual or linguistic contents. This paper investigates the use of object attributes in disambiguation and develops an interactive grasping system capable of effectively resolving ambiguities via dialogues. Our approach first predicts target scores and attribute scores through vision-and-language grounding. To handle ambiguous objects and commands, we propose an attribute-guided formulation of the partially observable Markov decision process (Attr-POMDP) for disambiguation. The Attr-POMDP utilizes target and attribute scores as the observation model to calculate the expected return of an attribute-based (e.g., "what is the color of the target, red or green?") or a pointing-based (e.g., "do you mean this one?") question. Our disambiguation module runs in real time on a real robot, and the interactive grasping system achieves a 91.43% selection accuracy in the real-robot experiments, outperforming several baselines by large margins.

preprint2022arXiv

Invertible Mask Network for Face Privacy-Preserving

Face privacy preservation is a research hotspot that has attracted considerable interest. However, existing face privacy-preserving methods aim to remove the semantic information of the face and cannot preserve the reusability of the original facial information. To achieve both the naturalness of the processed face and the recoverability of the original protected face, this paper proposes a face privacy-preserving method based on an Invertible "Mask" Network (IMN). In IMN, we first introduce a Mask-net to generate a "Mask" face. Then, the "Mask" face is put onto the protected face to generate the masked face, which is indistinguishable from the "Mask" face. Finally, the "Mask" face can be removed from the masked face to provide authorized users with the recovered face, which is visually indistinguishable from the protected face. The experimental results show that the proposed method can not only effectively protect the privacy of the protected face, but also almost perfectly recover the protected face from the masked face.

preprint2022arXiv

Ion-Beam Radiation-Induced Eshelby Transformations: The Mean and Variance in Hydrostatic and Shear Residual Stresses

Ion beams play a pivotal role in ion implantation and the fabrication of nanostructures. However, a quantitative model describing the residual stresses associated with ion-beam radiation is lacking. Radiation-induced residual stress and transformation strain have mostly been recognized in the hydrostatic sub strain space. Here, we use molecular dynamics (MD) simulations to show that the response of a material to irradiation is generally anisotropic, depends on the ion-beam direction, and should be described using tensorial quantities. We demonstrate that accelerator-based ion beam irradiation, combined with the intrinsic lattice anisotropy and externally induced anisotropy (such as anisotropic mechanical loadings), causes radiation-actuated shear transformation strains in addition to hydrostatic expansion. We map out these complex correlations for several materials. Radiation-induced defects are shown to be responsible for residual shear stresses in the manner of Eshelby inclusion transformation. We propose that such a tensorial response model should be considered for accurate nanoscale fabrication using ion-beam irradiation.

preprint2022arXiv

Learning Object Relations with Graph Neural Networks for Target-Driven Grasping in Dense Clutter

Robots in the real world frequently come across identical objects in dense clutter. When evaluating grasp poses in these scenarios, a target-driven grasping system requires knowledge of spatial relations between scene objects (e.g., proximity, adjacency, and occlusions). To efficiently complete this task, we propose a target-driven grasping system that simultaneously considers object relations and predicts 6-DoF grasp poses. A densely cluttered scene is first formulated as a grasp graph with nodes representing object geometries in the grasp coordinate frame and edges indicating spatial relations between the objects. We design a Grasp Graph Neural Network (G2N2) that evaluates the grasp graph and finds the most feasible 6-DoF grasp pose for a target object. Additionally, we develop a shape completion-assisted grasp pose sampling method that improves sample quality and consequently grasping efficiency. We compare our method against several baselines in both simulated and real settings. In real-world experiments with novel objects, our approach achieves a 77.78% grasping accuracy in densely cluttered scenarios, surpassing the best-performing baseline by more than 15%. Supplementary material is available at https://sites.google.com/umn.edu/graph-grasping.

preprint2022arXiv

Local collective dynamics at equilibrium BCC crystal-melt interfaces

We present a classical molecular-dynamics study of the collective dynamical properties of the coexisting liquid phase at equilibrium body-centered cubic (BCC) Fe crystal-melt interfaces. For the three interfacial orientations (100), (110), and (111), the collective dynamics are characterized through the calculation of the intermediate scattering functions, dynamical structure factors and density relaxation times in a sequential local region of interest. An anisotropic speed-up of the collective dynamics in all three BCC crystal-melt interfacial orientations is observed. This trend differs significantly from the previously observed slowing down of the local collective dynamics at the liquid-vapor interface [Acta Mater 2020;198:281]. Examining the interfacial density relaxation times, we revisit the validity of the recently developed time-dependent Ginzburg-Landau (TDGL) theory for the solidification crystal-melt interface (CMI) kinetic coefficients, resulting in excellent agreement with both the magnitude and the kinetic anisotropy of the CMI kinetic coefficients measured from the non-equilibrium MD simulations.

preprint2022arXiv

LPCSE: Neural Speech Enhancement through Linear Predictive Coding

The increasingly stringent requirement on quality-of-experience in 5G/B5G communication systems has led to emerging neural speech enhancement techniques. These, however, have been developed in isolation from existing expert-rule-based models of speech pronunciation and distortion, such as the classic Linear Predictive Coding (LPC) speech model, because it is difficult to integrate such models with auto-differentiable machine learning frameworks. In this paper, to improve the efficiency of neural speech enhancement, we introduce an LPC-based speech enhancement (LPCSE) architecture, which leverages the strong inductive biases in the LPC speech model in conjunction with the expressive power of neural networks. Differentiable end-to-end learning is achieved in LPCSE via two novel blocks: a block that utilizes the expert rules to reduce the computational overhead when integrating the LPC speech model into neural networks, and a block that ensures the stability of the model and avoids exploding gradients in end-to-end training by mapping the linear prediction coefficients to the filter poles. The experimental results show that LPCSE successfully restores the formants of speech distorted by transmission loss, and outperforms two existing neural speech enhancement methods of comparable neural network sizes in terms of the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on the LJ Speech corpus.

preprint2022arXiv

Magnetic properties of equiatomic CrMnFeCoNi

Magnetic, specific heat, and structural properties of the equiatomic Cantor alloy system are reported for temperatures between 5 kelvin and 300 kelvin, and up to fields of 70 kilo-oersted. Magnetization measurements performed on as-cast, annealed, and cold-worked samples reveal a strong processing history dependence and that high-temperature annealing after cold-working does not restore the alloy to a pristine state. Measurements on known precipitates show that the two transitions, detected at 43 kelvin and 85 kelvin, are intrinsic to the Cantor alloy and not the result of an impurity phase. Experimental and ab initio density functional theory (DFT) computational results suggest that these transitions are a weak ferrimagnetic transition and a spin-glass-like transition, respectively, and magnetic and specific heat measurements provide evidence of significant Stoner enhancement and electron-electron interactions within the material.

preprint2022arXiv

Medium Transmission Map Matters for Learning to Restore Real-World Underwater Images

Underwater visual perception is essential for underwater exploration, archeology, ecosystems, and so on. Low illumination, light reflections, scattering, absorption and suspended particles inevitably lead to critically degraded underwater image quality, which poses great challenges for recognizing objects in underwater images. Existing underwater enhancement methods that aim to improve underwater visibility suffer heavily from poor image restoration performance and generalization ability. To reduce the difficulty of underwater image enhancement, we introduce the medium transmission map as guidance to assist in image enhancement. We formulate the interaction between the underwater visual images and the transmission map to obtain better enhancement results. Even with a simple and lightweight network configuration, the proposed method achieves advanced results of 22.6 dB on the challenging Test-R90 while running an impressive 30 times faster than existing models. Comprehensive experimental results demonstrate its superiority and potential for underwater perception. The code is available at: https://github.com/GroupG-yk/MTUR-Net.

preprint2022arXiv

MHSnet: Multi-head and Spatial Attention Network with False-Positive Reduction for Pulmonary Nodules Detection

The mortality of lung cancer has ranked high among cancers for many years. Early detection of lung cancer is critical for disease prevention, cure, and mortality rate reduction. However, existing detection methods on pulmonary nodules introduce an excessive number of false positive proposals in order to achieve high sensitivity, which is not practical in clinical situations. In this paper, we propose the multi-head detection and spatial squeeze-and-attention network, MHSnet, to detect pulmonary nodules, in order to aid doctors in the early diagnosis of lung cancers. Specifically, we first introduce multi-head detectors and skip connections to customize for the variety of nodules in sizes, shapes and types and capture multi-scale features. Then, we implement a spatial attention module to enable the network to focus on different regions differently inspired by how experienced clinicians screen CT images, which results in fewer false positive proposals. Lastly, we present a lightweight but effective false positive reduction module with the Linear Regression model to cut down the number of false positive proposals, without any constraints on the front network. Extensive experimental results compared with the state-of-the-art models have shown the superiority of the MHSnet in terms of the average FROC, sensitivity and especially false discovery rate (2.98% and 2.18% improvement in terms of average FROC and sensitivity, 5.62% and 28.33% decrease in terms of false discovery rate and average candidates per scan). The false positive reduction module significantly decreases the average number of candidates generated per scan by 68.11% and the false discovery rate by 13.48%, which is promising to reduce distracted proposals for the downstream tasks based on the detection results.

preprint2022arXiv

MobileCodec: Neural Inter-frame Video Compression on Mobile Devices

Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.

preprint2022arXiv

Modeling Population Human Mobility with Dynamic Mode Decomposition

Human mobility research concerns spatiotemporal individual and population movement. Accurate modeling and prediction of human mobility can provide opportunities to monitor, manage and optimize human movement for improved social-economic benefit. In this paper, we adopt the dynamic mode decomposition algorithm to model population human mobility using visitor flow data between different states in the United States from 2019 to 2021 [1]. We train multiple DMD models with different low rank structures, and evaluate their modeling accuracy and predictability on novel testing data.
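
The abstract does not spell out the DMD variant used, so the following is a minimal sketch of the standard "exact DMD" algorithm with a chosen low rank, run here on synthetic data and not on the visitor-flow dataset. The function names `fit_dmd` and `forecast` are hypothetical.

```python
import numpy as np

def fit_dmd(snapshots, rank):
    """Exact DMD. snapshots: (n_features, n_times), columns ordered in time."""
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    # Truncate to the requested low-rank structure.
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    X2_V_Sinv = (X2 @ V) / s              # scales column j by 1 / s[j]
    A_tilde = U.conj().T @ X2_V_Sinv      # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2_V_Sinv @ W                 # exact DMD modes
    return eigvals, modes

def forecast(eigvals, modes, x0, steps):
    """Predict the state `steps` time steps after the initial state x0."""
    amplitudes = np.linalg.lstsq(modes, x0.astype(complex), rcond=None)[0]
    return (modes @ (amplitudes * eigvals ** steps)).real
```

Varying `rank` corresponds to the "different low rank structures" mentioned in the abstract: a smaller rank keeps only the dominant spatiotemporal patterns, while a larger rank fits the training snapshots more closely.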

preprint2022arXiv

Multi-Agent Feedback Enabled Neural Networks for Intelligent Communications

In the intelligent communication field, deep learning (DL) has attracted much attention due to its strong fitting ability and data-driven learning capability. Compared with the typical DL feedforward network structures, an enhanced structure with direct data feedback has been studied and shown to perform better than feedforward networks. However, because these simple feedback methods lack sufficient analysis and learning ability on the feedback data, they are inadequate for more complicated nonlinear systems, and their performance is therefore limited. In this paper, a novel multi-agent feedback enabled neural network (MAFENN) framework is proposed, which gives the framework stronger feedback learning capabilities and more intelligence in feature abstraction, denoising, generation, etc. Furthermore, the MAFENN framework is theoretically formulated into a three-player Feedback Stackelberg game, and the game is proved to converge to the Feedback Stackelberg equilibrium. The design of the MAFENN framework and algorithm is dedicated to enhancing the learning capability of feedforward DL networks or their variations with simple data feedback. To verify the MAFENN framework's feasibility in wireless communications, a multi-agent MAFENN based equalizer (MAFENN-E) is developed for wireless fading channels with inter-symbol interference (ISI). Experimental results show that when the quadrature phase-shift keying (QPSK) modulation scheme is adopted, the symbol error rate (SER) performance of our proposed method outperforms that of the traditional equalizers by about 2 dB in linear channels. In nonlinear channels, the SER performance of our proposed method outperforms that of either traditional or DL based equalizers more significantly, which shows the effectiveness and robustness of our proposal in complex channel environments.

preprint2022arXiv

Multi-View Clustering for Open Knowledge Base Canonicalization

Open information extraction (OIE) methods extract plenty of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. However, these two views of knowledge have so far been leveraged in isolation by existing works. In this paper, we propose CMVC, a novel unsupervised framework that leverages these two views of knowledge jointly for canonicalizing OKBs without the need of manually annotated labels. To achieve this goal, we propose a multi-view CH K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering their different clustering qualities. In order to further enhance the canonicalization performance, we propose a training data optimization strategy in terms of data quantity and data quality respectively in each particular view to refine the learned view-specific embeddings in an iterative manner. Additionally, we propose a Log-Jump algorithm to predict the optimal number of clusters in a data-driven way without requiring any labels. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.

preprint2022arXiv

Multibeam Satellite Communications with Energy Efficiency Optimization

Energy efficiency (EE) is an important aspect of satellite communications. Different from existing algorithms that typically use the first-order Taylor lower bound approximation to convert non-convex EE maximization (EEM) problems into convex ones, this letter presents a two-step quadratic transformation method. In the first step, the fractional form of the achievable rate over the total power consumption is converted into a non-fractional form based on quadratic transformation. In the second step, the fractional form of the signal power over the interference-and-noise power is further converted into a non-fractional form, still based on quadratic transformation. After the two-step quadratic transformation, the original EEM problem is converted into an equivalent convex one. Then an alternating optimization algorithm is presented to solve it by iteratively performing two stages until a stop condition is satisfied. Simulation results show that the presented algorithm converges quickly and outperforms the sequential convex approximation algorithm and the multibeam interference mitigation algorithm.
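
The abstract does not give the exact form of the transformation. Assuming it refers to the quadratic transform commonly used in fractional programming, the basic identity for a single ratio with numerator $A(\mathbf{x}) \ge 0$ and denominator $B(\mathbf{x}) > 0$ (generic symbols introduced here for illustration) is:

```latex
\max_{\mathbf{x}} \; \frac{A(\mathbf{x})}{B(\mathbf{x})}
\quad \Longleftrightarrow \quad
\max_{\mathbf{x},\, y} \; 2y\sqrt{A(\mathbf{x})} - y^{2} B(\mathbf{x}),
\qquad
y^{\star} = \frac{\sqrt{A(\mathbf{x})}}{B(\mathbf{x})}.
```

Substituting $y^{\star}$ into the right-hand objective gives $2A/B - A/B = A/B$, so the two problems share the same optimum. For fixed $y$ the objective no longer contains a ratio, which is what makes alternating between updating $y$ and updating $\mathbf{x}$ attractive.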

preprint2022arXiv

Narrowband searches for continuous and long-duration transient gravitational waves from known pulsars in the LIGO-Virgo third observing run

Isolated neutron stars that are asymmetric with respect to their spin axis are possible sources of detectable continuous gravitational waves. This paper presents a fully-coherent search for such signals from eighteen pulsars in data from LIGO and Virgo's third observing run (O3). For known pulsars, efficient and sensitive matched-filter searches can be carried out if one assumes the gravitational radiation is phase-locked to the electromagnetic emission. In the search presented here, we relax this assumption and allow the frequency and frequency time-derivative of the gravitational waves to vary in a small range around those inferred from electromagnetic observations. We find no evidence for continuous gravitational waves, and set upper limits on the strain amplitude for each target. These limits are more constraining for seven of the targets than the spin-down limit defined by ascribing all rotational energy loss to gravitational radiation. In an additional search we look in O3 data for long-duration (hours-months) transient gravitational waves in the aftermath of pulsar glitches for six targets with a total of nine glitches. We report two marginal outliers from this search, but find no clear evidence for such emission either. The resulting duration-dependent strain upper limits do not surpass indirect energy constraints for any of these targets.

preprint2022arXiv

Observing hyperfine interactions of NV centers in diamond in an advanced quantum teaching lab

The negatively charged nitrogen-vacancy (NV$^-$) center in diamond is a model quantum system for university teaching labs due to its room-temperature compatibility and cost-effective operation. Based on the low-cost experimental setup that we have developed and described for the coherent control of the electronic spin (Sewani et al.), we introduce and explain here a number of more advanced experiments that probe the electron-nuclear interaction between the NV$^-$ electronic spin and the nitrogen and carbon nuclear spins. Optically-detected magnetic resonance (ODMR), Rabi oscillations, Ramsey fringe experiments, and Hahn echo sequences are implemented to demonstrate how the nuclear spins interact with the electron spins. Most experiments only require 15 minutes of measurement time and can, therefore, be completed within one teaching lab.

preprint2022arXiv

OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, have been included in the toolbox with PyTorch and/or MindSpore implementations under heterogeneous NVIDIA and Huawei Ascend computing systems. To the best of our knowledge, OpenMedIA is the first open-source algorithm library providing side-by-side PyTorch and MindSpore implementations and results on several benchmark datasets. The source code and models are available at https://git.openi.org.cn/OpenMedIA.

preprint2022arXiv

Positioning Using Visible Light Communications: A Perspective Arcs Approach

Visible light positioning (VLP) is an accurate indoor positioning technology that uses luminaires as transmitters. In particular, circular luminaires are a common source type for VLP, but they are typically treated only as point sources for positioning, ignoring their geometric characteristics. In this paper, the arc feature of the circular luminaire and the coordinate information obtained via visible light communication (VLC) are jointly used for VLC-enabled indoor positioning, and a novel perspective arcs approach is proposed. The proposed approach does not rely on any inertial measurement unit, and has no tilt angle limitations at the user. First, a VLC assisted perspective circle and arc algorithm (V-PCA) is proposed for a scenario in which a complete luminaire and an incomplete one can be captured by the user. Considering the cases in which parts of VLC links are blocked, an anti-occlusion VLC assisted perspective arcs algorithm (OA-V-PA) is proposed. Simulation results show that the proposed indoor positioning algorithm can achieve a 95th percentile positioning accuracy of around 10 cm. Moreover, an experimental prototype based on a mobile phone is implemented, in which a fused image processing method is proposed. Experimental results show that the average positioning accuracy is less than 5 cm.

preprint2022arXiv

Privacy-aware Early Detection of COVID-19 through Adversarial Training

Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients and may reduce operational strain on hospitals that cope with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect coronavirus using routine clinical data (blood tests and vital signs). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. In spite of this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this work, we examine two machine learning approaches, intended to predict a patient's COVID-19 status using routinely collected and readily available clinical data. We employ adversarial training to explore robust deep learning architectures that protect attributes related to demographic information about the patients. The two models we examine in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from the Oxford University Hospitals, Bedfordshire Hospitals NHS Foundation Trust, University Hospitals Birmingham NHS Foundation Trust, and Portsmouth Hospitals University NHS Trust we train and test two neural networks that predict PCR test results using information from basic laboratory blood tests and vital signs performed on a patient's arrival at hospital. We assess the level of privacy each one of the models can provide and show the efficacy and robustness of our proposed architectures against a comparable baseline. One of our main contributions is that we specifically target the development of effective COVID-19 detection models with built-in mechanisms in order to selectively protect sensitive attributes against adversarial attacks.

preprint2022arXiv

Real-time Rail Recognition Based on 3D Point Clouds

Accurate rail location is a crucial part of the railway support driving system for safety monitoring. LiDAR can obtain point clouds that carry 3D information about the railway environment, especially in darkness and severe weather conditions. In this paper, a real-time rail recognition method based on 3D point clouds is proposed to address challenges such as the disorder, uneven density and large volume of the point clouds. A voxel down-sampling method is first presented to balance the density of railway point clouds, and a pyramid partition is designed to divide the 3D scanning area into voxels with different volumes. Then, a feature encoding module is developed to find the nearest neighbor points and to aggregate their local geometric features for the center point. Finally, a multi-scale neural network is proposed to generate the prediction results of each voxel and the rail location. The experiments are conducted on 9 sequences of 3D point cloud data for the railway. The results show that the method performs well in detecting straight rails, curved rails and rails with other complex topologies.

preprint2022arXiv

Rethinking the Misalignment Problem in Dense Object Detection

Object detection aims to localize and classify the objects in a given image, and these two tasks are sensitive to different object regions. Therefore, some locations predict high-quality bounding boxes but low classification scores, and some locations are quite the opposite. A misalignment exists between the two tasks, and their features are spatially entangled. In order to solve the misalignment problem, we propose a plug-in Spatial-disentangled and Task-aligned operator (SALT). By predicting two task-aware point sets that are located in each task's sensitive regions, SALT can reassign features from those regions and align them to the corresponding anchor point. Therefore, features for the two tasks are spatially aligned and disentangled. To minimize the difference between the two regression stages, we propose a Self-distillation regression (SDR) loss that can transfer knowledge from the refined regression results to the coarse regression results. On the basis of SALT and SDR loss, we propose SALT-Net, which explicitly exploits task-aligned point-set features for accurate detection results. Extensive experiments on the MS-COCO dataset show that our proposed methods can consistently boost different state-of-the-art dense detectors by $\sim$2 AP. Notably, SALT-Net with Res2Net-101-DCN backbone achieves 53.8 AP on the MS-COCO test-dev.

preprint2022arXiv

Rethinking the Value of Gazetteer in Chinese Named Entity Recognition

Gazetteers are widely used in Chinese named entity recognition (NER) to enhance span boundary detection and type classification. However, the NLP community still lacks a systematic analysis of gazetteer-enhanced NER models that would clarify the generalizability and effectiveness of gazetteers. In this paper, we first re-examine the effectiveness of several common practices of gazetteer-enhanced NER models and carry out a series of detailed analyses to evaluate the relationship between model performance and gazetteer characteristics, which can guide us to build a more suitable gazetteer. The findings of this paper are as follows: (1) the gazetteer improves performance in most situations where the datasets are difficult for a traditional NER model to learn; (2) model performance benefits greatly from high-quality pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities that can be matched in both the training set and the testing set.

preprint2022arXiv

RGB Image Classification with Quantum Convolutional Ansaetze

With the rapid growth of qubit numbers and coherence times in quantum hardware technology, implementing shallow neural networks on the so-called Noisy Intermediate-Scale Quantum (NISQ) devices has attracted a lot of interest. Many quantum (convolutional) circuit ansaetze have been proposed for grayscale image classification tasks with promising empirical results. However, when these ansaetze are applied to RGB images, the intra-channel information that is useful for vision tasks is not extracted effectively. In this paper, we propose two types of quantum circuit ansaetze to simulate convolution operations on RGB images, which differ in how inter-channel and intra-channel information are extracted. To the best of our knowledge, this is the first work on a quantum convolutional circuit that deals with RGB images effectively, with a higher test accuracy compared to purely classical CNNs. We also investigate the relationship between the size of the quantum circuit ansatz and the learnability of the hybrid quantum-classical convolutional neural network. Through experiments based on the CIFAR-10 and MNIST datasets, we demonstrate that a larger quantum circuit ansatz improves predictive performance in multiclass classification tasks, providing useful insights for near-term quantum algorithm development.

preprint2022arXiv

Search for anisotropic gravitational-wave backgrounds using data from Advanced LIGO and Advanced Virgo's first three observing runs

We report results from searches for anisotropic stochastic gravitational-wave backgrounds using data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. For the first time, we include Virgo data in our analysis and run our search with a new efficient pipeline called PyStoch on data folded over one sidereal day. We use gravitational-wave radiometry (broadband and narrow band) to produce sky maps of stochastic gravitational-wave backgrounds and to search for gravitational waves from point sources. A spherical harmonic decomposition method is employed to look for gravitational-wave emission from spatially-extended sources. Neither technique found evidence of gravitational-wave signals. Hence we derive 95% confidence-level upper limit sky maps on the gravitational-wave energy flux from broadband point sources, ranging from $F_{α, Θ} < {\rm (0.013 - 7.6)} \times 10^{-8} {\rm erg \, cm^{-2} \, s^{-1} \, Hz^{-1}},$ and on the (normalized) gravitational-wave energy density spectrum from extended sources, ranging from $Ω_{α, Θ} < {\rm (0.57 - 9.3)} \times 10^{-9} \, {\rm sr^{-1}}$, depending on direction ($Θ$) and spectral index ($α$). These limits improve upon previous limits by factors of $2.9 - 3.5$. We also set 95% confidence level upper limits on the frequency-dependent strain amplitudes of quasimonochromatic gravitational waves coming from three interesting targets, Scorpius X-1, SN 1987A and the Galactic Center, with the best upper limits in the range $h_0 < {\rm (1.7-2.1)} \times 10^{-25},$ a factor of $\geq 2.0$ improvement compared to previous stochastic radiometer searches.

preprint2022arXiv

Search for continuous gravitational wave emission from the Milky Way center in O3 LIGO--Virgo data

We present a directed search for continuous gravitational wave (CW) signals emitted by spinning neutron stars located in the inner parsecs of the Galactic Center (GC). Compelling evidence for the presence of a numerous population of neutron stars has been reported in the literature, turning this region into a very interesting place to look for CWs. In this search, data from the full O3 LIGO--Virgo run in the detector frequency band $[10,2000]\rm~Hz$ have been used. No significant detection was found and 95% confidence level upper limits on the signal strain amplitude were computed, over the full search band, with the deepest limit of about $7.6\times 10^{-26}$ at $\simeq 142\rm~Hz$. These results are significantly more constraining than those reported in previous searches. We use these limits to put constraints on the fiducial neutron star ellipticity and r-mode amplitude. These limits can also be translated into constraints in the black hole mass -- boson mass plane for a hypothetical population of boson clouds around spinning black holes located in the GC.

preprint2022arXiv

Search for continuous gravitational waves from 20 accreting millisecond X-ray pulsars in O3 LIGO data

Results are presented of searches for continuous gravitational waves from 20 accreting millisecond X-ray pulsars with accurately measured spin frequencies and orbital parameters, using data from the third observing run of the Advanced LIGO and Advanced Virgo detectors. The search algorithm uses a hidden Markov model, where the transition probabilities allow the frequency to wander according to an unbiased random walk, while the $\mathcal{J}$-statistic maximum-likelihood matched filter tracks the binary orbital phase. Three narrow sub-bands are searched for each target, centered on harmonics of the measured spin frequency. The search yields 16 candidates, consistent with a false alarm probability of 30% per sub-band and target searched. These candidates, along with one candidate from an additional target-of-opportunity search done for SAX J1808.4$-$3658, which was in outburst during one month of the observing run, cannot be confidently associated with a known noise source. Additional follow-up does not provide convincing evidence that any are a true astrophysical signal. When all candidates are assumed non-astrophysical, upper limits are set on the maximum wave strain detectable at 95% confidence, $h_0^{95\%}$. The strictest constraint is $h_0^{95\%} = 4.7\times 10^{-26}$ from IGR J17062$-$6143. Constraints on the detectable wave strain from each target lead to constraints on neutron star ellipticity and $r$-mode amplitude, the strictest of which are $ε^{95\%} = 3.1\times 10^{-7}$ and $α^{95\%} = 1.8\times 10^{-5}$ respectively. This analysis is the most comprehensive and sensitive search of continuous gravitational waves from accreting millisecond X-ray pulsars to date.

preprint2022arXiv

Search of the Early O3 LIGO Data for Continuous Gravitational Waves from the Cassiopeia A and Vela Jr. Supernova Remnants

We present directed searches for continuous gravitational waves from the neutron stars in the Cassiopeia A (Cas A) and Vela Jr. supernova remnants. We carry out the searches in the LIGO data from the first six months of the third Advanced LIGO and Virgo observing run, using the Weave semi-coherent method, which sums matched-filter detection-statistic values over many time segments spanning the observation period. No gravitational wave signal is detected in the search band of 20--976 Hz for assumed source ages greater than 300 years for Cas A and greater than 700 years for Vela Jr. Estimates from simulated continuous wave signals indicate we achieve the most sensitive results to date across the explored parameter space volume, probing to strain magnitudes as low as ~$6.3\times10^{-26}$ for Cas A and ~$5.6\times10^{-26}$ for Vela Jr. at frequencies near 166 Hz at 95% efficiency.

preprint2022arXiv

Searches for Gravitational Waves from Known Pulsars at Two Harmonics in the Second and Third LIGO-Virgo Observing Runs

We present a targeted search for continuous gravitational waves (GWs) from 236 pulsars using data from the third observing run of LIGO and Virgo (O3) combined with data from the second observing run (O2). Searches were for emission from the $l=m=2$ mass quadrupole mode with a frequency at only twice the pulsar rotation frequency (single harmonic) and the $l=2, m=1,2$ modes with a frequency of both once and twice the rotation frequency (dual harmonic). No evidence of GWs was found so we present 95% credible upper limits on the strain amplitudes $h_0$ for the single harmonic search along with limits on the pulsars' mass quadrupole moments $Q_{22}$ and ellipticities $\varepsilon$. Of the pulsars studied, 23 have strain amplitudes that are lower than the limits calculated from their electromagnetically measured spin-down rates. These pulsars include the millisecond pulsars J0437$-$4715 and J0711$-$6830 which have spin-down ratios of 0.87 and 0.57 respectively. For nine pulsars, their spin-down limits have been surpassed for the first time. For the Crab and Vela pulsars our limits are factors of $\sim 100$ and $\sim 20$ more constraining than their spin-down limits, respectively. For the dual harmonic searches, new limits are placed on the strain amplitudes $C_{21}$ and $C_{22}$. For 23 pulsars we also present limits on the emission amplitude assuming dipole radiation as predicted by Brans-Dicke theory.

preprint2022arXiv

Semantic-assisted image compression

Conventional image compression methods typically aim at pixel-level consistency while ignoring the performance of downstream AI tasks. To solve this problem, this paper proposes a Semantic-Assisted Image Compression method (SAIC), which can maintain semantic-level consistency to enable high performance of downstream AI tasks. To this end, we train the compression network using a semantic-level loss function. In particular, the semantic-level loss is measured using a gradient-based semantic weights mechanism (GSW), which directly considers downstream AI tasks' perceptual results. Then, this paper proposes a semantic-level distortion evaluation metric to quantify the amount of semantic information retained during the compression process. Experimental results show that the proposed SAIC method can retain more semantic-level information and achieve better performance of downstream AI tasks compared to the traditional deep learning-based method and the advanced perceptual method at the same compression ratio.

preprint2022arXiv

Signatures of non-Loudon-Fleury Raman scattering in the Kitaev magnet $β$-Li$_2$IrO$_3$

We investigate the magnetic excitations of the hyperhoneycomb Kitaev magnet $β$-$\text{Li}_2\text{IrO}_3$ by means of inelastic Raman scattering. The spectra exhibit a coexistence of a broad scattering continuum and two sharp low-energy peaks at 2.5 meV and 3 meV, with a distinctive polarization dependence. While the continuum is suggestive of fractional quasi-particles emerging from a proximate quantum spin liquid phase, the sharp peaks provide the first experimental signature of the 'non-Loudon-Fleury' one-magnon scattering processes proposed recently [Phys. Rev. B 104, 144412 (2021)]. The corresponding microscopic mechanism is similar to the one leading to the symmetric off-diagonal exchange interaction $Γ$ (as it involves a combination of both direct and ligand-mediated exchange paths), but is otherwise completely unexpected within the traditional Loudon-Fleury theory of Raman scattering. The present experimental verification therefore calls for a drastic reevaluation of Raman scattering in similar systems with strong spin-orbit coupling and multiple exchange paths.

preprint2022arXiv

Symmetrical Z-Complementary Code Sets (SZCCSs) for Optimal Training in Generalized Spatial Modulation

This paper introduces a novel class of code sets, called "symmetrical Z-complementary code sets (SZCCSs)", whose aperiodic auto- and cross-correlation sums exhibit zero-correlation zones (ZCZs) at both the front-end and tail-end of the entire correlation window. Three constructions of (optimal) SZCCSs based on general Boolean functions are presented. As a second major contribution, we apply SZCCSs to design optimal training sequences for broadband generalized spatial modulation (GSM) systems over frequency-selective channels. Key words: Complementary code set, channel estimation, training sequence design, generalized spatial modulation, frequency-selective channels.

preprint2022arXiv

Task Offloading with Multi-Tier Computing Resources in Next Generation Wireless Networks

With the development of next-generation wireless networks, the Internet of Things (IoT) is evolving towards the intelligent IoT (iIoT), where intelligent applications usually have stringent delay and jitter requirements. In order to provide low-latency services to heterogeneous users in the emerging iIoT, multi-tier computing was proposed by effectively combining edge computing and fog computing. More specifically, multi-tier computing systems compensate for cloud computing through task offloading and dispersing computing tasks to multi-tier nodes along the continuum from the cloud to things. In this paper, we investigate key techniques and directions for wireless communications and resource allocation approaches to enable task offloading in multi-tier computing systems. A multi-tier computing model, with its main functionality and optimization methods, is presented in detail. We hope that this paper will serve as a valuable reference and guide to the theoretical, algorithmic, and systematic opportunities of multi-tier computing towards next-generation wireless networks.

preprint2022arXiv

The Role of Magnetic Fields in Triggered Star Formation of RCW 120

We report on the near-infrared polarimetric observations of RCW 120 with the 1.4 m IRSF telescope. The starlight polarization of the background stars reveals for the first time the magnetic field of RCW 120. The global magnetic field of RCW 120 is along the direction of $20^\circ$, parallel to the Galactic plane. The field strength on the plane of the sky is $100\pm26\,μ$G. The magnetic field around the eastern shell shows evidence of compression by the HII region. The external pressure (turbulent pressure + magnetic pressure) and the gas density of the ambient cloud are minimum along the direction where RCW 120 breaks out, which explains the observed elongation of RCW 120. The dynamical age of RCW 120, depending on the magnetic field strength, is $\sim\,1.6\,\mathrm{Myr}$ for a field strength of $100\,μ$G, older than the hydrodynamic estimates. In the direction perpendicular to the magnetic field, the density contrast of the western shell is greatly reduced by the strong magnetic field. The strong magnetic field in general reduces the efficiency of triggered star formation, in comparison with the hydrodynamic estimates. Triggered star formation via the "collect and collapse" mechanism could occur in the direction along the magnetic field. Core formation efficiency (CFE) is found to be higher in the southern and eastern shells of RCW 120 than in the infrared dark cloud receiving little influence from the HII region, suggesting an increase in the CFE related to triggering from ionization feedback.

preprint2022arXiv

Tuning the Magnetic Properties of the CrMnFeCoNi Cantor Alloy

Magnetic properties of more than twenty Cantor alloy samples of varying composition were investigated over a temperature range of 5 K to 300 K and in fields of up to 70 kOe using magnetometry and muon spin relaxation. Two transitions are identified: a spin-glass-like transition that appears between 55 K and 190 K depending on composition, and a ferrimagnetic transition that occurs at approximately 43 K in multiple samples with widely varying compositions. The magnetic signatures at 43 K are remarkably insensitive to chemical composition. A modified Curie-Weiss model was used to fit the susceptibility data and to extract the net effective magnetic moment for each sample. The resulting values for the net effective moment were either diminished with increasing Cr or Mn concentrations or enhanced with decreasing Fe, Co, or Ni concentrations. Beyond a sufficiently large effective moment, the magnetic ground state transitions from ferrimagnetism to ferromagnetism. The effective magnetic moments, together with the corresponding compositions, are used in a global linear regression analysis to extract element-specific effective magnetic moments, which are compared to the values obtained by ab-initio based density functional theory (DFT) calculations. These moments provide the information necessary to controllably tune the magnetic properties of Cantor alloy variants.

preprint2022arXiv

Ultrafast modulation of the molten metal surface tension under femtosecond laser irradiation

We predict ultrafast modulation of the surface stress fields of pure molten metals under irradiation by a single femtosecond laser pulse, using two-temperature model molecular-dynamics simulations. High-resolution and precision calculations are used to resolve the ultrafast laser-induced anisotropic relaxations of the pressure components on a time-scale comparable to the intrinsic liquid density relaxation time. The magnitudes of the dynamic surface tensions are found to be modulated sharply within picoseconds after the irradiation, due to the development of a nanometer-scale non-hydrostatic regime behind the exterior atomic layer of the liquid surfaces. The reported novel regulation mechanism of the liquid surface stress field and the dynamic surface tension hints at levitating the manipulation of liquid surfaces, such as ultrafast steering of the surface directional transport and patterning.

preprint2022arXiv

Who is next: rising star prediction via diffusion of user interest in social networks

Finding items with the potential to increase sales is of great importance in online markets. In this paper, we propose to study this novel and practical problem: rising star prediction. We call these potential items Rising Stars, which implies their ability to rise from low-turnover items to best-sellers in the future. Rising stars can be used to help with unfair recommendation in e-commerce platforms, balance supply and demand to benefit retailers, and allocate marketing resources rationally. Although the study of rising stars can bring great benefits, it also poses challenges. The sales trend of a rising star fluctuates sharply in the short term and exhibits more contingency caused by external events (e.g., COVID-19 caused increased purchases of face masks) than that of other items, which cannot be handled by existing sales prediction methods. To address the above challenges, in this paper, we observe that the presence of rising stars is closely correlated with the early diffusion of user interest in social networks, which is validated in the case of Taocode (an intermediary that diffuses user interest in Taobao). Thus, we propose a novel framework, RiseNet, to incorporate the user interest diffusion process with the item dynamic features to effectively predict rising stars. Specifically, we adopt a coupled mechanism to capture the dynamic interplay between items and user interest, and a specially designed GNN-based framework to quantify user interest. Our experimental results on large-scale real-world datasets provided by Taobao demonstrate the effectiveness of our proposed framework.

preprint2021arXiv

A hybrid-mixed finite element method for single-phase Darcy flow in fractured porous media

We present a hybrid-mixed finite element method for a novel hybrid-dimensional model of single-phase Darcy flow in fractured porous media. In this model, the fracture is treated as a $(d-1)$-dimensional interface within the $d$-dimensional fractured porous domain, for $d=2, 3$. Two classes of fracture are distinguished based on the permeability magnitude ratio between the fracture and its surrounding medium: when the permeability in the fracture is (significantly) larger than in its surrounding medium, it is considered a conductive fracture; when the permeability in the fracture is (significantly) smaller than in its surrounding medium, it is considered a blocking fracture. The conductive fractures are treated using the classical hybrid-dimensional approach of the interface model where pressure is assumed to be continuous across the fracture interfaces, while the blocking fractures are treated using the recent Dirac-$δ$ function approach where the normal component of the Darcy velocity is assumed to be continuous across the interface. Due to the use of the Dirac-$δ$ function approach for the blocking fractures, our numerical scheme allows for nonconforming meshes with respect to the blocking fractures. This is the major novelty of our model and numerical discretization. Moreover, our numerical scheme produces locally conservative velocity approximations and leads to a symmetric positive definite linear system involving pressure degrees of freedom on the mesh skeleton only. The performance of the proposed method is demonstrated by various benchmark test cases in both two and three dimensions. Numerical results indicate that the proposed scheme is highly competitive with existing methods in the literature.

preprint2021arXiv

A Unified Light Framework for Real-time Fault Detection of Freight Train Images

Real-time fault detection for freight trains plays a vital role in guaranteeing the security and optimal operation of railway transportation under stringent resource requirements. Despite the promising results of deep learning based approaches, the performance of these fault detectors on freight train images is far from satisfactory in both accuracy and efficiency. This paper proposes a unified light framework to improve detection accuracy while supporting real-time operation with a low resource requirement. We first design a novel lightweight backbone (RFDNet) to improve the accuracy and reduce computational cost. Then, we propose a multi region proposal network using multi-scale feature maps generated from RFDNet to improve the detection performance. Finally, we present multi level position-sensitive score maps and region of interest pooling to further improve accuracy with few redundant computations. Extensive experimental results on public benchmark datasets suggest that our RFDNet can significantly improve the performance of the baseline network with higher accuracy and efficiency. Experiments on six fault datasets show that our method is capable of real-time detection at over 38 frames per second and achieves competitive accuracy and lower computation than the state-of-the-art detectors.

preprint2021arXiv

All-sky search for long-duration gravitational-wave bursts in the third Advanced LIGO and Advanced Virgo run

After the detection of gravitational waves from compact binary coalescences, the search for transient gravitational-wave signals with less well-defined waveforms for which matched filtering is not well-suited is one of the frontiers for gravitational-wave astronomy. Broadly classified into "short" ($\lesssim 1$ s) and "long" ($\gtrsim 1$ s) duration signals, these signals are expected from a variety of astrophysical processes, including non-axisymmetric deformations in magnetars or eccentric binary black hole coalescences. In this work, we present a search for long-duration gravitational-wave transients from Advanced LIGO and Advanced Virgo's third observing run from April 2019 to March 2020. For this search, we use minimal assumptions for the sky location, event time, waveform morphology, and duration of the source. The search covers the range of $2$--$500$ s in duration and a frequency band of $24 - 2048$ Hz. We find no significant triggers within this parameter space; we report sensitivity limits on the signal strength of gravitational waves characterized by the root-sum-square amplitude $h_{\mathrm{rss}}$ as a function of waveform morphology. These $h_{\mathrm{rss}}$ limits improve upon the results from the second observing run by an average factor of 1.8.

preprint2021arXiv

All-sky search for short gravitational-wave bursts in the third Advanced LIGO and Advanced Virgo run

This paper presents the results of a search for generic short-duration gravitational-wave transients in data from the third observing run of Advanced LIGO and Advanced Virgo. Transients with durations of milliseconds to a few seconds in the 24--4096 Hz frequency band are targeted by the search, with no assumptions made regarding the incoming signal direction, polarization or morphology. Gravitational waves from compact binary coalescences that have been identified by other targeted analyses are detected, but no statistically significant evidence for other gravitational wave bursts is found. Sensitivities to a variety of signals are presented. These include updated upper limits on the source rate-density as a function of the characteristic frequency of the signal, which are roughly an order of magnitude better than previous upper limits. This search is sensitive to sources radiating as little as $\sim$10$^{-10} M_{\odot} c^2$ in gravitational waves at $\sim$70 Hz from a distance of 10 kpc, with 50% detection efficiency at a false alarm rate of one per century. The sensitivity of this search to two plausible astrophysical sources is estimated: neutron star f-modes, which may be excited by pulsar glitches, as well as selected core-collapse supernova models.

preprint2021arXiv

All-sky, all-frequency directional search for persistent gravitational-waves from Advanced LIGO's and Advanced Virgo's first three observing runs

We present the first results from an all-sky all-frequency (ASAF) search for an anisotropic stochastic gravitational-wave background using the data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. Upper limit maps on broadband anisotropies of a persistent stochastic background were published for all observing runs of the LIGO-Virgo detectors. However, a broadband analysis is likely to miss narrowband signals as the signal-to-noise ratio of a narrowband signal can be significantly reduced when combined with detector output from other frequencies. Data folding and the computationally efficient analysis pipeline, PyStoch, enable us to perform the radiometer map-making at every frequency bin. We perform the search at 3072 HEALPix equal area pixels uniformly tiling the sky and in every frequency bin of width $1/32$ Hz in the range $20$--$1726$ Hz, except for bins that are likely to contain instrumental artefacts and hence are notched. We do not find any statistically significant evidence for the existence of narrowband gravitational-wave signals in the analyzed frequency bins. Therefore, we place $95\%$ confidence upper limits on the gravitational-wave strain for each pixel-frequency pair; the limits are in the range $(0.030 - 9.6) \times10^{-24}$. In addition, we outline a method to identify candidate pixel-frequency pairs that could be followed up by a more sensitive (and potentially computationally expensive) search, e.g., a matched-filtering-based analysis, to look for fainter nearly monochromatic coherent signals. The ASAF analysis is inherently independent of models describing any spectral or spatial distribution of power. We demonstrate that the ASAF results can be appropriately combined over frequencies and sky directions to successfully recover the broadband directional and isotropic results.

preprint2021arXiv

Cliophysics: A scientific analysis of recurrent historical events

Named after Clio, the Greek goddess of history, cliophysics is a daughter (and in a sense an extension) of econophysics. Like econophysics it relies on the methodology of experimental physics. Its purpose is to conduct a scientific analysis of historical events. Such events can be of sociological, political or economic nature. In this last case cliophysics would coincide with econophysics. The main difference between cliophysics and econophysics is that the description of historical events may be qualitative as well as quantitative. For the handling of qualitative accounts cliophysics has developed an approach based on the identification of patterns. To detect a pattern the main challenge is to break the "noise barrier". The very existence of patterns is what makes cliophysics possible and ensures its success. Briefly stated, once a pattern is detected, it allows predictions to be made. As the capacity to make successful predictions is the hallmark of any science, it becomes easy to decide whether or not the claim made in the title of the paper is indeed fulfilled. A number of examples of clusters of similar events will be given which should convince readers that historical events can be simplified almost at will very much as in physics. One should not forget that physical effects are also subject to the environment. For instance, if tried at the equator, the experiment of the Foucault pendulum will fail. In the last part of the paper, we describe cliophysical investigations conducted over the past decades; they make us confident that cliophysics can be a valuable tool for decision makers.

preprint2021arXiv

How Powerful are Interest Diffusion on Purchasing Prediction: A Case Study of Taocode

A taocode is a kind of specially coded text-link on Taobao (the world's biggest online shopping website), through which users can share messages about products with each other. Analyzing taocodes can potentially facilitate understanding of the social relationships between users and, more excitingly, their online purchasing behaviors under the influence of taocode diffusion. This paper innovatively investigates the problem of online purchasing predictions from an information diffusion perspective, with taocode as a case study. Specifically, we conduct profound observational studies on a large-scale real-world dataset from Taobao, containing over 100M Taocode sharing records. Inspired by our observations, we propose InfNet, a dynamic GNN-based framework that models the information diffusion across Taocode. We then apply InfNet to item purchasing predictions. Extensive experiments on real-world datasets validate the effectiveness of InfNet compared with 8 state-of-the-art baselines.

preprint2021arXiv

Integrating Pre-trained Model into Rule-based Dialogue Management

Rule-based dialogue management is still the most popular solution for industrial task-oriented dialogue systems because of its interpretability. However, it is hard for developers to maintain the dialogue logic when the scenarios get more and more complex. On the other hand, data-driven dialogue systems, usually with end-to-end structures, are popular in academic research and easier to deal with complex conversations, but such methods require plenty of training data and the behaviors are less interpretable. In this paper, we propose a method that leverages the strengths of both rule-based and data-driven dialogue managers (DM). We first introduce the DM of Carina Dialog System (CDS, an advanced industrial dialogue system built by Microsoft). Then we propose the "model-trigger" design to make the DM trainable and thus scalable to scenario changes. Furthermore, we integrate pre-trained models and empower the DM with few-shot capability. The experimental results demonstrate the effectiveness and strong few-shot capability of our method.

preprint2021arXiv

Optimization of User Selection and Bandwidth Allocation for Federated Learning in VLC/RF Systems

Limited radio frequency (RF) resources restrict the number of users that can participate in federated learning (FL) thus affecting FL convergence speed and performance. In this paper, we first introduce visible light communication (VLC) as a supplement to RF in FL and build a hybrid VLC/RF communication system, in which each indoor user can use both VLC and RF to transmit its FL model parameters. Then, the problem of user selection and bandwidth allocation is studied for FL implemented over a hybrid VLC/RF system aiming to optimize the FL performance. The problem is first separated into two subproblems. The first subproblem is a user selection problem with a given bandwidth allocation, which is solved by a traversal algorithm. The second subproblem is a bandwidth allocation problem with a given user selection, which is solved by a numerical method. The final user selection and bandwidth allocation are obtained by iteratively solving these two subproblems. Simulation results show that the proposed FL algorithm that efficiently uses VLC and RF for FL model transmission can improve the prediction accuracy by up to 10% compared with a conventional FL system using only RF.

preprint2021arXiv

Performance of Superconducting Quantum Computing Chips under Different Architecture Design

Existing and near-term quantum computers can only perform two-qubit gates between physically connected qubits. Research has been done on compilers to rewrite quantum programs to match hardware constraints. However, the quantum processor architecture, in particular the qubit connectivity and topology, still lacks enough discussion, while it potentially has a huge impact on the performance of the quantum algorithms. We perform a quantitative and comprehensive study on the quantum processor performance under different qubit connectivity and topology. We select ten representative design models with different connectivities and topologies from quantum architecture design space and benchmark their performance by running a set of standard quantum algorithms. It is shown that a high-performance architecture almost always comes with a design with a large connectivity, while the topology shows a weak influence on the performance in our experiment. Different quantum algorithms show different dependence on quantum chip connectivity and topologies. This work provides quantum computing researchers with a systematic approach to evaluating their processor design.

preprint2021arXiv

Progressive Neural Image Compression with Nested Quantization and Latent Ordering

We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream. In contrast to existing learned variable bitrate solutions which produce separate bitstreams for each quality, it enables easier rate-control and requires less storage. Leveraging the latent scaling based variable bitrate solution, we introduce nested quantization, a method that defines multiple quantization levels with nested quantization grids, and progressively refines all latents from the coarsest to the finest quantization level. To achieve finer progressiveness in between any two quantization levels, latent elements are incrementally refined with an importance ordering defined in the rate-distortion sense. To the best of our knowledge, PLONQ is the first learning-based progressive image coding scheme and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
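
The idea of nested quantization grids with coarse-to-fine refinement can be illustrated with a toy scalar example. This is only a sketch under stated assumptions and is not PLONQ's actual scheme: the step ratio of 3, the function names, and the sample values are all hypothetical, and no learned latents, entropy coding, or importance ordering are modeled. Each finer grid contains the coarser one, so a decoder can stop after any prefix of the refinement symbols and still obtain a valid reconstruction.

```python
def progressive_quantize(x, coarse_step, levels, ratio=3):
    """Quantize x on nested uniform grids, coarsest first.

    Grid l has step coarse_step / ratio**l, so every coarse grid point
    also lies on all finer grids. Returns the coarse index, the list of
    per-level refinement symbols, and the reconstruction at each level.
    """
    step = coarse_step
    base = round(x / step)
    q = base * step
    recon = [q]
    refinements = []
    half = ratio // 2  # refinement symbols lie in [-half, half]
    for _ in range(levels - 1):
        step /= ratio
        r = round((x - q) / step)
        r = max(-half, min(half, r))
        refinements.append(r)
        q += r * step
        recon.append(q)
    return base, refinements, recon

def decode(base, refinements, coarse_step, ratio=3):
    """Reconstruct from the coarse index and any prefix of the refinements."""
    step = coarse_step
    q = base * step
    for r in refinements:
        step /= ratio
        q += r * step
    return q

# Toy usage: steps 9, 3, 1 are exactly representable, which avoids rounding noise.
base, refs, recon = progressive_quantize(13.4, coarse_step=9.0, levels=3)
coarse_only = decode(base, [], 9.0)       # truncated stream: coarsest level
one_refinement = decode(base, refs[:1], 9.0)
full = decode(base, refs, 9.0)
```

Truncating the list of refinement symbols plays the role of truncating a single bitstream: the reconstruction error shrinks as more symbols are decoded.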

preprint2021arXiv

Search for Gravitational Waves Associated with Gamma-Ray Bursts Detected by Fermi and Swift During the LIGO-Virgo Run O3b

We search for gravitational-wave signals associated with gamma-ray bursts detected by the Fermi and Swift satellites during the second half of the third observing run of Advanced LIGO and Advanced Virgo (1 November 2019 15:00 UTC to 27 March 2020 17:00 UTC). We conduct two independent searches: a generic gravitational-wave transients search to analyze 86 gamma-ray bursts and an analysis to target binary mergers with at least one neutron star as short gamma-ray burst progenitors for 17 events. We find no significant evidence for gravitational-wave signals associated with any of these gamma-ray bursts. A weighted binomial test of the combined results finds no evidence for sub-threshold gravitational wave signals associated with this GRB ensemble either. We use several source types and signal morphologies during the searches, resulting in lower bounds on the estimated distance to each gamma-ray burst. Finally, we constrain the population of low luminosity short gamma-ray bursts using results from the first to the third observing runs of Advanced LIGO and Advanced Virgo. The resulting population is in accordance with the local binary neutron star merger rate.

preprint2021arXiv

Search for subsolar-mass binaries in the first half of Advanced LIGO and Virgo's third observing run

We report on a search for compact binary coalescences where at least one binary component has a mass between 0.2 $M_\odot$ and 1.0 $M_\odot$ in Advanced LIGO and Advanced Virgo data collected between 1 April 2019 1500 UTC and 1 October 2019 1500 UTC. We extend previous analyses in two main ways: we include data from the Virgo detector and we allow for more unequal mass systems, with mass ratio $q \geq 0.1$. We do not report any gravitational-wave candidates. The most significant trigger has a false alarm rate of 0.14 $\mathrm{yr}^{-1}$. This implies an upper limit on the merger rate of subsolar binaries in the range $[220-24200] \mathrm{Gpc}^{-3} \mathrm{yr}^{-1}$, depending on the chirp mass of the binary. We use this upper limit to derive astrophysical constraints on two phenomenological models that could produce subsolar-mass compact objects. One is an isotropic distribution of equal-mass primordial black holes. Using this model, we find that the fraction of dark matter in primordial black holes is $f_\mathrm{PBH} \equiv Ω_\mathrm{PBH} / Ω_\mathrm{DM} \lesssim 6\%$. The other is a dissipative dark matter model, in which fermionic dark matter can collapse and form black holes. The upper limit on the fraction of dark matter black holes depends on the minimum mass of the black holes that can be formed: the most constraining result is obtained at $M_\mathrm{min}=1 M_\odot$, where $f_\mathrm{DBH} \equiv Ω_\mathrm{PBH} / Ω_\mathrm{DM} \lesssim 0.003\%$. These are the tightest limits on spinning subsolar-mass binaries to date.

preprint2021arXiv

The hybrid dimensional representation of permeability tensor: a reinterpretation of the discrete fracture model and its extension on nonconforming meshes

The discrete fracture model (DFM) has been widely used in the simulation of fluid flow in fractured porous media. Traditional DFM uses the so-called hybrid-dimensional approach to treat fractures explicitly as low-dimensional entries (e.g. line entries in 2D media and face entries in 3D media) on the interfaces of matrix cells and then couple the matrix and fracture flow systems together based on the principle of superposition with the fracture thickness used as the dimensional homogeneity factor. Because of this methodology, DFM is considered to be limited on conforming meshes and thus may raise difficulties in generating high quality unstructured meshes due to the complexity of fracture's geometrical morphology. In this paper, we clarify that the DFM actually can be extended to non-conforming meshes without any essential changes. To show it clearly, we provide another perspective for DFM based on hybrid-dimensional representation of permeability tensor to describe fractures as one-dimensional line Dirac delta functions contained in permeability tensor. A finite element DFM scheme for single-phase flow on non-conforming meshes is then derived by applying Galerkin finite element method to it. Analytical analysis and numerical experiments show that our DFM automatically degenerates to the classical finite element DFM when the mesh is conforming with fractures. Moreover, the accuracy and efficiency of the model on non-conforming meshes are demonstrated by testing several benchmark problems. This model is also applicable to curved fracture with variable thickness.

preprint2021arXiv

The Hybrid-dimensional Darcy's Law: A Reinterpreted Discrete Fracture Model for Fracture and Barrier Networks on Non-conforming Meshes

In this paper, we extend the reinterpreted discrete fracture model for flow simulation of fractured porous media containing flow blocking barriers on non-conforming meshes. The methodology of the approach is to modify the traditional Darcy's law into the hybrid-dimensional Darcy's law where fractures and barriers are represented as Dirac-delta functions contained in the permeability tensor and resistance tensor, respectively. As a natural extension of the reinterpreted discrete fracture model for highly conductive fractures, this model is able to account for the influence of both highly conductive fractures and blocking barriers accurately on non-conforming meshes. The local discontinuous Galerkin (LDG) method is employed to accommodate the form of the hybrid-dimensional Darcy's law and the nature of the pressure/flux discontinuity. The performance of the model is demonstrated by several numerical tests.

preprint2021arXiv

Towards Fast, Accurate and Stable 3D Dense Face Alignment

Existing methods of 3D dense face alignment mainly concentrate on accuracy, thus limiting the scope of their practical applications. In this paper, we propose a novel regression framework named 3DDFA-V2 which makes a balance among speed, accuracy and stability. Firstly, on the basis of a lightweight backbone, we propose a meta-joint optimization strategy to dynamically regress a small set of 3DMM parameters, which greatly enhances speed and accuracy simultaneously. To further improve the stability on videos, we present a virtual synthesis method to transform one still image to a short-video which incorporates in-plane and out-of-plane face moving. On the premise of high accuracy and stability, 3DDFA-V2 runs at over 50fps on a single CPU core and outperforms other state-of-the-art heavy models simultaneously. Experiments on several challenging datasets validate the efficiency of our method. Pre-trained models and code are available at https://github.com/cleardusk/3DDFA_V2.

preprint2021arXiv

Towards Unbiased COVID-19 Lesion Localisation and Segmentation via Weakly Supervised Learning

Despite tremendous efforts, it is very challenging to generate a robust model to assist in the accurate quantification assessment of COVID-19 on chest CT images. Due to the nature of blurred boundaries, the supervised segmentation methods usually suffer from annotation biases. To support unbiased lesion localisation and to minimise the labeling costs, we propose a data-driven framework supervised by only image-level labels. The framework can explicitly separate potential lesions from original images, with the help of a generative adversarial network and a lesion-specific decoder. Experiments on two COVID-19 datasets demonstrate the effectiveness of the proposed framework and its superior performance to several existing methods.

preprint2021arXiv

Unraveling the Dynamic Importance of County-level Features in Trajectory of COVID-19

The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2,787 counties in the United States using a data-driven machine learning model. We trained random forest models using 23 features representing six key influencing factors affecting pandemic spread: social demographics of counties, population activities, mobility within the counties, movement across counties, disease attributes, and social network structure. Also, we categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aims to answer two research questions: (1) The extent to which the importance of heterogeneous features evolves in different stages; (2) The extent to which the importance of heterogeneous features varies across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) Social demographic features, such as gross domestic product, population density, and minority status, remained high-importance features throughout the stages of COVID-19 across the 2,787 studied counties; (2) Within-county mobility features had the highest importance in county clusters with higher population densities; (3) The feature reflecting the social network structure (the Facebook social connectedness index) had higher importance in the models for counties with higher population densities. The results show that the data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and in different stages of a pandemic life cycle.

preprint2021arXiv

What social media told about us in the time of COVID-19: a scoping review

With the onset of COVID-19 pandemic, social media has rapidly become a crucial communication tool for information generation, dissemination, and consumption. In this scoping review, we selected and examined peer-reviewed empirical studies relating to COVID-19 and social media during the first outbreak starting in November 2019 until May 2020. From an analysis of 81 studies, we identified five overarching public health themes concerning the role of online social platforms and COVID-19. These themes focused on: (i) surveying public attitudes, (ii) identifying infodemics, (iii) assessing mental health, (iv) detecting or predicting COVID-19 cases, (v) analyzing government responses to the pandemic, and (vi) evaluating quality of health information in prevention education videos. Furthermore, our review highlights the paucity of studies on the application of machine learning on social media data related to COVID-19 and a lack of studies documenting real-time surveillance developed with social media data on COVID-19. For COVID-19, social media can play a crucial role in disseminating health information as well as tackling infodemics and misinformation.

preprint2020arXiv

A Big Data Enabled Channel Model for 5G Wireless Communication Systems

The standardization process of the fifth generation (5G) wireless communications has recently been accelerated and the first commercial 5G services would be provided as early as 2018. The enormous increase in smartphones, new complex scenarios, large frequency bands, massive antenna elements, and dense small cells will generate big datasets and bring 5G communications to the era of big data. This paper investigates various applications of big data analytics, especially machine learning algorithms in wireless communications and channel modeling. We propose a big data and machine learning enabled wireless channel model framework. The proposed channel model is based on artificial neural networks (ANNs), including feed-forward neural network (FNN) and radial basis function neural network (RBF-NN). The input parameters are transmitter (Tx) and receiver (Rx) coordinates, Tx-Rx distance, and carrier frequency, while the output parameters are channel statistical properties, including the received power, root mean square (RMS) delay spread (DS), and RMS angle spreads (ASs). Datasets used to train and test the ANNs are collected from both real channel measurements and a geometry based stochastic model (GBSM). Simulation results show good performance and indicate that machine learning algorithms can be powerful analytical tools for future measurement-based wireless channel modeling.

preprint2020arXiv

A Coalition-Based Communication Framework for Task-Driven Flying Ad-Hoc Networks

In this paper, we develop a task-driven networking framework for Flying Ad-hoc Networks (FANETs), where a coalition-based model is outlined. Firstly, we present a brief survey to show the state-of-the-art studies on the intra-communication of unmanned aerial vehicle (UAV) swarms. The features and deficiencies of existing models are analyzed. To capture the task-driven requirement of the flying multi-agent system, a coalition-based framework is proposed. We discuss the composition, networking mode and the classification of data transmission. After that, the application scenario of UAV coalitions is given, where large-scale, distributed and highly dynamic characteristics greatly increase the difficulty of resource optimization for UAVs. To tackle the problem, we design an intelligence-based optimization architecture, which mainly includes the game model, machine learning and real-time decision. Under the guidance of game theories and machine learning, UAVs can make comprehensive decisions by combining the previous training results with their sensing, information interaction, and game strategies. Finally, a preliminary case and promising open issues of UAV coalitions are studied.

preprint2020arXiv

A Deep Learning Approach to Grasping the Invisible

We study an emerging problem named "grasping the invisible" in robotic manipulation, in which a robot is tasked to grasp an initially invisible target object via a sequence of pushing and grasping actions. In this problem, pushes are needed to search for the target and rearrange cluttered objects around it to enable effective grasps. We propose to solve the problem by formulating a deep learning approach in a critic-policy format. The target-oriented motion critic, which maps both visual observations and target information to the expected future rewards of pushing and grasping motion primitives, is learned via deep Q-learning. We divide the problem into two subtasks, and two policies are proposed to tackle each of them, by combining the critic predictions and relevant domain knowledge. A Bayesian-based policy accounting for past action experience performs pushing to search for the target; once the target is found, a classifier-based policy coordinates target-oriented pushing and grasping to grasp the target in clutter. The motion critic and the classifier are trained in a self-supervised manner through robot-environment interactions. Our system achieves a 93% and 87% task success rate on each of the two subtasks in simulation and an 85% task success rate in real robot experiments on the whole problem, which outperforms several baselines by large margins. Supplementary material is available at https://sites.google.com/umn.edu/grasping-invisible.

preprint2020arXiv

A Flexible Connector for Soft Modular Robots Based on Micropatterned Intersurface Jamming

Soft modular robots enable more flexibility and safer interaction with the changing environment than traditional robots. However, it has remained challenging to create deformable connectors that can be integrated into soft machines. In this work, we propose a flexible connector for soft modular robots based on micropatterned intersurface jamming. The connector is composed of micropatterned dry adhesives made of silicone rubber and a flexible main body with inflatable chambers for active engagement and disengagement. Through connection force tests, we evaluate the characteristics of the connector both in the linear direction and under rotational disruptions. The connector can stably support an average maximum load of 22 N (83 times the connector's body weight) linearly and 10.86 N under planar rotation. The proposed connector demonstrates the potential to create a robust connection between soft modular robots without raising the system's overall stiffness, thus guaranteeing high flexibility of the robotic system.

preprint2020arXiv

A Generalized Dimming Control Scheme for Visible Light Communications

A novel dimming control scheme, termed as generalized dimming control (GDC), is proposed for visible light communication (VLC) systems. The proposed GDC scheme achieves dimming control by simultaneously adjusting the intensity of transmitted symbols and the number of active elements in a space-time matrix. Both the indices of the active elements in each space-time matrix and the modulated constellation symbols are used to carry information. Since illumination is deemed as the prior task of VLC, an incremental algorithm for index mapping is proposed for achieving target optical power and uniform illumination. Next, GDC having the optimal activation pattern is investigated to further improve the bit-error rate (BER) performance. In particular, the BER performance of GDC is analyzed using the union bound technique. Based on the analytical BER bound, the optimal activation pattern of GDC scheme with the minimum BER criterion (GDC-MBER) is obtained by exhaustively searching all conditional pairwise error probabilities. However, since GDC-MBER requires high search complexity, two low-complexity GDC schemes having the maximum free distance criterion (GDC-MFD) are proposed. The first GDC-MFD scheme, coined as GDC-MFD1, reduces the computational complexity by deriving a lower bound of the free distance based on Rayleigh-Ritz theorem. Based on the time-invariance characteristics of the VLC channel, GDC-MFD2 is proposed to further reduce the required computation efforts. Simulation and numerical results show that GDC-MBER, GDC-MFD1 and GDC-MFD2 have similar BER performance, and they can achieve 2 dB performance gains over conventional hybrid dimming control scheme and 7 dB performance gains over digital dimming control schemes.

preprint2020arXiv

A High Coverage Camera Assisted Received Signal Strength Ratio Algorithm for Indoor Visible Light Positioning

In this paper, a high coverage algorithm termed enhanced camera assisted received signal strength ratio (eCA-RSSR) positioning algorithm is proposed for visible light positioning (VLP) systems. The basic idea of eCA-RSSR is to utilize visual information captured by the camera to estimate the incidence angles of visible lights first. Based on the incidence angles, eCA-RSSR utilizes the received signal strength ratio (RSSR) calculated by the photodiode (PD) to estimate the ratios of the distances between the LEDs and the receiver. Based on a Euclidean plane geometry theorem, eCA-RSSR transforms the ratios of the distances into the absolute values. In this way, eCA-RSSR only requires 3 LEDs for both orientation-free 2D and 3D positioning, implying that eCA-RSSR can achieve high coverage. Based on the absolute values of the distances, the linear least square method is employed to estimate the position of the receiver. Therefore, for the receiver having a small distance between the PD and the camera, the accuracy of eCA-RSSR does not depend on the starting values of the non-linear least square method and the complexity of eCA-RSSR is low. Furthermore, since the distance between the PD and camera can significantly affect the performance of eCA-RSSR, we further propose a compensation algorithm for eCA-RSSR based on the single-view geometry. Simulation results show that eCA-RSSR can achieve centimeter-level accuracy over 80% indoor area for both the receivers having a small and a large distance between the PD and the camera.

preprint2020arXiv

A Non-Iterative Reconstruction Algorithm for the Acoustic Inverse Boundary Value Problem

We present a non-iterative algorithm to reconstruct the isotropic acoustic wave speed from the measurement of the Neumann-to-Dirichlet map. The algorithm is designed based on the boundary control method and involves only computations that are stable. We prove the convergence of the algorithm and present its numerical implementation. The effectiveness of the algorithm is validated on both constant speed and variable speed, with full and partial boundary measurement as well as different levels of noise.

preprint2020arXiv

An Eigenspace Divide-and-Conquer Approach for Large-Scale Optimization

Divide-and-conquer-based (DC-based) evolutionary algorithms (EAs) have achieved notable success in dealing with large-scale optimization problems (LSOPs). However, the appealing performance of this type of algorithms generally requires a high-precision decomposition of the optimization problem, which is still a challenging task for existing decomposition methods. This study attempts to address the above issue from a different perspective and proposes an eigenspace divide-and-conquer (EDC) approach. Different from existing DC-based algorithms that perform decomposition and optimization in the original decision space, EDC first establishes an eigenspace by conducting singular value decomposition on a set of high-quality solutions selected from recent generations. Then it transforms the optimization problem into the eigenspace, and thus significantly weakens the dependencies among the corresponding eigenvariables. Accordingly, these eigenvariables can be efficiently grouped by a simple random strategy and each of the resulting subproblems can be addressed more easily by a traditional EA. To verify the efficiency of EDC, comprehensive experimental studies were conducted on two sets of benchmark functions. Experimental results indicate that EDC is robust to its parameters and has good scalability to the problem dimension. The comparison with several state-of-the-art algorithms further confirms that EDC is pretty competitive and performs better on complicated LSOPs.

preprint2020arXiv

Boundary Content Graph Neural Network for Temporal Action Proposal Generation

Temporal action proposal generation plays an important role in video action understanding, which requires localizing high-quality action content precisely. However, generating temporal proposals with both precise boundaries and high-quality action content is extremely challenging. To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) to model the insightful relations between the boundary and action content of temporal proposals by the graph neural networks. In BC-GNN, the boundaries and content of temporal proposals are taken as the nodes and edges of the graph neural network, respectively, where they are spontaneously linked. Then a novel graph computation operation is proposed to update features of edges and nodes. After that, one updated edge and two nodes it connects are used to predict boundary probabilities and content confidence score, which will be combined to generate a final high-quality proposal. Experiments are conducted on two mainstream datasets: ActivityNet-1.3 and THUMOS14. Without the bells and whistles, BC-GNN outperforms previous state-of-the-art methods in both temporal action proposal and temporal action detection tasks.

preprint2020arXiv

Collaborative Learning for Extremely Low Bit Asymmetric Hashing

Hashing techniques are in great demand for a wide range of real-world applications such as image retrieval and network compression. Nevertheless, existing approaches can hardly guarantee satisfactory performance with extremely low-bit (e.g., 4-bit) hash codes due to the severe information loss and the shrinkage of the discrete solution space. In this paper, we propose a novel Collaborative Learning strategy that is tailored for generating high-quality low-bit hash codes. The core idea is to jointly distill bit-specific and informative representations for a group of pre-defined code lengths. The learning of short hash codes among the group can benefit from the manifold shared with other long codes, where multiple views from different hash codes provide the supplementary guidance and regularization, making the convergence faster and more stable. To achieve that, an asymmetric hashing framework with two variants of multi-head embedding structures is derived, termed Multi-head Asymmetric Hashing (MAH), leading to great efficiency of training and querying. Extensive experiments on three benchmark datasets have been conducted to verify the superiority of the proposed MAH, and have shown that the 8-bit hash codes generated by MAH achieve $94.3\%$ of the Mean Average Precision (MAP) score on the CIFAR-10 dataset, which significantly surpasses the performance of the 48-bit codes produced by state-of-the-art methods in image retrieval tasks.

preprint2020arXiv

Collisional-radiative modeling of the $5p-5s$ spectrum of W XIV - W XVI ions

The wavelengths and rates of the $5p-5s$ transitions of W XIV - W XVI ions have been calculated by the relativistic configuration interaction (RCI) method as implemented in the Flexible Atomic Code (FAC). A reasonable collisional-radiative model (CRM) has been constructed to simulate the $5p-5s$ transition spectrum of W XIV - W XVI ions, which had been observed in an electron beam ion trap (EBIT) device. The results are in reasonable agreement with the available experimental and theoretical data, and might be applied to identify the controversial spectra. The confusion over the assignment of the ionization stage is resolved in the present work.

preprint2020arXiv

Convergence analysis of an inexact inertial Krasnoselskii-Mann algorithm with applications

The classical Krasnoselskii-Mann iteration is broadly used for approximating fixed points of nonexpansive operators. To accelerate the convergence of the Krasnoselskii-Mann iteration, inertial methods have received much attention in recent years. In this paper, we propose an inexact inertial Krasnoselskii-Mann algorithm. In comparison with the original inertial Krasnoselskii-Mann algorithm, our algorithm allows errors in updating the iterative sequence, which makes it more flexible and useful in practice. We establish weak convergence results for the proposed algorithm under different conditions on parameters and error terms. Furthermore, we provide a nonasymptotic convergence rate for the proposed algorithm. As applications, we propose and study an inexact inertial proximal point algorithm and an inexact inertial forward-backward splitting algorithm for solving monotone inclusion problems and the corresponding convex minimization problems.
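To make the iteration described in this abstract concrete, here is a minimal sketch of one common textbook form of an inertial Krasnoselskii-Mann step with an additive error term. The exact update rule, the parameter conditions, and the error model used in the paper are not given in the abstract, so the function name `inexact_inertial_km`, the constants `alpha` and `lam`, and the error sequence below are all illustrative assumptions.

```python
import math


def inexact_inertial_km(T, x0, alpha=0.3, lam=0.5, iters=200):
    """Inertial Krasnoselskii-Mann iteration with a summable error (illustrative)."""
    x_prev, x = x0, x0
    for k in range(iters):
        # Inertial extrapolation built from the last two iterates.
        y = x + alpha * (x - x_prev)
        # Hypothetical perturbation: absolutely summable, so it dies out quickly.
        e = (-1) ** k * 1e-2 / (k + 1) ** 2
        # Relaxed (averaged) step using the inexactly evaluated operator.
        x_prev, x = x, (1 - lam) * y + lam * (T(y) + e)
    return x


# cos is nonexpansive on the real line because |sin| <= 1.
fixed_point = inexact_inertial_km(math.cos, x0=0.0)
```

In this toy run the iterate settles near the fixed point of `cos` even though every operator evaluation is perturbed, which is the kind of behavior the abstract's convergence results are about.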

preprint2020arXiv

Cross-modal supervised learning for better acoustic representations

Obtaining large-scale human-labeled datasets to train acoustic representation models is a very challenging task. In contrast, we can easily collect data with machine-generated labels. In this work, we propose to exploit machine-generated labels to learn better acoustic representations, based on the synchronization between vision and audio. Firstly, we collect a large-scale video dataset with 15 million samples, totaling 16,320 hours. Each video is 3 to 5 seconds in length and annotated automatically by publicly available visual and audio classification models. Secondly, we train various classical convolutional neural networks (CNNs) including VGGish, ResNet 50 and Mobilenet v2. We also make several improvements to VGGish and achieve better results. Finally, we transfer our models to three external standard benchmarks for the audio classification task, and achieve a significant performance boost over the state-of-the-art results. Models and code are available at: https://github.com/Deeperjia/vgg-like-audio-models.

preprint2020arXiv

Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs

The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state-of-the-art in both accuracy and speed and effectively emulates human vision systems.

preprint2020arXiv

Denoising-based Turbo Message Passing for Compressed Video Background Subtraction

In this paper, we consider the compressed video background subtraction problem that separates the background and foreground of a video from its compressed measurements. The background of a video usually lies in a low dimensional space and the foreground is usually sparse. More importantly, each video frame is a natural image that has textural patterns. By exploiting these properties, we develop a message passing algorithm termed offline denoising-based turbo message passing (DTMP). We show that these structural properties can be efficiently handled by the existing denoising techniques under the turbo message passing framework. We further extend the DTMP algorithm to the online scenario where the video data is collected in an online manner. The extension is based on the similarity/continuity between adjacent video frames. We adopt the optical flow method to refine the estimation of the foreground. We also adopt sliding-window-based background estimation to reduce complexity. By exploiting the Gaussianity of messages, we develop the state evolution to characterize the per-iteration performance of offline and online DTMP. Compared with the existing algorithms, DTMP can work at much lower compression rates, and can subtract the background successfully with a lower mean squared error and better visual quality for both offline and online compressed video background subtraction.

preprint2020arXiv

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method on a single machine. Moreover, big data are often collected and stored in a distributed manner on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing including 1) the accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) the statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieve superior or competitive performance compared to other distributed methods.

preprint2020arXiv

Early Indicators of COVID-19 Spread Risk Using Digital Trace Data of Population Activities

The spread of pandemics such as COVID-19 is strongly linked to human activities. The objective of this paper is to specify and examine early indicators of disease spread risk in cities during the initial stages of outbreak based on patterns of human activities obtained from digital trace data. In this study, the Venables distance (D_v) and the activity density (D_a) are used to quantify and evaluate human activities for 193 US counties whose cumulative number of confirmed cases was greater than 100 as of March 31, 2020. Venables distance provides a measure of the agglomeration of human activities based on the average distance of human activities across a city or a county (less distance could lead to a greater contact risk). Activity density provides a measure of the overall activity level in a county or a city (more activity could lead to a greater risk). Accordingly, Pearson correlation analysis is used to examine the relationship between the two human activity indicators and the basic reproduction number in the following weeks. The results show statistically significant correlations between the indicators of human activities and the basic reproduction number in all counties, as well as a significant leader-follower relationship (time lag) between them. The results also show one to two weeks' lag between the change in activity indicators and the decrease in the basic reproduction number. This result implies that the human activity indicators provide effective early indicators for the spread risk of the pandemic during the early stages of the outbreak. Hence, the results could be used by the authorities to proactively assess the risk of disease spread by monitoring the daily Venables distance and activity density.
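The lagged Pearson analysis mentioned in this abstract can be illustrated with a small sketch. The series below are made up for illustration and are not data from the paper; the helper names `pearson` and `lagged_correlations` are hypothetical, and the outcome series is deliberately constructed to trail the indicator by two time steps.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlations(indicator, outcome, max_lag):
    """Correlate indicator[t] with outcome[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` indicator values and the first `lag` outcomes.
        xs = indicator[: len(indicator) - lag]
        ys = outcome[lag:]
        result[lag] = pearson(xs, ys)
    return result


# Toy activity indicator (arbitrary units, arbitrary time steps).
activity = [9.0, 8.5, 8.0, 6.0, 4.0, 3.0, 2.5, 2.0, 2.0, 1.8]
# Toy outcome that follows the indicator two steps later.
reproduction = [2.4, 2.4] + [0.2 * a + 0.5 for a in activity[:-2]]

corr = lagged_correlations(activity, reproduction, max_lag=3)
best_lag = max(corr, key=lambda lag: abs(corr[lag]))
```

Scanning over lags and keeping the one with the strongest correlation is one simple way to estimate a leader-follower delay; the abstract does not say whether the paper selects the lag this way.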

preprint2020arXiv

Effects of Population Co-location Reduction on Cross-county Transmission Risk of COVID-19 in the United States

The rapid spread of COVID-19 in the United States has posed a major threat to public health, the real economy, and human well-being. In the absence of effective vaccines, the preventive actions of social distancing and travel reduction are recognized as essential non-pharmacologic approaches to control the spread of COVID-19. Prior studies demonstrated that human movement and mobility drove the spatiotemporal distribution of COVID-19 in China. Little is known, however, about the patterns and effects of co-location reduction on cross-county transmission risk of COVID-19. This study utilizes Facebook co-location data for all counties in the United States from March to early May 2020. The analysis examines the synchronicity and time lag between travel reduction and pandemic growth trajectory to evaluate the efficacy of social distancing in reducing the population co-location probabilities, and subsequently the growth in weekly new cases. The results show that the mitigation effects of co-location reduction appear in the growth of weekly new cases with one week of delay. Furthermore, significant segregation is found among different county groups, which are categorized based on numbers of cases. The results suggest that within-group co-location probabilities remain stable, and social distancing policies primarily resulted in reduced cross-group co-location probabilities (due to travel reduction from counties with large numbers of cases to counties with low numbers of cases). These findings could have important practical implications for local governments to inform their intervention measures for monitoring and reducing the spread of COVID-19, as well as for adoption in future pandemics. Public policy, economic forecasting, and epidemic modeling need to account for population co-location patterns in evaluating transmission risk of COVID-19 across counties.

preprint2020arXiv

Energy-Efficient Buffer-Aided Relaying Systems with Opportunistic Spectrum Access

In this paper, an energy-efficient cross-layer design framework is proposed for cooperative relaying networks, which takes into account the influence of spectrum utilization probability. Specifically, random arrival traffic is considered and an adaptive modulation and coding (AMC) scheme is adopted in the cooperative transmission system to improve the system performance. The average packet dropping rate of the relay buffer is studied first. With the packet dropping rate and stationary distribution of the system state, the closed-form expression of the delay is derived. Then the energy efficiency for relay-assisted transmission is investigated, which takes into account the queueing process of the relay and the source. In this context, an energy efficiency optimization problem is formulated to determine the optimum strategy of power and time allocation for the relay-assisted cooperative system. Finally, the energy-efficient switching strategy between the relay-assisted transmission and the direct transmission is obtained, where packet transmissions have different delay requirements. In addition, an energy-efficient transmission policy with AMC is obtained. Numerical results demonstrate the effectiveness of the proposed design in improving the energy efficiency.

preprint2020arXiv

Feedback Recurrent AutoEncoder

In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.

preprint2020arXiv

Feedback Recurrent Autoencoder for Video Compression

Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, we propose a new network architecture, based on common and well studied components, for learned video compression operating in low latency mode. Our method yields state of the art MS-SSIM/rate performance on the high-resolution UVG dataset, among both learned video compression approaches and classical video compression methods (H.265 and H.264) in the rate range of interest for streaming applications. Additionally, we provide an analysis of existing approaches through the lens of their underlying probabilistic graphical models. Finally, we point out issues with temporal consistency and color shift observed in empirical evaluation, and suggest directions forward to alleviate those.

preprint2020arXiv

Free-breathing and ungated cardiac cine using navigator-less spiral SToRM

We introduce a kernel low-rank algorithm to recover free-breathing and ungated dynamic MRI from spiral acquisitions without explicit k-space navigators. It is often challenging for low-rank methods to recover free-breathing and ungated images from undersampled measurements; extensive cardiac and respiratory motion often results in the Casorati matrix not being sufficiently low-rank. Therefore, we exploit the non-linear structure of the dynamic data, which gives a low-rank kernel matrix. Unlike prior work that relies on navigators to estimate the manifold structure, we propose a kernel low-rank matrix completion method to directly fill in the missing k-space data from variable density spiral acquisitions. We validate the proposed scheme using simulated data and in-vivo data. Our results show that the proposed scheme provides improved reconstructions compared to classical methods such as low-rank and XD-GRASP. The comparison with breath-held cine data shows that the quantitative metrics agree, whereas the image quality is marginally lower.

preprint2020arXiv

GIKT: A Graph-based Interaction Model for Knowledge Tracing

With the rapid development of online education, knowledge tracing (KT) has become a fundamental problem which traces students' knowledge status and predicts their performance on new questions. Questions are often numerous in online education systems, and are always associated with far fewer skills. However, the previous literature fails to involve question information together with high-order question-skill correlations, which is mostly limited by data sparsity and multi-skill problems. From the model perspective, previous models can hardly capture the long-term dependency of student exercise history, and cannot model the interactions between students and questions, and between students and skills, in a consistent way. In this paper, we propose a Graph-based Interaction model for Knowledge Tracing (GIKT) to tackle the above problems. More specifically, GIKT utilizes a graph convolutional network (GCN) to substantially incorporate question-skill correlations via embedding propagation. Besides, considering that relevant questions are usually scattered throughout the exercise history, and that question and skill are just different instantiations of knowledge, GIKT generalizes the degree of students' mastery of the question to the interactions between the student's current state, the student's history related exercises, the target question, and related skills. Experiments on three datasets demonstrate that GIKT achieves the new state-of-the-art performance, with at least 1% absolute AUC improvement.

preprint2020arXiv

Green Offloading in Fog-Assisted IoT Systems: An Online Perspective Integrating Learning and Control

In fog-assisted IoT systems, it is a common practice to offload tasks from IoT devices to their nearby fog nodes to reduce task processing latencies and energy consumption. However, the design of an online energy-efficient scheme is still an open problem because of various uncertainties in system dynamics such as processing capacities and transmission rates. Moreover, the decision-making process is constrained by resource limits on fog nodes and IoT devices, making the design even more complicated. In this paper, we formulate such a task offloading problem with unknown system dynamics as a combinatorial multi-armed bandit (CMAB) problem with long-term constraints on time-averaged energy consumption. Through an effective integration of online learning and online control, we propose a Learning-Aided Green Offloading (LAGO) scheme. In LAGO, we employ bandit learning methods to handle the exploitation-exploration tradeoff and utilize virtual queue techniques to deal with the long-term constraints. Our theoretical analysis shows that LAGO can reduce the average task latency with a tunable sublinear regret bound over a finite time horizon and satisfy the long-term time-averaged energy constraints. We conduct extensive simulations to verify such theoretical results.
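The virtual queue technique named in this abstract can be sketched in a few lines. This is a generic Lyapunov-style illustration, not LAGO itself: the bandit learning part is left out (latencies and energy costs are assumed known rather than learned), and the option names, numbers, and the `tradeoff` weight are all hypothetical.

```python
def update_virtual_queue(queue, energy_used, energy_budget):
    """Backlog grows when a slot's energy use exceeds the per-slot budget."""
    return max(queue + energy_used - energy_budget, 0.0)


def choose_option(queue, options, tradeoff):
    """Pick the option minimizing tradeoff * latency + queue * energy."""
    return min(
        options,
        key=lambda o: tradeoff * o["latency"] + queue * o["energy"],
    )


# Hypothetical offloading targets with known (not learned) costs.
options = [
    {"name": "fast_node", "latency": 1.0, "energy": 3.0},
    {"name": "green_node", "latency": 2.0, "energy": 1.0},
]

queue, budget, history = 0.0, 2.0, []
for _ in range(6):
    pick = choose_option(queue, options, tradeoff=1.0)
    history.append(pick["name"])
    # The queue tracks accumulated violation of the energy budget.
    queue = update_virtual_queue(queue, pick["energy"], budget)
```

In this toy run the scheduler alternates between the low-latency and the low-energy option, so the time-averaged energy matches the budget. A larger `tradeoff` would favor latency at the cost of a longer virtual queue, which mirrors the tunable tradeoff the abstract mentions.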

preprint2020arXiv

Guided Variational Autoencoder for Disentanglement Learning

We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning. The learning objective is achieved by providing signals to the latent encoding/embedding in VAE without changing its main backbone architecture, hence retaining the desirable properties of the VAE. We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capability over the vanilla VAE. In the unsupervised strategy, we guide the VAE learning by introducing a lightweight decoder that learns latent geometric transformation and principal components; in the supervised strategy, we use an adversarial excitation and inhibition mechanism to encourage the disentanglement of the latent variables. Guided-VAE enjoys its transparency and simplicity for the general representation learning task, as well as disentanglement learning. On a number of experiments for representation learning, improved synthesis/sampling, better disentanglement for classification, and reduced classification errors in meta-learning have been observed.

preprint2020arXiv

Hierarchical Bi-Directional Feature Perception Network for Person Re-Identification

Previous Person Re-Identification (Re-ID) models aim to focus on the most discriminative region of an image, but their performance may be compromised when that region is missing due to camera viewpoint changes or occlusion. To solve this issue, we propose a novel model named Hierarchical Bi-directional Feature Perception Network (HBFP-Net) to correlate multi-level information so that the levels reinforce each other. First, the correlation maps of cross-level feature-pairs are modeled via low-rank bilinear pooling. Then, based on the correlation maps, a Bi-directional Feature Perception (BFP) module is employed to enrich the attention regions of high-level features, and to learn abstract and specific information in low-level features. Next, we propose a novel end-to-end hierarchical network which integrates multi-level augmented features and inputs the augmented low- and middle-level features to the following layers to retrain a new powerful network. Moreover, we propose a novel trainable generalized pooling, which can dynamically select any value of all locations in feature maps to be activated. Extensive experiments implemented on the mainstream evaluation datasets including Market-1501, CUHK03 and DukeMTMC-ReID show that our method outperforms the recent SOTA Re-ID models.

preprint2020arXiv

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

Occluded person re-identification (ReID) aims to match occluded person images to holistic ones across disjoint cameras. In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. At first, we use a CNN backbone and a key-points estimation model to extract semantic local features. Even so, occluded images still suffer from occlusion and outliers. Then, we view the local features of an image as nodes of a graph and propose an adaptive direction graph convolutional (ADGC) layer to pass relation information between nodes. The proposed ADGC layer can automatically suppress the message-passing of meaningless features by dynamically learning the direction and degree of linkage. When aligning two groups of local features from two images, we view it as a graph matching problem and propose a cross-graph embedded-alignment (CGEA) layer to jointly learn and embed topology information to local features, and directly predict the similarity score. The proposed CGEA layer not only takes full use of the alignment learned by graph matching but also replaces sensitive one-to-one matching with a robust soft one. Finally, extensive experiments on occluded, partial, and holistic ReID tasks show the effectiveness of our proposed method. Specifically, our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.

preprint2020arXiv

In Search for Infall Motion in molecular clumps II: HCO+ (1-0) and HCN (1-0) Observations toward a Sub-sample of Infall Candidates

Gravitational accretion accumulates the original mass, and this process is crucial for us to understand the initial phases of star formation. Using the specific infall profiles in optically thick and thin lines, we searched for clumps with infall motion in the Milky Way Imaging Scroll Painting (MWISP) CO data in previous work. In this study, we selected 133 of these sources as a sub-sample for further research and identification. The excitation temperatures of these sources are between 7.0 and 38.5 K, while the H$_2$ column densities are between $10^{21}$ and $10^{23}$ cm$^{-2}$. We have observed the optically thick lines HCO+ (1-0) and HCN (1-0) using the DLH 13.7-m telescope, and found 56 of the sources with a blue profile and no red profile in these two lines, which are likely to have infall motions, giving a detection rate of 42%. This suggests that using CO data to restrict the sample can effectively improve the infall detection rate. Among these confirmed infall sources, 43 are associated with Class 0/I young stellar objects (YSOs), and 13 are not. These 13 sources are probably associated with sources in an earlier evolutionary stage. By comparison, the confirmed sources which are associated with Class 0/I YSOs have higher excitation temperatures and column densities, while the other sources are colder and have lower column densities. Most infall velocities of the sources we confirmed are between $10^{-1}$ and $10^{0}$ km s$^{-1}$, which is consistent with previous studies.

preprint2020arXiv

Joint Switch-Controller Association and Control Devolution for SDN Systems: An Integration of Online Control and Online Learning

In software-defined networking (SDN) systems, it is a common practice to adopt a multi-controller design and control devolution techniques to improve the performance of the control plane. However, in such systems, the decision-making for joint switch-controller association and control devolution often involves various uncertainties, e.g., the temporal variations of controller accessibility, and computation and communication costs of switches. In practice, statistics of such uncertainties are unattainable and need to be learned in an online fashion, calling for an integrated design of learning and control. In this paper, we formulate a stochastic network optimization problem that aims to minimize time-average system costs and ensure queue stability. By transforming the problem into a combinatorial multi-armed bandit problem with long-term stability constraints, we adopt bandit learning methods and optimal control techniques to handle the exploration-exploitation tradeoff and long-term stability constraints, respectively. Through an integrated design of online learning and online control, we propose an effective Learning-Aided Switch-Controller Association and Control Devolution (LASAC) scheme. Our theoretical analysis and simulation results show that LASAC achieves a tunable tradeoff between queue stability and system cost reduction with a sublinear time-averaged regret bound over a finite time horizon.

preprint2020arXiv

Kinetics of Crystallization and Orientational Ordering in Dipolar Particle Systems

The kinetic mechanisms underlying bottom-up assembly of colloidal particles have been widely investigated in efforts to control crystallization pathways and to direct growth into targeted superstructures for applications including photonic crystals. Current work builds on recent progress in the development of kinetic theories for crystal growth of body-centered-cubic crystals in systems with short-range inter-particle interactions, accounting for a greater diversity of crystal structures and the role of the longer-ranged interactions and orientational degrees of freedom arising in polar systems. We address the importance of orientational ordering processes in influencing crystal growth in such polar systems, thus advancing the theory beyond the treatment of the translational ordering processes considered in previous investigations. The work employs comprehensive molecular-dynamics simulations that resolve key crystallization processes, and are used in the development of a quantitative theoretical framework based on ideas from time-dependent Ginzburg-Landau theory. The significant impact of orientational ordering on the crystallization kinetics could be potentially leveraged to achieve crystallization kinetics steering through external electric or magnetic fields. Our combined theory/simulation approach provides opportunities for future investigations of more complex crystallization kinetics.

preprint2020arXiv

Learning Adaptive Embedding Considering Incremental Class

Class-Incremental Learning (CIL) aims to train a reliable model on streaming data in which unknown classes emerge sequentially. Different from traditional closed set learning, CIL has two main challenges: 1) Novel class detection. The initial training data only contains incomplete classes, and streaming test data will contain unknown classes. Therefore, the model needs to not only accurately classify known classes, but also effectively detect unknown classes; 2) Model expansion. After the novel classes are detected, the model needs to be updated without re-training using the entire previous data. However, traditional CIL methods have not fully considered these two challenges. First, they are restricted to detecting a single novel class in each phase and suffer from embedding confusion caused by unknown classes. Second, they ignore the catastrophic forgetting of known categories in model update. To this end, we propose a Class-Incremental Learning without Forgetting (CILF) framework, which aims to learn adaptive embedding for processing novel class detection and model update in a unified framework. In detail, CILF regularizes classification with a decoupled prototype-based loss, which can improve the intra-class and inter-class structure significantly, and as a result acquires a compact embedding representation for novel class detection. Then, CILF employs a learnable curriculum clustering operator to estimate the number of semantic clusters via fine-tuning the learned network, in which the curriculum operator adaptively learns the embedding in a self-taught manner. Therefore, CILF can detect multiple novel classes and mitigate the embedding confusion problem. Finally, with the labeled streaming test data, CILF can update the network with robust regularization to mitigate the catastrophic forgetting. Consequently, CILF is able to iteratively perform novel class detection and model update.

preprint2020arXiv

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding space for distance measuring, and the second one regarding image-text matching as a binary classification problem. Neither of these approaches can, however, balance the matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves the state-of-the-art matching performance with much less time complexity. The implementation code is available at https://github.com/Wangt-CN/MTFN-RR-PyTorch-Code.

preprint2020arXiv

MIPS: Instance Placement for Stream Processing Systems based on Monte Carlo Tree Search

Stream processing engines enable modern systems to conduct large-scale analytics over unbounded data streams in real time. They often view an application as a directed acyclic graph with streams flowing through pipelined instances of various processing units. One key challenge that emerges is instance placement, i.e., deciding the placement of instances across servers with minimum traffic across servers and maximum resource utilization. The challenge is rooted not only in its intrinsic complexity but also in the impact between successive application deployments. Most up-to-date engines such as Apache Heron exploit a more modularized scheduler design that decomposes the task into two stages: one decides the instance-to-container mapping, while the other focuses on the container-to-server mapping, which is delegated to standalone resource managers. The unaligned objectives and scheduler designs in the two stages may lead to long response times or low utilization. However, little work has so far appeared to address the challenge. Inspired by the recent success of Monte Carlo Tree Search (MCTS) methods in various fields, we develop a novel model to characterize such systems, formulate the problem, and cast each stage of mapping into a sequential decision process. By adopting MCTS methods, we propose MIPS, an MCTS-based Instance Placement Scheme to decide the two-staged mapping in a timely yet efficient manner. In addition, we discuss practical issues and refine MIPS to further improve its performance. Results from extensive simulations show that, given a mild number of samples, MIPS outperforms existing schemes with a significant traffic reduction and utilization improvement. To the best of our knowledge, this paper is the first to study the two-staged mapping problem and to apply MCTS to solving the challenge.

preprint2020arXiv

Multi-level Training and Bayesian Optimization for Economical Hyperparameter Optimization

Hyperparameters play a critical role in the performance of many machine learning methods. Determining their best settings, i.e., Hyperparameter Optimization (HPO), is difficult because of the large number of hyperparameters and the excessive training time. In this paper, we develop an effective approach to reducing the total amount of training time required for HPO. In the initialization, the nested Latin hypercube design is used to select hyperparameter configurations for two types of training: heavy training and light training. We propose a truncated additive Gaussian process model to calibrate approximate performance measurements generated by light training, using accurate performance measurements generated by heavy training. Based on the model, a sequential model-based algorithm is developed to generate the performance profile of the configuration space as well as to find optimal configurations. Our proposed approach demonstrates competitive performance when applied to optimize synthetic examples, support vector machines, fully connected networks and convolutional neural networks.

preprint2020arXiv

New Complementary Sets with Low PAPR Property under Spectral Null Constraints

Complementary set sequences (CSSs) are useful for dealing with the high peak-to-average power ratio (PAPR) problem in orthogonal frequency division multiplexing (OFDM) systems. In practical OFDM transmission, however, certain sub-carriers may be reserved and/or prohibited from transmitting signals, leading to the so-called spectral null constraint (SNC) design problem. For example, the DC sub-carrier is reserved to avoid the offsets in D/A and A/D converters in LTE systems. While most current research focuses on the design of low-PAPR CSSs to improve the code rate, few works address the aforementioned SNC in their designs. This motivates us to investigate CSSs with SNC as well as the low-PAPR property. In this paper, we present systematic constructions of CSSs under SNCs and low PAPR. First, we show that mutually orthogonal complementary sets (MOCSs) can be used as seed sequences to generate new CSSs with SNC and low PAPR, and then provide an iterative technique for the construction of MOCSs which can be further used to generate complementary sets (CSs) with low PAPRs and spectral nulls at varying positions in the designed sequences. Next, inspired by a recent idea of Chen, we propose a novel construction of these seed MOCSs with non-power-of-two lengths from generalized Boolean functions.

preprint2020arXiv

Online User-AP Association with Predictive Scheduling in Wireless Caching Networks

For wireless caching networks, the scheme design for content delivery is non-trivial in the face of the following tradeoff. On one hand, to optimize overall throughput, users can associate with nearby APs that have great channel capacities; however, this may lead to unstable queue backlogs on APs and prolong request delays. On the other hand, to ensure queue stability, some users may have to associate with APs that have inferior channel states, which would incur throughput loss. Moreover, for such systems, how to conduct predictive scheduling to reduce delays and the fundamental limits of its benefits remain unexplored. In this paper, we formulate the problem of online user-AP association and resource allocation for content delivery with predictive scheduling under a fixed content placement as a stochastic network optimization problem. By exploiting its unique structure, we transform the problem into a series of modular maximization sub-problems with matroid constraints. Then we devise PUARA, a Predictive User-AP Association and Resource Allocation scheme which achieves a provably near-optimal throughput with queue stability. Our theoretical analysis and simulation results show that PUARA can not only perform a tunable control between throughput maximization and queue stability but also incur a notable delay reduction with predicted information.

preprint2020arXiv

Online VNF Chaining and Predictive Scheduling: Optimality and Trade-offs

For NFV systems, the key design space includes function chaining for network requests and resource scheduling for servers. The problem is challenging since NFV systems usually involve multiple (often conflicting) design objectives and require computationally efficient real-time decision making with limited information. Furthermore, the benefits of predictive scheduling to NFV systems remain unexplored. In this paper, we propose POSCARS, an efficient predictive and online service chaining and resource scheduling scheme that achieves tunable trade-offs among various system metrics with a queue stability guarantee. Through a careful choice of granularity in system modeling, we acquire a better understanding of the trade-offs in our design space. By a non-trivial transformation, we decouple the complex optimization problem into a series of online sub-problems to achieve optimality with only limited information. By employing randomized load balancing techniques, we propose three variants of POSCARS to reduce the overheads of decision making. Theoretical analysis and simulations show that POSCARS and its variants require only a mild amount of future information to achieve near-optimal system cost with an ultra-low request response time.

preprint2020arXiv

POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems

Most online service providers deploy their own data stream processing systems in the cloud to conduct large-scale and real-time data analytics. However, such systems, e.g., Apache Heron, often adopt naive scheduling schemes to distribute data streams (in units of tuples) among processing instances, which may result in workload imbalance and system disruption. Hence, there still exists a mismatch between the temporal variations of data streams and such inflexible scheduling scheme designs. Besides, the fundamental benefits of predictive scheduling to data stream processing systems also remain unexplored. In this paper, we focus on the problem of tuple scheduling with predictive service in Apache Heron. With a careful choice of the granularity of system modeling and decision making, we formulate the problem as a stochastic network optimization problem and propose POTUS, an online predictive scheduling scheme that aims to minimize the response time of data stream processing by steering data streams in a distributed fashion. Theoretical analysis and simulation results show that POTUS achieves an ultra-low response time with a queue stability guarantee. Moreover, POTUS requires only a mild amount of future information to effectively reduce the response time, even with mis-prediction.

preprint2020arXiv

Privacy-preserving Medical Treatment System through Nondeterministic Finite Automata

In this paper, we propose a privacy-preserving medical treatment system using nondeterministic finite automata (NFA), hereafter referred to as P-Med, designed for the remote medical environment. P-Med makes use of the nondeterministic transition characteristic of NFA to flexibly represent the medical model, which includes illness states, treatment methods and state transitions caused by applying different treatment methods. A medical model is encrypted and outsourced to the cloud to deliver telemedicine services. Using P-Med, patient-centric diagnosis and treatment can be made on-the-fly while protecting the confidentiality of a patient's illness states and treatment recommendation results. Moreover, a new privacy-preserving NFA evaluation method is given in P-Med to get a confidential match result for the evaluation of an encrypted NFA and an encrypted data set, which avoids the cumbersome inner state transition determination. We demonstrate that P-Med realizes treatment procedure recommendation without privacy leakage to unauthorized parties. We conduct extensive experiments and analyses to evaluate efficiency.
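
To make the nondeterministic transition idea mentioned in this abstract concrete, the following is a minimal plaintext sketch of ordinary NFA evaluation. It is not P-Med's encrypted evaluation method and involves no cryptography; the function name `nfa_accepts`, the state names and the treatment names are all hypothetical and chosen only for illustration.

```python
def nfa_accepts(transitions, start_states, accepting_states, symbols):
    """Simulate an NFA by tracking the set of all states reachable so far."""
    current = set(start_states)
    for symbol in symbols:
        # Follow every possible transition from every currently active state.
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), ())
        }
        if not current:
            # No state can consume this symbol, so the input is rejected.
            return False
    return bool(current & set(accepting_states))


# Hypothetical toy model: states are illness states, symbols are treatments.
toy_transitions = {
    ("severe", "drug_a"): {"moderate", "severe"},  # nondeterministic outcome
    ("moderate", "drug_b"): {"recovered"},
    ("severe", "drug_b"): {"severe"},
}

print(nfa_accepts(toy_transitions, {"severe"}, {"recovered"}, ["drug_a", "drug_b"]))
print(nfa_accepts(toy_transitions, {"severe"}, {"recovered"}, ["drug_b", "drug_b"]))
```

The set-of-active-states approach is the standard way to evaluate an NFA without backtracking; a treatment sequence is accepted if at least one possible path ends in an accepting state.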

preprint2020arXiv

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most existing image retrieval methods focus only on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar. However, in practical applications, there are very large discrepancies between retrieval databases, often captured under ideal illumination/pose/background/camera conditions, and queries, usually obtained under uncontrolled conditions. In this paper, considering the practical application, we focus on challenging cross-domain retrieval. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to promote cross-domain retrieval accuracy and learns a series of compact binary codes to improve the retrieval speed. First, we derive our loss function through Maximum A Posteriori (MAP) estimation: a Bayesian Perspective (BP) induced focal-triplet loss, a BP induced quantization loss and a BP induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Because the original feature representation is biased due to the inter-domain discrepancy, the manifold structure is difficult to construct. Therefore, we propose a new feature named Histogram Feature of Neighbors (HFON) from the sample statistics perspective. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at https://github.com/fuxianghuang1/PWCF

preprint2020arXiv

Quantum-classical crossover in the spin-1/2 Heisenberg-Kitaev kagome magnet

The spin-1/2 Heisenberg kagome antiferromagnet is one of the paradigmatic playgrounds for frustrated quantum magnetism, with an extensive number of competing resonating valence bond (RVB) states emerging at low energies, including gapped and gapless spin liquids and valence bond crystals. Here we revisit the crossover from this quantum RVB phase to a semiclassical regime brought about by anisotropic Kitaev interactions, and focus on the precise mechanisms underpinning this crossover. To this end, we introduce a simple parametrization of the classical ground states (GSs) in terms of emergent Ising-like variables, and use this parametrization: i) to construct an effective low-energy description of the order-by-disorder mechanism operating in a large part of the phase diagram, and ii) to contrast, side by side, exact diagonalization data obtained from the full basis with that obtained from the restricted (orthonormalized) basis of classical GSs. The results reveal that fluctuation corrections from states outside the restricted basis are strongly quenched inside the semiclassical regime (due to the large anisotropy spin gaps), and that the RVB phase survives up to a relatively large value of Kitaev anisotropy $K$. We further find that the pure Kitaev model admits a subextensive number of one-dimensional symmetries, which naturally explains the absence of classical and quantum order by disorder reported previously.

preprint2020arXiv

Quasi-Orthogonal Z-Complementary Pairs and Their Applications in Fully Polarimetric Radar Systems

One objective of this paper is to propose a novel class of sequence pairs, called "Quasi-orthogonal Z-complementary pairs (QOZCPs)", each of which exhibits the Z-complementary property for its aperiodic auto-correlation sums and also has a zero correlation zone when its aperiodic cross-correlation is considered. A construction of QOZCPs based on Successively Distributed Algorithms under Majorization Minimization (SDAMM) is presented. Another objective of this paper is to apply the proposed QOZCPs in fully polarimetric radar systems and analyse the corresponding ambiguity functions. It turns out that QOZCP waveforms are much more Doppler resilient than the known Golay complementary waveforms.

preprint2020arXiv

Reconstruction of the collision kernel in the nonlinear Boltzmann equation

We consider an inverse problem for the Boltzmann equation with nonlinear collision operator in dimensions $n\geq 2$. We show that the kinetic collision kernel can be uniquely determined from the incoming-to-outgoing mappings on the boundary of the domain provided that the kernel satisfies a monotonicity condition. Furthermore, a reconstruction formula is also derived. The key methodology is based on the higher-order linearization scheme to reduce a nonlinear equation into simpler linear equations by introducing multiple small parameters into the original equation.

preprint2020arXiv

Resolving the Nuclear Radio Emission from M32 with Very Large Array

The Local Group dwarf elliptical galaxy M32 hosts one of the nearest and most under-luminous super-massive black holes (SMBHs) known, offering a rare opportunity to study the physics of accreting SMBHs in the most quiescent state. Recent Very Large Array (VLA) observations have detected a radio source at the nucleus of M32, which is suggested to be the radio counterpart of the SMBH. To further investigate the radio properties of this nuclear source, we have conducted follow-up, high-resolution VLA observations in four epochs between 2015 and 2017, each with dual frequencies. At 6 GHz, the nuclear source is resolved under an angular resolution of $\sim$0.4 arcsec, exhibiting a coreless, slightly lopsided morphology with a detectable extent of $\sim$2.5 arcsec ($\sim$10 parsec). No significant variability can be found among the four epochs. At 15 GHz, no significant emission can be detected within the same region, pointing to a steep intrinsic radio spectrum (with a 3$σ$ upper limit of -1.46 for the spectral index). We discuss possible scenarios for the nature of this nuclear source and conclude that a stellar origin, in particular planetary nebulae, X-ray binaries, supernova remnants or diffuse ionized gas powered by massive stars, can be ruled out. Instead, the observed radio properties can be explained by synchrotron radiation from a hypothetical wind driven by the weakly accreting SMBH.

preprint2020arXiv

S2OSC: A Holistic Semi-Supervised Approach for Open Set Classification

Open set classification (OSC) tackles the problem of determining whether data are in-class or out-of-class during inference, when only a set of in-class examples is provided at training time. Traditional OSC methods usually train discriminative or generative models with in-class data, then use the pre-trained models to classify test data directly. However, these methods always suffer from the embedding confusion problem, i.e., some out-of-class instances are mixed with in-class ones of similar semantics, making them difficult to classify. To solve this problem, we incorporate semi-supervised learning to develop a novel OSC algorithm, S2OSC, that combines out-of-class instance filtering and model re-training in a transductive manner. In detail, given a pool of newly arriving test data, S2OSC first filters distinct out-of-class instances using the pre-trained model and annotates them with a super-class. Then, S2OSC trains a holistic classification model by combining in-class and out-of-class labeled data with the remaining unlabeled test data in a semi-supervised paradigm, which also integrates the pre-trained model for knowledge distillation to further separate mixed instances. Despite its simplicity, the experimental results show that S2OSC achieves state-of-the-art performance across a variety of OSC tasks, including an F1 of 85.4% on CIFAR-10 with only 300 pseudo-labels. We also demonstrate how S2OSC can be effectively extended to the incremental OSC setting with streaming data.

preprint2020arXiv

Service Chain Composition with Failures in NFV Systems: A Game-Theoretic Perspective

For state-of-the-art network function virtualization (NFV) systems, it remains a key challenge to conduct effective service chain composition for different network services (NSs) with ultra-low request latencies and minimum network congestion. To this end, existing solutions often require full knowledge of the network state, while ignoring the privacy issues and overlooking the non-cooperative behaviors of users. What is more, they may fall short in the face of unexpected failures such as user unavailability and virtual machine breakdown. In this paper, we formulate the problem of service chain composition in NFV systems with failures as a non-cooperative game. By showing that such a game is a weighted potential game and exploiting the unique problem structure, we propose two effective distributed schemes that guide the service chain compositions of different NSs towards the Nash equilibrium (NE) state with both near-optimal latencies and minimum congestion. Besides, we develop two novel learning-aided schemes as comparisons, which are based on deep reinforcement learning (DRL) and Monte Carlo tree search (MCTS) techniques, respectively. Our theoretical analysis and simulation results demonstrate the effectiveness of our proposed schemes, as well as the adaptivity when faced with failures.

preprint2020arXiv

SQLFlow: A Bridge between SQL and Machine Learning

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialect to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy and works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended SQL syntax carefully to make the extension work with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive across a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.

preprint2020arXiv

Travel time tomography in stationary spacetimes

In this paper, we consider the boundary rigidity problem on a cylindrical domain in $\mathbb R^{1+n}$, $n\geq 2$, equipped with a stationary (time-invariant) Lorentzian metric. We show that the time separation function between pairs of points on the boundary of the cylindrical domain determines the stationary spacetime, up to some time-invariant diffeomorphism, assuming that the metric satisfies some a-priori conditions.

preprint2020arXiv

Ultrasound Modulated Bioluminescence Tomography With A Single Optical Measurement

Ultrasound modulated bioluminescence tomography (UMBLT) is an imaging method which can be formulated as a hybrid inverse source problem. In the regime where light propagation is modeled by a radiative transfer equation, previous approaches to this problem require large numbers of optical measurements [10]. Here we propose an alternative solution for this inverse problem which requires only a single optical measurement in order to reconstruct the isotropic source. Specifically, we derive two inversion formulae, based on Neumann series and Fredholm theory respectively, and prove their convergence under sufficient conditions. The resulting numerical algorithms are implemented and tested on the reconstruction of both continuous and discontinuous sources in the presence of noise.

preprint2020arXiv

Understanding Electricity-Theft Behavior via Multi-Source Data

Electricity theft, the behavior in which users conduct illegal operations on electrical meters to avoid paying their electricity bills, is a common phenomenon in developing countries. Considering its harmfulness to both power grids and the public, several mechanized methods have been developed to automatically recognize electricity-theft behaviors. However, these methods, which mainly assess users' electricity usage records, can be insufficient due to the diversity of theft tactics and the irregularity of user behaviors. In this paper, we propose to recognize electricity-theft behavior via multi-source data. In addition to users' electricity usage records, we analyze user behaviors by means of regional factors (non-technical loss) and climatic factors (temperature) in the corresponding transformer area. By conducting analytical experiments, we unearth several interesting patterns: for instance, electricity thieves are likely to consume much more electrical power than normal users, especially under extremely high or low temperatures. Motivated by these empirical observations, we further design a novel hierarchical framework for identifying electricity thieves. Experimental results based on a real-world dataset demonstrate that our proposed model can achieve the best performance in electricity-theft detection (e.g., at least +3.0% in terms of F0.5) compared with several baselines. Last but not least, our work has been applied by the State Grid of China and used to successfully catch electricity thieves in Hangzhou with a precision of 15% (an improvement from 0% attained by several other models the company employed) during monthly on-site investigation.

preprint2020arXiv

Vortical Reflection and Spiraling Fermi Arcs with Weyl Metamaterials

Scattering and transport in Weyl semimetals have attracted growing attention in condensed matter physics, with observables including chiral zero modes and the associated magnetoresistance and chiral magnetic effects. Measurement of electrical conductance is usually performed in these studies, which, however, cannot resolve the momentum of electrons, preventing direct observation of the phase singularities in the scattering matrix associated with Weyl points. Here we experimentally demonstrate a helical phase distribution in the angle (momentum) resolved scattering matrix of electromagnetic waves in a photonic Weyl metamaterial. It further leads to spiraling Fermi arcs in an air gap sandwiched between a Weyl metamaterial and a metal plate. Benefiting from the alignment-free feature of angular vortical reflection, our findings establish a new platform for manipulating optical angular momenta with photonic Weyl systems.

preprint2019arXiv

Generalized Constructions of Complementary Sets of Sequences of Lengths Non-Power-of-Two

The construction of complementary sets (CSs) of sequences with different set sizes and sequence lengths has become important due to its practical application in OFDM systems. Most constructions of CSs based on generalized Boolean functions (GBFs) are of length $2^α$ ($α$ is a natural number). Recently some works have been reported on the construction of CSs with non-power-of-two lengths, i.e., of the form $2^{m-1}+2^v$ ($m$ is a natural number, $0 \leq v < m$), $N+1$ and $N+2$, where $N$ is a length for which $q$-ary complementary pairs exist. In this paper, we propose a construction of CSs of length $M+N$ for set size $4n$, using concatenation of CSs of lengths $M$ and $N$ and set size $4n$, where $M$ and $N$ are lengths for which $q$-ary complementary pairs exist. Also, we construct CSs of length $M+P$ for set size $8n$ by concatenating CSs of lengths $M$ and $P$ and set size $8n$, where $M$ and $P$ are lengths for which $q$-ary complementary pairs and complementary sets of size $4$ exist, respectively. The proposed constructions cover all the previous constructions as special cases in terms of lengths and lead to more CSs of new sequence lengths which have not been reported before.
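
The defining property behind the complementary sets discussed in this abstract can be checked in a few lines. The sketch below is only an illustration under simplifying assumptions: it handles real-valued (e.g. binary ±1) sequences, whereas $q$-ary sequences would need complex conjugation in the correlation, and it does not implement the paper's concatenation construction. The function names are hypothetical.

```python
def aperiodic_autocorrelation(seq, lag):
    """Aperiodic autocorrelation of a real-valued sequence at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))


def is_complementary_set(sequences):
    """True if the autocorrelations of all sequences sum to zero at every nonzero lag."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    return all(
        sum(aperiodic_autocorrelation(s, lag) for s in sequences) == 0
        for lag in range(1, length)
    )


# A well-known binary Golay complementary pair of length 4 (a set of size 2).
golay_a = [1, 1, 1, -1]
golay_b = [1, 1, -1, 1]

print(is_complementary_set([golay_a, golay_b]))  # the pair is complementary
print(is_complementary_set([golay_a, golay_a]))  # the same sequence twice is not
```

For the pair above, the individual autocorrelations at lags 1, 2 and 3 are (1, 0, -1) and (-1, 0, 1), so their sidelobes cancel exactly.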

preprint2018arXiv

Electron effective mass and electronic structure in nonstoichiometric a-IGZO films

The transport properties and the optical transmittance and absorption spectra of nonstoichiometric amorphous Indium Gallium Zinc Oxide (a-IGZO) films with Gallium and Zinc deficiencies are investigated. The variations of resistivity and carrier concentration with temperature both reveal that the films possess degenerate semiconductor (or metal) characteristics. The thermopower is negative and decreases linearly with decreasing temperature, indicating that the electron diffusion thermopower governs the thermal transport process in each film. Using a free-electron-like model, we extracted the electron effective mass, which is about three times as large as that of the stoichiometric film and increases with increasing carrier (electron) concentration. Neglecting the variation in the energy with the wavevector near the valence band maximum and using the free-electron-like model, we also obtained the electron effective mass via optical absorption spectra measurements. The magnitude of the effective mass obtained via optical spectra measurements is comparable to that obtained via thermopower measurements for each film. Our results strongly suggest that the nonstoichiometric a-IGZO films possess a free-electron-like pseudo-energy-band structure.