Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
89 works
0 followers
51 topics
4 close collaborators


Published work

89 published item(s)

preprint · 2026 · arXiv

Dr. Zero: Self-Evolving Search Agents without Training Data

As high-quality data becomes increasingly difficult to obtain, data-free self-evolution has emerged as a promising paradigm. This approach allows large language models (LLMs) to autonomously generate and solve complex problems, thereby improving their reasoning capabilities. However, multi-turn search agents struggle in data-free self-evolution due to the limited question diversity and the substantial compute required for multi-step reasoning and tool using. In this work, we introduce Dr. Zero, a framework enabling search agents to effectively self-evolve without any training data. In particular, we design a self-evolution feedback loop where a proposer generates diverse questions to train a solver initialized from the same base model. As the solver evolves, it incentivizes the proposer to produce increasingly difficult yet solvable tasks, thus establishing an automated curriculum to refine both agents. To enhance training efficiency, we also introduce hop-grouped relative policy optimization (HRPO). This method clusters structurally similar questions to construct group-level baselines, effectively minimizing the sampling overhead in evaluating each query's individual difficulty and solvability. Consequently, HRPO significantly reduces the compute requirements for solver training without compromising performance or stability. Extensive experiment results demonstrate that the data-free Dr. Zero matches or surpasses fully supervised search agents, proving that complex reasoning and search capabilities can emerge solely through self-evolution.
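The group-level baseline idea behind HRPO can be illustrated with a minimal sketch. It assumes that "structurally similar" means "same hop count" (inferred from the method's name, not stated in the abstract), that each question contributes a single sampled rollout with a scalar reward, and that the advantage is simply the reward minus the group mean. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def hop_grouped_advantages(samples):
    """Compute one advantage per sample using a per-hop-count baseline.

    samples: list of (hop_count, reward) pairs, one rollout per question.
    The baseline for a sample is the mean reward of every sample that
    shares its hop count, so no question needs repeated sampling of its own.
    """
    groups = defaultdict(list)
    for hops, reward in samples:
        groups[hops].append(reward)
    # One baseline per group of structurally similar questions (assumption).
    baselines = {hops: mean(rewards) for hops, rewards in groups.items()}
    return [reward - baselines[hops] for hops, reward in samples]


batch = [(1, 1.0), (1, 0.0), (2, 0.5), (2, 0.0), (2, 1.0)]
advantages = hop_grouped_advantages(batch)
```

A real implementation would likely also normalize by the group's standard deviation and feed the advantages into a policy-gradient update; both are omitted here.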

preprint · 2026 · arXiv

From Failure to Mastery: Generating Hard Samples for Tool-use Agents

The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random sampling and shallow generation, often yield simple and homogeneous trajectories that fail to capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. Firstly, HardGen establishes a dynamic API Graph built upon agent failure cases, from which it samples to synthesize hard traces. Secondly, these traces serve as conditional priors to guide the instantiation of modular, abstract advanced tools, which are subsequently leveraged to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable complex Chain-of-Thought (CoT), with a closed-loop evaluation feedback steering the continuous refinement of the process. Extensive evaluations demonstrate that a 4B parameter model trained with our curated dataset achieves superior performance compared to several leading open-source and closed-source competitors (e.g., GPT-5.2, Gemini-3-Pro and Claude-Opus-4.5). Our code, models, and dataset will be open-sourced to facilitate future research.

preprint · 2026 · arXiv

MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.

preprint · 2026 · arXiv

MASH: A Multiplatform and Multimodal Annotated Dataset for Societal Impact of Hurricane

Natural disasters cause multidimensional threats to human societies, with hurricanes exemplifying one of the most disruptive events that not only caused severe physical damage but also sparked widespread discussion on social media platforms. Existing datasets for studying societal impacts of hurricanes often focus on outdated hurricanes and are limited to a single social media platform, failing to capture the broader societal impact in today's diverse social media environment. Moreover, existing datasets annotate visual and textual content of the post separately, failing to account for the multimodal nature of social media posts. To address these gaps, we present a multiplatform and Multimodal Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 59,607 relevant social media data posts from Reddit, TikTok, and YouTube. In addition, all relevant social media data posts are annotated in a multimodal approach that considers both textual and visual content on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes. To our best knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated dataset centered on hurricane disasters. In addition, we introduce an online platform that supports interactive data exploration, provides preliminary analytical results, and allows users to share their insights regarding the societal impacts of hurricanes. We envision that MASH can contribute to the study of hurricanes' impact on society, such as disaster response, disaster severity classification, public sentiment analysis, disaster policy making, and bias identification. The dataset is publicly available at https://huggingface.co/datasets/YRC10/MASH under the Creative Commons Attribution 4.0 (CC BY 4.0) license.

preprint · 2026 · arXiv

MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion

Mesh models have become increasingly accessible for numerous cities; however, the lack of realistic textures restricts their application in virtual urban navigation and autonomous driving. To address this, this paper proposes MeSS (Mesh-based Scene Synthesis) for generating high-quality, style-consistent outdoor scenes with city mesh models serving as the geometric prior. While image and video diffusion models can leverage spatial layouts (such as depth maps or HD maps) as control conditions to generate street-level perspective views, they are not directly applicable to 3D scene generation. Video diffusion models excel at synthesizing consistent view sequences that depict scenes but often struggle to adhere to predefined camera paths or align accurately with rendered control videos. In contrast, image diffusion models, though unable to guarantee cross-view visual consistency, can produce more geometry-aligned results when combined with ControlNet. Building on this insight, our approach enhances image diffusion models by improving cross-view consistency. The pipeline comprises three key stages: first, we generate geometrically consistent sparse views using Cascaded Outpainting ControlNets; second, we propagate denser intermediate views via a component dubbed AGInpaint; and third, we globally eliminate visual inconsistencies (e.g., varying exposure) using the GCAlign module. Concurrently with generation, a 3D Gaussian Splatting (3DGS) scene is reconstructed by initializing Gaussian balls on the mesh surface. Our method outperforms existing approaches in both geometric alignment and generation quality. Once synthesized, the scene can be rendered in diverse styles through relighting and style transfer techniques. Project page: https://albertchen98.github.io/mess/

preprint · 2026 · arXiv

RELO: Reinforcement Learning to Localize for Visual Object Tracking

Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO replaces handcrafted spatial priors with a localization policy learned over spatial positions via reinforcement learning, with rewards combining frame-level IoU and sequence-level AUC. We additionally introduce layer-aligned temporal token propagation to improve semantic consistency across frames, with negligible computational overhead. Across multiple benchmarks, RELO achieves superior results, attaining 57.5% AUC on LaSOT_ext without template updates. This confirms that reward-driven localization provides an effective alternative to prior-driven localization for visual object tracking.

preprint · 2026 · arXiv

V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking

Precise localization of GUI elements is crucial for the development of GUI agents. Traditional methods rely on bounding box or center-point regression, neglecting spatial interaction uncertainty and visual-semantic hierarchies. Recent methods incorporate attention mechanisms but still face two key issues: (1) failing to handle background regions causes attention to drift from the desired area, and (2) modeling the target UI element uniformly fails to distinguish between its center and edges, leading to click imprecision. Inspired by how humans visually process and interact with GUI elements, we propose the Valley-to-Peak (V2P) method to address these issues. To mitigate background distractions, V2P introduces a suppression attention mechanism that minimizes the model's focus on irrelevant regions to highlight the intended region. For the issue of center-edge distinction, V2P applies a Fitts' Law-inspired approach by modeling GUI interactions as 2D Gaussian heatmaps where the weight gradually decreases from the center towards the edges. The weight distribution follows a Gaussian function, with the variance determined by the target's size. Consequently, V2P effectively isolates the target area and teaches the model to concentrate on the most essential point of the UI element. The model trained by V2P achieves 92.4% and 52.5% on two benchmarks, ScreenSpot-v2 and ScreenSpot-Pro, respectively. Ablations further confirm each component's contribution, underscoring V2P's generalizability in precise GUI grounding tasks and its potential for real-world deployment in future GUI agents.
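The size-dependent Gaussian weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the standard deviation along each axis is taken to be a fixed fraction of the box side, and `sigma_scale` and the function name are hypothetical.

```python
import math


def gaussian_target(width, height, box, sigma_scale=0.25):
    """Build a 2D Gaussian weight map peaked at the center of `box`.

    box = (x0, y0, x1, y1) in pixel coordinates. The per-axis standard
    deviation is sigma_scale times the box side length, so larger targets
    get a wider peak. sigma_scale is a hypothetical knob, not a value
    taken from the paper.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max(sigma_scale * (x1 - x0), 1e-6)  # guard against zero-width boxes
    sy = max(sigma_scale * (y1 - y0), 1e-6)
    return [
        [
            math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Rows index y, columns index x; the peak sits at the box center (x=4, y=3).
heatmap = gaussian_target(width=9, height=7, box=(2, 1, 6, 5))
```

Weights fall off smoothly from the center toward the edges of the element, which is the property the abstract relies on to separate a precise click point from the element's border.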

preprint · 2025 · arXiv

Movable Antenna Enhanced Multi-Region Beam Coverage: A Multi-Notch-Filter-Inspired Design

Movable antenna (MA) has emerged as a promising technology to enhance wireless communication performance by exploiting the new degree of freedom (DoF) via antenna position optimization. In this letter, we investigate the MA-enhanced wide beam coverage over multiple subregions in the spatial domain. Specifically, we aim to maximize the minimum beam gain over the desired subregions by jointly optimizing the transmit beamforming and antenna position vector (APV). Although this problem is non-convex, we propose an efficient algorithm to solve it by leveraging the similarity between the considered multi-region coverage and classical multi-notch filter (MNF) design. In particular, we construct a spatial MNF-based transmit beamforming vector by assuming a continuous amplitude and phase-shift profile within the antenna movement region. Based on this continuous profile, we propose a sequential update algorithm to select an optimal subset of MA positions for multi-region coverage, jointly with a Gibbs sampling (GS) procedure to avoid undesired local optimum. Numerical results show that our proposed algorithm can significantly outperform conventional fixed position antennas (FPAs) and achieve a comparable performance to the alternating optimization (AO) algorithm with dramatically lower complexity.

preprint · 2025 · arXiv

On the Effectiveness of Training Data Optimization for LLM-based Code Generation: An Empirical Study

Large language models (LLMs) have achieved remarkable progress in code generation, largely driven by the availability of high-quality code datasets for effective training. To further improve data quality, numerous training data optimization techniques have been proposed; however, their overall effectiveness has not been systematically evaluated. To bridge this gap, we conduct the first large-scale empirical study, examining five widely-used training data optimization techniques and their pairwise combinations for LLM-based code generation across three benchmarks and four LLMs. Our results show that data synthesis is the most effective technique for improving functional correctness and reducing code smells, although it performs relatively worse on code maintainability compared to data refactoring, cleaning, and selection. Regarding combinations, we find that most combinations do not further improve functional correctness but can effectively enhance code quality (code smells and maintainability). Among all combinations, data synthesis combined with data refactoring achieves the strongest overall performance. Furthermore, our fine-grained analysis reinforces these findings and provides deeper insights into how individual techniques and their combinations influence code generation effectiveness. Overall, this work represents a first step toward a systematic understanding of training data optimization and combination strategies, offering practical guidance for future research and deployment in LLM-based code generation.

preprint · 2025 · arXiv

Particle-scale origin of quadrupolar non-affine displacement fields in granular solids

In this work, we identify the local structural defects that control the non-affine displacement fields in jammed disk packings subjected to athermal, quasistatic (AQS) simple shear. While complex non-affine displacement fields typically occur during simple shear, isolated effective quadrupoles are also observed and their probability increases with increasing pressure. We show that the emergence of an isolated effective quadrupole requires the breaking of an interparticle contact that is aligned with low-frequency, spatially extended vibrational modes. Since the Eshelby inhomogeneity problem gives rise to quadrupolar displacement fields in continuum materials, we reformulate and implement Eshelby's equivalent inclusion method (EIM) for jammed disk packings. Using EIM, we show that we can reconstruct the non-affine displacement fields for jammed disk packings in response to applied shear as a sum of discrete Eshelby-like defects that are caused by mismatches in the local stiffnesses of triangles formed from Delaunay triangulation of the disk centers.

preprint · 2024 · arXiv

A prediction-correction based iterative convolution-thresholding method for topology optimization of heat transfer problems

In this paper, we propose an iterative convolution-thresholding method (ICTM) based on prediction-correction for solving the topology optimization problem in steady-state heat transfer equations. The problem is formulated as a constrained minimization problem of the complementary energy, incorporating a perimeter/surface-area regularization term, while satisfying a steady-state heat transfer equation. The decision variables of the optimization problem represent the domains of different materials and are represented by indicator functions. The perimeter/surface-area term of the domain is approximated using Gaussian kernel convolution with indicator functions. In each iteration, the indicator function is updated using a prediction-correction approach. The prediction step is based on the variation of the objective functional by imposing the constraints, while the correction step ensures the monotonically decreasing behavior of the objective functional. Numerical results demonstrate the efficiency and robustness of our proposed method, particularly when compared to classical approaches based on the ICTM.

preprint · 2024 · arXiv

Reconfigurable Three-Dimensional Thermal Dome

Thermal metamaterial represents a groundbreaking approach to control heat conduction, and, as a crucial component, thermal invisibility is of utmost importance for heat management. Despite the flourishing development of thermal invisibility schemes, they still face two limitations in practical applications. First, objects are typically completely enclosed in traditional cloaks, making them difficult to use and unsuitable for objects with heat sources. Second, although some theoretical proposals have been put forth to change the thermal conductivity of materials to achieve dynamic invisibility, their designs are complex and rigid, making them unsuitable for large-scale use in real three-dimensional spaces. Here, we propose a concept of a thermal dome to achieve three-dimensional invisibility. Our scheme includes an open functional area, greatly enhancing its usability and applicability. It features a reconfigurable structure, constructed with simple isotropic natural materials, making it suitable for dynamic requirements. The performance of our reconfigurable thermal dome has been confirmed through simulations and experiments, consistent with the theory. The introduction of this concept can greatly advance the development of thermal invisibility technology from theory to engineering and provide inspiration for other physical domains, such as direct current electric fields and magnetic fields.

preprint · 2023 · arXiv

RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches. This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$λ$) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach.
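The idea of calibrating Q-values with a Lagrange multiplier can be shown with a small sketch. It assumes a single phase, a single multiplier, and a known computation cost per action; the abstract mentions multiple adaptive multipliers and a deep Q-network, neither of which is modeled here. All names are hypothetical.

```python
def calibrated_action(q_values, costs, lam):
    """Pick the action maximizing Q(s, a) - lam * cost(a).

    q_values: estimated revenue per action for the current request.
    costs: computation resources each action would consume.
    lam: Lagrange multiplier; larger values penalize expensive actions more.
    """
    scores = [q - lam * c for q, c in zip(q_values, costs)]
    return max(range(len(scores)), key=scores.__getitem__)


q_values = [1.0, 1.5, 2.0]   # e.g. small, medium, large candidate queue
costs = [1.0, 2.0, 4.0]

unconstrained = calibrated_action(q_values, costs, lam=0.0)
moderate = calibrated_action(q_values, costs, lam=0.4)
tight = calibrated_action(q_values, costs, lam=1.0)
```

As the multiplier grows, the chosen action shifts from the most valuable to the cheapest, which is how a global resource budget can be enforced without changing the Q-network itself.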

preprint · 2023 · arXiv

The Security Analysis of Continuous-Variable Quantum Key Distribution under Limited Eavesdropping with Practical Fiber

Research on optimal eavesdropping models under practical conditions will help to evaluate realistic risk when employing a quantum key distribution (QKD) system for secure information transmission. Intuitively, fiber loss will lead to the optical energy leaking to the environment, rather than being harvested by the eavesdropper, which also limits the eavesdropping ability while improving the QKD system performance in practical use. However, defining the optimal eavesdropping model in the presence of lossy fiber is difficult because the channel is beyond the control of legitimate partners and the leaked signal is undetectable. Here we investigate how the fiber loss influences the eavesdropping ability based on a teleportation-based collective attack model which requires two distant stations and a shared entanglement source. We find that if the distributed entanglement is limited due to the practical loss, the optimal attack occurs when the two teleportation stations are merged into one and placed close to the transmitter site, which performs similarly to the entangling-cloning attack but with a reduced wiretapping ratio. Assuming Eve uses the best available hollow-core fiber, the secret key rate in the practical environment can be 20% to 40% higher than that under ideal eavesdropping. If, however, entanglement distillation technology is mature enough to provide high-quality distributed entanglement, the two teleportation stations should be distantly separated for better eavesdropping performance, where the eavesdropping can even approach the optimal collective attack. Under the current level of entanglement purification technology, the unavoidable fiber loss can still greatly limit the eavesdropping ability as well as enhance the secret key rate and transmission distance of the realistic system, which promotes the development of QKD systems in practical application scenarios.

preprint · 2022 · arXiv

A two-stage method for reconstruction of parameters in diffusion equations

Parameter reconstruction for diffusion equations has a wide range of applications. In this paper, we propose a two-stage scheme to efficiently solve conductivity reconstruction problems for steady-state diffusion equations with solution data measured inside the domain. The first stage is based on total variation regularization of the log diffusivity and the split Bregman iteration method. In the second stage, we apply K-means clustering for the reconstruction of "blocky" conductivity functions. The convergence of the scheme is theoretically proved and extensive numerical examples are shown to demonstrate the performance of the scheme.

preprint · 2022 · arXiv

An efficient unconditionally stable method for Dirichlet partitions in arbitrary domains

A Dirichlet $k$-partition of a domain is a collection of $k$ pairwise disjoint open subsets such that the sum of their first Laplace--Dirichlet eigenvalues is minimal. In this paper, we propose a new relaxation of the problem by introducing auxiliary indicator functions of domains and develop a simple and efficient diffusion generated method to compute Dirichlet $k$-partitions for arbitrary domains. The method only alternates three steps: 1. convolution, 2. thresholding, and 3. projection. The method is simple, easy to implement, insensitive to initial guesses and can be effectively applied to arbitrary domains without any special discretization. At each iteration, the computational complexity is linear in the discretization of the computational domain. Moreover, we theoretically prove the energy decaying property of the method. Experiments are performed to show the accuracy of approximation, efficiency and unconditional stability of the algorithm. We apply the proposed algorithms on both 2- and 3-dimensional flat tori, triangle, square, pentagon, hexagon, disk, three-fold star, five-fold star, cube, ball, and tetrahedron domains to compute Dirichlet $k$-partitions for different $k$ to show the effectiveness of the proposed method. Compared to previous work with reported computational time, the proposed method achieves hundreds of times acceleration.

preprint · 2022 · arXiv

An Exploration of npm Package Co-Usage Examples from Stack Overflow: A Case Study

Third-party package usage has become a common practice in contemporary software development. Developers often face different challenges during software development, including choosing the right libraries, installation errors, discrepancies, environment setup, and build failures. The risks of maintaining a third-party package are well known, but it is unclear how information from Stack Overflow (SO) can be useful. This paper presents an empirical study exploring npm co-usage in SO. From over 30,000 SO posts, we extracted 2,100 SO posts related to npm and matched them to 217,934 npm library packages. We find that popular and highly used libraries are not discussed as often in SO. However, the accepted answers may prove useful, as we believe that the usage examples and executable commands could be reused for tool support.

preprint · 2022 · arXiv

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we find that they are not fully exploited even when the multimodal model outperforms its uni-modal counterpart. Specifically, in this paper we point out that existing multimodal discriminative models, in which a uniform objective is designed for all modalities, can leave uni-modal representations under-optimized because another modality dominates in some scenarios, e.g., sound in a wind-blowing event or vision in a picture-drawing event. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation to adaptively control the optimization of each modality by monitoring the discrepancy of their contributions towards the learning objective. Further, an extra Gaussian noise that changes dynamically is introduced to avoid a possible generalization drop caused by gradient modulation. As a result, we achieve considerable improvement over common fusion methods on different multimodal tasks, and this simple strategy can also boost existing multimodal methods, which illustrates its efficacy and versatility. The source code is available at https://github.com/GeWu-Lab/OGM-GE_CVPR2022.
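One plausible form of such a modulation rule is sketched below. The abstract does not give the formula, so everything here is an assumption: the contribution of each modality is summarized by a positive scalar score, the discrepancy is their ratio, and the dominant modality's gradient is scaled by `1 - tanh(alpha * ratio)`. The function name and `alpha` are hypothetical, and the dynamic Gaussian noise term is omitted.

```python
import math


def modulation_coefficients(score_a, score_b, alpha=0.5):
    """Return (k_a, k_b), gradient scaling factors for two modalities.

    score_a and score_b are positive scalars summarizing how much each
    modality currently contributes to the objective (for example, the
    summed probability its branch assigns to the correct class in a batch).
    Only the dominant modality is slowed down; the other keeps factor 1.
    """
    ratio_a = score_a / score_b
    ratio_b = score_b / score_a
    k_a = 1.0 - math.tanh(alpha * ratio_a) if ratio_a > 1.0 else 1.0
    k_b = 1.0 - math.tanh(alpha * ratio_b) if ratio_b > 1.0 else 1.0
    return k_a, k_b


# Audio currently dominates, so its encoder gradients are scaled down.
k_audio, k_visual = modulation_coefficients(score_a=3.0, score_b=1.0)
```

In a training loop, each coefficient would multiply the gradients of the corresponding encoder before the optimizer step, giving the weaker modality relatively more room to improve.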

preprint · 2022 · arXiv

C-P Map: A Novel Evaluation Toolkit for Speaker Verification

Evaluation trials are used to probe the performance of automatic speaker verification (ASV) systems. In spite of their clear importance and impact, evaluation trials have not been seriously treated in research and engineering practice. This paper first presents a theoretical analysis of evaluation trials and highlights potential bias with the most popular cross-pairing approach used in trial design. To interpret and settle this problem, we define the concept of trial config and the C-P map derived from it. The C-P map measures the performance of an ASV system on various trial configs in a 2-dimensional map. On the map, each location represents a particular trial config and its corresponding color represents the system performance. Experiments conducted on representative ASV systems show that the proposed C-P map offers a powerful evaluation toolkit for ASV performance analysis and comparison. The source code for the C-P map has been released at https://gitlab.com/csltstu/sunine.

preprint · 2022 · arXiv

Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection

Detecting mass in mammogram is significant due to the high occurrence and mortality of breast cancer. In mammogram mass detection, modeling pairwise lesion correspondence explicitly is particularly important. However, most of the existing methods build relatively coarse correspondence and have not utilized correspondence supervision. In this paper, we propose a new transformer-based framework CL-Net to learn lesion detection and pairwise correspondence in an end-to-end manner. In CL-Net, View-Interactive Lesion Detector is proposed to achieve dynamic interaction across candidates of cross views, while Lesion Linker employs the correspondence supervision to guide the interaction process more accurately. The combination of these two designs accomplishes precise understanding of pairwise lesion correspondence for mammograms. Experiments show that CL-Net yields state-of-the-art performance on the public DDSM dataset and our in-house dataset. Moreover, it outperforms previous methods by a large margin in low FPI regime.

preprint · 2022 · arXiv

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

E-commerce platforms usually display a mixed list of ads and organic items in feed. One key problem is to allocate the limited slots in the feed to maximize the overall revenue as well as improve user experience, which requires a good model for user preference. Instead of modeling the influence of individual items on user behaviors, the arrangement signal models the influence of the arrangement of items and may lead to a better allocation strategy. However, most previous strategies fail to model such a signal and therefore result in suboptimal performance. In addition, the percentage of ads exposed (PAE) is an important indicator in ads allocation. Excessive PAE hurts user experience while too low a PAE reduces platform revenue. Therefore, how to constrain the PAE within a certain range while keeping personalized recommendation under the PAE constraint is a challenge. In this paper, we propose Cross Deep Q Network (Cross DQN) to extract the crucial arrangement signal by crossing the embeddings of different items and modeling the crossed sequence by multi-channel attention. Besides, we propose an auxiliary loss for batch-level constraint on PAE to tackle the above-mentioned challenge. Our model results in higher revenue and better user experience than state-of-the-art baselines in offline experiments. Moreover, our model demonstrates a significant improvement in the online A/B test and has been fully deployed on Meituan feed to serve more than 300 million customers.

preprint · 2022 · arXiv

Deep Page-Level Interest Network in Reinforcement Learning for Ads Allocation

A mixed list of ads and organic items is usually displayed in feed and how to allocate the limited slots to maximize the overall revenue is a key problem. Meanwhile, modeling user preference with historical behavior is essential in recommendation and advertising (e.g., CTR prediction and ads allocation). Most previous works for user behavior modeling only model user's historical point-level positive feedback (i.e., click), which neglect the page-level information of feedback and other types of feedback. To this end, we propose Deep Page-level Interest Network (DPIN) to model the page-level user preference and exploit multiple types of feedback. Specifically, we introduce four different types of page-level feedback as input, and capture user preference for item arrangement under different receptive fields through the multi-channel interaction module. Through extensive offline and online experiments on Meituan food delivery platform, we demonstrate that DPIN can effectively model the page-level user preference and increase the revenue for the platform.

preprint2022arXiv

Efficient Localness Transformer for Smart Sensor-Based Energy Disaggregation

Modern smart sensor-based energy management systems leverage non-intrusive load monitoring (NILM) to predict and optimize appliance load distribution in real-time. NILM, or energy disaggregation, refers to the decomposition of electricity usage conditioned on the aggregated power signals (i.e., smart sensor on the main channel). Based on real-time appliance power prediction using sensory technology, energy disaggregation has great potential to increase electricity efficiency and reduce energy expenditure. With the introduction of transformer models, NILM has achieved significant improvements in predicting device power readings. Nevertheless, transformers are less efficient due to O(l^2) complexity w.r.t. sequence length l. Moreover, transformers can fail to capture local signal patterns in sequence-to-point settings due to the lack of inductive bias in local context. In this work, we propose an efficient localness transformer for non-intrusive load monitoring (ELTransformer). Specifically, we leverage normalization functions and switch the order of matrix multiplication to approximate self-attention and reduce computational complexity. Additionally, we introduce localness modeling with sparse local attention heads and relative position encodings to enhance the model capacity in extracting short-term local patterns. To the best of our knowledge, ELTransformer is the first NILM model that addresses computational complexity and localness modeling in NILM. With extensive experiments and quantitative analyses, we demonstrate the efficiency and effectiveness of the proposed ELTransformer with considerable improvements compared to state-of-the-art baselines.
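The idea of normalizing queries and keys separately and then switching the order of matrix multiplication can be illustrated with a minimal NumPy sketch. This is a generic linearized-attention illustration, not the ELTransformer implementation: the choice of softmax as the normalization function and the function names `linear_attention` and `quadratic_attention` are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quadratic_attention(q, k, v):
    # Reference form: builds the full (l, l) attention matrix first.
    q_n = softmax(q, axis=-1)  # normalize each query over its features
    k_n = softmax(k, axis=0)   # normalize each key feature over positions
    return (q_n @ k_n.T) @ v

def linear_attention(q, k, v):
    # Same result, but K^T V is computed first, so no (l, l) matrix is formed.
    # Cost grows linearly with sequence length l instead of quadratically.
    q_n = softmax(q, axis=-1)
    k_n = softmax(k, axis=0)
    context = k_n.T @ v        # shape (d, d_v)
    return q_n @ context       # shape (l, d_v)
```

Because matrix multiplication is associative, both functions return the same values; only the intermediate sizes differ, which is where the savings for long sequences come from.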

preprint2022arXiv

Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion

Recent research showed that an autoencoder trained with speech of a single speaker, called exemplar autoencoder (eAE), can be used for any-to-one voice conversion (VC). Compared to large-scale many-to-many models such as AutoVC, the eAE model is easy and fast in training, and may recover more details of the target speaker. To ensure VC quality, the latent code should represent and only represent content information. However, this is not easy to attain for eAE as it is unaware of any speaker variation in model training. To tackle the problem, we propose a simple yet effective approach based on a cycle consistency loss. Specifically, we train eAEs of multiple speakers with a shared encoder, and meanwhile encourage the speech reconstructed from any speaker-specific decoder to get a consistent latent code as the original speech when cycled back and encoded again. Experiments conducted on the AISHELL-3 corpus showed that this new approach improved the baseline eAE consistently. The source code and examples are available at the project page: http://project.cslt.org/.

preprint2022arXiv

Evolving Programmable Computational Metamaterials

Granular metamaterials are a promising choice for the realization of mechanical computing devices. As preliminary evidence of this, we demonstrate here how to embed Boolean logic gates (AND and XOR) into a granular metamaterial by evolving where particular grains are placed in the material. Our results confirm the existence of gradients of increasing "AND-ness" and "XOR-ness" within the space of possible materials that can be followed by evolutionary search. We measure the computational functionality of a material by probing how it transforms bits encoded as vibrations with zero or non-zero amplitude. We compared the evolution of materials built from mass-contrasting particles and materials built from stiffness-contrasting particles, and found that the latter were more evolvable. We believe this work may pave the way toward evolutionary design of increasingly sophisticated, programmable, and computationally dense metamaterials with certain advantages over more traditional computational substrates.

preprint2022arXiv

Generation of S-shaped photonic hooks from microcylinders with engineered surface patches

Photonic hooks (PHs) are non-evanescent light beams with highly concentrated curved optical fields. Since their discovery, PHs have always had a single inflection point and thus a hook-like structure. In this work, a new type of PH with two inflection points and an S-shaped structure (S-PH) is reported for the first time. We theoretically studied the effects of various physical parameters on the generation of S-PHs. Furthermore, we showed that decorating particles with multiple patches can significantly enhance the curvature and length of the S-PHs. The S-PHs may have potential applications in super-resolution imaging, sub-wavelength micromachining, particle and cell manipulation, etc.

preprint2022arXiv

Giving Back: Contributions Congruent to Library Dependency Changes in a Software Ecosystem

Popular adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libraries. In this work, we measure contributions that are aligned with dependency changes, to understand where they come from (i.e., non-maintainer, client maintainer, library maintainer, and library and client maintainer), analyze whether they contribute to library dormancy (i.e., a lack of activity), and investigate the similarities between these contributions and developers' typical contributions. Hence, we leverage socio-technical techniques to measure the dependency-contribution congruence (DC congruence), i.e., the degree to which contributions align with dependencies. We conduct a large-scale empirical study to measure the DC congruence for the NPM ecosystem using 1.7 million issues, 970 thousand pull requests (PRs), and over 5.3 million commits belonging to 107,242 NPM packages. At the ecosystem level, we pinpoint in time peaks of congruence with dependency changes (i.e., 16% DC congruence score). Surprisingly, these contributions came from the ecosystem itself (i.e., non-maintainers of either client or library). At the project level, we find that DC congruence shares a statistically significant relationship with the likelihood of a package becoming dormant. Finally, by comparing source code of contributions, we find that congruent contributions are statistically different from typical contributions. Our work has implications for encouraging and sustaining contributions, especially to support library maintainers that require dependency changes.

preprint2022arXiv

Gradient Importance Learning for Incomplete Observations

Though recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks such as classification. This is particularly true if the data have large missingness rates or a small sample size. More importantly, the imputation error could be propagated into the prediction step that follows, which may constrain the capabilities of the prediction model. In this work, we introduce the gradient importance learning (GIL) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train these models via back-propagation. This allows the model to exploit the underlying information behind missingness patterns. We test the approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.

preprint2022arXiv

Hybrid Transfer in Deep Reinforcement Learning for Ads Allocation

Ads allocation, which involves allocating ads and organic items to limited slots in the feed with the purpose of maximizing platform revenue, has become a research hotspot. Note that e-commerce platforms usually have multiple entrances for different categories, and some entrances have few visits. Data from these entrances has low coverage, which makes it difficult for the agent to learn. To address this challenge, we propose Similarity-based Hybrid Transfer for Ads Allocation (SHTAA), which effectively transfers samples as well as knowledge from a data-rich entrance to a data-poor entrance. Specifically, we define an uncertainty-aware similarity for MDP to estimate the similarity of MDP for different entrances. Based on this similarity, we design a hybrid transfer method, including instance transfer and strategy transfer, to efficiently transfer samples and knowledge from one entrance to another. Both offline and online experiments on the Meituan food delivery platform demonstrate that the proposed method could achieve better performance for the data-poor entrance and increase the revenue for the platform.

preprint2022arXiv

Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks

With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods is upgraded from point-wise single item to list-wise item arrangement. However, this also results in a high-dimensional space of state-action pairs, making it difficult to learn list-wise representations with good generalization ability. This further hinders the exploration of RL agents and causes poor sample efficiency. To address this problem, we propose a novel RL-based approach for ads allocation which learns better list-wise representations by leveraging task-specific signals on the Meituan food delivery platform. Specifically, we propose three different auxiliary tasks based on reconstruction, prediction, and contrastive learning respectively according to prior domain knowledge on ads allocation. We conduct extensive experiments on the Meituan food delivery platform to evaluate the effectiveness of the proposed auxiliary tasks. Both offline and online experimental results show that the proposed method can learn better list-wise representations and achieve higher revenue for the platform compared to the state-of-the-art baselines.

preprint2022arXiv

Neural Topic Modeling with Deep Mutual Information Estimation

Emerging neural topic models make topic modeling more easily adaptable and extendable in unsupervised text mining. However, existing neural topic models struggle to retain representative information of the documents within the learnt topic representation. In this paper, we propose a neural topic model which incorporates deep mutual information estimation, i.e., Neural Topic Modeling with Deep Mutual Information Estimation (NTM-DMIE). NTM-DMIE is a neural network method for topic learning which maximizes the mutual information between the input documents and their latent topic representation. To learn a robust topic representation, we incorporate a discriminator to distinguish negative examples from positive examples via adversarial learning. Moreover, we use both global and local mutual information to preserve the rich information of the input documents in the topic representation. We evaluate NTM-DMIE on several metrics, including accuracy of text clustering with the topic representation, topic uniqueness and topic coherence. The experimental results show that NTM-DMIE outperforms existing methods on all the metrics on the four datasets.

preprint2022arXiv

Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub

The ability of an Open Source Software (OSS) project to attract, onboard, and retain any newcomer is vital to its livelihood. Although evidence suggests an upsurge in novice developers joining social coding platforms (such as GitHub), the extent to which their activities result in an OSS contribution is unknown. Hence, we execute the protocols of a registered report to study the activities of a "Newcomer OSS-Candidate", a novice developer who is new to that social coding platform and intends to later onboard an OSS project. Using GitHub as a case platform, we analyze 171 identified Newcomer OSS-Candidates to characterize their contribution activities. Results show that Newcomer OSS-Candidates are likely to target software-based repositories (i.e., 66%), and their first contributions are mainly associated with development (commits) and maintenance (PRs). Newcomer OSS-Candidates are less likely to practice social coding, but eventually end up onboarding (i.e., 30% quantitative, 70% follow-up survey) an OSS project. Furthermore, they cite finding a way to start as the most challenging barrier to contributing. Our work reveals insights on how newcomers to social coding platforms are potential sources of OSS contributions.

preprint2022arXiv

Novel Valence Transition in Elemental Metal Europium around 80 GPa

Valence transitions can induce structural, insulator-metal, nonmagnetic-magnetic and superconducting transitions in rare-earth metals and compounds, while the underlying physics remains unclear due to the complex interaction of localized 4f electrons as well as their coupling with itinerant electrons. The valence transition in the elemental metal europium (Eu) remains a matter of debate. Using resonant x-ray emission scattering and x-ray diffraction, we pressurize the states of 4f electrons in Eu and study its valence and structure transitions up to 160 GPa. We provide compelling evidence for a valence transition around 80 GPa, which coincides with a structural transition from a monoclinic (C2/c) to an orthorhombic phase (Pnma). We show that the valence transition occurs when the pressure-dependent energy gap between 4f and 5d electrons approaches the Coulomb interaction. Our discovery is critical for understanding the electrodynamics of Eu, including magnetism and high-pressure superconductivity.

preprint2022arXiv

Overcoming Van der Waals Forces in reconfigurable nanostructures

Reconfigurable metamaterials require constituent nanostructures to demonstrate switching of shapes with external stimuli. For generality, such nanostructures would touch and stick to other surfaces in one of their configurations. Yet, a longstanding challenge is in overcoming this stiction caused by Van der Waals forces, which impedes shape recovery. Here, we introduce a stiff yet self-recovering material system based on acrylic acid, and test it in high-aspect-ratio structures, where recovery is weak. This designer material has a storage modulus of ~5.2 GPa at room temperature and ~90 MPa in the rubbery state at 150 °C, an order of magnitude higher than previous reports. A high-resolution resin for two-photon lithography was developed based on this polymer system, enabling 3D printing of nanopillars with diameters of ~400 nm and aspect ratios as high as ~10. Experimentally, we observed self-recovery as collapsed and touching structures overcome stiction to stand back up. We developed a theoretical model to explain the recoverability of these sub-micron structures. Reconfigurable structural colour prints and holograms were demonstrated, indicating potential applications of the material system as a shape memory polymer suitable for sub-micron reconfigurable metamaterials.

preprint2022arXiv

Pay Attention to Hard Trials

Performance of speaker recognition systems is evaluated on test trials. Although as crucial as rulers for tailors, trials have not been carefully treated so far, and most existing benchmarks compose trials by naive cross-pairing. In this paper, we argue that the cross-pairing approach produces an overwhelming number of easy trials, which in turn leads to potential bias in system and technique comparison. To solve the problem, we advocate more attention to hard trials. We present an SVM-based approach to identifying hard trials and use it to construct new evaluation sets for VoxCeleb1 and SITW. With the new sets, we can re-evaluate the contribution of some recent technologies. The code and the identified hard trials will be published online at http://project.cslt.org.

preprint2022arXiv

PointScatter: Point Set Representation for Tubular Structure Extraction

This paper explores the point set representation for tubular structure extraction tasks. Compared with the traditional mask representation, the point set representation offers greater flexibility and representational ability, as it is not restricted to a fixed grid like the mask. Inspired by this, we propose PointScatter, an alternative to the segmentation models for the tubular structure extraction task. PointScatter splits the image into scatter regions and predicts points for each scatter region in parallel. We further propose a greedy-based region-wise bipartite matching algorithm to train the network end-to-end and efficiently. We benchmark PointScatter on four public tubular datasets, and the extensive experiments on tubular structure segmentation and centerline extraction tasks demonstrate the effectiveness of our approach. Code is available at https://github.com/zhangzhao2022/pointscatter.

preprint2022arXiv

Probabilistic methods for approximate archetypal analysis

Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.

preprint2022arXiv

Reliable Visualization for Deep Speaker Recognition

In spite of the impressive success of convolutional neural networks (CNNs) in speaker recognition, our understanding of CNNs' internal functions is still limited. A major obstacle is that some popular visualization tools are difficult to apply, for example those producing saliency maps. The reason is that speaker information does not show clear spatial patterns in the temporal-frequency space, which makes it hard to interpret the visualization results, and hence hard to confirm the reliability of a visualization tool. In this paper, we conduct an extensive analysis of three popular visualization methods based on CAM: Grad-CAM, Score-CAM and Layer-CAM, to investigate their reliability for speaker recognition tasks. Experiments conducted on a state-of-the-art ResNet34SE model show that the Layer-CAM algorithm can produce reliable visualization, and thus can be used as a promising tool to explain CNN-based speaker models. The source code and examples are available on our project page: http://project.cslt.org/.

preprint2022arXiv

Rényi State Entropy for Exploration Acceleration in Reinforcement Learning

One of the most critical challenges in deep reinforcement learning is to maintain the long-term exploration capability of the agent. To tackle this problem, it has been recently proposed to provide intrinsic rewards for the agent to encourage exploration. However, most existing intrinsic reward-based methods proposed in the literature fail to provide sustainable exploration incentives, a problem known as vanishing rewards. In addition, these conventional methods incur complex models and additional memory in their learning procedures, resulting in high computational complexity and low robustness. In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards. It is shown that the proposed method actually generalizes the existing state entropy maximization methods. In particular, a $k$-nearest neighbor estimator is introduced for entropy estimation while a $k$-value search method is designed to guarantee the estimation accuracy. Extensive simulation results demonstrate that the proposed Rényi entropy-based method can achieve higher performance as compared to existing schemes.
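To make the notion of a $k$-nearest-neighbor based intrinsic reward concrete, here is a minimal NumPy sketch of a generic particle-based exploration bonus: each visited state is rewarded according to the distance to its $k$-th nearest neighbor in the batch, so states in sparsely visited regions receive larger bonuses. This is not the paper's Rényi entropy estimator or its $k$-value search; the function name `knn_intrinsic_reward`, the Euclidean metric, and the `log(1 + distance)` form are assumptions chosen for illustration.

```python
import numpy as np

def knn_intrinsic_reward(states, k=3):
    # states: array of shape (n, d) with n > k visited states (or embeddings).
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)   # pairwise distances, shape (n, n)
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    # log1p keeps the bonus non-negative and dampens very large distances.
    return np.log1p(kth)
```

In a training loop, such a bonus would typically be scaled and added to the environment reward; the scaling schedule is left out here because the abstract does not specify one.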

preprint2022arXiv

Some Examples of Privacy-preserving Publication and Sharing of COVID-19 Pandemic Data

A considerable amount of various types of data have been collected during the COVID-19 pandemic, the analysis and interpretation of which have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be rich sources for further studying and understanding the impacts of the pandemic on various aspects of our society. On the other hand, naïve release and sharing of the information can be associated with serious privacy concerns. In this study, we use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches in real-life data. All the approaches employed in the study are straightforward to apply. Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance the statistical utility of released information during this process.
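As a small illustration of what a differential privacy guarantee looks like in practice, the sketch below applies the textbook Laplace mechanism to a single count, such as the number of cases in one cell of a surveillance table. It is not taken from the study, which may use different mechanisms for each data type; the function name `laplace_count` and the assumption that one individual changes the count by at most 1 (sensitivity 1) are illustrative.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    # Assumption: adding or removing one individual changes the count by at most 1.
    sensitivity = 1.0
    # Laplace noise with scale sensitivity / epsilon gives epsilon-differential privacy
    # for this single query; smaller epsilon means stronger privacy and more noise.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
released = laplace_count(120, epsilon=1.0, rng=rng)  # a noisy version of the count 120
```

The trade-off mentioned in the abstract shows up directly in the scale parameter: halving epsilon doubles the typical noise magnitude, which lowers the inferential utility of the released count.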

preprint2022arXiv

Topmetal-M: a novel pixel sensor for compact tracking applications

The Topmetal-M is a large-area pixel sensor (18 mm × 23 mm) prototype fabricated in a new 130 nm high-resistivity CMOS process in 2019. It contains 400 rows × 512 columns of square pixels with a pitch of 40 μm. In Topmetal-M, a novel charge collection method combining the Monolithic Active Pixel Sensor (MAPS) and the Topmetal sensor has been proposed for the first time. Both the ionized charge deposited by the particle in the sensor and along the track over the sensor can be collected. The in-pixel circuit mainly consists of a low-noise charge sensitive amplifier to establish the signal for the energy reconstruction, and a discriminator with a Time-to-Amplitude Converter (TAC) for the Time of Arrival (TOA) measurement. With this mechanism, the trajectory, particle hit position, energy and arrival time of the particle can be measured. The analog signal from each pixel is accessible through time-shared multiplexing over the entire pixel array. This paper will discuss the design and preliminary test results of the Topmetal-M sensor.

preprint2022arXiv

Towards Grand Unification of Object Tracking

We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Due to the fragmented definitions of the object tracking problem itself, most existing trackers are developed to address a single task or a subset of tasks and overspecialize on the characteristics of specific tasks. By contrast, Unicorn provides a unified solution, adopting the same input, backbone, embedding, and head across all tracking tasks. For the first time, we accomplish the great unification of the tracking network architecture and learning paradigm. Unicorn performs on par with or better than its task-specific counterparts on 8 tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS. We believe that Unicorn will serve as a solid step towards the general vision model. Code is available at https://github.com/MasterBin-IIAU/Unicorn.

preprint2022arXiv

Ultra-stable shear jammed granular material

Dry granular materials such as sand, gravel, pills, or agricultural grains, can become rigid when compressed or sheared. At low density, one can distort the shape of a container of granular material without encountering any resistance. Under isotropic compression, the material will reach a certain jamming density and then resist further compression. Shear jamming occurs when resistance to shear emerges in a system at a density lower than the jamming density, and the elastic properties of such states have important implications for industrial and geophysical processes. We report on experimental observations of changes in the mechanical properties of a shear-jammed granular material subjected to small-amplitude, quasi-static cyclic shear. We study a layer of plastic discs confined to a shear cell, using photoelasticimetry to measure all inter-particle vector forces. For sufficiently small cyclic shear amplitudes and large enough initial shear, the material evolves to an unexpected "ultra-stable" state in which all the particle positions and inter-particle contact forces remain unchanged after each complete shear cycle for thousands of cycles. The stress response of these states to small imposed shear is nearly elastic, in contrast to the original shear jammed state.

preprint2022arXiv

Universality for random matrices with equi-spaced external source: a case study of a biorthogonal ensemble

We prove the edge and bulk universality of random Hermitian matrices with equi-spaced external source. One feature of our method is that we use neither a Christoffel-Darboux type formula, nor a double-contour formula, which are standard methods to prove universality results for exactly solvable models. This matrix model is an example of a biorthogonal ensemble, which is a special kind of determinantal point process whose kernel generally does not have a Christoffel-Darboux type formula or double-contour representation. Our methods may showcase how to handle universality problems for biorthogonal ensembles in general.

preprint2022arXiv

Variational methods and deep Ritz method for active elastic solids

Variational methods have been widely used in soft matter physics for both static and dynamic problems. These methods are mostly based on two variational principles: the variational principle of minimum free energy (MFEVP) and Onsager's variational principle (OVP). Our interests lie in the applications of these variational methods to active matter physics. In our previous work [Soft Matter, 2021, 17, 3634], we explored the applications of OVP-based variational methods for the modeling of active matter dynamics. In the present work, we explore variational (or energy) methods that are based on MFEVP for static problems in active elastic solids. We show that MFEVP can be used not only to derive equilibrium equations, but also to develop approximate solution methods, such as the Ritz method, for active solid statics. Moreover, the power of Ritz-type methods can be further enhanced using deep learning methods if we use deep neural networks to construct the trial solutions of the variational problems. We then apply these variational methods and the deep Ritz method to study the spontaneous bending and contraction of a thin active circular plate that is induced by internal asymmetric active contraction. The circular plate is found to be bent towards its contracting side. The study of such a simple toy system has implications for understanding the morphogenesis of solid-like confluent cell monolayers. In addition, we introduce a so-called activogravity length to characterize the importance of gravitational forces relative to internal active contraction in driving the bending of the active plate. When the lateral plate dimension is larger than the activogravity length (about 100 microns), gravitational forces become important. Such gravitaxis behaviors at multicellular scales may play significant roles in morphogenesis and in up-down symmetry breaking during tissue development.

preprint2022arXiv

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

With the popularity of multi-modal sensors, visible-thermal (RGB-T) object tracking aims to achieve robust performance and wider application scenarios with the guidance of objects' temperature information. However, the lack of paired training samples is the main bottleneck for unlocking the power of RGB-T tracking. Since it is laborious to collect high-quality RGB-T sequences, recent benchmarks only provide test sequences. In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV), including 500 sequences with 1.7 million high-resolution (1920 $\times$ 1080 pixels) frame pairs. In addition, comprehensive applications (short-term tracking, long-term tracking and segmentation mask prediction) with diverse categories and scenes are considered for exhaustive evaluation. Moreover, we provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers. In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at various levels. Numerous experiments on several datasets are conducted to reveal the effectiveness of HMFT and the complementarity of different fusion types. The project is available here.

preprint2022arXiv

Vision-based Anti-UAV Detection and Tracking

Unmanned aerial vehicles (UAVs) have been widely used in various fields, and their invasion of security and privacy has aroused social concern. Several detection and tracking systems for UAVs have been introduced in recent years, but most of them are based on radio frequency, radar, and other media. We assume that the field of computer vision is mature enough to detect and track invading UAVs. Thus we propose a visible light mode dataset called Dalian University of Technology Anti-UAV dataset, DUT Anti-UAV for short. It contains a detection dataset with a total of 10,000 images and a tracking dataset with 20 videos that include short-term and long-term sequences. All frames and images are manually annotated precisely. We use this dataset to train several existing detection algorithms and evaluate the algorithms' performance. Several tracking methods are also tested on our tracking dataset. Furthermore, we propose a clear and simple tracking algorithm combined with detection that inherits the detector's high precision. Extensive experiments show that the tracking performance is improved considerably after fusing detection, thus providing a new attempt at UAV tracking using our dataset. The datasets and results are publicly available at: https://github.com/wangdongdut/DUT-Anti-UAV

preprint2021arXiv

A Dataset And Benchmark Of Underwater Object Detection For Robot Picking

Underwater object detection for robot picking has attracted a lot of interest. However, it is still an unsolved problem due to several challenges. We take steps towards making it more realistic by addressing the following challenges. Firstly, the currently available datasets generally lack test set annotations, so researchers must compare their methods with other SOTA methods on a self-divided test set (taken from the training set). Retraining the other methods increases the workload, and different researchers divide the datasets differently, so there is no unified benchmark for comparing the performance of different algorithms. Secondly, these datasets also have other shortcomings, e.g., too many similar images or incomplete labels. To address these challenges, we introduce a dataset, Detecting Underwater Objects (DUO), and a corresponding benchmark, based on the collection and re-annotation of all relevant datasets. DUO contains a collection of diverse underwater images with more rational annotations. The corresponding benchmark provides indicators of both efficiency and accuracy of SOTAs (under the MMDetection framework) for academic research and industrial applications, where JETSON AGX XAVIER is used to assess detector speed to simulate the robot-embedded environment.

preprint2021arXiv

A vector Riemann-Hilbert approach to the Muttalib-Borodin ensembles

In this paper, we consider the Muttalib-Borodin ensemble of Laguerre type, a determinantal point process over $[0,\infty)$ which depends on the varying weights $x^αe^{-nV(x)}$, $α>-1$, and a parameter $θ$. For $θ$ a positive integer, we derive asymptotics of the associated biorthogonal polynomials near the origin for a large class of potential functions $V$ as $n\to \infty$. This further allows us to establish the hard edge scaling limit of the correlation kernel, which was previously known only in special cases and conjectured to be universal. Our proof is based on the Deift/Zhou nonlinear steepest descent analysis of two $1 \times 2$ vector-valued Riemann-Hilbert problems that characterize the biorthogonal polynomials and the explicit constructions of $(θ+1)\times(θ+1)$ local parametrices near the origin in terms of the Meijer G-functions.

preprint2021arXiv

Holographic insulator/superconductor phase transitions with excited states

We construct a family of solutions of the holographic insulator/superconductor phase transitions with the excited states in the AdS soliton background by using both the numerical and analytical methods. The interesting point is that the improved Sturm-Liouville method can not only analytically investigate the properties of the phase transition with the excited states, but also the distributions of the condensed fields in the vicinity of the critical point. We observe that, regardless of the type of the holographic model, the excited state has a higher critical chemical potential than the corresponding ground state, and the difference of the dimensionless critical chemical potential between the consecutive states is around 2.4, which is different from the finding of the metal/superconductor phase transition in the AdS black hole background. Furthermore, near the critical point, we find that the phase transition of the systems is of the second order and a linear relationship exists between the charge density and chemical potential for all the excited states in both s-wave and p-wave insulator/superconductor models.

preprint2021arXiv

Holographic superconductors in 4D Einstein-Gauss-Bonnet gravity

We investigate the neutral AdS black-hole solution in the consistent $D\rightarrow4$ Einstein-Gauss-Bonnet gravity proposed in [K. Aoki, M.A. Gorji, and S. Mukohyama, Phys. Lett. B {\bf 810}, 135843 (2020)] and construct the gravity duals of ($2+1$)-dimensional superconductors with Gauss-Bonnet corrections in the probe limit. We find that the curvature correction has a more subtle effect on the scalar condensates in the s-wave superconductor in ($2+1$)-dimensions, which is different from the finding in the higher-dimensional superconductors that the higher curvature correction makes the scalar hair more difficult to be developed in the full parameter space. However, in the p-wave case, we observe that the higher curvature correction always makes it harder for the vector condensates to form in various dimensions. Moreover, we note that the higher curvature correction results in the larger deviation from the expected relation in the gap frequency $ω_g/T_c\approx 8$ in both ($2+1$)-dimensional s-wave and p-wave models.

preprint2021arXiv

Necessary and sufficient criterion of steering for two-qubit T states

Einstein-Podolsky-Rosen (EPR) steering is the ability of an observer to persuade a distant observer that they share entanglement by making local measurements. Determining whether a quantum state is steerable or unsteerable remains an open problem. Here, we derive a new steering inequality with infinite measurements corresponding to an arbitrary two-qubit T state, from consideration of EPR steering inequalities with N projective measurement settings for each side. In fact, the steering inequality is also a sufficient criterion for guaranteeing that the T state is unsteerable. Hence, the steering inequality can be viewed as a necessary and sufficient criterion to distinguish whether the T state is steerable or unsteerable. In order to show that the set of steerable states is a strict subset of the set of entangled states, we prove theoretically that no separable T state can violate the steering inequality. Moreover, we put forward a method to estimate the maximum violation from concurrence for arbitrary two-qubit T states, which indicates that the T state is steerable if its concurrence exceeds 1/4.

preprint2021arXiv

Reconfigurable-intelligent-surface-assisted Downlink Transmission Design via Bayesian Optimization

This paper investigates the transmission design in the reconfigurable-intelligent-surface (RIS)-assisted downlink system. The channel state information (CSI) is usually difficult to be estimated at the base station (BS) when the RIS is not equipped with radio frequency chains. In this paper, we propose a downlink transmission framework with unknown CSI via Bayesian optimization. Since the CSI is not available at the BS, we treat the unknown objective function as the black-box function and take the beamformer, the phase shift, and the receiving filter as the input. Then the objective function is decomposed as the sum of low-dimension subfunctions to reduce the complexity. By re-expressing the power constraint of the BS in spherical coordinates, the original constraint problem is converted into an equivalent unconstrained problem. The users estimate the sum MSE of the training symbols as the objective value and feed it back to the BS. We assume a Gaussian prior of the feedback samples and the next query point is updated by minimizing the constructed acquisition function. Furthermore, this framework can also be applied to the power transfer system and fairness problems. Simulation results validate the effectiveness of the proposed transmission scheme in the downlink data transmission and power transfer.
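The spherical-coordinate step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a real-valued vector and an equality power constraint (squared norm equal to the budget); the function names and the use of NumPy are illustrative choices, not taken from the paper. Any set of unconstrained angles maps to a vector that satisfies the constraint by construction, so an optimizer can search over the angles freely.

```python
import numpy as np

def spherical_to_unit_vector(angles):
    """Map n-1 unconstrained angles to a unit-norm vector in R^n."""
    angles = np.asarray(angles, dtype=float)
    x = np.ones(angles.size + 1)
    for i, phi in enumerate(angles):
        x[i] *= np.cos(phi)       # coordinate i ends with a cosine factor
        x[i + 1:] *= np.sin(phi)  # all later coordinates pick up a sine factor
    return x

def power_constrained_vector(angles, total_power):
    """Return a vector whose squared norm equals total_power (assumed equality constraint)."""
    return np.sqrt(total_power) * spherical_to_unit_vector(angles)

# Hypothetical usage: three free angles give a 4-dimensional vector with power 4.0.
w = power_constrained_vector([0.3, -1.2, 2.5], 4.0)
```

A complex beamformer could be handled in the same way by stacking its real and imaginary parts into one real vector, which is again an assumption of this sketch.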

preprint2021arXiv

Robust stimulated Raman shortcut-to-adiabatic passage by invariant-based optimal control

Stimulated Raman adiabatic passage (STIRAP) is an efficient technique that accurately transfers population between two discrete quantum states with the same parity in three-level quantum systems, based on adiabatic evolution. This technique has wide theoretical and experimental applications in many fields of physics, chemistry, and beyond. Here, we present a generally robust approach to speed up STIRAP with invariant-based shortcuts to adiabaticity. By controlling the dynamical process, we inversely design a family of Hamiltonians that can realize fast and accurate population transfer from the first to the third level, while the systematic error is largely suppressed in general. Furthermore, a detailed trade-off relation between the population of the intermediate state and the amplitudes of Rabi frequencies in the transfer process is illustrated. These results provide an optimal route toward manipulating the evolution of three-level quantum systems in future quantum information processing.

preprint2021arXiv

Self-Amplification of Coherent Energy Modulation in Seeded Free-Electron Lasers

The spectroscopic techniques for time-resolved fine analysis of matter require coherent X-ray radiation with femtosecond duration and high average brightness. Seeded free-electron lasers (FELs), which use the frequency up-conversion of an external seed laser to improve temporal coherence, are ideal for providing fully coherent soft X-ray pulses. However, it is difficult to operate seeded FELs at a high repetition rate due to the limitations of present state-of-the-art laser systems. Here, we report the novel self-modulation method for enhancing laser-induced energy modulation, thereby significantly reducing the requirement of an external laser system. Driven by this scheme, we experimentally realize high harmonic generation in a seeded FEL using an unprecedentedly small energy modulation. An electron beam with a laser-induced energy modulation as small as 1.8 times the slice energy spread is used for lasing at the 7th harmonic of a 266-nm seed laser in a single-stage high-gain harmonic generation (HGHG) setup and the 30th harmonic of the seed laser in a two-stage HGHG setup. The results mark a major step towards a high-repetition-rate, fully coherent X-ray FEL.

preprint2020arXiv

A deep learning approach to multi-track location and orientation in gaseous drift chambers

Accurately measuring the location and orientation of individual particles in a beam monitoring system is of particular interest to researchers in multiple disciplines. Among feasible methods, gaseous drift chambers with hybrid pixel sensors have great potential to realize long-term stable measurement with considerable precision. In this paper, we introduce deep learning to analyze patterns in the beam projection image to facilitate three-dimensional reconstruction of particle tracks. We propose an end-to-end neural network based on segmentation and fitting for feature extraction and regression. Two segmentation branches, named binary segmentation and semantic segmentation, perform initial track determination and pixel-track association. Then pixels are assigned to multiple tracks, and a weighted least squares fitting is implemented with full back-propagation. In addition, we introduce a center-angle measure to judge the precision of location and orientation by combining two separate factors. The initial position resolution achieves 8.8 $μm$ for the single track and 11.4 $μm$ (15.2 $μm$) for the 1-3 tracks (1-5 tracks), and the angle resolution achieves 0.15$^{\circ}$ and 0.21$^{\circ}$ (0.29$^{\circ}$) respectively. These results show a significant improvement in accuracy and multi-track compatibility compared to traditional methods.
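The weighted least squares fitting step named in the abstract above can be sketched for a single straight track. This is a generic closed-form fit, not the paper's network layer: the function name is hypothetical, and the weights are assumed to be per-pixel association scores for one track. Because the solution is a closed-form linear solve, the same expression is differentiable, which is what makes back-propagation through the fit possible in an automatic differentiation framework.

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Fit y = slope * x + intercept by weighted least squares.

    x, y: pixel coordinates; w: non-negative per-pixel weights
    (assumed here to be pixel-to-track association scores).
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
    # Weighted normal equations: (A^T W A) beta = A^T W y, with W = diag(w)
    lhs = A.T @ (w[:, None] * A)
    rhs = A.T @ (w * y)
    slope, intercept = np.linalg.solve(lhs, rhs)
    return slope, intercept

# Hypothetical usage: pixels lying exactly on y = 2x + 1, with uneven weights.
slope, intercept = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 7], [0.9, 0.1, 0.5, 0.7])
```

The track angle follows from the slope, for example as `np.arctan(slope)`.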

preprint2020arXiv

An efficient iterative method for reconstructing surface from point clouds

Surface reconstruction from point clouds is a fundamental step in many applications in computer vision. In this paper, we develop an efficient iterative method on a variational model for the surface reconstruction from point clouds. The surface is implicitly represented by indicator functions and the energy functional is then approximated based on such representations using heat kernel convolutions. We then develop a novel iterative method to minimize the approximate energy and prove the energy decaying property during each iteration. We then use asymptotic expansion to give a connection between the proposed algorithm and active contour models. Extensive numerical experiments are performed in both 2- and 3- dimensional Euclidean spaces to show that the proposed method is simple, efficient, and accurate.

preprint2020arXiv

ASR-Free Pronunciation Assessment

Most of the pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even if we have no knowledge of the language (so cannot recognize the phones/words), we can still tell how good a pronunciation is, by comparatively listening to some speech data from the target language. Our analysis shows that this new scoring approach provides an interesting correction for the phone-competition problem of GOP. Experimental results on the ERJ dataset demonstrated that combining the ASR-free score and GOP can achieve better performance than the GOP baseline.

preprint2020arXiv

Backreacting holographic superconductors from the coupling of a scalar field to the Einstein tensor

We investigate the properties of the backreacting holographic superconductors from the coupling of a scalar field to the Einstein tensor in the background of a d-dimensional AdS black hole. Imposing the Dirichlet boundary condition of the trial function without the Neumann boundary conditions, we improve the analytical Sturm-Liouville method with an iterative procedure to explore the pure effect of the Einstein tensor on the holographic superconductors and find that the Einstein tensor hinders the condensate of the scalar field but does not affect the critical phenomena. Our analytical findings are in very good agreement with the numerical results from the "marginally stable modes" method, which implies that the Sturm-Liouville method is still powerful to study the holographic superconductors from the coupling of a scalar field to the Einstein tensor even if we consider the backreactions.

preprint2020arXiv

Consistency of archetypal analysis

Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed $k$, the method finds a convex polytope with $k$ vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this paper, we prove a consistency result that shows if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, of which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data is independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments of the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.
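The optimization problem described in the abstract above can be written out as follows. The notation ($x_i$ for the data, $a_j$ for the archetype points, $\operatorname{conv}$ for the convex hull) is chosen here for illustration and may differ from the paper's.

```latex
% Data x_1, ..., x_n in R^d; archetype points a_1, ..., a_k for fixed k.
\min_{a_1,\dots,a_k \,\in\, \operatorname{conv}\{x_1,\dots,x_n\}}
  \; \frac{1}{n} \sum_{i=1}^{n}
  d^2\bigl(x_i,\ \operatorname{conv}\{a_1,\dots,a_k\}\bigr),
\qquad
d^2(x, C) = \min_{y \in C} \lVert x - y \rVert^2 .
```

The constraint keeps the polytope inside the convex hull of the data, and the objective is the mean squared distance between the data and the polytope.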

preprint2020arXiv

Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises

Adversarial attacks on CNNs aim to deceive models into misbehaving by adding imperceptible perturbations to images. Studying them helps in understanding neural networks more deeply and in improving the robustness of deep learning models. Although several works have focused on attacking image classifiers and object detectors, an effective and efficient method for attacking single object trackers of any target in a model-free way remains lacking. In this paper, a cooling-shrinking attack method is proposed to deceive state-of-the-art SiameseRPN-based trackers. An effective and efficient perturbation generator is trained with a carefully designed adversarial loss, which can simultaneously cool hot regions where the target exists on the heatmaps and force the predicted bounding box to shrink, making the tracked target invisible to trackers. Numerous experiments on OTB100, VOT2018, and LaSOT datasets show that our method can effectively fool the state-of-the-art SiameseRPN++ tracker by adding small perturbations to the template or the search regions. Besides, our method has good transferability and is able to deceive other top-performance trackers such as DaSiamRPN, DaSiamRPN-UpdateNet, and DiMP. The source codes are available at https://github.com/MasterBin-IIAU/CSA.

preprint2020arXiv

CovidSens: A Vision on Reliable Social Sensing for COVID-19

With the spiraling pandemic of the Coronavirus Disease 2019 (COVID-19), it has become inherently important to disseminate accurate and timely information about the disease. Due to the ubiquity of Internet connectivity and smart devices, social sensing is emerging as a dynamic AI-driven sensing paradigm to extract real-time observations from online users. In this paper, we propose CovidSens, a vision of social sensing based risk alert systems to spontaneously obtain and analyze social data to infer COVID-19 propagation. CovidSens can actively help to keep the general public informed about the COVID-19 spread and identify risk-prone areas. The CovidSens concept is motivated by three observations: 1) people actively share their experience of COVID-19 via online social media, 2) official warning channels and news agencies are relatively slower than people reporting on social media, and 3) online users are frequently equipped with powerful mobile devices that can perform data processing and analytics. We envision unprecedented opportunities to leverage posts generated by ordinary people to build real-time sensing and analytic systems for gathering and circulating COVID-19 propagation data. Specifically, the vision of CovidSens attempts to answer the questions: How to distill reliable information on COVID-19 with prevailing rumors and misinformation? How to inform the general public about the state of the spread in a timely and effective manner? How to leverage the computational power on edge devices to construct fully integrated edge-based social sensing platforms? In this vision paper, we discuss the roles of CovidSens and identify potential challenges in developing reliable social sensing based risk alert systems. We envision that approaches originating from multiple disciplines can be effective in addressing the challenges. Finally, we outline a few research directions for future work in CovidSens.

preprint2020arXiv

Curriculum Audiovisual Learning

Associating a sound with its producer in complex audiovisual scenes is a challenging task, especially when we lack annotated training data. In this paper, we present a flexible audiovisual model that introduces a soft-clustering module as the audio and visual content detector, and regards the pervasive property of audiovisual concurrency as the latent supervision for inferring the correlation among detected contents. To ease the difficulty of audiovisual learning, we propose a novel curriculum learning strategy that trains the model from simple to complex scenes. We show that such an ordered learning procedure gives the model the benefits of easy training and fast convergence. Meanwhile, our audiovisual model can also provide effective unimodal representation and cross-modal alignment performance. We further deploy the well-trained model in practical audiovisual sound localization and separation tasks. We show that our localization model significantly outperforms existing methods, based on which we show comparable performance in sound separation without referring to external visual supervision. Our video demo can be found at https://youtu.be/kuClfGG0cFU.

preprint2020arXiv

DASC: Towards A Road Damage-Aware Social-Media-Driven Car Sensing Framework for Disaster Response Applications

While vehicular sensor networks (VSNs) have earned the stature of a mobile sensing paradigm utilizing sensors built into cars, they have limited sensing scopes since car drivers only opportunistically discover new events. Conversely, social sensing is emerging as a new sensing paradigm where measurements about the physical world are collected from humans. In contrast to VSNs, social sensing is more pervasive, but one of its key limitations lies in its inconsistent reliability stemming from the data contributed by unreliable human sensors. In this paper, we present DASC, a road Damage-Aware Social-media-driven Car sensing framework that exploits the collective power of social sensing and VSNs for reliable disaster response applications. However, integrating VSNs with social sensing introduces a new set of challenges: i) How to leverage noisy and unreliable social signals to route the vehicles to accurate regions of interest? ii) How to tackle the inconsistent availability (e.g., churns) caused by car drivers being rational actors? iii) How to efficiently guide the cars to the event locations with little prior knowledge of the road damage caused by the disaster, while also handling the dynamics of the physical world and social media? The DASC framework addresses the above challenges by establishing a novel hybrid social-car sensing system that employs techniques from game theory, feedback control, and Markov Decision Process (MDP). In particular, DASC distills signals emitted from social media and discovers the road damages to effectively drive cars to target areas for verifying emergency events. We implement and evaluate DASC in a reputed vehicle simulator that can emulate real-world disaster response scenarios. The results of a real-world application demonstrate the superiority of DASC over current VSNs-based solutions in detection accuracy and efficiency.

preprint2020arXiv

Development of readout electronics a novel beam monitoring system for ion research facility accelerator

This article presents the readout electronics of a novel beam monitoring system for ion research facility accelerator. The readout electronics are divided into a Front-end Card (FEC) and a Readout Control Unit (RCU). The FEC uses Topmetal II minus to process the energy of the hitting particles and convert it into a voltage signal. The main function of the RCU is to digitize the analog output signal of the FEC and format the raw data. The RCU also processes the control commands from the host and distributes the commands according to the mapping. The readout electronics have been characterized and calibrated in the laboratory and installed with the detector. The implementation and testing of the readout electronics are discussed.

preprint2020arXiv

Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

Domain generalization remains a critical problem for speaker recognition, even with the state-of-the-art architectures based on deep neural nets. For example, a model trained on reading speech may largely fail when applied to scenarios of singing or movie. In this paper, we propose a domain-invariant projection to improve the generalizability of speaker vectors. This projection is a simple neural net and is trained following the Model-Agnostic Meta-Learning (MAML) principle, for which the objective is to classify speakers in one domain if it had been updated with speech data in another domain. We tested the proposed method on CNCeleb, a new dataset consisting of single-speaker multi-condition (SSMC) data. The results demonstrated that the MAML-based domain-invariant projection can produce more generalizable speaker vectors, and effectively improve the performance in unseen domains.

preprint2020arXiv

Experimental demonstration of complementarity relations between quantum steering criteria

The ability of one system to immediately affect another through local measurements is regarded as quantum steering, which can be detected by various steering criteria. Recently, Mondal et al. [Phys. Rev. A 98, 052330 (2018)] derived the complementarity relations of coherence steering criteria, and revealed that the quantum steering of a system can be observed through the average coherence of a subsystem. Here, we experimentally verify the complementarity relations between quantum steering criteria by employing two-photon Bell-like states and three Pauli operators. The results demonstrate that if a prepared quantum state violates the two-setting coherence steering criterion and turns out to be steerable, then it cannot violate the complementary-settings criterion. The three-measurement-settings inequality, which establishes a complementarity relation between these two coherence steering criteria, always holds in experiment. Besides, we experimentally certify that the strengths of coherence steering criteria depend on the choice of coherence measure. In comparison with two-setting coherence steering criteria based on the l1 norm of coherence and the relative entropy of coherence, our experimental results show that the steering criterion based on skew information of coherence is stronger in detecting the steerability of quantum states. Thus, our experimental demonstrations can deepen the understanding of the relation between quantum steering and quantum coherence.

preprint2020arXiv

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area highly relies on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.

preprint2020arXiv

Graph Representation Learning for Merchant Incentive Optimization in Mobile Payment Marketing

Mobile payment such as Alipay has been widely used in our daily lives. To further promote mobile payment activities, it is important to run marketing campaigns under a limited budget by providing incentives such as coupons and commissions to merchants. As a result, incentive optimization is the key to maximizing the commercial objective of the marketing campaign. Through the analysis of online experiments, we found that the transaction network can subtly describe the similarity of merchants' responses to different incentives, which is of great use in the incentive optimization problem. In this paper, we present a graph representation learning method on top of transaction networks for merchant incentive optimization in mobile payment marketing. With limited samples collected from online experiments, our end-to-end method first learns merchant representations based on an attributed transaction network, then effectively models the correlations between the commercial objectives each merchant may achieve and the incentives under varying treatments. Thus we are able to model the sensitivity to incentives for each merchant, and spend most of the budget on those merchants that show strong sensitivity in the marketing campaign. Extensive offline and online experimental results at Alipay demonstrate the effectiveness of our proposed approach.

preprint2020arXiv

High-Performance Long-Term Tracking with Meta-Updater

Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt the offline-trained Siamese architectures, thus, they cannot benefit from great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, due to long-term uncertain and noisy observations. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker's update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU.

preprint2020arXiv

Improve bone age assessment by learning from anatomical local regions

Skeletal bone age assessment (BAA), as an essential imaging examination, aims at evaluating the biological and structural maturation of human bones. In clinical practice, the Tanner and Whitehouse (TW2) method is widely used by radiologists to perform BAA. The TW2 method splits the hand into Regions of Interest (ROIs) and analyzes each anatomical ROI separately to estimate the bone age. Because it analyzes local information, the TW2 method shows accurate results in practice. Following the spirit of TW2, we propose a novel model called Anatomical Local-Aware Network (ALA-Net) for automatic bone age assessment. In ALA-Net, an anatomical local extraction module is introduced to learn the hand structure and extract local information. Moreover, we design an anatomical patch training strategy to provide extra regularization during the training process. Our model can detect the anatomical ROIs and estimate bone age jointly in an end-to-end manner. The experimental results show that our ALA-Net achieves a new state-of-the-art single model performance of 3.91 mean absolute error (MAE) on the publicly available RSNA dataset. Since the design of our model is consistent with the well-recognized TW2 method, it is interpretable and reliable for clinical usage.

preprint2020arXiv

Improved tripartite uncertainty relation with quantum memory

The uncertainty principle is a striking and fundamental feature of quantum mechanics that distinguishes it from classical mechanics. It offers an important lower bound for predicting the outcomes of two arbitrary incompatible observables measured on a particle. In quantum information theory, this uncertainty principle is popularly formulated in terms of entropy. Here, we present an improvement of the tripartite quantum-memory-assisted entropic uncertainty relation. The uncertainty's lower bound is derived by considering mutual information and the Holevo quantity. It shows that the bound derived by this method will be tighter than the lower bound in [Phys. Rev. Lett. 103, 020402 (2009)]. Furthermore, regarding a pair of mutually unbiased bases as the incompatibility, our bound will become extremely tight for the three-qubit $\emph{X}$-state system, completely coinciding with the entropy-based uncertainty, and can restore Renes ${\emph{et al.}}$'s bound with respect to arbitrary tripartite pure states. In addition, by applying our lower bound, one can attain a tighter bound on the quantum secret key rate, which is of basic importance to enhance the security of quantum key distribution protocols.

preprint2020arXiv

Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking

In this study, we propose a novel RGB-T tracking framework by jointly modeling both appearance and motion cues. First, to obtain a robust appearance model, we develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities. The fusion weights are determined by using offline-trained global and local multimodal fusion networks, and then adopted to linearly combine the response maps of RGB and T modalities. Second, when the appearance cue is unreliable, we comprehensively take motion cues, i.e., target and camera motions, into account to make the tracker robust. We further propose a tracker switcher to switch the appearance and motion trackers flexibly. Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
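The late-fusion step described in the abstract above, a linear combination of the RGB and thermal response maps using per-pixel weight maps, can be sketched as follows. In the paper the weight maps come from offline-trained fusion networks; here they are plain arrays supplied by the caller, and the function names and the normalization by the weight sum are assumptions of this sketch.

```python
import numpy as np

def fuse_response_maps(resp_rgb, resp_t, weight_rgb, weight_t, eps=1e-8):
    """Linearly combine two response maps with per-pixel weight maps.

    All inputs are arrays of the same shape. Dividing by the weight sum
    (an assumption of this sketch) keeps the fused map on a comparable scale.
    """
    total = weight_rgb + weight_t + eps
    return (weight_rgb * resp_rgb + weight_t * resp_t) / total

def locate_peak(response):
    """Return the (row, col) index of the maximum response."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

# Hypothetical usage: the two modalities disagree, and the thermal weights dominate.
resp_rgb = np.zeros((4, 4)); resp_rgb[1, 2] = 1.0
resp_t = np.zeros((4, 4)); resp_t[3, 0] = 1.0
fused = fuse_response_maps(resp_rgb, resp_t, np.full((4, 4), 0.2), np.full((4, 4), 0.8))
peak = locate_peak(fused)
```

With these inputs the fused peak follows the thermal map, since its weights are larger everywhere.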

preprint2020arXiv

Metal-free magnetism in chemically doped covalent organic frameworks

Organic and molecule-based magnets are not easily attainable, because introducing stable paramagnetic centers into purely organic systems is challenging. Crystalline covalent organic frameworks (COFs), with their high designability and chemical diversity, constitute ideal platforms for accessing intriguing magnetic phenomena in organic materials. In this work, we propose a general approach to attaining unpaired electron spins and metal-free magnetism in narrow-band COFs by chemical doping. Using density functional theory calculations, we find that dopants whose frontier orbitals are energy-matched to the COFs not only inject charges into them but also further localize the charges through orbital hybridization and the formation of a supramolecular charge-transfer complex. The localized states enable stable paramagnetic centers to be introduced into nonmagnetic COFs. Based on this discovery, we design two new COFs with narrow valence bands, which show prospective magnetism after doping with iodine. Further, we unravel magnetic anisotropy in two-dimensional COFs and show that both spin conduction and magnetic interactions can be modulated by manipulating the building blocks of COFs. Our work highlights a practical scenario for attaining magnetism in COFs and other organic materials, which hold great promise for applications in organic spintronic devices.

preprint2020arXiv

Metastable atomic layer deposition: 3D self-assembly towards ultra dark materials

Black-body materials are promising candidates for meeting future energy demands, as they can harvest energy from the full bandwidth of solar radiation. Here, we report on high-absorption (> 98 %) near-black-body-like structures consisting of a silica scaffold and Ag nanoparticles with a layer thickness below 10 µm, fabricated using metastable atomic layer deposition (MS-ALD) and intended for a wide solar spectrum ranging from 220 nm to 2500 nm. Several effects contribute synergistically to the high absorbance, including the pronounced heterogeneity of the nanoparticles in size and shape, particle plasmon hybridization, and the trapping of omnidirectionally scattered light in the 3D hierarchical hybrid structures. We propose that MS-ALD be considered in the future as a simple and promising method for fabricating black-body materials with excellent broadband absorption.

preprint2020arXiv

Multispectral Pan-sharpening via Dual-Channel Convolutional Network with Convolutional LSTM Based Hierarchical Spatial-Spectral Feature Fusion

Multispectral pan-sharpening aims at producing a multispectral (MS) image that is high resolution (HR) in both the spatial and spectral domains by fusing a panchromatic (PAN) image with a corresponding MS image. In this paper, we propose a novel dual-channel network (DCNet) framework for MS pan-sharpening. In our DCNet, the dual-channel backbone comprises a spatial channel that captures spatial information with a 2D CNN and a spectral channel that extracts spectral information with a 3D CNN. This heterogeneous 2D/3D CNN architecture can minimize the spectral information distortion that typically occurs in conventional 2D CNN models. To fully integrate the spatial and spectral features captured at different levels, we introduce a multi-level fusion strategy. Specifically, a spatial-spectral CLSTM (S$^2$-CLSTM) module is proposed for fusing the hierarchical spatial and spectral features, which can effectively capture correlations among multi-level features. The S$^2$-CLSTM module provides two fusion paths: intra-level fusion via bi-directional lateral connections and inter-level fusion via the cell state in the S$^2$-CLSTM. Finally, the ideal HR-MS image is recovered by a reconstruction module. Extensive experiments have been conducted at both the simulated lower scale and the original scale of real-world datasets. Compared with state-of-the-art methods, the proposed DCNet achieves superior or competitive performance.

preprint2020arXiv

Neural Discriminant Analysis for Deep Speaker Embedding

Probabilistic Linear Discriminant Analysis (PLDA) is a popular tool in open-set classification/verification tasks. However, the Gaussian assumption underlying PLDA prevents it from being applied to situations where the data is clearly non-Gaussian. In this paper, we present a novel nonlinear version of PLDA named Neural Discriminant Analysis (NDA). This model employs an invertible deep neural network to transform a complex distribution into a simple Gaussian, so that the linear Gaussian model can be readily established in the transformed space. We tested the NDA model on a speaker recognition task where the deep speaker vectors (x-vectors) are presumably non-Gaussian. Experimental results on two datasets demonstrate that NDA consistently outperforms PLDA by handling the non-Gaussian distributions of the x-vectors.

preprint2020arXiv

Newcomer Candidate: Characterizing Contributions of a Novice Developer to GitHub

Context: Attracting, onboarding, and retaining newcomers in Open Source Software (OSS) projects is vital to the projects' livelihood. Recent studies conclude that OSS projects risk failure due to abandonment and poor participation of newcomers. Evidence suggests that more new users are joining GitHub; however, the extent to which they contribute to OSS projects is unknown. Objective: In this study, we coin the term 'newcomer candidate' to describe new users of the GitHub platform. Our objective is to track and characterize their initial contributions. As a preliminary survey, we collected 208 newcomer candidate contributions in GitHub. Using this dataset, we then plan to track their contributions to reveal insights. Method: We will use a mixed-methods approach, i.e., quantitative and qualitative, to identify whether or not newcomer candidates practice social coding, the kinds of contributions they make, the projects they target, and the proportion that eventually onboard to an OSS project. Limitation: The key limitation is that our newcomer candidates are restricted to those that were collected in our preliminary survey.

preprint2020arXiv

Privacy Risk and Preservation For COVID-19 Contact Tracing Apps

Contact tracing in the COVID-19 pandemic is key to preventing the further spread of COVID-19. Countries and regions around the world have developed and deployed, or are considering adopting, contact-tracing software or mobile apps. While contact-tracing apps and software play an important role in the pandemic, red flags have been raised regarding the privacy risk associated with contact tracing. In this short paper, we provide an overview of GPS- and Bluetooth-based contact-tracing apps in the framework of both centralized and decentralized models, and examine the associated privacy risk and the effectiveness of the privacy-preserving measures adopted in different apps.

preprint2020arXiv

Stabilizing Training of Generative Adversarial Nets via Langevin Stein Variational Gradient Descent

Generative adversarial networks (GANs), famous for their capability of learning complex underlying data distributions, are nevertheless known to be difficult to train, often suffering from mode collapse or performance deterioration. Current approaches to GANs' issues mostly rely on practical training techniques for the purpose of regularization, which in turn undermine the convergence and theoretical soundness of GANs. In this paper, we propose to stabilize GAN training via a novel particle-based variational inference method, Langevin Stein variational gradient descent (LSVGD), which not only inherits the flexibility and efficiency of the original SVGD but also aims to address its instability issues by incorporating an extra disturbance into the update dynamics. We further demonstrate that, by properly adjusting the noise variance, LSVGD simulates a Langevin process whose stationary distribution is exactly the target distribution. We also show that the LSVGD dynamics has an implicit regularization that enhances the spread and diversity of the particles. Finally, we present an efficient way of applying particle-based variational inference to a general GAN training procedure, no matter what loss function is adopted. Experimental results on one synthetic dataset and three popular benchmark datasets (CIFAR-10, Tiny-ImageNet and CelebA) validate that LSVGD can remarkably improve the performance and stability of various GAN models.
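The idea of adding a disturbance to the SVGD update can be illustrated with a one-dimensional toy sketch. The kernel-weighted score term and the kernel-gradient repulsion term are the standard SVGD update; the extra Gaussian noise and its scale `noise_std` are assumptions made purely for illustration, since the abstract does not state the exact form or variance of the disturbance used in LSVGD. All names below are hypothetical.

```python
import math
import random


def rbf(x: float, y: float, h: float) -> float:
    """RBF kernel with bandwidth h."""
    return math.exp(-(x - y) ** 2 / (2.0 * h * h))


def noisy_svgd_step(particles, grad_log_p, step, h, noise_std, rng):
    """One SVGD update on 1D particles plus an assumed Gaussian disturbance."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            # Driving term: kernel-weighted score of the target at xj.
            # Repulsive term: d/dxj k(xj, xi) = k * (xi - xj) / h^2.
            phi += k * grad_log_p(xj) + k * (xi - xj) / (h * h)
        phi /= n
        updated.append(xi + step * phi + noise_std * rng.gauss(0.0, 1.0))
    return updated


def score_gaussian(x: float) -> float:
    """Score (gradient of log-density) of the toy target N(2, 1)."""
    return -(x - 2.0)


rng = random.Random(0)
particles = [rng.gauss(-3.0, 0.5) for _ in range(30)]  # start far from the target
for _ in range(500):
    particles = noisy_svgd_step(particles, score_gaussian,
                                step=0.1, h=1.0, noise_std=0.02, rng=rng)

final_particles = particles
mean_final = sum(final_particles) / len(final_particles)
```

Setting `noise_std=0.0` recovers plain SVGD, which makes it easy to compare particle spread with and without the disturbance on the same target.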

preprint2020arXiv

Towards Privacy-aware Task Allocation in Social Sensing based Edge Computing Systems

With advances in mobile computing, the Internet of Things, and ubiquitous wireless connectivity, social sensing based edge computing (SSEC) has emerged as a new computation paradigm in which people and their personally owned devices collect sensor measurements from the physical world and process them at the edge of the network. This paper focuses on a privacy-aware task allocation problem where the goal is to optimize the computation task allocation in SSEC systems while respecting the users' customized privacy settings. It introduces a novel Game-theoretic Privacy-aware Task Allocation (G-PATA) framework to achieve this goal. G-PATA includes (i) a bottom-up game-theoretic model to generate the maximum payoffs at end devices while satisfying the end user's privacy settings; and (ii) a top-down incentive scheme to adjust the rewards for the tasks to ensure that the task allocation decisions made by end devices meet the Quality of Service (QoS) requirements of the applications. Furthermore, the framework incorporates an efficient load balancing and iteration reduction component to adapt to dynamic changes in the status and privacy configurations of end devices. The G-PATA framework was implemented on a real-world edge computing platform that consists of heterogeneous end devices (Jetson TX1 and TK1 boards, and Raspberry Pi3). We compare G-PATA with state-of-the-art task allocation schemes through two real-world social sensing applications. The results show that G-PATA significantly outperforms existing approaches under various privacy settings (our scheme achieved as much as 47% improvement in delay reduction for the application and 15% more payoffs for end devices compared to the baselines).

preprint2020arXiv

WANA: Symbolic Execution of Wasm Bytecode for Cross-Platform Smart Contract Vulnerability Detection

Many popular blockchain platforms support smart contracts for building decentralized applications. However, vulnerabilities within smart contracts have led to serious financial losses for their end users. For the EOSIO blockchain platform, effective vulnerability detectors are still limited. Furthermore, existing vulnerability detection tools can only support one blockchain platform. In this work, we present WANA, a cross-platform smart contract vulnerability detection tool based on the symbolic execution of WebAssembly bytecode. Furthermore, WANA proposes a set of test oracles to detect vulnerabilities in EOSIO and Ethereum smart contracts based on WebAssembly bytecode analysis. Our experimental analysis shows that WANA can effectively detect vulnerabilities in both EOSIO and Ethereum smart contracts with high efficiency.

preprint2019arXiv

Enlightening force chains: a review of photoelasticimetry in granular matter

A photoelastic material will reveal its internal stresses when observed through polarizing filters. This eye-catching property has enlightened our understanding of granular materials for over half a century, whether in the service of art, education, or scientific research. In this review article in honor of Robert Behringer, we highlight both his pioneering use of the method in physics research, and its reach into the public sphere through museum exhibits and outreach programs. We aim to provide clear protocols for artists, exhibit-designers, educators, and scientists to use in their own endeavors. It is our hope that this will build awareness about the ubiquitous presence of granular matter in our lives, enlighten its puzzling behavior, and promote conversations about its importance in environmental and industrial contexts. To aid in this endeavor, this paper also serves as a front door to a detailed wiki containing open, community-curated guidance on putting these methods into practice.

preprint2019arXiv

Experimental certification of steering criterion based on general entropic uncertainty relation

Quantum steering describes the phenomenon in which one system can be immediately influenced by local measurements on another. It can be detected by the violation of a powerful and useful steering criterion derived from a general entropic uncertainty relation. This criterion can, in principle, be evaluated straightforwardly using only probability distributions from a finite set of measurement settings. Herein, we experimentally verify the steering criterion using two-photon Werner-like states and three Pauli measurements. The results indicate that quantum steering can be verified by the criterion in a convenient way. In particular, there is no need to perform the usual quantum state tomography in the experiment, which greatly reduces the required experimental resources. Moreover, we demonstrate that the criterion is stronger than the linear one for detecting quantum steering of the Werner-like states.

preprint2019arXiv

Experimental investigation of entropic uncertainty relations and coherence uncertainty relations

The uncertainty relation is one of the most important features of quantum mechanics and the backbone of quantum theory, distinguishing it from its classical counterpart. Specifically, entropy-based uncertainty relations are of fundamental importance in the field of quantum information theory, offering a nontrivial bound on the key rate in quantum key distribution. In this work, we experimentally demonstrate the entropic uncertainty relations and coherence-based uncertainty relations on an all-optical platform. By preparing two kinds of bipartite initial states with high fidelity, i.e., Bell-like states and Bell-like diagonal states, we carry out local projective measurements over a complete set of mutually unbiased bases on the measured subsystem. By means of quantum tomography, the density matrices of the initial states and the post-measurement states are reconstructed. Our experimental results coincide with the theoretical predictions very well. Additionally, we verify that the lower bounds of both the entropy-based and coherence-based uncertainty can be tightened by imposing the Holevo quantity and mutual information, and that the entropic uncertainty is inversely correlated with the coherence. Our demonstrations might offer insight into these uncertainty relations and their connection to quantum coherence in quantum information science, and might be applicable to the security analysis of quantum key distribution.
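As a small worked illustration of the measurement setting mentioned above, the sketch below checks numerically that the qubit Pauli eigenbases are mutually unbiased and evaluates the complementarity term $\log_2(1/c)$ that appears in standard entropic uncertainty relations, where $c$ is the largest squared overlap between vectors of the two bases. The function and variable names are hypothetical, and the sketch does not reproduce the tightened bounds (with Holevo quantity and mutual information) that the paper tests.

```python
import math


def max_overlap(basis_q, basis_r):
    """Return c = max over i, j of |<q_i|r_j>|^2 for two orthonormal bases."""
    c = 0.0
    for q in basis_q:
        for r in basis_r:
            inner = sum(a.conjugate() * b for a, b in zip(q, r))
            c = max(c, abs(inner) ** 2)
    return c


s = 1.0 / math.sqrt(2.0)
z_basis = [(1 + 0j, 0j), (0j, 1 + 0j)]          # eigenvectors of Pauli Z
x_basis = [(s + 0j, s + 0j), (s + 0j, -s + 0j)]  # eigenvectors of Pauli X
y_basis = [(s + 0j, 1j * s), (s + 0j, -1j * s)]  # eigenvectors of Pauli Y

c_xz = max_overlap(x_basis, z_basis)
c_xy = max_overlap(x_basis, y_basis)
c_yz = max_overlap(y_basis, z_basis)

# For mutually unbiased bases in dimension d = 2, c = 1/2 and log2(1/c) = 1 bit.
bound_xz = math.log2(1.0 / c_xz)
```

Because every pairwise overlap equals $1/2$ up to floating-point error, the three Pauli bases form a complete set of mutually unbiased bases for a qubit.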

preprint2019arXiv

Experimental observation of the Einstein-Podolsky-Rosen Steering based on the detection of entanglement

Einstein-Podolsky-Rosen (EPR) steering is an intermediate form of quantum nonlocality between entanglement and Bell nonlocality, and it plays an important role in quantum information processing tasks. In the past few years, EPR steering has been demonstrated in a series of experiments. However, these studies rely on the relevant steering inequalities and the choice of measurement settings. Here, we experimentally verify EPR steering via entanglement detection without using any steering inequality or measurement setting. By constructing two new states from a two-qubit target state, we observe EPR steering by detecting the entanglement of these new states. The results show that the entanglement of the newly constructed states can be regarded as a new kind of steering witness for target states. Compared to the results of Xiao et al. [Phys. Rev. Lett. 118, 140404 (2017)], we find that the ability to detect EPR steering in our scenario is stronger than that of two-setting projective measurements, so that more steerable states can be observed. Hence, our demonstrations can deepen the understanding of the connection between EPR steering and entanglement.

preprint2019arXiv

Generalized random matrix model with additional interactions

We introduce a log-gas model that is a generalization of a random matrix ensemble with an additional interaction, whose strength depends on a parameter $γ$. The equilibrium density is computed by numerically solving the Riemann-Hilbert problem associated with the ensemble. The effect of the additional parameter $γ$ associated with the two-body interaction can be understood in terms of an effective $γ$-dependent single-particle confining potential.

preprint2018arXiv

Generalized superconductors from the coupling of a scalar field to the Einstein tensor and their refractive index in massive gravity

We construct generalized superconductors from the coupling of a scalar field to the Einstein tensor in massive gravity and investigate their negative refraction in the probe limit. We observe that a larger graviton mass and a larger Einstein tensor coupling parameter both hinder the formation of the condensation, whereas a larger graviton mass or a smaller coupling parameter makes it easier for the Cave of Winds to emerge. Furthermore, we see that a larger graviton mass but a smaller coupling parameter widens the range of frequencies or temperatures for which a negative Depine-Lakhtakia index occurs, which indicates that the graviton mass and the Einstein tensor have completely different effects on the negative refraction. In addition, we find that larger graviton mass and coupling parameters can both reduce the dissipation and improve the propagation in the holographic setup.

preprint2018arXiv

Optical control of magnetism in NiFe/VO2 heterostructures

Optical methods for manipulating magnetism are considered a promising strategy for ultralow-power and ultrahigh-speed spin switches, and have become a hot topic in the field of spintronics. However, a widely applicable and efficient method for combining optical operation with magnetic modulation is still highly desired. Here, the strongly correlated electron material VO2 is introduced to realize phase-transition-based optical control of the magnetism in NiFe. The NiFe/VO2 bilayer heterostructure features appreciable modulations in electrical conductivity (55%), coercivity (60%), and magnetic anisotropy (33.5%). Further analyses indicate that interfacial strain coupling plays a crucial role in this modulation. Utilizing this optically controlled magnetism modulation, programmable Boolean logic gates (AND, OR, NAND, NOR, XOR, NXOR and NOT) for high-speed and low-power data processing are demonstrated based on this engineered heterostructure. As a demonstration of phase-transition spintronics, this work may pave the way for next-generation electronics in the post-Moore era.