Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
38works
0followers
25topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

38 published item(s)

preprint2026arXiv

A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents

Style features such as friendly, helpful, or concise are widely used in prompts to steer the behavior of Large Language Model (LLM) conversational agents, yet their unintended side effects remain poorly understood. In this work, we present the first systematic study of cross-feature stylistic side effects. We conduct a comprehensive survey of 127 conversational agent papers from ACL Anthology and identify 12 frequently used style features. Using controlled, synthetic dialogues across task-oriented and open domain settings, we quantify how prompting for one style feature causally affects others via a pairwise LLM as a Judge evaluation framework. Our results reveal consistent and structured side effects, such as prompting for conciseness significantly reduces perceived expertise. They demonstrate that style features are deeply entangled rather than orthogonal. To support future research, we introduce CASSE (Conversational Agent Stylistic Side Effects), a dataset capturing these complex interactions. We further evaluate prompt based and activation steering based mitigation strategies and find that while they can partially restore suppressed traits, they often degrade the primary intended style. These findings challenge the assumption of faithful style control in LLMs and highlight the need for multi-objective and more principled approaches to safe, targeted stylistic steering in conversational agents.

preprint2026arXiv

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

End-to-end automation of realistic healthcare operations stresses three capabilities underrepresented in current benchmarks: policy density, decisions must be grounded in a large library of medical, insurance, and operational rules; Multi-role composition: a single task requires the agent to play multiple roles with handoffs; and multilateral interaction: intermediate workflow steps are multi-turn dialogs, such as peer-to-peer review and patient outreach. We introduce $χ$-Bench, a benchmark of long-horizon healthcare workflows across three domains: provider prior authorization, payer utilization management, and care management. Each task hands the agent a clinical case in a high-fidelity simulator of 20 healthcare apps exposed via 87 MCP tools, which it must drive to a terminal status through tool calls and writing the role's artifacts, guided by a 1,290+ document managed-care operations handbook skill. Across 30 agent harness/models configurations, the best agent resolves only 28.0% of tasks, no agent clears 20% on strict pass^3, and executing all tasks in a single session slumps the performance to 3.8%. These results raise the hypothesis that similar gaps are likely to surface in other policy-dense, role-composed, irreversible enterprise domains.

preprint2026arXiv

GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning

Group Relative Policy Optimization (GRPO) has recently shown strong performance in post-training large language models and vision-language models. It raises a question of whether the GRPO also significantly promotes the test-time adaptation (TTA) of vision language models. In this paper, we propose Group Relative Policy Optimization for Test-Time Adaptation (GRPO-TTA), which adapts GRPO to the TTA setting by reformulating class-specific prompt prediction as a group-wise policy optimization problem. Specifically, we construct output groups by sampling top-K class candidates from CLIP similarity distributions, enabling probability-driven optimization without access to ground-truth labels. Moreover, we design reward functions tailored to test-time adaptation, including alignment rewards and dispersion rewards, to guide effective visual encoder tuning. Extensive experiments across diverse benchmarks demonstrate that GRPO-TTA consistently outperforms existing test-time adaptation methods, with notably larger performance gains under natural distribution shifts.

preprint2025arXiv

Distributed Bilevel Optimization with Dual Pruning for Resource-limited Clients

With the development of large-scale models, traditional distributed bilevel optimization algorithms cannot be applied directly in low-resource clients. The key reason lies in the excessive computation involved in optimizing both the lower- and upper-level functions. Thus, we present the first resource-adaptive distributed bilevel optimization framework with a second-order free hypergradient estimator, which allows each client to optimize the submodels adapted to the available resources. Due to the coupled influence of partial outer parameters x and inner parameters y, it's challenging to theoretically analyze the upper bound regarding the globally averaged hypergradient for full model parameters. The error bound of inner parameter also needs to be reformulated since the local partial training. The provable theorems show that both RABO and RAFBO can achieve an asymptotically optimal convergence rate of $O(1/\sqrt{C_x^{\ast}Q})$, which is dominated by the minimum coverage of the outer parameter $C_x^{\ast}$. Extensive experiments on two different tasks demonstrate the effectiveness and computation efficiency of our proposed methods.

preprint2024arXiv

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull closer similar regions across images. Also, an image-level contrastive loss is present to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.

preprint2024arXiv

Silicon Optical Memory: Non-Volatile Optoelectronic Devices via Si-SiO$_2$ Hysteresis Effect

Implementing on-chip non-volatile optical memories has long been an actively pursued goal, promising significant enhancements in the capability and energy efficiency of photonic integrated circuits. Here, a novel optical memory has been demonstrated exclusively using the semiconductor primary material, silicon. By manipulating the optoelectronic effect of this device, we introduce a hysteresis effect at the silicon-silicon oxide interface, which in turn demonstrates multi-level, non-volatile optical data storage with robust retention and endurance. This new silicon optical memory provides a distinctively simple and accessible route to realize optical data storage in standard silicon foundry processes.

preprint2022arXiv

Calibration procedures for the CHASE/HIS science data

The Hα line is an important optical line in solar observations containing the information from the photosphere to the chromosphere. To study the mechanisms of solar eruptions and the plasma dynamics in the lower atmosphere, the Chinese Hα Solar Explorer (CHASE) was launched into a Sun-synchronous orbit on October 14, 2021. The scientific payload of the CHASE satellite is the Hα Imaging Spectrograph (HIS). The CHASE/HIS acquires, for the first time, seeing-free Hα spectroscopic observations with high spectral and temporal resolutions. It consists of two observational modes. The raster scanning mode provides full-Sun or region-of-interest spectra at Hα (6559.7-6565.9 Å) and Fe I (6567.8-6570.6 Å) wavebands. The continuum imaging mode obtains full-Sun photospheric images at around 6689 Å. In this paper, we present detailed calibration procedures for the CHASE/HIS science data, including the dark-field and flat-field correction, slit image curvature correction, wavelength and intensity calibration, and coordinate transformation. The higher-level data products can be directly used for scientific research.

preprint2022arXiv

CM-Net: Concentric Mask based Arbitrary-Shaped Text Detection

Recently fast arbitrary-shaped text detection has become an attractive research topic. However, most existing methods are non-real-time, which may fall short in intelligent systems. Although a few real-time text methods are proposed, the detection accuracy is far behind non-real-time methods. To improve the detection accuracy and speed simultaneously, we propose a novel fast and accurate text detection framework, namely CM-Net, which is constructed based on a new text representation method and a multi-perspective feature (MPF) module. The former can fit arbitrary-shaped text contours by concentric mask (CM) in an efficient and robust way. The latter encourages the network to learn more CM-related discriminative features from multiple perspectives and brings no extra computational cost. Benefiting the advantages of CM and MPF, the proposed CM-Net only needs to predict one CM of the text instance to rebuild the text contour and achieves the best balance between detection accuracy and speed compared with previous works. Moreover, to ensure that multi-perspective features are effectively learned, the multi-factor constraints loss is proposed. Extensive experiments demonstrate the proposed CM is efficient and robust to fit arbitrary-shaped text instances, and also validate the effectiveness of MPF and constraints loss for discriminative text features recognition. Furthermore, experimental results show that the proposed CM-Net is superior to existing state-of-the-art (SOTA) real-time text detection methods in both detection speed and accuracy on MSRA-TD500, CTW1500, Total-Text, and ICDAR2015 datasets.

preprint2022arXiv

Dynamic-quenching of a single-photon avalanche photodetector using an adaptive resistive switch

One of the most common approaches for quenching single-photon avalanche diodes is to use a passive resistor in series with it. A drawback of this approach has been the limited recovery speed of the single-photon avalanche diodes. High resistance is needed to quench the avalanche, leading to slower recharging of the single-photon avalanche diodes depletion capacitor. We address this issue by replacing a fixed quenching resistor with a bias-dependent adaptive resistive switch. Reversible generation of metallic conduction enables switching between low and high resistance states under unipolar bias. As an example, using a Pt/Al2O3/Ag resistor with a commercial silicon single-photon avalanche diodes, we demonstrate avalanche pulse widths as small as ~30 ns, 10x smaller than a passively quenched approach, thus significantly improving the single-photon avalanche diodes frequency response. The experimental results are consistent with a model where the adaptive resistor dynamically changes its resistance during discharging and recharging the single-photon avalanche diodes.

preprint2022arXiv

Investigating and Modeling the Dynamics of Long Ties

Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.

preprint2022arXiv

Iterative Genetic Improvement: Scaling Stochastic Program Synthesis

Program synthesis aims to {\it automatically} find programs from an underlying programming language that satisfy a given specification. While this has the potential to revolutionize computing, how to search over the vast space of programs efficiently is an unsolved challenge in program synthesis. In cases where large programs are required for a solution, it is generally believed that {\it stochastic} search has advantages over other classes of search techniques. Unfortunately, existing stochastic program synthesizers do not meet this expectation very well, suffering from the scalability issue. Here we propose a new framework for stochastic program synthesis, called iterative genetic improvement to overcome this problem, a technique inspired by the practice of the software development process. The key idea of iterative genetic improvement is to apply genetic improvement to improve a current reference program, and then iteratively replace the reference program by the best program found. Compared to traditional stochastic synthesis approaches, iterative genetic improvement can build up the complexity of programs incrementally in a more robust way. We evaluate the approach on two program synthesis domains: list manipulation and string transformation. Our empirical results indicate that this method has considerable advantages over several representative stochastic program synthesizer techniques, both in terms of scalability and of solution quality.

preprint2022arXiv

Landmarking for Navigational Streaming of Stored High-Dimensional Media

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, who in response transmits the corresponding pre-encoded media data units (MDU) to the client one-by-one in sequence. Intra-coding an MDU (I-MDU) would result in a large bitrate but I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order where the predictor must be first transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are intercoded, resulting in a predictive MDU structure with controlled coding cost. It means that any requested MDU can be decoded by at most transmitting a landmark and an inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport adaptive 360 images as illustrative applications, and I-, P- and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.

preprint2022arXiv

MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting

RGB-Thermal (RGB-T) crowd counting is a challenging task, which uses thermal images as complementary information to RGB images to deal with the decreased performance of unimodal RGB-based methods in scenes with low-illumination or similar backgrounds. Most existing methods propose well-designed structures for cross-modal fusion in RGB-T crowd counting. However, these methods have difficulty in encoding cross-modal contextual semantic information in RGB-T image pairs. Considering the aforementioned problem, we propose a two-stream RGB-T crowd counting network called Multi-Attention Fusion Network (MAFNet), which aims to fully capture long-range contextual information from the RGB and thermal modalities based on the attention mechanism. Specifically, in the encoder part, a Multi-Attention Fusion (MAF) module is embedded into different stages of the two modality-specific branches for cross-modal fusion at the global level. In addition, a Multi-modal Multi-scale Aggregation (MMA) regression head is introduced to make full use of the multi-scale and contextual information across modalities to generate high-quality crowd density maps. Extensive experiments on two popular datasets show that the proposed MAFNet is effective for RGB-T crowd counting and achieves the state-of-the-art performance.

preprint2022arXiv

Optimizing LLVM Pass Sequences with Shackleton: A Linear Genetic Programming Framework

In this paper we introduce Shackleton as a generalized framework enabling the application of linear genetic programming -- a technique under the umbrella of evolutionary algorithms -- to a variety of use cases. We also explore here a novel application for this class of methods: optimizing sequences of LLVM optimization passes. The algorithm underpinning Shackleton is discussed, with an emphasis on the effects of different features unique to the framework when applied to LLVM pass sequences. Combined with analysis of different hyperparameter settings, we report the results on automatically optimizing pass sequences using Shackleton for two software applications at differing complexity levels. Finally, we reflect on the advantages and limitations of our current implementation and lay out a path for further improvements. These improvements aim to surpass hand-crafted solutions with an automatic discovery method for an optimal pass sequence.

preprint2022arXiv

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state of the art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on the VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU for 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUT-OMRON respectively compared to previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.

preprint2022arXiv

Targeted Supervised Contrastive Learning for Long-Tailed Recognition

Real-world data often exhibits long tail distributions with heavy class imbalance, where the majority classes can dominate the training process and alter the decision boundaries of the minority classes. Recently, researchers have investigated the potential of supervised contrastive learning for long-tailed recognition, and demonstrated that it provides a strong performance gain. In this paper, we show that while supervised contrastive learning can help improve performance, past baselines suffer from poor uniformity brought in by imbalanced data distribution. This poor uniformity manifests in samples from the minority class having poor separability in the feature space. To address this problem, we propose targeted supervised contrastive learning (TSC), which improves the uniformity of the feature distribution on the hypersphere. TSC first generates a set of targets uniformly distributed on a hypersphere. It then makes the features of different classes converge to these distinct and uniformly distributed targets during training. This forces all classes, including minority classes, to maintain a uniform distribution in the feature space, improves class boundaries, and provides better generalization even in the presence of long-tail data. Experiments on multiple datasets show that TSC achieves state-of-the-art performance on long-tailed recognition tasks.

preprint2022arXiv

The Chinese Hα Solar Explorer (CHASE) mission: An overview

The Chinese Hα Solar Explorer (CHASE), dubbed "Xihe" - Goddess of the Sun, was launched on October 14, 2021 as the first solar space mission of China National Space Administration (CNSA). The CHASE mission is designed to test a newly developed satellite platform and to acquire the spectroscopic observations in the Hα waveband. The Hα Imaging Spectrograph (HIS) is the scientific payload of the CHASE satellite. It consists of two observational modes: raster scanning mode and continuum imaging mode. The raster scanning mode obtains full-Sun or region-of-interest spectral images from 6559.7 to 6565.9 Å and from 6567.8 to 6570.6 Å with 0.024 Å pixel spectral resolution and 1 minute temporal resolution. The continuum imaging mode obtains photospheric images in continuum around 6689 Å with the full width at half maximum of 13.4 Å. The CHASE mission will advance our understanding of the dynamics of solar activity in the photosphere and chromosphere. In this paper, we present an overview of the CHASE mission including the scientific objectives, HIS instrument overview, data calibration flow, and first results of on-orbit observations.

preprint2022arXiv

Uniform estimates of the Cauchy-Riemann equation on product domains

We observe that the continuity assumption on $f$ for the uniform estimates of the canonical solution to $\bar\partial u = f$ on products of $C^2$ bounded planar domains in \cite{DPZ} can be reduced to the boundedness assumption. This completely answers the original question raised by Kerzman in 1971. Moreover, the $L^p$ estimates of $\bar\partial$ is obtained for all $p \in [1, \infty]$.

preprint2022arXiv

Unsupervised Learning for Human Sensing Using Radio Signals

There is a growing literature demonstrating the feasibility of using Radio Frequency (RF) signals to enable key computer vision tasks in the presence of occlusions and poor lighting. It leverages that RF signals traverse walls and occlusions to deliver through-wall pose estimation, action recognition, scene captioning, and human re-identification. However, unlike RGB datasets which can be labeled by human workers, labeling RF signals is a daunting task because such signals are not human interpretable. Yet, it is fairly easy to collect unlabelled RF signals. It would be highly beneficial to use such unlabeled RF data to learn useful representations in an unsupervised manner. Thus, in this paper, we explore the feasibility of adapting RGB-based unsupervised representation learning to RF signals. We show that while contrastive learning has emerged as the main technique for unsupervised representation learning from images and videos, such methods produce poor performance when applied to sensing humans using RF signals. In contrast, predictive unsupervised learning methods learn high-quality representations that can be used for multiple downstream RF-based sensing tasks. Our empirical results show that this approach outperforms state-of-the-art RF-based human sensing on various tasks, opening the possibility of unsupervised representation learning from this novel modality.

preprint2022arXiv

Weakly nonlinear surface waves on the plasma-vacuum interface

We consider the free boundary problem for a plasma--vacuum interface in ideal incompressible magnetohydrodynamics. Unlike the classical statement, where the vacuum magnetic field obeys the div-curl system of pre-Maxwell dynamics, we do not neglect the displacement current in the vacuum region and consider the Maxwell equations for electric and magnetic fields. Our aim is to construct highly oscillating surface wave solutions in weakly nonlinear regime to this plasma--vacuum interface problem. Under a necessary and sufficient stability condition for a piecewise constant background state, we construct approximate solutions at any arbitrarily large order of accuracy to the free boundary problem in three space dimensions when the initial discontinuity displays high frequency oscillations. Moreover, such approximate surface waves have nontrivial residual non-oscillatory components.

preprint2021arXiv

Bio-Inspired Representation Learning for Visual Attention Prediction

Visual Attention Prediction (VAP) is a significant and imperative issue in the field of computer vision. Most of existing VAP methods are based on deep learning. However, they do not fully take advantage of the low-level contrast features while generating the visual attention map. In this paper, a novel VAP method is proposed to generate visual attention map via bio-inspired representation learning. The bio-inspired representation learning combines both low-level contrast and high-level semantic features simultaneously, which are developed by the fact that human eye is sensitive to the patches with high contrast and objects with high semantics. The proposed method is composed of three main steps: 1) feature extraction, 2) bio-inspired representation learning and 3) visual attention map generation. Firstly, the high-level semantic feature is extracted from the refined VGG16, while the low-level contrast feature is extracted by the proposed contrast feature extraction block in a deep network. Secondly, during bio-inspired representation learning, both the extracted low-level contrast and high-level semantic features are combined by the designed densely connected block, which is proposed to concatenate various features scale by scale. Finally, the weighted-fusion layer is exploited to generate the ultimate visual attention map based on the obtained representations after bio-inspired representation learning. Extensive experiments are performed to demonstrate the effectiveness of the proposed method.

preprint2021arXiv

Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests

Randomized experiments, or "A/B" tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include accounting for the fraction or count of treated neighbors in a user's network, yet most current methods do not account for the local network structure beyond simply counting the number of neighbors. Our study provides an approach that accounts for both the local structure in a user's social network via motifs as well as the treatment assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ "causal network motifs", which are network motifs that characterize the assignment conditions in local ego networks; and then we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. Our approach can account for social network theories, such as structural diversity and echo chambers, and also can help specify network interference conditions that are suitable to each experiment. We test our method on a synthetic network setting and on a real-world experiment on a large-scale network, which highlight how accounting for local structures can better account for different interference patterns in networks.

preprint2021arXiv

MT: Multi-Perspective Feature Learning Network for Scene Text Detection

Text detection, the key technology for understanding scene text, has become an attractive research topic. For detecting various scene texts, researchers propose plenty of detectors with different advantages: detection-based models enjoy fast detection speed, and segmentation-based algorithms are not limited by text shapes. However, for most intelligent systems, the detector needs to detect arbitrary-shaped texts with high speed and accuracy simultaneously. Thus, in this study, we design an efficient pipeline named as MT, which can detect adhesive arbitrary-shaped texts with only a single binary mask in the inference stage. This paper presents the contributions on three aspects: (1) a light-weight detection framework is designed to speed up the inference process while keeping high detection accuracy; (2) a multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately; (3) a multi-factor constraints IoU minimization loss is introduced for training the proposed model. The effectiveness of MT is evaluated on four real-world scene text datasets, and it surpasses all the state-of-the-art competitors to a large extent.

preprint2021arXiv

Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting

Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domain. Recently, typical methods attempt to extract domain-invariant features via image translation and adversarial learning. When it comes to specific tasks, we find that the domain shifts are reflected on model parameters' differences. To describe the domain gap directly at the parameter-level, we propose a Neuron Linear Transformation (NLT) method, exploiting domain factor and bias weights to learn the domain shift. Specifically, for a specific neuron of a source model, NLT exploits few labeled target data to learn domain shift parameters. Finally, the target neuron is generated via a linear transformation. Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that the NLT is robust and more effective than supervised and fine-tune training. Code is available at: \url{https://github.com/taohan10200/NLT}.

preprint2021arXiv

Ptychography Intensity Interferometry Imaging for Dynamic Distant Object

As a promising lensless imaging method for distance objects, intensity interferometry imaging (III) had been suffering from the unreliable phase retrieval process, hindering the development of III for decades. Recently, the introduction of the ptychographic detection in III overcame this challenge, and a method called ptychographic III (PIII) was proposed. We here experimentally demonstrate that PIII can image a dynamic distance object. A reasonable image for the moving object can be retrieved with only two speckle patterns for each probe, and only 10 to 20 iterations are needed. Meanwhile, PIII exhibits robust to the inaccurate information of the probe. Furthermore, PIII successfully recovers the image through a fog obfuscating the imaging light path, under which a conventional camera relying on lenses fails to provide a recognizable image.

preprint2021arXiv

Semantics-Consistent Representation Learning for Remote Sensing Image-Voice Retrieval

With the development of earth observation technology, massive amounts of remote sensing (RS) images are acquired. To find useful information from these images, cross-modal RS image-voice retrieval provides a new insight. This paper aims to study the task of RS image-voice retrieval so as to search effective information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the datasets, the intra-modality and non-paired inter-modality relationships should also be taken into account simultaneously, since the semantic consistency among non-paired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL) method is proposed for RS image-voice retrieval. The main novelty is that the proposed method takes the pairwise, intra-modality, and non-paired inter-modality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for the RS image-voice retrieval. The proposed SCRL method consists of two main steps: 1) semantics encoding and 2) semantics-consistent representation learning. Firstly, an image encoding network is adopted to extract high-level image features with a transfer learning strategy, and a voice encoding network with dilated convolution is devised to obtain high-level voice features. Secondly, a consistent representation space is conducted by modeling the three kinds of relationships to narrow the heterogeneous semantic gap and learn semantics-consistent representations across two modalities. Extensive experimental results on three challenging RS image-voice datasets show the effectiveness of the proposed method.

preprint2021arXiv

Unsighted deconvolution ghost imaging

Ghost imaging (GI) is an unconventional imaging method that retrieves the image of an object by correlating a series of known illumination patterns with the total reflected (or transmitted) intensity. We here demonstrate a scheme which can remove the basic requirement of knowing the incident patterns on the object, enabling GI to non-invasively image objects through turbid media. As an experimental proof, we project a set of patterns towards an object hidden inside turbid media that scramble the illumination, making the patterns falling on the object completely unknown. We theoretically prove that the spatial frequency of the object is preserved in the measurement of GI, even though the spatial information of both the object and the illumination is lost. The image is then reconstructed with phase retrieval algorithms.

preprint2021arXiv

Variational rotating solutions to non-isentropic Euler-Poisson equations with prescribed total mass

This paper proves the existence of variational rotating solutions to the compressible non-isentropic Euler-Poisson equations with prescribed total mass. This extends the result of the isentropic case [Auchmuty and Beals, Arch. Ration. Mech. Anal., 1971] to the non-isentropic case. Compared with the previous result of variational rotating solutions in non-isentropic case [Wu, Journal of Differential Equations, 2015], to keep the constraint of a prescribed finite total mass, the author establishes a new variational structure the non-isentropic Euler-Poisson equations.

preprint2020arXiv

Direct estimation of quantum coherence by collective measurements

The recently established resource theory of quantum coherence allows for a quantitative understanding of the superposition principle, with applications reaching from quantum computing to quantum biology. While different quantifiers of coherence have been proposed in the literature, their efficient estimation in today's experiments remains a challenge. Here, we introduce a collective measurement scheme for estimating the amount of coherence in quantum states, which requires entangled measurements on two copies of the state. As we show by numerical simulations, our scheme outperforms other estimation methods based on tomography or adaptive measurements, leading to a higher precision in a large parameter range for estimating established coherence quantifiers of qubit and qutrit states. We show that our method is accessible with today's technology by implementing it experimentally with photons, finding a good agreement between experiment and theory.

preprint2020arXiv

Efficient Batch Black-box Optimization with Deterministic Regret Bounds

In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm, which jointly maximizes the acquisition function and select points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings irrespective of the choice of kernel. Moreover, we analyze the property of the adversarial regret that is required by a robust initialization for Bayesian Optimization (BO). We prove that the adversarial regret bounds decrease with the decrease of covering radius, which provides a criterion for generating a point set to minimize the bound. We then propose fast searching algorithms to generate a point set with a small covering radius for the robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.

preprint2020arXiv

Focus on Semantic Consistency for Cross-domain Crowd Understanding

For pixel-level crowd understanding, it is time-consuming and laborious in data collection and annotation. Some domain adaptation algorithms try to liberate it by training models with synthetic data, and the results in some recent works have proved the feasibility. However, we found that a mass of estimation errors in the background areas impede the performance of the existing methods. In this paper, we propose a domain adaptation method to eliminate it. According to the semantic consistency, a similar distribution in deep layer's features of the synthetic and real-world crowd area, we first introduce a semantic extractor to effectively distinguish crowd and background in high-level semantic information. Besides, to further enhance the adapted model, we adopt adversarial learning to align features in the semantic space. Experiments on three representative real datasets show that the proposed domain adaptation scheme achieves the state-of-the-art for cross-domain counting problems.

preprint2020arXiv

In-Home Daily-Life Captioning Using Radio Signals

This paper aims to caption daily life --i.e., to create a textual description of people's activities and interactions with objects in their homes. Addressing this problem requires novel methods beyond traditional video captioning, as most people would have privacy concerns about deploying cameras throughout their homes. We introduce RF-Diary, a new model for captioning daily life by analyzing the privacy-preserving radio signal in the home with the home's floormap. RF-Diary can further observe and caption people's life through walls and occlusions and in dark settings. In designing RF-Diary, we exploit the ability of radio signals to capture people's 3D dynamics, and use the floormap to help the model learn people's interactions with objects. We also use a multi-modal feature alignment training scheme that leverages existing video-based captioning datasets to improve the performance of our radio-based captioning model. Extensive experimental results demonstrate that RF-Diary generates accurate captions under visible conditions. It also sustains its good performance in dark or occluded settings, where video-based captioning approaches fail to generate meaningful captions. For more information, please visit our project webpage: http://rf-diary.csail.mit.edu

preprint2020arXiv

Learning Longterm Representations for Person Re-Identification Using Radio Signals

Person Re-Identification (ReID) aims to recognize a person-of-interest across different places and times. Existing ReID methods rely on images or videos collected using RGB cameras. They extract appearance features like clothes, shoes, hair, etc. Such features, however, can change drastically from one day to the next, leading to inability to identify people over extended time periods. In this paper, we introduce RF-ReID, a novel approach that harnesses radio frequency (RF) signals for longterm person ReID. RF signals traverse clothes and reflect off the human body; thus they can be used to extract more persistent human-identifying features like body size and shape. We evaluate the performance of RF-ReID on longitudinal datasets that span days and weeks, where the person may wear different clothes across days. Our experiments demonstrate that RF-ReID outperforms state-of-the-art RGB-based ReID approaches for long term person ReID. Our results also reveal two interesting features: First since RF signals work in the presence of occlusions and poor lighting, RF-ReID allows for person ReID in such scenarios. Second, unlike photos and videos which reveal personal and private information, RF signals are more privacy-preserving, and hence can help extend person ReID to privacy-concerned domains, like healthcare.

preprint2020arXiv

Pixel-wise Crowd Understanding via Synthetic Data

Crowd analysis via computer vision techniques is an important topic in the field of video surveillance, which has wide-spread applications including crowd monitoring, public safety, space design and so on. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because of its finer results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding needs a large amount of labeled training data. Annotating them is an expensive work, which causes that current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, take crowd counting and segmentation as examples from the pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. Firstly, we develop a free data collector and labeler to generate synthetic and labeled crowd scenes in a computer game, Grand Theft Auto V. Then we use it to construct a large-scale, diverse synthetic crowd dataset, which is named as "GCC Dataset". Secondly, we propose two simple methods to improve the performance of crowd understanding via exploiting the synthetic data. To be specific, 1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better on the real world; 2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on translated data and labels. As a result, the trained model works well in real crowd scenes.

preprint2019arXiv

Bifrequency 3D Ghost Imaging with Haar Wavelet Transform

Recently, ghost imaging has been attracting attentions because its mechanism would lead to many applications inaccessible to conventional imaging methods. However, it is challenging for high contrast and high resolution imaging, due to its low signal-to-noise ratio (SNR) and the demand of high sampling rate in detection. To circumvent these challenges, we here propose a ghost imaging scheme that exploits Haar wavelets as illuminating patterns with a bi-frequency light projecting system and frequency-selecting single-pixel detectors. This method provides a theoretically 100% image contrast and high detection SNR, which reduces the requirement of high dynamic range of detectors, enabling high resolution ghost imaging. Moreover, it can highly reduce the sampling rate (far below Nyquist limit) for a sparse object by adaptively abandoning unnecessary patterns during the measurement. These characteristics are experimentally verified with a resolution of 512 times 512 and a sampling rate lower than 5%. A high-resolution (1000 times 1000 times 1000) 3D reconstruction of an object is also achieved from multi-angle images.

preprint2019arXiv

Mott phase in a van der Waals transition-metal halide at single layer limit

Two-dimensional materials offer opportunities for unravelling unprecedented ordered states at single layer limit. Among such ordered states, Mott phase is rarely explored. Here, we report the Mott phase in van der Waals chromium (II) iodide (CrI2) films. High quality CrI2 films with atomically flat surface and macro size are grown on graphitized 6H-SiC(0001) substrate by molecular beam epitaxy. By in situ low temperature scanning tunneling microscopy and spectroscopy (STM/STS), we reveal that the film has a band gap as large as ~3.2 eV, which is nearly thickness independent. Density functional plus dynamic mean field theory calculations suggest that CrI2 films may be a strong Mott insulator with a ferromagnetically ordered ground state. The Mott phase is corroborated by the spectral band splitting, that is consistent with the extended Hubbard model, and gap reduction at charge dopants. Our study provides a platform for studying correlated electron states at single layer limit.

preprint2019arXiv

Strong Majorization Uncertainty Relations: Theory and Experiment

In spite of enormous theoretical and experimental progresses in quantum uncertainty relations, the experimental investigation of most current, and universal formalism of uncertainty relations, namely majorization uncertainty relations (MURs), has not been implemented yet. A significant problem is that previous studies on the classification of MURs only focus on their mathematical expressions, while the physical difference between various forms remains unknown. First, we use a guessing game formalism to study the MURs, which helps us disclosing their physical nature, and distinguishing the essential differences of physical features between diverse forms of MURs. Second, we tighter the bounds of MURs in terms of flatness processes, or equivalently, in terms of majorization lattice. Third, to benchmark our theoretical results, we experimentally verify MURs in the photonic systems.