Source author record

Yan Huang

Yan Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

40 works

36 topics

4 close collaborators

preprint, 2026, arXiv

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

Large language model (LLM) agents are increasingly expected to operate in enterprise environments, where work is distributed across specialized roles, permission-controlled systems, and cross-departmental procedures. However, existing enterprise benchmarks largely evaluate single agents with broad tool access, while existing multi-agent benchmarks rarely capture realistic enterprise constraints such as role specialization, access control, stateful business systems, and policy-based approvals. We introduce EntCollabBench, a benchmark for evaluating enterprise multi-agent collaboration. EntCollabBench simulates a permission-isolated organization with 11 role-specialized agents across six departments and contains two evaluation subsets: a Workflow subset, where agents collaboratively modify enterprise system states, and an Approval subset, where agents make policy-grounded decisions. Evaluation is based on execution traces, database state verification, and deterministic policy adjudication rather than natural-language response judging. Experiments with representative LLM agents show that current models still struggle with end-to-end enterprise collaboration, especially in delegation, context transfer, parameter grounding, workflow closure, and decision commitment. EntCollabBench provides a reproducible testbed for measuring and improving agent systems intended for realistic organizational environments.

preprint, 2026, arXiv

Metric-Normalized Posterior Leakage (mPL): Attacker-Aligned Privacy for Joint Consumption

Metric differential privacy (mDP) strengthens local differential privacy (LDP) by scaling noise to semantic distance, but many machine learning (ML) systems are consumed under joint observation, where model-agnostic, per-record guarantees can miss leakage from evidence aggregation. We introduce metric-normalized posterior leakage (mPL), an attacker-aligned, distance-calibrated measure of posterior-odds shift induced by releases, and show that for single or independent releases, uniformly bounding mPL is equivalent to mDP. Under joint observation, however, satisfying mDP may still leave mPL high because learned aggregators compound evidence across correlated items. To make control practical, we formalize probabilistically bounded mPL (PBmPL), which limits how often mPL may exceed a target budget, and we operationalize it via Adaptive mPL (AmPL), a trust-and-verify framework that perturbs, audits with a learned attacker, and adapts parameters (with optional Bayesian remapping) to balance privacy and utility. In a word-embedding case study, neural adversaries violate mPL under joint consumption despite per-record mDP perturbations, whereas AmPL substantially lowers the frequency of such violations with low utility loss, indicating PBmPL as a practical, certifiable protection for joint-consumption settings.

preprint, 2026, arXiv

Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search

Current omni-modal benchmarks mainly evaluate models under settings where multiple modalities are provided simultaneously, while the ability to start from audio alone and actively search for cross-modal evidence remains underexplored. In this paper, we introduce Omni-DeepSearch, a benchmark for audio-driven omni-modal deep search. Given one or more audio clips and a related question, models must infer useful clues from audio, invoke text, image, and video search tools, and perform multi-hop reasoning to produce a short, objective, and verifiable answer. Omni-DeepSearch contains 640 samples across 15 fine-grained categories, covering four retrieval target modalities and four audio content types. A multi-stage filtering pipeline ensures audio dependence, retrieval necessity, visual modality necessity, and answer uniqueness. Experiments on recent closed-source and open-source omni-modal models show that this task remains highly challenging: the strongest evaluated model, Gemini-3-Pro, achieves only 43.44% average accuracy. Further analyses illustrate key bottlenecks in audio entity inference, query formulation, tool-use reliability, multi-hop retrieval, and cross-modal verification. These results highlight audio-driven omni-modal deep search as an important and underexplored direction for future multimodal agents.

preprint, 2024, arXiv

Context-Guided Spatio-Temporal Video Grounding

The spatio-temporal video grounding (STVG) task aims at locating a spatio-temporal tube for a specific instance given a text query. Despite advancements, current methods are easily misled by distractors or heavy object appearance variations in videos due to insufficient object information from the text, leading to degraded performance. To address this, we propose a novel framework, context-guided STVG (CG-STVG), which mines discriminative instance context for objects in videos and applies it as supplementary guidance for target localization. The key to CG-STVG lies in two specially designed modules: instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context. During grounding, ICG and ICR are deployed together at each decoding stage of a Transformer architecture for instance context learning. In particular, the instance context learned at one decoding stage is fed to the next stage and used as guidance containing rich and discriminative object features to enhance target-awareness in the decoding features, which in turn helps generate better instance context and ultimately improves localization. Compared to existing methods, CG-STVG benefits from both the object information in the text query and the guidance from mined instance visual context for more accurate target localization. In our experiments on three benchmarks, HCSTVG-v1/-v2 and VidSTG, CG-STVG sets a new state of the art in m_tIoU and m_vIoU on all of them, showing its efficacy. The code will be released at https://github.com/HengLan/CGSTVG.

preprint, 2024, arXiv

Specific Emitter Identification Based on Joint Variational Mode Decomposition

Specific emitter identification (SEI) technology is significant in device administration scenarios, such as self-organized networking and spectrum management, owing to its high security. For nonlinear and non-stationary electromagnetic signals, SEI often employs variational modal decomposition (VMD) to decompose the signal in order to effectively characterize the distinct device fingerprint. However, the trade-off of VMD between the robustness to noise and the ability to preserve signal information has not been investigated in the current literature. Moreover, the existing VMD algorithm does not utilize the stability of the intrinsic distortion of emitters within a certain temporal span, consequently constraining its practical applicability in SEI. In this paper, we propose a joint variational modal decomposition (JVMD) algorithm, which is an improved version of VMD by simultaneously implementing modal decomposition on multi-frame signals. The consistency of multi-frame signals in terms of the central frequencies and the inherent modal functions (IMFs) is exploited, which effectively highlights the distinctive characteristics among emitters and reduces noise. Additionally, the complexity of JVMD is analyzed, which is proven to be more computational-friendly than VMD. Simulations of both modal decomposition and SEI that involve real-world datasets are presented to illustrate that when compared with VMD, the JVMD algorithm improves the accuracy of device classification and the robustness towards noise.

preprint, 2022, arXiv

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

This report presents the methods of the winning entry of the RxR-Habitat Competition in CVPR 2022. The competition addresses the problem of Vision-and-Language Navigation in Continuous Environments (VLN-CE), which requires an agent to follow step-by-step natural language instructions to reach a target. We present a modular plan-and-control approach for the task. Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller. In each decision loop, CWP first predicts a set of candidate waypoints based on depth observations from multiple views. It can reduce the complexity of the action space and facilitate planning. Then, a history-enhanced planner is adopted to select one of the candidate waypoints as the subgoal. The planner additionally encodes historical memory to track the navigation progress, which is especially effective for long-horizon navigation. Finally, we propose a non-parametric heuristic controller named tryout to execute low-level actions to reach the planned subgoal. It is based on the trial-and-error mechanism which can help the agent to avoid obstacles and escape from getting stuck. All three modules work hierarchically until the agent stops. We further take several recent advances of Vision-and-Language Navigation (VLN) to improve the performance such as pretraining based on large-scale synthetic in-domain dataset, environment-level data augmentation and snapshot model ensemble. Our model won the RxR-Habitat Competition 2022, with 48% and 90% relative improvements over existing methods on NDTW and SR metrics respectively.

preprint, 2022, arXiv

A Closer Look at Personalization in Federated Image Classification

Federated Learning (FL) is developed to learn a single global model across decentralized data, but it struggles to realize client-specific personalization in the presence of statistical heterogeneity. Existing studies focus on learning either a robust global model or personalized classifiers, which diverge due to inconsistent objectives. This paper shows that it is possible to achieve flexible personalization after the convergence of the global model by introducing representation learning. We first analyze and determine that non-IID data harms representation learning of the global model. Existing FL methods adhere to the scheme of jointly learning representations and classifiers, where the global model is an average of classification-based local models that are consistently subject to heterogeneity from non-IID data. As a solution, we separate representation learning from classification learning in FL and propose RepPer, an independent two-stage personalized FL framework. We first learn the client-side feature representation models that are robust to non-IID data and aggregate them into a global common representation model. After that, we achieve personalization by learning a classifier head for each client, based on the common representation obtained at the former stage. Notably, the proposed two-stage learning scheme of RepPer can potentially be used for lightweight edge computing that involves devices with constrained computation power. Experiments on various datasets (CIFAR-10/100, CINIC-10) and heterogeneous data setups show that RepPer outperforms alternatives in flexibility and personalization on non-IID data.

preprint, 2022, arXiv

Actor and Action Modular Network for Text-based Video Segmentation

Text-based video segmentation aims to segment an actor in video sequences by specifying the actor and the action it performs with a textual query. Previous methods fail to explicitly align the video content with the textual query in a fine-grained manner according to the actor and its action, due to the problem of semantic asymmetry. Semantic asymmetry means that the two modalities contain different amounts of semantic information during the multi-modal fusion process. To alleviate this problem, we propose a novel actor and action modular network that individually localizes the actor and its action in two separate modules. Specifically, we first learn the actor-/action-related content from the video and textual query, and then match them in a symmetrical manner to localize the target tube. The target tube contains the desired actor and action and is then fed into a fully convolutional network to predict segmentation masks of the actor. Our method also establishes the association of objects across multiple frames with the proposed temporal proposal aggregation mechanism. This enables our method to segment the video effectively and keep the predictions temporally consistent. The whole model allows joint learning of actor-action matching and segmentation, and achieves state-of-the-art performance for both single-frame segmentation and full video segmentation on the A2D Sentences and J-HMDB Sentences datasets.

preprint, 2022, arXiv

Cyclic Differentiable Architecture Search

Differentiable ARchiTecture Search, i.e., DARTS, has drawn great attention in neural architecture search. It tries to find the optimal architecture in a shallow search network and then measures its performance in a deep evaluation network. The independent optimization of the search and evaluation networks, however, leaves room for potential improvement by allowing interaction between the two networks. To address the problematic optimization issue, we propose new joint optimization objectives and a novel Cyclic Differentiable ARchiTecture Search framework, dubbed CDARTS. Considering the structure difference, CDARTS builds a cyclic feedback mechanism between the search and evaluation networks with introspective distillation. First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized. Second, the architecture weights in the search network are further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in joint optimization of the search and evaluation networks and thus enables the evolution of the architecture to fit the final evaluation network. The experiments and analysis on CIFAR, ImageNet and NAS-Bench-201 demonstrate the effectiveness of the proposed approach over the state-of-the-art ones. Specifically, in the DARTS search space, we achieve 97.52% top-1 accuracy on CIFAR10 and 76.3% top-1 accuracy on ImageNet. In the chain-structured search space, we achieve 78.2% top-1 accuracy on ImageNet, which is 1.1% higher than EfficientNet-B0. Our code and models are publicly available at https://github.com/microsoft/Cream.

preprint, 2022, arXiv

Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption

In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain. As an important tool of regularizing the distribution, batch normalization (BN) has been widely used in existing methods. However, they neglect that BN is severely biased to the training domain and inevitably suffers the performance drop if directly generalized without being updated. To tackle this issue, we propose Batch Norm Test-time Adaption (BNTA), a novel re-id framework that applies the self-supervised strategy to update BN parameters adaptively. Specifically, BNTA quickly explores the domain-aware information within unlabeled target data before inference, and accordingly modulates the feature distribution normalized by BN to adapt to the target domain. This is accomplished by two designed self-supervised auxiliary tasks, namely part positioning and part nearest neighbor matching, which help the model mine the domain-aware information with respect to the structure and identity of body parts, respectively. To demonstrate the effectiveness of our method, we conduct extensive experiments on three re-id datasets and confirm the superior performance to the state-of-the-art methods.

preprint, 2022, arXiv

Scaling Blockchain with Adaptivity

This paper presents Balloon, a scalable blockchain consensus protocol which can dynamically adapt its performance to changes in the overall computation power. Balloon is based on a parallel chain architecture combined with a greedy heaviest sub-chain selection strategy. It adopts an innovative block sampling approach to assess the change of block generation rate in the network. By introducing a view change mechanism, Balloon is able to dynamically adjust the number of parallel sub-chains. Balloon redefines the concept of block subtree weight with view change in consideration, so that a total order of blocks can be obtained safely. To deal with a rapidly increasing block generation rate in the blockchain network, participants of previous Nakamoto-style protocols are required to continuously increase their mining difficulty so as to maintain an expected security guarantee. Balloon, however, can accommodate a fixed difficulty setup and assign superfluous block processing capability to new sub-chains, which makes it more open and also economical.

preprint, 2022, arXiv

Study of background from accidental coincidence signals in the PandaX-II experiment

The PandaX-II experiment employed a 580 kg liquid xenon detector to search for interactions between dark matter particles and the target xenon atoms. The accidental coincidences of isolated signals result in a dangerous background which mimics the signature of dark matter. We performed a detailed study of the accidental coincidence background in PandaX-II, including the possible origin of the isolated signals, the background level and the corresponding background suppression method. With a boosted-decision-tree algorithm, the accidental coincidence background is reduced by 70% in the dark matter signal region, and thus the sensitivity of the dark matter search at PandaX-II is improved.

preprint, 2022, arXiv

Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

We develop a general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios. The framework hinges on the introduction of an augmented graph consisting of nodes modeling the samples and edges modeling both the inter-device communication and intra-device stochastic gradient computation. By designing properly the topology of the augmented graph, we are able to recover as special cases the renowned Local-SGD and DSGD algorithms, and provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA. We also provide a unified convergence analysis for smooth and (strongly) convex objectives relying on a proper structured Lyapunov function, and the obtained rate can recover the best known results for many existing algorithms. The rate results further reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution. Numerical experiments confirm the findings in this paper.

preprint, 2022, arXiv

Time dependent numerical model for the very high energy emissions of distant gamma-ray burst GRB 201216C

Recently, the MAGIC Collaboration reported a $\sim 5σ$ statistical significance of the very-high-energy (VHE) emission from a distant GRB, GRB 201216C. VHE emission from such a distant GRB may be effectively absorbed by the extragalactic background light (EBL). The origin of the VHE emission from such distant objects is still unknown. Here, we propose a numerical model for studying the afterglow emission of this distant GRB. The model solves the continuity equation governing the temporal evolution of the electron distribution, and the broad-band observed data can be explained by the synchrotron plus synchrotron self-Compton (SSC) radiation of the forward shock. The predicted observed 0.1 TeV flux can reach $\sim 10^{-9} -10^{-10}\rm erg ~ cm^{-2} ~ s^{-1}$ at $t \sim 10^3 -10^4 \rm s$, so even with strong EBL absorption, such strong Sub-TeV emission can still be observed by the MAGIC telescope. In this numerical model, the shock parameters are similar to those of two other Sub-TeV GRBs (i.e., GRB 190114C and GRB 180720B), implying that the Sub-TeV GRBs have some commonalities: they have high burst energy, low circum-burst medium density and a low magnetic equipartition factor. We regard GRB 201216C as a typical GRB, and estimate the maximum redshift of a GRB that can be detected by the MAGIC telescope, i.e., $z \sim 1.6$. We also find that the VHE photon energy of such a distant GRB can only reach $\sim 0.1 ~\rm TeV$. Improving the low-energy sensitivity of VHE telescopes is very important for detecting the Sub-TeV emission of these distant GRBs.

preprint, 2022, arXiv

Uncovering the Source of Machine Bias

We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find two types of biases in gender, preference-based bias and belief-based bias, are present in human evaluators' decisions. Both types of biases are in favor of female applicants. Through counterfactual simulations, we quantify the effect of gender bias on loan granting outcomes and the welfare of the company and the borrowers. Our results imply that both the existence of the preference-based bias and that of the belief-based bias reduce the company's profits. When the preference-based bias is removed, the company earns more profits. When the belief-based bias is removed, the company's profits also increase. Both increases result from raising the approval probability for borrowers, especially male borrowers, who eventually pay back loans. For borrowers, the elimination of either bias decreases the gender gap of the true positive rates in the credit risk evaluation. We also train machine learning algorithms on both the real-world data and the data from the counterfactual simulations. We compare the decisions made by those algorithms to see how evaluators' biases are inherited by the algorithms and reflected in machine-based decisions. We find that machine learning algorithms can mitigate both the preference-based bias and the belief-based bias.

preprint, 2022, arXiv

Univoque bases of real numbers: simply normal bases, irregular bases and multiple rationals

Given a positive integer $M$ and a real number $x\in(0,1]$, we call $q\in(1,M+1]$ a univoque simply normal base of $x$ if there exists a unique simply normal sequence $(d_i)\in\{0,1,\ldots,M\}^\mathbb N$ such that $x=\sum_{i=1}^\infty d_i q^{-i}$. Similarly, a base $q\in(1,M+1]$ is called a univoque irregular base of $x$ if there exists a unique sequence $(d_i)\in\{0,1,\ldots, M\}^\mathbb N$ such that $x=\sum_{i=1}^\infty d_i q^{-i}$ and the sequence $(d_i)$ has no digit frequency. Let $\mathcal U_{SN}(x)$ and $\mathcal U_{I_r}(x)$ be the sets of univoque simply normal bases and univoque irregular bases of $x$, respectively. In this paper we show that for any $x\in(0,1]$ both $\mathcal U_{SN}(x)$ and $\mathcal U_{I_r}(x)$ have full Hausdorff dimension. Furthermore, given finitely many rationals $x_1, x_2, \ldots, x_n\in(0,1]$ so that each $x_i$ has a finite expansion in base $M+1$, we show that there exists a full Hausdorff dimensional set of $q\in(1,M+1]$ such that each $x_i$ has a unique expansion in base $q$.
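The notion of a unique expansion in the abstract above can be probed numerically. The following is a minimal sketch, not the paper's method: it relies on the standard fact that $x$ has a unique expansion in base $q$ exactly when its greedy (largest digits first) and lazy (smallest digits first) expansions coincide. The function names `greedy_digits` and `lazy_digits` are hypothetical, and exact rational arithmetic is assumed, so only rational $x$ and $q$ are supported.

```python
from fractions import Fraction
import math


def greedy_digits(x, q, M, n):
    """First n digits of the greedy expansion of x in base q over digits 0..M."""
    digits, r = [], Fraction(x)
    for _ in range(n):
        t = r * q
        d = min(M, math.floor(t))  # largest digit that does not overshoot x
        digits.append(d)
        r = t - d
    return digits


def lazy_digits(x, q, M, n):
    """First n digits of the lazy expansion of x in base q over digits 0..M."""
    bound = Fraction(M) / (Fraction(q) - 1)  # largest value a tail of digits can represent
    digits, r = [], Fraction(x)
    for _ in range(n):
        t = r * q
        d = max(0, math.ceil(t - bound))  # smallest digit the tail can still compensate for
        digits.append(d)
        r = t - d
    return digits


# With M = 1: in base 2 both expansions of x = 1 start 1, 1, 1, ...;
# in base 3/2 they already differ at the first digit, so x = 1 has
# more than one expansion there.
print(greedy_digits(1, 2, 1, 8), lazy_digits(1, 2, 1, 8))
print(greedy_digits(1, Fraction(3, 2), 1, 4), lazy_digits(1, Fraction(3, 2), 1, 4))
```

Differing prefixes prove that the expansion is not unique, whereas agreeing prefixes of finite length are only consistent with uniqueness and do not prove it.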

preprint, 2021, arXiv

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation

In recent years, deep single-image dehazing models based on the Atmospheric Scattering Model (ASM) have achieved remarkable results, but their dehazing outputs suffer from color shift. Analysis of the ASM shows that the atmospheric light factor (ALF) is set as a scalar, which implies that the ALF is constant over the whole image. However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which causes model mismatch and possibly results in color shift in deep models using the ASM. With this in mind, this study first proposes a new non-homogeneous atmospheric scattering model (NH-ASM) to improve the modeling of hazy images taken under complex illumination conditions. Second, a new U-Net-based front white balance module (FWB-Module) is designed to correct color shift via atmospheric light estimation before the dehazing result is generated. Third, a new FWB loss, which penalizes color shift, is developed for training the FWB-Module. Finally, based on NH-ASM and the front white balance technique, an end-to-end CNN-based color-shift-restraining dehazing network, termed FWB-Net, is developed. Experimental results demonstrate the effectiveness and superiority of the proposed FWB-Net for dehazing on both synthetic and real-world images.

preprint, 2021, arXiv

Internal Calibration of the PandaX-II Detector with Radon Gaseous Sources

We have developed a low-energy electron recoil (ER) calibration method with $^{220}$Rn for the PandaX-II detector. $^{220}$Rn, emanated from natural thorium compounds, was fed into the detector through the xenon purification system. From 2017 to 2019, we performed three dedicated calibration campaigns with different radon sources. We studied the detector response to $α$, $β$, and $γ$ particles with a focus on low-energy ER events. During the runs in 2017 and 2018, the amount of radioactivity of $^{222}$Rn was on the order of 1% of that of $^{220}$Rn, and thorium particulate contamination was negligible, especially in 2018. We also measured the background contribution from $^{214}$Pb for the first time in PandaX-II with the help of a $^{222}$Rn injection. The calibration strategy with $^{220}$Rn and $^{222}$Rn will be implemented in the upcoming PandaX-4T experiment and can be useful for other xenon-based detectors as well.

preprint, 2021, arXiv

Results of Dark Matter Search using the Full PandaX-II Exposure

We report the dark matter search results obtained using the full 132 ton$\cdot$day exposure of the PandaX-II experiment, including all data from March 2016 to August 2018. No significant excess of events is identified above the expected background. Upper limits are set on the spin-independent dark matter-nucleon interactions. The lowest 90% confidence level exclusion on the spin-independent cross section is $2.2\times 10^{-46}$ cm$^2$ at a WIMP mass of 30 GeV/$c^2$.

preprint, 2020, arXiv

Algorithmic Transparency with Strategic Users

Should firms that apply machine learning algorithms in their decision-making make their algorithms transparent to the users they affect? Despite growing calls for algorithmic transparency, most firms have kept their algorithms opaque, citing potential gaming by users that may negatively affect the algorithm's predictive power. We develop an analytical model to compare firm and user surplus with and without algorithmic transparency in the presence of strategic users and present novel insights. We identify a broad set of conditions under which making the algorithm transparent benefits the firm. We show that, in some cases, even the predictive power of machine learning algorithms may increase if the firm makes them transparent. By contrast, users may not always be better off under algorithmic transparency. The results hold even when the predictive power of the opaque algorithm comes largely from correlational features and the cost for users to improve on them is close to zero. Overall, our results show that firms should not view manipulation by users as bad. Rather, they should use algorithmic transparency as a lever to motivate users to invest in more desirable features.

preprint, 2020, arXiv

Crowd, Lending, Machine, and Bias

Big data and machine learning (ML) algorithms are key drivers of many fintech innovations. While it may be obvious that replacing humans with machines would increase efficiency, it is not clear whether and where machines can make better decisions than humans. We answer this question in the context of crowd lending, where decisions are traditionally made by a crowd of investors. Using data from Prosper.com, we show that a reasonably sophisticated ML algorithm predicts listing default probability more accurately than crowd investors. The dominance of the machine over the crowd is more pronounced for highly risky listings. We then use the machine to make investment decisions, and find that the machine benefits not only the lenders but also the borrowers. When machine prediction is used to select loans, it leads to a higher rate of return for investors and more funding opportunities for borrowers with few alternative funding options. We also find suggestive evidence that the machine is biased in gender and race even when it does not use gender and race information as input. We propose a general and effective "debiasing" method that can be applied to any prediction-focused ML application, and demonstrate its use in our context. We show that the debiased ML algorithm, which suffers from lower prediction accuracy, still leads to better investment decisions compared with the crowd. These results indicate that ML can help crowd lending platforms better fulfill the promise of providing access to financial resources to otherwise underserved individuals and ensure fairness in the allocation of these resources.

preprint, 2020, arXiv

Kilonova Emission From Black Hole-Neutron Star Mergers. I. Viewing-Angle-Dependent Lightcurves

In this paper, we present a numerical method to study the predicted lightcurves as a function of viewing angle. We extrapolate the fitting formulae for the mass and velocity of tidal dynamical ejecta across a wide range of mass ratio validated with 66 simulations and use them in the calculations of the kilonova lightcurves. The calculated peak luminosity of a BH-NS merger kilonova is typically about a few times $10^{41}\ {\rm erg\ s^{-1}}$, which is always $\lesssim4.5\times10^{41}\ {\rm erg\ s^{-1}}$. This corresponds to the AB absolute magnitudes fainter than $\sim -15\ {\rm mag}$ in optical and $\sim -16\ {\rm mag}$ in infrared. Since the projected photosphere area of the dynamical ejecta is much larger than that of the disk wind outflows, the dynamical ejecta usually contribute to the majority of the kilonova emission from BH-NS mergers. The fitted blackbody temperature and the shape of the observed multi-band lightcurves are insensitive to the line of sight. The peak time of the observed multi-band lightcurves, affected by the light propagation effect, is related to the relative motion direction between the dynamical ejecta and the observer. The observed luminosity varies with the projected photosphere area determined by the viewing angles. However, the predicted peak luminosity only varies by a factor of $\sim (2 - 3)$ (or by $\sim1\ {\rm mag}$) for different viewing angles. When the short-duration gamma-ray burst afterglow is taken into account, for an on-axis geometry, the kilonova emission is usually outshone by the afterglow emission and can be only observed in the redder bands, especially in the $K$-band at late times. Compared with GW170817/AT2017gfo, the BH-NS merger kilonovae are optically dim but possibly infrared bright. At the same epoch after the merger, the blackbody fitting temperature of the BH-NS merger kilonovae is lower than that of GW170817/AT2017gfo.

preprint2020arXiv

L-Vector: Neural Label Embedding for Domain Adaptation

We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. With the NLE method, we distill the knowledge from a powerful source-domain DNN into a dictionary of label embeddings, or l-vectors, one for each senone class. Each l-vector is a representation of the senone-specific output distributions of the source-domain DNN and is learned to minimize the average L2, Kullback-Leibler (KL) or symmetric KL distance to the output vectors with the same label through simple averaging or standard back-propagation. During adaptation, the l-vectors serve as the soft targets to train the target-domain model with cross-entropy loss. Without the parallel-data constraint of teacher-student learning, NLE is especially suited to situations where paired target-domain data cannot be simulated from the source-domain data. We adapt a 6400-hour multi-conditional US English acoustic model to each of 9 accented English datasets (80 to 830 hours) and to kids' speech (80 hours). NLE achieves up to 14.1% relative word error rate reduction over direct re-training with one-hot labels.
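For the L2 case, the vector that minimizes the average squared distance to a set of vectors is their mean, which is why "simple averaging" suffices. The sketch below illustrates only that case, with toy two-class data. All function names and values are hypothetical, and it is not the authors' implementation:

```python
import math
from collections import defaultdict


def l_vectors_by_averaging(outputs, labels):
    """Average the source-model output vectors that share a label (L2-optimal)."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(outputs, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for i, v in enumerate(vec):
            sums[lab][i] += v
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def soft_target_cross_entropy(target_probs, soft_target):
    """Cross-entropy of target-model probabilities against a soft target."""
    return -sum(t * math.log(p) for t, p in zip(soft_target, target_probs) if t > 0)


# Toy source-domain posteriors over two classes (illustrative values only).
source_outputs = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
source_labels = [0, 0, 1]
lvecs = l_vectors_by_averaging(source_outputs, source_labels)

# During adaptation, a target-domain frame labelled 0 is trained toward lvecs[0].
loss = soft_target_cross_entropy([0.5, 0.5], lvecs[0])
print(lvecs, loss)
```

The KL and symmetric KL variants mentioned in the abstract would need a different estimator (for example, gradient-based optimization) and are not covered here.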

preprint2020arXiv

Large-scale Real-time Personalized Similar Product Recommendations

Similar product recommendation is one of the most common scenarios in e-commerce. Many recommendation algorithms, such as item-to-item Collaborative Filtering, work by measuring item similarities. In this paper, we introduce our real-time personalized algorithm to model product similarity and real-time user interests. We also introduce several other baseline algorithms including an image-similarity-based method, item-to-item collaborative filtering, and item2vec, and compare them on our large-scale real-world e-commerce dataset. The algorithms which achieve good offline results are also tested on the online e-commerce website. Our personalized method achieves a 10% improvement on the add-cart number in the real-world e-commerce scenario.

preprint2020arXiv

Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Most existing approaches for goal-oriented dialogue policy learning use reinforcement learning, which focuses on the target agent policy and simply treats the opposite agent policy as part of the environment. In real-world scenarios, however, the behavior of an opposite agent often exhibits certain patterns or follows hidden policies, which can be inferred and utilized by the target agent to facilitate its own decision making. This strategy is common in human mental simulation, where one first imagines a specific action and its probable results before actually performing it. We therefore propose an opposite behavior aware framework for policy learning in goal-oriented dialogues. We estimate the opposite agent's policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy. We evaluate our model on both cooperative and competitive dialogue tasks, showing superior performance over state-of-the-art baselines.

preprint2020arXiv

Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation

This paper proposes a novel model for video generation and, in particular, attempts to deal with the problem of video generation from text descriptions, i.e., synthesizing realistic videos conditioned on given texts. Existing video generation methods cannot be easily adapted to handle this task well, due to the frame discontinuity issue and their text-free generation schemes. To address these problems, we propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of a conventional recurrent neural network, which can model the long-range temporal dependency of generated video frames well and make good use of conditional information. The proposed model can be jointly trained by pushing the RDN to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones. We apply the proposed RD-GAN to a series of tasks including conventional video generation, conditional video generation, video prediction and video classification, and demonstrate its effectiveness by achieving good performance.

preprint2020arXiv

Secondary-electron radiation accompanying hadronic GeV-TeV gamma-rays from supernova remnants

The synchrotron radiation from secondary electrons and positrons (SEPs) generated by hadronic interactions in the shock of a supernova remnant (SNR) could be distinct evidence of cosmic ray (CR) production in SNR shocks. Here we provide a method where the observed gamma-ray flux from SNRs, created by pion decays, is directly used to derive the SEP distribution and hence the synchrotron spectrum. We apply the method to three gamma-ray bright SNRs. In the young SNR RX J1713.7-3946, if the observed GeV-TeV gamma-rays are of hadronic origin and the magnetic field in the SNR shock is $B\gtrsim 0.5$ mG, the SEPs may produce a spectral bump at $10^{-5}-10^{-2}$ eV, exceeding the predicted synchrotron component of the leptonic model, and a soft spectral tail at $\gtrsim 100$ keV, distinct from the hard spectral slope in the leptonic model. In the middle-aged SNRs IC443 and W44, if the observed gamma-rays are of hadronic origin, the SEP synchrotron radiation with $B\sim 400 - 500\ \mu$G can well account for the observed radio flux and spectral slopes, supporting the hadronic origin of gamma-rays. Future microwave to far-infrared and hard X-ray (>100 keV) observations are encouraged to constrain the SEP radiation and the gamma-ray origin in SNRs.

preprint2019arXiv

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

As China's first X-ray astronomical satellite, the Hard X-ray Modulation Telescope (HXMT), which was dubbed Insight-HXMT after its launch on June 15, 2017, is a wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with the capability of all-sky monitoring in 0.2-3 MeV. It was designed to perform pointing, scanning and gamma-ray burst (GRB) observations and, based on the Direct Demodulation Method (DDM), the image of the scanned sky region can be reconstructed. Here we give an overview of the mission and its progress, including payload, core sciences, ground calibration/facility, ground segment, data archive, software, in-orbit performance, calibration, background model, observations and some preliminary results.

preprint2019arXiv

Searching for Neutrino-less Double Beta Decay of $^{136}$Xe with PandaX-II Liquid Xenon Detector

We report the Neutrino-less Double Beta Decay (NLDBD) search results from PandaX-II dual-phase liquid xenon time projection chamber. The total live time used in this analysis is 403.1 days from June 2016 to August 2018. With NLDBD-optimized event selection criteria, we obtain a fiducial mass of 219 kg of natural xenon. The accumulated xenon exposure is 242 kg$\cdot$yr, or equivalently 22.2 kg$\cdot$yr of $^{136}$Xe exposure. At the region around $^{136}$Xe decay Q-value of 2458 keV, the energy resolution of PandaX-II is 4.2%. We find no evidence of NLDBD in PandaX-II and establish a lower limit for decay half-life of 2.4 $ \times 10^{23} $ yr at the 90% confidence level, which corresponds to an effective Majorana neutrino mass $m_{ββ} < (1.3 - 3.5)$ eV. This is the first NLDBD result reported from a dual-phase xenon experiment.
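The exposure figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes values that are not stated in the abstract: a 365.25-day year, a natural $^{136}$Xe isotopic abundance of about 8.86% by atom count, and atomic masses of about 135.907 u for $^{136}$Xe and 131.293 u for natural xenon. Treat it as a consistency check rather than the collaboration's own calculation:

```python
# Values quoted in the abstract.
FIDUCIAL_MASS_KG = 219.0
LIVE_TIME_DAYS = 403.1

# Assumed constants (not from the abstract).
DAYS_PER_YEAR = 365.25
XE136_ATOM_FRACTION = 0.0886   # natural isotopic abundance, by number of atoms
XE136_MASS_U = 135.907         # atomic mass of 136Xe
XE_NATURAL_MASS_U = 131.293    # standard atomic weight of xenon

# Natural-xenon exposure in kg*yr.
natural_exposure = FIDUCIAL_MASS_KG * LIVE_TIME_DAYS / DAYS_PER_YEAR

# Convert the atom fraction to a mass fraction before scaling the exposure.
xe136_mass_fraction = XE136_ATOM_FRACTION * XE136_MASS_U / XE_NATURAL_MASS_U
xe136_exposure = natural_exposure * xe136_mass_fraction

print(f"natural Xe exposure: {natural_exposure:.1f} kg*yr")
print(f"136Xe exposure:      {xe136_exposure:.1f} kg*yr")
```

Under these assumptions the results round to the 242 kg$\cdot$yr and 22.2 kg$\cdot$yr quoted in the abstract.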

preprint2016arXiv

A combined model for pseudorapidity distributions in Cu-Cu collisions at BNL-RHIC energies

The charged particles produced in nucleus-nucleus collisions come from leading particles and those frozen out from the hot and dense matter created in collisions. The leading particles are conventionally supposed to have Gaussian rapidity distributions normalized to the number of participants. The hot and dense matter is assumed to expand according to the unified hydrodynamics, a hydro model which unifies the features of the Landau and Hwa-Bjorken models, and freeze out into charged particles from a space-like hypersurface with a proper time of $\tau_{FO}$. The rapidity distribution of this part of the charged particles can be derived analytically. The combined contribution from both leading particles and unified hydrodynamics is then compared against the experimental data obtained by the BNL-RHIC-PHOBOS Collaboration in Cu-Cu collisions of different centralities at $\sqrt{s_{NN}}=200$ and 62.4 GeV, respectively. The model predictions are consistent with experimental measurements.

preprint2016arXiv

Anchoring and Agreement in Syntactic Annotations

We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to creation of syntactic resources where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations. Using sentences from the Penn Treebank WSJ, we also report systematically obtained inter-annotator agreement estimates for English dependency parsing. Our agreement results control for parser bias, and are consequential in that they are on par with state of the art parsing performance for English newswire. We discuss the impact of our findings on strategies for future annotation efforts and parser evaluations.

preprint2016arXiv

Multimodal Memory Modelling for Video Captioning

Video captioning, which automatically translates video clips into natural language sentences, is a very important task in computer vision. By virtue of recent deep learning technologies, e.g., convolutional neural networks (CNNs) and recurrent neural networks (RNNs), video captioning has made great progress. However, learning an effective mapping from visual sequence space to language space is still a challenging problem. In this paper, we propose a Multimodal Memory Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide global visual attention on described targets. Specifically, the proposed M3 attaches an external memory to store and retrieve both visual and textual contents by interacting with video and sentence with multiple read and write operations. First, text representation in the Long Short-Term Memory (LSTM) based text decoder is written into the memory, and the memory contents will be read out to guide an attention to select related visual targets. Then, the selected visual information is written into the memory, which will be further read out to the text decoder. To evaluate the proposed model, we perform experiments on two public benchmark datasets: MSVD and MSR-VTT. The experimental results demonstrate that our method outperforms the state-of-the-art methods in terms of BLEU and METEOR.

preprint2016arXiv

The radio environment of the 21 Centimeter Array: RFI detection and mitigation

Detection and mitigation of radio frequency interference (RFI) is the first and also the key step for data processing in radio observations, especially for ongoing low frequency radio experiments towards the detection of the cosmic dawn and epoch of reionization (EoR). In this paper we demonstrate the technique and efficiency of RFI identification and mitigation for the 21 Centimeter Array (21CMA), a radio interferometer dedicated to the statistical measurement of EoR. For terrestrial, man-made RFI, we concentrate mainly on a statistical approach by identifying and then excising non-Gaussian signatures, in the sense that the extremely weak cosmic signal is actually buried under thermal and therefore Gaussian noise. We also introduce the so-called visibility correlation coefficient instead of conventional visibility, which allows a further suppression of rapidly time-varying RFI. Finally, we briefly discuss removals of the sky RFI, the leakage of sidelobes from off-field strong radio sources with time-invariant power and a featureless spectrum. It turns out that state of the art technique should allow us to detect and mitigate RFI to a satisfactory level in present low frequency interferometer observations such as those acquired with the 21CMA, and the accuracy and efficiency can be greatly improved with the employment of low-cost, high-speed computing facilities for data acquisition and processing.
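The statistical approach described above flags samples that depart from the Gaussian noise expected for thermal data. As a generic illustration only, and not the 21CMA pipeline, the sketch below flags outliers with a median and median-absolute-deviation (MAD) threshold. The function name, the threshold of 5, and the sample values are all hypothetical:

```python
import statistics


def flag_non_gaussian(samples, threshold=5.0):
    """Flag samples lying more than `threshold` robust sigmas from the median."""
    med = statistics.median(samples)
    mad = statistics.median([abs(x - med) for x in samples])
    # For Gaussian noise, sigma is approximately 1.4826 * MAD.
    sigma = 1.4826 * mad
    if sigma == 0:
        # Degenerate case: no spread, so anything off the median is an outlier.
        return [x != med for x in samples]
    return [abs(x - med) > threshold * sigma for x in samples]


# Toy amplitude series with one strong interference spike (illustrative values).
amplitudes = [0.1, -0.2, 0.05, 0.3, -0.15, 12.0, 0.0, -0.05]
flags = flag_non_gaussian(amplitudes)
cleaned = [x for x, bad in zip(amplitudes, flags) if not bad]
print(flags, cleaned)
```

A robust estimator is used so that the interference itself does not inflate the noise estimate. The visibility correlation coefficient mentioned in the abstract is a separate step and is not modelled here.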

preprint2015arXiv

A combined model for the pseudorapidity distributions in p-p collisions at center-of-mass energies from 23.6 to 7000 GeV

In p-p collisions, the produced charged particles consist of two leading particles and those frozen out from the hot and dense matter created in collisions. The two leading particles are in the projectile and target fragmentation regions, respectively, and in this paper are conventionally supposed to have Gaussian rapidity distributions. The hot and dense matter is assumed to expand according to the unified hydrodynamics, a hydro model which unifies the features of the Landau and Hwa-Bjorken models, and freeze out into charged particles from a space-like hypersurface with a fixed proper time of $\tau_{FO}$. The rapidity distribution of this part of the charged particles can be derived analytically. The combined contribution from both leading particles and unified hydrodynamics is then compared against the experimental data obtained over the wide range of currently available center-of-mass energies from 23.6 to 7000 GeV. The model predictions are consistent with experimental measurements.

preprint2015arXiv

Community Detection from Location-Tagged Networks

Many real-world systems or web services can be represented as networks, such as social networks and transportation networks. In the past decade, many algorithms have been developed to detect the communities in a network using connections between nodes. However, in many real-world networks, the locations of nodes have great influence on the community structure. For example, in a social network, more connections are established between geographically proximate users. The impact of locations on community has not been fully investigated in the research literature. In this paper, we propose a community detection method which takes locations of nodes into consideration. The goal is to detect communities with both geographic proximity and network closeness. We analyze the distribution of the distances between connected and unconnected nodes to measure the influence of location on the network structure on two real location-tagged social networks. We propose a method to determine if a location-based community detection method is suitable for a given network. We propose a new community detection algorithm that pushes the location information into the community detection. We test our proposed method on both synthetic data and real-world network datasets. The results show that the communities detected by our method are distributed over a smaller area than those found by traditional methods and have similar or higher tightness of network connections.

preprint2015arXiv

Infrared colour properties of nearby radio-luminous galaxies

By combining the data of the Two Micron All Sky Survey, the \textit{Wide Field Infrared Survey Explorer} and the \textit{AKARI} satellite, we study the infrared colour properties of a sample of 2712 nearby radio-luminous galaxies (RLGs). These RLGs are divided into radio-loud (RL) active galactic nuclei (AGNs), mainly occurring at redshifts of $0.05<z<0.3$ and star-forming-dominated RLGs (SFGs), mainly occurring at redshifts of $0.01<z<0.15$. RL AGNs and SFGs are separately distributed in the ([3.4]$-$[4.6])$-$([4.6]$-$[12]) two-colour diagram, in which the RL AGNs display a double-core distribution, and the SFGs display a single-core distribution. SFGs have a redder [4.6]$-$[12] colour than RL AGNs due to the significant contribution from the dust component of SFGs. We find simple criteria of MIR colour separation between RL AGNs and SFGs such that: 95$\%$ of RL AGNs have [4.6]$-$[12] $<$ 3.0 and 94$\%$ of SFGs have [4.6]$-$[12] $>$ 3.0. We also analyse the MIR colours of RL AGNs divided into low- and high-excitation radio galaxies (LERGs and HERGs, respectively). The ([3.4]$-$[4.6])$-$([4.6]$-$[12]) diagram clearly shows separate distributions of LERGs and HERGs and a region of overlap, which suggests that LERGs and HERGs have different MIR properties. LERGs are responsible for the double-core distribution of RL AGNs on the ([3.4]$-$[4.6])$-$([4.6]$-$[12]) diagram. In addition, we also suggest 90$-$140$μ$m band spectral index $α(90,140)<-1.4$ as a criterion of selecting nearby active galaxies with non-thermal emissions at far-infrared wavelengths.

preprint2013arXiv

Large Scale Real-time Ridesharing with Service Guarantee on Road Networks

The mean occupancy rate of personal vehicle trips in the United States is only 1.6 persons per vehicle mile. Urban traffic gridlock is a familiar scene. Ridesharing has the potential to solve many environmental, congestion, and energy problems. In this paper, we introduce the problem of large scale real-time ridesharing with service guarantee on road networks. Servers and trip requests are dynamically matched while waiting time and service time constraints of trips are satisfied. We first propose two basic algorithms: a branch-and-bound algorithm and an integer programming algorithm. However, these algorithm structures do not adapt well to the dynamic nature of the ridesharing problem. Thus, we then propose a kinetic tree algorithm capable of better scheduling dynamic requests and adjusting routes on-the-fly. We perform experiments on a large real taxi dataset from Shanghai. The results show that the kinetic tree algorithm is faster than other algorithms in response time.

preprint2011arXiv

Distance Preserving Graph Simplification

Large graphs are difficult to represent, visualize, and understand. In this paper, we introduce "gate graph" - a new approach to perform graph simplification. A gate graph provides a simplified topological view of the original graph. Specifically, we construct a gate graph from a large graph so that for any "non-local" vertex pair (distance higher than some threshold) in the original graph, their shortest-path distance can be recovered by consecutive "local" walks through the gate vertices in the gate graph. We perform a theoretical investigation on the gate-vertex set discovery problem. We characterize its computational complexity and reveal the upper bound of minimum gate-vertex set using VC-dimension theory. We propose an efficient mining algorithm to discover a gate-vertex set with guaranteed logarithmic bound. We further present a fast technique for pruning redundant edges in a gate graph. The detailed experimental results using both real and synthetic graphs demonstrate the effectiveness and efficiency of our approach.

preprint2010arXiv

A heuristic view about the evolution and species

The controversy concerning both the definition of the species and methods for inferring the boundaries and numbers of species has occupied biologists for centuries, and the debate itself has become known as the species problem. The modern theory of evolution depends on a fundamental redefinition of "species". Here we show, based on the model of an evolutionary continuum combined with fuzzy theory, that the evolution system is an uncountable infinite set and a species is a fuzzy set. This resolves the contradiction between discrete biological entities and a continuous evolution system, i.e. when a species evolved, the individuals were scattered in space but continuously distributed over time sequences. Moreover, calculation methods for species are suggested both in theory and practice.

preprint2006arXiv

General entanglement-assisted transformation for bipartite pure quantum states

We introduce the general catalysts for pure entanglement transformations under local operations and classical communications in such a way that we disregard the profit and loss of entanglement of the catalysts per se. As such, the possibilities of pure entanglement transformations are greatly expanded. Remarkably, we find an interesting phenomenon that, in some situations, incomparable pairs $\{|\psi\rangle, |\phi\rangle\}$ and $\{|\chi\rangle, |\chi^{\prime}\rangle\}$ can assist each other mutually so as to realize the transformation $|\psi\rangle|\chi\rangle \to |\phi\rangle|\chi^{\prime}\rangle$. We also design an efficient algorithm to detect whether a $k\times k$ general catalyst exists for a given entanglement transformation. This algorithm can as well be exploited to witness the existence of standard catalysts.
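As background for the standard-catalyst case mentioned at the end of the abstract, the sketch below checks LOCC convertibility with Nielsen's majorization criterion and reproduces the well-known Jonathan-Plenio example, in which a transformation that fails on its own succeeds with a catalyst. It is not the paper's $k\times k$ general-catalyst detection algorithm. The function names are hypothetical and the states are given by their squared Schmidt coefficients:

```python
from itertools import accumulate


def locc_convertible(src, dst, tol=1e-12):
    """Nielsen's criterion: src -> dst under LOCC iff src is majorized by dst."""
    n = max(len(src), len(dst))
    a = sorted(src, reverse=True) + [0.0] * (n - len(src))
    b = sorted(dst, reverse=True) + [0.0] * (n - len(dst))
    # Every partial sum of the source must not exceed that of the destination.
    return all(x <= y + tol for x, y in zip(accumulate(a), accumulate(b)))


def tensor(p, q):
    """Squared Schmidt coefficients of a product of two bipartite pure states."""
    return [x * y for x in p for y in q]


# Jonathan-Plenio example (squared Schmidt coefficients).
psi = [0.4, 0.4, 0.1, 0.1]
phi = [0.5, 0.25, 0.25, 0.0]
chi = [0.6, 0.4]  # standard catalyst, returned unchanged

direct = locc_convertible(psi, phi)
catalysed = locc_convertible(tensor(psi, chi), tensor(phi, chi))
print(direct, catalysed)
```

In the general-catalyst setting described in the abstract, the auxiliary state is allowed to change from $|\chi\rangle$ to $|\chi^{\prime}\rangle$; the same check would then compare `tensor(psi, chi)` against `tensor(phi, chi_prime)` for a candidate pair.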

40 published item(s)

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

Metric-Normalized Posterior Leakage (mPL): Attacker-Aligned Privacy for Joint Consumption

Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search

Context-Guided Spatio-Temporal Video Grounding

Specific Emitter Identification Based on Joint Variational Mode Decomposition

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

A Closer Look at Personalization in Federated Image Classification

Actor and Action Modular Network for Text-based Video Segmentation

Cyclic Differentiable Architecture Search

Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption

Scaling Blockchain with Adaptivity

Study of background from accidental coincidence signals in the PandaX-II experiment

Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

Time dependent numerical model for the very high energy emissions of distant gamma-ray burst GRB 201216C

Uncovering the Source of Machine Bias

Univoque bases of real numbers: simply normal bases, irregular bases and multiple rationals

FWB-Net:Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation

Internal Calibration of the PandaX-II Detector with Radon Gaseous Sources

Results of Dark Matter Search using the Full PandaX-II Exposure

Algorithmic Transparency with Strategic Users

Crowd, Lending, Machine, and Bias

Kilonova Emission From Black Hole-Neutron Star Mergers. I. Viewing-Angle-Dependent Lightcurves

L-Vector: Neural Label Embedding for Domain Adaptation

Large-scale Real-time Personalized Similar Product Recommendations

Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation

Secondary-electron radiation accompanying hadronic GeV-TeV gamma-rays from supernova remnants

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

Searching for Neutrino-less Double Beta Decay of $^{136}$Xe with PandaX-II Liquid Xenon Detector

A combined model for pseudorapidity distributions in Cu-Cu collisions at BNL-RHIC energies

Anchoring and Agreement in Syntactic Annotations

Multimodal Memory Modelling for Video Captioning

The radio environment of the 21 Centimeter Array: RFI detection and mitigation

A combined model for the pseudorapidity distributions in p-p collisions at center-of-mass energies from 23.6 to 7000 GeV

Community Detection from Location-Tagged Networks

Infrared colour properties of nearby radio-luminous galaxies

Large Scale Real-time Ridesharing with Service Guarantee on Road Networks

Distance Preserving Graph Simplification

A heuristic view about the evolution and species

General entanglement-assisted transformation for bipartite pure quantum states