Source author record

Cheng Zhang

Cheng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

97 works

59 topics

4 close collaborators

preprint, 2026, arXiv

A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images

In the segmentation of remotely sensed images, deep learning models are typically pre-trained using large image databases like ImageNet before being fine-tuned on domain-specific datasets. However, the performance of these fine-tuned models is often hindered by the large domain gaps (i.e., differences in scenes and modalities) between ImageNet's images and remotely sensed images being processed. Therefore, many researchers have undertaken efforts to establish large-scale domain-specific image datasets for pre-training, aiming to enhance model performance. However, establishing such datasets is often challenging, requiring significant effort, and these datasets often exhibit limited generalizability to other application scenarios. To address these issues, this study introduces a novel yet simple pre-training strategy designed to guide a model away from learning domain-specific features in a pre-training dataset during pre-training, thereby improving the generalisation ability of the pre-trained model. To evaluate the strategy's effectiveness, deep learning models are pre-trained on ImageNet and subsequently fine-tuned on four semantic segmentation datasets with diverse scenes and modalities, including iSAID, MFNet, PST900 and Potsdam. Experimental results show that the proposed pre-training strategy led to state-of-the-art accuracies on all four datasets, namely 67.4% mIoU for iSAID, 56.9% mIoU for MFNet, 84.22% mIoU for PST900, 91.88% mF1 for Potsdam. This research lays the groundwork for developing a unified foundation model applicable to both computer vision and remote sensing applications.

preprint, 2026, arXiv

A Kernel Approach for Semi-implicit Variational Inference

Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions, but the intractability of their densities makes standard ELBO-based optimization biased. Recent score-matching approaches to SIVI (SIVI-SM) address this issue via a minimax formulation, at the expense of an additional lower-level optimization problem. In this paper, we propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative that eliminates the lower-level optimization by leveraging kernel methods. We show that when optimizing over a reproducing kernel Hilbert space, the lower-level problem admits an explicit solution, reducing the objective to the kernel Stein discrepancy (KSD). Exploiting the hierarchical structure of semi-implicit distributions, the resulting KSD objective can be efficiently optimized using stochastic gradient methods. We establish optimization guarantees via variance bounds on Monte Carlo gradient estimators and derive statistical generalization bounds of order $\tilde{\mathcal{O}}(1/\sqrt{n})$. We further introduce a multi-layer hierarchical extension that improves expressiveness while preserving tractability. Empirical results on synthetic and real-world Bayesian inference tasks demonstrate the effectiveness of KSIVI.
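As a hedged illustration of how the hierarchical structure can make a KSD objective tractable, write the semi-implicit family as $q(x) = \int q(x \mid z)\, q(z)\, dz$, with target score $s_p = \nabla \log p$, variational score $s_q = \nabla \log q$, and a scalar kernel $k$. This notation is assumed here and is not taken from the abstract. One standard form of the squared KSD involves the intractable marginal score $s_q$, but since $\mathbb{E}_{z \mid x}[\nabla_x \log q(x \mid z)] = s_q(x)$, it can be swapped for the tractable conditional score when the pairs $(x, z)$ and $(x', z')$ are drawn independently from $q(z)\, q(x \mid z)$:

```latex
\begin{aligned}
\mathrm{KSD}^2(q \,\|\, p)
  &= \mathbb{E}_{x, x' \sim q}\!\left[
       \big(s_p(x) - s_q(x)\big)^{\top} k(x, x') \big(s_p(x') - s_q(x')\big)
     \right] \\
  &= \mathbb{E}_{(x, z),\, (x', z')}\!\left[
       \big(s_p(x) - \nabla_x \log q(x \mid z)\big)^{\top} k(x, x')
       \big(s_p(x') - \nabla_{x'} \log q(x' \mid z')\big)
     \right]
\end{aligned}
```

The second line contains only quantities that can be sampled or evaluated, which is consistent with the abstract's claim that the objective can be optimized with stochastic gradient methods. The exact objective used in the paper may differ in detail.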

preprint, 2026, arXiv

Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models

Multimodal large language models (MLLMs) can produce fluent artwork emotion explanations, but they often suffer from attribute flooding: they enumerate many visible formal attributes without identifying which cues actually support the affective judgment. We therefore formulate artwork emotion understanding as Attribute-Grounded Selective Reasoning (AGSR), where predefined formal attributes serve as evidence units and only emotionally operative attributes should enter the final interpretation. To make this problem measurable, we extend EmoArt, originally introduced at ACM MM 2025 as a 132,664-artwork resource with content, formal-attribute, valence-arousal, and emotion annotations, by adding a 1,400-artwork human salience extension annotated by 15 art-trained annotators. This extension provides instance-level supervision for distinguishing attributes that are merely present from those that are emotionally salient. We further propose FAB-G (Formal-Attribute Bottleneck-Guided reasoning), a supervised multi-agent framework that first predicts attribute-level salience and then constrains downstream emotional analysis to the retained cues. Experiments show that FAB-G yields consistent gains in emotion, arousal, and valence prediction, achieves stronger agreement with human-marked salient attributes under Dice and Tversky metrics, and produces substantially more compact final explanations than prompting-based baselines. Cross-dataset evaluation further suggests that attribute-grounded salience selection transfers beyond the source distribution of EmoArt, while also revealing attribute-specific boundary cases. The dataset and project page are available at https://zhiliangzhang.github.io/EmoArt-130k/
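The Dice and Tversky agreement metrics mentioned above can be sketched on sets of attribute labels. This is a minimal illustration of the standard definitions, not the paper's evaluation code. The function names, the attribute labels in the usage note, and the convention of returning 1.0 when both sets are empty are assumptions.

```python
def tversky_index(predicted, reference, alpha=0.5, beta=0.5):
    """Tversky index between two label sets.

    alpha weights false positives (predicted but not in reference),
    beta weights false negatives (in reference but not predicted).
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # attributes both sides marked salient
    fp = len(predicted - reference)   # extra attributes the model kept
    fn = len(reference - predicted)   # human-marked attributes the model missed
    denom = tp + alpha * fp + beta * fn
    # Assumed convention: two empty sets count as perfect agreement.
    return tp / denom if denom else 1.0


def dice_coefficient(predicted, reference):
    """Dice coefficient, the special case alpha = beta = 0.5."""
    return tversky_index(predicted, reference, alpha=0.5, beta=0.5)
```

For example, with hypothetical labels, `dice_coefficient({"color", "line", "texture"}, {"color", "line"})` penalizes the one extra attribute symmetrically, while raising `alpha` above `beta` penalizes attribute flooding more heavily than omissions.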

preprint, 2026, arXiv

CalibAnyView: Beyond Single-View Camera Calibration in the Wild

Camera calibration is a fundamental prerequisite for reliable geometric perception, yet classical approaches rely on controlled acquisition setups that are impractical for in-the-wild imagery. Recent learning-based methods have shown promising results for single-view calibration, but inherently neglect geometric consistency across multiple views. We introduce CalibAnyView, a unified formulation that supports an arbitrary number of input views ($N \geq 1$) by explicitly modeling cross-view geometric consistency. To facilitate this, we construct a large-scale multi-view video dataset covering diverse real-world scenarios, including multiple camera models, dynamic scenes, realistic motion trajectories, and heterogeneous lens distortions. Building on this dataset, we develop a multi-view transformer that predicts dense perspective fields, which are further integrated into a geometric optimization framework to jointly estimate camera intrinsics and gravity direction. Extensive experiments demonstrate that CalibAnyView consistently outperforms state-of-the-art methods, achieves strong robustness under single-view settings, and further improves with multi-view inference, providing a reliable foundation for downstream tasks such as 3D reconstruction and robotic perception in the wild.

preprint, 2026, arXiv

CL-bench Life: Can Language Models Learn from Real-Life Context?

Today's AI assistants such as OpenClaw are designed to handle context effectively, making context learning an increasingly important capability for models. As these systems move beyond professional settings into everyday life, the nature of the contexts they must handle also shifts. Real-life contexts are often messy, fragmented, and deeply tied to personal and social experience, such as multi-party conversations, personal archives, and behavioral traces. Yet it remains unclear whether current frontier language models can reliably learn from such contexts and solve tasks grounded in them. To this end, we introduce CL-bench Life, a fully human-curated benchmark comprising 405 context-task pairs and 5,348 verification rubrics, covering common real-life scenarios. Solving tasks in CL-bench Life requires models to reason over complex, messy real-life contexts, calling for strong real-life context learning abilities that go far beyond those evaluated in existing benchmarks. We evaluate ten frontier LMs and find that real-life context learning remains highly challenging: even the best-performing model achieves only 19.3% task solving rate, while the average performance across models is only 13.8%. Models still struggle to reason over contexts such as messy group chat histories and fragmented behavioral records from everyday life. CL-bench Life provides a crucial testbed for advancing real-life context learning, and progress on it can enable more intelligent and reliable AI assistants in everyday life.

preprint, 2026, arXiv

High-capacity dual degrees of freedom quantum secret sharing protocol beyond the linear rate-distance bound

Quantum secret sharing (QSS) is a multipartite cryptographic primitive. Most existing QSS protocols are limited by the linear rate-distance bound, and cannot realize long-distance and high-capacity multipartite key distribution. This paper proposes a polarization (Pol) and phase (Ph) dual degrees of freedom (dual-DOF) QSS protocol based on weak coherent pulse (WCP) sources. Our protocol combines the single-photon interference, two-photon interference and non-interference principles, and can resist the internal attack from a dishonest player. We develop a simulation method to estimate its performance under the beam splitting attack. The simulation results show that our protocol can surpass the linear bound. Compared with the differential-phase-shift twin-field QSS and WCP-Ph-QSS protocols, our protocol has stronger resistance against the beam splitting attack, and thus has a longer maximal communication distance and a higher key rate. By using WCPs with a high average photon number ($μ$ = 1.5), our protocol achieves a key rate about 5.4 times that of the WCP-Ph-QSS protocol. Its maximal communication distance (441.7 km) is about 7.9% longer than that of the WCP-Ph-QSS protocol. Our protocol is highly feasible with current experimental technology and offers a promising approach for long-distance and high-capacity quantum networks.

preprint, 2026, arXiv

On consistency around a $3 \times 3\times 3$ cube and Q3 analogue of the lattice Boussinesq equation

In this paper, we present two new aspects of lattice Boussinesq (BSQ) equations. First, we show that the lattice potential BSQ (lpBSQ) equation defined on a nine-point square lattice admits a natural extension of three-dimensional consistency to a $3\times 3\times 3$ cube, a cubic sublattice consisting of $27$ vertices. This extends the standard notion of three-dimensional consistency (defined on an elementary $2\times 2\times 2$ vertex cube for quadrilateral equations) to the non-quadrilateral, nine-point setting. Second, we construct a new three-component system which is referred to as the lattice BSQ-Q3 system, serving as the BSQ analogue of the Q3($δ$) equation in the Adler-Bobenko-Suris (ABS) classification. The construction relies on a gauge transformation between Lax pairs of lpBSQ with the parameter $δ$ arising from a $GL_3$ action. In a degeneration form, the system yields a $PGL_3$-invariant integrable lattice equation that generalises the $PGL_2$-invariant Schwarzian BSQ equation.

preprint, 2026, arXiv

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

Dynamic quantization emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving. This is because dynamic quantization can significantly reduce memory usage and computational load, leading to faster token generation and improved model serving efficiency without substantial loss in model accuracy. In this paper, we reveal a critical vulnerability in dynamic quantization: an adversary can exploit such quantization strategy to steal sensitive user data placed in the same batch as the adversary's input. Our analysis demonstrates that dynamic quantization, when improperly implemented or configured, can create side channels that expose information about other inputs within the same batch. We call this phenomenon Quantamination, describing contamination from quantization. Specifically, we show that at least 4 of the most popular ML frameworks in use today either default to or can use configurations that leak data across the batch boundary. This data leakage, in theory, allows attackers to partially or even fully recover other users' batched input data, representing a serious privacy risk for existing ML serving frameworks.

preprint, 2026, arXiv

SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability

Animals are described as effectively camouflaged when they blend seamlessly with their surroundings, yet no standardized quantitative measure of this seamlessness exists. We address this gap by framing camouflage evaluation as a visual localization problem: a well-camouflaged animal is one that remains difficult to detect even when its category is known. We introduce SeamCam (Seamless Camouflage), a metric that quantifies how detectable an animal is from the available visual evidence. Given an image and a target species, SeamCam generates category-conditioned detection proposals, extracts segmentation masks, and identifies the subset whose collective union yields the highest IoU with the ground-truth mask. The SeamCam score is one minus this maximum recoverable localization signal, where a higher score indicates stronger camouflage (i.e., lower detectability). In a human two-alternative forced-choice study with 94 participants and 2,390 comparisons, SeamCam achieves 78.82% agreement with human camouflage difficulty judgments, outperforming the state of the art by about 25%. We then demonstrate SeamCam's utility as a preference signal for Direct Preference Optimization (DPO) to fine-tune a diffusion-based inpainting model for camouflage generation. This offers an affordable training approach with an objective explicitly suited for camouflage generation, unlike typical diffusion models. To support rigorous benchmarking, we further introduce CamFG-1.5k, a curated dataset of 1,521 high-resolution images in which animals are fully visible prior to camouflage generation, enabling unbiased evaluation by controlling for occlusion artifacts present in existing datasets. https://7amin.github.io/SeamCam/
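The scoring step described above (one minus the best IoU achievable by the union of any subset of proposal masks) can be sketched as follows. This is a minimal illustration that assumes the masks are already available: masks are represented as sets of pixel coordinates, the subset search is brute force, and the function names are hypothetical. The proposal generation and segmentation stages, and whatever search strategy the authors use, are not reproduced here.

```python
from itertools import combinations


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


def seamcam_style_score(candidate_masks, gt_mask):
    """Return 1 - max IoU over unions of non-empty subsets of candidate masks.

    Higher values mean the target is harder to localize from the proposals.
    Brute force: exponential in the number of candidates, fine for a sketch.
    """
    best = 0.0
    for size in range(1, len(candidate_masks) + 1):
        for subset in combinations(candidate_masks, size):
            merged = frozenset().union(*subset)  # union of the chosen masks
            best = max(best, iou(merged, gt_mask))
    return 1.0 - best
```

With no proposals the score is 1.0 (nothing localizes the animal), and if some subset of proposals exactly covers the ground-truth mask the score is 0.0.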

preprint, 2024, arXiv

Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning

Hierarchical beam search in mmWave communications incurs substantial training overhead, necessitating deep learning-enabled beam predictions to effectively leverage channel priors and mitigate this overhead. In this study, we introduce a comprehensive probabilistic model of power distribution in beamspace, and formulate the joint optimization problem of probing beam selection and probabilistic beam prediction as an entropy minimization problem. Then, we propose a greedy scheme to iteratively and alternately solve this problem, where a transformer-based beam predictor is trained to estimate the conditional power distribution based on the probing beams and user location within each iteration, and the trained predictor selects an unmeasured beam that minimizes the entropy of remaining beams. To further reduce the number of interactions and the computational complexity of the iterative scheme, we propose a two-stage probing beam selection scheme. Firstly, probing beams are selected from a location-specific codebook designed by an entropy-based criterion, and predictions are made with corresponding feedback. Secondly, the optimal beam is identified using additional probing beams with the highest predicted power values. Simulation results demonstrate the superiority of the proposed schemes compared to hierarchical beam search and beam prediction with uniform probing beams.

preprint, 2024, arXiv

Three-state coherent control using narrowband and passband sequences

In this work, we propose a comprehensive design for narrowband and passband composite pulse sequences by involving the dynamics of all states in the three-state system. The design is quite universal as all pulse parameters can be freely employed to modify the coefficients of error terms. Two modulation techniques, the strength and phase modulations, are used to achieve arbitrary population transfer with a desired excitation profile, while the system keeps minimal leakage to the third state. Furthermore, the current sequences are capable of tolerating inaccurate waveforms and detuning errors, and work well even when the rotating wave approximation is not strictly justified. Therefore, this work provides versatile adaptability for shaping various excitation profiles in both narrowband and passband sequences.

preprint, 2023, arXiv

Properties of new even and odd nonlinear coherent states with different parameters

We construct a class of nonlinear coherent states (NLCSs) by introducing a more general nonlinear function and study their non-classical properties, specifically the second-order correlation function $g^{(2)}(0)$, Mandel parameter $Q$, squeezing, amplitude squared squeezing and Wigner function of the optical field. The results indicate that the non-classical properties of the new types of even and odd NLCSs crucially depend on nonlinear functions. More concretely, we find that the new even NLCSs could exhibit the photon-bunching effect whereas the new odd NLCSs could show photon-antibunching effect. The degree of squeezing is also significantly affected by the parameter selection of these NLCSs. By employing various forms of nonlinear functions, it becomes possible to construct NLCSs with diverse properties, thereby providing a theoretical foundation for corresponding experimental investigations.

preprint, 2022, arXiv

Composite pulses for high fidelity population transfer in three-level systems

In this work, we propose a composite pulses scheme by modulating phases to achieve high fidelity population transfer in three-level systems. To circumvent the obstacle that not enough variables are exploited to eliminate the systematic errors in the transition probability, we put forward a cost function to find the optimal value. The cost function is independently constructed either in ensuring an accurate population of the target state, or in suppressing the population of the leakage state, or both of them. The results demonstrate that population transfer is implemented with high fidelity even in the presence of deviations in the coupling coefficients. Furthermore, our composite pulses scheme can be extended to arbitrarily long pulse sequences. As an example, we employ the composite pulses sequence for achieving the three-atom singlet state in an atom-cavity system with ultrahigh fidelity. The final singlet state shows robustness against deviations and is not seriously affected by waveform distortions. Also, the singlet state maintains a high fidelity under the decoherence environment.

preprint, 2022, arXiv

Deep End-to-end Causal Inference

Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment and policy making. However, research on causal discovery has evolved separately from inference methods, preventing straight-forward combination of methods from both fields. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based non-linear additive noise model that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground truth causal graph under standard causal discovery assumptions. Motivated by application impact, we extend this model to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Our results show the competitive performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and causal machine learning benchmarks across data-types and levels of missingness.

preprint, 2022, arXiv

Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

We present a system called TP3 to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article. TP3 first finetunes pretrained transformers on QAP datasets, then uses a preprocessing pipeline to select appropriate answers, feeds the relevant sentences and the answer to the finetuned transformer to generate candidate QAPs, and finally uses a postprocessing pipeline to filter inadequate QAPs. In particular, using pretrained T5 models as transformers and the SQuAD dataset as the finetuning dataset, we show that TP3 generates a satisfactory number of high-quality QAPs on the Gaokao-EN dataset.

preprint, 2022, arXiv

Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation

The real-world testing of decisions made using causal machine learning models is an essential prerequisite for their successful application. We focus on evaluating and improving contextual treatment assignment decisions: these are personalised treatments applied to e.g. customers, each with their own contextual information, with the aim of maximising a reward. In this paper we introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making through Bayesian Experimental Design. Specifically, our method is used for the data-efficient evaluation of the regret of past treatment assignments. Unlike approaches such as A/B testing, our method avoids assigning treatments that are known to be highly sub-optimal, whilst engaging in some exploration to gather pertinent information. We achieve this by introducing an information-based design objective, which we optimise end-to-end. Our method applies to discrete and continuous treatments. Comparing our information-theoretic approach to baselines in several simulation studies demonstrates the superior performance of our proposed approach.

preprint, 2022, arXiv

Electron-beam Introduction of Heteroatomic Pt-Si Structures in Graphene

Electron-beam (e-beam) manipulation of single dopant atoms in an aberration-corrected scanning transmission electron microscope is emerging as a method for directed atomic motion and atom-by-atom assembly. Until now, the dopant species have been limited to atoms closely matched to carbon in terms of ionic radius and capable of strong covalent bonding with carbon atoms in the graphene lattice. In situ dopant insertion into a graphene lattice has thus far been demonstrated only for Si, which is ubiquitously present as a contaminant in this material. Here, we achieve in situ manipulation of Pt atoms and their insertion into the graphene host matrix using the e-beam deposited Pt on graphene as a host system. We further demonstrate a mechanism for stabilization of the Pt atom, enabled through the formation of Si-stabilized Pt heteroatomic clusters attached to the graphene surface. This study provides evidence toward the universality of the e-beam assembly approach, opening a pathway for exploring cluster chemistry through direct assembly.

preprint, 2022, arXiv

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

In dynamic scenes, images often suffer from dynamic blur due to the superposition of motions, or from a low signal-to-noise ratio resulting from the fast shutter speeds used to avoid motion. Recovering sharp and clean results from the captured images depends heavily on the ability of restoration methods and the quality of the input. Although existing research on image restoration focuses on developing models for obtaining better restored results, few studies have evaluated how and which input images lead to superior restored quality. In this paper, to better study an image's potential value that can be explored for restoration, we propose a novel concept, image restoration potential (IRP). Specifically, we first establish a dynamic scene imaging dataset containing composite distortions and apply image restoration processes to validate the existence of IRP. Based on this dataset, we investigate several properties of IRP and propose a novel deep model to accurately predict IRP values. By gradually distilling and selectively fusing the degradation features, the proposed model shows its superiority in IRP prediction. Thanks to the proposed model, we are then able to validate how various image restoration related applications benefit from IRP prediction. We show the potential uses of IRP as a filtering principle to select valuable frames, an auxiliary guidance to improve restoration models, and even an indicator to optimize camera settings for capturing better images under dynamic scenarios.

preprint, 2022, arXiv

Exploring gluon tomography with polarization dependent diffractive J/$ψ$ production

We study azimuthal asymmetries in diffractive $J/ψ$ production in ultraperipheral heavy-ion collisions at RHIC and LHC energies using the color glass condensate effective theory. Our calculation successfully describes the azimuthally averaged $J/ψ$ production cross section measured by STAR and ALICE. We further predict very large $\cos 2ϕ$ and $\cos 4ϕ$ azimuthal asymmetries for diffractive $J/ψ$ production both in UPCs at RHIC and LHC energies and in eA collisions at EIC energy. These novel polarization dependent observables may provide complementary information for constraining the gluon transverse spatial distribution inside large nuclei. Compared to all previous analyses of diffractive $J/ψ$ production, the essential new elements integrated in our theoretical calculations are: the double-slit interference effect, the linear polarization of coherent photons, and the final state soft photon radiation effect.

preprint, 2022, arXiv

Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations

In this paper, we propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs belong to a low-dimensional subspace, and the subspaces of piracy models are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99 within only 20 fingerprints of the suspect model. It has good generalizability across different model architectures and is robust against post-modifications on stolen models.

preprint, 2022, arXiv

Local Constraint-Based Causal Discovery under Selection Bias

We consider the problem of discovering causal relations from independence constraints when selection bias in addition to confounding is present. While the seminal FCI algorithm is sound and complete in this setup, no criterion for the causal interpretation of its output under selection bias is presently known. We focus instead on local patterns of independence relations, where we find no sound method for only three variables that can include background knowledge. Y-Structure patterns are shown to be sound in predicting causal relations from data under selection bias, where cycles may be present. We introduce a finite-sample scoring rule for Y-Structures that is shown to successfully predict causal relations in simulation experiments that include selection mechanisms. On real-world microarray data, we show that a Y-Structure variant performs well across different datasets, potentially circumventing spurious correlations due to selection bias.

preprint, 2022, arXiv

NeurIPS Competition Instructions and Guide: Causal Insights for Learning Paths in Education

In this competition, participants will address two fundamental causal challenges in machine learning in the context of education using time-series data. The first is to identify the causal relationships between different constructs, where a construct is defined as the smallest element of learning. The second challenge is to predict the impact of learning one construct on the ability to answer questions on other constructs. Addressing these challenges will enable optimisation of students' knowledge acquisition, which can be deployed in a real edtech solution impacting millions of students. Participants will run these tasks in an idealised environment with synthetic data and a real-world scenario with evaluation data collected from a series of A/B tests.

preprint, 2022, arXiv

On Incorrectness Logic and Kleene Algebra with Top and Tests

Kleene algebra with tests (KAT) is a foundational equational framework for reasoning about programs, which has found applications in program transformations, networking and compiler optimizations, among many other areas. In his seminal work, Kozen proved that KAT subsumes propositional Hoare logic, showing that one can reason about the (partial) correctness of while programs by means of the equational theory of KAT. In this work, we instead investigate the support that KAT provides for reasoning about incorrectness, as embodied by O'Hearn's recently proposed incorrectness logic. We show that KAT cannot directly express incorrectness logic. The main reason for this limitation can be traced to the fact that KAT cannot express explicitly the notion of codomain, which is essential to express incorrectness triples. To address this issue, we study Kleene Algebra with Top and Tests (TopKAT), an extension of KAT with a top element. We show that TopKAT is powerful enough to express a codomain operation, to express incorrectness triples, and to prove all the rules of incorrectness logic sound. This shows that one can reason about the incorrectness of while-like programs by means of the equational theory of TopKAT.
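The role of the top element can be illustrated in the relational model, where programs are binary relations on a finite state set, tests are subsets of the identity relation, and top is the full relation. The sketch below is an assumption-laden illustration rather than the paper's formal development: the helper names are hypothetical, the Hoare triple uses the familiar KAT encoding (b ; p ; not-c is empty), and the incorrectness triple is checked as top ; b ; p containing top ; c, which in this model says every state in c is reachable from some state in b.

```python
def compose(r, s):
    """Relational composition r ; s on sets of (state, state) pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def as_test(subset):
    """A test is the identity relation restricted to a set of states."""
    return {(x, x) for x in subset}


def top(states):
    """The top element: the full relation on the state set."""
    return {(x, y) for x in states for y in states}


def hoare_triple(b, p, c, states):
    """{b} p {c}: no run of p from b ends outside c (b ; p ; not-c is empty)."""
    not_c = as_test(states - c)
    return compose(compose(as_test(b), p), not_c) == set()


def incorrectness_triple(b, p, c, states):
    """[b] p [c]: every state in c is reachable from b (top;b;p contains top;c)."""
    t = top(states)
    reachable = compose(t, compose(as_test(b), p))  # pairs ending in cod(b ; p)
    claimed = compose(t, as_test(c))                # pairs ending in c
    return claimed <= reachable


# A small nondeterministic program: from 0 it may reach 1 or 2, from 1 it reaches 3.
STATES = {0, 1, 2, 3}
PROGRAM = {(0, 1), (0, 2), (1, 3)}
```

Here `hoare_triple({0}, PROGRAM, {1, 2}, STATES)` holds because the postcondition over-approximates the reachable states, while `incorrectness_triple({0}, PROGRAM, {1}, STATES)` holds because the postcondition under-approximates them. Composing with top discards the start state, which is how the codomain becomes expressible.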

preprint, 2022, arXiv

Optimal transport for causal discovery

To determine causal relationships between two variables, approaches based on Functional Causal Models (FCMs) have been proposed by properly restricting model classes; however, the performance is sensitive to the model assumptions, which makes it difficult to use. In this paper, we provide a novel dynamical-system view of FCMs and propose a new framework for identifying causal direction in the bivariate case. We first show the connection between FCMs and optimal transport, and then study optimal transport under the constraints of FCMs. Furthermore, by exploiting the dynamical interpretation of optimal transport under the FCM constraints, we determine the corresponding underlying dynamical process of the static cause-effect pair data. It provides a new dimension for describing static causal discovery tasks while enjoying more freedom for modeling the quantitative causal influences. In particular, we show that Additive Noise Models (ANMs) correspond to volume-preserving pressureless flows. Consequently, based on their velocity field divergence, we introduce a criterion for determining causal direction. With this criterion, we propose a novel optimal transport-based algorithm for ANMs which is robust to the choice of models and extend it to post-nonlinear models. Our method demonstrated state-of-the-art results on both synthetic and causal discovery benchmark datasets.

preprint2022arXiv

Optimistic Optimization of Gaussian Process Samples

Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function. We investigate to what degree the conceptual advantages of Bayesian optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian optimization setting with a run-time of up to $\mathcal{O}(N \log N)$. As a high-level take-away we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.

preprint2022arXiv

Probing the gluon tomography in photoproduction of di-pions

A sizable $\cos 4ϕ$ azimuthal asymmetry in exclusive di-pion production near the $ρ^0$ resonance peak in ultraperipheral heavy ion collisions has recently been reported by the STAR collaboration. We show that both the elliptic gluon Wigner distribution and final state soft photon radiation can give rise to this azimuthal asymmetry. The fact that the QED effect alone severely underestimates the observed asymmetry might signal the existence of a nontrivial correlation in the quantum phase distribution of gluons.

preprint2022arXiv

PVT: Point-Voxel Transformer for Point Cloud Learning

The recently developed pure Transformer architectures have attained promising accuracy on point cloud learning benchmarks compared to convolutional neural networks. However, existing point cloud Transformers are computationally expensive since they waste a significant amount of time on structuring the irregular data. To address this shortcoming, we present a Sparse Window Attention (SWA) module to gather coarse-grained local features from non-empty voxels, which not only bypasses the expensive irregular data structuring and invalid empty voxel computation, but also obtains linear computational complexity with respect to voxel resolution. Meanwhile, to gather fine-grained features about the global shape, we introduce a relative attention (RA) module, a self-attention variant that is more robust to rigid transformations of objects. Equipped with the SWA and RA, we construct our neural architecture called PVT that integrates both modules into a joint framework for point cloud learning. Compared with previous Transformer-based and attention-based models, our method attains top accuracy of 94.0% on classification benchmark and 10x inference speedup on average. Extensive experiments also validate the effectiveness of PVT on part and semantic segmentation benchmarks (86.6% and 69.2% mIoU, respectively).

preprint2022arXiv

Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace

In this study, a real-time dispatching algorithm based on reinforcement learning is proposed and, for the first time, deployed at large scale. Current dispatching methods in ridehailing platforms are predominantly based on myopic or rule-based non-myopic approaches. Reinforcement learning enables dispatching policies that are informed by historical data and able to employ the learned information to optimize returns of expected future trajectories. Previous studies in this field yielded promising results, yet have left room for further improvements in terms of performance gain, self-dependency, transferability, and scalable deployment mechanisms. The present study proposes a standalone RL-based dispatching solution that is equipped with multiple mechanisms to ensure robust and efficient on-policy learning and inference while being adaptable for full-scale deployment. A new form of value updating based on temporal difference is proposed that is better adapted to the inherent uncertainty of the problem. For the driver-order assignment, a customized utility function is proposed that, when tuned based on the statistics of the market, results in remarkable performance improvement and interpretability. In addition, for reducing the risk of cancellation after drivers' assignment, an adaptive graph pruning strategy based on the multi-armed bandit problem is introduced. The method is evaluated using offline simulation with real data and yields notable performance improvement. In addition, the algorithm is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets as the primary mode of dispatch. The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing. In addition, by causal inference analysis, as much as 5.3% improvement in major performance metrics is detected after full-scale deployment.

preprint2022arXiv

Robust population inversion in three-level systems by composite pulses

In this work, we exploit the idea of composite pulses to achieve robust population inversion in a three-level quantum system. The scheme is based on the modulation of the coupling strength, while the other physical parameters remain unchanged. The composite pulse sequence is designed by making high-order error terms vanish, and can compensate for the systematic errors to any desired order. In particular, this scheme maintains good performance under the disturbance of waveform deformations. This trait ensures that population inversion can be nearly obtained even when the pulse sequence has a short jump delay. As an example, we employ the designed composite pulse sequence to prepare the W state in a robust manner in superconducting circuits. The numerical results show that the fidelity can still remain high in a decoherence environment.

preprint2022arXiv

Sharp $L^p$ estimates and size of nodal sets for generalized Steklov eigenfunctions

We prove sharp $L^p$ estimates for the Steklov eigenfunctions on compact manifolds with boundary in terms of their $L^2$ norms on the boundary. We prove it by establishing $L^p$ bounds for the harmonic extension operators as well as the spectral projection operators on the boundary. Moreover, we derive lower bounds on the size of nodal sets for a variation of the Steklov spectral problem. We consider a generalized version of the Steklov problem by adding a non-smooth potential on the boundary but some of our results are new even without potential.

preprint2022arXiv

Simultaneous Missing Value Imputation and Structure Learning with Groups

Learning structures between groups of variables from data with missing values is an important task in the real world, yet difficult to solve. One typical scenario is discovering the structure among topics in the education domain to identify learning pathways. Here, the observations are student performances for questions under each topic which contain missing values. However, most existing methods focus on learning structures between a few individual variables from the complete data. In this work, we propose VISL, a novel scalable structure learning approach that can simultaneously infer structures between groups of variables under missing data and perform missing value imputations with deep learning. Particularly, we propose a generative model with a structured latent space and a graph neural network-based architecture, scaling to a large number of variables. Empirically, we conduct extensive experiments on synthetic, semi-synthetic, and real-world education data sets. We show improved performances on both imputation and structure learning accuracy compared to popular and recent approaches.

preprint2022arXiv

Snowmass2021 Cosmic Frontier: Cosmic Microwave Background Measurements White Paper

This is a solicited whitepaper for the Snowmass 2021 community planning exercise. The paper focuses on measurements and science with the Cosmic Microwave Background (CMB). The CMB is foundational to our understanding of modern physics and continues to be a powerful tool driving our understanding of cosmology and particle physics. In this paper, we outline the broad and unique impact of CMB science for the High Energy Cosmic Frontier in the upcoming decade. We also describe the progression of ground-based CMB experiments, which shows that the community is prepared to develop the key capabilities and facilities needed to achieve these transformative CMB measurements.

preprint2022arXiv

Upsampling Autoencoder for Self-Supervised Point Cloud Learning

In the computer-aided design (CAD) community, point cloud data is pervasively applied in reverse engineering, where point cloud analysis plays an important role. While a large number of supervised learning methods have been proposed to handle unordered point clouds and have demonstrated remarkable success, their performance and applicability are limited by costly data annotation. In this work, we propose a novel self-supervised pretraining model for point cloud learning without human annotations, which relies solely on the upsampling operation to perform feature learning of point clouds in an effective manner. The key premise of our approach is that the upsampling operation encourages the network to capture both high-level semantic information and low-level geometric information of the point cloud, so that downstream tasks such as classification and segmentation will benefit from the pre-trained model. Specifically, our method first conducts random subsampling from the input point cloud at a low proportion, e.g., 12.5%. Then, we feed the subsampled points into an encoder-decoder architecture, where an encoder is devised to operate only on the subsampled points, while an upsampling decoder is adopted to reconstruct the original point cloud based on the learned features. Finally, we design a novel joint loss function which enforces the upsampled points to be similar to the original point cloud and uniformly distributed on the underlying shape surface. By adopting the pre-trained encoder weights as initialisation of models for downstream tasks, we find that our UAE outperforms previous state-of-the-art methods in shape classification, part segmentation and point cloud upsampling tasks. Code will be made publicly available upon acceptance.

preprint2021arXiv

A Causal View on Robustness of Neural Networks

We present a causal view on the robustness of neural networks against input manipulations, which applies not only to traditional classification tasks but also to general measurement data. Based on this view, we design a deep causal manipulation augmented model (deep CAMA) which explicitly models possible manipulations on certain causes leading to changes in the observed effect. We further develop data augmentation and test-time fine-tuning methods to improve deep CAMA's robustness. When compared with discriminative deep neural networks, our proposed model shows superior robustness against unseen manipulations. As a by-product, our model achieves disentangled representation which separates the representation of manipulations from those of other latent causes.

preprint2021arXiv

Educational Question Mining At Scale: Prediction, Analysis and Personalization

Online education platforms enable teachers to share a large number of educational resources such as questions to form exercises and quizzes for students. With large volumes of available questions, it is important to have an automated way to quantify their properties and intelligently select them for students, enabling effective and personalized learning experiences. In this work, we propose a framework for mining insights from educational questions at scale. We utilize the state-of-the-art Bayesian deep learning method, in particular partial variational auto-encoders (p-VAE), to analyze real students' answers to a large collection of questions. Based on p-VAE, we propose two novel metrics that quantify question quality and difficulty, respectively, and a personalized strategy to adaptively select questions for students. We apply our proposed framework to a real-world dataset with tens of thousands of questions and tens of millions of answers from an online education platform. Our framework not only demonstrates promising results in terms of statistical metrics but also obtains highly consistent results with domain experts' evaluation.

preprint2021arXiv

Estimating $α$-Rank by Maximizing Information Gain

Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling. For example, meta-games that arise in multi-agent evaluation can only be accessed by running a succession of expensive experiments that may involve simultaneous deployment of several agents. In this paper, we focus on $α$-rank, a popular game-theoretic solution concept designed to perform well in such scenarios. We aim to estimate the $α$-rank of the game using as few samples as possible. Our algorithm maximizes information gain between an epistemic belief over the $α$-ranks and the observed payoff. This approach has two main benefits. First, it allows us to focus our sampling on the entries that matter the most for identifying the $α$-rank. Second, the Bayesian formulation provides a facility to build in modeling assumptions by using a prior over game payoffs. We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al. 2019), and provide theoretical results justifying our method.

preprint2021arXiv

Photoionization-induced broadband dispersive wave generated in an Ar-filled hollow-core photonic crystal fiber

The resonance band in hollow-core photonic crystal fiber (HC-PCF), while leading to a high-loss region in the fiber transmission spectrum, has been successfully used for generating a phase-matched dispersive wave (DW). Here, we report that the spectral width of the resonance-induced DW can be largely broadened due to a plasma-driven blueshifting soliton. In the experiment, we observed that in a short length of Ar-filled single-ring HC-PCF the soliton self-compression and photoionization effects caused a strong spectral blueshift of the pump pulse, changing the phase-matching condition of the DW emission process. Therefore, broadening of the DW spectrum to the longer-wavelength side was obtained with several spectral peaks, which correspond to the generation of DW at different positions along the fiber. In the simulation, we used super-Gauss windows with different central wavelengths to filter out these DW spectral peaks, and studied the time-domain characteristics of each peak using the Fourier transform method. The simulation results verified that these multiple peaks on the DW spectrum have different delays in the time domain, agreeing well with our theoretical prediction. Remarkably, we found that the whole time-domain DW trace can be compressed to ~29 fs using proper chirp compensation. The experimental and numerical results reported here provide some insight into the resonance-induced DW generation process in gas-filled HC-PCFs, and could also pave the way to ultrafast pulse generation using the DW-emission mechanism.

preprint2021arXiv

Plasmons in the van der Waals charge-density-wave material 2H-TaSe2

Plasmons in two-dimensional (2D) materials beyond graphene have recently gained much attention. However, the experimental investigation is limited due to the lack of suitable materials. Here, we experimentally demonstrate localized plasmons in a correlated 2D charge-density-wave (CDW) material: 2H-TaSe2. The plasmon resonance can cover a broad spectral range from the terahertz (40 μm) to the telecom (1.55 μm) region, which is further tunable by changing thickness and dielectric environments. The plasmon dispersion flattens at large wave vectors, resulting from the universal screening effect of interband transitions. More interestingly, anomalous temperature dependence of plasmon resonances associated with CDW excitations is observed. In the CDW phase, the plasmon peak close to the CDW excitation frequency becomes wider and asymmetric, mimicking two coupled oscillators. Our study not only reveals the universal role of the intrinsic screening on 2D plasmons, but also opens an avenue for tunable plasmons in 2D correlated materials.

preprint2021arXiv

TeethTap: Recognizing Discrete Teeth Gestures Using Motion and Acoustic Sensing on an Earpiece

Teeth gestures have become an alternative input modality for different situations and accessibility purposes. In this paper, we present TeethTap, a novel eyes-free and hands-free input technique, which can recognize up to 13 discrete teeth tapping gestures. TeethTap adopts a wearable 3D printed earpiece with an IMU sensor and a contact microphone behind both ears, which work in tandem to detect jaw movement and sound data, respectively. TeethTap uses a support vector machine to classify gestures from noise by fusing acoustic and motion data, and implements K-Nearest-Neighbor (KNN) with a Dynamic Time Warping (DTW) distance measurement using motion data for gesture classification. A user study with 11 participants demonstrated that TeethTap could recognize 13 gestures with a real-time classification accuracy of 90.9% in a laboratory environment. We further uncovered the accuracy differences on different teeth gestures when having sensors on single vs. both sides. Moreover, we explored the activation gesture under real-world environments, including eating, speaking, walking and jumping. Based on our findings, we further discussed potential applications and practical challenges of integrating TeethTap into future devices.

preprint2021arXiv

True-data Testbed for 5G/B5G Intelligent Network

Future beyond fifth-generation (B5G) and sixth-generation (6G) mobile communications will shift from facilitating interpersonal communications to supporting the Internet of Everything (IoE), where intelligent communications with full integration of big data and artificial intelligence (AI) will play an important role in improving network efficiency and providing high-quality service. As a rapidly evolving paradigm, AI-empowered mobile communications demand large amounts of data acquired from real network environments for systematic test and verification. Hence, we build the world's first true-data testbed for 5G/B5G intelligent network (TTIN), which comprises 5G/B5G on-site experimental networks, data acquisition & data warehouse, and AI engine & network optimization. In the TTIN, true network data acquisition, storage, standardization, and analysis are available, which enable system-level online verification of B5G/6G-orientated key technologies and support data-driven network optimization through the closed-loop control mechanism. This paper elaborates on the system architecture and module design of TTIN. Detailed technical specifications and some of the established use cases are also showcased.

97 published item(s)

A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images

A Kernel Approach for Semi-implicit Variational Inference

Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models

CalibAnyView: Beyond Single-View Camera Calibration in the Wild

CL-bench Life: Can Language Models Learn from Real-Life Context?

High-capacity dual degrees of freedom quantum secret sharing protocol beyond the linear rate-distance bound

On consistency around a $3 \times 3\times 3$ cube and Q3 analogue of the lattice Boussinesq equation

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability

Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning

Three-state coherent control using narrowband and passband sequences

Properties of new even and odd nonlinear coherent states with different parameters

Composite pulses for high fidelity population transfer in three-level systems

Deep End-to-end Causal Inference

Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation

Electron-beam Introduction of Heteroatomic Pt-Si Structures in Graphene

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

Exploring gluon tomography with polarization dependent diffractive J/$ψ$ production

Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations

Local Constraint-Based Causal Discovery under Selection Bias

NeurIPS Competition Instructions and Guide: Causal Insights for Learning Paths in Education

On Incorrectness Logic and Kleene Algebra with Top and Tests

Optimal transport for causal discovery

Optimistic Optimization of Gaussian Process Samples

Probing the gluon tomography in photoproduction of di-pions

PVT: Point-Voxel Transformer for Point Cloud Learning

Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace

Robust population inversion in three-level systems by composite pulses

Sharp $L^p$ estimates and size of nodal sets for generalized Steklov eigenfunctions

Simultaneous Missing Value Imputation and Structure Learning with Groups

Snowmass2021 Cosmic Frontier: Cosmic Microwave Background Measurements White Paper

Upsampling Autoencoder for Self-Supervised Point Cloud Learning

A Causal View on Robustness of Neural Networks

Educational Question Mining At Scale: Prediction, Analysis and Personalization

Estimating $α$-Rank by Maximizing Information Gain

Photoionization-induced broadband dispersive wave generated in an Ar-filled hollow-core photonic crystal fiber

Plasmons in the van der Waals charge-density-wave material 2H-TaSe2

TeethTap: Recognizing Discrete Teeth Gestures Using Motion and Acoustic Sensing on an Earpiece

True-data Testbed for 5G/B5G Intelligent Network

$L^p$ eigenfunction bounds for fractional Schrödinger operators on manifolds

ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing

Adaptive Parameterization for Neural Dialogue Generation

Assessing the Memory Ability of Recurrent Neural Networks

Attention-based network for low-light image enhancement

Causal Discovery in the Presence of Missing Data

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

Double-dot interferometer for quantum measurement of Majorana qubits and stabilizers

DRIFT: Deep Reinforcement Learning for Functional Software Testing

Eigenfunction equations of lattice KdV equations and connections to ABS lattice equations with a $δ$ term

Estimating a fluctuating magnetic field with a continuously monitored atomic ensemble

Hide-and-Seek Privacy Challenge

Intervention Generative Adversarial Networks

Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation

Non-bifurcating phylogenetic tree inference via the adaptive LASSO

On the identifiability of interaction functions in systems of interacting particles

Preparation of alginate hydrogel microparticles using droplet-based microfluidics: a review of methods

Sharp endpoint estimates for eigenfunctions restricted to submanifolds of codimension 2

Spatio-Temporal Hierarchical Adaptive Dispatching for Ridesharing Systems

The Collectivity of Heavy Mesons in Proton-Nucleus Collisions

Thermodynamics of Chiral Fermion System in a Uniform Magnetic Field

Thoracic Disease Identification and Localization using Distance Learning and Region Verification

VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

An Imprint of the Galactic Magnetic Field in the Diffuse Unpolarized Dust Emission

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

An asymptotic formula for the zeros of the deformed exponential function

Arrayed van der Waals Vertical Heterostructures based on 2D GaSe Grown by Molecular Beam Epitaxy

Diagnostic Prediction Using Discomfort Drawings

Diagnostic Prediction Using Discomfort Drawings with IBTM

Enhanced Thermoelectric Properties of Dirac Semimetal Cd3As2

Inter-Battery Topic Representation Learning

Observation of quasi-two-dimensional Dirac fermions in ZrTe5

Viewpoint and Topic Modeling of Current Events

Wafer-scale arrayed p-n junctions based on few-layer epitaxial GaTe