Researcher profile

Shengyu Zhu

Shengyu Zhu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
9 works
0 followers
9 topics
4 close collaborators

Published work

9 published items

preprint | 2023 | arXiv

RIS-Assisted Joint Uplink Communication and Imaging: Phase Optimization and Bayesian Echo Decoupling

Achieving integrated sensing and communication (ISAC) via uplink transmission is challenging due to the unknown waveform and the coupling of communication and sensing echoes. In this paper, a joint uplink communication and imaging system is proposed for the first time, where a reconfigurable intelligent surface (RIS) is used to manipulate the electromagnetic signals for echo decoupling at the base station (BS). Aiming to enhance the transmission gain in desired directions and generate the required radiation pattern in the region of interest (RoI), a phase optimization problem for the RIS is formulated, which is high dimensional and nonconvex with discrete constraints. To tackle this problem, a back propagation based phase design scheme for both continuous and discrete phase models is developed. Moreover, the echo decoupling problem is tackled using the Bayesian method with the factor graph technique, where the problem is represented by a graph model which consists of difficult local functions. Based on the graph model, a message-passing algorithm is derived, which can efficiently cooperate with the adaptive sparse Bayesian learning (SBL) to achieve joint communication and imaging. Numerical results show that the proposed method approaches the relevant lower bound asymptotically, and the communication performance can be enhanced with the utilization of imaging echoes.

preprint | 2022 | arXiv

A Local Method for Identifying Causal Relations under Markov Equivalence

Causality is important for designing interpretable and robust methods in artificial intelligence research. We propose a local approach to identify whether a variable is a cause of a given target under the framework of causal graphical models of directed acyclic graphs (DAGs). In general, the causal relation between two variables may not be identifiable from observational data as many causal DAGs encoding different causal relations are Markov equivalent. In this paper, we first introduce a sufficient and necessary graphical condition to check the existence of a causal path from a variable to a target in every Markov equivalent DAG. Next, we provide local criteria for identifying whether a variable is a cause/non-cause of a target based only on the local structure instead of the entire graph. Finally, we propose a local learning algorithm for this causal query via learning the local structure of the variable and some additional statistical independence tests related to the target. Simulation studies show that our local algorithm is efficient and effective, compared with other state-of-the-art methods.
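To make the identifiability issue in this abstract concrete, the sketch below checks Markov equivalence of two DAGs using the classical graphical criterion (same skeleton and same v-structures). It is an illustration only, not the local algorithm proposed in the paper; the function names and the edge-set representation are assumptions made for this example, and both graphs are assumed to share the same node set.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """Colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """Classical criterion: equal skeletons and equal v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

# Hypothetical three-variable examples.
chain = {("X", "Y"), ("Y", "Z")}          # X -> Y -> Z
reverse_chain = {("Z", "Y"), ("Y", "X")}  # Z -> Y -> X
collider = {("X", "Y"), ("Z", "Y")}       # X -> Y <- Z

print(markov_equivalent(chain, reverse_chain))
print(markov_equivalent(chain, collider))
```

The two chains are equivalent even though X is a cause of Z in one and not in the other, which is why a causal relation may not be identifiable from observational data alone.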

preprint | 2022 | arXiv

A Semi-Synthetic Dataset Generation Framework for Causal Inference in Recommender Systems

Accurate recommendation and reliable explanation are two key issues for modern recommender systems. However, most recommendation benchmarks only concern the prediction of user-item ratings while omitting the underlying causes behind the ratings. For example, the widely-used Yahoo!R3 dataset contains little information on the causes of the user-movie ratings. A solution could be to conduct surveys and require the users to provide such information. In practice, the user surveys can hardly avoid compliance issues and sparse user responses, which greatly hinders the exploration of causality-based recommendation. To better support the studies of causal inference and further explanations in recommender systems, we propose a novel semi-synthetic data generation framework for recommender systems where causal graphical models with missingness are employed to describe the causal mechanism of practical recommendation scenarios. To illustrate the use of our framework, we construct a semi-synthetic dataset with Causal Tags And Ratings (CTAR), based on the movies as well as their descriptive tags and rating information collected from a famous movie rating website. Using the collected data and the causal graph, the user-item-ratings and their corresponding user-item-tags are automatically generated, which provides the reasons (selected tags) why the user rates the items. Descriptive statistics and baseline results regarding the CTAR dataset are also reported. The proposed data generation framework is not limited to recommendation, and the released APIs can be used to generate customized datasets for other research tasks.

preprint | 2022 | arXiv

Masked Gradient-Based Causal Structure Learning

This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noises in a form parameterized by binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then utilize the reformulated SEM to develop a causal structure learning method that can be efficiently trained using gradient-based optimization, by leveraging a smooth characterization on acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. It is found that the obtained entries are typically near zero or one and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily includes different smooth model functions and achieves a much improved performance on most datasets considered.
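The abstract mentions a smooth characterization of acyclicity without stating its form. One widely used choice, assumed here purely for illustration, is h(W) = tr(exp(W∘W)) - d, which is zero exactly when the weighted graph has no directed cycle. The sketch below evaluates it with a truncated power series in plain Python; the function names and the truncation length are choices made for this example.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def acyclicity(weights, terms=20):
    """h(W) = tr(exp(W∘W)) - d, via the series sum_{k>=1} tr((W∘W)^k) / k!."""
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]  # elementwise square
    power = [[float(i == j) for j in range(d)] for i in range(d)]
    h, factorial = 0.0, 1.0
    for k in range(1, max(terms, d) + 1):
        power = matmul(power, squared)
        factorial *= k
        h += sum(power[i][i] for i in range(d)) / factorial
    return h

# Hypothetical adjacency matrices: entry [i][j] = 1 means an edge i -> j.
dag = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

print(acyclicity(dag))     # zero for an acyclic graph
print(acyclicity(cyclic))  # strictly positive when a cycle exists
```

Because h is differentiable in the entries of W, it can be added as a penalty to a gradient-based objective, which is the role a smooth acyclicity term plays in this line of work.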

preprint | 2022 | arXiv

Out-of-distribution Generalization with Causal Invariant Transformations

In real-world applications, it is important and desirable to learn a model that performs well on out-of-distribution (OOD) data. Recently, causality has become a powerful tool to tackle the OOD generalization problem, with the idea resting on the causal mechanism that is invariant across domains of interest. To leverage the generally unknown causal mechanism, existing works assume a linear form of causal feature or require sufficiently many and diverse training domains, which are usually restrictive in practice. In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature. Our approach is based on transformations that modify the non-causal feature but leave the causal part unchanged, which can be either obtained from prior knowledge or learned from the training data in the multi-domain scenario. Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model across the domains using only single domain data. Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations. Based on the theoretical findings, a regularized training procedure is proposed to improve the OOD generalization capability. Extensive experimental results on both synthetic and real datasets verify the effectiveness of the proposed algorithm, even with only a few causal invariant transformations.

preprint | 2022 | arXiv

Reframed GES with a Neural Conditional Dependence Measure

In a nonparametric setting, the causal structure is often identifiable only up to Markov equivalence, and for the purpose of causal inference, it is useful to learn a graphical representation of the Markov equivalence class (MEC). In this paper, we revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the MEC of the underlying causal structure. We observe that in order to make the GES algorithm consistent in a nonparametric setting, it is not necessary to design a scoring metric that evaluates graphs. Instead, it suffices to plug in a consistent estimator of a measure of conditional dependence to guide the search. We therefore present a reframing of the GES algorithm, which is more flexible than the standard score-based version and readily lends itself to the nonparametric setting with a general measure of conditional dependence. In addition, we propose a neural conditional dependence (NCD) measure, which utilizes the expressive power of deep neural networks to characterize conditional independence in a nonparametric manner. We establish the optimality of the reframed GES algorithm under standard assumptions and the consistency of using our NCD estimator to decide conditional independence. Together these results justify the proposed approach. Experimental results demonstrate the effectiveness of our method in causal discovery, as well as the advantages of using our NCD measure over kernel-based measures.

preprint | 2022 | arXiv

Universality of parametric Coupling Flows over parametric diffeomorphisms

Invertible neural networks based on Coupling Flows (CFlows) have various applications such as image synthesis and data compression. The approximation universality for CFlows is of paramount importance to ensure the model expressiveness. In this paper, we prove that CFlows can approximate any diffeomorphism in C^k-norm if its layers can approximate certain single-coordinate transforms. Specifically, we derive that a composition of affine coupling layers and invertible linear transforms achieves this universality. Furthermore, in parametric cases where the diffeomorphism depends on some extra parameters, we prove the corresponding approximation theorems for our proposed parametric coupling flows named Para-CFlows. In practice, we apply Para-CFlows as a neural surrogate model in contextual Bayesian optimization tasks, to demonstrate its superiority over other neural surrogate models in terms of optimization performance.
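For readers unfamiliar with affine coupling layers, the following minimal sketch shows why such a layer is invertible in closed form: half of the input passes through unchanged and determines an elementwise scale and shift for the other half. The functions `log_scale` and `shift` are hypothetical stand-ins for the neural networks a real model would use, and nothing here reproduces the Para-CFlows architecture.

```python
import math

def log_scale(v):
    """Hypothetical stand-in for a learned log-scale network."""
    return math.tanh(v)

def shift(v):
    """Hypothetical stand-in for a learned shift network."""
    return 0.5 * v

def coupling_forward(x1, x2):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1), applied elementwise."""
    y2 = [b * math.exp(log_scale(a)) + shift(a) for a, b in zip(x1, x2)]
    return list(x1), y2

def coupling_inverse(y1, y2):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1)); s and t are never inverted."""
    x2 = [(b - shift(a)) * math.exp(-log_scale(a)) for a, b in zip(y1, y2)]
    return list(y1), x2

x1, x2 = [0.3, -1.2], [2.0, 0.5]
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
print(max(abs(a - b) for a, b in zip(x2, r2)))  # round-trip error
```

Stacking such layers with invertible linear transforms that mix the coordinates is the kind of composition the abstract refers to.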

preprint | 2021 | arXiv

Asymptotically Optimal One- and Two-Sample Testing with Kernels

We characterize the asymptotic performance of nonparametric one- and two-sample testing. The exponential decay rate or error exponent of the type-II error probability is used as the asymptotic performance metric, and an optimal test achieves the maximum rate subject to a constant level constraint on the type-I error probability. With Sanov's theorem, we derive a sufficient condition for one-sample tests to achieve the optimal error exponent in the universal setting, i.e., for any distribution defining the alternative hypothesis. We then show that two classes of Maximum Mean Discrepancy (MMD) based tests attain the optimal type-II error exponent on $\mathbb R^d$, while the quadratic-time Kernel Stein Discrepancy (KSD) based tests achieve this optimality with an asymptotic level constraint. For general two-sample testing, however, Sanov's theorem is insufficient to obtain a similar sufficient condition. We proceed to establish an extended version of Sanov's theorem and derive an exact error exponent for the quadratic-time MMD based two-sample tests. The obtained error exponent is further shown to be optimal among all two-sample tests satisfying a given level constraint. Our work hence provides an achievability result for optimal nonparametric one- and two-sample testing in the universal setting. Applications to off-line change detection and related issues are also discussed.
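As a rough illustration of the quadratic-time MMD statistic mentioned in this abstract, the sketch below computes the standard biased estimate of the squared MMD with a Gaussian kernel. The kernel choice, the bandwidth, and the sample values are assumptions for this example; the abstract does not say which estimator variant or kernel the paper analyzes, and no test threshold is computed here.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (quadratic in the sample sizes)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Hypothetical one-dimensional samples.
near_zero = [(0.0,), (0.1,), (-0.1,)]
near_three = [(3.0,), (3.1,), (2.9,)]

print(mmd_squared(near_zero, near_zero))   # identical samples
print(mmd_squared(near_zero, near_three))  # well-separated samples
```

A two-sample test would reject the hypothesis of equal distributions when this statistic exceeds a threshold chosen to respect the type-I error level.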

preprint | 2020 | arXiv

Causal Discovery with Reinforcement Learning

Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.
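The abstract describes a reward built from a score and two acyclicity penalties, with the final output being the best-rewarded graph seen during training. The sketch below shows one plausible shape for that idea. The specific penalties (an indicator of cyclicity plus a trace-based term), the weights `lam1` and `lam2`, and the placeholder `toy_score` are all assumptions for illustration; the candidate list stands in for graphs that a trained policy would generate.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cycle_penalty(adj):
    """Sum over k = 1..d of tr(A^k) / k!; zero exactly when a 0/1 graph is acyclic."""
    d = len(adj)
    power = [[int(i == j) for j in range(d)] for i in range(d)]
    total, factorial = 0.0, 1.0
    for k in range(1, d + 1):
        power = matmul(power, adj)
        factorial *= k
        total += sum(power[i][i] for i in range(d)) / factorial
    return total

def reward(adj, score_fn, lam1=10.0, lam2=1.0):
    """Negative of (score + indicator penalty + smooth penalty); lower scores are better."""
    h = cycle_penalty(adj)
    not_dag = 1.0 if h > 0.0 else 0.0
    return -(score_fn(adj) + lam1 * not_dag + lam2 * h)

def best_graph(candidates, score_fn):
    """Keep the graph with the highest reward among all generated candidates."""
    return max(candidates, key=lambda adj: reward(adj, score_fn))

def toy_score(adj):
    """Placeholder score that simply favors more edges; a real score would use the data."""
    return -float(sum(map(sum, adj)))

empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

print(best_graph([empty, chain, cycle], toy_score))
```

The cyclic graph has the best raw score under the placeholder, but the penalties push its reward below that of the chain, so the search returns a DAG.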