Source author record

Shengyu Zhang

Shengyu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

50 works · 22 topics · 4 close collaborators


Preprint · 2026 · arXiv

Boxed UC plane partitions and the two-site generalized phase model

This study investigates the connection between boxed UC plane partitions and the two-site generalized phase model. By introducing two maps, we investigate the representation of two-site generalized phase algebras and the actions of monodromy matrix operators on basis vectors. The generating function of boxed UC plane partitions is obtained from the scalar product of the two-site generalized phase model and can be expressed as products of Schur functions. It is shown that, in the double scaling limit, the generating function of boxed UC plane partitions reduces to that of UC plane partitions.
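
As a reference point for "products of Schur functions" (not the authors' construction), a Schur polynomial in n variables can be evaluated with the classical bialternant formula s_λ(x_1, …, x_n) = det(x_i^(λ_j + n − j)) / det(x_i^(n − j)). The sketch below evaluates it exactly at rational points, assuming the variable values are distinct so the Vandermonde denominator is nonzero; the function names are illustrative.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def schur(partition, xs):
    """Evaluate s_lambda(xs) via the bialternant formula (xs must be distinct)."""
    n = len(xs)
    if len(partition) > n:
        return Fraction(0)  # s_lambda vanishes with fewer variables than parts
    lam = list(partition) + [0] * (n - len(partition))
    numerator = [[x ** (lam[j] + n - 1 - j) for j in range(n)] for x in xs]
    vandermonde = [[x ** (n - 1 - j) for j in range(n)] for x in xs]
    return det(numerator) / det(vandermonde)


# s_(2)(x1, x2) = x1^2 + x1*x2 + x2^2, so at (2, 3) this is 4 + 6 + 9.
print(schur((2,), [2, 3]))  # 19
```

Exact `Fraction` arithmetic avoids the cancellation errors that floating-point determinants would introduce for larger partitions.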

Preprint · 2026 · arXiv

CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challenges: Behavior Cloning is simple and effective through imitation but suffers from low behavioral diversity, while Reinforcement Learning can discover novel strategies through exploration but relies heavily on manually designed reward functions. To address the conflict between these two methods, we present CORE, a Code-based Inverse Self-Training Framework with Graph Expansion that bridges imitation and exploration, offering a novel training framework that promotes behavioral diversity while eliminating the reliance on manual reward design. Specifically, we introduce Semantic Code Abstraction to automatically infer reward functions from expert demonstrations without manual design. The inferred reward function, referred to as the Label Function, is executable code that verifies one key step within a task. Building on this, we propose Strategy Graph Expansion to enhance in-domain behavioral diversity; it constructs a multi-path graph, called the Strategy Graph, that captures diverse valid solutions beyond expert demonstrations. Furthermore, we introduce Trajectory-Guided Extrapolation, which enriches out-of-domain behavioral diversity by using both successful and failed trajectories to expand the task space. Experiments on Web and Android platforms demonstrate that CORE significantly improves both overall performance and generalization, highlighting its potential as a robust and generalizable training paradigm for building powerful virtual agents.

Preprint · 2025 · arXiv

QAOA-MaxCut has barren plateaus for almost all graphs

The Quantum Approximate Optimization Algorithm (QAOA) has been the subject of intense study in recent years, yet its Dynamical Lie Algebra (DLA), a key indicator of the expressivity and trainability of variational quantum algorithms (VQAs), remains poorly understood beyond highly symmetric instances. An exponentially scaling DLA dimension is associated with the presence of so-called barren plateaus (BPs) in the optimization landscape, which render training intractable. In this work, we investigate the DLA of QAOA applied to the canonical MaxCut problem, for both weighted and unweighted graphs. For weighted graphs, we show that when the weights are drawn from a continuous distribution, the DLA dimension grows as $\Theta(4^n)$ almost surely for all connected graphs except paths and cycles. In the more common unweighted setting, we show that asymptotically all but an exponentially vanishing fraction of graphs have a DLA of dimension $\Theta(4^n)$. We also identify the full decomposition of the corresponding DLAs into simple Lie algebras, from which we prove that the variance of the loss function is $O(1/2^n)$, implying that QAOA suffers from BPs on all of these weighted and unweighted graphs. Moreover, we give explicit constructions of graph families whose DLAs have exponential dimension, including cases whose MaxCut is in $\mathsf P$. Our proof of the unweighted case relies on a number of splitting lemmas and DLA-freeness conditions that convert prohibitively complicated Lie-algebraic problems into tractable graph-theoretic ones. These form the basis for a new algorithm that computes such DLAs orders of magnitude faster than previous methods, reducing runtimes from days to seconds on standard hardware. We apply this algorithm to MQLib, a classical MaxCut benchmark suite covering over 3,500 instances with up to 53,130 vertices, and find that, ignoring edge weights, at least 75% of the instances possess a DLA of dimension at least $2^{128}$.

Preprint · 2022 · arXiv

Adaptive Double-Exploration Tradeoff for Outlier Detection

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Unlike the traditional TBP, the threshold is defined as a function of the rewards of all the arms, motivated by the criterion for identifying outliers. The learner therefore needs to explore both the rewards of the arms and the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in previous rounds. Furthermore, by automatically trading off exploring the individual arms against exploring the outlier threshold, we obtain an algorithm that is efficient in terms of sample complexity. Experimental results on both synthetic and real-world datasets demonstrate the efficiency of our algorithm.
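
To make the "double exploration" setting concrete, the sketch below assumes a hypothetical threshold rule θ = mean + k·std over all arm means (the abstract only says the threshold is a function of all arms' rewards) and uses a plain uniform-allocation baseline, not the paper's adaptive algorithm: every Bernoulli arm is pulled equally often, and arms whose empirical mean exceeds the empirical threshold are reported as outliers. Function names and the value of k are illustrative.

```python
import math
import random


def outlier_threshold(means, k=1.5):
    """Hypothetical threshold rule: mean + k * (population) std of arm means."""
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu + k * math.sqrt(var)


def uniform_outlier_detection(arm_probs, pulls_per_arm, k=1.5, seed=0):
    """Non-adaptive baseline: pull each Bernoulli arm equally often, then
    compare every empirical mean against the empirical threshold."""
    rng = random.Random(seed)
    estimates = []
    for p in arm_probs:
        rewards = sum(1 for _ in range(pulls_per_arm) if rng.random() < p)
        estimates.append(rewards / pulls_per_arm)
    # The threshold itself is estimated from the same noisy samples,
    # which is what makes this a "double exploration" problem.
    theta = outlier_threshold(estimates, k)
    outliers = [i for i, m in enumerate(estimates) if m > theta]
    return outliers, theta


outliers, theta = uniform_outlier_detection([0.1] * 5 + [0.9], pulls_per_arm=2000)
print(outliers)  # [5]
```

An adaptive method would spend fewer pulls on arms that are clearly far from θ and more on arms (and on θ itself) where the classification is still uncertain; the uniform baseline is the reference it aims to beat in sample complexity.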

Preprint · 2022 · arXiv

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models. (1) BoostMIS adaptively leverages the cluster assumption and consistency regularization of the unlabeled data according to the current learning status. This strategy adaptively generates one-hot "hard" labels from task model predictions for better task model training. (2) For low-confidence unlabeled images that are not selected for pseudo labeling, we introduce an active learning (AL) algorithm that finds informative samples as annotation candidates by exploiting virtual adversarial perturbation and the model's density-aware entropy. These informative candidates are then fed into the next training cycle for better SSL label propagation. Notably, adaptive pseudo labeling and informative active annotation form a closed learning loop in which the two components collaborate to boost medical image SSL. To verify the effectiveness of the proposed method, we collected a metastatic epidural spinal cord compression (MESCC) dataset that aims to optimize MESCC diagnosis and classification for improved specialist referral and treatment. We conducted an extensive experimental study of BoostMIS on MESCC and on another public dataset, COVIDx. The experimental results verify our framework's effectiveness and generalisability across different medical image datasets, with a significant improvement over various state-of-the-art methods.

Preprint · 2022 · arXiv

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

Content-Based Image Retrieval (CIR) aims to search for a target image by jointly comprehending the composition of an example image and a complementary text, which potentially impacts a wide variety of real-world applications, such as internet search and fashion retrieval. In this scenario, the input image serves as an intuitive context and background for the search, while the accompanying text explicitly requests new traits, specifying how particular characteristics of the query image should be modified to obtain the intended target image. This task is challenging because it requires learning and understanding a composite image-text representation by incorporating cross-granular semantic updates. In this paper, we tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) with Hybrid Counterfactual Training framework, which sheds new light on the CIR task by studying it from two previously overlooked perspectives: implicitly bottom-up composition of visiolinguistic representation and explicitly fine-grained correspondence of query-target construction. On the one hand, we leverage the implicit interaction and composition of cross-modal embeddings from bottom local characteristics to top global semantics, preserving and transforming the visual representation conditioned on language semantics in several continuous steps for effective target image search. On the other hand, we devise a hybrid counterfactual training strategy that reduces the model's ambiguity for similar queries.

Preprint · 2022 · arXiv

CCL4Rec: Contrast over Contrastive Learning for Micro-video Recommendation

Micro-video recommender systems suffer from ubiquitous noise in users' behaviors, which might render the learned user representations non-discriminative and lead to trivial recommendations (e.g., popular items) or even odd ones far beyond users' interests. Contrastive learning is an emerging technique for learning discriminative representations with random data augmentations. However, because they neglect the noise in user behaviors and treat all augmented samples equally, existing contrastive learning frameworks are insufficient for learning discriminative user representations in recommendation. To bridge this research gap, we propose the Contrast over Contrastive Learning framework for training recommender models, named CCL4Rec, which models the nuances of different augmented views by further contrasting augmented positives/negatives with adaptive pulling/pushing strengths, i.e., contrast over (vanilla) contrastive learning. To accommodate these contrasts, we devise hardness-aware augmentations that track the importance of the behaviors being replaced in the query user and the relatedness of their substitutes, and thus determine the quality of the augmented positives/negatives. The hardness-aware augmentation also permits controllable contrastive learning, leading to performance gains and robust training. In this way, CCL4Rec captures the nuances of a given user's historical behaviors, which explicitly shields the learned user representation from the effects of noisy behaviors. We conduct extensive experiments on two micro-video recommendation benchmarks, which demonstrate that CCL4Rec, with far fewer model parameters, can achieve performance comparable to existing state-of-the-art methods and improve training/inference speed by several orders of magnitude.
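
The idea of contrasting negatives "with adaptive pushing strengths" can be sketched as an InfoNCE loss in which each negative's exponentiated similarity is scaled by a weight. This is a minimal sketch under an assumption: the weights are supplied by the caller, whereas in CCL4Rec they derive from hardness-aware augmentations that are not reproduced here. The function name and temperature default are hypothetical.

```python
import math


def weighted_info_nce(anchor, positive, negatives, neg_weights=None, tau=0.1):
    """InfoNCE loss with per-negative weights scaling the push strength.

    With all weights equal to 1 this reduces to the standard InfoNCE loss.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    if neg_weights is None:
        neg_weights = [1.0] * len(negatives)
    pos_term = math.exp(cosine(anchor, positive) / tau)
    # A larger weight makes that negative contribute more to the denominator,
    # so the gradient pushes the anchor away from it more strongly.
    neg_term = sum(w * math.exp(cosine(anchor, n) / tau)
                   for w, n in zip(neg_weights, negatives))
    return -math.log(pos_term / (pos_term + neg_term))


base = weighted_info_nce([1, 0], [1, 0], [[0, 1]])
harder = weighted_info_nce([1, 0], [1, 0], [[0, 1]], neg_weights=[2.0])
print(harder > base)  # True
```

Scaling inside the exponential-sum is only one way to express per-sample strengths; a per-term multiplier on the loss would be another, and the paper may use neither.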

Preprint · 2022 · arXiv

Contextual Combinatorial Conservative Bandits

The multi-armed bandit (MAB) problem asks a learner to make sequential decisions while balancing exploitation and exploration, and MAB methods have been successfully applied to a wide range of practical scenarios. Various algorithms have been designed to achieve high reward in the long term. However, their short-term performance might be rather poor, which is harmful in risk-sensitive applications. Building on previous work on conservative bandits, we propose a framework of contextual combinatorial conservative bandits. We present an algorithm and prove a regret bound of $\tilde O(d^2+d\sqrt{T})$, where $d$ is the dimension of the feature vectors and $T$ is the total number of time steps. We further provide an algorithm, together with its regret analysis, for the case where the conservative reward is unknown. Experiments validate the effectiveness of our algorithm.
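
Conservative bandits protect short-term performance by exploring only when doing so cannot push cumulative reward below a fraction (1 − α) of what a known baseline action would have earned. The sketch below shows that safety check, assuming the commonly used constraint "cumulative reward after t rounds ≥ (1 − α)·t·r_b" with a known per-round baseline reward r_b and lower confidence bounds supplied by the caller. All names are hypothetical, and the contextual-combinatorial machinery of the paper is omitted.

```python
def safe_to_explore(past_reward_lcb, candidate_lcb, baseline_reward, t, alpha):
    """Return True if playing the candidate at round t keeps the conservative
    constraint satisfied even in the pessimistic case.

    past_reward_lcb: lower confidence bound on reward collected in rounds 1..t-1
    candidate_lcb:   lower confidence bound on the candidate's reward at round t
    baseline_reward: known per-round reward of the default (conservative) action
    alpha:           tolerated fraction of baseline performance that may be lost
    """
    return past_reward_lcb + candidate_lcb >= (1 - alpha) * baseline_reward * t


def choose_action(optimistic_action, candidate_lcb, baseline_action,
                  past_reward_lcb, baseline_reward, t, alpha):
    """Play the optimistic (e.g., UCB-maximising) action only when it is safe;
    otherwise fall back to the baseline action."""
    if safe_to_explore(past_reward_lcb, candidate_lcb, baseline_reward, t, alpha):
        return optimistic_action
    return baseline_action


# Pessimistic total 8.5 falls below the required 0.9 * 1.0 * 10 = 9.0.
print(choose_action("explore", 0.5, "baseline", 8.0, 1.0, 10, 0.1))  # baseline
```

Falling back to the baseline whenever the pessimistic check fails is what lets the cumulative reward stay above the (1 − α) budget at every round rather than only asymptotically.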

Preprint · 2022 · arXiv

Contrastive Learning with Positive-Negative Frame Mask for Music Representation

Self-supervised learning, especially contrastive learning, has made an outstanding contribution to many deep learning research fields. Recently, researchers in acoustic signal processing noticed its success and leveraged contrastive learning for better music representation. Typically, existing approaches maximize the similarity between two distorted audio segments sampled from the same piece of music; in other words, they ensure semantic agreement at the music level. However, these coarse-grained methods neglect inessential or noisy elements at the frame level, which may prevent the model from learning an effective representation of music. To this end, this paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR. Concretely, PEMR incorporates a Positive-Negative Mask Generation module, which leverages transformer blocks to generate frame masks on the log-Mel spectrogram. Self-augmented positive and negative samples are generated by masking inessential or important components, respectively. We devise a novel contrastive learning objective that accommodates both self-augmented positives and negatives sampled from the same music. We conduct experiments on four public datasets. The experimental results on two music-related downstream tasks, music classification and cover song identification, demonstrate the generalization ability and transferability of the music representation learned by PEMR.
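
The positive/negative masking step can be sketched as follows, assuming per-frame importance scores are already given (in PEMR they come from a transformer-based mask generator, which is not reproduced here), that the spectrogram is a list of mel-bin rows indexed by frame, and that masking means overwriting whole frames with a fill value. The function name, mask ratio, and fill value are hypothetical.

```python
def positive_negative_views(log_mel, frame_scores, mask_ratio=0.2, fill=0.0):
    """Return (positive, negative) views of a spectrogram given as
    log_mel[mel_bin][frame].

    positive: the lowest-scoring (inessential) frames are masked
    negative: the highest-scoring (important) frames are masked
    """
    num_frames = len(frame_scores)
    k = max(1, round(mask_ratio * num_frames))
    # Frame indices sorted from least to most important.
    order = sorted(range(num_frames), key=lambda f: frame_scores[f])
    least_important = set(order[:k])
    most_important = set(order[-k:])

    def masked(frames_to_mask):
        # Build a new spectrogram; the input is left unchanged.
        return [[fill if f in frames_to_mask else v for f, v in enumerate(row)]
                for row in log_mel]

    return masked(least_important), masked(most_important)


spec = [[1.0] * 5 for _ in range(4)]
pos, neg = positive_negative_views(spec, [0.1, 0.9, 0.5, 0.2, 0.8], mask_ratio=0.4)
print(pos[0])  # [0.0, 1.0, 1.0, 0.0, 1.0]
```

The positive view keeps the frames the scorer considers important, so it should stay semantically close to the anchor, while the negative view removes exactly those frames.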

Preprint · 2022 · arXiv

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Understanding human emotions is a crucial ability for intelligent robots to provide better human-robot interactions. Existing works are limited to trimmed video-level emotion classification and fail to locate the temporal window corresponding to the emotion. In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles. Compared with temporal action localization, TEL presents three unique challenges: 1) emotions have extremely varied temporal dynamics; 2) emotion cues are embedded in both appearances and complex plots; 3) fine-grained temporal annotations are complicated and labor-intensive. To address the first two challenges, we propose a novel dilated context integrated network with a coarse-fine two-stream architecture. The coarse stream captures varied temporal dynamics by modeling multi-granularity temporal contexts. The fine stream understands complex plots by reasoning about the dependencies between the multi-granularity temporal contexts from the coarse stream and adaptively integrating them into fine-grained video segment features. To address the third challenge, we introduce a cross-modal consensus learning paradigm, which leverages the inherent semantic consensus between the aligned video and subtitles to achieve weakly-supervised learning. We contribute a new testing set with 3,000 manually-annotated temporal boundaries so that future research on the TEL problem can be quantitatively evaluated. Extensive experiments show the effectiveness of our approach on temporal emotion localization. The repository of this work is at https://github.com/YYJMJC/Temporal-Emotion-Localization-in-Videos.

Preprint · 2022 · arXiv

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models, owing to model innovations (e.g., Transformers, pretrained families), the explosion of training data and soaring computing capabilities. However, edge computing, especially edge-cloud collaborative computing, is still in its infancy, owing to resource-constrained IoT scenarios with very limited algorithms deployed. In this survey, we conduct a systematic review of both cloud and edge AI. Specifically, we are the first to set up the collaborative learning mechanism for cloud and edge modeling, with a thorough review of the architectures that enable such a mechanism. We also discuss the potential and practical experience of some ongoing advanced edge AI topics, including pretraining models, graph neural networks and reinforcement learning. Finally, we discuss promising directions and challenges in this field.

Preprint · 2022 · arXiv

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding

Natural language spatial video grounding aims to detect the relevant objects in video frames with descriptive sentences as the query. Despite great advances, most existing methods rely on dense video frame annotations, which require a tremendous amount of human effort. To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding and learn, in an end-to-end manner, to ground natural language in all video frames with only one frame labeled. One major challenge of end-to-end one-shot video grounding is the existence of video frames that are irrelevant to the language query or to the labeled frames. Another challenge relates to the limited supervision, which might result in ineffective representation learning. To address these challenges, we design an end-to-end model via Information Tree for One-Shot video grounding (IT-OS). Its key module, the information tree, can eliminate the interference of irrelevant frames based on branch search and branch cropping techniques. In addition, several self-supervised tasks are proposed based on the information tree to improve representation learning under insufficient labeling. Experiments on the benchmark dataset demonstrate the effectiveness of our model.

Preprint · 2022 · arXiv

HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding

Video Object Grounding (VOG) is the problem of associating spatial object regions in a video with a descriptive natural language query. This is a challenging vision-language task that requires constructing the correct cross-modal correspondence and modeling the appropriate spatio-temporal context of the query video and caption, thereby localizing the specific objects accurately. In this paper, we tackle this task with a novel framework called HiErarchical spatio-tempoRal reasOning (HERO) with contrastive action correspondence. We study the VOG task from two aspects that prior works overlooked: (1) Contrastive Action Correspondence-aware Retrieval. Noting that the fine-grained video semantics (e.g., multiple actions) are not fully aligned with the annotated language query (e.g., a single action), we first introduce weakly-supervised contrastive learning that classifies video frames as action-consistent or action-independent based on the video-caption action semantic correspondence. Such a design builds fine-grained cross-modal correspondence for more accurate subsequent VOG. (2) Hierarchical Spatio-temporal Modeling Improvement. While transformer-based VOG models show potential in modeling sequential modalities (i.e., video and caption), existing evidence also indicates that the transformer is insensitive to spatio-temporal locality. Motivated by this, we carefully design hierarchical reasoning layers that decouple fully connected multi-head attention and remove redundant interfering correlations. Furthermore, our proposed pyramid and shifted alignment mechanisms effectively improve the cross-modal utilization of information from neighboring spatial regions and temporal frames. Extensive experiments show that HERO outperforms existing techniques, achieving significant improvements on two benchmark datasets.

Preprint · 2022 · arXiv

Intelligent Request Strategy Design in Recommender System

The Waterfall Recommender System (RS), a popular form of RS in mobile applications, presents a stream of recommended items consisting of successive pages that can be browsed by scrolling. In a waterfall RS, when a user finishes browsing a page, the edge device (e.g., a mobile phone) sends a request to the cloud server to get a new page of recommendations, known as the paging request mechanism. RSs typically put a large number of items into one page to reduce the excessive resource consumption of numerous paging requests; however, this diminishes the RS's ability to renew recommendations in a timely manner according to users' real-time interests and leads to a poor user experience. Intuitively, inserting additional requests inside pages to update the recommendations more frequently can alleviate the problem. However, previous attempts, which include only non-adaptive strategies (e.g., inserting requests uniformly), eventually lead to resource overconsumption. To this end, we envision a new learning task of edge intelligence named Intelligent Request Strategy Design (IRSD). It aims to improve the effectiveness of waterfall RSs by determining the appropriate occasions for request insertion based on users' real-time intentions. Moreover, we propose a new paradigm of adaptive request insertion strategy named Uplift-based On-edge Smart Request Framework (AdaRequest). AdaRequest 1) captures the dynamic change of users' intentions by matching their real-time behaviors with their historical interests using attention-based neural networks; 2) estimates the counterfactual uplift in user purchases brought by an inserted request based on causal inference; and 3) determines the final request insertion strategy by maximizing the utility function under online resource constraints. We conduct extensive experiments on both an offline dataset and an online A/B test to verify the effectiveness of AdaRequest.
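
Step 3 of AdaRequest selects insertion points by maximizing a utility under resource constraints. A deliberately simplified sketch, assuming each candidate insertion point already has an estimated uplift (the output of step 2) and that every request costs the same, so the resource constraint reduces to a fixed request budget; the function name and the `min_uplift` cut-off are hypothetical, and the paper's actual utility function is not reproduced.

```python
def plan_requests(uplifts, budget, min_uplift=0.0):
    """Pick at most `budget` insertion points with the largest estimated uplift.

    uplifts:    estimated purchase uplift for inserting a request at each point
    budget:     maximum number of extra requests the server can afford
    min_uplift: skip points whose uplift does not justify a request at all
    """
    # Rank candidate positions by estimated uplift, best first.
    ranked = sorted(range(len(uplifts)), key=lambda i: uplifts[i], reverse=True)
    chosen = [i for i in ranked[:budget] if uplifts[i] > min_uplift]
    return sorted(chosen)  # return positions in page order


print(plan_requests([0.02, 0.10, -0.01, 0.05], budget=2))  # [1, 3]
```

With equal request costs, taking the top-uplift positions is the optimal way to spend the budget; unequal costs would turn this into a knapsack-style problem.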

Preprint · 2022 · arXiv

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning

Text-based image captioning (TextCap) requires simultaneous comprehension of visual content and reading the text in images to generate a natural language description. Although this task can teach machines to further understand the complex human environment, given that text is omnipresent in our daily surroundings, it poses additional challenges compared with normal captioning. A text-based image intuitively contains abundant and complex multimodal relational content; that is, image details can be described diversely from multiple views rather than by a single caption. Although additional paired training data could be introduced to show the diversity of image descriptions, this process is labor-intensive and time-consuming for TextCap pair annotations with extra texts. Based on this insight, we investigate how to generate diverse captions that focus on different image parts using an unpaired training paradigm. We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap. This framework adaptively constructs multiple multimodal relational graphs of images and models complex relationships among the graphs to represent descriptive diversity. Moreover, a cascaded generative adversarial network is developed from the modeled graphs to infer unpaired caption generation at the levels of image-sentence feature alignment and linguistic coherence. We validate the effectiveness of MAGIC in generating diverse captions from different relational information items of an image. Experimental results show that MAGIC can generate very promising outcomes without using any image-caption training pairs.

Preprint · 2022 · arXiv

MIC: Model-agnostic Integrated Cross-channel Recommenders

Semantically connecting users and items is a fundamental problem for the matching stage of an industrial recommender system. Recent advances in this area are based on multi-channel retrieval to efficiently measure users' interest in items from a massive candidate pool. However, existing work is primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus accesses only limited correlations between users and items, derived solely from partial information about latent interactions. In this paper, we propose a model-agnostic integrated cross-channel (MIC) approach for large-scale recommendation, which maximally leverages the inherent multi-channel mutual information to enhance matching performance. Specifically, MIC robustly models correlations within user-item, user-user, and item-item pairs from latent interactions in a universal schema. For each channel, MIC naturally aligns pairs with semantic similarity and distinguishes them otherwise, with a more uniform anisotropic representation space. While state-of-the-art methods require specific architectural designs, MIC intuitively considers them as a whole by enabling complete information flow among users and items. Thus, MIC can be easily plugged into other retrieval recommender systems. Extensive experiments show that MIC helps several state-of-the-art models boost their performance on two real-world benchmarks. The successful deployment of MIC on industrial online services empirically demonstrates its scalability and flexibility.

Preprint · 2022 · arXiv

Optimizing Quantum Annealing Schedules with Monte Carlo Tree Search enhanced with neural networks

Quantum annealing is a practical approach to approximately implementing the adiabatic quantum computational model in a real-world setting. The goal of an adiabatic algorithm is to prepare the ground state of a problem-encoded Hamiltonian at the end of an annealing path. This is typically achieved by driving the dynamical evolution of a quantum system slowly enough to enforce adiabaticity. Properly optimized annealing schedules often significantly accelerate the computational process. Inspired by the recent success of deep reinforcement learning such as DeepMind's AlphaZero, we propose a Monte Carlo Tree Search (MCTS) algorithm and an enhanced version boosted with neural networks, which we name QuantumZero (QZero), to automate the design of annealing schedules in a hybrid quantum-classical framework. Both the MCTS and QZero algorithms perform remarkably well in discovering effective annealing schedules, even when the annealing time is short, for the 3-SAT examples we consider in this study. Furthermore, the flexibility of neural networks allows us to apply transfer-learning techniques to boost QZero's performance. We demonstrate in benchmark studies that MCTS and QZero design annealing schedules more efficiently than other reinforcement learning algorithms.
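
QZero is described as AlphaZero-inspired. Assuming that involves AlphaZero's tree policy, the sketch below shows the PUCT child-selection rule, score = Q + c·P·√N_parent / (1 + N_child), where P is the policy network's prior and Q the mean backed-up value. In an annealing-schedule search a child might correspond to choosing the next discretized value of a schedule parameter; that mapping, the class name, and the `c_puct` default are hypothetical rather than taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values through this child

    @property
    def q(self) -> float:
        # Mean action value; unvisited children default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(children, c_puct=1.5):
    """Return the index of the child maximising Q + U (AlphaZero-style PUCT)."""
    parent_visits = sum(ch.visits for ch in children)

    def score(ch):
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * ch.prior * math.sqrt(parent_visits) / (1 + ch.visits)
        return ch.q + u

    return max(range(len(children)), key=lambda i: score(children[i]))


# The unvisited child gets a large exploration bonus and is selected.
print(puct_select([Child(0.5, 10, 5.0), Child(0.5)]))  # 1
```

As visit counts grow, the bonus U shrinks and selection is increasingly driven by Q, which is how the search shifts from exploring schedules to refining the best ones found.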

Preprint · 2022 · arXiv

Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling

In an era of information explosion, recommendation systems play an important role in people's daily lives by facilitating content exploration. It is known that user activeness, i.e., the number of behaviors, tends to follow a long-tail distribution, where the majority of users have low activeness. In practice, we observe that tail users receive significantly lower-quality recommendations than head users after joint training. We further find that a model trained on tail users separately still achieves inferior results due to limited data. Although long-tail distributions are ubiquitous in recommendation systems, improving recommendation performance for tail users remains a challenge in both research and industry. Directly applying related long-tail methods risks hurting the experience of head users, which is less affordable since a small portion of highly active head users contributes a considerable portion of platform revenue. In this paper, we propose a novel approach that significantly improves the recommendation performance of tail users while achieving at least comparable performance for head users relative to the base model. The essence of this approach is a novel Gradient Aggregation technique that learns common knowledge shared by all users into a backbone model, followed by separate plugin prediction networks for head-user and tail-user personalization. For common knowledge learning, we leverage backward adjustment from causality theory to deconfound the gradient estimation, thus shielding the backbone training from the confounder, i.e., user activeness. We conduct extensive experiments on two public recommendation benchmark datasets and a large-scale industrial dataset collected from the Alipay platform. Empirical studies validate the rationality and effectiveness of our approach.

Preprint · 2022 · arXiv

Retroformer: Pushing the Limits of Interpretable End-to-end Retrosynthesis Transformer

Retrosynthesis prediction is one of the fundamental challenges in organic synthesis. The task is to predict the reactants given a core product. With the advancement of machine learning, computer-aided synthesis planning has gained increasing interest. Numerous methods have been proposed to solve this problem, with different levels of dependency on additional chemical knowledge. In this paper, we propose Retroformer, a novel Transformer-based architecture for retrosynthesis prediction that does not rely on any cheminformatics tools for molecule editing. Via the proposed local attention head, the model can jointly encode the molecular sequence and graph, and efficiently exchange information between the local reactive region and the global reaction context. Retroformer reaches new state-of-the-art accuracy for end-to-end template-free retrosynthesis and improves over many strong baselines in molecule and reaction validity. In addition, its generative procedure is highly interpretable and controllable. Overall, Retroformer pushes the limits of the reaction reasoning ability of deep generative models.

Preprint · 2022 · arXiv

SPLDExtraTrees: Robust machine learning approach for predicting kinase inhibitor resistance

Drug resistance is a major threat to global health and a significant concern throughout the clinical treatment of diseases and drug development. Mutations in proteins related to drug binding are a common cause of adaptive drug resistance. Therefore, quantitative estimates of how mutations affect the interaction between a drug and its target protein would be of vital significance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations, Rosetta protocols, and machine learning have been shown to be capable of predicting ligand affinity changes upon protein mutation. However, overfitting and generalization issues induced by severely limited sample sizes and heavy noise have impeded wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. In particular, the proposed method ranks training data following a specific scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into the training, and then iterates between sample weight recalculations and model updates. In addition, we calculate additional physics-based structural features to provide the machine learning model with valuable domain knowledge on proteins for this data-limited predictive task. The experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios, achieving predictive accuracy comparable to that of molecular dynamics and Rosetta methods at much lower computational cost.
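
The easy-to-hard, diversity-aware ordering can be illustrated with the closed-form selection step of self-paced learning with diversity (SPLD), which we assume, from the method's name, is the scheme involved. Within each group, samples are ranked by loss and the i-th one is kept if its loss is below λ + γ/(√i + √(i − 1)), so later picks from the same group face a stricter bar. How groups are formed and how λ and γ grow between rounds are assumptions left to the caller; the function name is hypothetical.

```python
import math
from collections import defaultdict


def spld_select(losses, groups, lam, gamma):
    """Return a 0/1 weight per sample using the SPLD selection rule.

    losses: per-sample training losses
    groups: group label per sample (diversity is encouraged across groups)
    lam:    age parameter; larger values admit harder samples
    gamma:  diversity parameter; larger values favour spreading picks across groups
    """
    weights = [0] * len(losses)
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    for members in by_group.values():
        # Rank samples inside the group from easiest (lowest loss) to hardest.
        members.sort(key=lambda idx: losses[idx])
        for rank, idx in enumerate(members, start=1):
            bar = lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1))
            if losses[idx] < bar:
                weights[idx] = 1
    return weights


# The two samples with loss 0.3 are treated differently: the one in the
# already well-covered group "a" is rejected, the one in group "b" is kept.
print(spld_select([0.1, 0.2, 0.3, 0.3], ["a", "a", "a", "b"], lam=0.15, gamma=0.2))
# [1, 1, 0, 1]
```

In a full training loop, the selected weights would be used to refit the model (here, an extra-trees regressor), losses would be recomputed, and λ and γ increased so that harder samples enter in later rounds.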

preprint2022arXiv

Suppressing ZZ Crosstalk of Quantum Computers through Pulse and Scheduling Co-Optimization

Noise is a significant obstacle to quantum computing, and $ZZ$ crosstalk is one of the most destructive types of noise affecting superconducting qubits. Previous approaches to suppressing $ZZ$ crosstalk have mainly relied on specific chip designs that can complicate chip fabrication and aggravate decoherence. To some extent, special chip design can be avoided by relying on pulse optimization to suppress $ZZ$ crosstalk. However, existing approaches are not scalable, as their required time and memory grow exponentially with the number of qubits involved. To address these problems, we propose a scalable approach that co-optimizes pulses and scheduling. We optimize pulses to provide the ability to suppress $ZZ$ crosstalk surrounding a gate, and then design scheduling strategies that exploit this ability to achieve suppression across the whole circuit. A main advantage of such co-optimization is that it does not require special hardware support. In addition, we implement our approach as a general framework that is compatible with different pulse optimization methods. We have conducted extensive evaluations both in simulation and on a real quantum computer. Simulation results show that our proposal can improve the fidelity of quantum computing on $4{\sim}12$ qubits by up to $81\times$ ($11\times$ on average). Ramsey experiments on a real quantum computer also demonstrate that our method can largely eliminate the effect of $ZZ$ crosstalk.

preprint2022arXiv

The prospects of Monte Carlo antibody loop modelling on a fault-tolerant quantum computer

Quantum computing for the biological sciences is an area of rapidly growing interest, but specific industrial applications remain elusive. Quantum Markov chain Monte Carlo has been proposed as a method for accelerating a broad class of computational problems, including problems of pharmaceutical interest. Here we investigate the prospects of quantum advantage via this approach, by applying it to the problem of modelling antibody structure, a crucial task in drug development. To minimize the resources required while maintaining pharmaceutical-level accuracy, we propose a specific encoding of molecular dihedral angles into registers of qubits and a method for implementing, in quantum superposition, a Markov chain Monte Carlo update step based on a classical all-atom force field. We give the first detailed analysis of the resources required to solve a problem of industrial size and relevance and find that, though the time and space requirements of using a quantum computer in this way are considerable, continued technological improvements could bring the required resources within reach in the future.

preprint2021arXiv

Shortcuts to Adiabaticity for Open Systems in Circuit Quantum Electrodynamics

Shortcuts to adiabaticity (STA) are powerful quantum control methods that allow quick evolution into the target states of otherwise slow adiabatic dynamics. Such methods have widespread applications in quantum technologies, and various STA protocols have been demonstrated in closed systems. However, realizing STA for open quantum systems has presented a greater challenge, due to the complex controls required in existing proposals. Here we present the first experimental demonstration of STA for open quantum systems, using a superconducting circuit QED system consisting of two coupled bosonic oscillators and a transmon qubit. By applying a counterdiabatic driving pulse, we reduce the adiabatic evolution time of a single lossy mode from 800 ns to 100 ns. In addition, we propose and implement an optimal control protocol to achieve fast and qubit-unconditional equilibrium of multiple lossy modes. Our results pave the way for accelerating the dynamics of open quantum systems and have potential applications in designing fast open-system protocols of physical and interdisciplinary interest, such as accelerating bioengineering and chemical reaction dynamics.

preprint2021arXiv

Variational Quantum-Neural Hybrid Eigensolver

The variational quantum eigensolver (VQE) is one of the most representative quantum algorithms in the noisy intermediate-scale quantum (NISQ) era, and is generally speculated to deliver one of the first quantum advantages for the ground-state simulations of some non-trivial Hamiltonians. However, the short quantum coherence time and limited availability of quantum hardware resources of NISQ hardware strongly restrict the capacity and expressiveness of VQEs. In this Letter, we introduce the variational quantum-neural hybrid eigensolver (VQNHE), in which the shallow-circuit quantum ansatz can be further enhanced by classical post-processing with neural networks. We show that VQNHE consistently and significantly outperforms VQE in simulating ground-state energies of quantum spins and molecules given the same amount of quantum resources. More importantly, we demonstrate that for arbitrary post-processing neural functions, VQNHE incurs only a polynomial overhead in processing time, and represents the first scalable method to exponentially accelerate VQE with non-unitary post-processing that can be efficiently implemented in the NISQ era.

preprint2020arXiv

Comprehensive Information Integration Modeling Framework for Video Titling

In e-commerce, consumer-generated videos, which generally convey consumers' individual preferences for different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos are seldom accompanied by appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and in high demand, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, the granular-level interaction modeling first uses temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs and recognizes the intra-actions in each graph through Graph Neural Networks (GNN). Then a global-local aggregation module models inter-actions across graphs and aggregates the heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to utilize the interactions between products and backgrounds, and generates the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community...

preprint2020arXiv

DeepOPF: A Deep Neural Network Approach for Security-Constrained DC Optimal Power Flow

We develop DeepOPF as a Deep Neural Network (DNN) approach for solving security-constrained direct current optimal power flow (SC-DCOPF) problems, which are critical for reliable and cost-effective power system operation. DeepOPF is inspired by the observation that solving SC-DCOPF problems for a given power network is equivalent to characterizing a high-dimensional mapping from the load inputs to the generation and phase angle outputs. We first train a DNN to learn the mapping and predict the generations from the load inputs. We then directly reconstruct the phase angles from the generations and loads by using the power flow equations. Such a predict-and-reconstruct approach reduces the dimension of the mapping to learn, subsequently cutting down the size of the DNN and the amount of training data needed. We further derive a condition for tuning the size of the DNN according to the desired approximation accuracy of the load-generation mapping. We develop a post-processing procedure based on $\ell_1$-projection to ensure the feasibility of the obtained solution, which can be of independent interest. Simulation results for IEEE test cases show that DeepOPF generates feasible solutions with less than 0.2% optimality loss, while speeding up the computation time by up to two orders of magnitude compared to a state-of-the-art solver.
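The "reconstruct" step rests on the DC power flow model, in which the net bus injections $P = g - d$ (generation minus load) satisfy $P = B\theta$, with $B$ the bus susceptance matrix. A minimal sketch, assuming a lossless network given as `(from_bus, to_bus, reactance)` tuples, bus 0 as the slack bus with $\theta_0 = 0$, and generations that are simply given here rather than predicted by a DNN; `susceptance_matrix` and `reconstruct_angles` are hypothetical helpers, not DeepOPF's code, and the security constraints are ignored.

```python
import numpy as np


def susceptance_matrix(n_bus, lines):
    """Build the DC bus susceptance matrix B from (from_bus, to_bus, reactance) tuples."""
    B = np.zeros((n_bus, n_bus))
    for i, j, x in lines:
        b = 1.0 / x
        B[i, i] += b
        B[j, j] += b
        B[i, j] -= b
        B[j, i] -= b
    return B


def reconstruct_angles(gen, load, lines):
    """Solve P = B @ theta for the phase angles, fixing theta[0] = 0 at the slack bus."""
    gen = np.asarray(gen, dtype=float)
    load = np.asarray(load, dtype=float)
    n = len(gen)
    B = susceptance_matrix(n, lines)
    P = gen - load  # net injections; they sum to zero in the lossless DC model
    theta = np.zeros(n)
    theta[1:] = np.linalg.solve(B[1:, 1:], P[1:])  # drop the slack row and column
    return theta


# 3-bus toy network: generator at bus 0, loads at buses 1 and 2
lines = [(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.1)]
theta = reconstruct_angles(gen=[1.5, 0.0, 0.0], load=[0.0, 1.0, 0.5], lines=lines)
flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
```

Because only the generations have to be learned, the angles (and hence line flows) follow from one linear solve, which is the dimension reduction the abstract refers to.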

preprint2020arXiv

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

In this paper, we present an approach, namely Lexical Semantic Image Completion (LSIC), that may have potential applications in art, design, and heritage conservation, among several others. Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To permit a completion process that is both grounded and controllable, we advocate generating results faithful to both visual and lexical semantic context, i.e., a description of the holes or blank regions left in the image (e.g., a hole description). One major challenge for LSIC comes from modeling and aligning the structure of visual-semantic context and translating across different modalities. We term this process structure completion, which is realized by multi-grained reasoning blocks in our model. Another challenge relates to unimodal biases, which occur when the model generates plausible results without using the textual description. This can happen because the annotated captions for an image are often semantically equivalent in existing datasets, so there is only one paired text for a masked image in training. We devise an unsupervised unpaired-creation learning path besides the over-explored paired-reconstruction path, as well as a multi-stage training strategy to mitigate the insufficiency of labeled data. We conduct extensive quantitative and qualitative experiments as well as ablation studies, which reveal the efficacy of our proposed LSIC.

preprint2020arXiv

Poet: Product-oriented Video Captioner for E-commerce

In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods in terms of generation quality, capture of product aspects, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.

preprint2020arXiv

Quantum algorithms for graph problems with cut queries

Let $G$ be an $n$-vertex graph with $m$ edges. Given a subset $S$ of vertices, a cut query on $G$ returns the number of edges of $G$ that have exactly one endpoint in $S$. We show that there is a bounded-error quantum algorithm that determines all connected components of $G$ after making $O(\log(n)^6)$ many cut queries. In contrast, it follows from results in communication complexity that any randomized algorithm, even one that only decides whether the graph is connected, must make at least $Ω(n/\log(n))$ many cut queries. We further show that with $O(\log(n)^8)$ many cut queries a quantum algorithm can with high probability output a spanning forest for $G$. En route to proving these results, we design quantum algorithms for learning a graph using cut queries. We show that a quantum algorithm can learn a graph with maximum degree $d$ after $O(d \log(n)^2)$ many cut queries, and can learn a general graph with $O(\sqrt{m} \log(n)^{3/2})$ many cut queries. These two upper bounds are tight up to poly-logarithmic factors, and compare with $Ω(dn)$ and $Ω(m/\log(n))$ lower bounds, respectively, on the number of cut queries needed by a randomized algorithm for the same problems. The key ingredients in our results are the Bernstein-Vazirani algorithm, approximate counting with "OR queries", and learning sparse vectors from inner products as in compressed sensing.
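To make the query model concrete, here is a minimal classical sketch of a cut-query oracle on an edge list, together with the elementary identity $\mathrm{cut}(\{u\}) + \mathrm{cut}(\{v\}) - \mathrm{cut}(\{u,v\}) = 2\cdot[uv \in E]$, which lets a naive classical learner recover the graph with $O(n^2)$ queries. This is only the baseline query model, not the quantum algorithm of the paper; `cut_query` and `learn_graph` are hypothetical names.

```python
from itertools import combinations


def cut_query(edges, S):
    """Number of edges with exactly one endpoint in the vertex subset S."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in edges)


def learn_graph(n, edges):
    """Recover the edge set using only cut queries (naive classical method, O(n^2) queries).

    `edges` is passed only so that cut_query can answer; the learner itself
    looks at nothing but the returned cut sizes.
    """
    found = set()
    for u, v in combinations(range(n), 2):
        # deg(u) + deg(v) - (deg(u) + deg(v) - 2[uv in E]) = 2[uv in E]
        if cut_query(edges, {u}) + cut_query(edges, {v}) - cut_query(edges, {u, v}) == 2:
            found.add((u, v))
    return found


path = [(0, 1), (1, 2), (2, 3)]
```

For the path 0-1-2-3, `cut_query(path, {0, 1})` counts only the edge (1, 2), and `learn_graph(4, path)` returns the three path edges.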

preprint2016arXiv

Linear time algorithm for quantum 2SAT

A canonical result in satisfiability theory is that the 2-SAT problem can be solved in linear time, despite the NP-hardness of the 3-SAT problem. In the quantum 2-SAT problem, we are given a family of 2-qubit projectors $Π_{ij}$ on a system of $n$ qubits, and the task is to decide whether the Hamiltonian $H=\sum Π_{ij}$ has a 0-eigenvalue, or its smallest eigenvalue is larger than $1/n^α$ for some $α=O(1)$. The problem is not only a natural extension of the classical 2-SAT problem to the quantum case, but is also equivalent to the problem of finding the ground state of 2-local frustration-free Hamiltonians of spin $\frac{1}{2}$, a well-studied model believed to capture certain key properties in modern condensed matter physics. While Bravyi has shown that the quantum 2-SAT problem has a classical polynomial-time algorithm, the running time of his algorithm is $O(n^4)$. In this paper we give a classical algorithm with running time linear in the number of local projectors, thereby achieving the best possible complexity.

preprint2016arXiv

Semiquantum key distribution with secure delegated quantum computation

Semiquantum key distribution allows a quantum party to share a random key with a "classical" party who can only prepare and measure qubits in the computational basis, or reorder some qubits, when he has access to a quantum channel. In this work, we present a protocol in which a secret key can be established between a quantum user and an almost classical user, who needs only the quantum ability to access quantum channels, by securely delegating quantum computation to a quantum server. We show that the proposed protocol is robust even when the delegated quantum server is a powerful adversary, and that it is experimentally feasible with current technology. Because one party in our protocol requires the fewest possible quantum resources, the protocol can be more practical and can significantly widen the applicability of quantum key distribution.

preprint2016arXiv

Sensitivity Conjecture and Log-rank Conjecture for functions with small alternating numbers

The Sensitivity Conjecture and the Log-rank Conjecture are among the most important and challenging problems in concrete complexity. Incidentally, the Sensitivity Conjecture is known to hold for monotone functions, and so is the Log-rank Conjecture for $f(x \wedge y)$ and $f(x\oplus y)$ with monotone functions $f$, where $\wedge$ and $\oplus$ are bit-wise AND and XOR, respectively. In this paper, we extend these results to functions $f$ that alternate values a relatively small number of times on any monotone path from $0^n$ to $1^n$. These results deepen our understanding of the two conjectures, and contribute to the recent line of research on functions with small alternating numbers.
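The alternating number used above, i.e. the maximum number of times $f$ changes value along a monotone path from $0^n$ to $1^n$, can be computed for small $n$ by dynamic programming over the hypercube. A brute-force sketch, assuming $f$ is a 0/1-valued Python function on $n$-bit integer masks; `alternation_number` is a hypothetical helper, not code from the paper.

```python
def alternation_number(f, n):
    """Max number of value changes of f along any monotone path from 0^n to 1^n.

    best[x] is the largest number of changes on a monotone path from 0 to x.
    Masks are processed in order of popcount so every predecessor x - {i} is ready.
    """
    best = [0] * (1 << n)
    for x in sorted(range(1, 1 << n), key=lambda m: bin(m).count("1")):
        best[x] = max(
            best[x ^ (1 << i)] + (f(x ^ (1 << i)) != f(x))
            for i in range(n)
            if (x >> i) & 1  # predecessors: clear one set bit
        )
    return best[(1 << n) - 1]
```

Consistent with the text, a monotone function such as majority has alternating number at most 1, while parity on $n$ bits reaches the maximum $n$.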

preprint2015arXiv

Fourier Sparsity of GF(2) Polynomials

We study a conjecture called the "linear rank conjecture", recently raised in (Tsang et al., FOCS'13), which asserts that if many linear constraints are required to lower the degree of a GF(2) polynomial, then the Fourier sparsity (i.e., the number of non-zero Fourier coefficients) of the polynomial must be large. We notice that the conjecture implies a surprising phenomenon: if the highest-degree monomials of a GF(2) polynomial satisfy a certain condition, then the Fourier sparsity of the polynomial is large regardless of the lower-degree monomials, whose number is generally much larger than that of the highest-degree monomials. We develop a new technique for proving lower bounds on the Fourier sparsity of GF(2) polynomials, and apply it to certain special classes of polynomials to showcase the above phenomenon.
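Assuming the usual identification of a GF(2) polynomial $p$ with the $\pm 1$-valued function $(-1)^{p(x)}$, its Fourier sparsity can be computed for small $n$ with a fast Walsh-Hadamard transform. A minimal sketch; `fourier_spectrum`, `fourier_sparsity` and the `bit` helper are hypothetical names.

```python
import numpy as np


def bit(x, i):
    """The i-th bit of the integer mask x."""
    return (x >> i) & 1


def fourier_spectrum(p, n):
    """Fourier coefficients of F(x) = (-1)^{p(x)} over {0,1}^n via the fast Walsh-Hadamard transform.

    p maps an n-bit integer mask to 0 or 1. Entry S of the result is
    hat F(S) = 2^{-n} * sum_x F(x) * (-1)^{|S & x|}.
    """
    a = np.array([(-1) ** p(x) for x in range(1 << n)], dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y  # butterfly step
        h *= 2
    return a / (1 << n)


def fourier_sparsity(p, n, tol=1e-9):
    """Number of non-zero Fourier coefficients of (-1)^{p(x)}."""
    return int(np.sum(np.abs(fourier_spectrum(p, n)) > tol))


def inner(x):
    """The degree-2 polynomial x0*x1 + x2*x3 over GF(2)."""
    return (bit(x, 0) & bit(x, 1)) ^ (bit(x, 2) & bit(x, 3))
```

For instance, `inner` has the maximum sparsity 16 on 4 variables, and adding the lower-degree monomial $x_0$ leaves the sparsity unchanged, since multiplying by a character only permutes the spectrum. This is a trivial instance of the "regardless of lower-degree monomials" phenomenon, not the paper's lower-bound technique.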

preprint2015arXiv

Nonlocality and conflicting interest games

Nonlocality enables two parties to win specific games with probabilities strictly higher than allowed by any classical theory. Nevertheless, all known such examples consider games where the two parties have a common interest, since they jointly win or lose the game. The main question we ask here is whether the nonlocal feature of quantum mechanics can offer an advantage in a scenario where the two parties have conflicting interests. We answer this in the affirmative by presenting a simple conflicting interest game, where quantum strategies outperform classical ones. Moreover, we show that our game has a fair quantum equilibrium with higher payoffs for both players than in any fair classical equilibrium. Finally, we play the game using a commercial entangled photon source and demonstrate experimentally the quantum advantage.

preprint2015arXiv

Quantum game players can have advantage without discord

The last two decades have witnessed a rapid development of quantum information processing, a new paradigm which studies the power and limits of "quantum advantages" in various information processing tasks. Questions such as when quantum advantage exists and, if it exists, how large it can be are central to these studies. In a broad class of scenarios, there are, implicitly or explicitly, at least two parties involved, who share a state, and the correlation in this shared state is the key factor in the efficiency in question. In these scenarios, the shared \emph{entanglement} or \emph{discord} is usually what accounts for quantum advantage. In this paper, we examine a fundamental problem of this nature from the perspective of game theory, a branch of applied mathematics studying the selfish behavior of two or more players. We exhibit a natural zero-sum game in which the chance for any player to win the game depends only on the ending correlation. We show that in a certain classical equilibrium, a situation in which no player can further increase her payoff by any local classical operation, whoever first uses a quantum computer has a big advantage over her classical opponent. The equilibrium is fair to both players and, as a shared correlation, it does not contain any discord, yet a quantum advantage still exists. This indicates that, at least in game theory, the previous notion of discord as a measure of non-classical correlation needs to be reexamined when there are two players with different objectives.

preprint2014arXiv

Multipartite Quantum Correlation and Communication Complexities

The concepts of quantum correlation complexity and quantum communication complexity were recently proposed to quantify the minimum amount of resources needed to generate bipartite classical or quantum states in the single-shot setting. The former is the minimum size of an initially shared state $σ$ on which local operations by the two parties (without communication) can generate the target state $ρ$, and the latter is the minimum amount of communication needed when initially sharing nothing. In this paper, we generalize these two concepts to multipartite cases, for both exact and approximate state generation. Our results are summarized as follows. (1) For multipartite pure states, the correlation complexity can be completely characterized by the local ranks of subsystems. (2) We extend the notion of PSD-rank of matrices to that of tensors, and use it to bound the quantum correlation complexity of generating multipartite classical distributions. (3) For generating multipartite mixed quantum states, communication complexity is not always equal to correlation complexity (as opposed to the bipartite case), but they differ by at most a factor of 2. Generating a multipartite mixed quantum state has the same communication complexity as generating its optimal purification, whereas the correlation complexities of these two tasks can differ (though they are still related by less than a factor of 2). (4) To generate a bipartite classical distribution $P(x,y)$ approximately, the quantum communication complexity is completely characterized by the approximate PSD-rank of $P$. The quantum correlation complexity of approximately generating multipartite pure states is bounded by approximate local ranks.
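Item (1) refers to the local ranks of a multipartite pure state. A minimal numpy sketch of computing them: the rank of party $k$'s reduced state equals the rank of the state vector reshaped into a (party $k$) by (all other parties) matrix. It assumes the state is given as a vector over parties with dimensions `dims`; `local_ranks` is a hypothetical name, and how the ranks enter the exact complexity formula is left to the paper.

```python
import numpy as np


def local_ranks(psi, dims, tol=1e-10):
    """Rank of each single-party reduced state of the pure state psi.

    For party k this equals the rank of psi reshaped into a
    dims[k] x (product of the other dims) matrix (its Schmidt rank across k | rest).
    """
    t = np.asarray(psi, dtype=complex).reshape(dims)
    ranks = []
    for k in range(len(dims)):
        m = np.moveaxis(t, k, 0).reshape(dims[k], -1)
        ranks.append(int(np.linalg.matrix_rank(m, tol=tol)))
    return ranks


ghz = np.zeros(8)
ghz[[0, 7]] = 1 / np.sqrt(2)                       # (|000> + |111>)/sqrt(2)
w_state = np.zeros(8)
w_state[[1, 2, 4]] = 1 / np.sqrt(3)                # (|001> + |010> + |100>)/sqrt(3)
product = np.kron(np.kron([1, 0], [0, 1]), [1, 0])  # |010>
```

The GHZ and W states both have local ranks [2, 2, 2], while the product state has [1, 1, 1], so local ranks alone already separate entangled from product states.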

preprint2013arXiv

Efficient quantum protocols for XOR functions

We show that for any Boolean function $f$ on $\{0,1\}^n$, the bounded-error quantum communication complexity of XOR functions $f\circ \oplus$ satisfies $Q_ε(f\circ \oplus) = O(2^d (\log\|\hat f\|_{1,ε} + \log \frac{n}{ε}) \log(1/ε))$, where $d$ is the F2-degree of $f$ and $\|\hat f\|_{1,ε} = \min_{g:\|f-g\|_\infty \leq ε} \|\hat g\|_1$. This implies that the previous lower bound $Q_ε(f\circ \oplus) = Ω(\log\|\hat f\|_{1,ε})$ by Lee and Shraibman \cite{LS09} is tight for $f$ with low F2-degree. The result also confirms the quantum version of the Log-rank Conjecture for low-degree XOR functions. In addition, we show that the exact quantum communication complexity satisfies $Q_E(f\circ \oplus) = O(2^d \log \|\hat f\|_0)$, where $\|\hat f\|_0$ is the number of nonzero Fourier coefficients of $f$. This matches the previous lower bound $Q_E(f(x,y)) = Ω(\log \mathrm{rank}(M_f))$ by Buhrman and de Wolf \cite{BdW01} for low-degree XOR functions.
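The bounds above depend on the F2-degree $d$ of $f$, i.e. the degree of its unique multilinear polynomial over GF(2), so they are most informative for small $d$. A minimal sketch that computes $d$ from a truth table with the GF(2) Möbius transform, assuming $f$ is given as a 0/1-valued Python function on $n$-bit integer masks; `anf` and `f2_degree` are hypothetical names, and the zero function is assigned degree $-1$ by convention here.

```python
def anf(truth):
    """Algebraic normal form (GF(2) Moebius transform) of a truth table of length 2^n.

    Entry S of the result is the GF(2) coefficient of the monomial prod_{i in S} x_i,
    i.e. the XOR of truth[T] over all subsets T of S.
    """
    a = list(truth)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j + h] ^= a[j]  # fold the half without bit h into the half with it
        h *= 2
    return a


def f2_degree(f, n):
    """Largest monomial size with a non-zero GF(2) coefficient (-1 for the zero function)."""
    coeffs = anf([f(x) for x in range(1 << n)])
    return max((bin(s).count("1") for s, c in enumerate(coeffs) if c), default=-1)
```

For example, AND on 3 bits has F2-degree 3, parity has F2-degree 1, and 3-bit majority, whose GF(2) polynomial is $x_0x_1 + x_0x_2 + x_1x_2$, has F2-degree 2.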

preprint2013arXiv

Fourier sparsity, spectral norm, and the Log-rank conjecture

We study Boolean functions with sparse Fourier coefficients or small spectral norm, and show their applications to the Log-rank Conjecture for XOR functions $f(x\oplus y)$, a fairly large class of functions including well-studied ones such as Equality and Hamming Distance. The rank of the communication matrix $M_f$ for such functions is exactly the Fourier sparsity of $f$. Let $d$ be the F2-degree of $f$ and let $D^{CC}(f)$ denote the deterministic communication complexity of $f(x\oplus y)$. We show that 1. $D^{CC}(f) = O(2^{d^2/2} \log^{d-2} \|\hat f\|_1)$. In particular, the Log-rank Conjecture holds for XOR functions with constant F2-degree. 2. $D^{CC}(f) = O(d \|\hat f\|_1) = O(\sqrt{\mathrm{rank}(M_f)}\log \mathrm{rank}(M_f))$. We obtain our results through a degree-reduction protocol based on a variant of polynomial rank, and conjecture that its communication cost is already $\log^{O(1)} \mathrm{rank}(M_f)$. The above bounds also hold for the parity decision tree complexity of $f$, a measure that is no less than the communication complexity (up to a factor of 2). Along the way we also show several structural results about Boolean functions with small F2-degree or small spectral norm, which could be of independent interest. For functions $f$ with constant F2-degree: 1) $f$ can be written as a sum of quasi-polynomially many indicator functions of subspaces with $\pm$ signs, improving the previous doubly exponential upper bound by Green and Sanders; 2) being sparse in the Fourier domain is polynomially equivalent to having small parity decision tree complexity; 3) $f$ depends only on $\mathrm{polylog}\|\hat f\|_1$ linear functions of the input variables. For functions $f$ with small spectral norm: 1) there is an affine subspace of co-dimension $O(\|\hat f\|_1)$ on which $f$ is constant; 2) there is a parity decision tree of depth $O(\|\hat f\|_1 \log \|\hat f\|_0)$.
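The fact that $\mathrm{rank}(M_f)$ equals the Fourier sparsity of $f$ follows because each character $\chi_S(y) = (-1)^{|S \cap y|}$ is an eigenvector of $M_f[x,y] = f(x \oplus y)$ with eigenvalue $2^n \hat f(S)$. A brute-force numerical check, assuming $\pm 1$-valued $f$ on $n$-bit integer masks; `xor_matrix` and `sparsity_brute_force` are hypothetical names.

```python
import numpy as np


def xor_matrix(f, n):
    """Communication matrix M[x, y] = f(x XOR y) for a ±1-valued f on n-bit integers."""
    N = 1 << n
    return np.array([[f(x ^ y) for y in range(N)] for x in range(N)], dtype=float)


def sparsity_brute_force(f, n, tol=1e-9):
    """Number of non-zero hat f(S) = 2^{-n} sum_x f(x) (-1)^{|S & x|}, computed directly."""
    N = 1 << n
    coeffs = [sum(f(x) * (-1) ** bin(s & x).count("1") for x in range(N)) / N for s in range(N)]
    return sum(abs(c) > tol for c in coeffs)


def equality(z):
    """Equality as an XOR function: -1 exactly when x == y, i.e. z = x XOR y = 0."""
    return -1 if z == 0 else 1


def inner_product(z):
    """(-1)^{z0 z1 + z2 z3} on 4-bit masks."""
    return (-1) ** (((z & 1) * (z >> 1 & 1)) ^ ((z >> 2 & 1) * (z >> 3 & 1)))


def parity(z):
    return (-1) ** bin(z).count("1")
```

For Equality on 3 bits both quantities are 8 (full rank), for the 4-bit inner product both are 16, and for parity both are 1.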

preprint2013arXiv

On the Complexity of Trial and Error

Motivated by certain applications from physics, biochemistry, economics, and computer science, in which the objects under investigation are not accessible because of various limitations, we propose a trial-and-error model to examine algorithmic issues in such situations. Given a search problem with a hidden input, we are asked to find a valid solution; to find one, we can propose candidate solutions (trials) and use observed violations (errors) to prepare future proposals. In accordance with our motivating applications, we consider the fairly broad class of constraint satisfaction problems, and assume that errors are signaled by a verification oracle in the format of the index of a violated constraint (with the content of the constraint still hidden). Our findings are summarized as follows. On the one hand, despite the seemingly very little information provided by the verification oracle, efficient algorithms do exist for a number of important problems. For the Nash, Core, Stable Matching, and SAT problems, the unknown-input versions are as hard as the corresponding known-input versions, up to a polynomial factor. We further give almost tight bounds on the trial complexities of the latter two problems. On the other hand, there are problems whose complexities are substantially increased in the unknown-input model. In particular, no time-efficient algorithms exist (under standard hardness assumptions) for the Graph Isomorphism and Group Isomorphism problems. The tools used to achieve these results include order theory, the strong ellipsoid method, and some non-standard reductions. Our model investigates the value of information, and our results demonstrate that the lack of input information can introduce various levels of extra difficulty. The model exhibits intimate connections with (and we hope can also serve as a useful supplement to) certain existing learning and complexity theories.

preprint2013arXiv

Solving Linear Programming with Constraints Unknown

What is the value of input information in solving linear programming? The celebrated ellipsoid algorithm tells us that full information about the input constraints is not necessary; the algorithm works as long as there exists an oracle that, on a proposed candidate solution, returns a violation in the format of a separating hyperplane. Can linear programming still be solved efficiently if the returned violation is in other formats? We study this question in a trial-and-error framework: there is an oracle that, upon a proposed solution, returns the index of a violated constraint (with the content of the constraint still hidden). When more than one constraint is violated, two variants of the model are investigated. (1) The oracle returns the index of a "most violated" constraint, measured by the Euclidean distance between the proposed solution and the half-spaces defined by the constraints. In this case, the LP can be solved efficiently. (2) The oracle returns the index of an arbitrary (i.e., worst-case) violated constraint. In this case, we give an algorithm with running time exponential in the number of variables. We then show that the exponential dependence on the number of variables $n$ is unfortunately necessary even for the query complexity. Taken together, these results shed light on the amount of information needed to solve a linear program efficiently. The proofs employ a variety of geometric techniques, including McMullen's Upper Bound Theorem, the weighted spherical Voronoi diagram, and the furthest Voronoi diagram. In addition, we give an alternative proof of a conjecture of László Fejes Tóth on bounding the number of disconnected components formed by the union of $m$ convex bodies in $\mathbb{R}^n$. Our proof, inspired by the Gauss-Bonnet Theorem in global differential geometry, is independent of the known proof and gives clearer insight into the problem and the bound.
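Variant (1) of the oracle can be written down directly: for constraints $a_i \cdot y \le b_i$, the Euclidean distance from $x$ to the $i$-th half-space is $\max(0, a_i \cdot x - b_i)/\|a_i\|$. A minimal sketch of such an oracle that reveals only an index; `most_violated_index` is a hypothetical name, and this is the oracle, not the paper's LP algorithm.

```python
import numpy as np


def most_violated_index(A, b, x):
    """Index of a most violated constraint a_i . x <= b_i, or None if x is feasible.

    Violation is measured as the Euclidean distance from x to the half-space
    {y : a_i . y <= b_i}, i.e. max(0, a_i . x - b_i) / ||a_i||. Only the index is
    returned, mirroring an oracle that keeps the constraint itself hidden.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    dist = np.maximum(0.0, A @ np.asarray(x, dtype=float) - b) / np.linalg.norm(A, axis=1)
    i = int(np.argmax(dist))
    return i if dist[i] > 0 else None


# Hidden constraints: x <= 1, y <= 1, x + y <= 1.5
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 1.5]
```

At the point (2, 1.2) the distances are 1, 0.2 and about 1.20, so the oracle answers 2; at (1.2, 0) it answers 0, and at the feasible point (0, 0) it answers None.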

preprint2012arXiv

Correlation/Communication complexity of generating bipartite states

We study the correlation complexity (or equivalently, the communication complexity) of generating a bipartite quantum state $ρ$. When $ρ$ is a pure state, we completely characterize the complexity for approximately generating $ρ$ by a corresponding approximate rank, closing a gap left in Ambainis, Schulman, Ta-Shma, Vazirani and Wigderson (SIAM Journal on Computing, 32(6):1570-1585, 2003). When $ρ$ is a classical distribution $P(x,y)$, we tightly characterize the complexity of generating $P$ by the psd-rank, a measure recently proposed by Fiorini, Massar, Pokutta, Tiwary and de Wolf (STOC 2012). We also present a characterization of the complexity of generating a general quantum state $ρ$.

preprint2011arXiv

A quantum protocol for sampling correlated equilibria unconditionally and without a mediator

A correlated equilibrium is a fundamental solution concept in game theory that enjoys many desirable properties. However, it requires a trusted mediator, which is a major drawback in many practical applications. A computational solution to this problem was proposed by Dodis, Halevi and Rabin. They extended the original game by adding an initial communication stage and showed that any correlated strategy for 2-player games can be achieved, provided that the players are computationally bounded. In this paper, we show that if the players can communicate via a quantum channel before the game, then any correlated equilibrium for 2-player games can be achieved, without a trusted mediator and unconditionally. This provides another example of a major advantage of quantum information processing. More precisely, we prove that for any correlated equilibrium p of a strategic game G, there exists an extended game (with a quantum communication initial stage) Q with an efficiently computable approximate Nash equilibrium q, such that the expected payoff for both players in q is at least as high as in p. The main cryptographic tool used in the construction is the quantum weak coin flipping protocol of Mochon.
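For reference, the classical condition that the quantum protocol implements without a mediator is a set of linear inequalities: whenever the mediator recommends an action, following it must be at least as good in expectation as any deviation. A minimal checker for 2-player games, illustrated on the textbook game of Chicken with payoffs chosen for illustration (not taken from the paper); `is_correlated_equilibrium` is a hypothetical name.

```python
import numpy as np


def is_correlated_equilibrium(p, u1, u2, tol=1e-12):
    """Check the correlated-equilibrium inequalities for a 2-player game.

    p[a, b] is the joint distribution over (row action a, column action b);
    u1, u2 are the players' payoff matrices. For every recommended action and
    every deviation, obeying the recommendation must not lose in expectation.
    """
    p, u1, u2 = (np.asarray(m, dtype=float) for m in (p, u1, u2))
    n1, n2 = p.shape
    for a in range(n1):  # row player told to play a, considers a2 instead
        for a2 in range(n1):
            if p[a] @ (u1[a] - u1[a2]) < -tol:
                return False
    for b in range(n2):  # column player told to play b, considers b2 instead
        for b2 in range(n2):
            if p[:, b] @ (u2[:, b] - u2[:, b2]) < -tol:
                return False
    return True


# Chicken: action 0 = Dare, 1 = Chicken
u1 = [[0, 7], [2, 6]]
u2 = [[0, 2], [7, 6]]
third = [[0, 1 / 3], [1 / 3, 1 / 3]]  # uniform over (D,C), (C,D), (C,C)
uniform = [[0.25, 0.25], [0.25, 0.25]]
```

The distribution `third` passes all inequalities, while the uniform distribution fails: a player told to Dare would rather switch to Chicken.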

preprint2011arXiv

On characterizing quantum correlated equilibria

Quantum game theory lays a foundation for understanding the interaction of people with conflicting interests who use quantum computers. Recently Zhang proposed a simple yet rich model to study quantum strategic games, and addressed some quantitative questions for general games of growing sizes \cite{Zha10}. However, one fundamental question that the paper did not consider is the characterization of quantum correlated equilibria (QCE). In this paper, we answer this question by giving a necessary and sufficient condition for an arbitrary state $ρ$ to be a QCE. In addition, when the condition fails to hold for some player $i$, we give an explicit POVM for that player to achieve a strictly positive gain. Finally, we give some upper bounds on the maximum gain from playing quantum strategies over classical ones, and the bounds are tight for some games.

preprint2011arXiv

On the power of a unique quantum witness

In a celebrated paper, Valiant and Vazirani raised the question of whether the difficulty of NP-complete problems was due to the wide variation in the number of witnesses of their instances. They gave a strong negative answer by showing that distinguishing between instances having zero or one witness is as hard as recognizing NP, under randomized reductions. We consider the same question in the quantum setting and investigate the possibility of reducing quantum witnesses in the context of the complexity class QMA, the quantum analogue of NP. The natural way to quantify the number of quantum witnesses is the dimension of the witness subspace $W$ in some appropriate Hilbert space $H$. We present an efficient deterministic procedure that reduces any problem where the dimension $d$ of $W$ is bounded by a polynomial to a problem with a unique quantum witness. The main idea of our reduction is to consider the alternating subspace of the $d$-th tensor power of $H$. Indeed, the intersection of this subspace with the $d$-th tensor power of $W$ is one-dimensional, and therefore can play the role of the unique quantum witness.
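The key fact, that the alternating subspace of $H^{\otimes d}$ meets $W^{\otimes d}$ in a one-dimensional space when $\dim W = d$, is the case $k = d$ of $\dim \Lambda^k(W) = \binom{\dim W}{k}$. A small numerical check, assuming a random real subspace $W$ of $\mathbb{R}^D$ in place of a general complex one; the helper names are hypothetical and this only illustrates the linear algebra, not the QMA reduction itself.

```python
import itertools
import math

import numpy as np


def perm_sign(perm):
    """Sign of a permutation given as a tuple, via its inversion count."""
    inv = sum(perm[i] > perm[j] for i in range(len(perm)) for j in range(i + 1, len(perm)))
    return -1 if inv % 2 else 1


def antisymmetrize(vec, D, k):
    """Apply the projector (1/k!) * sum_pi sgn(pi) P_pi to a vector in (R^D)^{tensor k}."""
    t = vec.reshape((D,) * k)
    out = np.zeros_like(t)
    for perm in itertools.permutations(range(k)):
        out += perm_sign(perm) * np.transpose(t, perm)
    return (out / math.factorial(k)).reshape(-1)


def antisym_dim_in_power(D, w, k, seed=0):
    """Dimension of the alternating part of W^{tensor k} for a random w-dimensional W in R^D."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((D, w)))  # orthonormal columns spanning W
    vecs = []
    for cols in itertools.product(range(w), repeat=k):  # product basis of W^{tensor k}
        v = basis[:, cols[0]]
        for c in cols[1:]:
            v = np.kron(v, basis[:, c])
        vecs.append(antisymmetrize(v, D, k))
    return int(np.linalg.matrix_rank(np.array(vecs)))
```

With $\dim W = k$ the alternating part is one-dimensional (for example, $D=5$, $w=k=3$), whereas a 3-dimensional $W$ and $k=2$ give $\binom{3}{2} = 3$.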

preprint2011arXiv

Quantum Strategic Game Theory

We propose a simple yet rich model that extends the notions of Nash equilibria and correlated equilibria of strategic games to the quantum setting, and we then study the relations between classical and quantum equilibria. Unlike previous work, which focused on qualitative questions about specific games of small sizes, we address the following fundamental and quantitative question for general games: how much "advantage", if any, can playing quantum strategies provide? We study two measures of this advantage.

1. A natural measure is the increase in payoff. We consider natural mappings between classical and quantum states and study how well those mappings preserve equilibrium properties. Among other results, we exhibit a correlated equilibrium $p$ whose quantum superposition counterpart $\sum_s \sqrt{p(s)}\ket{s}$ is far from being a quantum correlated equilibrium: a player can increase her payoff from almost 0 to almost 1 in a [0,1]-normalized game. We achieve this by a tensor product construction on carefully designed base cases.

2. To study the hardness of generating correlated equilibria, we propose to examine \emph{correlation complexity}, a new complexity measure for correlation generation. We show that there are $n$-bit correlated equilibria that can be generated by only one EPR pair followed by local operations (without communication), but that need at least $\log(n)$ classical shared random bits plus communication. The randomized lower bound can be improved to $n$, the best possible, assuming (even a much weaker version of) a recent conjecture in linear algebra. We believe that correlation complexity, as a complexity-theoretic counterpart of the celebrated Bell inequality, is of independent interest in both physics and computational complexity theory and deserves further exploration.
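The superposition map $p \mapsto \sum_s \sqrt{p(s)}\ket{s}$ sends a classical distribution to a pure state whose computational-basis measurement reproduces $p$. The NumPy sketch below represents states as plain vectors indexed by joint profiles $s$ in lexicographic order (an assumption for illustration) and does not reproduce the paper's tensor-product construction. As a check, the perfectly correlated bit distribution $p(00) = p(11) = 1/2$ maps to the EPR pair.

```python
import numpy as np


def superposition_state(p):
    """Map a distribution p over outcomes s to the vector sum_s sqrt(p(s)) |s>."""
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be a probability distribution")
    return np.sqrt(p)


def computational_basis_probs(state):
    """Born-rule probabilities of measuring `state` in the computational basis."""
    return np.abs(state) ** 2


# Outcomes ordered 00, 01, 10, 11: perfectly correlated random bits.
epr = superposition_state([0.5, 0.0, 0.0, 0.5])
print(np.allclose(epr, np.array([1, 0, 0, 1]) / np.sqrt(2)))  # True
```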

preprint2011arXiv

The influence lower bound via query elimination

We give a simpler proof, via query elimination, of a result due to O'Donnell, Saks, Schramm and Servedio, which shows a lower bound on the zero-error randomized query complexity of a function f in terms of the maximum influence of any variable of f. Our lower bound also applies to the two-sided error distributional query complexity of f, and it allows an immediate extension which can be used to prove stronger lower bounds for some functions.
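The bound is stated in terms of the maximum influence of any variable. For small functions the influences can be computed by brute force. The sketch below uses the standard definition $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$ under the uniform distribution on $\{0,1\}^n$ (the uniform distribution and the function names are assumptions for illustration); it does not implement the query-elimination argument itself.

```python
from itertools import product


def influences(f, n):
    """Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)], x uniform on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    result = []
    for i in range(n):
        flips = 0
        for x in points:
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip bit i
            flips += f(x) != f(y)
        result.append(flips / len(points))
    return result


def majority3(x):
    return int(sum(x) >= 2)


print(influences(majority3, 3))       # [0.5, 0.5, 0.5]
print(max(influences(majority3, 3)))  # 0.5
```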

preprint2011arXiv

Tight bounds on the randomized communication complexity of symmetric XOR functions in one-way and SMP models

We study the communication complexity of symmetric XOR functions, namely functions $f: \{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}$ that can be formulated as $f(x,y)=D(|x\oplus y|)$ for some predicate $D: \{0,1,...,n\} \rightarrow \{0,1\}$, where $|x\oplus y|$ is the Hamming weight of the bitwise XOR of $x$ and $y$. We give a public-coin randomized protocol in the Simultaneous Message Passing (SMP) model, with the communication cost matching the known lower bound for the \emph{quantum} and \emph{two-way} model up to a logarithmic factor. As a corollary, this closes a quadratic gap between the quantum lower bound and the randomized upper bound for the one-way model, answering an open question raised in Shi and Zhang \cite{SZ09}.
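Concretely, a symmetric XOR function depends on the inputs only through the Hamming weight of $x \oplus y$. The plain-Python sketch below builds such a function from its predicate $D$, encoding $n$-bit strings as integers (an illustrative choice; the helper names are hypothetical). Equality ($D(k) = [k = 0]$) and the Hamming-distance threshold problem ($D(k) = [k \ge d]$) are both instances. It only evaluates $f$ and does not implement the paper's SMP protocol.

```python
from typing import Callable


def symmetric_xor(D: Callable[[int], int]) -> Callable[[int, int], int]:
    """Build f(x, y) = D(|x XOR y|) for n-bit strings encoded as ints."""
    def f(x: int, y: int) -> int:
        weight = bin(x ^ y).count("1")  # Hamming weight of the bitwise XOR
        return D(weight)
    return f


def threshold_predicate(d: int) -> Callable[[int], int]:
    """D(k) = 1 iff k >= d, i.e. the Hamming-distance threshold problem."""
    return lambda k: int(k >= d)


EQ = symmetric_xor(lambda k: int(k == 0))
HAM2 = symmetric_xor(threshold_predicate(2))
print(EQ(0b1011, 0b1011), HAM2(0b1100, 0b0101))  # 1 1
```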

preprint2010arXiv

Composition theorems in communication complexity

A well-studied class of functions in communication complexity is the class of composed functions of the form $(f \comp g^n)(x,y)=f(g(x^1, y^1),..., g(x^n,y^n))$. This is a rich family of functions which encompasses many of the important examples in the literature. It is thus of great interest to understand which properties of $f$ and $g$ affect the communication complexity of $(f \comp g^n)$, and in what way. Recently, Sherstov \cite{She09b} and independently Shi-Zhu \cite{SZ09b} developed conditions on the inner function $g$ which imply that the quantum communication complexity of $f \comp g^n$ is at least the approximate polynomial degree of $f$. We generalize both of these frameworks. We show that the pattern matrix framework of Sherstov works whenever the inner function $g$ is {\em strongly balanced}---we say that $g: X \times Y \to \{-1,+1\}$ is strongly balanced if all rows and columns in the matrix $M_g=[g(x,y)]_{x,y}$ sum to zero. This result strictly generalizes the pattern matrix framework of Sherstov \cite{She09b}, which has been a very useful idea in a variety of settings \cite{She08b,RS08,Cha07,LS09,CA08,BHN09}. Shi-Zhu require that the inner function $g$ have small {\em spectral discrepancy}, a somewhat awkward condition to verify. We relax this to the usual notion of discrepancy. We also extend the framework of composed functions studied so far by considering functions $F(x,y) = f(g(x,y))$, where the range of $g$ is a group $G$. When $G$ is Abelian, the analogue of the strongly balanced condition becomes a simple group invariance property of $g$. We are able to formulate a general lower bound on $F$ whenever $g$ satisfies this property.
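The strongly balanced condition is easy to test for a given inner function: build $M_g = [g(x,y)]_{x,y}$ and check its row and column sums. The NumPy sketch below does this for $\pm 1$-valued $g$ on small domains and also evaluates the composed function $f \circ g^n$. The one-bit XOR and AND gadgets and the helper names are illustrative assumptions, not examples taken from the paper.

```python
import math

import numpy as np


def gadget_matrix(g, X, Y):
    """M_g = [g(x, y)] with rows indexed by X and columns indexed by Y."""
    return np.array([[g(x, y) for y in Y] for x in X])


def is_strongly_balanced(M):
    """True iff every row and every column of the +/-1 matrix M sums to zero."""
    M = np.asarray(M)
    return bool(np.all(M.sum(axis=1) == 0) and np.all(M.sum(axis=0) == 0))


def compose(f, g):
    """(f o g^n)(x, y) = f(g(x^1, y^1), ..., g(x^n, y^n)) for sequences of blocks."""
    def F(x, y):
        return f(tuple(g(xi, yi) for xi, yi in zip(x, y)))
    return F


def g_xor(x, y):
    return (-1) ** (x ^ y)  # one-bit XOR gadget in +/-1 form


def g_and(x, y):
    return (-1) ** (x & y)  # one-bit AND gadget in +/-1 form


def parity_pm(z):
    return math.prod(z)  # outer function: parity in +/-1 form


print(is_strongly_balanced(gadget_matrix(g_xor, (0, 1), (0, 1))))  # True
print(is_strongly_balanced(gadget_matrix(g_and, (0, 1), (0, 1))))  # False
F = compose(parity_pm, g_xor)
```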

preprint2007arXiv

Every NAND formula of size N can be evaluated in time N^{1/2+o(1)} on a quantum computer

For every NAND formula of size $N$, there is a bounded-error $N^{1/2+o(1)}$-time quantum algorithm, based on a coined quantum walk, that evaluates this formula on a black-box input. Balanced, or "approximately balanced", NAND formulas can be evaluated in $O(\sqrt{N})$ queries, which is optimal. It follows that the $(2-o(1))$-th power of the quantum query complexity is a lower bound on the formula size, almost solving in the positive an open problem posed by Laplante, Lee and Szegedy.
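To fix what is being evaluated: a NAND formula is a tree whose internal gates compute NAND and whose leaves read input bits; here the size $N$ is taken to be the number of leaves (a common convention, assumed for this sketch). The classical recursive evaluator below, using a hypothetical nested-tuple encoding, reads each leaf at most once and so makes at most $N$ queries. The quantum walk algorithm of the paper is not reproduced here.

```python
from typing import Sequence, Union

# Hypothetical encoding: a leaf is an int (the index of the input bit it reads);
# a NAND gate is a tuple of subformulas.
Formula = Union[int, tuple]


def nand_eval(formula: Formula, x: Sequence[int]) -> int:
    """Classically evaluate a NAND formula on the input bits x."""
    if isinstance(formula, int):
        return x[formula]  # one query to the black-box input
    # NAND is 0 exactly when every child is 1; all() stops at the first 0 child.
    return 0 if all(nand_eval(child, x) for child in formula) else 1


def size(formula: Formula) -> int:
    """Formula size N, counted as the number of leaves."""
    if isinstance(formula, int):
        return 1
    return sum(size(child) for child in formula)


# NAND(NAND(x0, x1), NAND(x2, x3)) computes (x0 AND x1) OR (x2 AND x3).
phi = ((0, 1), (2, 3))
print(size(phi), nand_eval(phi, [1, 1, 0, 0]))  # 4 1
```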

preprint2006arXiv

The Communication Complexity of the Hamming Distance Problem

We investigate the randomized and quantum communication complexity of the Hamming Distance problem, which is to determine whether the Hamming distance between two $n$-bit strings is at least a threshold $d$. We prove a quantum lower bound of $\Omega(d)$ qubits in the general interactive model with shared prior entanglement. We also construct a classical protocol of $O(d \log d)$ bits in the restricted Simultaneous Message Passing model, improving previous protocols of $O(d^2)$ bits (A. C.-C. Yao, Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pp. 77-81, 2003) and $O(d \log n)$ bits (D. Gavinsky, J. Kempe, and R. de Wolf, quant-ph/0411051, 2004).
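One simple public-coin idea in the SMP setting, shown only as an illustration and not as the $O(d \log d)$ protocol of the paper, is to hash the $n$ coordinates into buckets with shared randomness and have each party send one parity bit per bucket. A bucket whose parities differ must contain a coordinate where the strings differ, so the referee's count never exceeds the true Hamming distance; choosing the number of buckets and bounding the error is what an actual protocol must control. The function name and parameters are hypothetical.

```python
import random


def bucket_parity_lower_bound(x, y, num_buckets, seed):
    """Referee's count of buckets whose parities differ (a lower bound on distance).

    x, y: equal-length lists of bits held by the two parties. The shared seed
    models the public coin: both parties derive the same bucket assignment.
    Each party's message is its list of num_buckets parity bits.
    """
    rng = random.Random(seed)  # public coin shared by both parties
    assign = [rng.randrange(num_buckets) for _ in range(len(x))]

    def parities(z):
        out = [0] * num_buckets
        for bit, bucket in zip(z, assign):
            out[bucket] ^= bit
        return out

    px, py = parities(x), parities(y)  # the two SMP messages
    return sum(a != b for a, b in zip(px, py))
```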

50 published item(s)

Boxed UC plane partitions and the two-site generalized phase model

CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

QAOA-MaxCut has barren plateaus for almost all graphs

Adaptive Double-Exploration Tradeoff for Outlier Detection

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

CCL4Rec: Contrast over Contrastive Learning for Micro-video Recommendation

Contextual Combinatorial Conservative Bandits

Contrastive Learning with Positive-Negative Frame Mask for Music Representation

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding

HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding

Intelligent Request Strategy Design in Recommender System

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning

MIC: Model-agnostic Integrated Cross-channel Recommenders

Optimizing Quantum Annealing Schedules with Monte Carlo Tree Search enhanced with neural networks

Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling

Retroformer: Pushing the Limits of Interpretable End-to-end Retrosynthesis Transformer

SPLDExtraTrees: Robust machine learning approach for predicting kinase inhibitor resistance

Suppressing ZZ Crosstalk of Quantum Computers through Pulse and Scheduling Co-Optimization

The prospects of Monte Carlo antibody loop modelling on a fault-tolerant quantum computer

Shortcuts to Adiabaticity for Open Systems in Circuit Quantum Electrodynamics

Variational Quantum-Neural Hybrid Eigensolver

Comprehensive Information Integration Modeling Framework for Video Titling

DeepOPF: A Deep Neural Network Approach for Security-Constrained DC Optimal Power Flow

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

Poet: Product-oriented Video Captioner for E-commerce

Quantum algorithms for graph problems with cut queries

Linear time algorithm for quantum 2SAT

Semiquantum key distribution with secure delegated quantum computation

Sensitivity Conjecture and Log-rank Conjecture for functions with small alternating numbers

Fourier Sparsity of GF(2) Polynomials

Nonlocality and conflicting interest games

Quantum game players can have advantage without discord

Multipartite Quantum Correlation and Communication Complexities

Efficient quantum protocols for XOR functions

Fourier sparsity, spectral norm, and the Log-rank conjecture

On the Complexity of Trial and Error

Solving Linear Programming with Constraints Unknown

Correlation/Communication complexity of generating bipartite states

A quantum protocol for sampling correlated equilibria unconditionally and without a mediator

On characterizing quantum correlated equilibria

On the power of a unique quantum witness

Quantum Strategic Game Theory

The influence lower bound via query elimination

Tight bounds on the randomized communication complexity of symmetric XOR functions in one-way and SMP models

Composition theorems in communication complexity

Every NAND formula of size N can be evaluated in time N^{1/2+o(1)} on a quantum computer

The Communication Complexity of the Hamming Distance Problem