Source author record

Ming Li

Ming Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

quant-ph Machine Learning eess.AS eess.SP Sound Computer Vision physics.optics Cryptography and Security Information Theory math.IT Artificial Intelligence physics.atom-ph physics.soc-ph astro-ph.CO cond-mat.stat-mech Computation and Language eess.IV nucl-th Biological Physics hep-ph Information Retrieval math.DG physics.app-ph physics.chem-ph physics.data-an Biomolecules cond-mat.mtrl-sci Multiagent Systems Networking and Internet Architecture physics.ins-det Quantitative Methods Social and Information Networks Subcellular Processes Computational Complexity cond-mat.dis-nn cond-mat.mes-hall cond-mat.quant-gas Data Structures and Algorithms math.CV math.DS nlin.AO Populations and Evolution astro-ph.GA Computational Engineering, Finance, and Science Computational Geometry cond-mat.soft cond-mat.supr-con Databases eess.SY Genomics hep-ex hep-th math.AP math.CO math.MG math.NA math.PR math.ST Methodology Molecular Networks Neural and Evolutionary Computing Numerical Analysis Robotics Software Engineering Statistics Theory Systems and Control

Catalog footprint

What is connected

209 works

66 topics

4 close collaborators

preprint · 2026 · arXiv

HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decoupling framework for heterogeneous graph learning under heterophily. HeterSEED decouples representation learning into a heterogeneous semantic channel that captures type- and relation-aware local semantics and a structure-aware heterophily channel that separates homophilic and heterophilic neighborhoods via pseudo-label-guided partitioning and aggregates them using metapath-based structural weights. A node-level adaptive fusion mechanism then combines the two channels to produce context-dependent node representations. Theoretically, we establish that, on heterogeneous graphs under heterophily, HeterSEED is strictly more expressive than standard heterogeneous graph neural networks that rely primarily on feature similarity and provably reduces the prediction bias introduced by heterophilic neighbors. Experiments on five real-world heterogeneous graphs, including two large-scale networks at the million-node and hundred-million-edge scale, demonstrate that HeterSEED consistently outperforms representative heterogeneous graph neural networks and recent heterophily-aware baselines, especially in strongly heterophilic regimes.
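The node-level fusion step described above can be illustrated with a small gate. This is only a sketch under stated assumptions: the abstract does not give the fusion formula, so the sigmoid gate over the concatenated channels, the function name `fuse_channels`, and the weight layout are all hypothetical and are not taken from HeterSEED itself.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_channels(
    h_sem: Sequence[float],
    h_str: Sequence[float],
    w: Sequence[float],
    b: float = 0.0,
) -> List[float]:
    """Combine two per-node embeddings with a scalar gate (hypothetical form).

    The gate is sigmoid(w . [h_sem; h_str] + b), so it depends on the node's
    own features; the output is a convex combination of the two channels.
    """
    if len(h_sem) != len(h_str):
        raise ValueError("both channels must have the same dimension")
    if len(w) != 2 * len(h_sem):
        raise ValueError("w must have one weight per concatenated feature")
    concat = list(h_sem) + list(h_str)
    gate = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)) + b)
    return [gate * s + (1.0 - gate) * t for s, t in zip(h_sem, h_str)]


# With all-zero weights the gate is exactly 0.5, so the channels are averaged.
h_semantic = [1.0, 0.0]
h_structure = [0.0, 1.0]
fused = fuse_channels(h_semantic, h_structure, w=[0.0, 0.0, 0.0, 0.0])
print(fused)
```

In a trained model the gate weights would be learned, so nodes in heterophilic neighborhoods could lean on one channel and nodes in homophilic neighborhoods on the other.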

preprint · 2026 · arXiv

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

Multi-agent systems (MAS) assume that collaboration inherently improves Large Language Model (LLM) reasoning. We challenge this by demonstrating that simulated social pressure triggers an algorithmic "Bystander Effect," inducing severe cognitive loafing. By evaluating 22,500 deterministic trajectories across 3 dataset contexts (GAIA, SWE-bench, Multi-Challenge) with 3 state-of-the-art (SOTA) models, we semantically audit internal reasoning traces against external outputs. We formalize the Interaction Depth Limit ($D_L$), the exact plurality threshold where an agent's logical sovereignty collapses into social compliance. Crucially, we uncover the Sovereignty Gap: models frequently compute the correct derivation internally but suffer "Alignment Hallucinations", actively subjugating empirical evidence to sycophantically appease a simulated swarm. We prove that multi-agent social load is strictly non-commutative; the "brand" identity of the "Lead Anchor" auditor disproportionately dictates the swarm's integrity. These findings expose architectural vulnerabilities, proving that unstructured multi-agent topologies can degrade independent reasoning.

preprint · 2026 · arXiv

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the "Wisdom of the Crowd". We challenge this assumption by formalizing the Consensus Paradox: a phenomenon where agentic swarms prioritize internal architectural agreement over external logical truth. Through 36 experiments encompassing 12,804 trajectories across three state-of-the-art (SOTA) benchmarks (GAIA, Multi-Challenge, and SWE-bench), we prove the Inverse-Wisdom Law: in kinship-dominant swarms, adding logical agents increases the stability of erroneous trajectories rather than the probability of truth. The introduction of additional logical audits converges the system toward a Logic Saturation where internal entropy hits zero while factual error hits unity. By evaluating the interaction between the 3 preeminent SOTA models (Gemini 3.1 Pro, Claude Sonnet 4.6, and GPT-5.4), we establish the Architectural Tribalism Asymmetry as a mechanistic law of transformer weights. We demonstrate that terminal swarm integrity is strictly gated by the synthesizer's receptive logic, rather than aggregate agent quality. We define the Tribalism Coefficient and the Sycophantic Weight as the primary mechanistic determinants of swarm failure. Finally, we establish the Heterogeneity Mandate as a foundational safety requirement for resilient agentic architectures.

preprint · 2025 · arXiv

MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems

Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.

preprint · 2024 · arXiv

A First-Principle Approach to X-ray Active Optics: Design and Verification

This paper presents the first-principle design approach for X-ray active optics, using the simulation-modulation cycle in place of the measurement-modulation feedback loops used in traditional active optics. Hence, the new active optics have the potential to outperform the accuracy of surface-shape metrology instruments. We apply an X-ray mirror with localized thermal elastic deformation to validate the idea. Both the finite element simulations and surface shape measurements have demonstrated that the active optics modulation accuracy limit can be achieved at the atomic layer level. It is believed that the implementation of the first-principle design strategy has the capacity to revolutionize both the manufacturing processes of X-ray mirrors and the beamline engineering of synchrotron radiation.

preprint · 2024 · arXiv

A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems

Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite their powerful ability to reconfigure the radio environment, the "double fading" effect limits practical performance gains because of the significant path loss. A new active RIS architecture has recently been proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model that ignores the practical hardware limitations of amplifiers, and such inaccurate modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of a typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design that accounts for the incident signal power on a practical active RIS in a multiuser multi-input single-output (MU-MISO) communication system. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the resulting complex problem. Simulation results indicate the importance of considering practical amplifier hardware characteristics in joint beamforming designs and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.

preprint · 2024 · arXiv

BiSinger: Bilingual Singing Voice Synthesis

Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at https://bisinger-svs.github.io.

preprint · 2024 · arXiv

Cooperative Cell-Free ISAC Networks: Joint BS Mode Selection and Beamforming Design

Owing to its promising ability to save hardware cost and spectrum resources, integrated sensing and communication (ISAC) is regarded as a revolutionary technology for future sixth-generation (6G) networks. The mono-static ISAC systems considered in most existing works can achieve only limited sensing performance because of the single observation angle and easily blocked transmission links, which motivates researchers to investigate cooperative ISAC networks. To further improve the degrees of freedom (DoFs) of cooperative ISAC networks, the transmitter-receiver selection problem, i.e., the base station (BS) mode selection problem, is worth studying. However, to the best of our knowledge, this crucial problem has not been extensively studied in existing works. In this paper, we consider the joint BS mode selection, transmit beamforming, and receive filter designs for cooperative cell-free ISAC networks, where multiple BSs cooperatively serve communication users and detect targets. An efficient joint beamforming design algorithm and three different heuristic BS mode selection methods are proposed to solve the non-convex NP-hard problem. Simulation results demonstrate the advantages of cooperative ISAC networks, the importance of BS mode selection, and the effectiveness of the proposed algorithms.

preprint · 2024 · arXiv

Quadrotor Stabilization with Safety Guarantees: A Universal Formula Approach

Safe stabilization is a significant challenge for quadrotors, which involves reaching a goal position while avoiding obstacles. Most of the existing solutions for this problem rely on optimization-based methods, demanding substantial onboard computational resources. This paper introduces a novel approach to address this issue and provides a solution that offers fast computational capabilities tailored for onboard execution. Drawing inspiration from Sontag's universal formula, we propose an analytical control strategy that incorporates the conditions of control Lyapunov functions (CLFs) and control barrier functions (CBFs), effectively avoiding the need for solving optimization problems onboard. Moreover, we extend our approach by incorporating the concepts of input-to-state stability (ISS) and input-to-state safety (ISSf), enhancing the universal formula's capacity to effectively manage disturbances. Furthermore, we present a projection-based approach to ensure that the universal formula remains effective even when faced with control input constraints. The basic idea of this approach is to project the control input derived from the universal formula onto the closest point within the control input domain. Through comprehensive simulations and experimental results, we validate the efficacy and highlight the advantages of our methodology.
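The projection step mentioned above can be sketched for the simplest case. This is a minimal illustration, assuming the control input domain is an axis-aligned box (per-component lower and upper bounds); the abstract does not state the shape of the domain, and the function name and numeric values below are hypothetical. For a box, the closest point in the Euclidean sense is obtained by clipping each component independently.

```python
from typing import List, Sequence


def project_onto_box(
    u: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Euclidean projection of u onto the box {v : lower <= v <= upper}.

    Assumes a box-shaped input domain; other domain shapes would need a
    different projection.
    """
    if not (len(u) == len(lower) == len(upper)):
        raise ValueError("u, lower and upper must have the same length")
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    # Clipping each coordinate gives the closest point inside the box.
    return [min(max(ui, lo), hi) for ui, lo, hi in zip(u, lower, upper)]


# Hypothetical nominal input, standing in for the output of an analytical controller.
u_nominal = [1.8, -0.2, 0.05]
u_projected = project_onto_box(
    u_nominal,
    lower=[-1.0, -0.1, -0.1],
    upper=[1.0, 0.1, 0.1],
)
print(u_projected)
```

Components already inside the bounds pass through unchanged, so the projection only modifies the input when a constraint is active.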

preprint · 2023 · arXiv

A Theory of Human-Like Few-Shot Learning

We aim to bridge the gap between common-sense, few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult because how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.

preprint · 2023 · arXiv

Deep Learning-Based UAV Aerial Triangulation without Image Control Points

Drone-based aerial surveying has the advantages of low cost, high efficiency, and flexible use. However, UAVs are often equipped with cheap POS systems and non-metric cameras, and their flight attitudes are easily disturbed. Realizing large-scale UAV mapping without image control points, supported only by POS data, raises many technical problems. The core technology is the accurate absolute orientation of images through advanced aerial triangulation. In traditional aerial triangulation, image matching algorithms are constrained to varying degrees by preset prior knowledge. In recent years, deep learning has developed rapidly in photogrammetric computer vision. It has surpassed traditional handcrafted features in many respects and has shown greater stability in image-based navigation and positioning tasks, in particular better resistance to unfavorable factors such as blur, illumination changes, and geometric distortion. After introducing the key technologies of aerial triangulation without image control points, this paper proposes a new drone image registration method based on deep learning image features to address the high mismatch rate of traditional methods. It adopts SuperPoint as the feature detector and uses the strong generalization performance of CNNs to extract precise feature points from UAV images, thereby achieving high-precision aerial triangulation. Experimental results show that, under the same pre-processing and post-processing conditions, this method achieves suitable precision more efficiently than the traditional method based on the SIFT algorithm and can meet the requirements of UAV aerial triangulation without image control points in large-scale surveys.

preprint · 2023 · arXiv

Discriminator-Guided Model-Based Offline Imitation Learning

Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data. Including a learned dynamics model can potentially improve the state-action space coverage of expert data; however, it also faces challenging issues such as model approximation/generalization errors and the suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case when demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.

preprint · 2023 · arXiv

Self-supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond

To learn distinguishable patterns, most recent works in vehicle re-identification (ReID) have had to rework official benchmarks to provide various kinds of supervision, which requires prohibitive human labor. In this paper, we seek to achieve a similar goal without additional human effort. To this end, we introduce a novel framework that successfully encodes both geometric local features and global representations to distinguish vehicle instances, optimized only by the supervision from official ID labels. Specifically, given our insight that objects in ReID share similar geometric characteristics, we propose to borrow self-supervised representation learning to facilitate geometric feature discovery. To condense these features, we introduce an interpretable attention module built around local maxima aggregation instead of fully automatic learning, whose mechanism is completely understandable and whose response map is physically reasonable. To the best of our knowledge, we are the first to use self-supervised learning to discover geometric features. We conduct comprehensive experiments on the three most popular datasets for vehicle ReID, i.e., VeRi-776, CityFlow-ReID, and VehicleID. We report state-of-the-art (SOTA) performance and promising visualization results. We also show the excellent scalability of our approach on other ReID-related tasks, i.e., person ReID and multi-target multi-camera (MTMC) vehicle tracking.

preprint · 2023 · arXiv

Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems

Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which not only enables accurate recovery of target information but also allows a substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. To address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm is developed that employs majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternating direction method of multipliers (neADMM) with closed-form solutions. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.

preprint · 2022 · arXiv

A proof-of-principle demonstration of quantum microwave photonics

With the rapid development of microwave photonics, which has expanded to numerous applications of commercial importance, eliminating the emerging bottlenecks becomes of vital importance. For example, as the main branch of microwave photonics, radio-over-fiber technology provides high bandwidth, low-loss, and long-distance propagation capability, facilitating wide applications ranging from telecommunication to wireless networks. With ultrashort pulses as the optical carrier, huge capacity is further endowed. However, the wide bandwidth of ultrashort pulses results in the severe vulnerability of high-frequency RF signals to fiber dispersion. With a time-energy entangled biphoton source as the optical carrier and combined with the single-photon detection technique, a quantum microwave photonics method is proposed and demonstrated experimentally. The results show that it not only realizes unprecedented nonlocal RF signal modulation with strong resistance to the dispersion associated with ultrashort pulse carriers but provides an alternative mechanism to effectively distill the RF signal out from the dispersion. Furthermore, the spurious-free dynamic range of both the nonlocally modulated and distilled RF signals has been significantly improved. With the ultra-weak detection and high-speed processing advantages endowed by the low-timing-jitter single-photon detection, the quantum microwave photonics method opens up new possibilities in modern communication and networks.

preprint · 2022 · arXiv

A Schur type lemma for the Mean Berwald curvature in Finsler geometry

In this short paper, we study a symmetric covariant tensor in Finsler geometry, which is called the mean Berwald curvature. We first investigate the geometry of the fibres as the submanifolds of the tangent sphere bundle on a Finsler manifold. Then we prove that if the mean Berwald curvature is isotropic along fibres, then the Berwald scalar curvature is constant along fibres.

preprint · 2022 · arXiv

About One-point Statistics of the Ratio of Two Fourier-transformed Cosmic Fields and an Application

The Fourier transformation is an effective and efficient operation of Gaussianization at the one-point level. Using a set of N-body simulation data, we verified that the one-point distribution functions of the dark matter momentum divergence and density fields closely follow complex Gaussian distributions. The one-point distribution function of the quotient of two complex Gaussian variables is introduced and studied. Statistical theories are then applied to model the one-point statistics of the growth of individual Fourier modes of the dark matter density field, which can be obtained from the ratio of two Fourier-transformed cosmic fields. Our simulation results show that the models based on the Gaussian approximation are impressively accurate, and our analysis reveals many interesting aspects of the growth of dark matter density fluctuations in Fourier space.

preprint · 2022 · arXiv

CaFT: Clustering and Filter on Tokens of Transformer for Weakly Supervised Object Localization

Weakly supervised object localization (WSOL) is the challenging task of localizing an object using only category labels. However, classification and localization are in tension, because an accurate classification network tends to attend to the discriminative regions of an object rather than the whole object. We argue that this contradiction is caused by the handcrafted threshold selection in CAM-based methods. Therefore, we propose Clustering and Filter of Tokens (CaFT) with a Vision Transformer (ViT) backbone to solve this problem in another way. CaFT first sends the patch tokens of the split image to the ViT and clusters the output tokens to generate an initial mask of the object. Second, CaFT treats the initial mask as pseudo labels to train a shallow convolution head (Attention Filter, AtF) that follows the backbone and extracts the mask directly from the tokens. Then, CaFT splits the image into parts, outputs a mask for each part, and merges them into one refined mask. Finally, a new AtF is trained on the refined masks and used to predict the bounding box of the object. Experiments verify that CaFT outperforms previous work and achieves 97.55% and 69.86% localization accuracy with ground-truth class on CUB-200 and ImageNet-1K, respectively. CaFT provides a fresh way to think about the WSOL task.

preprint · 2022 · arXiv

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations. It is challenging to enrich a KG with newly harvested triples while maintaining the quality of the knowledge representation. This paper proposes a system to refine a KG using information harvested from an additional corpus. To this end, we formulate our task as two coupled sub-tasks, namely join event extraction (JEE) and knowledge graph fusion (KGF). We then propose a Collaborative Knowledge Graph Fusion Framework to allow our sub-tasks to mutually assist one another in an alternating manner. More concretely, the explorer carries out the JEE supervised by both the ground-truth annotation and an existing KG provided by the supervisor. The supervisor then evaluates the triples extracted by the explorer and enriches the KG with those that are highly ranked. To implement this evaluation, we further propose a Translated Relation Alignment Scoring Mechanism to align and translate the extracted triples to the prior KG. Experiments verify that this collaboration can both improve the performance of the JEE and the KGF.

preprint · 2022 · arXiv

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) due to insufficient relevant data. In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not contain speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker's age from the associated visual data, then label the audio recording with the estimated age. We construct multiple cross-age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with a large age gap. The effects of nationality and gender are also considered when selecting negative pairs, to align with the Vox-H cases. The baseline system performance drops from 1.939% EER on the Vox-H test set to 10.419% on the Vox-CA20 test set, which indicates how difficult the cross-age scenario is. Consequently, we propose an age-decoupling adversarial learning (ADAL) method to alleviate the negative effect of the age gap and reduce intra-class variance. Our method outperforms the baseline system by over 10% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age_Speaker_Verification

preprint · 2022 · arXiv

Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge

In this paper, we present the speaker diarization system for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) from team DKU_DukeECE. As highly overlapped speech exists in the dataset, we employ an x-vector-based target-speaker voice activity detection (TS-VAD) to find the overlap between speakers. For the single-channel scenario, we separately train a model for each of the 8 channels and fuse the results. We also employ cross-channel self-attention to further improve the performance, where the non-linear spatial correlations between different channels are learned and fused. Experimental results on the evaluation set show that the single-channel TS-VAD reduces the DER by over 75%, from 12.68% to 3.14%. The multi-channel TS-VAD further reduces the DER by 28% and achieves a DER of 2.26%. Our final submitted system achieves a DER of 2.98% on the AliMeeting test set, which ranks 1st in the M2MeT challenge.
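The relative reductions quoted above can be checked against the absolute DER figures. A minimal sketch, assuming the usual definition of relative reduction, (before - after) / before; the function name is illustrative and not taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent, assuming (before - after) / before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# DER values (in percent) quoted in the abstract.
single_channel = relative_reduction(12.68, 3.14)
multi_channel = relative_reduction(3.14, 2.26)

print(f"single-channel TS-VAD: {single_channel:.1f}% relative DER reduction")
print(f"multi-channel TS-VAD:  {multi_channel:.1f}% relative DER reduction")
```

Under this definition the first figure comes out slightly above 75% and the second close to 28%, which agrees with the reductions stated in the abstract.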

preprint · 2022 · arXiv

Deep Reinforcement Learning based Joint Active and Passive Beamforming Design for RIS-Assisted MISO Systems

Owing to its unique advantages of low cost and controllability, the reconfigurable intelligent surface (RIS) is a promising candidate for addressing the blockage issue in millimeter wave (mmWave) communication systems and has consequently captured widespread attention in recent years. However, the joint design of active and passive beamforming is an arduous task due to the high computational complexity and the dynamic changes of the wireless environment. In this paper, we consider an RIS-assisted multi-user multiple-input single-output (MU-MISO) mmWave system and aim to develop a deep reinforcement learning (DRL) based algorithm to jointly design the active hybrid beamformer at the base station (BS) side and the passive beamformer at the RIS side. By employing an advanced soft actor-critic (SAC) algorithm, we propose a maximum entropy based DRL algorithm, which can explore more stochastic policies than a deterministic policy, to design the active analog precoder and passive beamformer simultaneously. Then, the digital precoder is determined by the minimum mean square error (MMSE) method. The experimental results demonstrate that our proposed SAC algorithm achieves better performance than conventional optimization and DRL algorithms.

preprint · 2022 · arXiv

Embedding Graphs on Grassmann Manifold

Learning an efficient graph representation is key to addressing downstream tasks on graphs, such as node or graph property prediction. Given the non-Euclidean structural property of graphs, preserving the original graph data's similarity relationship in the embedded space requires specific tools and a similarity metric. This paper develops a new graph representation learning scheme, namely EGG, which embeds approximated second-order graph characteristics into a Grassmann manifold. The proposed strategy leverages graph convolutions to learn hidden representations of the corresponding subspace of the graph, which is then mapped to a Grassmann point of a low-dimensional manifold through truncated singular value decomposition (SVD). The established graph embedding approximates the denoised correlation of node attributes, implemented in the form of a symmetric matrix space for Euclidean calculation. The effectiveness of EGG is demonstrated using both clustering and classification tasks at the node level and graph level. It outperforms baseline models on various benchmarks.

preprint2022arXiv

Explainable COVID-19 Infections Identification and Delineation Using Calibrated Pseudo Labels

The upheaval brought by the arrival of the COVID-19 pandemic has continued to bring fresh challenges over the past two years. During this COVID-19 pandemic, there has been a need for rapid identification of infected patients and specific delineation of infection areas in computed tomography (CT) images. Although deep supervised learning methods have been established quickly, the scarcity of both image-level and pixel-level labels as well as the lack of explainable transparency still hinder the applicability of AI. Can we identify infected patients and delineate the infections with extremely minimal supervision? Semi-supervised learning has demonstrated promising performance under limited labelled data and sufficient unlabelled data. Inspired by semi-supervised learning, we propose a model-agnostic calibrated pseudo-labelling strategy and apply it under a consistency regularization framework to generate explainable identification and delineation results. We demonstrate the effectiveness of our model with the combination of limited labelled data and sufficient unlabelled data or weakly-labelled data. Extensive experiments have shown that our model can efficiently utilize limited labelled data and provide explainable classification and segmentation results for decision-making in clinical routine. The code is available at https://github.com/ayanglab/XAI COVID-19.

preprint2022arXiv

Fully-integrated multipurpose microwave frequency identification system on a single chip

We demonstrate a fully-integrated multipurpose microwave frequency identification system on a silicon-on-insulator platform. Thanks to its multipurpose features, the chip is able to identify different types of microwave signals, including single-frequency, multiple-frequency, chirped and frequency-hopping microwave signals, as well as discriminate instantaneous frequency variation among the frequency-modulated signals. This demonstration exhibits a fully integrated solution and fully functional microwave frequency identification, which can meet the requirements for reduced size, weight and power in future advanced microwave photonic processors.

preprint2022arXiv

Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words

Wake-up word detection models are widely used in real life, but suffer from severe performance degradation when encountering adversarial samples. In this paper we discuss the concept of confusing words in adversarial samples. Confusing words, i.e., the various kinds of words that sound similar to the predefined keywords, are commonly encountered. To enhance the wake word detection system's robustness against confusing words, we propose several methods to generate adversarial confusing samples that simulate real confusing-word scenarios, for which we usually do not have any real confusing samples in the training set. The generated samples include concatenated audio, synthesized data, and partially masked keywords. Moreover, we use a domain embedding concatenated system to improve the performance. Experimental results show that the adversarial samples generated in our approach help improve the system's robustness in both the common scenario and the confusing words scenario. In addition, we release the confusing words testing database called HI-MIA-CW for future research.

preprint2022arXiv

Integrated Sensing and Communication with Reconfigurable Intelligent Surfaces: Opportunities, Applications, and Future Directions

Integrated sensing and communication (ISAC) is emerging as a key enabler to address the growing spectrum congestion problem and satisfy increasing demands for ubiquitous sensing and communication. By sharing various resources and information, ISAC achieves much higher spectral, energy, hardware, and economic efficiencies. Concurrently, reconfigurable intelligent surface (RIS) technology has been deemed as a promising approach due to its capability of intelligently manipulating the wireless propagation environment in an energy and hardware efficient manner. In this article, we analyze the potential of deploying RIS to improve communication and sensing performance in ISAC systems. We first describe the fundamentals of RIS and its applications in traditional communication and sensing systems, then introduce the principles of ISAC and overview existing explorations on RIS-assisted ISAC, followed by one case study to verify the advantages of deploying RIS in ISAC systems. Finally, open challenges and research directions are discussed to stimulate this line of research and pave the way for practical applications.

preprint2022arXiv

Invertible Voice Conversion

In this paper, we propose an invertible deep learning framework called INVVC for voice conversion. It is designed against the possible threats that inherently come along with voice conversion systems. Specifically, we develop an invertible framework that makes the source identity traceable. The framework is built on a series of invertible $1\times1$ convolutions and flows consisting of affine coupling layers. We apply the proposed framework to one-to-one voice conversion and many-to-one conversion using parallel training data. Experimental results show that this approach yields impressive performance on voice conversion and, moreover, the converted results can be reversed back to the source inputs utilizing the same parameters as in forwarding.
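
The two invertible building blocks named in the abstract, a $1\times1$ convolution and an affine coupling layer, can be sketched in NumPy to show why the conversion can be reversed with the same parameters. This is a toy sketch under stated assumptions, not the INVVC architecture: the function names are hypothetical, and the scale and shift networks are single linear maps chosen only for illustration.

```python
import numpy as np


def conv1x1_forward(x, W):
    """Invertible 1x1 convolution: the same channel-mixing matrix W at every frame."""
    return x @ W.T


def conv1x1_inverse(y, W):
    return y @ np.linalg.inv(W).T


def coupling_forward(x, Ws, Wt):
    """Affine coupling: the first half of the channels passes through unchanged
    and parameterizes a scale and shift applied to the second half."""
    xa, xb = np.split(x, 2, axis=1)
    log_s = np.tanh(xa @ Ws)          # toy scale network (assumption)
    t = xa @ Wt                       # toy shift network (assumption)
    return np.concatenate([xa, xb * np.exp(log_s) + t], axis=1)


def coupling_inverse(y, Ws, Wt):
    """Exact inverse: ya equals xa, so the same scale and shift can be recomputed."""
    ya, yb = np.split(y, 2, axis=1)
    log_s = np.tanh(ya @ Ws)
    t = ya @ Wt
    return np.concatenate([ya, (yb - t) * np.exp(-log_s)], axis=1)


rng = np.random.default_rng(0)
frames, channels = 5, 4
x = rng.standard_normal((frames, channels))
W, _ = np.linalg.qr(rng.standard_normal((channels, channels)))  # orthogonal, hence invertible
Ws = rng.standard_normal((channels // 2, channels // 2))
Wt = rng.standard_normal((channels // 2, channels // 2))

y = coupling_forward(conv1x1_forward(x, W), Ws, Wt)
x_rec = conv1x1_inverse(coupling_inverse(y, Ws, Wt), W)
print(np.max(np.abs(x - x_rec)))
```

The inverse never needs to invert the scale or shift networks themselves, which is what allows them to be arbitrary functions in a real flow model.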

preprint2022arXiv

IRS-assisted Multi-cell Multi-band Systems: Practical Reflection Model and Joint Beamforming Design

Intelligent reflecting surface (IRS) has been regarded as a promising and revolutionary technology for future wireless communication systems owing to its capability of tailoring the signal propagation environment in an energy/spectrum/hardware-efficient manner. However, most existing studies on IRS optimizations are based on a simple and ideal reflection model that is impractical in hardware implementation, which thus leads to severe performance loss in realistic wideband/multi-band systems. To deal with this problem, in this paper we first propose a more practical and more tractable IRS reflection model that describes the difference of reflection responses for signals at different frequencies. Then, we investigate the joint transmit beamforming and IRS reflection beamforming design for an IRS-assisted multi-cell multi-band system. Both power minimization and sum-rate maximization problems are solved by exploiting popular second-order cone programming (SOCP), Riemannian manifold, majorization-minimization (MM), weighted minimum mean square error (WMMSE), and block coordinate descent (BCD) methods. Simulation results illustrate the significant performance improvement of our proposed joint transmit beamforming and reflection design algorithms based on the practical reflection model in terms of power saving and rate enhancement.

preprint2022arXiv

Joint Beamforming Design for Intelligent Omni Surface Assisted Wireless Communication Systems

Intelligent reflecting surface (IRS) has been widely considered as one of the key enabling techniques for future wireless communication networks owing to its ability to dynamically control the phase shift of reflected electromagnetic (EM) waves to construct a favorable propagation environment. While IRS only focuses on signal reflection, the recently emerged innovative concept of intelligent omni-surface (IOS) can provide the dual functionality of manipulating reflecting and transmitting signals. Thus, IOS is a new paradigm for achieving ubiquitous wireless communications. In this paper, we consider an IOS-assisted multi-user multi-input single-output (MU-MISO) system where the IOS utilizes its reflective and transmissive properties to enhance the MU-MISO transmission. Both power minimization and sum-rate maximization problems are solved by exploiting the second-order cone programming (SOCP), Riemannian manifold, weighted minimum mean square error (WMMSE), and block coordinate descent (BCD) methods. Simulation results verify the advancements of the IOS for wireless systems and illustrate the significant performance improvement of our proposed joint transmit beamforming, reflecting and transmitting phase-shift, and IOS energy division design algorithms. Compared with conventional IRS, IOS can significantly extend the communication coverage, enhance the strength of received signals, and improve the quality of communication links.

preprint2022arXiv

Joint Beamforming Design for RIS-Assisted Integrated Sensing and Communication Systems

Integrated sensing and communication (ISAC) has been envisioned as a promising technology to tackle the spectrum congestion problem for future networks. In this correspondence, we investigate deploying a reconfigurable intelligent surface (RIS) in an ISAC system to achieve better performance. In particular, a multi-antenna base station (BS) simultaneously serves multiple single-antenna users with the assistance of a RIS and detects potential targets. The active beamforming of the BS and the passive beamforming of the RIS are jointly optimized to maximize the achievable sum-rate of the communication users while satisfying the constraint of beampattern similarity for radar sensing, the restriction of the RIS, and the transmit power budget. An efficient alternating algorithm based on the fractional programming (FP), majorization-minimization (MM), and manifold optimization methods is developed to convert the resulting non-convex optimization problem into two solvable sub-problems and iteratively solve them. Simulation studies illustrate the advancement of deploying RIS in ISAC systems and the effectiveness of the proposed algorithm.

preprint2022arXiv

Joint Beamforming Design in DFRC Systems for Wideband Sensing and OFDM Communications

Dual-function radar-communication (DFRC) systems, which can efficiently utilize the congested spectrum and costly hardware resources by employing one common waveform for both sensing and communication (S&C), have attracted increasing attention. While the orthogonal frequency division multiplexing (OFDM) technique has been widely adopted to support high-quality communications, it also has great potential for improving radar sensing performance and providing flexible S&C. In this paper, we propose to jointly design the dual-functional transmit signals occupying several subcarriers to realize multi-user OFDM communications and detect one moving target in the presence of clutter. Meanwhile, the signals in other frequency subcarriers can be optimized in a similar way to perform other tasks. The transmit beamforming and receive filter are jointly optimized to maximize the radar output signal-to-interference-plus-noise ratio (SINR), while satisfying the communication SINR requirement and the power budget. A majorization-minimization (MM) based algorithm is developed to solve the resulting non-convex optimization problem. Numerical results reveal the significant wideband sensing gain brought by jointly designing the transmit signals in different subcarriers, and demonstrate the advantages of our proposed scheme and the effectiveness of the developed algorithm.

preprint2022arXiv

Low-Latency Online Speaker Diarization with Graph-Based Label Generation

This paper introduces an online speaker diarization system that can handle long-time audio with low latency. We enable Agglomerative Hierarchical Clustering (AHC) to work in an online fashion by introducing a label matching algorithm. This algorithm solves the inconsistency between output labels and hidden labels that are generated each turn. To ensure low latency in the online setting, we introduce a variant of AHC, namely chkpt-AHC, to cluster the speakers. In addition, we propose a speaker embedding graph to exploit a graph-based re-clustering method, further improving the performance. In the experiment, we evaluate our systems on both DIHARD3 and VoxConverse datasets. The experimental results show that our proposed online systems have better performance than our baseline online system and have comparable performance to our offline systems. We find that the framework combining the chkpt-AHC method and the label matching algorithm works well in the online setting. Moreover, the chkpt-AHC method greatly reduces the time cost, while the graph-based re-clustering method helps improve the performance.

preprint2022arXiv

MealRec: A Meal Recommendation Dataset

Bundle recommendation systems aim to recommend a bundle of items for a user to consider as a whole. They have become a norm in modern life and have been applied to many real-world settings, such as product bundle recommendation, music playlist recommendation and travel package recommendation. However, compared to studies of bundle recommendation approaches in areas such as online shopping and digital music services, research on meal recommendations for restaurants in the hospitality industry has made limited progress, due largely to the lack of high-quality benchmark datasets. A publicly available dataset specialising in meal recommendation research for the research community is in urgent demand. In this paper, we introduce a meal recommendation dataset (MealRec) that aims to facilitate future research. MealRec is constructed from the user review records of Allrecipe.com, covering 1,500+ users, 7,200+ recipes and 3,800+ meals. Each recipe is described with rich information, such as ingredients, instructions, pictures, category and tags, etc; and each meal is three-course, consisting of an appetizer, a main dish and a dessert. Furthermore, we propose a category-constrained meal recommendation model that is evaluated through comparative experiments with several state-of-the-art bundle recommendation methods on MealRec. Experimental results confirm the superiority of our model and demonstrate that MealRec is a promising testbed for meal recommendation related research. The MealRec dataset and the source code of our proposed model are available at https://github.com/WUT-IDEA/MealRec for access and reproducibility.

preprint2022arXiv

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty of the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from the perspective of meta learning to address the DDF issue. Firstly, a base CVR model which consists of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples across months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes to help mitigate the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the recent time period, therefore effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of our MetaCVR and an online A/B test also shows our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.

preprint2022arXiv

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Albeit with varying degrees of progress in the field of Semi-Supervised Semantic Segmentation (SSSS), most of its recent successes involve unwieldy models, and the lightweight solution is not yet explored. We find that existing knowledge distillation techniques pay more attention to pixel-level concepts from labeled data, and fail to take more informative cues within unlabeled data into account. Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-level loss setting. Specifically, MGD is formulated as a labeled-unlabeled data cooperative distillation scheme, which helps to take full advantage of diverse data characteristics that are essential in the semi-supervised setting. Image-level semantic-sensitive loss, region-level content-aware loss, and pixel-level consistency loss are set up to enrich hierarchical distillation abstraction via structurally complementary teachers. Experimental results on PASCAL VOC2012 and Cityscapes reveal that MGD can outperform the competitive approaches by a large margin under diverse partition protocols. For example, the performance of the ResNet-18 and MobileNet-v2 backbones is boosted by 11.5% and 4.6% respectively under the 1/16 partition protocol on Cityscapes. Although the FLOPs of the model backbone are compressed by 3.4-5.3x (ResNet-18) and 38.7-59.6x (MobileNet-v2), the model manages to achieve satisfactory segmentation results.

preprint2022arXiv

Multiple and Asymmetric Scalings in Explosive Percolation

Explosive percolation in the Achlioptas process has recently attracted much research attention. From extensive simulations in an event-based ensemble, we find that, in dimensions from $2$ to $6$ and on random graphs, the Achlioptas processes all have two scaling windows and multiple fractal structures. The mixing of these multiple scalings successfully explains the previously observed anomalous phenomena in the conventional ensemble, and, moreover, correct critical exponents are now determined with a high precision by the event-based method. The multiple scalings and the ensemble inequivalence may bring new insights for other statistical systems.
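
For readers unfamiliar with the Achlioptas process, the sketch below simulates one common variant on a random graph: the product rule, in which two candidate edges are drawn and the one joining the clusters with the smaller size product is kept. The abstract does not state which selection rule or ensemble definition the authors use, so the rule, the function names, and the parameters here are assumptions, and the event-based ensemble is not reproduced.

```python
import random


class UnionFind:
    """Disjoint-set structure tracking cluster sizes."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def achlioptas_product_rule(n, num_edges, seed=0):
    """Return the largest-cluster size after each added edge (product rule)."""
    rng = random.Random(seed)
    uf = UnionFind(n)
    largest, history = 1, []
    for _ in range(num_edges):
        # Draw two candidate edges uniformly at random.
        candidates = [(rng.randrange(n), rng.randrange(n)) for _ in range(2)]
        # Keep the candidate whose endpoint clusters have the smaller size product.
        u, v = min(
            candidates,
            key=lambda e: uf.size[uf.find(e[0])] * uf.size[uf.find(e[1])],
        )
        root = uf.union(u, v)
        largest = max(largest, uf.size[root])
        history.append(largest)
    return history


history = achlioptas_product_rule(n=2000, num_edges=2000, seed=1)
print(history[-1] / 2000)  # fraction of nodes in the largest cluster
```

Recording the largest cluster after every edge gives the order-parameter curve whose abrupt growth is what "explosive" refers to.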

preprint2022arXiv

Non-Cooperative Resource Management for Intelligent Reflecting Surface Aided Networks

Intelligent reflecting surface (IRS) has emerged as a promising and revolutionizing technology for future wireless networks. Most existing IRS studies focus on simple cooperative systems which usually have a single frequency band. In realistic non-cooperative multi-band networks, however, the existing IRS designs may not be applicable or may suffer severe performance degradation. Thus, in the complex network environment, it is more rational to consider IRSs as public resources to be dynamically allocated to appropriate users. In this paper, we first introduce auction theory to tackle the resource management problem for a multi-IRS-assisted non-cooperative network. An efficient auction algorithm framework is introduced to sub-optimally solve this non-convex problem. Simulation results illustrate that significant performance improvement can be achieved by applying the auction algorithm in the complex multi-IRS-assisted non-cooperative network.

preprint2022arXiv

On isotropic Berwald scalar curvature

In this short paper, we establish a closer relation between the Berwald scalar curvature and the $S$-curvature. In fact, we prove that a Finsler metric has isotropic Berwald scalar curvature if and only if it has weakly isotropic $S$-curvature. For Finsler metrics of scalar flag curvature and of weakly isotropic $S$-curvature, they have almost isotropic $S$-curvature if and only if the flag curvature is weakly isotropic.

preprint2022arXiv

Online Target Speaker Voice Activity Detection for Speaker Diarization

This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from the clustering-based diarization system to obtain the target speaker embeddings. First, we employ a ResNet-based front-end model to extract the frame-level speaker embeddings for each coming block of a signal. Next, we predict the detection state of each speaker based on these frame-level speaker embeddings and the previously estimated target speaker embedding. Then, the target speaker embeddings are updated by aggregating these frame-level speaker embeddings according to the predictions in the current block. We iteratively extract the results for each block and update the target speaker embedding until reaching the end of the signal. Experimental results show that the proposed method is better than the offline clustering-based diarization system on the AliMeeting dataset.

preprint2022arXiv

Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

Recently, Synthetic data-based Instance Segmentation has become an exceedingly favorable optimization paradigm since it leverages simulation rendering and physics to generate high-quality image-annotation pairs. In this paper, we propose a Parallel Pre-trained Transformers (PPT) framework to accomplish the synthetic data-based Instance Segmentation task. Specifically, we leverage the off-the-shelf pre-trained vision Transformers to alleviate the gap between natural and synthetic data, which helps to provide good generalization in the downstream synthetic data scene with few samples. Swin-B-based CBNet V2, Swin-L-based CBNet V2 and Swin-L-based Uniformer are employed for parallel feature learning, and the results of these three models are fused by a pixel-level Non-maximum Suppression (NMS) algorithm to obtain more robust results. The experimental results reveal that PPT ranks first in the CVPR2022 AVA Accessibility Vision and Autonomy Challenge, with a 65.155% mAP.

preprint2022arXiv

Probing gluon Bose correlations in DIS

We study correlations originating from the quantum nature of gluons in a hadronic wave function. Bose-Einstein correlations between identical particles lead to an enhancement in the number of pairs of gluons with the same quantum numbers and small relative momentum. We show that these preexisting correlations can be probed in Deep Inelastic Scattering experiments at high energy. Specifically, we consider diffractive dijet plus a third jet production. The azimuthal dependence displays a peak at the zero relative angle between the transverse momentum imbalance of the photon-going dijet and the transverse momentum of the hadron-going jet. Our calculations explicitly show that the peak originates from Bose enhancement. Comparing electron-proton to electron-nucleus collisions, we demonstrate that the nuclear target enhances the relative strength of the peak. With the future high luminosity Electron-Ion Collider the proposed measurements of gluon Bose enhancement become experimentally feasible.

preprint2022arXiv

Quantum algorithms for the generalized eigenvalue problem

Generalized eigenvalue (GE) problems are of particular importance in various areas of science, engineering, and machine learning. We present a variational quantum algorithm for finding the desired generalized eigenvalue of the GE problem, $\mathcal{A}|ψ\rangle=λ\mathcal{B}|ψ\rangle$, by choosing suitable loss functions. Our approach imposes the superposition of the trial state and the obtained eigenvectors with respect to the weighting matrix $\mathcal{B}$ on the Rayleigh-quotient. Furthermore, both the values and derivatives of the loss functions can be calculated on near-term quantum devices with shallow quantum circuits. Finally, we propose a full quantum generalized eigensolver (FQGE) to calculate the minimal generalized eigenvalue with a quantum gradient descent algorithm. As a demonstration of the principle, we numerically implement our algorithms to conduct a 2-qubit simulation and successfully find the generalized eigenvalues of the matrix pencil $(\mathcal{A},\,\mathcal{B})$. The numerical experiments indicate that FQGE is robust under Gaussian noise.
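
The role of the Rayleigh quotient as a loss function can be illustrated with a purely classical stand-in: plain gradient descent on $x^{T}Ax / x^{T}Bx$ for a small real symmetric $A$ and positive definite $B$. This is not the variational quantum algorithm or FQGE from the abstract. The matrices, step size, iteration count, and function names below are illustrative assumptions.

```python
import numpy as np


def rayleigh_quotient(x, A, B):
    """Generalized Rayleigh quotient x^T A x / x^T B x."""
    return float(x @ A @ x) / float(x @ B @ x)


def min_generalized_eigenvalue(A, B, x0, step=0.1, iters=2000):
    """Minimize the generalized Rayleigh quotient by gradient descent.

    Assumes A is real symmetric and B is real symmetric positive definite.
    At a minimizer, A x = lambda B x with lambda the smallest generalized eigenvalue.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        xBx = x @ B @ x
        lam = (x @ A @ x) / xBx
        grad = 2.0 * (A @ x - lam * (B @ x)) / xBx   # gradient of the quotient
        x = x - step * grad
        x = x / np.linalg.norm(x)                    # the quotient is scale invariant
    return rayleigh_quotient(x, A, B), x


A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])
lam, x = min_generalized_eigenvalue(A, B, x0=np.array([1.0, 1.0]))
reference = np.min(np.linalg.eigvals(np.linalg.solve(B, A)).real)
print(lam, reference)
```

For this pencil the characteristic equation is $2λ^2 - 8λ + 5 = 0$, so the smaller root $2 - \sqrt{6}/2$ is the value both printed numbers should approach.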

preprint2022arXiv

Quantum relaxed row and column iteration methods based on block-encoding

Iterative methods are commonly used in solving linear systems of equations. We present quantum algorithms for the relaxed row and column iteration methods by constructing unitary matrices in the iterative processes, which generalize row and column iteration methods to solve linear systems on a quantum computer. Compared with the conventional row and column iteration methods, the convergence accelerates when appropriate parameters are chosen. Once the quantum states are efficiently prepared, the complexity of our relaxed row and column methods is improved exponentially and is linear in the number of iteration steps. In addition, phase estimations and Hamiltonian simulations are not required in these algorithms.
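
As classical background for the abstract above, the sketch below shows relaxed row and column iterations for $Ax = b$, assuming these refer to the Kaczmarz-type row projection and the coordinate-descent column update, each with a relaxation parameter `omega`. This is a classical illustration only and says nothing about the block-encoding construction; the function names and the cyclic sweep order are assumptions.

```python
import numpy as np


def relaxed_row_iteration(A, b, omega=1.0, sweeps=200):
    """Relaxed Kaczmarz-type method: project onto one row equation at a time.

    Update: x <- x + omega * (b_i - a_i . x) / ||a_i||^2 * a_i, with 0 < omega < 2.
    """
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            a_i = A[i]
            x = x + omega * (b[i] - a_i @ x) / (a_i @ a_i) * a_i
    return x


def relaxed_column_iteration(A, b, omega=1.0, sweeps=200):
    """Relaxed coordinate descent on ||Ax - b||^2: update one unknown at a time.

    Update: x_j <- x_j + omega * a_j . (b - A x) / ||a_j||^2, with a_j the j-th column.
    """
    x = np.zeros(A.shape[1])
    r = b - A @ x                      # running residual
    for _ in range(sweeps):
        for j in range(A.shape[1]):
            a_j = A[:, j]
            delta = omega * (a_j @ r) / (a_j @ a_j)
            x[j] += delta
            r = r - delta * a_j        # keep the residual consistent with x
    return x


A = np.array([[4.0, 1.0], [2.0, 3.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true
print(relaxed_row_iteration(A, b, omega=1.2))
print(relaxed_column_iteration(A, b, omega=1.2))
```

Setting `omega=1.0` recovers the unrelaxed row and column methods, which makes it easy to compare convergence speed for different relaxation parameters.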

preprint2022arXiv

Realizing two-qubit gates through mode engineering on a trapped-ion quantum computer

Two-qubit gates are a fundamental constituent of a quantum computer and typically its most challenging operation. In a trapped-ion quantum computer, they are typically implemented with laser beams which are modulated in amplitude, frequency, phase, or a combination of these. The required modulation becomes increasingly complex as the quantum computer becomes larger, complicating the control hardware design. Here, we develop a simple method to essentially remove the pulse-modulation complexity by engineering the normal modes of the ion chain. We experimentally demonstrate the required mode engineering in a three-ion chain. This opens up the possibility to trade off complexity between the design of the trapping fields and the optical control system, which will help scale the ion trap quantum computing platform.

preprint2022arXiv

Reflection and Relay Dual-Functional RIS Assisted MU-MISO Systems

Reconfigurable intelligent surfaces (RISs) have been deemed one of the potential components of future wireless communication systems because they can adaptively manipulate the wireless propagation environment with low-cost passive devices. However, due to the double fading effect, the passive RIS can offer sufficient signal strength only when receivers are nearby and located on the same side as the incident signals. Moreover, the RIS cannot provide service coverage for users on its back side. In this paper we introduce a novel reflection and relay dual-functional RIS architecture, which can simultaneously realize passive reflection and active relay functionalities to enhance the coverage. The problem of joint transmit beamforming and dual-functional RIS design is investigated to maximize the achievable sum-rate of a multiuser multiple-input single-output (MU-MISO) system. Based on fractional programming (FP) theory and the majorization-minimization (MM) technique, we propose an efficient iterative transmit beamforming and RIS design algorithm. Simulation results demonstrate the superiority of the introduced dual-functional RIS architecture and the effectiveness of the proposed algorithm.

preprint2022arXiv

The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge

This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker Verification (SASV) Challenge. First, we split the joint task into speaker verification (SV) and spoofing countermeasure (CM), two tasks which are optimized separately. For ASV systems, four state-of-the-art methods are employed. For CM systems, we propose two methods on top of the challenge baseline to further improve the performance, namely Embedding Random Sampling Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we also explore whether SV embedding could help improve CM system performance. We observe a dramatic performance degradation of existing CM systems on the domain-mismatched Voxceleb2 dataset. Third, we compare different fusion strategies, including parallel score fusion and sequential cascaded systems. Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains a 0.21% SASV-EER on the challenge official evaluation set.

preprint2022arXiv

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

The voice conversion task is to modify the speaker identity of continuous speech while preserving the linguistic content. Generally, the naturalness and similarity are two main metrics for evaluating the conversion quality, which has been improved significantly in recent years. This paper presents the HCCL-DKU entry for the fake audio generation task of the 2022 ICASSP ADD challenge. We propose a novel ppg-based voice conversion model that adopts a fully end-to-end structure. Experimental results show that the proposed method outperforms other conversion models, including Tacotron-based and Fastspeech-based models, on conversion quality and spoofing performance against anti-spoofing systems. In addition, we investigate several post-processing methods for better spoofing power. Finally, we achieve second place with a deception success rate of 0.916 in the ADD challenge.

preprint2022arXiv

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

With the development of deep learning, automatic speaker verification has made considerable progress over the past few years. However, to design a lightweight and robust system with limited computational resources is still a challenging problem. Traditionally, a speaker verification system is symmetrical, indicating that the same embedding extraction model is applied for both enrollment and verification in inference. In this paper, we come up with an innovative asymmetric structure, which takes the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification. As a symmetrical system, our proposed ECAPA-TDNNLite model achieves an EER of 3.07% on the Voxceleb1 original test set with only 11.6M FLOPS. Moreover, the asymmetric structure further reduces the EER to 2.31%, without increasing any computational costs during verification.

preprint2022arXiv

Transverse momentum distributions of valence quark in light and heavy vector mesons

We study the leading-twist time-reversal even transverse momentum dependent parton distribution functions (TMDs) of light and heavy vector mesons, i.e., the $ρ$, $J/ψ$ and $Υ$. We employ the leading Fock-state light front wave functions (LF-LFWFs) of $ρ$ and $J/ψ$ from our recent study, and supplement with $Υ$'s LF-LFWFs. These LF-LFWFs are extracted from dynamically solved Bethe-Salpeter wave functions. The vector meson TMDs are then studied with the light front overlap representation at leading Fock-state. All the obtained TMDs are non-vanishing and evolve with current quark mass, in particular the tensor polarized TMDs $f_{1LT}$ and $f_{1TT}$ which undergo a sign flip. The $ρ$ TMDs are compared with other model studies and agreement is found, aside from $f_{1LT}$ and $f_{1TT}$. Finally, the collinear PDFs of vector mesons are studied. The $ρ$'s valence PDFs $f_{1,v}(x)$ and $g_{1L,v}(x)$ are evolved to the scale of 2.4 GeV, with their first three moments compared to lattice QCD prediction. The qualitative behavior of tensor polarized PDF $f_{1LL}(x)$ in $ρ$ at large $x$ is also discussed.

preprint2022arXiv

User Association and Hybrid Beamforming Designs for Cooperative mmWave MIMO Systems

Hybrid analog and digital beamforming has emerged as a key enabling technology for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) communication systems since it can balance the trade-off between system performance and hardware efficiency. Owing to the strong ability of central control, cooperative networks show great potential to enhance the spectral efficiency of mmWave communications. In this paper, we consider cooperative mmWave MIMO systems and propose user association and hybrid beamforming design algorithms for three typical hybrid beamforming architectures. The central processing unit (CPU) of the cooperative networks first matches the service pairs of base stations (BSs) and users. Then, an iterative hybrid beamforming design algorithm is proposed to maximize the weighted achievable sum-rate performance of the mmWave MIMO system with fully connected hybrid beamforming architecture. Moreover, a heuristic analog beamforming design algorithm is introduced for the fixed subarray hybrid beamforming architecture. In an effort to further exploit multiple-antenna diversities, we also consider the dynamic subarray architecture and propose a novel antenna design algorithm for the analog beamforming design. Simulation results illustrate that the proposed hybrid beamforming algorithms achieve a significant performance improvement over other existing approaches and that the dynamic subarray architecture offers great advantages in improving the energy efficiency (EE) performance.

preprint2022arXiv

Wiggle: Physical Challenge-Response Verification of Vehicle Platooning

Autonomous vehicle platooning promises many benefits such as fuel efficiency, road safety, reduced traffic congestion, and passenger comfort. Platooning vehicles travel in a single file, at close distance, and at the same velocity. The platoon formation is autonomously maintained by a Cooperative Adaptive Cruise Control (CACC) system which relies on sensory data and vehicle-to-vehicle (V2V) communications. In fact, V2V messages play a critical role in shortening the platooning distance while maintaining safety. Whereas V2V message integrity and source authentication can be verified via cryptographic methods, establishing the truthfulness of the message contents is a much harder task. This work establishes a physical access control mechanism to restrict V2V messages to platooning members. Specifically, we aim at tying the digital identity of a candidate requesting to join a platoon to its physical trajectory relative to the platoon. We propose the Wiggle protocol that employs a physical challenge-response exchange to prove that a candidate requesting to be admitted into a platoon actually follows it. The protocol name is inspired by the random longitudinal movements that the candidate is challenged to execute. Wiggle prevents any remote adversary from joining the platoon and injecting fake CACC messages. Compared to prior works, Wiggle is resistant to pre-recording attacks and can verify that the candidate is directly behind the verifier in the same lane.

preprint2021arXiv

A Feature Fusion-Net Using Deep Spatial Context Encoder and Nonstationary Joint Statistical Model for High Resolution SAR Image Classification

Convolutional neural networks (CNNs) have been applied to learn spatial features for high-resolution (HR) synthetic aperture radar (SAR) image classification. However, there has been little work on integrating the unique statistical distributions of SAR images, which can reveal physical properties of terrain objects, into CNNs in a supervised feature learning framework. To address this problem, a novel end-to-end supervised classification method is proposed for HR SAR images by considering both spatial context and statistical features. First, to extract more effective spatial features from SAR images, a new deep spatial context encoder network (DSCEN) is proposed, which is a lightweight structure and can be effectively trained with a small number of samples. Meanwhile, to enhance the diversity of statistics, the nonstationary joint statistical model (NS-JSM) is adopted to form the global statistical features. Specifically, SAR images are transformed into the Gabor wavelet domain and the produced multi-subband magnitudes and phases are modeled by the log-normal and uniform distributions. The covariance matrix is further utilized to capture the inter-scale and intra-scale nonstationary correlation between the statistical subbands and make the joint statistical features more compact and distinguishable. Considering complementary advantages, a feature fusion network (Fusion-Net) based on group compression and smooth normalization is constructed to embed the statistical features into the spatial features and optimize the fusion feature representation. As a result, our model can learn the discriminative features and improve the final classification performance. Experiments on four HR SAR images validate the superiority of the proposed method over other related algorithms.

preprint2021arXiv

Analysis of Heterogeneous Structures of Non-separated Scales on Curved Bridge Nodes

Numerically predicting the performance of heterogeneous structures without scale separation represents a significant challenge to meet the critical requirements on computational scalability and efficiency -- adopting a mesh fine enough to fully account for the small-scale heterogeneities leads to prohibitive computational costs while simply ignoring these fine heterogeneities tends to drastically overestimate the structure's rigidity. This study proposes an approach to construct new material-aware shape (basis) functions per element on a coarse discretization of the structure with respect to the curved bridge nodes (CBNs) defined along the elements' boundaries. Instead of formulating their derivation by solving a nonlinear optimization problem, the shape functions are constructed by building a map from the CBNs to the interior nodes and are ultimately presented in an explicit matrix form as a product of a Bézier interpolation transformation and a boundary-interior transformation. The CBN shape functions accommodate more flexibility in closely capturing the coarse element's heterogeneity, overcome the important and challenging issues of inter-element stiffness and displacement discontinuity across the interface between coarse elements, and improve the analysis accuracy by orders of magnitude; they also meet the basic geometric properties of shape functions that avoid unphysical analysis results. Extensive numerical examples, including a 3D industrial example of billions of degrees of freedom, are also tested to demonstrate the approach's performance in comparison with results obtained from classical approaches.

preprint2021arXiv

Channel Estimation for Practical IRS-Assisted OFDM Systems

Intelligent reflecting surface (IRS), composed of a large number of hardware-efficient passive elements, is deemed a potential technique for future wireless communications since it can adaptively enhance the propagation environment. In order to effectively utilize IRS to achieve promising beamforming gains, the problem of channel state information (CSI) acquisition needs to be carefully considered. However, most recent works assume an ideal IRS, i.e., each reflecting element has constant amplitude, variable phase shifts, as well as the same response for signals with different frequencies, which will cause severe estimation error due to the mismatch between the ideal IRS and the practical one. In this paper, we study channel estimation in practical IRS-aided orthogonal frequency division multiplexing (OFDM) systems with discrete phase shifts. Unlike prior works, which assume that the IRS has an ideal reflection model, we perform channel estimation by considering the amplitude-phase shift-frequency relationship for the response of a practical IRS. Aiming at minimizing the normalized mean square error (NMSE) of the estimated channel, a novel IRS time-varying reflection pattern is designed by leveraging the alternating optimization (AO) algorithm for the case of using low-resolution phase shifters. Moreover, for the high-resolution IRS cases, we provide another practical reflection pattern scheme to further reduce the complexity. Simulation results demonstrate the necessity of considering a practical IRS model for channel estimation and the effectiveness of our proposed channel estimation methods.

preprint2021arXiv

Classical-to-quantum transition in multimode nonlinear systems with strong photon-photon coupling

With advanced micro- and nano-photonic structures, the vacuum photon-photon coupling rate is anticipated to approach the intrinsic loss rate and lead to unconventional quantum effects. Here, we investigate the classical-to-quantum transition of such photonic nonlinear systems using the quantum cluster-expansion method, which addresses the computational challenge in tracking large photon number states of the fundamental and harmonic optical fields involved in the second harmonic generation process. Compared to the mean-field approximation used in the weak-coupling limit, the quantum cluster-expansion method solves multimode dynamics efficiently and reveals the quantum behaviors of optical parametric oscillations around the threshold. This work presents a universal tool to study quantum dynamics of multimode systems and explore nonlinear photonic devices for continuous-variable quantum information processing.

preprint2021arXiv

Don't Change Me! User-Controllable Selective Paraphrase Generation

In the paraphrase generation task, source sentences often contain phrases that should not be altered. Which phrases, however, can be context dependent and can vary by application. Our solution to this challenge is to provide the user with explicit tags that can be placed around any arbitrary segment of text to mean "don't change me!" when generating a paraphrase; the model learns to explicitly copy these phrases to the output. The contribution of this work is a novel data generation technique using distant supervision that allows us to start with a pretrained sequence-to-sequence model and fine-tune a paraphrase generator that exhibits this behavior, allowing user-controllable paraphrase generation. Additionally, we modify the loss during fine-tuning to explicitly encourage diversity in model output. Our technique is language agnostic, and we report experiments in English and Chinese.
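The tagging behavior described above can be made concrete with a small Python check. This is a sketch under stated assumptions: the `<keep>...</keep>` tag syntax and both function names are hypothetical, chosen only for illustration, and the check verifies a paraphrase after the fact rather than reproducing the fine-tuned model.

```python
import re

# Hypothetical tag syntax; the abstract does not specify the actual tags.
KEEP_TAG = re.compile(r"<keep>(.*?)</keep>")


def protected_spans(tagged_source):
    """Return the text segments the user marked as not to be changed."""
    return KEEP_TAG.findall(tagged_source)


def preserves_protected(tagged_source, paraphrase):
    """True if every protected segment appears verbatim in the paraphrase."""
    return all(span in paraphrase for span in protected_spans(tagged_source))


source = "Please email <keep>Dr. Chen</keep> before <keep>5 pm</keep>."
print(protected_spans(source))
print(preserves_protected(source, "Contact Dr. Chen by email prior to 5 pm."))
print(preserves_protected(source, "Contact the doctor by 5 pm."))
```

A check like this could serve as an evaluation filter for generated paraphrases, since it only needs the tagged source and the model output.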

preprint2021arXiv

Dual-Functional Radar-Communication Waveform Design: A Symbol-Level Precoding Approach

Dual-functional radar-communication (DFRC) systems can simultaneously perform both radar and communication functionalities using the same hardware platform and spectrum resource. In this paper, we consider multi-input multi-output (MIMO) DFRC systems and focus on transmit beamforming designs to provide both radar sensing and multi-user communications. Unlike conventional block-level precoding techniques, we propose to use the recently emerged symbol-level precoding approach in DFRC systems, which provides additional degrees of freedom (DoFs) that guarantee preferable instantaneous transmit beampatterns for radar sensing and achieve better communication performance. In particular, the squared error between the designed and desired beampatterns is minimized subject to the quality-of-service (QoS) requirements of the communication users and the constant-modulus power constraint. Two efficient algorithms are developed to solve this non-convex problem on both the Euclidean and Riemannian spaces. The first algorithm employs penalty dual decomposition (PDD), majorization-minimization (MM), and block coordinate descent (BCD) methods to convert the original optimization problem into two solvable sub-problems, and iteratively solves them using efficient algorithms. The second algorithm provides a much faster solution at the price of a slight performance loss, first transforming the original problem into Riemannian space, and then utilizing the augmented Lagrangian method (ALM) to obtain an unconstrained problem that is subsequently solved via a Riemannian Broyden-Fletcher-Goldfarb-Shanno (RBFGS) algorithm. Extensive simulations verify the distinct advantages of the proposed symbol-level precoding designs in both radar sensing and multi-user communications.

preprint2021arXiv

First saturation correction in high energy proton-nucleus collisions: Part III. Ensemble averaging

In high energy proton-nucleus collisions, the gluon saturation effects from the nucleus are fully incorporated into the light-like Wilson lines. The gluon saturation effects from the proton, which are anticipated to be important either in the extreme high energy limit or towards the dense-dense (nucleus-nucleus) collision regimes, have been studied perturbatively within the Color Glass Condensate effective theory in previous papers of this series. A configuration-by-configuration expression for the single inclusive semi-hard gluon production including the first saturation correction was obtained. In this paper, we perform ensemble averaging in the McLerran-Venugopalan model and the Dipole Approximation. We find that, in the saturation correction, the effects of the initial state interactions are negligible while the final state interactions play the most important role and give a positive-valued contribution to the semi-hard gluon spectrum. Furthermore, we show that the single gluon spectrum scales approximately as $1/k_{\perp}^{4}$ at small $k_{\perp}$, suggesting that a resummation of higher order saturation corrections is required to regulate the infrared region of the gluon spectrum.

preprint2021arXiv

Intelligent reflecting surface assisted multi-cell multi-band wireless networks

Intelligent reflecting surface (IRS) is deemed a promising and revolutionary technology for future wireless communication systems owing to its capability to intelligently change the propagation environment and introduce a new dimension into wireless communication optimization. Most existing studies on IRS are based on an ideal reflection model. However, it is difficult to implement an IRS which can simultaneously realize any adjustable phase shift for signals with different frequencies. Therefore, the practical phase shift model, which can describe the difference of IRS phase shift responses for signals with different frequencies, should be utilized in the IRS optimization for wideband and multi-band systems. In this paper, we consider an IRS-assisted multi-cell multi-band system, in which different base stations (BSs) operate at different frequency bands. We aim to jointly design the transmit beamforming of the BSs and the reflection beamforming of the IRS to minimize the total transmit power subject to signal-to-interference-plus-noise ratio (SINR) constraints of individual users and the practical IRS reflection model. With the aid of the practical phase shift model, the influence between signals with different frequencies is taken into account during the design of the IRS. Simulation results illustrate the importance of considering the practical communication scenario in IRS designs and validate the effectiveness of our proposed algorithm.

preprint2021arXiv

Intelligent Reflecting Surface based Passive Information Transmission: A Symbol-Level Precoding Approach

Intelligent reflecting surfaces (IRS) have been proposed as a revolutionary technology owing to their capability of adaptively reconfiguring the propagation environment in a cost-effective and hardware-efficient fashion. While the application of IRS as a passive reflector to enhance the performance of wireless communications has been widely investigated in the literature, using IRS as a passive transmitter has recently emerged as a new concept and is attracting steadily growing interest. In this paper, we propose two novel IRS-based passive information transmission systems using advanced symbol-level precoding. One is a standalone passive information transmission system, where the IRS operates as a passive transmitter serving multiple receivers by adjusting its elements to reflect unmodulated carrier signals. The other is a joint passive reflection and information transmission system, where the IRS not only enhances transmissions for multiple primary information receivers (PIRs) by passive reflection, but also simultaneously delivers additional information to a secondary information receiver (SIR) by embedding its information into the primary signals at the symbol level. Two typical optimization problems, i.e., power minimization and quality-of-service (QoS) balancing, are investigated for the proposed IRS-based passive information transmission systems. Simulation results demonstrate the feasibility of IRS-based passive information transmission and the effectiveness of our proposed algorithms, as compared to other benchmark schemes.

preprint2021arXiv

Intelligent reflecting surface enhanced wideband MIMO-OFDM communications: From practical model to reflection optimization

Intelligent reflecting surface (IRS) is envisioned as a revolutionary technology for future wireless communication systems since it can intelligently change the radio environment and integrate it into wireless communication optimization. However, most existing works adopted an ideal IRS reflection model, which is impractical and can cause significant performance degradation in realistic wideband systems. To address this issue, we first study the dual phase- and amplitude-squint effect of reflected signals and present a simplified practical IRS reflection model for wideband signals. Then, an IRS enhanced wideband multiuser multi-input single-output orthogonal frequency division multiplexing (MU-MISO-OFDM) system is investigated. We aim to jointly design the transmit beamformer and IRS reflection for the case of using both continuous and discrete phase shifters to maximize the average sum-rate over all subcarriers. By exploiting the relationship between sum-rate maximization and mean square error (MSE) minimization, the original problem is equivalently transformed into a multi-block/variable problem, which can be efficiently solved by the block coordinate descent (BCD) method. Complexity and convergence for both cases are analyzed or illustrated. Simulation results demonstrate that the proposed algorithm can offer significant average sum-rate enhancement compared to that achieved using the ideal IRS reflection model, which confirms the importance of using the practical model for the design of wideband systems.

preprint2021arXiv

MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning

Graph Neural Networks (GNNs) have recently attracted great attention and achieved significant progress in graph-level applications. In this paper, we propose a framework for graph neural networks with multiresolution Haar-like wavelets, or MathNet, with interrelated convolution and pooling strategies. The underlying method takes graphs in different structures as input and assembles consistent graph representations for readout layers, which then accomplish label prediction. To achieve this, the multiresolution graph representations are first constructed and fed into graph convolutional layers for processing. The hierarchical graph pooling layers are then involved to downsample graph resolution while simultaneously removing redundancy within graph signals. The whole workflow could be formed with a multi-level graph analysis, which not only helps embed the intrinsic topological information of each graph into the GNN, but also supports fast computation of forward and adjoint graph transforms. We show by extensive experiments that the proposed framework obtains notable accuracy gains on graph classification and regression tasks with performance stability. The proposed MathNet outperforms various existing GNN models, especially on big data sets.

preprint2021arXiv

Numerical convergence of pre-initial conditions on dark matter halo properties

Generating pre-initial conditions (or particle loads) is the very first step to set up a cosmological N-body simulation. In this work, we revisit the numerical convergence of pre-initial conditions on dark matter halo properties using a set of simulations which differ only in their initial particle loads, i.e. grid, glass, and the newly introduced capacity constrained Voronoi tessellation (CCVT). We find that the median halo properties agree fairly well (i.e. within a convergence level of a few per cent) among simulations running from different initial loads. We also notice that for some individual haloes cross-matched among different simulations, the relative difference of their properties can sometimes be several tens of per cent. By looking at the evolution history of these poorly converged haloes, we find that they are usually merging haloes or haloes that have experienced recent merger events, and their merging processes in different simulations are out of sync, making the convergence of halo properties temporarily poor. We show that, compared to the simulation starting with an anisotropic grid load, the simulation with an isotropic CCVT load converges slightly better to the simulation with a glass load, which is also isotropic. Among simulations with different pre-initial conditions, haloes in higher density environments tend to have their properties converged slightly better. Our results confirm that CCVT loads behave as well as the widely used grid and glass loads at small scales, and for the first time we quantify the convergence of two independent isotropic particle loads (i.e. glass and CCVT) on halo properties.

preprint2021arXiv

Organization of cooperation in fractal structures

It is known that the small-world structure constitutes sufficient conditions to sustain cooperation and thus enhances cooperation. On the contrary, a network with a very long average distance is usually thought of as suppressing the emergence of cooperation. In this paper we show that the fractal structure, of which the average distance is very long, does not always play a negative role in the organization of cooperation. Compared to regular networks, the fractal structure might even facilitate the emergence of cooperation. This mainly depends on the existence of locally compact clusters. The sparse interconnections between these clusters construct an asymmetric barrier that the defection strategy can almost never cross, whereas the cooperation strategy has a non-negligible chance of crossing it. More generally, the network need not be a standard fractal, as long as such structures exist. In turn, when this typical structure is absent, the fractal structure will also suppress the emergence of cooperation, as in the fractal configuration obtained by diluting a random tree-like network. Our findings also clarify some contradictions in previous studies, and suggest that both removing links from and inserting links into a regular network can enhance cooperation.

preprint2021arXiv

Quantum microwave photonics

By harnessing quantum superposition and entanglement, remarkable progress has sprouted over the past three decades from different areas of research in communication, computation, and simulation. To further improve the processing ability of microwave photonics, we demonstrate a quantum microwave photonic processing system using a low-jitter superconducting nanowire single photon detector (SNSPD) and a time-correlated single-photon counting (TCSPC) module. This method uniquely combines extreme optical sensitivity, down to the single-photon level (below -100 dBm), and a wide processing bandwidth, twice the transmission bandwidth of the cable. Moreover, benefiting from the trigger, the system can selectively process the desired RF signal and attenuate intense noise and undesired RF components, even when their power is 15 dB greater than the desired signal power. Using this method, we show microwave phase shifting and frequency filtering for the desired RF signal at the single-photon level. Besides its applications in space and underwater communications and in the testing and qualification of pre-packaged photonic modulators and detectors, this RF signal processing capability at the single-photon level can lead to significant development in high-speed quantum processing methods.

preprint2021arXiv

Superconnections and An Intrinsic Gauss-Bonnet-Chern Formula for Finsler Manifolds

In this paper, we establish an intrinsic Gauss-Bonnet-Chern formula for Finsler manifolds by using the Mathai-Quillen superconnection formalism, in which no extra vector field is involved. Furthermore, we prove a more general Lichnerowicz formula in this direction through a geometric localization procedure.

preprint2021arXiv

The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems

The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two different research problems in a unified setup: joint wake-up word detection with speaker verification on close-talking single microphone data and far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate the real-life application scenario, the enrollment utterances are recorded from a close-talking cell phone only, while the test utterances are recorded from both the close-talking cell phone and the far-field microphone arrays. This paper introduces our challenge setup and the released database as well as the evaluation metrics. In addition, we present a joint end-to-end neural network baseline system trained with the proposed database for speaker-dependent wake-up word detection. Results show that the cost calculated from the miss rate and the false alarm rate can reach 0.37 in the close-talking single microphone task and 0.31 in the far-field microphone array task. The official website and the open-source baseline system have been released.

preprint2021arXiv

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

In this paper, we present the system submitted by the DKU-Duke-Lenovo team to the third DIHARD Speech Diarization Challenge. Our system consists of several modules: voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, and agglomerative hierarchical clustering. In addition, target speaker VAD (TSVAD) is used for the phone call data to further improve the performance. Our final submitted system achieves a DER of 15.43% for the core evaluation set and 13.39% for the full evaluation set on task 1, and a DER of 21.63% for the core evaluation set and 18.90% for the full evaluation set on task 2.
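For readers unfamiliar with the DER figures quoted above, the following Python sketch shows the standard definition of diarization error rate as the sum of missed speech, false alarm speech, and speaker confusion time divided by total reference speech time. The function name and the toy durations are assumptions for illustration; challenge scoring additionally involves an optimal speaker mapping and evaluation-specific rules that are not modeled here.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (for example, seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Toy durations in seconds, made up for illustration only.
der = diarization_error_rate(
    missed=4.0, false_alarm=2.5, confusion=3.5, total_speech=100.0
)
print(f"DER = {der:.2%}")
```

Because false alarm time is counted against the reference speech duration, DER can exceed 100% for a system that produces a large amount of spurious speech.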

preprint2021arXiv

Tight upper bound on the quantum value of Svetlichny operators under local filtering and hidden genuine nonlocality

Nonlocal quantum correlations among quantum subsystems play essential roles in quantum science. The violation of the Svetlichny inequality provides a sufficient condition for genuine tripartite nonlocality. We provide tight upper bounds on the maximal quantum value of the Svetlichny operators under local filtering operations, and present a qualitative analytical analysis of the hidden genuine nonlocality for three-qubit systems. We investigate in detail two classes of three-qubit states whose hidden genuine nonlocalities can be revealed by local filtering.

preprint2021arXiv

Two-dimensional antiferroelectric tunnel junction

Ferroelectric tunnel junctions (FTJs), which consist of two metal electrodes separated by a thin ferroelectric barrier, have recently aroused significant interest for technological applications as nanoscale resistive switching devices. So far, most existing FTJs have been based on perovskite-oxide barrier layers. The recent discovery of two-dimensional (2D) van der Waals ferroelectric materials opens a new route to realize tunnel junctions with new functionalities and nm-scale dimensions. Due to the weak coupling between the atomic layers in these materials, the relative dipole alignment between them can be controlled by an applied voltage. This allows transitions between ferroelectric and antiferroelectric orderings, resulting in significant changes of the electronic structure. Here, we propose to realize 2D antiferroelectric tunnel junctions (AFTJs), which exploit this new functionality, based on bilayer In$_2$X$_3$ (X = S, Se, Te) barriers and different 2D electrodes. Using first-principles density functional theory calculations, we demonstrate that the In$_2$X$_3$ bilayers exhibit stable ferroelectric and antiferroelectric states separated by sizable energy barriers, thus supporting non-volatile switching between these states. Using quantum-mechanical modeling of the electronic transport, we explore in-plane and out-of-plane tunneling across the In$_2$S$_3$ van der Waals bilayers, and predict giant tunneling electroresistance (TER) effects and multiple non-volatile resistance states driven by ferroelectric-antiferroelectric order transitions. Our proposal opens a new route to realize nanoscale memory devices with ultrahigh storage density using 2D AFTJs.

209 published item(s)

HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems

A First-Principle Approach to X-ray Active Optics: Design and Verification

A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems

BiSinger: Bilingual Singing Voice Synthesis

Cooperative Cell-Free ISAC Networks: Joint BS Mode Selection and Beamforming Design

Quadrotor Stabilization with Safety Guarantees: A Universal Formula Approach

A Theory of Human-Like Few-Shot Learning

Deep Learning-Based UAV Aerial Triangulation without Image Control Points

Discriminator-Guided Model-Based Offline Imitation Learning

Self-supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond

Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems

A proof-of-principle demonstration of quantum microwave photonics

A Schur type lemma for the Mean Berwald curvature in Finsler geometry

About One-point Statistics of the Ratio of Two Fourier-transformed Cosmic Fields and an Application

CaFT: Clustering and Filter on Tokens of Transformer for Weakly Supervised Object Localization

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge

Deep Reinforcement Learning based Joint Active and Passive Beamforming Design for RIS-Assisted MISO Systems

Embedding Graphs on Grassmann Manifold

Explainable COVID-19 Infections Identification and Delineation Using Calibrated Pseudo Labels

Fully-integrated multipurpose microwave frequency identification system on a single chip

Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words

Integrated Sensing and Communication with Reconfigurable Intelligent Surfaces: Opportunities, Applications, and Future Directions

Invertible Voice Conversion

IRS-assisted Multi-cell Multi-band Systems: Practical Reflection Model and Joint Beamforming Design

Joint Beamforming Design for Intelligent Omni Surface Assisted Wireless Communication Systems

Joint Beamforming Design for RIS-Assisted Integrated Sensing and Communication Systems

Joint Beamforming Design in DFRC Systems for Wideband Sensing and OFDM Communications

Low-Latency Online Speaker Diarization with Graph-Based Label Generation

MealRec: A Meal Recommendation Dataset

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Multiple and Asymmetric Scalings in Explosive Percolation

Non-Cooperative Resource Management for Intelligent Reflecting Surface Aided Networks

On isotropic Berwald scalar curvature

Online Target Speaker Voice Activity Detection for Speaker Diarization

Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

Probing gluon Bose correlations in DIS

Quantum algorithms for the generalized eigenvalue problem

Quantum relaxed row and column iteration methods based on block-encoding

Realizing two-qubit gates through mode engineering on a trapped-ion quantum computer

Reflection and Relay Dual-Functional RIS Assisted MU-MISO Systems

The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

Transverse momentum distributions of valence quark in light and heavy vector mesons

User Association and Hybrid Beamforming Designs for Cooperative mmWave MIMO Systems

Wiggle: Physical Challenge-Response Verification of Vehicle Platooning

A Feature Fusion-Net Using Deep Spatial Context Encoder and Nonstationary Joint Statistical Model for High Resolution SAR Image Classification

Analysis of Heterogeneous Structures of Non-separated Scales on Curved Bridge Nodes

Channel Estimation for Practical IRS-Assisted OFDM Systems

Classical-to-quantum transition in multimode nonlinear systems with strong photon-photon coupling

Don't Change Me! User-Controllable Selective Paraphrase Generation

Dual-Functional Radar-Communication Waveform Design: A Symbol-Level Precoding Approach

First saturation correction in high energy proton-nucleus collisions: Part III. Ensemble averaging

Intelligent reflecting surface assisted multi-cell multi-band wireless networks

Intelligent Reflecting Surface based Passive Information Transmission: A Symbol-Level Precoding Approach

Intelligent reflecting surface enhanced wideband MIMO-OFDM communications: From practical model to reflection optimization

MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning

Numerical convergence of pre-initial conditions on dark matter halo properties

Organization of cooperation in fractal structures

Quantum microwave photonics

Superconnections and An Intrinsic Gauss-Bonnet-Chern Formula for Finsler Manifolds

The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

Tight upper bound on the quantum value of Svetlichny operators under local filtering and hidden genuine nonlocality

Two-dimensional antiferroelectric tunnel junction

A critical survey on the kinetic assays of DNA polymerase fidelity from a new theoretical perspective

Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection

Atss-Net: Target Speaker Separation via Attention-based Neural Network