Source author record

Qi Wu

Qi Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

82 works

29 topics

4 close collaborators

preprint 2026 arXiv

A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems

Mixture-of-Experts (MoE) models facilitate edge deployment by decoupling model capacity from active computation, yet their large memory footprint drives the need for GPU systems with near-data processing (NDP) capabilities that offload experts to dedicated processing units. However, deploying MoE models on such edge-based GPU-NDP systems faces three critical challenges: 1) severe load imbalance across NDP units due to non-uniform expert selection and expert parallelism, 2) insufficient GPU utilization during expert computation within NDP units, and 3) extensive data pre-profiling necessitated by unpredictable expert activation patterns for pre-fetching. To address these challenges, this paper proposes an efficient inference framework featuring three key optimizations. First, the underexplored tensor parallelism in MoE inference is exploited to partition and compute large expert parameters across multiple NDP units simultaneously towards edge low-batch scenarios. Second, a load-balancing-aware scheduling algorithm distributes expert computations across NDP units and GPU to maximize resource utilization. Third, a dataset-free pre-fetching strategy proactively loads frequently accessed experts to minimize activation delays. Experimental results show that our framework enables GPU-NDP systems to achieve 2.41x on average and up to 2.56x speedup in end-to-end latency compared to state-of-the-art approaches, significantly enhancing MoE inference efficiency in resource-constrained environments.
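To make the load-balancing idea concrete, here is a minimal sketch of a generic greedy heuristic (longest-processing-time first). It is not the paper's scheduling algorithm, which the abstract does not specify. The expert names, token counts, and unit names are hypothetical, and treating the GPU as just another unit is a simplifying assumption.

```python
import heapq
from typing import Dict, List


def assign_experts(expert_load: Dict[str, int], units: List[str]) -> Dict[str, List[str]]:
    """Greedy LPT placement: heaviest expert first, onto the least-loaded unit."""
    # Min-heap of (accumulated load, unit name); ties break on the unit name.
    heap = [(0, unit) for unit in units]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {unit: [] for unit in units}
    for expert, load in sorted(expert_load.items(), key=lambda kv: kv[1], reverse=True):
        total, unit = heapq.heappop(heap)
        placement[unit].append(expert)
        heapq.heappush(heap, (total + load, unit))
    return placement


# Hypothetical per-expert token counts for one decoding step.
loads = {"e0": 9, "e1": 7, "e2": 6, "e3": 5, "e4": 4, "e5": 1}
placement = assign_experts(loads, ["ndp0", "ndp1", "gpu"])
totals = {unit: sum(loads[e] for e in experts) for unit, experts in placement.items()}
print(totals)
```

With skewed expert popularity, a placement like this keeps the busiest unit close to the average load. A real scheduler would also have to account for differing unit throughput and data movement costs.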

preprint 2026 arXiv

CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device

Edge deployment of low-batch large language models (LLMs) faces critical memory bandwidth bottlenecks when executing memory-intensive general matrix-vector multiplication (GEMV) operations. While digital processing-in-memory (PIM) architectures promise to accelerate GEMV operations, existing PIM-equipped edge devices still suffer from three key limitations: limited bandwidth improvement, component under-utilization in mixed workloads, and low compute capacity of computing units (CUs). In this paper, we propose CD-PIM to address these challenges through four key innovations. First, we introduce a high-bandwidth compute-efficient mode (HBCEM) that enhances bandwidth by dividing each bank into four pseudo-banks through segmented global bitlines. Second, we propose a low-batch interleaving mode (LBIM) to improve component utilization by overlapping GEMV operations with GEMM operations. Third, we design a compute-efficient CU that performs enhanced GEMV operations in a pipelined manner by serially feeding weight data into the computing core. Fourth, we adopt a column-wise mapping for the key-cache matrix and a row-wise mapping for the value-cache matrix, which fully utilizes CU resources. Our evaluation shows that, compared to a GPU-only baseline and state-of-the-art PIM designs, CD-PIM achieves average speedups of 11.42x and 4.25x, respectively, within a single batch in HBCEM mode. Moreover, for low batch sizes, CD-PIM achieves an average speedup of 1.12x in LBIM compared to HBCEM.

preprint 2026 arXiv

Dipion transitions from $X(3872)$ to $χ_{cJ}\ (J=0,1,2)$

In this work, we investigate the dipion transition processes $X(3872)\to ππχ_{cJ}\ (J=0,1,2)$ within the framework of heavy hadron chiral perturbation theory, treating $X(3872)$ as a molecular state composed of $D\bar{D}^*$ + H.c. components. By analyzing the box and triangle loop diagrams with the nonrelativistic effective field theory power-counting rule, we demonstrate that box diagrams dominate these dipion transition processes. Branching ratios are calculated as functions of the mixing angle $θ$, which parametrizes the neutral and charged meson compositions of the $X(3872)$. Our results indicate that the branching fractions for $X(3872)\to ππχ_{c0}$, $X(3872)\to ππχ_{c1}$, and $X(3872)\to ππχ_{c2}$ are of the orders of $10^{-4}$, $10^{-3}$, and $10^{-5}$, respectively. We also predict the ratios ${\mathcal{B}[X(3872)\rightarrow ππχ_{c0/2}]}/{\mathcal{B}[X(3872)\rightarrow ππχ_{c1}]}$ and ${\mathcal{B}[X(3872)\rightarrow π^+π^-χ_{cJ}]}/{\mathcal{B}[X(3872)\rightarrow π^0π^0χ_{cJ}]}$. The latter deviates from isospin-symmetry expectations, revealing various degrees of isospin violation. By studying the $π^+π^-$ and $π^+χ_{cJ}$ invariant mass spectra, we find a double-bump structure in the $π^+π^-$ invariant mass distribution of the process $X(3872)\to π^+π^-χ_{c1}$ and in the $π^+χ_{c0}$ invariant mass distribution of the process $X(3872)\to π^+π^-χ_{c0}$, which could be tested by future experimental measurements.

preprint 2026 arXiv

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation

Although learning-based vision-and-language navigation (VLN) agents can learn spatial knowledge implicitly from large-scale training data, zero-shot VLN agents lack this process and rely primarily on local observations for navigation, which leads to inefficient exploration and a significant performance gap. To address this problem, we consider a zero-shot VLN setting in which agents are allowed to fully explore the environment before task execution. We then construct the Spatial Scene Graph (SSG) to explicitly capture global spatial structure and semantics in the explored environment. Based on the SSG, we introduce SpatialNav, a zero-shot VLN agent that integrates an agent-centric spatial map, a compass-aligned visual representation, and a remote object localization strategy for efficient navigation. Comprehensive experiments in both discrete and continuous environments demonstrate that SpatialNav significantly outperforms existing zero-shot agents and clearly narrows the gap with state-of-the-art learning-based methods. These results highlight the importance of global spatial representations for generalizable navigation.

preprint 2026 arXiv

VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision-language tasks. However, their performance as embodied agents, which requires multi-round dialogue spatial reasoning and sequential action prediction, needs further exploration. Our work investigates this potential in the context of Vision-and-Language Navigation (VLN) by introducing a unified and extensible evaluation framework to probe MLLMs as zero-shot agents by bridging traditional navigation datasets into a standardized benchmark, named VLN-MME. We simplify the evaluation with a highly modular and accessible design. This flexibility streamlines experiments, enabling structured comparisons and component-level ablations across diverse MLLM architectures, agent designs, and navigation tasks. Crucially, enabled by our framework, we observe that enhancing our baseline agent with Chain-of-Thought (CoT) reasoning and self-reflection leads to an unexpected performance decrease. This suggests MLLMs exhibit poor context awareness in embodied navigation tasks; although they can follow instructions and structure their output, their 3D spatial reasoning fidelity is low. VLN-MME lays the groundwork for systematic evaluation of general-purpose MLLMs in embodied navigation settings and reveals limitations in their sequential decision-making capabilities. We believe these findings offer crucial guidance for MLLM post-training as embodied agents.

preprint 2026 arXiv

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified multimodal ingress pipeline that integrates UI states, real-world visual contexts, and speech inputs, leveraging a temporal alignment module to decompose raw data into structured multimodal intent representations. Omni Memory leverages multimodal memory optimization to enhance personalized intelligence by integrating runtime working memory for task continuity with long-term personal memory distilled from local data, enabling highly context-aware and personalized interactions. Finally, Omni Action employs a hybrid grounding strategy that combines structural XML metadata with visual perception for robust interaction. Through Behavior Cloning and Trajectory Replay, the system captures user navigation as reusable skills, enabling precise direct-access execution. Demonstrations across diverse scenarios show that X-OmniClaw effectively enhances interaction efficiency and task reliability, providing a practical architectural blueprint for the next generation of mobile-native personal assistants.

preprint 2024 arXiv

Hidden charmonium decays of spin-2 partner of $X(3872)$

The Belle collaboration recently reported a promising candidate for the spin-2 $D^*\bar{D}^*$ partner of the $X(3872)$, called the $X_2$ for short, having a mass of $(4014.3 \pm 4.0 \pm 1.5)~\mathrm{MeV}$ and a width of $(4 \pm 11 \pm 6)~\mathrm{MeV}$. Assuming the $X_2$ to be a pure $D^*\bar{D}^*$ molecule, we calculate in detail the hidden charmonium decays $X_2 \to J/ψV$ and $X_2\toη_cP$ via intermediate meson loops, where $V = ρ^0\,,ω$ and $P= π^0\,,η\,,η'$. The results indicate that the decay widths depend strongly on the $X_2$ mass. At the present central value of the mass, $4014.3~\mathrm{MeV}$, the width for $X_2\to J/ψρ^0$ is predicted to be a few tens of keV, while it is on the order of $10^{2\text{-}3}~\mathrm{keV}$ for $X_2\to J/ψω$; the predicted width for $X_2\to η_c π^0$ is about a few keV, while the widths for $X_2\toη_cη$ and $η_cη'$ are around a few tens and a few tenths of keV, respectively. We also investigate the dependence of the ratios between these widths on the $X_2$ mass and on the $η$-$η'$ mixing angle, which may be good quantities for experiments. We hope that the present calculations can be checked experimentally in the future.

preprint 2023 arXiv

The optical spectral features of 27 Fermi blazars

Spectral variation accompanying flux variability is a commonly observed phenomenon in blazars. To further investigate the optical spectral features of blazars, we have collected long-term optical V- and R-band data for 27 blazars (14 BL Lacs and 13 FSRQs) and calculated their optical spectral indices. The results show that the spectral indices vary with brightness for all of these blazars. In general, the optical spectrum becomes progressively flatter (or steeper) as the brightness increases. However, the spectrum changes more and more slowly until it tends to be stable. In other words, the source becomes bluer (or redder) and then gradually stabilizes as it brightens; we call these the bluer-stable-when-brighter (BSWB) and redder-stable-when-brighter (RSWB) behaviors, respectively. Thirteen of the 14 BL Lacs show the BSWB behavior, with the exception of AO 0235+164. In contrast, most FSRQs (10 out of 13) exhibit the RSWB trend. This confirms that blazars follow two universal optical spectral behaviors, namely BSWB and RSWB. A model with two constant-spectral-index components can explain the optical spectral features well, both qualitatively and quantitatively. The results illustrate that the optical emission is mainly composed of two stable-color components, i.e., less variable thermal emission and highly variable synchrotron radiation. In most cases, the thermal component of BL Lacs is redder than the synchrotron component, whereas for FSRQs the opposite holds.
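A small numerical sketch can show why a model with two constant-spectral-index components produces the "bluer, then stable, when brighter" trend. It assumes the convention $F_ν \propto ν^{-α}$ (smaller $α$ means a flatter, bluer spectrum). The band frequencies, flux levels, and indices below are illustrative values, not taken from the paper.

```python
import math

# Approximate V- and R-band frequencies in Hz (illustrative assumptions).
NU_V = 5.45e14
NU_R = 4.68e14


def total_flux(nu, f_stable, f_var, a_stable, a_var, nu0=NU_R):
    """Sum of two power-law components, each with a fixed spectral index."""
    return f_stable * (nu / nu0) ** (-a_stable) + f_var * (nu / nu0) ** (-a_var)


def observed_index(f_stable, f_var, a_stable, a_var):
    """Two-point spectral index between V and R for F_nu ~ nu^(-alpha)."""
    f_v = total_flux(NU_V, f_stable, f_var, a_stable, a_var)
    f_r = total_flux(NU_R, f_stable, f_var, a_stable, a_var)
    return -math.log(f_v / f_r) / math.log(NU_V / NU_R)


# A redder stable component (alpha = 2) plus a bluer variable one (alpha = 1).
var_levels = [0.1, 1.0, 10.0, 100.0]
indices = [observed_index(1.0, level, 2.0, 1.0) for level in var_levels]
for level, alpha in zip(var_levels, indices):
    print(f"variable flux {level:6.1f} -> observed index {alpha:.3f}")
```

As the variable component brightens, the observed index drifts from the stable component's value toward the variable component's value and then flattens out. This matches the BSWB pattern described above. Swapping the two indices gives the RSWB case.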

preprint 2022 arXiv

An Automated FPGA-based Framework for Rapid Prototyping of Nonbinary LDPC Codes

Nonbinary LDPC codes have shown superior performance close to the Shannon limit. Compared to binary LDPC codes of similar lengths, they can reach error rates that are orders of magnitude lower. However, the multitude of design freedoms in nonbinary LDPC codes complicates the practical code and decoder design process. Fast simulations are critically important for evaluating the pros and cons. Rapid prototyping on FPGA is attractive but takes significant design effort due to its high design complexity. We propose a high-throughput reconfigurable hardware emulation architecture with decoder and peripheral co-design. The architecture enables a library- and script-based framework that automates the construction of FPGA emulations. Code and decoder design parameters are programmed either at run time or by script at design time. We demonstrate the capability of the framework in evaluating practical code and decoder designs by experimenting with two popular nonbinary LDPC codes, regular (2, dc) codes and quasi-cyclic codes: each emulation model can be auto-constructed within hours, and the decoder delivers excellent error-correcting performance on a Xilinx Virtex-5 FPGA with throughput of up to hundreds of Mbps.

preprint 2022 arXiv

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

Online shopping has gradually become a common way of shopping for people all over the world. Appealing merchandise advertisements often attract more people to buy. These advertisements integrate multimodal multi-structured information about commodities, such as visual spatial information and fine-grained structure information. However, traditional multimodal text generation focuses on conventional descriptions of what existed and happened, which does not match the requirements of real-world advertisement copywriting, because advertisement copywriting has a vivid language style and higher requirements for faithfulness. Unfortunately, there is a lack of reusable evaluation frameworks and a scarcity of datasets. Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires and supports much more detailed information in text generation. Notably, it is one of the largest video captioning datasets in this field. Accordingly, we propose a baseline method and a faithfulness evaluation metric, based on structured information reasoning, to meet the real-world demand on this dataset. The method surpasses previous methods by a large margin on all metrics. The dataset and method are coming soon at https://e-mmad.github.io/e-mmad.net/index.html.

preprint 2022 arXiv

BERT-based Financial Sentiment Index and LSTM-based Stock Return Predictability

Traditional sentiment construction in finance relies heavily on the dictionary-based approach, with a few exceptions using simple machine learning techniques such as the Naive Bayes classifier. Whereas the current literature has not yet exploited the rapid advances in natural language processing, we construct a text-based sentiment index using BERT, the well-known pre-trained model developed by Google, for three actively traded individual stocks in the Hong Kong market that are also heavily discussed on Weibo.com. On the one hand, we demonstrate that applying BERT significantly improves financial sentiment analysis compared with existing models. On the other hand, by combining it with the two other methods commonly used to build sentiment indices in the financial literature, i.e., the option-implied and the market-implied approaches, we propose a more general and comprehensive framework for financial sentiment analysis, and further provide convincing evidence for the predictability of individual stock returns by incorporating an LSTM, which provides a nonlinear mapping. This differs markedly from the dominant econometric methods in sentiment influence analysis, which are all essentially linear regressions.

preprint 2022 arXiv

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments, training agents that cannot generalize across the two. The fundamental difference between the two setups is that discrete navigation assumes prior knowledge of the connectivity graph of the environment, so that the agent can effectively transfer the problem of navigation with low-level controls to jumping from node to node with high-level actions by grounding to an image of a navigable direction. To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments. We refine the connectivity graph of Matterport3D to fit the continuous Habitat-Matterport3D, and train the waypoint predictor with the refined graphs to produce accessible waypoints at each time step. Moreover, we demonstrate that the predicted waypoints can be augmented during training to diversify the views and paths, and therefore enhance the agent's generalization ability. Through extensive experiments we show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions, which reduces the absolute discrete-to-continuous gap by 11.76% Success Weighted by Path Length (SPL) for the Cross-Modal Matching Agent and 18.24% SPL for the Recurrent VLN-BERT. Our agents, trained with a simple imitation learning objective, outperform previous methods by a large margin, achieving new state-of-the-art results on the testing environments of the R2R-CE and the RxR-CE datasets.
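For readers unfamiliar with the SPL figures quoted above, the metric is commonly defined in the embodied-navigation literature as the mean over episodes of success times the ratio of shortest-path length to the larger of the shortest and actually travelled lengths. The sketch below follows that common definition. The function name and the episode values are hypothetical.

```python
from typing import Sequence


def spl(successes: Sequence[bool], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs must describe the same episodes")
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, best, actual in zip(successes, shortest, taken):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(actual, best)
        # A zero-length successful episode is treated as perfectly efficient.
        total += best / denom if denom > 0 else 1.0
    return total / len(successes)


# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 8.0, 5.0], [10.0, 16.0, 5.0])
print(score)  # 0.5
```

A reported gap reduction of "11.76% SPL" is therefore an absolute difference in this averaged score, expressed in percentage points.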

preprint 2022 arXiv

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

This work uses a new architecture to address the Burst Super-Resolution (BurstSR) task, which requires restoring a high-quality image from a sequence of noisy, misaligned, and low-resolution RAW bursts. To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction. To achieve this goal, we propose a Pyramid Flow-Guided Deformable Convolution Network (Pyramid FG-DCN) and incorporate Swin Transformer Blocks and Groups as our main backbone. More specifically, we combine optical flows and deformable convolutions, so that BSRT can handle misalignment and aggregate the potential texture information across multiple frames more efficiently. In addition, our Transformer-based structure can capture long-range dependencies to further improve performance. Evaluation on both synthetic and real-world tracks demonstrates that our approach achieves a new state of the art on the BurstSR task. Further, BSRT won the championship in the NTIRE2022 Burst Super-Resolution Challenge.

preprint 2022 arXiv

ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers

Although Transformers have successfully transitioned from their language modelling origins to image-based applications, their quadratic computational complexity remains a challenge, particularly for dense prediction. In this paper we propose a content-based sparse attention method, as an alternative to dense self-attention, aiming to reduce the computational complexity while retaining the ability to model long-range dependencies. Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost. Besides, we further extend the clustering-guided attention from single-scale to multi-scale, which is conducive to dense prediction tasks. We label the proposed Transformer architecture ClusTR, and demonstrate that it achieves state-of-the-art performance on various vision tasks but at lower computational cost and with fewer parameters. For instance, our ClusTR small model with 22.7M parameters achieves 83.2% Top-1 accuracy on ImageNet. Source code and ImageNet models will be made publicly available.

preprint 2022 arXiv

Custom Sine Waves Are Enough for Imitation Learning of Bipedal Gaits with Different Styles

Only recently has robust bipedal locomotion been achieved through reinforcement learning. However, existing implementations rely heavily on insights and efforts from human experts, which is costly for the iterative design of robot systems. Also, the styles of the learned motion are strictly limited to that of the reference. In this paper, we propose a new way to learn bipedal locomotion from a simple sine wave as the reference for foot heights. With the naive human insight that the two feet should be lifted alternately and periodically, we experimentally demonstrate on the Cassie robot that a simple reward function is able to make the robot learn to walk end-to-end and efficiently without any explicit knowledge of the model. With custom sine waves, the learned gait pattern can also have customized styles. Code is released at github.com/WooQi57/sin-cassie-rl.

preprint 2022 arXiv

Diagnosing Vision-and-Language Navigation: What Really Matters

Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. Multiple setups have been proposed, and researchers apply new model architectures or training techniques to boost navigation performance. However, there still exist non-negligible gaps between machines' performance and human benchmarks. Moreover, the agents' inner mechanisms for navigation decisions remain unclear. To the best of our knowledge, how the agents perceive the multimodal input is under-studied and needs investigation. In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation. Results show that indoor navigation agents refer to both object and direction tokens when making decisions. In contrast, outdoor navigation agents rely heavily on direction tokens and poorly understand the object tokens. Transformer-based agents acquire a better cross-modal understanding of objects and display stronger numerical reasoning ability than non-Transformer-based agents. When it comes to vision-and-language alignments, many models claim that they can align object tokens with specific visual targets. We find unbalanced attention on the vision and text input and doubt the reliability of such cross-modal alignments.

preprint 2022 arXiv

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

Deep learning-based single image super-resolution (SISR) approaches have drawn much attention and achieved remarkable success on modern advanced GPUs. However, most state-of-the-art methods require a huge number of parameters, large amounts of memory, and substantial computational resources, which usually results in poor inference times when they are applied to current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform reliable super-resolution in real time. The proposed nearest convolution has the same performance as nearest upsampling but is much faster and more suitable for Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3x dataset, and the comparison with other efficient SR methods demonstrates that NCNet can achieve high-fidelity SR results with lower inference time. Our code and pretrained models are publicly available at https://github.com/Algolzw/NCNet.
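One way to see how a convolution can reproduce nearest upsampling is to repeat every channel r*r times, which a 1x1 convolution with fixed 0/1 weights can do, and then apply a depth-to-space rearrangement. The pure-Python sketch below checks that equivalence on a tiny image. It is an assumption about how such a module could be built, not the authors' implementation. The channel ordering follows the pixel-shuffle convention, and all function names are hypothetical.

```python
def repeat_channels(img, r):
    """Repeat each channel r*r times (what a fixed 0/1-weight 1x1 conv would do)."""
    return [plane for plane in img for _ in range(r * r)]


def depth_to_space(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r), pixel-shuffle channel order."""
    c_out, h, w = len(x) // (r * r), len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h):
            for col in range(w):
                for dy in range(r):
                    for dx in range(r):
                        src = x[c * r * r + dy * r + dx][y][col]
                        out[c][y * r + dy][col * r + dx] = src
    return out


def nearest_upsample(img, r):
    """Reference nearest-neighbour upsampling by an integer factor r."""
    return [
        [[plane[y // r][col // r] for col in range(len(plane[0]) * r)]
         for y in range(len(plane) * r)]
        for plane in img
    ]


# Two-channel 2x2 image, upscaled 3x (the factor used for DIV2K 3x above).
image = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
via_conv = depth_to_space(repeat_channels(image, 3), 3)
reference = nearest_upsample(image, 3)
print(via_conv == reference)  # True
```

Expressing the upsampling as a convolution plus a reshape keeps the whole network within operations that mobile accelerators tend to handle well, which is consistent with the NPU-friendliness claim in the abstract.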

preprint 2022 arXiv

Flipping of antiferromagnetic to superconducting states in pressurized quasi-one-dimensional manganese-based compounds

One of the universal features of unconventional superconductors is that the superconducting (SC) state develops in the proximity of an antiferromagnetic (AFM) state. Understanding the interplay between these two states is one of the key issues in uncovering the underlying physics of the unconventional SC mechanism. Here, we report a pressure-induced flipping of the AFM state to an SC state in the quasi-one-dimensional AMn6Bi5 (A = K, Rb, and Cs) compounds. We find that at a critical pressure the AFM state suddenly disappears at a finite temperature and an SC state simultaneously emerges at a lower temperature without detectable structural changes. Intriguingly, all members of the family present the AFM-SC transition at almost the same critical pressure (Pc), though their ambient-pressure unit-cell volumes vary substantially. Our theoretical calculations indicate that the increasing weight of dxz orbital electrons near the Fermi energy under pressure may be the origin of the flipping. These results reveal the diverse nature of the competition between the AFM and SC states among the 3d-transition-metal compounds.

preprint 2022 arXiv

Four-dimensional direct detection with Jones space optical full-field recovery

Data centers, the engines of the global Internet, are supported by massive high-speed optical interconnects. In optical fiber communication, classic direct detection obtains only the intensity of the optical field, while its coherent detection counterpart utilizes both phase and polarization diversities at the expense of beating with a narrow-linewidth and highly stable local oscillator (LO). Herein, we propose and demonstrate a four-dimensional Jones space optical field recovery (4-D JSFR) scheme without an LO. The information encoded on the intensity and phase of both polarizations can be captured by the polarization-diversity full-field receiver structure and subsequently extracted through deep neural network-aided field recovery. It achieves an electrical spectral efficiency similar to that of standard intradyne coherent detection. The fully recovered optical field can extend the transmission distance beyond the power fading limitation induced by fiber chromatic dispersion. Furthermore, the LO-free advantage makes 4-D JSFR suitable for monolithic photonic integration, offering a spectrally efficient and cost-effective candidate for large-scale data center applications. Our results could motivate a fundamental paradigm shift in optical field recovery theory and future optical transceiver design.

preprint 2022 arXiv

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

Knowledge-based visual question answering requires the ability of associating external knowledge for open-ended cross-modal scene understanding. One limitation of existing solutions is that they capture relevant knowledge from text-only knowledge bases, which merely contain facts expressed by first-order predicates or language descriptions while lacking complex but indispensable multimodal knowledge for visual understanding. How to construct vision-relevant and explainable multimodal knowledge for the VQA scenario has been less studied. In this paper, we propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations. To bridge the heterogeneous gap, we propose three objective losses to learn the triplet representations from complementary views: embedding structure, topological relation and semantic space. By adopting a pre-training and fine-tuning learning strategy, both basic and domain-specific multimodal knowledge are progressively accumulated for answer prediction. We outperform the state-of-the-art by 3.35% and 6.08% respectively on two challenging knowledge-required datasets: OK-VQA and KRVQA. Experimental results prove the complementary benefits of the multimodal knowledge with existing knowledge bases and the advantages of our end-to-end framework over the existing pipeline methods. The code is available at https://github.com/AndersonStra/MuKEA.

preprint 2022 arXiv

Observation of three superconducting transitions in the pressurized CDW-bearing compound TaTe2

Transition metal dichalcogenides host a wide variety of lattice and electronic structures, as well as corresponding exotic physical properties, especially under certain tuning conditions. Here, we are the first to report the observation of three pressure-induced superconducting transitions in TaTe2, a charge density wave (CDW)-bearing layered transition-metal dichalcogenide that is metallic but not superconducting at ambient pressure. We find that its CDW state can be easily suppressed by increasing the pressure up to ~1 GPa. A superconducting state then emerges from the suppressed CDW state and persists up to a pressure of about 7 GPa. Unexpectedly, another superconducting state appears at ~11 GPa within the same monoclinic (M) structure as at ambient pressure. Upon further compression to 21 GPa, a third superconducting state with higher Tc appears from a high-pressure (HP) phase. Our experimental results suggest that the three pressure-induced superconducting transitions in TaTe2 are driven, respectively, by the suppression of the CDW state, the change of the angle in the M phase, and the M-to-HP phase transition. These results demonstrate not only the versatile nature of this correlated electron system, but also the first experimental example of a pressure-induced evolution from a CDW state to three superconducting states driven by different mechanisms.

preprint 2022 arXiv

Optical Field Recovery in Jones Space

Optical full-field recovery makes it possible to compensate for fiber impairments such as chromatic dispersion and polarization mode dispersion (PMD) in digital signal processing. For cost-sensitive short-reach optical networks, several advanced single-polarization (SP) optical field recovery schemes have recently been proposed to avoid the chromatic-dispersion-induced power fading effect and to improve the spectral efficiency for larger potential capacity. Polarization division multiplexing (PDM) can further double both the spectral efficiency and the system capacity of these SP carrier-assisted direct detection (DD) schemes. However, the so-called polarization fading phenomenon induced by random polarization rotation is a fundamental obstacle that prevents SP carrier-assisted DD systems from achieving polarization diversity. In this paper, we propose a Jones-space field recovery (JSFR) receiver to realize polarization diversity with SP carrier-assisted DD schemes in Jones space. Different receiver structures and simplified recovery procedures for JSFR are explored theoretically. The proposed JSFR pushes the SP DD schemes towards PDM without extra optical signal-to-noise ratio (OSNR) penalty. In addition, the JSFR shows good tolerance to PMD since the optical field recovery is conducted before polarization recovery. In the proof-of-concept experiment, we demonstrate 448-Gb/s reception over 80-km single-mode fiber using the proposed JSFR based on 2×2 couplers. Furthermore, we qualitatively compare the optical field recovery in Jones space and Stokes space from the perspective of the modulation dimension.

preprint2022arXiv

Quasi-uniaxial pressure induced superconductivity in stoichiometric compound UTe$_2$

The recent discovery of superconductivity in the heavy-fermion compound UTe2, a candidate topological and triplet-paired superconductor, has aroused widespread interest. However, to date, there is no consensus on whether stoichiometric UTe2 is superconducting, owing to a lack of reliable evidence for distinguishing between the nominal and real compositions of samples. Here, we are the first to clarify that stoichiometric UTe2 is non-superconducting at ambient pressure and under hydrostatic pressure up to 6 GPa; however, we find that it can be compressed into superconductivity by applying quasi-uniaxial pressure. Measurements of resistivity, magnetoresistance and susceptibility reveal that the quasi-uniaxial pressure suppresses the Kondo coherent state seen at ambient pressure and then leads to superconductivity that initially emerges in the ab-plane at 1.5 GPa. At 4.8 GPa, the superconductivity develops in three crystallographic directions. The superconducting state coexists with an exotic magnetic ordered state that develops just below the onset temperature of the superconducting transition. The discovery of quasi-uniaxial-pressure-induced superconductivity with an exotic magnetic state in stoichiometric UTe2 not only provides new understanding of this compound, but also highlights the vital role of Te deficiency in developing superconductivity at ambient pressure.

preprint2022arXiv

UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier

Self-supervised learning (SSL) opens up huge opportunities for medical image analysis, which is well known for its lack of annotations. However, aggregating massive (unlabeled) 3D medical images such as computerized tomography (CT) scans remains challenging due to high imaging costs and privacy restrictions. In this paper, we advocate bringing in a wealth of 2D images such as chest X-rays to compensate for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework called UniMiSS. The ensuing problem is how to break the dimensionality barrier, i.e., how to make it possible to perform SSL with both 2D and 3D images. To achieve this, we design a pyramid U-like medical Transformer (MiT). It is composed of the switchable patch embedding (SPE) module and Transformers. The SPE module adaptively switches to either 2D or 3D patch embedding, depending on the input dimension. The embedded patches are converted into a sequence regardless of their original dimensions. The Transformers model the long-term dependencies in a sequence-to-sequence manner, thus enabling UniMiSS to learn representations from both 2D and 3D images. With the MiT as the backbone, we perform the UniMiSS in a self-distillation manner. We conduct extensive experiments on six 3D/2D medical image analysis tasks, including segmentation and classification. The results show that the proposed UniMiSS achieves promising performance on various downstream tasks, substantially outperforming ImageNet pre-training and other advanced SSL counterparts. Code is available at https://github.com/YtongXie/UniMiSS-code.

preprint2022arXiv

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

preprint2021arXiv

Data-driven advice for interpreting local and global model predictions in bioinformatics problems

Tree-based algorithms such as random forests and gradient boosted trees continue to be among the most popular and powerful machine learning models used across multiple disciplines. The conventional wisdom of estimating the impact of a feature in tree-based models is to measure the node-wise reduction of a loss function, which (i) yields only global importance measures and (ii) is known to suffer from severe biases. Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction by following the decision path and attributing changes in the expected output of the model to each feature along the path. However, Lundberg et al. pointed out a potential bias of CFCs which depends on the distance from the root of a tree. The by now immensely popular alternative, SHapley Additive exPlanation (SHAP) values, appears to mitigate this bias but is computationally much more expensive. Here we contribute a thorough comparison of the explanations computed by both methods on a set of 164 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations. Analogous conclusions hold for the fidelity of using global feature importance scores as a proxy for the predictive power associated with each feature.

preprint2021arXiv

How to Train Your Agent to Read and Write

Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (e.g., students) to fully grasp this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit the potential knowledge clues to write novel papers. Although there have been existing works focusing on summarizing (i.e., reading) the knowledge in a given text or generating (i.e., writing) a text based on the given knowledge, the ability of simultaneously reading and writing is still under development. Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a Reader that can extract knowledge graphs (KGs) from input paragraphs and discover potential knowledge, a graph-to-text Writer that generates a novel paragraph, and a Reviewer that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms considered baselines and several state-of-the-art methods on AGENDA and M-AGENDA datasets. Our code and supplementary material are released at https://github.com/menggehe/DRAW.

preprint2021arXiv

Learning for Visual Navigation by Imagining the Success

Visual navigation is often cast as a reinforcement learning (RL) problem. Current methods typically result in a suboptimal policy that learns general obstacle avoidance and search behaviours. For example, in the target-object navigation setting, the policies learnt by traditional methods often fail to complete the task, even when the target is clearly within reach from a human perspective. In order to address this issue, we propose to learn to imagine a latent representation of the successful (sub-)goal state. To do so, we have developed a module which we call Foresight Imagination (ForeSIT). ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e.g. either a sub-goal state that is important to reach before the target, or the goal state itself. By conditioning the policy on the generated imagination during training, our agent learns how to use this imagination to achieve its goal robustly. Our agent is able to imagine what the (sub-)goal state may look like (in the latent space) and can learn to navigate towards that state. We develop an efficient learning algorithm to train ForeSIT in an on-policy manner and integrate it into our RL objective. The integration is not trivial due to the constantly evolving state representation shared between the imagination and the policy. We observe empirically that our method outperforms the state-of-the-art methods by a large margin in the commonly accepted benchmark AI2THOR environment. Our method can be readily integrated into or added to other model-free RL navigation frameworks.

preprint2021arXiv

Production of $Z_{cs}$ in $B$ and $B_s$ decay

In the present work, we investigate the production of $Z_{cs}^+$ in $B^+$ and $B_s^0$ decay, where $Z_{cs}^+$ is assigned as a $D_s^{+} \bar{D}^{\ast0} + D_s^{\ast +}\bar{D}^0$ molecular state. By using an effective Lagrangian approach, we evaluate the branching ratios of $B^0_s\rightarrow K^- Z^+_{cs}$ and $B^+\rightarrow ϕZ^{+}_{cs}$ via the triangle loop mechanism. The estimated branching fractions of $B^0_s\rightarrow K^- Z^+_{cs}$ and $B^+\rightarrow ϕZ^{+}_{cs}$ are of the order of $10^{-4}$ and $10^{-5}$, respectively. The ratio of these two branching fractions is estimated to be about 5, which indicates that $B_s^0 \to K^\pm Z^\mp_{cs} \to K^+ K^- J/ψ$ may be a better process for searching for $Z_{cs}$ and may be accessible to further experimental measurements by the Belle II and LHCb collaborations.

preprint2021arXiv

Semantics for Robotic Mapping, Perception and Interaction: A Survey

For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

preprint2021arXiv

Universal quantum transition from superconducting to insulating states in pressurized Bi2Sr2CaCu2O8+δ superconductors

Copper oxide superconductors have continually fascinated the communities of condensed matter physics and material sciences because they host the highest ambient-pressure superconducting transition temperature (Tc) and mysterious physics. Searching for a universal correlation between the superconducting state and its normal state or neighboring ground state is believed to be an effective way of finding clues to elucidate the underlying mechanism of the superconductivity. One of the common pictures for the copper oxide superconductors is that a well-behaved metallic phase appears after the superconductivity is entirely suppressed by chemical doping or by the application of a magnetic field. Here, we report a different observation: a universal quantum transition from a superconducting state to an insulating-like state under pressure in the under-, optimally- and over-doped Bi2212 superconductors with two CuO2 planes in a unit cell. The same phenomenon has also been found in the Bi2201 superconductor with one CuO2 plane and the Bi2223 superconductor with three CuO2 planes in a unit cell. These results not only provide fresh information but also pose a new challenge for achieving a unified understanding of the underlying physics of high-Tc superconductivity.

preprint2021arXiv

What causes the absence of pulsations in Central Compact Objects in Supernova Remnants?

Most young neutron stars belonging to the class of Central Compact Objects in supernova remnants (CCOs) do not have known periodicities. We investigated seven such CCOs to understand the common reasons for the absence of detected pulsations. Making use of XMM-Newton, Chandra, and NICER observations, we performed a systematic timing and spectral analysis to derive updated sensitivity limits for both periodic signals and multi-temperature spectral components that could be associated with radiation from hotspots on the neutron star surface. Based on these limits, we then investigated, for each target, the allowed viewing geometry that could explain the lack of pulsations. We estimate that, for five of the sources considered, it is unlikely ($< 10^{-6}$) that the lack of observed pulsations is due to an unfavorable viewing geometry. Alternatively, the carbon atmosphere model, which assumes a homogeneous temperature distribution on the surface, describes the spectra equally well and provides a reasonable interpretation for the absence of detected periodicities within current limits. The unusual properties of CCOs with respect to other young neutron stars could suggest a different evolutionary path, such as that proposed for sources experiencing episodes of significant fallback accretion after the supernova event.

82 published item(s)

A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems

CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device

Dipion transitions from $X(3872)$ to $χ_{cJ}\ (J=0,1,2)$

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation

VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Hidden charmonium decays of spin-2 partner of $X(3872)$

The optical spectral features of 27 Fermi blazars

An Automated FPGA-based Framework for Rapid Prototyping of Nonbinary LDPC Codes

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

BERT-based Financial Sentiment Index and LSTM-based Stock Return Predictability

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers

Custom Sine Waves Are Enough for Imitation Learning of Bipedal Gaits with Different Styles

Diagnosing Vision-and-Language Navigation: What Really Matters

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

Flipping of antiferromagnetic to superconducting states in pressurized quasi-one-dimensional manganese-based compounds

Four-dimensional direct detection with Jones space optical full-field recovery

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

Observation of three superconducting transitions in the pressurized CDW-bearing compound TaTe2

Optical Field Recovery in Jones Space

Quasi-uniaxial pressure induced superconductivity in stoichiometric compound UTe$_2$

UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Data-driven advice for interpreting local and global model predictions in bioinformatics problems

How to Train Your Agent to Read and Write

Learning for Visual Navigation by Imagining the Success

Production of $Z_{cs}$ in $B$ and $B_s$ decay

Semantics for Robotic Mapping, Perception and Interaction: A Survey

Universal quantum transition from superconducting to insulating states in pressurized Bi2Sr2CaCu2O8+δ superconductors

What causes the absence of pulsations in Central Compact Objects in Supernova Remnants?

$D$ wave bottomonia production from $Z_b^{(\prime)}$ decay

Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Correlation between Fermi surface reconstruction and superconductivity in pressurized FeTe0.55Se0.45

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

Data-driven Meta-set Based Fine-Grained Visual Classification

Evasion of HSR in the charmless decays of excited $P$-wave charmonia

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge

Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning

Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only

Length-Controllable Image Captioning

MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets

Memory-Gated Recurrent Networks

Object-and-Action Aware Model for Visual Language Navigation

Production of $P-$wave charmed and charmed-strange mesons in pion and kaon induced reactions

Quantum phases of SrCu2(BO3)2 from high-pressure thermodynamics

Reemergence of superconductivity in pressurized quasi-one-dimensional superconductor K2Mo3As3

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Soft Expert Reward Learning for Vision-and-Language Navigation

Understanding Distributional Ambiguity via Non-robust Chance Constraint

Cross-sectional Learning of Extremal Dependence among Financial Assets

Hall coefficient diagnostics of surface state in pressurized SmB6

Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

Correlation between non-centrosymmetry and superconductivity in quasi-one-dimensional compounds A2Cr3As3 (A=K, Rb)

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Production of the $X_b$ in $Υ(5S, 6S)\to γX_b$ radiative decays

Reversible tuning of superconductivity in pressurized qausi-one-dimensional A2Cr3As3 (A=K and Rb)

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

Visual Question Answering: A Survey of Methods and Datasets

What value do explicit high level concepts have in vision to language problems?

Breakdown of Three-dimensional Dirac Semimetal State in pressurized Cd3As2

Correlation between intercalated magnetic layers and superconductivity in pressurized EuFe2(As0.81P0.19)2

Explicit Knowledge-based Reasoning for Visual Question Answering

The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs

Connection between ambient-pressure and pressure-induced superconducting phases in alkaline iron selenide superconductors

Robust antiferromagnetism preventing superconductivity in pressurized Ba0.61K0.39Mn2Bi2

The role of 245 phase in alkaline iron selenide superconductors revealed by high pressure studies

Observation of antiferromagnetic order collapse in the pressurized insulator LaMnPO