Source author record

Xian Wu

Xian Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

31 works

22 topics

4 close collaborators

preprint, 2026, arXiv

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

Existing robot video world models are typically trained with low-level objectives such as reconstruction and perceptual similarity, which are poorly aligned with the capabilities that matter most for robot decision making, including instruction following, manipulation success, and physical plausibility. They also suffer from error accumulation in long-horizon autoregressive prediction. We present RoboAlign-R1, a framework that combines reward-aligned post-training with stabilized long-horizon inference for robot video world models. We construct RobotWorldBench, a benchmark of 10,000 annotated video-instruction pairs collected from four robot data sources, and train a multimodal teacher judge, RoboAlign-Judge, to provide fine-grained six-dimensional evaluation of generated videos. We then distill the teacher into a lightweight student reward model for efficient reinforcement-learning-based post-training. To reduce long-horizon rollout drift, we further introduce Sliding Window Re-encoding (SWR), a training-free inference strategy that periodically refreshes the generation context. Under our in-domain evaluation protocol, RoboAlign-R1 improves the aggregate six-dimension score by 10.1% over the strongest baseline, including gains of 7.5% on Manipulation Accuracy and 4.6% on Instruction Following; these ranking improvements are further supported by an external VLM-based cross-check and a blinded human study. Meanwhile, SWR improves long-horizon prediction quality with only about 1% additional latency, yielding a 2.8% gain in SSIM and a 9.8% reduction in LPIPS. Together, these results show that reward-aligned post-training and stabilized long-horizon decoding improve task consistency, physical realism, and long-horizon prediction quality in robot video world models.

preprint, 2022, arXiv

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interest. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate the above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) the AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) the MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.

preprint, 2022, arXiv

AutoField: Automating Feature Selection in Deep Recommender Systems

Feature quality has an impactful effect on recommendation performance. Thereby, feature selection is a critical process in developing deep learning-based recommender systems. Most existing deep recommender systems, however, focus on designing sophisticated neural networks, while neglecting the feature selection process. Typically, they just feed all possible features into their proposed deep architectures, or select important features manually by human experts. The former leads to non-trivial embedding parameters and extra inference time, while the latter requires plenty of expert knowledge and human labor effort. In this work, we propose an AutoML framework that can adaptively select the essential feature fields in an automatic manner. Specifically, we first design a differentiable controller network, which is capable of automatically adjusting the probability of selecting a particular feature field; then, only selected feature fields are utilized to retrain the deep recommendation model. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our framework. We conduct further experiments to investigate its properties, including the transferability, key components, and parameter sensitivity.

preprint, 2022, arXiv

Conditional Generation Net for Medication Recommendation

Medication recommendation aims to provide a proper set of medicines according to patients' diagnoses, which is a critical task in clinics. Currently, the recommendation is manually conducted by doctors. However, for complicated cases, like patients with multiple diseases at the same time, it's difficult to propose a considerate recommendation even for experienced doctors. This urges the emergence of automatic medication recommendation which can help treat the diagnosed diseases without causing harmful drug-drug interactions. Due to the clinical value, medication recommendation has attracted growing research interest. Existing works mainly formulate medication recommendation as a multi-label classification task to predict the set of medicines. In this paper, we propose the Conditional Generation Net (COGNet) which introduces a novel copy-or-predict mechanism to generate the set of medicines. Given a patient, the proposed model first retrieves his or her historical diagnoses and medication recommendations and mines their relationship with current diagnoses. Then in predicting each medicine, the proposed model decides whether to copy a medicine from previous recommendations or to predict a new one. This process is quite similar to the decision process of human doctors. We validate the proposed model on the public MIMIC data set, and the experimental results show that the proposed model can outperform state-of-the-art approaches.

preprint, 2022, arXiv

DeepPortraitDrawing: Generating Human Body Images from Freehand Sketches

Researchers have explored various ways to generate realistic images from freehand sketches, e.g., for objects and human faces. However, how to generate realistic human body images from sketches is still a challenging problem. It is, first because of the sensitivity to human shapes, second because of the complexity of human images caused by body shape and pose changes, and third because of the domain gap between realistic images and freehand sketches. In this work, we present DeepPortraitDrawing, a deep generative framework for converting roughly drawn sketches to realistic human body images. To encode complicated body shapes under various poses, we take a local-to-global approach. Locally, we employ semantic part auto-encoders to construct part-level shape spaces, which are useful for refining the geometry of an input pre-segmented hand-drawn sketch. Globally, we employ a cascaded spatial transformer network to refine the structure of body parts by adjusting their spatial locations and relative proportions. Finally, we use a global synthesis network for the sketch-to-image translation task, and a face refinement network to enhance facial details. Extensive experiments have shown that given roughly sketched human portraits, our method produces more realistic images than the state-of-the-art sketch-to-image synthesis techniques.

preprint, 2022, arXiv

Denoising Neural Network for News Recommendation with Positive and Negative Implicit Feedback

News recommendation is different from movie or e-commerce recommendation as people usually do not grade the news. Therefore, user feedback for news is always implicit (click behavior, reading time, etc.). Inevitably, there is noise in implicit feedback. On one hand, the user may exit immediately after clicking the news as he dislikes the news content, leaving noise in his positive implicit feedback; on the other hand, the user may be recommended multiple interesting news items at the same time and only click one of them, producing noise in his negative implicit feedback. Opposite implicit feedback could construct more integrated user preferences and help each other to minimize the noise influence. Previous works on news recommendation only used positive implicit feedback and suffered from the noise impact. In this paper, we propose a denoising neural network for news recommendation with positive and negative implicit feedback, named DRPN. DRPN utilizes both kinds of feedback for recommendation, with a module to denoise both positive and negative implicit feedback to further enhance the performance. Experiments on the real-world large-scale dataset demonstrate the state-of-the-art performance of DRPN.

preprint, 2022, arXiv

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that humans seek or test their knowledge is via human conversations. Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows given the speech documents. In this task, our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering. To this end, instead of directly adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which effectively ingests cross-modal information to achieve fine-grained representations of the speech and language modalities. Moreover, we propose a simple and novel mechanism, termed Dual Attention, by encouraging better alignments between audio and text to ease the process of knowledge transfer. To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations. The performance of the existing state-of-the-art methods degrades significantly on our dataset, hence demonstrating the necessity of cross-modal information integration. Our experimental results demonstrate that our proposed method achieves superior performance in spoken conversational question answering tasks.

preprint, 2022, arXiv

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

Gene Ontology (GO) is the primary gene function knowledge base that enables computational tasks in biomedicine. The basic element of GO is a term, which includes a set of genes with the same function. Existing research efforts of GO mainly focus on predicting gene term associations. Other tasks, such as generating descriptions of new terms, are rarely pursued. In this paper, we propose a novel task: GO term description generation. This task aims to automatically generate a sentence that describes the function of a GO term belonging to one of the three categories, i.e., molecular function, biological process, and cellular component. To address this task, we propose a Graph-in-Graph network that can efficiently leverage the structural information of GO. The proposed network introduces a two-layer graph: the first layer is a graph of GO terms where each node is also a graph (gene graph). Such a Graph-in-Graph network can derive the biological functions of GO terms and generate proper descriptions. To validate the effectiveness of the proposed network, we build three large-scale benchmark datasets. By incorporating the proposed Graph-in-Graph network, the performances of seven different sequence-to-sequence models can be substantially boosted across all evaluation metrics, with up to 34.7%, 14.5%, and 39.1% relative improvements in BLEU, ROUGE-L, and METEOR, respectively.

preprint, 2022, arXiv

Multi-modal Contrastive Representation Learning for Entity Alignment

Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs, which consist of structural triples and images associated with entities. Most previous works focus on how to utilize and encode information from different modalities, while it is not trivial to leverage multi-modal knowledge in entity alignment because of the modality heterogeneity. In this paper, we propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model, to obtain effective joint representations for multi-modal entity alignment. Different from previous works, MCLEA considers task-oriented modality and models the inter-modal relationships for each entity representation. In particular, MCLEA firstly learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions. Extensive experimental results show that MCLEA outperforms state-of-the-art baselines on public datasets under both supervised and unsupervised settings.

preprint, 2022, arXiv

NeRF-SR: High-Quality Neural Radiance Fields using Supersampling

We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF) that predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF benefits from 3D consistency, which means an observed pixel absorbs information from nearby views. We first exploit it by a supersampling strategy that shoots multiple rays at each image pixel, which further enforces multi-view constraint at a sub-pixel level. Then, we show that NeRF-SR can further boost the performance of supersampling by a refinement network that leverages the estimated depth at hand to hallucinate details from related patches on only one HR reference image. Experiment results demonstrate that NeRF-SR generates high-quality results for novel view synthesis at HR on both synthetic and real-world datasets without any external information.

preprint, 2022, arXiv

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. Different from image captioning that describes a static image with details of almost every object, video captioning usually considers a sequence of frames and biases towards focused objects, e.g., the objects that stay in focus regardless of the changing background. Therefore, detecting and properly accommodating focused objects is critical in video captioning. To enforce the description of focused objects and achieve controllable video captioning, we propose an Object-Oriented Non-Autoregressive approach (O2NA), which performs caption generation in three steps: 1) identify the focused objects and predict their locations in the target caption; 2) generate the related attribute words and relation words of these focused objects to form a draft caption; and 3) combine video information to refine the draft caption to a fluent final caption. Since the focused objects are generated and located ahead of other words, it is difficult to apply the word-by-word autoregressive generation process; instead, we adopt a non-autoregressive approach. The experiments on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrate the effectiveness of O2NA, which achieves results competitive with the state of the art but with both higher diversity and higher inference speed.

preprint, 2022, arXiv

Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment

In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information of the x-ray images, features two distinct modules: (i) Learned knowledge base: to absorb the knowledge embedded in the radiology reports, we build a knowledge base that can automatically distil and restore medical knowledge from textual embedding without manual labour; (ii) Multi-modal alignment: to promote the semantic alignment among reports, disease labels, and images, we explicitly utilize textual embedding to guide the learning of the visual feature space. We evaluate the performance of the proposed model using metrics from both natural language generation and clinical efficacy on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods over almost all the metrics.

preprint, 2022, arXiv

Rational tensegrities through the lens of toric geometry

A classical tensegrity model consists of an embedded graph in a vector space with rigid bars representing edges, and an assignment of a stress to every edge such that at every vertex of the graph the stresses sum up to zero. The tensegrity frameworks have been recently extended from the two-dimensional graph case to the multidimensional setting. We study the multidimensional tensegrities using tools from toric geometry. For a given rational tensegrity framework $\mathcal{F}$, we construct a glued toric surface $X_\mathcal{F}$. We show that the abelian group of tensegrities on $\mathcal{F}$ is isomorphic to a subgroup of the Chow group $A^1(X_\mathcal{F};\mathbb{Q})$. In the case of planar frameworks, we show how to explicitly carry out the computation of tensegrities via classical tools in toric geometry.
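
For orientation, the equilibrium condition in the classical graph case can be written out explicitly. This is the standard self-stress condition from rigidity theory, offered as a sketch rather than the paper's own definition: the symbols $p_i$ (vertex positions), $E$ (edge set) and $\omega_{ij}$ (scalar stress on the edge $\{i,j\}$) are assumptions introduced here, and the multidimensional definition used in the paper may differ in detail.

```latex
\sum_{j \,:\, \{i,j\} \in E} \omega_{ij}\,(p_j - p_i) = 0
\quad \text{for every vertex } i,
\qquad \omega_{ij} = \omega_{ji}.
```

Each term is the force that the edge $\{i,j\}$ exerts on vertex $i$, so the condition says that the forces acting on every vertex cancel.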

preprint, 2022, arXiv

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.

preprint, 2021, arXiv

Relation-aware Meta-learning for Market Segment Demand Prediction with Limited Records

E-commerce business is revolutionizing our shopping experiences by providing convenient and straightforward services. One of the most fundamental problems is how to balance the demand and supply in market segments to build an efficient platform. While conventional machine learning models have achieved great success on data-sufficient segments, they may fail in a large portion of segments in E-commerce platforms, where there are not sufficient records to learn well-trained models. In this paper, we tackle this problem in the context of market segment demand prediction. The goal is to facilitate the learning process in the target segments by leveraging the learned knowledge from data-sufficient source segments. Specifically, we propose a novel algorithm, RMLDP, to incorporate a multi-pattern fusion network (MPFN) with a meta-learning paradigm. The multi-pattern fusion network considers both local and seasonal temporal patterns for segment demand prediction. In the meta-learning paradigm, transferable knowledge is regarded as the model parameter initialization of MPFN, which is learned from diverse source segments. Furthermore, we capture the segment relations by combining data-driven segment representation and segment knowledge graph representation and tailor the segment-specific relations to customize transferable model parameter initialization. Thus, even with limited data, the target segment can quickly find the most relevant transferred knowledge and adapt to the optimal parameters. We conduct extensive experiments on two large-scale industrial datasets. The results justify that our RMLDP outperforms a set of state-of-the-art baselines. Besides, RMLDP has been deployed in Taobao, a real-world E-commerce platform. The online A/B testing results further demonstrate the practicality of RMLDP.

preprint, 2020, arXiv

Amplitude and frequency sensing of microwave fields with a superconducting transmon qudit

Experiments with superconducting circuits require careful calibration of the applied pulses and fields over a large frequency range. This remains an ongoing challenge as commercial semiconductor electronics are not able to probe signals arriving at the chip due to its cryogenic environment. Here, we demonstrate how the on-chip amplitude and frequency of a microwave signal can be inferred from the ac Stark shifts of higher transmon levels. In our time-resolved measurements we employ Ramsey fringes, allowing us to detect the amplitude of the system's transfer function over a range of several hundreds of MHz with an energy sensitivity on the order of $10^{-4}$. Combined with similar measurements for the phase of the transfer function, our sensing method can facilitate pulse correction for high fidelity quantum gates in superconducting circuits. Additionally, the potential to characterize arbitrary microwave fields promotes applications in related areas of research, such as quantum optics or hybrid microwave systems including photonic, mechanical or magnonic subsystems.

preprint, 2020, arXiv

Automated Relational Meta-learning

In order to efficiently learn with a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is the task heterogeneity which cannot be well handled by traditional globally shared meta-learning methods. In addition, current task-specific meta-learning methods may either suffer from hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way of knowledge organization in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph. When a new task arrives, it can quickly find the most relevant structure and tailor the learned structure knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity by a learned meta-knowledge graph, but also increases the model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification and the results demonstrate the superiority of ARML over state-of-the-art baselines.

preprint, 2020, arXiv

Jointly Predicting Job Performance, Personality, Cognitive Ability, Affect, and Well-Being

Assessment of job performance, personalized health and psychometric measures are domains where data-driven and ubiquitous computing exhibits the potential of a profound impact in the future. Existing techniques use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits, to assess well-being and cognitive attributes of individuals. However, these techniques can neither predict individuals' well-being and psychological traits in a global manner nor consider the challenges associated with processing the available data, which is incomplete and noisy. In this paper, we create a benchmark for predictive analysis of individuals from a perspective that integrates: physical and physiological behavior, psychological states and traits, and job performance. We design data mining techniques as a benchmark and use real noisy and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized well-validated tests. The study included 757 participants who were knowledge workers in organizations across the USA with varied work roles. We developed a data mining framework to extract the meaningful predictors for each of the 19 variables under consideration. Our model is the first benchmark that combines these various instrument-derived variables in a single framework to understand people's behavior by leveraging real uncurated data from wearable, mobile, and social media sources. We verify our approach experimentally using the data obtained from our longitudinal study. The results show that our framework is consistently reliable and capable of predicting the variables under study better than the baselines when prediction is restricted to the noisy, incomplete data.

preprint, 2020, arXiv

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$, the mixing time of the underlying Markov chain, under different noise settings. Our results establish that in general, optimization with Markovian data is strictly harder than optimization with independent data and a trivial algorithm (SGD-DD) that works with only one in every $\tilde{\Theta}(\tau_{\mathsf{mix}})$ samples, which are approximately independent, is minimax optimal. In fact, it is strictly better than the popular Stochastic Gradient Descent (SGD) method with constant step-size which is otherwise minimax optimal in the regression with independent data setting. Beyond a worst case analysis, we investigate whether structured datasets seen in practice such as Gaussian auto-regressive dynamics can admit more efficient optimization schemes. Surprisingly, even in this specific and natural setting, Stochastic Gradient Descent (SGD) with constant step-size is still no better than SGD-DD. Instead, we propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate. Our improved rate serves as one of the first results where an algorithm outperforms SGD-DD on an interesting Markov chain and also provides one of the first theoretical analyses to support the use of experience replay in practice.
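
The data-drop idea behind SGD-DD, as the abstract describes it, can be illustrated with a minimal sketch: run plain SGD for least squares but update only on every `gap`-th sample of the stream, so that the samples actually used are approximately independent. Everything specific here is an assumption for illustration, not the paper's code: the function names `sgd_dd` and `ar1_stream`, the hand-picked `gap` standing in for the mixing-time-sized spacing, the step size, and the Gaussian AR(1) toy data.

```python
import random


def sgd_dd(samples, gap, step_size, dim):
    """Constant-step SGD for least squares that uses only every `gap`-th sample."""
    w = [0.0] * dim
    used = 0
    for t, (x, y) in enumerate(samples):
        if t % gap != 0:
            continue  # drop samples that are still correlated with the last one used
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of 0.5 * residual**2 with respect to w is residual * x.
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        used += 1
    return w, used


def ar1_stream(n, rho, w_star, noise_std, rng):
    """Toy Markovian data: each covariate follows a stationary Gaussian AR(1) chain."""
    x = [rng.gauss(0.0, 1.0) for _ in w_star]
    stream = []
    for _ in range(n):
        y = sum(a * b for a, b in zip(w_star, x)) + noise_std * rng.gauss(0.0, 1.0)
        stream.append((x, y))
        scale = (1.0 - rho ** 2) ** 0.5  # keeps the marginal variance at 1
        x = [rho * xi + scale * rng.gauss(0.0, 1.0) for xi in x]
    return stream


if __name__ == "__main__":
    rng = random.Random(0)
    data = ar1_stream(20000, rho=0.9, w_star=[1.0, -2.0], noise_std=0.1, rng=rng)
    w_hat, used = sgd_dd(data, gap=20, step_size=0.05, dim=2)
    print(used, w_hat)
```

With 20,000 samples and `gap=20`, only 1,000 updates are performed; the rest of the stream is discarded, which is the cost that the experience-replay algorithm mentioned in the abstract aims to avoid.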

preprint, 2020, arXiv

Nearest Neighbor Search for Hyperbolic Embeddings

Embedding into hyperbolic space is emerging as an effective representation technique for datasets that exhibit hierarchical structure. This development motivates the need for algorithms that are able to effectively extract knowledge and insights from datapoints embedded in negatively curved spaces. We focus on the problem of nearest neighbor search, a fundamental problem in data analysis. We present efficient algorithmic solutions that build upon established methods for nearest neighbor search in Euclidean space, allowing for easy adoption and integration with existing systems. We prove theoretical guarantees for our techniques and our experiments demonstrate the effectiveness of our approach on real datasets over competing algorithms.
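
To make the problem statement concrete, here is a brute-force baseline for nearest neighbor search under hyperbolic distance. It is only a sketch under stated assumptions: the abstract does not say which model of hyperbolic space is used, so the Poincaré ball model is assumed here, and the exhaustive scan is not the paper's algorithm (which builds on Euclidean nearest neighbor methods). The names `poincare_distance` and `nearest_neighbor` are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    # Standard Poincare ball formula: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2))).
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def nearest_neighbor(query, points):
    """Exhaustive search: index of the point with the smallest hyperbolic distance."""
    return min(range(len(points)), key=lambda i: poincare_distance(query, points[i]))


if __name__ == "__main__":
    points = [(-0.9, 0.0), (0.2, 0.0), (0.0, 0.7)]
    print(nearest_neighbor((0.1, 0.0), points))
```

The scan costs one distance evaluation per stored point, which is the linear-time baseline that an index-based method would be compared against.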

preprint, 2020, arXiv

Nonlinear Signal Distortion Corrections Through Quantum Sensing

Having accurate gate generation is essential for precise control of a quantum system. The generated gate usually suffers from linear and nonlinear distortion. Previous works have demonstrated how to use a qubit to correct linear frequency distortions but have not commented on how to handle nonlinear distortions. This is an important issue as we show that nonlinear amplitude distortions from the RF electronics can affect Rabi pulses by as much as 10%. We present work that demonstrates how a transmon qubit can be used as a highly sensitive cryogenic detector to characterize these nonlinear amplitude distortions. We show that a correction can drive these errors down to <1% over a 700 MHz range. This correction technique provides a method to minimize the effects of signal distortions and can be easily applied to broadband control pulses to produce higher fidelity arbitrary quantum gates.

preprint, 2020, arXiv

Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series

The prevalence of wearable sensors (e.g., smart wristband) is creating unprecedented opportunities to not only inform health and wellness states of individuals, but also assess and infer personal attributes, including demographic and personality attributes. However, the data captured from wearables, such as heart rate or number of steps, present two key challenges: 1) the time series is often of variable-length and incomplete due to different data collection periods (e.g., wearing behavior varies by person); and 2) inter-individual variability to external factors like stress and environment. This paper addresses these challenges and brings us closer to the potential of personalized insights about an individual, taking the leap from quantified self to qualified self. Specifically, HeartSpace proposed in this paper encodes time series data with variable-length and missing values via the integration of a time series encoding module and a pattern aggregation network. Additionally, HeartSpace implements a Siamese-triplet network to optimize representations by jointly capturing intra- and inter-series correlations during the embedding learning process. The empirical evaluation over two different real-world datasets presents significant performance gains over state-of-the-art baselines in a variety of applications, including personality prediction, demographics inference, and user identification.

preprint, 2020, arXiv

Wireless power transfer via topological modes in dimer chains

The topological characteristics of photonic insulators, including invariant topological orders, band inversion, and the topological edge mode (TEM), have been widely studied. Whether intriguing topological modes in simple one-dimensional systems can be exploited for practical applications is a question of growing interest. In this work, based on a photonic dimer chain composed of ultra-subwavelength resonators, we verify experimentally that the TEM in the effective second-order parity-time (PT) system is immune to the inner disorder perturbation, and can be used to realize the long-range wireless power transfer (WPT) with high transmission efficiency. To show intuitively that the TEM can be used for WPT, a power signal source is used to excite the TEM. Two 0.5-W LED lamps at the two ends of the structure are clearly lit with the aid of TEMs. In addition, to address the technical problems of standby power loss and frequency tracking, we further propose that a WPT system with effective third-order PT symmetry can be constructed by using one topological interface mode and two TEMs. Inspired by the long-range WPT with TEMs in this work, more complex topological structures are expected to achieve energy transmission with more functions, such as WPT devices whose direction can be selected flexibly in quasiperiodic or trimer topological chains.

preprint2019arXiv

Multi-Grained Named Entity Recognition

This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Unlike traditional approaches that treat NER as a sequential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines by up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks.
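
To make "examines all possible word segments" concrete, here is a minimal sketch of enumerating every contiguous token span in a sentence. It only illustrates the candidate space that a span-based detector would score. The function name, the half-open index convention, and the optional `max_len` cap are assumptions, not details from the paper.

```python
def candidate_spans(tokens, max_len=None):
    """Return every contiguous span as a half-open (start, end) pair.

    `max_len` optionally caps the span length; this cap is a hypothetical
    convenience, not a setting described in the abstract.
    """
    n = len(tokens)
    limit = n if max_len is None else min(max_len, n)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + limit, n) + 1)
    ]


tokens = ["Bank", "of", "China"]
spans = candidate_spans(tokens)
# Nested candidates coexist: (0, 3) covers "Bank of China", (2, 3) covers "China".
short_spans = candidate_spans(tokens, max_len=2)
```

Because nested spans such as (0, 3) and (2, 3) are separate candidates, a classifier applied to each span can label both without assuming that entities never overlap.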

preprint2019arXiv

Optimal Control for the Quantum Simulation of Nuclear Dynamics

We propose a method for enacting the unitary time propagation of two interacting neutrons at leading order of chiral effective field theory by efficiently encoding the nuclear dynamics into a single multi-level quantum device. The emulated output of the quantum simulation shows that, by applying a single gate that draws on the underlying characteristics of the device, it is possible to observe multiple cycles of the nucleons' dynamics before the onset of decoherence. Owing to the signal's longevity, we can then extract spectroscopic properties of the simulated nuclear system. This allows us to validate the encoding of the nuclear Hamiltonian and the robustness of the simulation in the presence of quantum-hardware noise by comparing the extracted spectroscopic information to exact calculations. This work paves the way for transformative calculations of dynamical properties of nuclei on near-term quantum devices.

preprint2015arXiv

Characterization of a gate-defined double quantum dot in a Si/SiGe nanomembrane

We report the fabrication and characterization of a gate-defined double quantum dot formed in a Si/SiGe nanomembrane. In the past, all gate-defined quantum dots in Si/SiGe heterostructures were formed on top of strain-graded virtual substrates. The strain grading process necessarily introduces misfit dislocations into a heterostructure, and these defects introduce lateral strain inhomogeneities, mosaic tilt, and threading dislocations. The use of a SiGe nanomembrane as the virtual substrate enables the strain relaxation to be entirely elastic, eliminating the need for misfit dislocations. However, in this approach the formation of the heterostructure is more complicated, involving two separate epitaxial growth procedures separated by a wet-transfer process that results in a buried non-epitaxial interface 625 nm from the quantum dot. We demonstrate that in spite of this buried interface in close proximity to the device, a double quantum dot can be formed that is controllable enough to enable tuning of the inter-dot tunnel coupling, the identification of spin states, and the measurement of a singlet-to-triplet transition as a function of an applied magnetic field.

preprint2015arXiv

End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning

Sketch-based face recognition is an interesting task in vision and multimedia research, yet it is quite challenging due to the great difference between face photos and sketches. In this paper, we propose a novel approach for photo-sketch generation, aiming to automatically transform face photos into detail-preserving personal sketches. Unlike the traditional models synthesizing sketches based on a dictionary of exemplars, we develop a fully convolutional network to learn the end-to-end photo-sketch mapping. Our approach takes whole face photos as inputs and directly generates the corresponding sketch images with efficient inference and learning, in which the architecture is a stack of only convolutional kernels of very small sizes. To capture person identity well during the photo-sketch transformation, we define our optimization objective in the form of joint generative-discriminative minimization. In particular, a discriminative regularization term is incorporated into the photo-sketch generation, enhancing the discriminability of the generated person sketches against other individuals. Extensive experiments on several standard benchmarks suggest that our approach outperforms other state-of-the-art methods in both photo-sketch generation and face sketch verification.

preprint2015arXiv

Low-noise kinetic inductance traveling-wave amplifier using three-wave mixing

We have fabricated a wide-bandwidth, high dynamic range, low-noise cryogenic amplifier based on a superconducting kinetic inductance traveling-wave device. The device was made from NbTiN and consisted of a long, coplanar waveguide on a silicon chip. By adding a DC current and an RF pump tone we are able to generate parametric amplification using three-wave mixing. The devices exhibit gain of more than 15 dB across an instantaneous bandwidth from 4 to 8 GHz. The total usable gain bandwidth, including both sides of the signal-idler gain region, is more than 6 GHz. The noise referred to the input of the devices approaches the quantum limit, with less than 1 photon excess noise. Compared to similarly constructed four-wave mixing amplifiers, these devices operate with the RF pump at $\sim$20 dB lower power and at frequencies far from the signal. This will permit easier integration into large scale qubit and detector applications.

preprint2014arXiv

The Multi-shop Ski Rental Problem

We consider the multi-shop ski rental problem. This problem generalizes the classic ski rental problem to a multi-shop setting, in which each shop has different prices for renting and purchasing a pair of skis, and a consumer has to make decisions on when and where to buy. We are interested in the optimal online (competitive-ratio minimizing) mixed strategy from the consumer's perspective. For our problem in its basic form, we obtain exciting closed-form solutions and a linear time algorithm for computing them. We further demonstrate the generality of our approach by investigating three extensions of our basic problem, namely ones that consider costs incurred by entering a shop or switching to another shop. Our solutions to these problems suggest that the consumer must assign positive probability in exactly one shop at any buying time. Our results apply to many real-world applications, ranging from cost management in IaaS cloud to scheduling in distributed computing.
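
For readers unfamiliar with the classic single-shop problem that this work generalizes, the sketch below evaluates the well-known deterministic break-even rule: rent while the accumulated rent stays below the purchase price, then buy. This is not the paper's mixed strategy or its multi-shop algorithm. The integer prices and function names are assumptions chosen for illustration.

```python
def offline_cost(days, rent, buy):
    """Best cost in hindsight: rent every day or buy on day one."""
    return min(days * rent, buy)


def break_even_cost(days, rent, buy):
    """Cost of the deterministic break-even rule (integer prices assumed)."""
    rent_days = -(-buy // rent) - 1  # ceil(buy / rent) - 1 days of renting
    if days <= rent_days:
        return days * rent           # season ended before we bought
    return rent_days * rent + buy    # rented, then bought on the next day


def worst_case_ratio(rent, buy, horizon):
    """Largest online/offline cost ratio over season lengths 1..horizon."""
    return max(
        break_even_cost(d, rent, buy) / offline_cost(d, rent, buy)
        for d in range(1, horizon + 1)
    )


ratio = worst_case_ratio(rent=1, buy=10, horizon=100)
```

With a rent of 1 and a purchase price of 10, the worst case occurs when the season lasts at least 10 days: the rule pays 9 in rent plus 10 to buy, against an offline cost of 10, so the ratio stays below 2. Randomized (mixed) strategies, which are the subject of the abstract, can do better than this deterministic bound.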

preprint2014arXiv

Two-axis control of a singlet-triplet qubit with an integrated micromagnet

The qubit is the fundamental building block of a quantum computer. We fabricate a qubit in a silicon double quantum dot with an integrated micromagnet in which the qubit basis states are the singlet state and the spin-zero triplet state of two electrons. Because of the micromagnet, the magnetic field difference $\Delta B$ between the two sides of the double dot is large enough to enable the achievement of coherent rotation of the qubit's Bloch vector about two different axes of the Bloch sphere. By measuring the decay of the quantum oscillations, the inhomogeneous spin coherence time $T_{2}^{*}$ is determined. By measuring $T_{2}^{*}$ at many different values of the exchange coupling $J$ and at two different values of $\Delta B$, we provide evidence that the micromagnet does not limit decoherence, with the dominant limits on $T_{2}^{*}$ arising from charge noise and from coupling to nuclear spins.

preprint2013arXiv

Fast coherent manipulation of three-electron states in a double quantum dot

A fundamental goal in the manipulation of quantum systems is the achievement of many coherent oscillations within the characteristic dephasing time T2*[1]. Most manipulations of electron spins in quantum dots have focused on the construction and control of two-state quantum systems, or qubits, in which each quantum dot is occupied by a single electron[2-7]. Here we perform quantum manipulations on a system with more electrons per quantum dot, in a double dot with three electrons. We demonstrate that tailored pulse sequences can be used to induce coherent rotations between 3-electron quantum states. Certain pulse sequences yield coherent oscillations with a very high figure of merit (the ratio of coherence time to rotation time) of >100. The presence of the third electron enables very fast rotations to all possible states, in contrast to the case when only two electrons are used, in which some rotations are slow. The minimum oscillation frequency we observe is >5 GHz.

31 published item(s)

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

AutoField: Automating Feature Selection in Deep Recommender Systems

Conditional Generation Net for Medication Recommendation

DeepPortraitDrawing: Generating Human Body Images from Freehand Sketches

Denoising Neural Network for News Recommendation with Positive and Negative Implicit Feedback

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

Multi-modal Contrastive Representation Learning for Entity Alignment

NeRF-SR: High-Quality Neural Radiance Fields using Supersampling

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment

Rational tensegrities through the lens of toric geometry

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

Relation-aware Meta-learning for Market Segment Demand Prediction with Limited Records

Amplitude and frequency sensing of microwave fields with a superconducting transmon qudit

Automated Relational Meta-learning

Jointly Predicting Job Performance, Personality, Cognitive Ability, Affect, and Well-Being

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Nearest Neighbor Search for Hyperbolic Embeddings

Nonlinear Signal Distortion Corrections Through Quantum Sensing

Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series

Wireless power transfer via topological modes in dimer chains

Multi-Grained Named Entity Recognition

Optimal Control for the Quantum Simulation of Nuclear Dynamics

Characterization of a gate-defined double quantum dot in a Si/SiGe nanomembrane

End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning

Low-noise kinetic inductance traveling-wave amplifier using three-wave mixing

The Multi-shop Ski Rental Problem

Two-axis control of a singlet-triplet qubit with an integrated micromagnet

Fast coherent manipulation of three-electron states in a double quantum dot