Source author record

Xi Wang

Xi Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

39 works

28 topics

4 close collaborators

preprint 2026 arXiv

Modeling Subjective Urban Perception with Human Gaze

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.

preprint 2026 arXiv

SF20K Competition 2025: Summary and findings

This report presents the results and findings of the first edition of the Short-Films 20K (SF20K) Competition, held in conjunction with the SLoMO Workshop at ICCV 2025. The competition is designed to advance story-level video understanding beyond short-clip action recognition, introducing an open-ended video question-answering task built on a corpus of amateur short films. This setup ensures that models must rely on multimodal understanding rather than memorization of popular movies. Evaluation is conducted using the SF20K-Test benchmark (95 movies, 979 question-answer pairs) and scored via LLM-QA-Eval, an automated judge based on GPT-4.1-nano. The competition attracted 22 teams and 286 submissions across two tracks: a Main Track with unrestricted model size and a Special Track limited to models under 8 billion parameters. The winning team achieved 65.7% accuracy on the Main Track and 48.7% on the Special Track, against a human performance ceiling of 91.7%. Our analysis reveals several key findings: narrative-aware, shot-level processing consistently outperforms uniform frame sampling; well-designed multi-stage pipelines using smaller models can match or exceed end-to-end inference with models over 30x larger; and subtitle quality is a dominant factor in performance. These results highlight that the primary bottleneck in long-form video QA lies in information selection and reasoning structure rather than raw model capacity, and that a substantial gap remains between current methods and human-level narrative comprehension.

preprint 2024 arXiv

RHOBIN Challenge: Reconstruction of Human Object Interaction

Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is, however, a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at https://rhobin-challenge.github.io/.

preprint 2023 arXiv

The study of eleven contact binaries with mass ratios less than 0.1

Multi-band photometric observations of eleven totally eclipsing contact binaries were carried out. Applying the Wilson-Devinney program, photometric solutions were obtained. There are two W-subtype systems, CRTS J133031.1+161202 and CRTS J154254.0+324652, and the rest are A-subtype systems. CRTS J154254.0+324652 has the highest fill-out factor at 94.3%, and CRTS J155009.2+493639 has the lowest at only 18.9%. The mass ratios of the eleven systems are all less than 0.1, which means that they are extremely low mass ratio binary systems. We investigated period variations and found that the orbital periods of three systems decrease slowly, which may be caused by angular momentum loss, and that those of six systems increase slowly, which indicates that material may be transferring from the secondary component to the primary component. LAMOST low-resolution spectra of four objects were analyzed, and using the spectral subtraction technique, the Hα emission line was detected, which means that the four objects exhibit chromospheric activity. In order to understand their evolutionary status, the mass-luminosity and mass-radius diagrams were plotted. The two diagrams indicate that the primary component is in the main sequence evolution stage, and the secondary component is above TAMS, indicating that they are over-luminous. To determine whether the eleven systems are in a stable state, the ratio of spin angular momentum to orbital angular momentum ($J_{s}/J_{o}$) and the instability parameters were calculated, and we argue that CRTS J234634.7+222824 is on the verge of a merger.

preprint 2022 arXiv

Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

Although more layers and more parameters generally improve the accuracy of models, such big models generally have high computational complexity and require large amounts of memory, which exceed the capacity of small devices for inference and incur long training times. In addition, the long training and inference times of big models are difficult to afford even on high-performance servers. As an efficient approach to compress a large deep model (a teacher model) into a compact model (a student model), knowledge distillation emerges as a promising way to deal with big models. Existing knowledge distillation methods cannot exploit elastically available computing resources and therefore have low efficiency. In this paper, we propose an Elastic Deep Learning framework for knowledge Distillation, i.e., EDL-Dist. The advantages of EDL-Dist are three-fold. First, the inference and the training processes are separated. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of the training and inference processes is supported. We conduct extensive experiments to show that the throughput of EDL-Dist is up to 3.125 times faster than the baseline method (online knowledge distillation) while the accuracy is similar or higher.
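The abstract does not spell out the loss that EDL-Dist optimizes, so the following is only a minimal sketch of a generic temperature-softened distillation loss, not the authors' implementation. The names `softmax`, `distillation_loss`, and `teacher_cache` are hypothetical, and the cached teacher logits merely illustrate the idea that teacher inference can be decoupled from student training.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilised by subtracting the maximum."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumption: teacher outputs were produced earlier (for example by separate
# inference workers) and stored, so the student step only reads them.
teacher_cache = {"sample-0": [4.0, 1.0, 0.5]}
student_logits = [2.5, 1.5, 0.2]

loss = distillation_loss(teacher_cache["sample-0"], student_logits)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise, which is the property a student update would try to drive down.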

preprint 2022 arXiv

Length L-function for Network-Constrained Point Data

Network-constrained points are points restricted to road networks, such as taxi pick-up and drop-off locations. A significant pattern of network-constrained points is referred to as an aggregation; e.g., the aggregation of pick-up points may indicate a high taxi demand in a particular area. Although the network K-function using the shortest-path network distance has been proposed to detect point aggregation, its statistical unit is still radius-based. The r-neighborhood, in particular, has inconsistent network length owing to the complex configuration of road networks, which causes unfair counts and identification errors in networks (e.g., the length of the r-neighborhood located at an intersection is longer than that on straight roads, so it may include more points). In this study, we derived the length L-function for network-constrained points to identify the aggregation by designing a novel neighborhood as the statistical unit, whose total length is consistent throughout the network. Compared to the network K-function, our method can detect a true-to-life aggregation scale, identify the aggregation with higher network density, and identify aggregations that the network K-function cannot. We validated our method using taxi trip pick-up location data within the Zhongguancun Area in Beijing, analyzing differences in maximal aggregation between workdays and weekends to understand taxi demand in the morning and evening peaks.
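The inconsistency of the r-neighborhood described above can be checked on a toy network. The sketch below is not the paper's L-function; it only measures the total road length lying within shortest-path distance r of a location, using hypothetical helpers `network_distances` and `neighborhood_length` and two made-up networks (a straight road and a four-way intersection).

```python
import heapq

def network_distances(adj, source):
    """Dijkstra shortest-path distances from source on an undirected weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def neighborhood_length(edges, source, r):
    """Total length of road within network distance r of the source node."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = network_distances(adj, source)
    total = 0.0
    for u, v, w in edges:
        # Portion of the edge reachable from each endpoint, capped at the edge length.
        reach_u = max(0.0, r - dist.get(u, float("inf")))
        reach_v = max(0.0, r - dist.get(v, float("inf")))
        total += min(w, reach_u + reach_v)
    return total

straight = [("a", "b", 10.0), ("b", "c", 10.0)]
crossing = [("x", "n", 10.0), ("x", "s", 10.0), ("x", "e", 10.0), ("x", "w", 10.0)]

print(neighborhood_length(straight, "b", 5.0))  # mid-road location
print(neighborhood_length(crossing, "x", 5.0))  # intersection location
```

With the same r, the intersection neighborhood covers twice the road length of the mid-road one, so a raw point count inside it is not directly comparable.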

preprint 2022 arXiv

Light-Induced Ferromagnetism in Moiré Superlattices

Many-body interactions between carriers lie at the heart of correlated physics. The ability to tune such interactions would open the possibility to access and control complex electronic phase diagrams on demand. Recently, moiré superlattices formed by two-dimensional materials have emerged as a promising platform for quantum engineering of such phenomena. The power of the moiré system lies in the high tunability of its physical parameters by tweaking layer twist angle, electrical field, moiré carrier filling, and interlayer coupling. Here, we report that optical excitation can drastically tune the spin-spin interactions between moiré trapped carriers, resulting in ferromagnetic order in WS2/WSe2 moiré superlattices over a small range of doping at elevated temperatures. Near the filling factor ν = -1/3 (i.e., one hole per three moiré unit cells), as the excitation power at the exciton resonance increases, a well-developed hysteresis loop emerges in the reflective magnetic circular dichroism (RMCD) signal as a function of magnetic field, a hallmark of ferromagnetism. The hysteresis loop persists down to charge neutrality, and its shape evolves as the moiré superlattice is gradually filled, indicating changes of magnetic ground state properties. The observed phenomenon points to a mechanism in which itinerant photo-excited excitons mediate exchange coupling between moiré trapped holes. This exciton-mediated interaction can be of longer range than direct coupling between moiré trapped holes, and thus magnetic order can arise even in the dilute hole regime under optical excitation. This discovery adds a new and dynamic tuning knob to the rich many-body Hamiltonian of moiré quantum matter.

preprint 2022 arXiv

Mass Testing and Characterization of 20-inch PMTs for JUNO

The main goal of the JUNO experiment is to determine the neutrino mass ordering using a 20 kt liquid-scintillator detector. Its key feature is an excellent energy resolution of at least 3% at 1 MeV, for which its instruments need to meet a certain quality and thus have to be fully characterized. More than 20,000 20-inch PMTs have been received and assessed by JUNO after a detailed testing program which began in 2017 and lasted about four years. Based on this mass characterization and a set of specific requirements, a good quality of all accepted PMTs could be ascertained. This paper presents the testing procedure and the purpose-built testing systems, as well as the statistical characteristics of all 20-inch PMTs intended to be used in the JUNO experiment, covering more than fifteen performance parameters including the photocathode uniformity. This constitutes the largest sample of 20-inch PMTs ever produced and studied in detail to date, i.e. 15,000 of the newly developed 20-inch MCP-PMTs from Northern Night Vision Technology Co. (NNVT) and 5,000 dynode PMTs from Hamamatsu Photonics K.K. (HPK).

preprint 2022 arXiv

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

Text to speech (TTS) has made rapid progress in both academia and industry in recent years. Some questions naturally arise: whether a TTS system can achieve human-level quality, how to define and judge that quality, and how to achieve it. In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset. Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation, with several key modules to enhance the capacity of the prior from text and reduce the complexity of the posterior from speech, including phoneme pre-training, differentiable duration modeling, bidirectional prior/posterior modeling, and a memory mechanism in VAE. Experimental evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS (comparative mean opinion score) relative to human recordings at the sentence level, with a Wilcoxon signed rank test at p-level p >> 0.05, which demonstrates no statistically significant difference from human recordings for the first time on this dataset.

preprint 2022 arXiv

PeCLR: Self-Supervised 3D Hand Pose Estimation from monocular RGB via Equivariant Contrastive Learning

Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation. Contrastive learning makes use of unlabeled data for the purpose of representation learning via a loss formulation that encourages the learned feature representations to be invariant under any image transformation. For 3D hand pose estimation, it is likewise desirable to have invariance to appearance transformations such as color jitter. However, the task requires equivariance under affine transformations, such as rotation and translation. To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation. We experimentally investigate the impact of invariant and equivariant contrastive objectives and show that learning equivariant features leads to better representations for the task of 3D hand pose estimation. Furthermore, we show that standard ResNets with sufficient depth, trained on additional unlabeled data, attain improvements of up to 14.5% in PA-EPE on FreiHAND and thus achieve state-of-the-art performance without any task-specific, specialized architectures. Code and models are available at https://ait.ethz.ch/projects/2021/PeCLR/
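The difference between invariance and equivariance can be made concrete with a toy example. The sketch below is not the PeCLR objective; it uses a hypothetical stand-in regressor (`centroid` of 2D keypoints) to show why comparing outputs directly penalizes a geometric transformation, whereas comparing them after undoing the transformation in output space does not.

```python
import math

def rotate(point, angle):
    """Rotate a 2D point about the origin by angle (radians)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def centroid(points):
    """Toy 'pose regressor': mean position of the 2D keypoints."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

keypoints = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.5)]
angle = math.pi / 2

out_original = centroid(keypoints)
out_rotated = centroid([rotate(p, angle) for p in keypoints])

# An invariant objective compares the two outputs directly.
invariance_gap = distance(out_rotated, out_original)
# An equivariant objective first maps the output back through the inverse transform.
equivariance_gap = distance(rotate(out_rotated, -angle), out_original)

print(f"invariance gap:   {invariance_gap:.3f}")
print(f"equivariance gap: {equivariance_gap:.3e}")
```

The invariance gap is large because the output legitimately moves with the input, while the equivariance gap vanishes up to floating-point error; appearance changes such as color jitter would leave both gaps at zero for this toy regressor.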

preprint 2022 arXiv

PrEF: Percolation-based Evolutionary Framework for the diffusion-source-localization problem in large networks

We assume that the state of a number of nodes in a network could be investigated if necessary, and study what configuration of those nodes could facilitate a better solution for the diffusion-source-localization (DSL) problem. In particular, we formulate a candidate set which is guaranteed to contain the diffusion source, and propose a method, the Percolation-based Evolutionary Framework (PrEF), to minimize such a set. Hence one could further conduct more intensive investigation on only a few nodes to target the source. To achieve that, we first demonstrate that there are some similarities between the DSL problem and the network immunization problem. We find that the minimization of the candidate set is equivalent to the minimization of the order parameter if we view the observer set as the removal node set. Hence, PrEF is developed based on network percolation and an evolutionary algorithm. The effectiveness of the proposed method is validated on both model and empirical networks under varied circumstances. Our results show that the developed approach could achieve a much smaller candidate set compared to the state of the art in almost all cases. Meanwhile, our approach is also more stable, i.e., it has similar performance irrespective of varied infection probabilities, diffusion models, and outbreak ranges. More importantly, our approach might provide a new framework to tackle the DSL problem in extremely large networks.
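To illustrate the quantity the abstract refers to, the sketch below computes a percolation order parameter after removing an observer set. It assumes the common convention that the order parameter is the size of the largest remaining connected component divided by the original number of nodes; the function name `largest_component_fraction` and the path graph are hypothetical, and no evolutionary search is included.

```python
from collections import deque

def largest_component_fraction(nodes, edges, removed):
    """Largest connected component after removing nodes, as a fraction of len(nodes)."""
    removed = set(removed)
    adj = {n: [] for n in nodes if n not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    seen = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / len(nodes)

path_nodes = [0, 1, 2, 3, 4]
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

print(largest_component_fraction(path_nodes, path_edges, removed=[]))
print(largest_component_fraction(path_nodes, path_edges, removed=[2]))
print(largest_component_fraction(path_nodes, path_edges, removed=[0]))
```

Removing the middle node splits the path into two pieces of size two, whereas removing an end node leaves a piece of size four, so the choice of observer set strongly affects the order parameter.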

preprint 2022 arXiv

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

Expressive speech synthesis, such as audiobook synthesis, is still challenging for style representation learning and prediction. Deriving style from reference audio or predicting style tags from text requires a huge amount of labeled data, which is costly to acquire and difficult to define and annotate accurately. In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner. It leverages an emotion lexicon and uses contrastive learning and deep clustering. We further integrate the style representation as a conditioned embedding in a multi-style Transformer TTS. Compared with a multi-style TTS that predicts style tags and is trained on the same dataset but with human annotations, our method achieves improved results according to subjective evaluations on both in-domain and out-of-domain test sets in audiobook speech. Moreover, with implicit context-aware style representation, the emotion transition of synthesized audio in a long paragraph appears more natural. The audio samples are available on the demo web.

preprint 2022 arXiv

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

Heatmap regression methods have dominated the face alignment area in recent years, yet they ignore the inherent relation between different landmarks. In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. The SLPT generates the representation of each single landmark from a local patch and aggregates them by an adaptive inherent relation based on the attention mechanism. The subpixel coordinate of each landmark is predicted independently based on the aggregated feature. Moreover, a coarse-to-fine framework is further introduced to work with the SLPT, which enables the initial landmarks to gradually converge to the target facial landmarks using fine-grained features from dynamically resized local patches. Extensive experiments carried out on three popular benchmarks, including WFLW, 300W and COFW, demonstrate that the proposed method works at the state-of-the-art level with much less computational complexity by learning the inherent relation between facial landmarks. The code is available at the project website.

preprint 2021 arXiv

JUNO Physics and Detector

The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton LS detector at 700 m underground. An excellent energy resolution and a large fiducial volume offer exciting opportunities for addressing many important topics in neutrino and astro-particle physics. With 6 years of data, the neutrino mass ordering can be determined at 3-4 sigma and three oscillation parameters can be measured to a precision of 0.6% or better by detecting reactor antineutrinos. With 10 years of data, DSNB could be observed at 3-sigma; a lower limit of the proton lifetime of 8.34e33 years (90% C.L.) can be set by searching for $p \to \bar{\nu} K^+$; detection of solar neutrinos would shed new light on the solar metallicity problem and examine the vacuum-matter transition region. A core-collapse supernova at 10 kpc would lead to ~5000 IBD and ~2000 (300) all-flavor neutrino-proton (electron) scattering events. Geo-neutrinos can be detected with a rate of ~400 events/year. We also summarize the final design of the JUNO detector and the key R&D achievements. All 20-inch PMTs have been tested. The average photon detection efficiency is 28.9% for the 15,000 MCP PMTs and 28.1% for the 5,000 dynode PMTs, higher than the JUNO requirement of 27%. Together with the >20 m attenuation length of LS, we expect a yield of 1345 p.e. per MeV and an effective energy resolution of $3.02\%/\sqrt{E\,(\mathrm{MeV})}$ in simulations. The underwater electronics is designed to have a loss rate <0.5% in 6 years. With degassing membranes and a micro-bubble system, the radon concentration in the 35-kton water pool could be lowered to <10 mBq/m^3. Acrylic panels of radiopurity <0.5 ppt U/Th are produced. The 20-kton LS will be purified onsite. Singles in the fiducial volume can be controlled to ~10 Hz. The JUNO experiment also features a double calorimeter system with 25,600 3-inch PMTs, an LS testing facility OSIRIS, and a near detector TAO.
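As a rough cross-check of the two figures quoted above, the sketch below compares the effective resolution of 3.02% per square root of energy in MeV with the pure photon-counting limit implied by 1345 p.e. per MeV. The Poisson-only limit is an assumption of this sketch, not a statement from the abstract; the function names are hypothetical, and the effective figure from simulation presumably includes further contributions that are not modelled here.

```python
import math

PE_PER_MEV = 1345              # photoelectron yield quoted in the abstract
RESOLUTION_AT_1MEV = 0.0302    # effective resolution coefficient quoted in the abstract

def statistical_resolution(energy_mev, pe_per_mev=PE_PER_MEV):
    """Relative resolution if only Poisson fluctuations in the p.e. count mattered."""
    return 1.0 / math.sqrt(pe_per_mev * energy_mev)

def quoted_resolution(energy_mev, a=RESOLUTION_AT_1MEV):
    """Relative resolution following the a / sqrt(E) scaling given in the abstract."""
    return a / math.sqrt(energy_mev)

for energy in (1.0, 2.0, 4.0):
    print(
        f"E = {energy:.0f} MeV: "
        f"photon statistics {100 * statistical_resolution(energy):.2f}%, "
        f"quoted {100 * quoted_resolution(energy):.2f}%"
    )
```

At 1 MeV the counting limit comes out near 2.73%, somewhat below the quoted 3.02%, which is consistent with photon statistics being the dominant but not the only term.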

preprint 2021 arXiv

Leveraging Review Properties for Effective Recommendation

Many state-of-the-art recommendation systems leverage explicit item reviews posted by users by considering their usefulness in representing the users' preferences and describing the items' attributes. These posted reviews may have various associated properties, such as their length, their age since they were posted, or their item rating. However, it remains unclear how these different review properties contribute to the usefulness of their corresponding reviews in addressing the recommendation task. In particular, users show distinct preferences when considering different aspects of the reviews (i.e. properties) for making decisions about the items. Hence, it is important to model the relationship between the reviews' properties and the usefulness of reviews while learning the users' preferences and the items' attributes. Therefore, we propose to model the reviews with their associated available properties. We introduce a novel review properties-based recommendation model (RPRM) that learns which review properties are more important than others in capturing the usefulness of reviews, thereby enhancing the recommendation results. Furthermore, inspired by the users' information adoption framework, we integrate two loss functions and a negative sampling strategy into our proposed RPRM model, to ensure that the properties of reviews are correlated with the users' preferences. We examine the effectiveness of RPRM using the well-known Yelp and Amazon datasets. Our results show that RPRM significantly outperforms a classical and five state-of-the-art baselines. Moreover, we experimentally show the advantages of using our proposed loss functions and negative sampling strategy, which further enhance the recommendation performances of RPRM.

preprint 2021 arXiv

Observations of Ultrafast Superfluorescent Beatings in a Cesium Atomic Vapor Excited by Femtosecond Laser Pulses

Spontaneous emission from individual atoms in vapor lasts nanoseconds, if not microseconds, and beatings in this emission involve only directly excited energy sublevels. In contrast, the superfluorescent emissions burst on a much-reduced timescale and their beatings involve both directly and indirectly excited energy sublevels. In this work, picosecond and femtosecond superfluorescent beatings are observed from a dense cesium atomic vapor. Cesium atoms are excited by 60-femtosecond long, 800 nm laser pulses via two-photon processes into their coherent superpositions of the ground 6S and excited 8S states. As a part of the transient four wave mixing process, the yoked superfluorescent blue light at lower transitions of 6S - 7P are recorded and studied. Delayed buildup time of this blue light is measured as a function of the input laser beam power using a high-resolution 2 ps streak camera. The power dependent buildup delay time is consistently doubled as the vapor temperature is lowered to cut the number of atoms by half. At low power and density, a beating with a period of 100 picoseconds representing the ground state splitting is observed. The autocorrelation measurements of the generated blue light exhibit a beating with a quasi-period of 230 fs corresponding to the splitting of the 7P level primarily at lower input laser power. Understanding and, eventually, controlling the intriguing nature of superfluorescent beatings may permit a rapid quantum operation free from the rather slow spontaneous emission processes from atoms and molecules.

preprint 2021 arXiv

Unraveling intrinsic flexoelectricity in twisted double bilayer graphene

Moiré superlattices of two-dimensional (2D) materials with a small twist angle are thought to exhibit an appreciable flexoelectric effect, though unambiguous confirmation of their flexoelectricity is challenging due to artifacts associated with commonly used piezoresponse force microscopy (PFM). For example, an unexpectedly small phase contrast ($\sim 8^{\circ}$) between opposite flexoelectric polarizations was reported in twisted bilayer graphene (tBG), though the theoretically predicted value is $180^{\circ}$. Here we developed a methodology to extract intrinsic moiré flexoelectricity using twisted double bilayer graphene (tDBG) as a model system, probed by lateral PFM. For small twist angle samples, we found that a vectorial decomposition is essential to recover the small intrinsic flexoelectric response at domain walls from a large background signal. The obtained three-fold symmetry of commensurate domains with significant flexoelectric response at domain walls is fully consistent with our theoretical calculations. Incommensurate domains in tDBG with relatively large twist angles can also be observed by this technique. Our work provides a general strategy for unraveling intrinsic flexoelectricity in van der Waals moiré superlattices while providing insights into engineered symmetry breaking in centrosymmetric materials.

preprint 2020 arXiv

Deep Mining External Imperfect Data for Chest X-ray Disease Screening

Deep learning approaches have demonstrated remarkable progress in automatic Chest X-ray analysis. The data-driven nature of deep models requires training data to cover a large distribution. Therefore, it is important to integrate knowledge from multiple datasets, especially for medical images. However, learning a disease classification model with extra Chest X-ray (CXR) data is still challenging. Recent research has demonstrated that a performance bottleneck exists in joint training on different CXR datasets, and few efforts have been made to address the obstacle. In this paper, we argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges. Specifically, the imperfection is twofold: domain discrepancy, as the image appearances vary across datasets; and label discrepancy, as different datasets are partially labeled. To this end, we formulate the multi-label thoracic disease classification problem as weighted independent binary tasks according to the categories. For common categories shared across domains, we adopt task-specific adversarial training to alleviate the feature differences. For categories existing in a single dataset, we present uncertainty-aware temporal ensembling of model predictions to further mine the information from the missing labels. In this way, our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability. We conduct extensive experiments on three datasets with more than 360,000 Chest X-ray images. Our method outperforms other competing models and sets state-of-the-art performance on the official NIH test set with 0.8349 AUC, demonstrating its effectiveness in utilizing the external dataset to improve the internal classification.

preprint 2020 arXiv

Dissipative Distillation of Supercritical Quantum Gases

We experimentally realize a method to produce non-equilibrium Bose-Einstein condensates with condensed fractions exceeding those of equilibrium samples with the same parameters. To do this, we immerse an ultracold Bose gas of 87Rb in a cloud of 39K with substantially higher temperatures, providing a controlled source of dissipation. By combining the action of the dissipative environment with evaporative cooling, we are able to progressively distil the non-equilibrium Bose-Einstein condensate from the thermal cloud. We show that by increasing the strength of the dissipation it is even possible to produce condensates above the critical temperature. We finally demonstrate that our out-of-equilibrium samples are long-lived and do not reach equilibrium in a time that is accessible for our experiment. Due to its high degree of control, our distillation process is a promising tool for the engineering of open quantum systems.

preprint 2020 arXiv

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

The Jiangmen Underground Neutrino Observatory (JUNO) features a 20 kt multi-purpose underground liquid scintillator sphere as its main detector. Some of JUNO's features make it an excellent experiment for $^8$B solar neutrino measurements, such as its low-energy threshold, its high energy resolution compared to water Cherenkov detectors, and its much larger target mass compared to previous liquid scintillator detectors. In this paper we present a comprehensive assessment of JUNO's potential for detecting $^8$B solar neutrinos via the neutrino-electron elastic scattering process. A reduced 2 MeV threshold on the recoil electron energy is found to be achievable assuming the intrinsic radioactive background $^{238}$U and $^{232}$Th in the liquid scintillator can be controlled to 10$^{-17}$ g/g. With ten years of data taking, about 60,000 signal and 30,000 background events are expected. This large sample will enable an examination of the distortion of the recoil electron spectrum that is dominated by the neutrino flavor transformation in the dense solar matter, which will shed new light on the tension between the measured electron spectra and the predictions of the standard three-flavor neutrino oscillation framework. If $\Delta m^{2}_{21}=4.8\times10^{-5}~(7.5\times10^{-5})$ eV$^{2}$, JUNO can provide evidence of neutrino oscillation in the Earth at about the 3$\sigma$ (2$\sigma$) level by measuring the non-zero signal rate variation with respect to the solar zenith angle. Moreover, JUNO can simultaneously measure $\Delta m^2_{21}$ using $^8$B solar neutrinos to a precision of 20% or better depending on the central value and to sub-percent precision using reactor antineutrinos. A comparison of these two measurements from the same detector will help elucidate the current tension between the value of $\Delta m^2_{21}$ reported by solar neutrino experiments and the KamLAND experiment.

preprint 2020 arXiv

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

We propose a linear prediction (LP)-based waveform generation method via WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to effectively train the neural vocoder when the target database contains massive amount of acoustical information such as prosody, style or expressiveness. As a solution, the approaches that only generate the vocal source component by a neural vocoder have been proposed. However, they tend to generate synthetic noise because the vocal source component is independently handled without considering the entire speech production process; where it is inevitable to come up with a mismatch between vocal source and vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, where the complicated interactions between vocal source and vocal tract components are jointly trained within a mixture density network-based WaveNet model. The experimental results verify that the proposed system outperforms the conventional WaveNet vocoders both objectively and subjectively. In particular, the proposed method achieves 4.47 MOS within the TTS framework.

preprint2020arXiv

Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification

The incompleteness of positive labels and the presence of many unlabelled instances are common problems in binary classification applications such as in review helpfulness classification. Various studies from the classification literature consider all unlabelled instances as negative examples. However, a classification model that learns to classify binary instances with incomplete positive labels while assuming all unlabelled data to be negative examples will often generate a biased classifier. In this work, we propose a novel Negative Confidence-aware Weakly Supervised approach (NCWS), which customises a binary classification loss function by discriminating the unlabelled examples with different negative confidences during the classifier's training. We use the review helpfulness classification as a test case for examining the effectiveness of our NCWS approach. We thoroughly evaluate NCWS by using three different datasets, namely one from Yelp (venue reviews), and two from Amazon (Kindle and Electronics reviews). Our results show that NCWS outperforms strong baselines from the literature including an existing SVM-based approach (i.e. SVM-P), the positive and unlabelled learning-based approach (i.e. C-PU) and the positive confidence-based approach (i.e. P-conf) in addressing the classifier's bias problem. Moreover, we further examine the effectiveness of NCWS by using its classified helpful reviews in a state-of-the-art review-based venue recommendation model (i.e. DeepCoNN) and demonstrate the benefits of using NCWS in enhancing venue recommendation effectiveness in comparison to the baselines.

preprint2020arXiv

Performance Analysis of RSU-based Multihomed Multilane Vehicular Networks

Motivated by the potentially high downlink traffic demands of commuters in future autonomous vehicles, we study a network architecture where vehicles use Vehicle-to-Vehicle (V2V) links to form relay network clusters, which in turn use Vehicle-to-Infrastructure (V2I) links to connect to one or more Road Side Units (RSUs). Such cluster-based multihoming offers improved performance, e.g., in coverage and per user shared rate, but depends on the penetration of V2V-V2I capable vehicles and possible blockage, by legacy vehicles, of line of sight based V2V links, such as those based on millimeter-wave and visible light technologies. This paper provides a performance analysis of a typical vehicle's connectivity and throughput on a highway in the free-flow regime, exploring its dependence on vehicle density, sensitivity to blockages, number of lanes and heterogeneity across lanes. The results show that, even with moderate vehicle densities and penetration of V2V-V2I capable vehicles, such architectures can achieve substantial improvements in connectivity and reduction in per-user rate variability as compared to V2I based networks. The typical vehicle's performance is also shown to improve considerably in the multilane highway setting as compared to a single lane road. This paper also sheds light on how the network performance is affected when vehicles can control their relative positions, by characterizing the connectivity-throughput tradeoff faced by the clusters of vehicles.

preprint2020arXiv

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

The Taishan Antineutrino Observatory (TAO, also known as JUNO-TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO). A ton-level liquid scintillator detector will be placed at about 30 m from a core of the Taishan Nuclear Power Plant. The reactor antineutrino spectrum will be measured with sub-percent energy resolution, to provide a reference spectrum for future reactor neutrino experiments, and to provide a benchmark measurement to test nuclear databases. A spherical acrylic vessel containing 2.8 tons of gadolinium-doped liquid scintillator will be viewed by 10 m^2 of Silicon Photomultipliers (SiPMs) of >50% photon detection efficiency with almost full coverage. The photoelectron yield is about 4500 per MeV, an order of magnitude higher than that of any existing large-scale liquid scintillator detector. The detector operates at -50 °C to lower the dark noise of SiPMs to an acceptable level. The detector will measure about 2000 reactor antineutrinos per day, and is designed to be well shielded from cosmogenic backgrounds and ambient radioactivity so as to have a background-to-signal ratio of about 10%. The experiment is expected to start operation in 2022.

preprint2020arXiv

Time Series Data Cleaning with Regular and Irregular Time Intervals

Errors are prevalent in time series data, especially in the industrial field. Data with errors cannot be stored in the database, which results in the loss of data assets. Handling dirty data in time series is non-trivial, particularly when the time intervals are irregular. At present, to deal with time series containing errors, besides keeping the original erroneous data, discarding it, or checking it manually, we can also use the cleaning algorithms widely used in databases to clean the time series data automatically. This survey provides a classification of time series data cleaning techniques and comprehensively reviews the state-of-the-art methods of each type. In particular, we have a special focus on irregular time intervals. In addition, we summarize data cleaning tools, systems and evaluation criteria from research and industry. Finally, we highlight possible directions for time series data cleaning.

preprint2020arXiv

Toward Quantifying Ambiguities in Artistic Images

It has long been hypothesized that perceptual ambiguities play an important role in aesthetic experience: a work with some ambiguity engages a viewer more than one that does not. However, current frameworks for testing this theory are limited by the availability of stimuli and data collection methods. This paper presents an approach to measuring the perceptual ambiguity of a collection of images. Crowdworkers are asked to describe image content, after different viewing durations. Experiments are performed using images created with Generative Adversarial Networks, using the Artbreeder website. We show that text processing of viewer responses can provide a fine-grained way to measure and describe image ambiguities.

preprint2016arXiv

Self-Assembled, Nanostructured, Tunable Metamaterials via Spinodal Decomposition

Self-assembly via nanoscale phase-separation offers an elegant route to fabricate nanocomposites with physical properties unattainable in single-component systems. One important class of nanocomposites is optical metamaterials, which exhibit exotic properties and lead to opportunities for agile control of light propagation. Such metamaterials are typically fabricated via expensive and hard-to-scale top-down processes requiring precise integration of dissimilar materials. In turn, there is a need for alternative, more efficient routes to fabricate large-scale metamaterials for practical applications with deep-subwavelength resolution. Here, we demonstrate a bottom-up approach to fabricate scalable nanostructured metamaterials via spinodal decomposition. To demonstrate the potential of such an approach, we leverage the innate spinodal decomposition of the VO2-TiO2 system, the metal-to-insulator transition in VO2, and thin-film epitaxy, to produce self-organized nanostructures with coherent interfaces and a structural unit cell down to 15 nm (tunable between horizontally- and vertically-aligned lamellae) wherein the iso-frequency surface is temperature-tunable from elliptic- to hyperbolic-dispersion producing metamaterial behavior. These results provide an efficient route for the fabrication of nanostructured metamaterials and other nanocomposites for desired functionalities.

preprint2015arXiv

Evaluating Two-Stream CNN for Video Classification

Videos contain very rich semantic information. Traditional hand-crafted features are known to be inadequate in analyzing complex video semantics. Inspired by the huge success of the deep learning methods in analyzing image, audio and text data, significant efforts are recently being devoted to the design of deep nets for video analytics. Among the many practical needs, classifying videos (or video clips) based on their major semantic categories (e.g., "skiing") is useful in many applications. In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification. Our evaluations are conducted on top of a recent two-stream convolutional neural network (CNN) pipeline, which uses both static frames and motion optical flows, and has demonstrated competitive performance against the state-of-the-art methods. In order to gain insights and to arrive at a practical guideline, many important options are studied, including network architectures, model fusion, learning parameters and the final prediction methods. Based on the evaluations, very competitive results are attained on two popular video classification benchmarks. We hope that the discussions and conclusions from this work can help researchers in related fields to quickly set up a good basis for further investigations along this very promising direction.

preprint2015arXiv

Fusing Multi-Stream Deep Networks for Video Classification

This paper studies deep network architectures to address the problem of video classification. A multi-stream framework is proposed to fully utilize the rich multimodal information in videos. Specifically, we first train three Convolutional Neural Networks to model spatial, short-term motion and audio clues respectively. Long Short Term Memory networks are then adopted to explore long-term temporal dynamics. With the outputs of the individual streams, we propose a simple and effective fusion method to generate the final predictions, where the optimal fusion weights are learned adaptively for each class, and the learning process is regularized by automatically estimated class relationships. Our contributions are two-fold. First, the proposed multi-stream framework is able to exploit multimodal features that are more comprehensive than those previously attempted. Second, we demonstrate that the adaptive fusion method using the class relationship as a regularizer outperforms traditional alternatives that estimate the weights in a "free" fashion. Our framework produces significantly better results than the state of the art on two popular benchmarks, 92.2\% on UCF-101 (without using audio) and 84.9\% on Columbia Consumer Videos.

preprint2015arXiv

Hamiltonian Properties of DCell Networks

DCell has been proposed for data centers as a server centric interconnection network structure. DCell can support millions of servers with high network capacity by only using commodity switches. With one exception, we prove that a $k$ level DCell built with $n$ port switches is Hamiltonian-connected for $k \geq 0$ and $n \geq 2$. Our proof extends to all generalized DCell connection rules for $n\ge 3$. Then, we propose an $O(t_k)$ algorithm for finding a Hamiltonian path in $DCell_{k}$, where $t_k$ is the number of servers in $DCell_{k}$. What's more, we prove that $DCell_{k}$ is $(n+k-4)$-fault Hamiltonian-connected and $(n+k-3)$-fault Hamiltonian. In addition, we show that a partial DCell is Hamiltonian connected if it conforms to a few practical restrictions.

preprint2015arXiv

Large-angle quasi-self-collimation effect in a rod-type photonic crystal

A rod-type photonic crystal (PC) with a rectangular lattice shows a large-angle quasi-self-collimation (quasi-SC) effect by changing the symmetry of its rectangular lattice to straighten one of the isofrequency contours. To investigate the straightness of the isofrequency contour as well as the quasi-SC effect, we propose a straightness factor L based on the method of least squares. With L smaller than L0 (L0 = 0.01 is the critical value), the isofrequency contour is sufficiently straight to induce quasi-SC effect with the beam quasi-collimating in the structure. Furthermore, the efficiency of light coupling to the quasi-SC PC is studied, and can be greatly improved by applying a carefully designed antireflection layer.

preprint2015arXiv

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy that does not consider the temporal order of frames. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our framework achieves the best performance reported to date: $91.3\%$ on the UCF-101 and $83.5\%$ on the CCV.

preprint2014arXiv

Rapidly reconfigurable radio-frequency arbitrary waveforms synthesized on a CMOS photonic chip

Photonic methods of radio-frequency waveform generation and processing provide performance and flexibility over electronic methods due to the ultrawide bandwidth offered by the optical carriers. However, they suffer from lack of integration and slow reconfiguration speed. Here we propose an architecture of integrated photonic RF waveform generation and processing, and implement it on a silicon chip fabricated in a semiconductor manufacturing foundry. Our device can generate programmable RF bursts or continuous waveforms with only the light source, electrical drives/controls and detectors being off chip. It turns on and off an individual pulse in the RF burst within 4 nanoseconds, achieving a reconfiguration speed three orders of magnitude faster than thermal tuning. The on-chip optical delay elements offer an integrated approach to accurately manipulate individual RF waveform features without constraints set by the speed and timing jitter of electronics, and should find broad applications ranging from high-speed wireless to defense electronics.

preprint2014arXiv

User recommendation in reciprocal and bipartite social networks -- a case study of online dating

Many social networks in our daily life are bipartite networks built on reciprocity. How can we recommend users/friends to a user, so that the user is interested in and attractive to recommended users? In this research, we propose a new collaborative filtering model to improve user recommendations in reciprocal and bipartite social networks. The model considers a user's "taste" in picking others and "attractiveness" in being picked by others. A case study of an online dating network shows that the new model has good performance in recommending both initial and reciprocal contacts.

preprint2013arXiv

Millimeter-scale and large-angle self-collimation in a photonic crystal composed of silicon nanorods

We report the observation of a large-angle self-collimation phenomenon occurring in photonic crystals composed of nanorods. Electromagnetic waves incident onto such photonic crystals from directions covering a wide range of incident angles become highly localized along a single array of rods, which results in narrow-beam propagation without divergence. A propagation length of 0.4 mm is experimentally observed over the wavelength range of 1540 nm to 1570 nm, even in the large incident angle case, which is a very considerable length scale for on-chip optical interconnection.

preprint2011arXiv

3D Spatially Resolved Neutron Diffraction from a Disordered Vortex Lattice

The vortex matter in bulk type-II superconductors serves as a prototype system for studying the random pinning problem in condensed matter physics. Since the vortex lattice is embedded in an atomic lattice, small angle neutron scattering (SANS) is the only technique that allows for direct structural studies. In traditional SANS methods, the scattering intensity is a measure of the structure factor averaged over the entire sample. Recent studies in vortex physics have shown that it is highly desirable to develop a SANS technique which is capable of resolving the spatial inhomogeneities in the bulk vortex state. Here we report a novel slicing neutron diffraction technique using atypical collimation and an areal detector which allows for observing the three dimensional (3D) disorder of the vortex matter inside an as-grown Nb single crystal.

preprint2011arXiv

Comparative study on aging effect in BiFeO3 thin films substituted at A- and B-site

Typical characteristics of aging effect, double hysteresis loops, were observed in (100)-oriented Bi0.95Ca0.05FeO3 (BCFO) and BiFe0.95Ni0.05O3 (BFNO) films grown on LaNiO3(100)/Si substrates. The double hysteresis loops for BCFO film become less "constrained" with increasing applied voltage compared to that for BFNO, indicating that the aging effect is more severe in the latter. This can be demonstrated by the lower leakage current and smaller dielectric constant for BFNO. These phenomena are explained based on the crystal structure and defect chemistry. The defect states of the Bi, Ca, Fe, Ni and O ions were clarified by the XPS data.

preprint2011arXiv

Direct Evidence for Edge-Contaminated Vortex Phase in a Nb Single Crystal using Neutron Diffraction

We report the first direct observation of a disordered vortex matter phase existing near the edge of a bulk type-II superconductor Nb using a novel position-sensitive neutron diffraction technique. This "edge-contaminated" vortex state was implicated in previous studies using transport techniques and was postulated to have played a significant role in the behavior of vortex dynamics in a wide range of type-II superconductors. It is found that upon thermal annealing, the vortex matter in the bulk undergoes re-ordering, suggesting that the edge-contaminated bulk vortex state is metastable. The edge vortex state remains disordered after repeated thermal annealing, indicating spatial coexistence of a vortex glass with a Bragg glass. This observation resolves many outstanding issues concerning the peak effect in type-II superconductors.

preprint2011arXiv

Preparation and ferroelectric properties of (124)-oriented SrBi4Ti4O15 ferroelectric thin film on (110)-oriented LaNiO3 electrode

A (124)-oriented SrBi4Ti4O15 (SBTi) ferroelectric thin film with high volume fraction of αSBTi(124)=97% was obtained using a metal organic decomposition process on SiO2/Si substrate coated by (110)-oriented LaNiO3 (LNO) thin film. The remanent polarization and coercive field for (124)-oriented SBTi film are 12.1 μC/cm2 and 74 kV/cm, respectively. No evident fatigue of (124)-oriented SBTi thin film can be observed after 1×10^9 switching cycles. Besides, the (124)-oriented SBTi film can be uniformly polarized over large areas using a piezoelectric-mode atomic force microscope. Considering that the annealing temperature was 650°C and the thickness of each deposited layer was merely 30 nm, a long-range epitaxial relationship between SBTi(124) and LNO(110) facets was proposed. The epitaxial relationship was demonstrated based on the crystal structures of SBTi and LNO.

39 published item(s)

Modeling Subjective Urban Perception with Human Gaze

SF20K Competition 2025: Summary and findings

RHOBIN Challenge: Reconstruction of Human Object Interaction

The study of eleven contact binaries with mass ratios less than 0.1

Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

Length L-function for Network-Constrained Point Data

Light-Induced Ferromagnetism in Moiré Superlattices

Mass Testing and Characterization of 20-inch PMTs for JUNO

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

PeCLR: Self-Supervised 3D Hand Pose Estimation from monocular RGB via Equivariant Contrastive Learning

PrEF: Percolation-based Evolutionary Framework for the diffusion-source-localization problem in large networks

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

JUNO Physics and Detector

Leveraging Review Properties for Effective Recommendation

Observations of Ultrafast Superfluorescent Beatings in a Cesium Atomic Vapor Excited by Femtosecond Laser Pulses

Unraveling intrinsic flexoelectricity in twisted double bilayer graphene

Deep Mining External Imperfect Data for Chest X-ray Disease Screening

Dissipative Distillation of Supercritical Quantum Gases

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification

Performance Analysis of RSU-based Multihomed Multilane Vehicular Networks

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

Time Series Data Cleaning with Regular and Irregular Time Intervals

Toward Quantifying Ambiguities in Artistic Images

Self-Assembled, Nanostructured, Tunable Metamaterials via Spinodal Decomposition

Evaluating Two-Stream CNN for Video Classification

Fusing Multi-Stream Deep Networks for Video Classification

Hamiltonian Properties of DCell Networks

Large-angle quasi-self-collimation effect in a rod-type photonic crystal

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Rapidly reconfigurable radio-frequency arbitrary waveforms synthesized on a CMOS photonic chip

User recommendation in reciprocal and bipartite social networks -- a case study of online dating

Millimeter-scale and large-angle self-collimation in a photonic crystal composed of silicon nanorods

3D Spatially Resolved Neutron Diffraction from a Disordered Vortex Lattice

Comparative study on aging effect in BiFeO3 thin films substituted at A- and B-site

Direct Evidence for Edge-Contaminated Vortex Phase in a Nb Single Crystal using Neutron Diffraction

Preparation and ferroelectric properties of (124)-oriented SrBi4Ti4O15 ferroelectric thin film on (110)-oriented LaNiO3 electrode