Source author record

Wei Xia

Wei Xia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

38 works

22 topics

4 close collaborators

preprint, 2026, arXiv

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

We introduce Mutual Reinforcement Learning, a framework for concurrent RL post-training in which heterogeneous LLM policies exchange typed experience while keeping separate parameters, objectives, and tokenizers. The framework combines a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL) that retokenizes text and aligns token-level traces across incompatible vocabularies. This substrate makes the experience-sharing design question operational across model families. We instantiate three controlled probes on top of GRPO: data-level rollout sharing via Peer Rollout Pooling (PRP), value-level advantage sharing via Cross-Policy GRPO Advantage Sharing (XGRPO), and outcome-level success transfer via Success-Gated Transfer (SGT). A contextual-bandit analysis characterizes their structural positions on a stability-support trade-off: PRP pays density-ratio variance and THL residual costs, XGRPO preserves on-policy actor support while changing scalar baselines, and SGT supplies a rescue-set score direction toward verified peer successes. In the evaluated regime, outcome-level sharing occupies the favorable point of this trade-off.

preprint, 2024, arXiv

Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition: Taking TOEFL Independent Writing Task for Example

Large language models have demonstrated exceptional capabilities in tasks involving natural language generation, reasoning, and comprehension. This study aims to construct prompts and comments grounded in the diverse scoring criteria delineated within the official TOEFL guide. The primary objective is to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models, within the context of automated essay scoring. The prevailing methodologies for automated essay scoring involve deep neural networks, statistical machine learning techniques, and fine-tuning of pre-trained models. However, these techniques face challenges when applied to different contexts or subjects, primarily due to their substantial data requirements and limited adaptability to small sample sizes. In contrast, this study employs ChatGPT to conduct an automated evaluation of English essays, even with a small sample size, employing an experimental approach. The empirical findings indicate that ChatGPT can provide operational functionality for automated essay scoring, although the results exhibit a regression effect. The effective design and implementation of ChatGPT prompts require profound domain expertise and technical proficiency, as these prompts are subject to specific threshold criteria.

Keywords: ChatGPT, Automated Essay Scoring, Prompt Learning, TOEFL Independent Writing Task

preprint, 2022, arXiv

Anisotropic Infrared Response and Orientation-dependent Strain-tuning of the Electronic Structure in Nb2SiTe4

Two-dimensional materials with tunable in-plane anisotropic infrared response promise versatile applications in polarized photodetectors and field-effect transistors. Black phosphorus is a prominent example. However, it suffers from poor ambient stability. Here, we report the strain-tunable anisotropic infrared response of a layered material Nb2SiTe4, whose lattice structure is similar to the 2H-phase transition metal dichalcogenides (TMDCs) with three different kinds of building units. Strikingly, some of the strain-tunable optical transitions are crystallographic axis-dependent, even showing opposite shifts when uniaxial strain is applied along the two in-plane principal axes. Moreover, G0W0-BSE calculations show good agreement with the anisotropic extinction spectra. The optical selection rules are obtained via group theory analysis, and the unusual strain-induced shift trends are well explained by the orbital coupling analysis. Our comprehensive study suggests that Nb2SiTe4 is a good candidate for tunable polarization-sensitive optoelectronic devices.

preprint, 2022, arXiv

Approaching a Minimal Topological Electronic Structure in Antiferromagnetic Topological Insulator MnBi2Te4 via Surface Modification

The topological electronic structure plays a central role in the non-trivial physical properties of topological quantum materials. A minimal, hydrogen-atom-like topological electronic structure is desirable for research. In this work, we demonstrate an effort towards the realization of such a system in the intrinsic magnetic topological insulator MnBi2Te4, by manipulating the topological surface state (TSS) via surface modification. Using high-resolution laser- and synchrotron-based angle-resolved photoemission spectroscopy (ARPES), we found that the TSS in MnBi2Te4 is heavily hybridized with a trivial Rashba-type surface state (RSS), which could be efficiently removed by in situ surface potassium (K) dosing. By employing multiple experimental methods to characterize the K-dosed surface, we attribute this modification to the electrochemical reactions of K clusters on the surface. Our work not only gives a clear band assignment in MnBi2Te4, but also provides possible new routes to accentuating the topological behavior in magnetic topological quantum materials.

preprint, 2022, arXiv

Direct Visualization and Manipulation of Tunable Quantum Well State in Semiconducting Nb2SiTe4

Quantum well states (QWSs) can form at the surface or interfaces of materials with confinement potential. They have broad applications in electronic and optical devices such as high-electron-mobility transistors, photodetectors and quantum well lasers. The properties of the QWSs are usually the key factors for the performance of the devices. However, direct visualization and manipulation of such states are in general challenging. In this work, by using angle-resolved photoemission spectroscopy (ARPES) and scanning tunneling microscopy/spectroscopy (STM/STS), we directly probe the QWSs generated on the vacuum interface of a narrow-band-gap semiconductor Nb2SiTe4. Interestingly, the position and splitting of the QWSs could be easily manipulated via potassium (K) dosage onto the sample surface. Our results suggest that Nb2SiTe4 is an intriguing semiconductor system in which to study and engineer QWSs, with great potential in device applications.

preprint, 2022, arXiv

MeMOT: Multi-Object Tracking with Memory

We propose an online tracking algorithm that performs object detection and data association under a common framework and is capable of linking objects after a long time span. This is realized by preserving a large spatio-temporal memory to store the identity embeddings of the tracked objects, and by adaptively referencing and aggregating useful information from the memory as needed. Our model, called MeMOT, consists of three main modules that are all Transformer-based: 1) Hypothesis Generation, which produces object proposals in the current video frame; 2) Memory Encoding, which extracts the core information from the memory for each tracked object; and 3) Memory Decoding, which solves the object detection and data association tasks simultaneously for multi-object tracking. When evaluated on widely adopted MOT benchmark datasets, MeMOT achieves very competitive performance.

preprint, 2022, arXiv

Nontrivial topological states in BaSn5 superconductor probed by de Haas-van Alphen quantum oscillations

We report nontrivial topological states in an intrinsic type-II superconductor BaSn5 (Tc ~ 4.4 K), probed by measuring the magnetization, specific heat and de Haas-van Alphen (dHvA) effect and by performing first-principles calculations. The first-principles calculations reveal a topological nodal ring structure centered at the H point in the kz = π plane of the Brillouin zone (BZ), which could be gapped by spin-orbit coupling (SOC), yielding rather small gaps of about 0.04 eV and 0.14 eV below and above the Fermi level, respectively. The SOC also results in a pair of Dirac points along the Γ-A direction, located ~ 0.2 eV above the Fermi level. The analysis of the dHvA quantum oscillations supports the calculations by revealing a nontrivial Berry phase originating from three hole pockets and one electron pocket related to the bands forming the Dirac cones. Our study thus provides an excellent avenue for investigating the interplay between superconductivity and nontrivial topological states.

preprint, 2022, arXiv

Observation of Dimension-Crossover of a Tunable 1D Dirac Fermion in Topological Semimetal NbSi$_x$Te$_2$

Condensed matter systems in low dimensions exhibit emergent physics that does not exist in three dimensions. When electrons are confined to one dimension (1D), some significant electronic states appear, such as the charge density wave, spin-charge separation and the Su-Schrieffer-Heeger (SSH) topological state. However, a clear understanding of how the 1D electronic properties connect with topology is currently lacking. Here we systematically investigated the characteristic 1D Dirac fermion electronic structure originating from the metallic NbTe$_2$ chains on the surface of the composition-tunable layered compound NbSi$_x$Te$_2$ ($x$ = 0.40 and 0.43) using angle-resolved photoemission spectroscopy. We found that the Dirac fermion forms a Dirac nodal line structure protected by the combined $\widetilde{\mathcal{M}}_y$ and time-reversal symmetry $T$, which establishes the NbSi$_x$Te$_2$ system as a topological semimetal, consistent with the ab initio calculations. As $x$ decreases, the interaction between adjacent NbTe$_2$ chains increases and the Dirac fermion goes through a dimension crossover from 1D to 2D, as evidenced by the variation of its Fermi surface and Fermi velocity across the Brillouin zone, consistent with a Dirac SSH model. Our findings demonstrate a tunable 1D Dirac electron system, which offers a versatile platform for the exploration of intriguing 1D physics and device applications.

preprint, 2022, arXiv

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

We propose a memory-efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution of backpropagation can still effectively train the models with minimal accuracy loss, which is attributed to the high redundancy of video. SBP keeps all forward paths but randomly and independently removes the backward paths for each network layer in each training step. It reduces the GPU memory cost by eliminating the need to cache activation values corresponding to the dropped backward paths, whose amount can be controlled by an adjustable keep-ratio. Experiments show that SBP can be applied to a wide range of models for video tasks, leading to up to 80.0% GPU memory saving and 10% training speedup with less than 1% accuracy drop on action recognition and temporal action detection.
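The keep-ratio bookkeeping described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say at what granularity backward paths are dropped, so the flat list of "positions" per layer, the function names, and the rounding rule are all assumptions made for illustration. The sketch only shows that independently sampling a kept subset per layer makes the cached-activation fraction track the keep-ratio.

```python
import random


def sample_keep_masks(num_layers, num_positions, keep_ratio, seed=0):
    """Sample, independently for each layer, the positions whose backward path is kept.

    'Positions' is a hypothetical stand-in for whatever unit (frame, token,
    spatial location) a real implementation drops; the abstract does not say.
    """
    rng = random.Random(seed)
    num_kept = max(1, round(keep_ratio * num_positions))
    return [sorted(rng.sample(range(num_positions), num_kept))
            for _ in range(num_layers)]


def cached_fraction(masks, num_positions):
    """Fraction of activations that still need caching for the backward pass."""
    total = len(masks) * num_positions
    return sum(len(mask) for mask in masks) / total


# All forward paths are still computed; only the kept positions are cached.
masks = sample_keep_masks(num_layers=4, num_positions=16, keep_ratio=0.25)
print(cached_fraction(masks, 16))
```

With a keep-ratio of 0.25, each layer caches 4 of its 16 positions, so a quarter of the activations are retained. The memory savings reported in the abstract depend on the real models and are not reproduced by this toy count.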

preprint, 2022, arXiv

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

In this paper, we present a novel speaker diarization system for streaming on-device applications. In this system, we use a transformer transducer to detect the speaker turns, represent each speaker turn by a speaker embedding, then cluster these embeddings with constraints from the detected speaker turns. Compared with conventional clustering-based diarization systems, our system largely reduces the computational cost of clustering due to the sparsity of speaker turns. Unlike other supervised speaker diarization systems which require annotations of time-stamped speaker labels for training, our system only requires including speaker turn tokens during the transcribing process, which largely reduces the human efforts involved in data collection.

preprint, 2021, arXiv

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results

As facial interaction systems are widely deployed, the security and reliability of these systems have become a critical issue, attracting substantial research effort. Within this effort, face anti-spoofing has emerged as an important area, whose objective is to identify whether a presented face is live or spoofed. Recently, a large-scale face anti-spoofing dataset, CelebA-Spoof, which comprises 625,537 pictures of 10,177 subjects, has been released. It is the largest face anti-spoofing dataset in terms of the numbers of data and subjects. This paper reports methods and results in the CelebA-Spoof Challenge 2020 on Face Anti-Spoofing, which employs the CelebA-Spoof dataset. The model evaluation is conducted online on the hidden test set. A total of 134 participants registered for the competition, and 19 teams made valid submissions. We analyze the top-ranked solutions and present some discussion on future work directions.

preprint, 2021, arXiv

DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

Although speaker verification has achieved significant performance improvement with the development of deep neural networks, domain mismatch is still a challenging problem in this field. In this study, we propose a novel framework to disentangle speaker-related and domain-specific features and apply domain adaptation solely on the speaker-related feature space. Compared with performing domain adaptation directly on a feature space where domain information is not removed, using disentanglement can efficiently boost adaptation performance. Specifically, our model's input speech from the source and target domains is first encoded into different latent feature spaces. The adversarial domain adaptation is conducted on the shared speaker-related feature space to encourage domain invariance. Further, we minimize the mutual information between speaker-related and domain-specific features for both domains to enforce the disentanglement. Experimental results on the VOiCES dataset demonstrate that our proposed framework can effectively generate more speaker-discriminative and domain-invariant speaker representations with a relative 20.3% reduction of EER compared to the original ResNet-based system.

preprint, 2021, arXiv

DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results

This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection. The challenge employs the DeeperForensics-1.0 dataset, one of the most extensive publicly available real-world face forgery detection datasets, with 60,000 videos constituted by a total of 17.6 million frames. The model evaluation is conducted online on a high-quality hidden test set with multiple sources and diverse distortions. A total of 115 participants registered for the competition, and 25 teams made valid submissions. We will summarize the winning solutions and present some discussions on potential research directions.

preprint, 2021, arXiv

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

In this study, we investigate self-supervised representation learning for speaker verification (SV). First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples. We show that better speaker embeddings can be learned by momentum contrastive learning. Next, alternative augmentation strategies are explored to normalize extrinsic speaker variabilities of two random segments from the same speech utterance. Specifically, augmentation in the waveform largely improves the speaker representations for SV tasks. The proposed MoCo speaker embedding is further improved when a prototypical memory bank is introduced, which encourages the speaker embeddings to be closer to their assigned prototypes with an intermediate clustering step. In addition, we generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled. Comprehensive experiments on the VoxCeleb dataset demonstrate that our proposed self-supervised approach achieves competitive performance compared with existing techniques, and can approach fully supervised results with partially labeled data.

preprint, 2021, arXiv

The Reservoir Learning Power across Quantum Many-Body Localization Transition

Harnessing the quantum computation power of present noisy intermediate-scale quantum devices has received tremendous interest in the last few years. Here we study the learning power of a one-dimensional long-range randomly-coupled quantum spin chain, within the framework of reservoir computing. In time sequence learning tasks, we find that the system in the quantum many-body localized (MBL) phase holds long-term memory, which can be attributed to the emergent local integrals of motion. On the other hand, the MBL phase does not provide sufficient nonlinearity in learning highly nonlinear time sequences, which we show in a parity check task. This is reversed in the quantum ergodic phase, which provides sufficient nonlinearity but compromises memory capacity. In a complex learning task of Mackey-Glass prediction that requires both sufficient memory capacity and nonlinearity, we find optimal learning performance near the MBL-to-ergodic transition. This leads to a guiding principle of quantum reservoir engineering: operating at the edge of quantum ergodicity reaches optimal learning power for generic complex reservoir learning tasks. Our theoretical finding can be readily tested with present experiments.

preprint, 2021, arXiv

Towards Backward-Compatible Representation Learning

We propose a way to learn visual features that are compatible with previously computed ones even when they have different dimensions and are learned via different neural network architectures and loss functions. Compatible means that, if such features are used to compare images, then "new" features can be compared directly to "old" features, so they can be used interchangeably. This enables visual search systems to bypass computing new features for all previously seen images when updating the embedding models, a process known as backfilling. Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training datasets and improvements in deep learning architectures and training methods. We propose a framework to train embedding models, called backward-compatible training (BCT), as a first step towards backward compatible representation learning. In experiments on learning embeddings for face recognition, models trained with BCT successfully achieve backward compatibility without sacrificing accuracy, thus enabling backfill-free model updates of visual embeddings.

preprint, 2020, arXiv

6VecLM: Language Modeling in Vector Space for IPv6 Target Generation

Fast IPv6 scanning is challenging in the field of network measurement because it requires exploring the whole IPv6 address space but is limited by current computational power. Researchers propose to obtain candidate sets of possibly active targets to probe by algorithmically analyzing the active seed sets. However, IPv6 addresses lack semantic information and follow numerous addressing schemes, which makes it difficult to design effective algorithms. In this paper, we introduce our approach 6VecLM to explore such target generation algorithms. The architecture can map addresses into a vector space to interpret semantic relationships and uses a Transformer network to build IPv6 language models for predicting address sequences. Experiments indicate that our approach can perform semantic classification on the address space. By adding a new generation approach, our model possesses a controllable word innovation capability compared to conventional language models. The work outperformed the state-of-the-art target generation algorithms on two active address datasets by producing higher-quality candidate sets.

preprint, 2020, arXiv

Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and loss in performance. Alternatively, cross-domain speaker verification for multiple acoustic environments is a challenging task which could advance research in audio forensics. In this study, we introduce a CRSS-Forensics audio dataset collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics. Based on this fine-tuned model, we align domain-specific distributions in the embedding space with the discrepancy loss and maximum mean discrepancy (MMD). This maintains effective performance on the clean set while simultaneously generalizing the model to other acoustic domains. From the results, we demonstrate that diverse acoustic environments affect the speaker verification performance, and that our proposed approach of cross-domain adaptation can significantly improve the results in this scenario.

preprint, 2020, arXiv

Divisibility results concerning truncated hypergeometric series

In this paper, using the well-known Karlsson-Minton formula, we mainly establish two divisibility results concerning truncated hypergeometric series. Let $n>2$ and $q>0$ be integers with $2\mid n$ or $2\nmid q$. We show that $$\sum_{k=0}^{p-1}\frac{(q-\frac{p}{n})_k^n}{(1)_k^n}\equiv0\pmod{p^3} $$ and $$p^n\sum_{k=0}^{p-1}\frac{(1)_k^n}{(\frac{p}{n}-q+2)_k^n}\equiv0\pmod{p^3}$$ for any prime $p>\max\{n,(q-1)n+1\}$, where $(x)_k$ denotes the Pochhammer symbol defined by $$ (x)_k=\begin{cases}1,\quad &k=0,\\ x(x+1)\cdots(x+k-1),\quad &k>0.\end{cases}$$ Let $n\geq4$ be an even integer. Then for any prime $p$ with $p\equiv-1\pmod{n}$, the first congruence above implies that $$\sum_{k=0}^{p-1} \frac{(\frac{1}{n})_k^n}{(1)_k^n}\equiv0\pmod{p^3}. $$ This confirms a recent conjecture of Guo.
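The final congruence in this abstract can be spot-checked numerically for small cases. The sketch below is only a sanity check under stated assumptions, not part of the paper: the function names are hypothetical, exact rational arithmetic comes from Python's `fractions` module, and the caller is expected to supply a prime $p$ with $p\equiv-1\pmod{n}$ (primality is not verified).

```python
from fractions import Fraction


def pochhammer(x, k):
    """Pochhammer symbol (x)_k = x (x + 1) ... (x + k - 1), with (x)_0 = 1."""
    result = Fraction(1)
    for j in range(k):
        result *= x + j
    return result


def truncated_sum(n, p):
    """Exact value of sum_{k=0}^{p-1} ((1/n)_k / (1)_k)^n as a Fraction."""
    x = Fraction(1, n)
    return sum((pochhammer(x, k) / pochhammer(Fraction(1), k)) ** n
               for k in range(p))


def divisible_by_p_cubed(n, p):
    """Check the congruence sum = 0 (mod p^3) for a prime p with p = -1 (mod n)."""
    if p % n != n - 1:
        raise ValueError("expected p congruent to -1 modulo n")
    s = truncated_sum(n, p)
    if s.denominator % p == 0:
        raise ValueError("sum is not a p-adic integer")
    # The denominator is coprime to p, so the congruence is decided by the numerator.
    return s.numerator % p**3 == 0


print(divisible_by_p_cubed(4, 7))
```

For $n=4$ and $p=7$ the check returns `True`: the sum is divisible by $7^3=343$, as the stated result predicts.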

preprint, 2020, arXiv

Electronic Origin for the Enhanced Thermoelectric Efficiency of Cu2Se

Thermoelectric materials (TMs) can uniquely convert waste heat into electricity, which provides a potential solution for the increasingly severe global energy crisis. Bulk Cu2Se, with ionic conductivity of Cu ions, exhibits a significant enhancement of its thermoelectric figure of merit zT by a factor of ~3 near its structural transition around 400 K. Here, we present a systematic study of the electronic structure of Cu2Se and its temperature evolution using high-resolution angle-resolved photoemission spectroscopy. Upon heating across the structural transition, the electronic states near the corner of the Brillouin zone gradually disappear, while the bands near the centre of the Brillouin zone shift abruptly towards high binding energies and develop an energy gap. Interestingly, the observed band reconstruction reproduces well the temperature evolution of the Seebeck coefficient of Cu2Se, providing an electronic origin for the drastic enhancement of the thermoelectric performance near 400 K. The current results not only bridge the structural phase transition, electronic structure, and thermoelectric properties in a condensed matter system, but also provide valuable insights into the search for and design of a new generation of thermoelectric materials.

preprint, 2020, arXiv

Magnetic critical behavior of the van der Waals Fe5GeTe2 crystal with near room temperature ferromagnetism

The van der Waals ferromagnet Fe5GeTe2 has a Curie temperature TC of about 270 K, which can be raised above room temperature by tuning the Fe deficiency content. To gain insight into its ferromagnetic exchange, we have studied the critical behavior by measuring the magnetization of a bulk Fe5GeTe2 crystal around the ferromagnetic-to-paramagnetic phase transition. Analysis of the magnetization using various techniques, including the modified Arrott plot, the Kouvel-Fisher plot and critical isotherm analysis, yielded a set of reliable critical exponents with TC = 273.7 K, $\beta$ = 0.3457, $\gamma$ = 1.40617, and $\delta$ = 5.021, suggesting a three-dimensional magnetic exchange that decays with distance as $J(r) \sim r^{-4.916}$, which is close to that of a three-dimensional Heisenberg model with long-range magnetic coupling.
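The exponents quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses the Widom scaling relation $\delta = 1 + \gamma/\beta$, a standard identity that the abstract does not cite explicitly, and assumes the conventional long-range form $J(r) \sim r^{-(d+\sigma)}$ with $d = 3$ to read off a range parameter $\sigma$. The function names are hypothetical.

```python
def widom_delta(beta: float, gamma: float) -> float:
    """Critical-isotherm exponent implied by the Widom relation delta = 1 + gamma / beta."""
    return 1.0 + gamma / beta


def range_parameter(decay_exponent: float, dimension: int = 3) -> float:
    """Sigma in J(r) ~ r^-(d + sigma), given the total decay exponent (assumed form)."""
    return decay_exponent - dimension


# Values quoted in the abstract.
beta, gamma, delta_reported = 0.3457, 1.40617, 5.021

delta_implied = widom_delta(beta, gamma)
relative_gap = abs(delta_implied - delta_reported) / delta_reported
sigma = range_parameter(4.916)

print(f"delta implied by the Widom relation: {delta_implied:.3f}")
print(f"relative gap to the reported delta:  {relative_gap:.1%}")
print(f"sigma for d = 3:                     {sigma:.3f}")
```

The implied $\delta$ is about 5.07, within roughly 1% of the reported 5.021, so the three exponents are mutually consistent at that level.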

preprint, 2020, arXiv

Magnetism-induced topological transition in EuAs3

The nature of the interaction between magnetism and topology in magnetic topological semimetals remains mysterious, but may be expected to lead to a variety of novel physics. We present ab initio band calculations, electrical transport and angle-resolved photoemission spectroscopy (ARPES) measurements on the magnetic semimetal EuAs$_3$, demonstrating a magnetism-induced topological transition from a topological nodal-line semimetal in the paramagnetic or the spin-polarized state to a topological massive Dirac metal in the antiferromagnetic (AFM) ground state at low temperature, featuring a pair of massive Dirac points, inverted bands and topological surface states on the (010) surface. Shubnikov-de Haas (SdH) oscillations in the AFM state identify nonzero Berry phase and a negative longitudinal magnetoresistance ($n$-LMR) induced by the chiral anomaly, confirming the topological nature predicted by band calculations. When magnetic moments are fully polarized by an external magnetic field, an unsaturated and extremely large magnetoresistance (XMR) of $\sim 2\times10^5$ % at 1.8 K and 28.3 T is observed, likely arising from topological protection. Consistent with band calculations for the spin-polarized state, four new bands in quantum oscillations different from those in the AFM state are discerned, of which two are topologically protected. Nodal-line structures at the $Y$ point in the Brillouin zone (BZ) are proposed in both the spin-polarized and paramagnetic states, and the latter is proven by ARPES. Moreover, a temperature-induced Lifshitz transition accompanied by the emergence of a new band below 3 K is revealed. These results indicate that magnetic EuAs$_3$ provides a rich platform to explore exotic physics arising from the interaction of magnetism with topology.

preprint, 2020, arXiv

On Improving Temporal Consistency for Online Face Liveness Detection

In this paper, we focus on improving the online face liveness detection system to enhance the security of the downstream face recognition system. Most existing frame-based methods suffer from prediction inconsistency across time. To address the issue, a simple yet effective solution based on temporal consistency is proposed. Specifically, in the training stage, to integrate the temporal consistency constraint, a temporal self-supervision loss and a class consistency loss are proposed in addition to the softmax cross-entropy loss. In the deployment stage, a training-free non-parametric uncertainty estimation module is developed to smooth the predictions adaptively. Beyond the common evaluation approach, a video segment-based evaluation is proposed to accommodate more practical scenarios. Extensive experiments demonstrate that our solution is more robust against several presentation attacks in various scenarios, and significantly outperforms the state of the art on multiple public datasets by at least 40% in terms of ACER. Moreover, with much lower computational complexity (33% fewer FLOPs), it offers great potential for low-latency online applications.

preprint, 2020, arXiv

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification. The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. In order to advance the trained deep speaker embedding network to be robust for a small target dataset, we introduce a novel strategy to fine-tune the pre-trained student model towards a forensic target domain by utilizing the model as a finetuning start point and a reference in regularization. The proposed approaches are evaluated on the 1st48-UTD forensic corpus, a newly established naturalistic dataset of actual homicide investigations consisting of short utterances recorded in uncontrolled conditions. We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances and that our fine-tuning strategy outperforms the commonly used weight decay method by providing an explicit inductive bias towards the pre-trained model.

preprint, 2020, arXiv

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations. We use the global context information to enhance important channels and recalibrate salient time-frequency locations by computing the similarity between the global context and local features. The proposed modules, together with a popular ResNet-based model, are evaluated on the VoxCeleb1 dataset, which is a large-scale speaker verification corpus collected in the wild. This lightweight block can be easily incorporated into a CNN model with little additional computational cost and improves the speaker verification performance by a large margin compared to the baseline ResNet-LDE model and the Squeeze-and-Excitation block. Detailed ablation studies are also performed to analyze various factors that may impact the performance of the proposed modules. We find that by employing the proposed L2-tf-GTFC transformation block, the Equal Error Rate decreases from 4.56% to 3.07%, a relative 32.68% reduction, and the DCF score improves by a relative 27.28%. The results indicate that our proposed global context guided transformation modules can efficiently improve the learned speaker representations by achieving time-frequency and channel-wise feature recalibration.
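The relative reduction quoted in this abstract follows directly from the two Equal Error Rate values. A minimal check, with a hypothetical helper name:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction from baseline to improved, as a percentage of baseline."""
    return 100.0 * (baseline - improved) / baseline


# EER values quoted in the abstract: 4.56% (baseline) and 3.07% (proposed block).
eer_reduction = relative_reduction(4.56, 3.07)
print(f"{eer_reduction:.2f}%")  # prints 32.68%
```

The DCF improvement of 27.28% cannot be recomputed the same way, because the abstract does not give the underlying DCF scores.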

preprint, 2020, arXiv

Terahertz emission in the van der Waals magnet CrSiTe3

The van der Waals magnet CrSiTe3 (CST) has captured immense interest because it is capable of retaining long-range ferromagnetic order even in its monolayer form, thus offering potential use in spintronic devices. Bulk CST crystal has inversion symmetry that is broken on the crystal surface. Here, by employing ultrafast terahertz (THz) emission spectroscopy and time-resolved THz spectroscopy, we investigated the THz emission of the CST crystal, which shows a strong THz emission from the crystal surface under femtosecond (fs) pulse excitation at 800 nm. Theoretical analysis based on the space symmetry of CST suggests the dominant role of shift current occurring on the surface with a thickness of a few quintuple layers in producing the THz emission, consistent with the experimental observation that the emitted THz amplitude strongly depends on the azimuthal and pumping polarization angles. The present study offers a new efficient THz emitter as well as a better understanding of the nonlinear optical response of CST. It may also open a window toward the investigation of the nonlinear optical response in mono- and few-layer van der Waals crystals with low-dimensional magnetism.

preprint2020arXiv

The de Hass-van Alphen quantum oscillations in a three-dimensional Dirac semimetal TiSb2

We have used the de Haas-van Alphen (dHvA) effect to investigate the Fermi surface of high-quality crystalline TiSb2; analysis of the dHvA quantum oscillations unveils a nontrivial topological nature. Moreover, our analysis of the quantum oscillation frequencies associated with a nonzero Berry phase, for magnetic fields applied parallel to the ab-plane and parallel to the c-axis of TiSb2, finds that the Fermi surface topology has a three-dimensional (3D) character. The results are supported by first-principles calculations, which reveal a symmetry-protected Dirac point along the Γ-Z high-symmetry line near the Fermi level. On the (001) surface, the bulk Dirac points are found to project onto the Γ̄ point with nontrivial surface states. Our finding will substantially enrich the family of 3D Dirac semimetals, which are useful for topological applications.

preprint2020arXiv

The de Hass-van Alphen quantum oscillations in BaSn3 superconductor with multiple Dirac fermions

By measuring the de Haas-van Alphen (dHvA) effect and calculating the electronic band structure, we have investigated the bulk Fermi surface of the BaSn3 superconductor, which has a transition temperature of ~4.4 K. Striking dHvA quantum oscillations are observed when the magnetic field B is perpendicular to the (100) plane and when it is perpendicular to the (001) plane. Our analysis unveils a nontrivial Berry phase in the quantum oscillations when B is perpendicular to (100), with two fundamental frequencies at 31.5 T and 306.7 T, which likely arise from two corresponding hole pockets of the bands forming a type-II Dirac point. The results are supported by ab initio calculations indicating a tilted type-II Dirac point located along the high-symmetry K-H line of the Brillouin zone, about 0.13 eV above the Fermi level. Moreover, the calculations also reveal two other type-I Dirac points along the high-symmetry Γ-A direction, but lying somewhat far below the Fermi level. The results demonstrate that BaSn3 is an excellent platform for studying not only the exotic properties of different types of Dirac fermions in a single material, but also the interplay between nontrivial topological states and superconductivity.

preprint2020arXiv

Towards causal benchmarking of bias in face analysis algorithms

Measuring algorithmic bias is crucial both to assess algorithmic fairness and to guide the improvement of algorithms. Current methods to measure algorithmic bias in computer vision, which are based on observational datasets, are inadequate for this task because they conflate algorithmic bias with dataset bias. To address this problem, we develop an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. Our proposed method is based on generating synthetic "transects" of matched sample images that are designed to differ along specific attributes while leaving other attributes constant. A crucial aspect of our approach is relying on the perception of human observers, both to guide manipulations and to measure algorithmic bias. Besides allowing the measurement of algorithmic bias, synthetic transects have other advantages over observational datasets: they sample attributes more evenly, allowing for more straightforward bias analysis on minority and intersectional groups; they enable prediction of bias in new scenarios; they greatly reduce ethical and legal challenges; and they are economical and fast to obtain, helping make bias testing affordable and widely available. We validate our method by comparing it to a study that employs the traditional observational method for analyzing bias in gender classification algorithms. The two methods reach different conclusions. While the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair.

preprint2020arXiv

Updating Weight Values for Function Point Counting

While software development productivity has grown rapidly, the weight values used in standard Function Point (FP) counting, created at IBM twenty-five years ago, have never been updated. This obsolescence raises critical questions about the validity of the weight values; it also creates other problems such as ambiguous classification, crisp boundaries, and subjective, locally defined weight values. All of these challenges reveal the need to calibrate FP so that it reflects more accurately both the specific software application context and the trend of today's software development techniques. We have created an FP calibration model that combines the learning ability of neural networks with the capability of fuzzy logic to capture human knowledge. Empirical validation using the ISBSG Data Repository (release 8) shows an average improvement of 22% in the accuracy of software effort estimates with the new calibration.

preprint2019arXiv

Bulk Fermi surface of the layered superconductor TaSe3 with three-dimensional strong topological insulator state

High magnetic field transport measurements and ab initio calculations on the layered superconductor TaSe3 have provided compelling evidence for the existence of a three-dimensional strong topological insulator state. Longitudinal magnetotransport measurements up to ~33 T unveiled striking Shubnikov-de Haas oscillations with two fundamental frequencies at 100 T and 175 T, corresponding respectively to a nontrivial electron Fermi pocket at the B point and a nontrivial hole Fermi pocket at the Γ point in the Brillouin zone. However, calculations revealed one more electron pocket at the B point, which was not detected by the magnetotransport measurements, presumably due to the limited carrier momentum relaxation time. Angle-dependent quantum oscillations, measured by rotating the sample with respect to the magnetic field, revealed clear changes in the two fundamental frequencies, indicating anisotropic electronic Fermi pockets. The ab initio calculations gave the topological Z2 invariants of (1; 100) and revealed a single Dirac cone on the (1 0 -1) surface at the X point with helical spin texture at a constant-energy contour, suggesting a strong topological insulator state. The results demonstrate that TaSe3 is an excellent platform for studying the interplay between topological phase and superconductivity and a promising system for the exploration of topological superconductivity.

preprint2019arXiv

Cross-lingual Text-independent Speaker Verification using Unsupervised Adversarial Discriminative Domain Adaptation

Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data. Being able to improve a cross-lingual speaker verification system using unlabeled data can greatly increase the robustness of the system and reduce human labeling costs. In this study, we introduce an unsupervised Adversarial Discriminative Domain Adaptation (ADDA) method to effectively learn an asymmetric mapping that adapts the target domain encoder to the source domain, where the target domain and source domain are speech data from different languages. ADDA, together with a popular Domain Adversarial Training (DAT) approach, is evaluated on a cross-lingual speaker verification task: the training data is in English from NIST SRE04-08, Mixer 6 and Switchboard, and the test data is in Chinese from AISHELL-I. We show that with ADDA adaptation, the Equal Error Rate (EER) of the x-vector system decreases from 9.331% to 7.645%, a relative EER reduction of 18.07%, and a 6.32% reduction relative to DAT as well. Further data analysis of the ADDA-adapted speaker embeddings shows that the learned speaker embeddings perform well on speaker classification for the target domain data and are less sensitive to the shift in language.
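
The relative figure quoted above follows directly from the two absolute EER values. A minimal sketch of that arithmetic is given here; the helper name `relative_reduction` is an illustrative assumption and does not come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    return (before - after) / before * 100.0


# EER values quoted in the abstract: x-vector baseline vs. ADDA-adapted system.
adda_gain = relative_reduction(9.331, 7.645)
print(f"{adda_gain:.2f}%")
```

Rounded to two decimals, the result agrees with the 18.07% stated in the abstract.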

preprint2019arXiv

Magnetotransport properties of the layered CaAl2Si2 semimetal hosting multiple nontrivial topological states

Combining different nontrivial topological states in a single material can realize multiple functionalities and exotic physics, but such materials are still scarce. We report herein the results of magnetotransport measurements and ab initio calculations on single-crystalline CaAl2Si2 semimetal. The transport properties can be well understood within the two-band model, in good agreement with theoretical calculations indicating four main Fermi surface sheets: three hole pockets centered at the Γ point and one electron pocket centered at the M point in the Brillouin zone. The single fundamental frequency in the quantum oscillations of the magnetoresistance corresponds to the electron Fermi pocket. Without spin-orbit coupling (SOC), the ab initio calculations suggest that CaAl2Si2 hosts a topological nodal line around the Γ point in the Brillouin zone, close to the Fermi level. Once SOC is included, the fragile nodal line is gapped and a pair of Dirac points emerges along the high-symmetry Γ-A direction, about 1.22 eV below the Fermi level. The SOC can also induce a topological insulator state along the Γ-A direction with a gap of about 3 meV. The results demonstrate that CaAl2Si2 is an excellent platform for the study of novel topological physics with multiple topological states.

preprint2018arXiv

Complex Balanced Spaces

In this paper, the concept of balanced manifolds is generalized to reduced complex spaces, giving the class B and balanced spaces. By analogy with the Kähler case, the class B is similar to the Fujiki class C and the balanced space is similar to the Kähler space. Some properties of these complex spaces are obtained, and the relations between balanced spaces and the class B are studied.

preprint2016arXiv

Strongly Gauduchon spaces

We define strongly Gauduchon spaces and the class SG, which generalize strongly Gauduchon manifolds to complex spaces. By analogy with the Kähler case, the strongly Gauduchon space and the class SG are similar to the Kähler space and the Fujiki class C respectively. Some properties of these complex spaces are obtained, and the relations between strongly Gauduchon spaces and the class SG are studied.

preprint2015arXiv

A Neuro-Fuzzy Model for Function Point Calibration

We discuss the need to update the calibration of Function Point (FP) complexity weights, with the aims of fitting specific software applications, reflecting software industry trends, and improving cost estimation. Neuro-Fuzzy is a technique that combines the learning ability of neural networks with the ability of fuzzy logic to capture human knowledge. Empirical validation using the ISBSG data repository (Release 8) shows a 22% improvement in software effort estimation after calibration with the Neuro-Fuzzy technique.

preprint2015arXiv

Calibrating Function Points Using Neuro-Fuzzy Technique

We discuss the concepts of calibrating Function Points, with the aims of fitting specific software applications, reflecting software industry trends, and improving cost estimation. Neuro-Fuzzy is a technique that combines the learning ability of neural networks with the ability of fuzzy logic to capture human knowledge. Empirical validation using the ISBSG data repository (Release 8) shows a 22% improvement in software effort estimation after calibration with the Neuro-Fuzzy technique.

preprint2014arXiv

CNN: Single-label to Multi-label

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how CNNs best cope with multi-label images remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), in which an arbitrary number of object segment hypotheses are taken as inputs, a shared CNN is connected with each hypothesis, and the CNN outputs from the different hypotheses are aggregated with max pooling to produce the final multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) no explicit hypothesis label is required; 4) the shared CNN may be well pre-trained with a large-scale single-label image dataset, e.g. ImageNet; and 5) it may naturally output multi-label prediction results. Experimental results on the Pascal VOC2007 and VOC2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, on the VOC2012 dataset the mAP reaches 84.2% with HCP alone and 90.3% after fusion with our complementary result in [47] based on hand-crafted features, which outperforms the state of the art by a large margin of more than 7%.
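
The cross-hypothesis aggregation step described above can be sketched in a few lines. This is a minimal illustration only, assuming each hypothesis has already been scored by the shared CNN; the function name `aggregate_hypotheses` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def aggregate_hypotheses(scores: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over per-hypothesis class-score vectors.

    `scores[i][c]` is the (assumed) score of class `c` for hypothesis `i`.
    """
    if not scores:
        raise ValueError("at least one hypothesis is required")
    num_classes = len(scores[0])
    if any(len(row) != num_classes for row in scores):
        raise ValueError("all hypotheses must score the same classes")
    # zip(*scores) groups the scores of each class across all hypotheses.
    return [max(per_class) for per_class in zip(*scores)]


# Toy example (made-up numbers): three hypotheses, three classes.
toy_scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.1],
]
image_scores = aggregate_hypotheses(toy_scores)  # [0.9, 0.8, 0.1]
```

Because the maximum is taken per class, a hypothesis that scores low on every class (the third row here) leaves the image-level prediction unchanged, which is one plausible intuition for the robustness to noisy or redundant hypotheses noted in the abstract.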

38 published item(s)

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example

Anisotropic Infrared Response and Orientation-dependent Strain-tuning of the Electronic Structure in Nb2SiTe4

Approaching a Minimal Topological Electronic Structure in Antiferromagnetic Topological Insulator MnBi2Te4 via Surface Modification

Direct Visualization and Manipulation of Tunable Quantum Well State in Semiconducting Nb2SiTe4

MeMOT: Multi-Object Tracking with Memory

Nontrivial topological states in BaSn5 superconductor probed by de Haas-van Alphen quantum oscillations

Observation of Dimension-Crossover of a Tunable 1D Dirac Fermion in Topological Semimetal NbSi$_x$Te$_2$

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results

DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

The Reservoir Learning Power across Quantum Many-Boby Localization Transition

Towards Backward-Compatible Representation Learning

6VecLM: Language Modeling in Vector Space for IPv6 Target Generation

Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Divisibility results concerning truncated hypergeometric series

Electronic Origin for the Enhanced Thermoelectric Efficiency of Cu2Se

Magnetic critical behavior of the van der Waals Fe5GeTe2 crystal with near room temperature ferromagnetism

Magnetism-induced topological transition in EuAs3

On Improving Temporal Consistency for Online Face Liveness Detection

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

Terahertz emission in the van der Waals magnet CrSiTe3

The de Hass-van Alphen quantum oscillations in a three-dimensional Dirac semimetal TiSb2

The de Hass-van Alphen quantum oscillations in BaSn3 superconductor with multiple Dirac fermions

Towards causal benchmarking of bias in face analysis algorithms

Updating Weight Values for Function Point Counting

Bulk Fermi surface of the layered superconductor TaSe3 with three-dimensional strong topological insulator state

Cross-lingual Text-independent Speaker Verification using Unsupervised Adversarial Discriminative Domain Adaptation

Magnetotransport properties of the layered CaAl2Si2 semimetal hosting multiple nontrivial topological states

Complex Balanced Spaces

Strongly Gauduchon spaces

A Neuro-Fuzzy Model for Function Point Calibration

Calibrating Function Points Using Neuro-Fuzzy Technique

CNN: Single-label to Multi-label