Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
17 topics
4 close collaborators

Published work

20 published items

preprint, 2022, arXiv

A light-weight full-band speech enhancement model

Deep neural network based full-band speech enhancement systems face the challenges of high computational demand and imbalanced frequency distribution. In this paper, a light-weight full-band model is proposed with two dedicated strategies, i.e., a learnable spectral compression mapping for more effective high-band spectral information compression, and the utilization of the multi-head attention mechanism for more effective modeling of the global spectral pattern. Experiments validate the efficacy of the proposed strategies and show that the proposed model achieves competitive performance with only 0.89M parameters.

preprint, 2022, arXiv

A two-stage full-band speech enhancement model with effective spectral compression mapping

The direct expansion of deep neural network (DNN) based wide-band speech enhancement (SE) to full-band processing faces the challenge of low frequency resolution in the low-frequency range, which is highly likely to degrade model performance. In this paper, we propose a learnable spectral compression mapping (SCM) to effectively compress the high-frequency components so that they can be processed more efficiently. By doing so, the model can pay more attention to the low- and middle-frequency range, where most of the speech power is concentrated. Instead of suppressing noise in a single network structure, we first estimate a spectral magnitude mask, converting the speech to a high signal-to-noise ratio (SNR) state, and then utilize a subsequent model to further optimize the real and imaginary mask of the pre-enhanced signal. We conduct comprehensive experiments to validate the efficacy of the proposed method.
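
The two-stage masking idea in this abstract can be illustrated with a minimal sketch. It only shows how a real-valued magnitude mask and a subsequent complex (real plus imaginary) mask act on short-time Fourier transform bins. The function names and the toy mask values are hypothetical, and in the paper the masks would be predicted by trained networks rather than written by hand.

```python
def apply_magnitude_mask(spec, mag_mask):
    """Stage 1: scale each complex bin by a real mask; the phase is unchanged."""
    return [m * x for m, x in zip(mag_mask, spec)]


def apply_complex_mask(spec, real_mask, imag_mask):
    """Stage 2: multiply each bin by the complex mask real_mask + j * imag_mask."""
    return [complex(mr, mi) * x for mr, mi, x in zip(real_mask, imag_mask, spec)]


# Hypothetical toy values: one frame with three frequency bins.
noisy = [1 + 1j, 2 - 1j, 0.5 + 0.5j]

# Stage 1 attenuates bins judged noisy (mask values are made up here).
stage1 = apply_magnitude_mask(noisy, [1.0, 0.5, 0.0])

# Stage 2 can also rotate the phase, which a magnitude mask cannot do.
stage2 = apply_complex_mask(stage1, [1.0, 0.0, 1.0], [0.0, 1.0, 0.0])
```

The second bin shows the difference between the stages: the magnitude mask only halves it, while the complex mask `0 + 1j` rotates it by 90 degrees.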

preprint, 2022, arXiv

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws: running the model on all query-document pairs at inference time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model in the form of document-to-query generation. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference time speedups since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results that are comparable to the more expensive cross-attention ranking approaches while being up to 6.8X faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.

preprint, 2022, arXiv

From Data to Software to Science with the Rubin Observatory LSST

The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset will dramatically alter our understanding of the Universe, from the origins of the Solar System to the nature of dark matter and dark energy. Much of this research will depend on the existence of robust, tested, and scalable algorithms, software, and services. Identifying and developing such tools ahead of time has the potential to significantly accelerate the delivery of early science from LSST. Developing these collaboratively, and making them broadly available, can enable more inclusive and equitable collaboration on LSST science. To facilitate such opportunities, a community workshop entitled "From Data to Software to Science with the Rubin Observatory LSST" was organized by the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) and partners, and held at the Flatiron Institute in New York, March 28-30th 2022. The workshop included over 50 in-person attendees invited from over 300 applications. It identified seven key software areas of need: (i) scalable cross-matching and distributed joining of catalogs, (ii) robust photometric redshift determination, (iii) software for determination of selection functions, (iv) frameworks for scalable time-series analyses, (v) services for image access and reprocessing at scale, (vi) object image access (cutouts) and analysis at scale, and (vii) scalable job execution systems. This white paper summarizes the discussions of this workshop. It considers the motivating science use cases, identified cross-cutting algorithms, software, and services, their high-level technical specifications, and the principles of inclusive collaborations needed to develop them. We provide it as a useful roadmap of needs, as well as to spur action and collaboration between groups and individuals looking to develop reusable software for early LSST science.

preprint, 2022, arXiv

High-performance and Low-power Transistors Based on Anisotropic Monolayer $β$-TeO$_2$

Two-dimensional (2D) semiconductors offer a promising prospect for high-performance and energy-efficient devices, especially in the sub-10 nm regime. Inspired by the successful fabrication of 2D $β$-TeO$_2$ and the high on/off ratio and high air-stability of fabricated field effect transistors (FETs) [Nat. Electron. 2021, 4, 277], we provide a comprehensive investigation of the electronic structure of monolayer $β$-TeO$_2$ and the device performance of sub-10 nm metal-oxide-semiconductor FETs (MOSFETs) based on this material. The anisotropic electronic structure of monolayer $β$-TeO$_2$ plays a critical role in the anisotropy of transport properties for MOSFETs. We show that the 5.2-nm gate-length n-type MOSFET holds an ultra-high on-state current exceeding 3700 μA/μm according to International Roadmap for Devices and Systems (IRDS) 2020 goals for high-performance devices, which benefits from the highly anisotropic electron effective mass. Moreover, monolayer $β$-TeO$_2$ MOSFETs can fulfill the IRDS 2020 goals for both high-performance and low-power devices in terms of on-state current, sub-threshold swing, delay time, and power-delay product. This study unveils monolayer $β$-TeO$_2$ as a promising candidate for ultra-scaled devices in future nanoelectronics.

preprint, 2022, arXiv

Interpretable learning of voltage for electrode design of multivalent metal-ion batteries

Deep learning (DL) has emerged as a powerful tool for rapidly and accurately predicting materials properties from big data, such as the design of current commercial Li-ion batteries. However, its practical utility for multivalent metal-ion batteries (MIBs), the most promising future solution of large-scale energy storage, is limited due to the scarce MIB data availability and poor DL model interpretability. Here, we develop an interpretable DL model as an effective and accurate method for learning electrode voltages of multivalent MIBs (divalent magnesium, calcium, zinc, and trivalent aluminum) at small dataset limits (150~500). Using the experimental results as validation, our model is much more accurate than machine-learning models, which usually are better than DL in the small dataset regime. Besides the high accuracy, our feature-engineering-free DL model is explainable: by visualizing vectors from the layers of the neural network, it automatically extracts the atom covalent radius as the most important feature for the voltage learning. The presented model potentially accelerates the design and optimization of multivalent MIB materials with fewer data and less domain-knowledge restriction, and is implemented in a publicly available online toolkit at http://batteries.2dmatpedia.org/ for the battery community.

preprint, 2022, arXiv

On the approximation of queue-length distributions in transportation networks

This paper focuses on the analytical probabilistic modeling of vehicular traffic. It formulates a stochastic node model. It then formulates a network model by coupling the node model with the link model of Lu and Osorio (2018), which is a stochastic formulation of the traffic-theoretic link transmission model. The proposed network model is scalable and computationally efficient, making it suitable for urban network optimization. For a network with $r$ links, each of space capacity $\ell$, the model has a complexity of $\mathcal{O}(r\ell)$. The network model yields the marginal distribution of link states. The model is validated against a simulation-based network implementation of the stochastic link transmission model. The validation experiments consider a set of small networks with intricate traffic dynamics. For all scenarios, the proposed model accurately captures the traffic dynamics. The network model is used to address a signal control problem. Compared to the probabilistic link model of Lu and Osorio (2018) with an exogenous node model and a benchmark deterministic network loading model, the proposed network model derives signal plans with better performance. The case study highlights the added value of using between-link (i.e., across-node) interaction information for traffic management and accounting for stochasticity in the network.

preprint, 2022, arXiv

PMAL: Open Set Recognition via Robust Prototype Mining

Open Set Recognition (OSR) has been an emerging topic. Besides recognizing predefined classes, the system needs to reject the unknowns. Prototype learning is a promising way to handle the problem, as its ability to improve intra-class compactness of representations is much needed in discrimination between the known and the unknowns. In this work, we propose a novel Prototype Mining And Learning (PMAL) framework. It has a prototype mining mechanism before the phase of optimizing embedding space, explicitly considering two crucial properties, namely high quality and diversity of the prototype set. Concretely, a set of high-quality candidates is first extracted from training samples based on data uncertainty learning, avoiding the interference from unexpected noise. Considering the multifarious appearance of objects even in a single category, a diversity-based strategy for prototype set filtering is proposed. Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between the known and the unknowns. Extensive experiments verify the two good characteristics (i.e., high quality and diversity) embraced in prototype mining, and show the remarkable performance of the proposed framework compared to state-of-the-art methods.

preprint, 2022, arXiv

Rapid-Flooding Time Synchronization for Large-Scale Wireless Sensor Networks

Accurate and fast-converging time synchronization is very important for wireless sensor networks. Flooding time synchronization converges fast, but its transmission delay and by-hop error accumulation seriously reduce the synchronization accuracy. In this article, a rapid-flooding multiple one-way broadcast time-synchronization (RMTS) protocol for large-scale wireless sensor networks is proposed. To minimize the by-hop error accumulation, the RMTS uses maximum likelihood estimations for clock skew estimation and clock offset estimation, and quickly shares the estimations among the networks. As a result, the synchronization error resulting from delays is greatly reduced, while faster convergence and higher-accuracy synchronization are achieved. Extensive experimental results demonstrate that, even over 24-hop networks, the RMTS is able to build accurate synchronization at the third synchronization period, and moreover, the by-hop error accumulation is slower when the network diameter increases.
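
As a rough illustration of estimating clock skew and offset from timestamps, the sketch below fits a linear clock model, local = skew * reference + offset, by ordinary least squares. Under the assumption of independent Gaussian timestamp noise, the least-squares fit coincides with the maximum likelihood estimate. This is not the RMTS protocol itself: the function name `fit_clock` and the sample timestamps are hypothetical, and the abstract does not specify the paper's estimator.

```python
def fit_clock(ref_times, local_times):
    """Least-squares fit of local = skew * ref + offset.

    Assumption: timestamp noise is i.i.d. Gaussian, in which case this
    fit equals the maximum likelihood estimate of (skew, offset).
    """
    n = len(ref_times)
    mean_ref = sum(ref_times) / n
    mean_loc = sum(local_times) / n
    sxx = sum((t - mean_ref) ** 2 for t in ref_times)
    sxy = sum((t - mean_ref) * (c - mean_loc)
              for t, c in zip(ref_times, local_times))
    skew = sxy / sxx
    offset = mean_loc - skew * mean_ref
    return skew, offset


# Hypothetical noise-free timestamps from a clock running 100 ppm fast
# with a 0.25 s offset relative to the reference.
ref = [0.0, 10.0, 20.0, 30.0]
local = [1.0001 * t + 0.25 for t in ref]
skew, offset = fit_clock(ref, local)
```

With noisy timestamps, the same function returns the best linear fit, and more broadcast samples reduce the variance of both estimates.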

preprint, 2022, arXiv

Semi-blind source separation using convolutive transfer function for nonlinear acoustic echo cancellation

The recently proposed semi-blind source separation (SBSS) method for nonlinear acoustic echo cancellation (NAEC) outperforms adaptive NAEC in attenuating the nonlinear acoustic echo. However, the multiplicative transfer function (MTF) approximation makes it unsuitable for real-time applications especially in highly reverberant environments, and the natural gradient makes it hard to balance well between fast convergence speed and stability. In this paper, we propose two more effective SBSS methods based on auxiliary-function-based independent vector analysis (AuxIVA) and independent low-rank matrix analysis (ILRMA). The convolutive transfer function (CTF) approximation is used instead of MTF so that a long impulse response can be modeled with a short latency. The optimization schemes used in AuxIVA and ILRMA are carefully regularized according to the constrained demixing matrix of NAEC. Experimental results validate significantly better echo cancellation performance of the proposed methods.

preprint, 2021, arXiv

Coronary Plaque Analysis for CT Angiography Clinical Research

The analysis of plaque deposits in the coronary vasculature is an important topic in current clinical research. On the technical side, mostly new algorithms for different subtasks (e.g., centerline extraction or vessel/plaque segmentation) are proposed. However, to enable clinical research with the help of these algorithms, a software solution that enables manual correction, comprehensive visual feedback and tissue analysis capabilities is needed. Therefore, we present such an integrated software solution. It is able to perform robust automatic centerline extraction and inner and outer vessel wall segmentation, while providing easy-to-use manual correction tools. It also allows for annotation of lesions along the centerlines, which can be further analyzed regarding their tissue composition. Furthermore, it enables research in upcoming technologies and research directions: it supports dual energy CT scans with dedicated plaque analysis and the quantification of the fatty tissue surrounding the vasculature, also in automated set-ups.

preprint, 2021, arXiv

Few-photon optical diode in a chiral waveguide

We study the coherent transport of one or two photons in a 1D waveguide chirally coupled to a nonlinear resonator. Analytic solutions of the one-photon and two-photon scattering are derived. Although the resonator acts as a non-reciprocal phase shifter, light transmission is reciprocal at the one-photon level. However, the forward and reverse transmission probabilities for two photons incident from either the left side or the right side of the nonlinear resonator are nonreciprocal due to the energy redistribution of the two-photon bound state. Hence, the nonlinear resonator acts as an optical diode at the two-photon level.

preprint, 2020, arXiv

Category-Specific CNN for Visual-aware CTR Prediction at JD.com

As one of the largest B2C e-commerce platforms in China, JD.com also powers a leading advertising system, serving millions of advertisers with fingertip connection to hundreds of millions of customers. In our system, as well as most e-commerce scenarios, ads are displayed with images. This makes visual-aware Click Through Rate (CTR) prediction of crucial importance to both business effectiveness and user experience. Existing algorithms usually extract visual features using off-the-shelf Convolutional Neural Networks (CNNs) and late fuse the visual and non-visual features for the finally predicted CTR. Despite being extensively studied, this field still faces two key challenges. First, although encouraging progress has been made in offline studies, applying CNNs in real systems remains non-trivial, due to the strict requirements for efficient end-to-end training and low-latency online serving. Second, the off-the-shelf CNNs and late fusion architectures are suboptimal. Specifically, off-the-shelf CNNs were designed for classification and thus never take categories as input features, while in e-commerce, categories are precisely labeled and contain abundant visual priors that will help the visual modeling. Unaware of the ad category, these CNNs may extract some unnecessary category-unrelated features, wasting the CNN's limited expression ability. To overcome the two challenges, we propose Category-specific CNN (CSCNN) specially for CTR prediction. CSCNN incorporates the category knowledge early, with a light-weighted attention module on each convolutional layer. This enables CSCNN to extract expressive category-specific visual patterns that benefit the CTR prediction. Offline experiments on benchmark and a 10 billion scale real production dataset from JD, together with an online A/B test, show that CSCNN outperforms all compared state-of-the-art algorithms.

preprint, 2020, arXiv

Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is computationally demanding and superfluous when only the speech of one specific target speaker is desired. In this paper, we propose a computationally efficient blind speech extraction method based on a proper modification of the commonly utilized independent vector analysis algorithm, under the mild assumption that the average power of the signal of interest outweighs that of the interfering speech sources. Considering that the minimum distortion principle cannot be implemented since the full demixing matrix is not available, we also design a one-unit scaling operation to solve the scaling ambiguity. Simulations validate the efficacy of the proposed method in extracting the dominant speech.

preprint, 2020, arXiv

Entanglement of Two Jaynes-Cummings Atoms In Single Excitation Space

We study the entanglement dynamics of two atoms coupled to their own Jaynes-Cummings cavities in single-excitation space. We use the concurrence to measure the atomic entanglement and consider partial Bell states as initial states. Our analysis suggests that there exist collapses and recoveries in the entanglement dynamics. The physical mechanism behind the entanglement dynamics is the periodic information and energy exchange between atoms and light fields. For the initial partial Bell states, the evolutionary periodicity of the atomic entanglement can be found only if the ratio of the two atom-cavity coupling strengths is a rational number. Whether there is a time translation between the two kinds of initial partial Bell state cases depends on the odd-even parity of the coupling strength ratio.
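
For readers unfamiliar with the concurrence, the sketch below evaluates it for a pure two-qubit state a|00> + b|01> + c|10> + d|11>, where it reduces to C = 2|ad - bc|. It covers only the initial pure state: once the cavities are traced out the atomic state is generally mixed, and the general (Wootters) formula is needed instead. The parameterization of the partial Bell state as cos(theta)|01> + sin(theta)|10> and the function names are assumptions made for illustration.

```python
import math


def concurrence_pure(a, b, c, d):
    """Concurrence of the pure state a|00> + b|01> + c|10> + d|11>.

    The amplitudes are normalized first, then C = 2 * |a*d - b*c|.
    """
    norm = math.sqrt(sum(abs(x) ** 2 for x in (a, b, c, d)))
    a, b, c, d = (x / norm for x in (a, b, c, d))
    return 2 * abs(a * d - b * c)


def partial_bell(theta):
    """Assumed parameterization: cos(theta)|01> + sin(theta)|10>."""
    return (0.0, math.cos(theta), math.sin(theta), 0.0)


# A product state (theta = 0) is unentangled; theta = pi/4 gives a Bell state.
c_product = concurrence_pure(*partial_bell(0.0))
c_bell = concurrence_pure(*partial_bell(math.pi / 4))
```

For this family of states the formula gives C = |sin(2 * theta)|, so the entanglement of the initial state is tuned continuously by theta.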

preprint, 2020, arXiv

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppress the echo. In this paper, we propose a residual echo suppression method based on a modification of the fully convolutional time-domain audio separation network (Conv-TasNet). Both the residual signal of the linear acoustic echo cancellation system and the output of the adaptive filter are adopted to form multiple streams for the Conv-TasNet, resulting in more effective echo suppression while keeping a lower latency of the whole system. Simulation results validate the efficacy of the proposed method in both single-talk and double-talk situations.

preprint, 2020, arXiv

Object-QA: Towards High Reliable Object Quality Assessment

In object recognition applications, object images usually appear with different quality levels. In practice, it is very important to indicate object image quality for better application performance, e.g. filtering out low-quality object image frames to maintain robust video object recognition results and speed up inference. However, no previous work explicitly addresses the problem. In this paper, we define the problem of object quality assessment for the first time and propose an effective approach named Object-QA to assess highly reliable quality scores for object images. Concretely, Object-QA first employs a well-designed relative quality assessing module that learns the intra-class-level quality scores by referring to the difference between object images and their estimated templates. Then an absolute quality assessing module is designed to generate the final quality scores by aligning the quality score distributions across classes. Besides, Object-QA can be implemented with only object-level annotations, and is also easily deployed to a variety of object recognition tasks. To the best of our knowledge, this is the first work to put forward the definition of this problem and conduct quantitative evaluations. Validations on 5 different datasets show that Object-QA can not only assess highly reliable quality scores in accordance with human cognition, but also improve application performance.

preprint, 2020, arXiv

U-net Based Direct-path Dominance Test for Robust Direction-of-arrival Estimation

It has been noted that the identification of the time-frequency bins dominated by the contribution from the direct propagation of the target speaker can significantly improve the robustness of the direction-of-arrival estimation. However, the correct extraction of the direct-path sound is challenging, especially in adverse environments. In this paper, a U-net based direct-path dominance test method is proposed. Exploiting the efficient segmentation capability of the U-net architecture, the direct-path information can be effectively retrieved from a dedicated multi-task neural network. Moreover, the training and inference of the neural network only need the input of a single microphone, circumventing the problem of array-structure dependence faced by common end-to-end deep learning based methods. Simulations demonstrate that significantly higher estimation accuracy can be achieved in highly reverberant and low signal-to-noise ratio environments.

preprint, 2020, arXiv

Valley pseudospin in monolayer MoSi2N4 and MoSi2As4

For a long time, two-dimensional (2D) hexagonal MoS2 was proposed as a promising material for valleytronic systems. However, the limited size of growth and low carrier mobilities in MoS2 restrict its further application. Very recently, a new kind of hexagonal 2D MXene, MoSi2N4, was successfully synthesized with large size, excellent ambient stability, and considerable hole mobility. In this paper, based on first-principles calculations, we predict that the valley-contrast properties can be realized in monolayer MoSi2N4 and its derivative MoSi2As4. Beyond the traditional two-level valleys, the valleys in monolayer MoSi2As4 are multiple-folded, implying a new valley dimension. Such multiple-folded valleys can be described by a three-band low-power Hamiltonian. This study presents the theoretical advance and the potential applications of monolayer MoSi2N4 and MoSi2As4 in valleytronic devices, especially multiple information processing.

preprint, 2019, arXiv

Correlating the Electronic Structures of Metallic/Semiconductor MoTe2 Interface to its Atomic Structures

Contact interface properties are important in determining the performances of devices based on atomically thin two-dimensional (2D) materials, especially those with short channels. Understanding the contact interface is therefore quite important to design better devices. Herein, we use scanning transmission electron microscopy, electron energy loss spectroscopy, and first-principles calculations to reveal the electronic structures within the metallic (1T')-semiconducting (2H) MoTe2 coplanar phase boundary across a wide spectral range and correlate its properties and atomic structure. We find that the 2H-MoTe2 excitonic peaks cross the phase boundary into the 1T' phase within a range of approximately 150 nm. The 1T'-MoTe2 crystal field can penetrate the boundary and extend into the 2H phase by approximately two unit cells. The plasmonic oscillations exhibit strong angle dependence, i.e., a red-shift (approximately 0.3 eV-1.2 eV) occurs within 4 nm at 1T'/2H-MoTe2 boundaries with large tilt angles, but there is no shift at zero-tilted boundaries. These atomic-scale measurements reveal the structure-property relationships of 1T'/2H-MoTe2 boundary, providing useful information for phase boundary engineering and device development based on 2D materials.