Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
16 topics
4 close collaborators

Published work

13 published item(s)

preprint · 2022 · arXiv

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

In-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints. In this paper, we present HQANN, a simple yet highly efficient hybrid query processing framework which can be easily embedded into existing proximity graph-based ANNS algorithms. We guarantee both low latency and high recall by leveraging navigation sense among attributes and fusing vector similarity search with attribute filtering. Experimental results on both public and in-house datasets demonstrate that HQANN is 10x faster than the state-of-the-art hybrid ANNS solutions at the same recall quality, and that its performance is hardly affected by the complexity of attributes. It can reach 99% recall@10 in around 50 microseconds on GLOVE-1.2M with thousands of attribute constraints.
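To make the idea of a hybrid query concrete, here is a minimal brute-force sketch of fusing vector distance with an attribute-mismatch penalty. It is not HQANN's graph-based algorithm, which the abstract does not spell out; the function name `hybrid_search`, the `penalty` weight, and the toy data are all assumptions made for illustration.

```python
import math

def hybrid_search(items, query_vec, query_attrs, k=2, penalty=10.0):
    """Rank items by vector distance plus a penalty per mismatched attribute.

    `items` is a list of (vector, attributes) pairs. The penalty weight is a
    hypothetical choice: when it is large, attribute matches dominate the ranking.
    """
    scored = []
    for idx, (vec, attrs) in enumerate(items):
        vec_dist = math.dist(vec, query_vec)  # Euclidean distance
        mismatches = sum(a != b for a, b in zip(attrs, query_attrs))
        scored.append((vec_dist + penalty * mismatches, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]

# Toy data: item 1 is close in vector space but has a mismatched attribute.
items = [
    ((0.0, 0.0), (1, 0)),
    ((0.1, 0.0), (2, 0)),
    ((1.0, 1.0), (1, 0)),
    ((0.2, 0.1), (1, 0)),
]
result = hybrid_search(items, (0.0, 0.0), (1, 0), k=2)
print(result)
```

A fused score of this kind lets a single ranking pass handle both constraints, at the cost of having to choose the penalty weight; a real system would replace the linear scan with an index.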

preprint · 2022 · arXiv

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring. However, existing datasets have a relatively small number of vocal sound samples or noisy labels. As a consequence, state-of-the-art audio event classification models may not perform well in detecting human vocal sounds. To support research on building robust and accurate vocal sound recognition, we have created the VocalSound dataset, consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. Experiments show that the vocal sound recognition performance of a model can be significantly improved by 41.9% by adding the VocalSound dataset to an existing dataset as training material. In addition, unlike previous datasets, the VocalSound dataset contains meta information such as speaker age, gender, native language, country, and health condition.

preprint · 2021 · arXiv

Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases

Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible condition. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For development, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system generalizes to new patient populations and abnormalities. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist.

preprint · 2021 · arXiv

Distribution of ripples in graphene membrane

Intrinsic ripples with various configurations and sizes were reported to affect the physical and chemical properties of 2D materials. By performing molecular dynamics simulations and theoretical analysis, we use two geometric models of the ripple shape to explore numerically the distribution of ripples in graphene membranes. We focus on the ratio of ripple height to its diameter (t/D), which was recently shown to be the most relevant for the chemical activity of graphene membranes. Our results demonstrate that the ripple density decreases as the ratio t/D increases, in qualitative agreement with the Boltzmann distribution derived analytically from the bending energy of the membrane. Our theoretical study also provides specific quantitative information on the ripple distribution in graphene and gives new insights applicable to other 2D materials.

preprint · 2021 · arXiv

Electronic and Optical properties of transition metal dichalcogenides under symmetric and asymmetric field-effect doping

Doping via electrostatic gating is a powerful and widely used technique to tune the electron densities in layered materials. The microscopic details of how these setups affect the layered material are, however, subtle and call for careful theoretical treatments. Using semiconducting monolayers of transition metal dichalcogenides (TMDs) as prototypical systems affected by electrostatic gating, we show that the electronic and optical properties indeed change dramatically when the gating geometry is properly taken into account. This effect is implemented by a self-consistent calculation of the Coulomb interaction between the charges in different sub-layers within the tight-binding approximation. We consider both single- and double-sided gating. Our results show that, at low doping levels of $10^{13}$ cm$^{-2}$, the electronic bands of monolayer TMDs shift rigidly for both types of gating, and subsequently undergo a Lifshitz transition. When approaching the doping level of $10^{14}$ cm$^{-2}$, the band structure changes dramatically, especially in the case of single-sided gating, where we find that monolayer MoS$_2$ and WS$_2$ become indirect gap semiconductors. The optical conductivities calculated within linear response theory also show clear signatures of these doping-induced band structure renormalizations. Our numerical results based on lightweight tight-binding models indicate the importance of electronic screening in doped layered structures, and pave the way for further understanding gated super-lattice structures formed by multilayers with extended Moiré patterns.

preprint · 2021 · arXiv

From Machine Learning to Transfer Learning in Laser-Induced Breakdown Spectroscopy: the Case of Rock Analysis for Mars Exploration

With the ChemCam instrument, laser-induced breakdown spectroscopy (LIBS) has successively contributed to Mars exploration by determining elemental compositions of the soil, crust and rocks. Two newly launched missions, the Chinese Tianwen 1 and the American Perseverance, will further increase the number of LIBS instruments on Mars after the planned landings in spring 2021. Such an unprecedented situation requires a reinforced research effort on the methods of LIBS spectral data treatment. Although matrix effects are a general issue in LIBS, they become accentuated in the case of rock analysis for Mars exploration, because of the large variation of rock composition, leading to the chemical matrix effect, and the difference in morphology between laboratory standard samples (in pressed pellet, glass or ceramics) used to establish calibration models and natural rocks encountered on Mars, leading to the physical matrix effect. The chemical matrix effect has been tackled in the ChemCam project with large sets of laboratory standard samples offering a good representation of various compositions of Mars rocks. The present work deals with the physical matrix effect, which still awaits a satisfactory solution. The approach consists in introducing transfer learning in LIBS data treatment. For the specific case of total alkali-silica (TAS) classification of natural rocks, the results show a significant improvement of the prediction capacity of pellet sample-based models when trained together with suitable information from rocks in a procedure of transfer learning. The correct classification rate of rocks increases from 33.3% with a machine learning model to 83.3% with a transfer learning model.

preprint · 2021 · arXiv

Two-Phase Dynamics of DNA Supercoiling based on DNA Polymer Physics

DNA supercoils are generated in genome regulation processes such as transcription and replication, and provide mechanical feedback to such processes. Under tension, a DNA supercoil can present a coexistence state of plectonemic (P) and stretched (S) phases. Experiments have revealed the dynamic behaviors of plectonemes, e.g. diffusion, nucleation and hopping. To represent these dynamics with computational changes, we first demonstrated the fast dynamics on the DNA to reach torque equilibrium within the P and S phases, and then identified the two-phase boundaries as collective slow variables to describe the essential dynamics. Based on the time scale separation demonstrated here, we developed a two-phase model of the dynamics of DNA supercoiling, which can capture physiologically relevant events across time scales spanning several orders of magnitude. In this model, we systematically characterized the slow dynamics between the two phases, and compared the numerical results with those from the DNA polymer physics-based worm-like chain model. The supercoiling dynamics, including the nucleation, diffusion, and hopping of plectonemes, have been well represented and reproduced using the two-phase dynamic model, at trivial computational cost. Our current developments, therefore, can be implemented to explore multi-scale physical mechanisms of the DNA supercoiling-dependent physiological processes.

preprint · 2020 · arXiv

A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning

Network embedding has been widely studied to model and manage data in a variety of real-world applications. However, most existing works focus on networks with single-typed nodes or edges, with limited consideration of unbalanced distributions of nodes and edges. In real-world applications, networks usually consist of billions of various types of nodes and edges with abundant attributes. To tackle these challenges, in this paper we propose a multi-semantic metapath (MSM) model for large scale heterogeneous representation learning. Specifically, we generate multi-semantic metapath-based random walks to construct the heterogeneous neighborhood to handle the unbalanced distributions and propose a unified framework for the embedding learning. We conduct systematic evaluations of the proposed framework on two challenging datasets: Amazon and Alibaba. The results empirically demonstrate that MSM can achieve relatively significant gains over previous state-of-the-art methods on link prediction.
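As a rough illustration of what a metapath-based random walk looks like, the sketch below walks a toy heterogeneous graph so that node types follow a prescribed pattern. This is the generic single-metapath technique, not the multi-semantic variant the abstract proposes; the function name `metapath_walk`, the toy graph, and the "user"/"item" types are assumptions for illustration only.

```python
import random

def metapath_walk(graph, node_types, start, metapath, walk_length, rng):
    """Random walk in which each step must land on the type the metapath prescribes.

    The metapath is treated as a cycle (e.g. user -> item -> user), so a
    repeated final type is dropped before cycling through it.
    """
    if node_types[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    cyclic = len(metapath) > 1 and metapath[0] == metapath[-1]
    cycle = metapath[:-1] if cyclic else metapath
    walk = [start]
    for step in range(1, walk_length):
        wanted = cycle[step % len(cycle)]
        candidates = [n for n in graph[walk[-1]] if node_types[n] == wanted]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy graph with two node types.
graph = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"]}
node_types = {"u1": "user", "u2": "user", "i1": "item", "i2": "item"}
walk = metapath_walk(graph, node_types, "u1", ["user", "item", "user"], 5, random.Random(0))
print(walk)
```

Walks produced this way are typically fed to a skip-gram style embedding model; constraining the node types keeps rare types from being drowned out by frequent ones.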

preprint · 2020 · arXiv

Comprehensive Information Integration Modeling Framework for Video Titling

In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos are seldom accompanied by appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and in demand, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, the granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs and recognizes the intra-actions in each graph through Graph Neural Networks (GNN). Then the global-local aggregation module is proposed to model inter-actions across graphs and aggregate heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to utilize the interactions between products and backgrounds, and generate the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community...

preprint · 2020 · arXiv

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

In this paper, we present an approach, namely Lexical Semantic Image Completion (LSIC), that may have potential applications in art, design, and heritage conservation, among several others. Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To permit a completion process that is both grounded and controllable, we advocate generating results faithful to both visual and lexical semantic context, i.e., the description of leaving holes or blank regions in the image (e.g., hole description). One major challenge for LSIC comes from modeling and aligning the structure of visual-semantic context and translating across different modalities. We term this process structure completion, which is realized by multi-grained reasoning blocks in our model. Another challenge relates to unimodal biases, which occur when the model generates plausible results without using the textual description. This can happen because the annotated captions for an image are often semantically equivalent in existing datasets, and thus there is only one paired text for a masked image in training. We devise an unsupervised unpaired-creation learning path besides the over-explored paired-reconstruction path, as well as a multi-stage training strategy to mitigate the insufficiency of labeled data. We conduct extensive quantitative and qualitative experiments as well as ablation studies, which reveal the efficacy of our proposed LSIC.

preprint · 2020 · arXiv

Poet: Product-oriented Video Captioner for E-commerce

In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not amenable to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods in generation quality, product aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.

preprint · 2020 · arXiv

Strain-induced semiconductor to metal transition in MA2Z4 bilayers

Very recently, a new type of two-dimensional layered material, MoSi2N4, has been fabricated, which is semiconducting with weak interlayer interaction, high strength, and excellent stability. We systematically investigate theoretically the effect of vertical strain on the electronic structure of MA2Z4 (M=Ti/Cr/Mo, A=Si, Z=N/P) bilayers. Taking bilayer MoSi2N4 as an example, our first-principles calculations show that its indirect band gap decreases monotonically as the vertical compressive strain increases. Under a critical strain of around 22%, it undergoes a transition from semiconductor to metal. We attribute this to the opposite energy shift of states in different layers, which originates from the built-in electric field induced by the asymmetric charge transfer between two inner sublayers near the interface. Similar semiconductor to metal transitions are observed in other strained MA2Z4 bilayers, and the estimated critical pressures to realize such transitions are of the same order as those of semiconducting transition metal dichalcogenides. The semiconductor to metal transitions observed in the family of MA2Z4 bilayers present interesting possibilities for strain-induced engineering of their electronic properties.

preprint · 2019 · arXiv

Tunable magneto-optical properties of single-layer tin diselenide: From GW approximation to large-scale tight-binding calculations

A parameterized tight-binding (TB) model based on first-principles GW calculations is developed for single-layer tin diselenide (SnSe$_2$) and used to study its electronic and optical properties under an external magnetic field. The truncated model is derived from six maximally localized Wannier orbitals on Se site, and accurately describes the quasi-particle electronic states of single-layer SnSe$_2$ in a wide energy range. The quasi-particle electronic states are dominated by the hoppings between nearest Wannier orbitals ($t_1$-$t_6$). Our numerical calculation shows that, due to the electron-hole asymmetry, two sets of Landau level spectra are obtained when a perpendicular magnetic field is applied. The Landau level spectrum depends linearly on the level index and magnetic field, exhibiting the properties of a two-dimensional electron gas in traditional semiconductors. The optical conductivity calculation shows that the optical gap is very close to the GW value, and can be tuned by an external magnetic field. Our proposed TB model can be used for further exploring the electronic, optical, and transport properties of SnSe$_2$, especially in the presence of external magnetic fields.
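The linear dependence on level index and field mentioned above is the textbook behavior of a conventional two-dimensional electron gas, where E_n = ħ(eB/m*)(n + 1/2). The sketch below evaluates that standard formula only; it is not the paper's tight-binding calculation, and the effective-mass ratio passed in is a hypothetical placeholder rather than a value for SnSe$_2$.

```python
# Physical constants in SI units (CODATA values).
HBAR = 1.054571817e-34        # J*s
E_CHARGE = 1.602176634e-19    # C
M_ELECTRON = 9.1093837015e-31 # kg

def landau_levels_mev(b_tesla, mass_ratio, n_levels):
    """Energies E_n = hbar * omega_c * (n + 1/2) in meV for a parabolic 2D band.

    `mass_ratio` is the effective mass in units of the free electron mass;
    any value used here is an illustrative assumption.
    """
    omega_c = E_CHARGE * b_tesla / (mass_ratio * M_ELECTRON)  # cyclotron frequency
    joule_to_mev = 1e3 / E_CHARGE
    return [HBAR * omega_c * (n + 0.5) * joule_to_mev for n in range(n_levels)]

levels_1t = landau_levels_mev(1.0, 1.0, 3)
levels_2t = landau_levels_mev(2.0, 1.0, 3)
print(levels_1t)
```

With different effective masses for electrons and holes, the same formula yields two level ladders with different spacings, which is one simple way to picture the two sets of spectra attributed to electron-hole asymmetry.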