Source author record

Hu Xu

Hu Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language physics.chem-ph cond-mat.mtrl-sci Artificial Intelligence cond-mat.mes-hall Machine Learning Computer Vision eess.AS physics.comp-ph Sound

Catalog footprint

What is connected

16 works

10 topics

4 close collaborators


preprint · 2023 · arXiv

CiT: Curation in Training for Effective Vision-Language Data

Large vision-language models are generally applicable to many downstream tasks, but they come at an exorbitant training cost that only large institutions can afford. This paper trades generality for efficiency and presents Curation in Training (CiT), a simple and efficient vision-text learning algorithm that couples a data objective into training. CiT automatically yields quality data to speed up contrastive image-text training and alleviates the need for an offline data filtering pipeline, allowing broad data sources (including raw image-text pairs from the web). CiT contains two loops: an outer loop that curates the training data and an inner loop that consumes the curated training data. The text encoder connects the two loops. Given metadata for tasks of interest (e.g., class names) and a large pool of image-text pairs, CiT alternately selects relevant training data from the pool by measuring the similarity between their text embeddings and the embeddings of the metadata. In our experiments, we observe that CiT can speed up training by over an order of magnitude, especially if the raw data size is large.
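A minimal sketch of the outer-loop selection step described above, assuming the text encoder has already produced embeddings for the pool captions and for the metadata (e.g. class names). The function names `cosine` and `curate` and the fixed `threshold` rule are illustrative assumptions, not taken from the CiT code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def curate(pool_text_emb: List[List[float]],
           metadata_emb: List[List[float]],
           threshold: float) -> List[int]:
    """Return indices of pool pairs whose caption embedding is close to the metadata.

    Hypothetical rule: keep pair i if its best cosine similarity against any
    metadata embedding reaches `threshold`.
    """
    selected = []
    for i, text_vec in enumerate(pool_text_emb):
        best = max(cosine(text_vec, meta_vec) for meta_vec in metadata_emb)
        if best >= threshold:
            selected.append(i)
    return selected
```

For example, `curate([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [[1.0, 0.0]], 0.9)` keeps indices `[0, 2]`. Because the text encoder keeps training in the inner loop, the embeddings (and therefore the selection) would be recomputed each time the outer loop runs.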

preprint · 2023 · arXiv

Masked Autoencoders that Listen

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder, as audio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target datasets. Empirically, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training. The code and models will be available at https://github.com/facebookresearch/AudioMAE.
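The masking and re-ordering steps above can be sketched with plain index bookkeeping. This is a minimal illustration assuming patches are identified by integer positions; `random_mask`, `pad_and_reorder`, and the example ratio of 0.8 are hypothetical choices, not the Audio-MAE implementation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_mask(num_patches: int, mask_ratio: float,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patch positions into (visible, masked) at the given ratio."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    # round() avoids float truncation, e.g. 10 * (1 - 0.8) == 1.9999999999999996
    num_keep = round(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def pad_and_reorder(encoded_visible: Sequence[T], visible: List[int],
                    num_patches: int, mask_token: T) -> List[T]:
    """Put encoded visible tokens back at their original positions; fill the rest with mask_token."""
    full = [mask_token] * num_patches
    for token, pos in zip(encoded_visible, visible):
        full[pos] = token
    return full
```

Only the `visible` patches would go through the encoder, which is where the compute savings of a high masking ratio come from; the decoder then receives the full-length sequence built by `pad_and_reorder`.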

preprint · 2022 · arXiv

CM3: A Causal Masked Multimodal Model of the Internet

We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of at their original positions. The causal masking objective provides a hybrid of the more common causal and masked language models, enabling full generative modeling while also providing bidirectional context when generating the masked spans. We train causally masked language-image models on large-scale web and Wikipedia articles, where each document contains all of the text, hypertext markup, hyperlinks, and image tokens (from a VQVAE-GAN), provided in the order they appear in the original HTML source (before masking). The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts, and thereby implicitly learn a wide range of text, image, and cross-modal tasks. They can be prompted to recover, in a zero-shot fashion, the functionality of models such as DALL-E, GENRE, and HTLM. We set a new state of the art in zero-shot summarization, entity linking, and entity disambiguation while maintaining competitive performance in the fine-tuning setting. With a single model, we can generate images unconditionally, generate images conditioned on text (like DALL-E), and perform captioning, all in a zero-shot setting.
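The causal masking transformation can be sketched as a pure sequence rewrite: each chosen span is cut out, a sentinel is left in its place, and the sentinel plus the span are appended at the end so a left-to-right model generates them last. The sentinel strings such as `"<mask:0>"` and the function name `causal_mask` are illustrative placeholders, not CM3's actual vocabulary.

```python
from typing import List, Tuple


def causal_mask(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Move each [start, end) span to the end of the sequence, leaving a sentinel behind."""
    spans = sorted(spans)
    for (_, end_a), (start_b, _) in zip(spans, spans[1:]):
        if start_b < end_a:
            raise ValueError("spans must not overlap")
    prefix: List[str] = []  # document with spans replaced by sentinels
    suffix: List[str] = []  # sentinels followed by the removed span tokens
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<mask:{i}>"
        prefix.extend(tokens[cursor:start])
        prefix.append(sentinel)
        suffix.append(sentinel)
        suffix.extend(tokens[start:end])
        cursor = end
    prefix.extend(tokens[cursor:])
    return prefix + suffix
```

For instance, `causal_mask(["a", "b", "c", "d", "e"], [(1, 3)])` yields `["a", "<mask:0>", "d", "e", "<mask:0>", "b", "c"]`: when the model generates `b c` at the end, it has already seen both `a` and `d e`, which is the bidirectional context the abstract refers to.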

preprint · 2022 · arXiv

Identifying the Alloy Structures of Germanene Grown on Al(111) Surface

While the growth of germanene has been claimed on many substrates, the exact crystal structures remain controversial. Here, we systematically explore the possible structures formed by Ge deposition onto the Al(111) surface by combining density-functional theory (DFT) with a global optimization algorithm. Using high-level random-phase approximation (RPA) calculations, we show that the formation of germanene on Al(111) is energetically unfavorable, with a positive formation energy. By combining ab initio evolutionary simulations, RPA calculations, and available experimental data from scanning tunneling microscopy (STM) and low-energy electron diffraction (LEED), we identify the two experimental phases as the honeycomb alloys Al3Ge3/Al(111)(√7×√7) and Al3Ge4/Al(111)(3×3). Al3Ge4/Al(111)(3×3) is an interesting structure with a vacancy in the substrate, which accounts for the dark clover pattern in the experimental STM image. Our results clarify the structural controversy of the Ge/Al(111) system and indicate that the fabrication of germanene may remain challenging.

preprint · 2022 · arXiv

Zero-Shot Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training or fine-tuning, which makes it a major challenge to scale ABSA to a large number of new domains. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Inference (CORN). Downstream ABSA tasks can then be cast as NLI for zero-shot transfer. We evaluate CORN on ABSA tasks ranging from aspect extraction (AE) and aspect sentiment classification (ASC) to end-to-end aspect-based sentiment analysis (E2E ABSA), showing that ABSA can be conducted without any human-annotated ABSA data.
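As a rough illustration of casting aspect sentiment classification as NLI, each polarity can be turned into a hypothesis and the polarity whose hypothesis is most entailed by the review wins. The hypothesis template and the `entail_score` callback (standing in for any NLI model) are hypothetical; this sketch shows only the casting step, not CORN's contrastive post-training.

```python
from typing import Callable, List, Tuple

POLARITIES = ("positive", "negative", "neutral")


def asc_as_nli(review: str, aspect: str) -> List[Tuple[str, str, str]]:
    """Build (premise, hypothesis, polarity) triples; the template is an illustrative choice."""
    return [(review, f"The {aspect} is {polarity}.", polarity) for polarity in POLARITIES]


def zero_shot_asc(review: str, aspect: str,
                  entail_score: Callable[[str, str], float]) -> str:
    """Return the polarity whose hypothesis receives the highest entailment score."""
    triples = asc_as_nli(review, aspect)
    best = max(triples, key=lambda t: entail_score(t[0], t[1]))
    return best[2]
```

Any scorer with the signature `(premise, hypothesis) -> float` can be plugged in, so the same casting works unchanged for a new domain without labeled ABSA data.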

preprint · 2020 · arXiv

DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis

This paper focuses on learning domain-oriented language models driven by end tasks, aiming to combine the worlds of general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT that learns from both an in-domain corpus and relevant domain corpora. This helps in learning domain language models in low-resource settings. Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.

preprint · 2020 · arXiv

User Memory Reasoning for Conversational Recommendation

We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph, to allow for natural interactions and accurate recommendations. For this study, we create a new Memory Graph (MG) <--> Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and a reasoning model that predicts optimal dialog policies and recommendation items in unconstrained graph space. The prediction of our proposed model inherits the graph structure, providing a natural way to explain the model's recommendation. Experiments are conducted for both offline metrics and online simulation, showing competitive results.

preprint · 2016 · arXiv

CER: Complementary Entity Recognition via Knowledge Expansion on Large Unlabeled Product Reviews

Product reviews contain a lot of useful information about product features and customer opinions. One important product feature is the complementary entity (products) that may potentially work together with the reviewed product. Knowing complementary entities of the reviewed product is very important because customers want to buy compatible products and avoid incompatible ones. In this paper, we address the problem of Complementary Entity Recognition (CER). Since no existing method can solve this problem, we first propose a novel unsupervised method to utilize syntactic dependency paths to recognize complementary entities. Then we expand category-level domain knowledge about complementary entities using only a few general seed verbs on a large amount of unlabeled reviews. The domain knowledge helps the unsupervised method to adapt to different products and greatly improves the precision of the CER task. The advantage of the proposed method is that it does not require any labeled data for training. We conducted experiments on 7 popular products with about 1200 reviews in total to demonstrate that the proposed approach is effective.

preprint · 2016 · arXiv

Mining Compatible/Incompatible Entities from Question and Answering via Yes/No Answer Classification using Distant Label Expansion

Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product's compatibility issues with other products are frequently discussed in PCQA, and such issues are raised more often for accessories, e.g., via a yes/no question such as "Does this mouse work with Windows 10?". In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem naturally fits a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of yes/no answers to assign the complementary entities a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage, and for the second stage a 3-class classifier that combines a distant PU-learning method (learning from positive and unlabeled examples) with a binary classifier. The benefit of distant PU-learning is that it can expand the labels to more implicit yes/no answers without using any human-annotated data. We conduct experiments on 4 products to show that the proposed method is effective.
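One way the second-stage outputs could be combined into the three compatibility labels is a simple cascade: the PU-learned model first decides whether an answer expresses a yes/no at all, and the binary classifier then decides yes versus no. The cascade order, the 0.5 thresholds, and the function name are assumptions for illustration; the abstract does not specify how the two classifiers are combined.

```python
def compatibility_label(p_polar: float, p_yes: float,
                        polar_threshold: float = 0.5,
                        yes_threshold: float = 0.5) -> str:
    """Hypothetical cascade mapping two classifier probabilities to a 3-class label.

    p_polar: probability (e.g. from the distant PU-learned model) that the
             answer expresses a yes/no at all.
    p_yes:   probability (from the binary classifier) that the answer means yes.
    """
    if p_polar < polar_threshold:
        return "unknown"  # answer does not commit to yes or no
    return "compatible" if p_yes >= yes_threshold else "incompatible"
```

Keeping the two decisions separate lets the PU-learned model absorb the unlabeled answers without forcing the yes/no classifier to also model "unknown".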

preprint · 2016 · arXiv

Spontaneous dehydrogenation of methanol over defect-free MgO(100) thin film deposited on molybdenum

The dehydrogenation reaction of methanol on metal-supported MgO(100) films has been studied using periodic density functional calculations. As far as we know, the dehydrogenation of a single methanol molecule over inert oxide insulators such as MgO has never been realized before without introducing defects and low-coordinated atoms. By depositing very thin oxide films on a Mo substrate, we have obtained the dissociative state of methanol. The dehydrogenation reaction is exothermic and nearly barrierless. The metal-supported thin oxide films studied here provide a versatile approach to enhancing the activity and properties of oxides.

preprint · 2016 · arXiv

Supervised Opinion Aspect Extraction by Exploiting Past Extraction Results

One of the key tasks of sentiment analysis of product reviews is to extract the product aspects or features on which users have expressed opinions. In this work, we use supervised sequence labeling as the base approach to the task. Although several extraction methods based on sequence labeling, such as Conditional Random Fields (CRF) and Hidden Markov Models (HMM), have been proposed, we show that this supervised approach can be significantly improved by exploiting the idea of concept sharing across multiple domains. For example, "screen" is an aspect of the iPhone, but the iPhone is not the only product with a screen; many electronic devices have screens too. When "screen" appears in a review of a new domain (or product), it is likely to be an aspect there as well. Knowing this enables much better extraction in the new domain. This paper proposes a novel extraction method that exploits this idea in the context of supervised sequence labeling. Experimental results show that it produces markedly better results than extraction without the past information.
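A simple way to inject past extraction results into a sequence labeler is an extra per-token feature recording whether the word was extracted as an aspect in earlier domains. The sketch below builds per-token feature dictionaries of the kind commonly fed to CRF toolkits; the feature names and helper functions are hypothetical and are not the paper's exact feature set.

```python
from typing import Dict, Iterable, List, Set


def past_aspect_set(past_extractions: Iterable[Iterable[str]]) -> Set[str]:
    """Union of aspects extracted in previous domains, lowercased."""
    return {aspect.lower() for domain in past_extractions for aspect in domain}


def token_features(tokens: List[str], past_aspects: Set[str]) -> List[Dict[str, object]]:
    """Per-token features; 'seen_as_aspect_before' carries the cross-domain knowledge."""
    feats = []
    for i, tok in enumerate(tokens):
        feats.append({
            "word": tok.lower(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
            "seen_as_aspect_before": tok.lower() in past_aspects,
        })
    return feats
```

With `past_aspect_set([["Screen", "battery"]])`, the token "screen" in a review of a new device gets `seen_as_aspect_before=True`, mirroring the iPhone example in the abstract.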

preprint · 2015 · arXiv

Generation of highly reactive oxygen species by co-adsorption of oxygen and water on metal-supported MgO(100) thinfilms

The formation of highly reactive oxygen species (ROS) on metal oxide surfaces has attracted considerable interest due to their diverse applications. In this work, we have performed density-functional theory calculations to investigate the co-adsorption of oxygen and water on ultrathin MgO(100) films deposited on a Mo(100) substrate. We reveal that molecular oxygen can be completely decomposed, step by step, with the assistance of water. Consequently, a series of ROS, including superoxide, hydroperoxide, hydroxyl and single oxygen adatoms, are formed on Mo(100)-supported MgO(100) thin films. The reaction barriers accompanying the generation of ROS are reported, and the influence of the MgO(100) film thickness is also discussed. The most promising routes to produce these species provide valuable information for understanding the importance of synergistic effects, namely the substrate, the co-adsorbed species, and the film thickness, in multiphase catalyst design.

preprint · 2015 · arXiv

Strain-Induced Water Dissociation on Supported Ultrathin Oxide Films

Controlling the dissociation of a single water molecule on an insulating surface plays a crucial role in many catalytic reactions. In this Letter, we identify the enhanced chemical reactivity of ultrathin MgO(100) films deposited on a Mo(100) substrate that causes water dissociation. We reveal that the ability to split water on an insulating surface depends closely on the lattice mismatch between the ultrathin film and the underlying substrate, and that substrate-induced in-plane tensile strain drives water dissociation on MgO(100). Three lower-energy dissociative adsorption configurations of water are predicted, and the structural transition from the molecular form to the dissociative form is almost barrierless. Our results provide an effective avenue to achieve water dissociation at the single-molecule level and shed light on how to tune the chemical reactions of insulating surfaces by choosing suitable substrates.

preprint · 2015 · arXiv

Surface energy calculations from Zinc blende (111)/(-1-1-1) to Wurtzite (0001)/(000-1):a study of ZnO and GaN

Accurate absolute surface energies of the (0001)/(000-1) surfaces of wurtzite structures are crucial in determining the thin-film growth mode of important energy materials. However, these surface energies remain unresolved because of the intrinsic difficulty of calculating the dangling-bond energy of asymmetrically bonded surface atoms. In this study, we used a pseudo-hydrogen passivation method to estimate the dangling-bond energy and calculate the polar surface energies of ZnO and GaN. The calculations were based on pseudo chemical potentials obtained from a set of tetrahedral clusters or simple pseudo-molecules, using density functional theory approaches. The surface energies we obtained for the (0001)/(000-1) surfaces of wurtzite ZnO and GaN showed relatively high self-consistency. A wedge-structure calculation with a new bottom-surface passivation scheme using group I and group VII elements was also proposed and performed to obtain a converged absolute surface energy for the wurtzite ZnO polar surfaces, and the result was compared with the above method. These calculations and comparisons may provide important insights into the crystal growth of these materials, thereby leading to significant performance enhancements of semiconductor devices.

preprint · 2015 · arXiv

Unusual dissociative adsorption of H2 over stoichiometric MgO thin film supported on molybdenum

The dissociation of a hydrogen molecule on MgO(001) films deposited on a Mo(001) surface is investigated systematically using periodic density-functional theory. The unusual adsorption behavior of a heterolytically dissociated hydrogen molecule at neighboring surface oxygen and surface magnesium sites is clarified here. To our knowledge, this heterolytic dissociative state has never been found before on bulk MgO(001) or metal-supported MgO(001) surfaces. The results confirm that, in all cases, heterolytic dissociation is much more favorable than homolytic dissociation, both energetically and kinetically. The energy difference between the two dissociative states is very large, in the range of 1.1 eV to 1.5 eV for Mo-supported oxide films of 1 ML to 3 ML, which largely inhibits homolytic dissociation thermodynamically. The energy barrier of heterolytic dissociation is about 0.5 eV, much lower than the barrier of homolytic dissociation. The transformation reaction on thick films will be more endothermic. Passing through the heterolytic dissociation state significantly lowers the reaction heat and the energy barrier for reaching the homolytic dissociative structure, which makes the homolytic splitting of H2 easier on 2 ML oxide films. The results provide a useful strategy for enhancing the reactivity of nonreducible metal oxides.

preprint · 2010 · arXiv

Stability of hydrogenated group-IV nanostructures: magic structures of diamond nanocrystals and Silicon quantum dots

We have developed an effective model to investigate the energetic stability of hydrogenated group-IV nanostructures, validated by first-principles calculations. We find that the Hamiltonian of X_mH_n (X = C, Si, Ge and Sn) can be expressed analytically as a linear combination of the atom numbers (m, n), indicating a dominant contribution from local X–X and X–H interactions. As a result, we explain the stable nanostructures observed experimentally and provide a reliable and efficient technique for searching for the magic structures of diamond nanocrystals (Dia-NCs) and silicon quantum dots (SiQDs).
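Because the energy of X_mH_n is described as a linear combination of m and n, the two coefficients can in principle be recovered from a handful of computed energies by least squares. The sketch below assumes the form E ≈ alpha·m + beta·n with no intercept term (the abstract does not give the exact form), and solves the 2×2 normal equations directly; `fit_linear_energy` is a hypothetical helper.

```python
from typing import List, Tuple


def fit_linear_energy(samples: List[Tuple[int, int, float]]) -> Tuple[float, float]:
    """Least-squares fit of E ≈ alpha*m + beta*n (assumed form, no intercept).

    samples: (m, n, E) triples, e.g. total energies from first-principles runs.
    Solves the normal equations [[Smm, Smn], [Smn, Snn]] @ [alpha, beta] = [SmE, SnE].
    """
    smm = sum(m * m for m, _, _ in samples)
    smn = sum(m * n for m, n, _ in samples)
    snn = sum(n * n for _, n, _ in samples)
    sme = sum(m * e for m, _, e in samples)
    sne = sum(n * e for _, n, e in samples)
    det = smm * snn - smn * smn
    if det == 0:
        raise ValueError("samples do not determine alpha and beta (m and n proportional)")
    alpha = (sme * snn - smn * sne) / det
    beta = (smm * sne - smn * sme) / det
    return alpha, beta
```

Once alpha and beta are fitted, candidate structures can be ranked by their predicted energy per atom without a full first-principles calculation for each one, which is the kind of cheap screening the abstract describes for finding magic structures. The test below uses synthetic energies generated from known coefficients, not values from the paper.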
