Source author record

Huijun Liu

Huijun Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · cond-mat.mes-hall · cond-mat.mtrl-sci · Information Retrieval · physics.chem-ph

Catalog footprint

What is connected

12 works

5 topics

4 close collaborators


Preprint · 2022 · arXiv

A Context-Aware Approach for Textual Adversarial Attack through Probability Difference Guided Beam Search

Textual adversarial attacks expose the vulnerabilities of text classifiers and can be used to improve their robustness. Existing context-aware methods consider only the gold-label probability and use greedy search when looking for an attack path, which often limits attack efficiency. To tackle these issues, we propose PDBS, a context-aware textual adversarial attack model using Probability Difference guided Beam Search. The probability difference takes all class-label probabilities into account, and PDBS uses it to guide the selection of attack paths. In addition, PDBS uses beam search to find a successful attack path, thus avoiding the limitations of a restricted search space. Extensive experiments and human evaluation demonstrate that PDBS outperforms the previous best models across a range of evaluation metrics, notably improving the attack success rate by up to +19.5%. Ablation studies and qualitative analyses further confirm the efficiency of PDBS.
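The abstract does not give the exact formula for the probability difference, so the sketch below assumes one plausible reading: the gold-label probability minus the highest probability among the other classes, which turns negative once the classifier's prediction flips. It shows how such a score can steer a beam search over word substitutions rather than a greedy, single-path search. The classifier, the candidate substitutions and all function names are hypothetical illustrations, not the PDBS implementation.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical interface: a classifier maps tokens to class probabilities.
Classifier = Callable[[Sequence[str]], List[float]]


def probability_difference(probs: Sequence[float], gold: int) -> float:
    """Assumed score: gold-label probability minus the best other-class probability.

    A negative value means the gold label is no longer the top prediction.
    """
    best_other = max(p for i, p in enumerate(probs) if i != gold)
    return probs[gold] - best_other


def beam_search_attack(
    tokens: List[str],
    gold: int,
    classify: Classifier,
    candidates: Dict[int, List[str]],  # position -> substitute words (assumed given)
    beam_width: int = 2,
    max_steps: int = 3,
) -> Tuple[List[str], bool]:
    start = (probability_difference(classify(tokens), gold), tokens, frozenset())
    beam: List[Tuple[float, List[str], FrozenSet[int]]] = [start]
    for _ in range(max_steps):
        expanded = []
        for _score, toks, used in beam:
            for pos, subs in candidates.items():
                if pos in used:  # substitute each position at most once
                    continue
                for word in subs:
                    new = list(toks)
                    new[pos] = word
                    score = probability_difference(classify(new), gold)
                    expanded.append((score, new, used | {pos}))
        if not expanded:
            break
        # Keep the beam_width paths that push the gold label down the most.
        expanded.sort(key=lambda hyp: hyp[0])
        beam = expanded[:beam_width]
        if beam[0][0] < 0:  # prediction flipped: attack succeeded
            return beam[0][1], True
    return beam[0][1], False


def toy_classifier(tokens: Sequence[str]) -> List[float]:
    # Hypothetical 2-class sentiment model; class 1 = positive.
    pos = sum(t in {"great", "good"} for t in tokens)
    neg = sum(t in {"bad", "dull"} for t in tokens)
    p_pos = (1 + pos) / (2 + pos + neg)
    return [1 - p_pos, p_pos]


adv, ok = beam_search_attack(
    ["a", "great", "good", "film"], 1, toy_classifier,
    {1: ["fine", "dull"], 2: ["okay", "bad"]},
)
print(adv, ok)  # ['a', 'dull', 'bad', 'film'] True
```

Keeping two hypotheses per step lets the search recover when the locally best single substitution does not lead to the strongest attack, which is the limitation of greedy search the abstract points to.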

Preprint · 2022 · arXiv

A Two-Phase Paradigm for Joint Entity-Relation Extraction

An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training; these samples are essential but result in grossly imbalanced data distributions, which in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
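The abstract describes the two phases only at a high level. One way to read it, assumed here, is that phase one makes a binary entity-or-not decision for every candidate span and phase two assigns types only to the spans that pass, so the typing step no longer sees the flood of negative spans. The scorers, the threshold and the token-gap definition of entity distance are hypothetical placeholders rather than the paper's model.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def two_phase_entities(
    spans: List[Span],
    entity_score: Callable[[Span], float],  # hypothetical phase-1 scorer
    entity_type: Callable[[Span], str],     # hypothetical phase-2 typer
    threshold: float = 0.5,
) -> List[Tuple[Span, str]]:
    # Phase 1: separate entities from negative spans (binary decision).
    kept = [s for s in spans if entity_score(s) >= threshold]
    # Phase 2: predict a type only for spans that survived phase 1.
    return [(s, entity_type(s)) for s in kept]


def entity_distance(a: Span, b: Span) -> int:
    # Assumed global feature: tokens separating two spans (0 if they touch or overlap).
    if a[1] <= b[0]:
        return b[0] - a[1]
    if b[1] <= a[0]:
        return a[0] - b[1]
    return 0


spans = [(0, 1), (1, 3), (3, 4)]
scores = {(0, 1): 0.9, (1, 3): 0.2, (3, 4): 0.8}
types = {(0, 1): "PER", (3, 4): "LOC"}
entities = two_phase_entities(spans, lambda s: scores[s], lambda s: types[s])
print(entities, entity_distance((0, 1), (3, 4)))
```

The same split could be applied to candidate relations between kept entities; that part is left out to keep the sketch short.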

Preprint · 2022 · arXiv

Boosting Span-based Joint Entity and Relation Extraction via Sequence Tagging Mechanism

Span-based joint extraction simultaneously conducts named entity recognition (NER) and relation extraction (RE) in text span form. Recent studies have shown that token labels can convey crucial task-specific information and enrich token semantics. However, as far as we know, because they completely abstain from the sequence tagging mechanism, all prior span-based works fail to use token label information. To solve this problem, we propose the Sequence Tagging enhanced Span-based Network (STSN), a span-based joint extraction network that is enhanced by token BIO label information derived from sequence tagging-based NER. By stacking multiple attention layers in depth, we design a deep neural architecture to build STSN, and each attention layer consists of three basic attention units. The deep neural architecture first learns semantic representations for token labels and span-based joint extraction, and then constructs information interactions between them, which also realizes bidirectional information interactions between span-based NER and RE. Furthermore, we extend the BIO tagging scheme to enable STSN to extract overlapping entities. Experiments on three benchmark datasets show that our model consistently outperforms previous best models by a large margin, creating new state-of-the-art results.
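STSN draws on token BIO labels produced by sequence tagging-based NER. As a reference point, the sketch below shows the standard way BIO tags map to entity spans. It uses the plain BIO scheme, which cannot represent overlapping entities, so it does not reproduce the extended scheme the paper introduces; the function name and the lenient handling of an orphan I- tag are choices made for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode plain BIO tags into (start, end_exclusive, type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        # A B- tag, an O tag, or an I- tag of a different type closes the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient decoding: an orphan I- tag starts a new span.
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-ORG"]))
# [(0, 2, 'PER'), (3, 5, 'LOC'), (5, 6, 'ORG')]
```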

Preprint · 2022 · arXiv

Few-shot Named Entity Recognition with Entity-level Prototypical Network Enhanced by Dispersedly Distributed Prototypes

Few-shot named entity recognition (NER) enables us to build an NER system for a new domain using very few labeled examples. However, existing prototypical networks for this task suffer from roughly estimated label dependency and closely distributed prototypes, thus often causing misclassifications. To address these issues, we propose EP-Net, an Entity-level Prototypical Network enhanced by dispersedly distributed prototypes. EP-Net builds entity-level prototypes and considers text spans to be candidate entities, so it no longer requires label dependency. In addition, EP-Net trains the prototypes from scratch to distribute them dispersedly and aligns spans to prototypes in the embedding space using a space projection. Experimental results on two evaluation tasks and the Few-NERD settings demonstrate that EP-Net consistently outperforms previous strong models in terms of overall performance. Extensive analyses further validate the effectiveness of EP-Net.
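The core decision in an entity-level prototypical network is to project a candidate span's embedding into the prototype space and give it the label of the nearest prototype. The sketch below shows only that step in plain Python. The linear projection, the Euclidean distance, the "NONE" prototype for non-entity spans and the toy vectors are all assumptions; EP-Net learns its prototypes and projection during training, whereas here they are fixed by hand.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(vec: Vector, weights: List[List[float]]) -> List[float]:
    # Hypothetical linear space projection: one output value per weight row.
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def classify_span(span_vec: Vector, prototypes: Dict[str, List[float]],
                  weights: List[List[float]]) -> str:
    z = project(span_vec, weights)
    # The nearest prototype determines the label.
    return min(prototypes, key=lambda label: euclidean(z, prototypes[label]))


# Toy values: prototypes are spread apart, and the projection is the identity.
prototypes = {"PER": [1.0, 0.0], "LOC": [0.0, 1.0], "NONE": [0.0, 0.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(classify_span([0.9, 0.2], prototypes, identity))  # PER
print(classify_span([0.1, 0.1], prototypes, identity))  # NONE
```

Spreading prototypes apart, as the abstract describes, widens the margins between these nearest-prototype regions, so a small error in a span embedding is less likely to cross into another label.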

Preprint · 2022 · arXiv

Topic-Grained Text Representation-based Model for Document Retrieval

Document retrieval enables users to find the documents they need accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that TGTR is consistently competitive with word-grained baselines on TREC CAR and MS MARCO in terms of retrieval accuracy, while requiring less than 1/10 of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
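The storage argument comes down to how many vectors are kept per document: a word-grained index stores roughly one vector per token, a topic-grained index one per topic, and a global-grained index one per document. The arithmetic below uses made-up sizes (one million documents, 200 tokens or 16 topics per document, 768-dimensional float32 vectors). They are not TGTR's reported configuration, only an illustration of how such a ratio arises.

```python
def index_bytes(num_docs: int, vectors_per_doc: int, dim: int, bytes_per_value: int = 4) -> int:
    # Total offline storage for dense vectors (float32 = 4 bytes per value).
    return num_docs * vectors_per_doc * dim * bytes_per_value


DOCS, DIM = 1_000_000, 768               # hypothetical corpus and embedding size
word = index_bytes(DOCS, 200, DIM)       # word-grained: 200 token vectors per doc
topic = index_bytes(DOCS, 16, DIM)       # topic-grained: 16 topic vectors per doc
doc = index_bytes(DOCS, 1, DIM)          # global-grained: one vector per doc

print(word / 1e9, topic / 1e9, doc / 1e9)  # sizes in GB
print(topic / word)                        # topic-grained size as a fraction of word-grained
```

With these assumed sizes the topic-grained index needs 16/200 of the word-grained storage, while still keeping more than one vector per document, which is the middle ground between word-grained and global-grained indexes.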

Preprint · 2022 · arXiv

Win-Win Cooperation: Bundling Sequence and Span Models for Named Entity Recognition

For Named Entity Recognition (NER), sequence labeling-based and span-based paradigms are quite different. Previous research has demonstrated that the two paradigms have clear complementary advantages, but to our knowledge few models have attempted to leverage these advantages in a single NER model. In our previous work, we proposed a paradigm known as Bundling Learning (BL) to address this problem. The BL paradigm bundles the two NER paradigms, enabling NER models to jointly tune their parameters via a weighted sum of each paradigm's training loss. However, three critical issues remain unresolved: When does BL work? Why does BL work? Can BL enhance existing state-of-the-art (SOTA) NER models? To address the first two issues, we implement three NER models: a sequence labeling-based model (SeqNER), a span-based model (SpanNER), and BL-NER, which bundles SeqNER and SpanNER together. Based on experimental results on eleven NER datasets from five domains, we draw two conclusions regarding these issues. To investigate the third issue, we then apply BL to five existing SOTA NER models: three sequence labeling-based models and two span-based models. Experimental results indicate that BL consistently enhances their performance, suggesting that a new SOTA NER system can be constructed by incorporating BL into the current SOTA system. Moreover, we find that BL reduces both entity boundary and type prediction errors. In addition, we compare two commonly used tagging methods as well as three types of span semantic representations.
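Written out, the bundling step the abstract describes is a weighted sum of the two training losses that is minimized jointly. The weights $\lambda_{\text{seq}}$ and $\lambda_{\text{span}}$ are notation introduced here; the abstract does not state their values or whether they are constrained to sum to one.

```latex
\mathcal{L}_{\text{BL}} = \lambda_{\text{seq}}\,\mathcal{L}_{\text{seq}} + \lambda_{\text{span}}\,\mathcal{L}_{\text{span}}
```

Here $\mathcal{L}_{\text{seq}}$ is the training loss of the sequence labeling-based model and $\mathcal{L}_{\text{span}}$ that of the span-based model; minimizing $\mathcal{L}_{\text{BL}}$ tunes the parameters of both paradigms together.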

Preprint · 2020 · arXiv

Artificial Intelligence for High-Throughput Discovery of Topological Insulators: the Example of Alloyed Tetradymites

Significant advances have been made in predicting new topological materials using high-throughput empirical descriptors or symmetry-based indicators. To date, these approaches have been applied to materials in existing databases and are severely limited to systems with well-defined symmetries, leaving a much larger materials space unexplored. Using tetradymites as a prototypical class of examples, we uncover a novel two-dimensional descriptor by applying an artificial intelligence (AI)-based approach for fast and reliable identification of the topological characters of a drastically expanded range of materials, without prior determination of their specific symmetries and detailed band structures. By leveraging this descriptor, which contains only the atomic number and electronegativity of the constituent species, we have readily scanned a huge number of alloys in the tetradymite family. Strikingly, nearly half of them are identified as topological insulators, revealing a much larger territory of the topological materials world. The present work also attests to the increasingly important role of such AI-based approaches in modern materials discovery.

Preprint · 2020 · arXiv

Screening potential topological insulators in half-Heusler compounds via compressed-sensing

Ternary half-Heusler compounds, with their widely tunable electronic structures, present a new platform for discovering topological insulators. However, because the required computations and synthesis procedures are time-consuming, identifying new topological insulators is a demanding task. Here, we adopt a compressed-sensing approach to rapidly screen potential topological insulators in the half-Heusler family, realized via a two-dimensional descriptor that depends only on the fundamental properties of the constituent atoms. Beyond the finite training data, the proposed descriptor is employed to screen many new half-Heusler compounds, including those with integer and fractional stoichiometry, and a large number of possible topological insulators are predicted.

Preprint · 2019 · arXiv

High thermoelectric performance of two-dimensional (PbTe)2 layer

The electronic, phonon and thermoelectric transport properties of the (PbTe)2 layer are systematically investigated using the first-principles pseudopotential method and the Boltzmann transport equation. Our calculations demonstrate that the top valence band has a valley degeneracy of six, which leads to a larger carrier concentration and thus higher electrical conductivity without an obvious reduction in the Seebeck coefficient. Moreover, the intrinsic van der Waals interactions between neighboring Pb layers induce additional phonon scattering and thus an ultrasmall lattice thermal conductivity. As a consequence, a maximum p-type ZT value of 2.9 can be achieved at 1000 K. In addition, we find almost identical n- and p-type ZT values in the temperature range from 300 K to 800 K.
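The abstract's reasoning follows the standard definition of the thermoelectric figure of merit, ZT = S^2 σ T / (κ_e + κ_l): a higher electrical conductivity σ at an unchanged Seebeck coefficient S raises the numerator, and a smaller lattice thermal conductivity κ_l shrinks the denominator. The sketch below simply evaluates that formula; the input values are illustrative round numbers, not results for the (PbTe)2 layer.

```python
def figure_of_merit(seebeck: float, sigma: float, temperature: float,
                    kappa_e: float, kappa_l: float) -> float:
    """ZT = S^2 * sigma * T / (kappa_e + kappa_l), all in SI units.

    seebeck in V/K, sigma in S/m, temperature in K, kappas in W/(m K).
    """
    power_factor = seebeck ** 2 * sigma  # W/(m K^2)
    return power_factor * temperature / (kappa_e + kappa_l)


# Illustrative values only: S = 200 uV/K, sigma = 1e5 S/m, T = 300 K.
base = figure_of_merit(200e-6, 1e5, 300.0, kappa_e=0.5, kappa_l=0.5)
lower_kl = figure_of_merit(200e-6, 1e5, 300.0, kappa_e=0.5, kappa_l=0.25)
print(round(base, 3), round(lower_kl, 3))  # 1.2 1.6
```

Halving only the lattice term raises ZT here from 1.2 to 1.6, which is the kind of gain the abstract attributes to the additional phonon scattering.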

Preprint · 2016 · arXiv

Maximizing the thermoelectric performance of topological insulator Bi2Te3 films in the few-quintuple layer regime

Using first-principles calculations and Boltzmann theory, we explore the feasibility of maximizing the thermoelectric figure of merit (ZT) of topological insulator Bi2Te3 films in the few-quintuple layer regime. We discover that the delicate competition between surface and bulk contributions, coupled with the overall quantum size effects, leads to a novel and generic non-monotonic dependence of ZT on film thickness. In particular, when the system crosses into the topologically non-trivial regime upon increasing the film thickness, the much longer surface relaxation time associated with the robust nature of the topological surface states results in a maximal ZT value, which can be further optimized to ~2.0 under physically realistic conditions. We also reveal the appealing potential of bridging the long-standing ZT asymmetry of p- and n-type Bi2Te3 systems.

Preprint · 2016 · arXiv

Phonon-limited electrical transport properties of intermetallic compound YbAl3 from first-principles calculations

We combine first-principles calculations and Boltzmann transport theory to study the electrical transport properties of the intermetallic compound YbAl3. To accurately predict the electronic relaxation time, we use density functional perturbation theory and Wannier interpolation techniques, which can effectively treat electron-phonon scattering. Our calculated transport coefficients of YbAl3 are in reasonable agreement with experimentally measured results. Strikingly, we discover that in evaluating the Seebeck coefficient of YbAl3, the scattering term contributes more than the band term and should be explicitly considered in the calculations, especially when localized bands lie near the Fermi level. Moreover, we demonstrate that by reducing the sample size to less than ~30 nm, the electronic thermal conductivity of YbAl3 can be sufficiently suppressed that the thermoelectric figure of merit is further enhanced.

Preprint · 2012 · arXiv

Determination of the forms of calcium present in coal chars by Ca K-edge XANES with Synchrotron Radiation

This work is concerned with Ca transformations during the pyrolysis of Ca(OH)2- or CaCO3-added coals. Ca K-edge X-ray absorption near edge structure (XANES) spectroscopy was applied to determine the forms of Ca in chars prepared from the pyrolysis of Ca-added coal. Results showed that Ca(OH)2 and CaSO4 existed in both the Ca(OH)2-added chars and the CaCO3-added chars, while CaS and CaO existed only in the chars prepared from the Ca(OH)2-added coal. Moreover, carboxyl Ca was found to form during the pyrolysis of both the Ca(OH)2-added and the CaCO3-added coals.
