Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
14 topics
4 close collaborators

Published work

14 published items

preprint, 2024, arXiv

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models

Evaluating the quality and variability of text generated by Large Language Models (LLMs) poses a significant, yet unresolved research challenge. Traditional evaluation methods, such as ROUGE and BERTScore, which measure token similarity, often fail to capture the holistic semantic equivalence. This results in a low correlation with human judgments and intuition, which is especially problematic in high-stakes applications like healthcare and finance where reliability, safety, and robust decision-making are highly critical. This work proposes DCR, an automated framework for evaluating and improving the consistency of LLM-generated texts using a divide-conquer-reasoning approach. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison between two generated responses into individual sentence-to-paragraph comparisons, each evaluated based on predefined criteria. To facilitate this approach, we introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score. Beyond the consistency evaluation, we further present a reason-assisted improver (RAI) that leverages the analytical reasons with explanations identified by DCE to generate new responses aimed at reducing these inconsistencies. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +19.3% and +24.3% on the SummEval dataset) in evaluating the consistency of LLM generation across multiple benchmarks in semantic, factual, and summarization consistency tasks. Our approach also substantially reduces nearly 90% of output inconsistencies, showing promise for effective hallucination mitigation.

preprint, 2023, arXiv

Recoverability and estimation of causal effects under typical multivariable missingness mechanisms

In the context of missing data, the identifiability or "recoverability" of the average causal effect (ACE) depends on causal and missingness assumptions. The latter can be depicted by adding variable-specific missingness indicators to causal diagrams, creating "missingness-directed acyclic graphs" (m-DAGs). Previous research described ten canonical m-DAGs, representing typical multivariable missingness mechanisms in epidemiological studies, and determined the recoverability of the ACE in the absence of effect modification. We extend the research by determining the recoverability of the ACE in settings with effect modification and conducting a simulation study evaluating the performance of widely used missing data methods when estimating the ACE using correctly specified g-computation, which has not been previously studied. Methods assessed were complete case analysis (CCA) and various multiple imputation (MI) implementations regarding the degree of compatibility with the outcome model used in g-computation. Simulations were based on an example from the Victorian Adolescent Health Cohort Study (VAHCS), where interest was in estimating the ACE of adolescent cannabis use on mental health in young adulthood. In the canonical m-DAGs that excluded unmeasured common causes of missingness indicators, we found the ACE to be recoverable if no incomplete variable causes its own missingness, and non-recoverable otherwise. In addition, the simulation showed that compatible MI approaches may enable approximately unbiased ACE estimation, unless the outcome causes its missingness or it causes the missingness of a variable that causes its missingness. Researchers must consider sensitivity analysis methods incorporating external information in the latter setting. The VAHCS case study illustrates the practical implications of these findings.

preprint, 2022, arXiv

Atomic structure generation from reconstructing structural fingerprints

Data-driven machine learning methods have the potential to dramatically accelerate the rate of materials design over conventional human-guided approaches. These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties to then be synthesized or isolated in the laboratory. For crystal structure generation, a key bottleneck lies in developing suitable atomic structure fingerprints or representations for the machine learning model, analogous to the graph-based or SMILES representations used in molecular generation. However, finding data-efficient representations that are invariant to translations, rotations, and permutations, while remaining invertible to the Cartesian atomic coordinates remains an ongoing challenge. Here, we propose an alternative approach to this problem by taking existing non-invertible representations with the desired invariances and developing an algorithm to reconstruct the atomic coordinates through gradient-based optimization using automatic differentiation. This can then be coupled to a generative machine learning model which generates new materials within the representation space, rather than in the data-inefficient Cartesian space. In this work, we implement this end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model. We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept. Furthermore, this method can be readily extended to any suitable structural representation, thereby providing a powerful, generalizable framework towards structure-based generation.

preprint, 2022, arXiv

Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage

The Federated Learning (FL) framework brings privacy benefits to distributed learning systems by allowing multiple clients to participate in a learning task under the coordination of a central server without exchanging their private data. However, recent studies have revealed that private information can still be leaked through shared gradient information. To further protect users' privacy, several defense mechanisms have been proposed to prevent privacy leakage via gradient information degradation methods, such as using additive noise or gradient compression before sharing it with the server. In this work, we validate that the private training data can still be leaked under certain defense settings with a new type of leakage, i.e., Generative Gradient Leakage (GGL). Unlike existing methods that only rely on gradient information to reconstruct data, our method leverages the latent space of generative adversarial networks (GAN) learned from public image datasets as a prior to compensate for the informational loss during gradient degradation. To address the nonlinearity caused by the gradient operator and the GAN model, we explore various gradient-free optimization methods (e.g., evolution strategies and Bayesian optimization) and empirically show their superiority in reconstructing high-quality images from gradients compared to gradient-based optimizers. We hope the proposed method can serve as a tool for empirically measuring the amount of privacy leakage to facilitate the design of more robust defense mechanisms.

preprint, 2022, arXiv

Data Driven Modeling of Interfacial Traction Separation Relations using a Thermodynamically Consistent Neural Network

For multilayer structures, interfacial failure is one of the most important elements related to device reliability. For cohesive zone modelling, traction-separation relations represent the adhesive interactions across interfaces. However, existing theoretical models do not currently capture traction-separation relations that have been extracted using direct methods, particularly under mixed-mode conditions. Given the complexity of the problem, models derived from the neural network approach are attractive. Although they can be trained to fit data along the loading paths taken in a particular set of mixed-mode fracture experiments, they may fail to obey physical laws for paths not covered by the training data sets. In this paper, a thermodynamically consistent neural network (TCNN) approach is established to model the constitutive behavior of interfaces when faced with sparse training data sets. Accordingly, three conditions are examined and implemented here: (i) thermodynamic consistency, (ii) maximum energy dissipation path control and (iii) J-integral conservation. These conditions are treated as constraints and are implemented as such in the loss function. The feasibility of this approach is demonstrated by comparing the modeling results with a range of physical constraints. Moreover, a Bayesian optimization algorithm is then adopted to optimize the weight factors associated with each of the constraints in order to overcome convergence issues that can arise when multiple constraints are present. The resultant numerical implementation of the ideas presented here produced well-behaved, mixed-mode traction separation surfaces that maintained the fidelity of the experimental data that was provided as input. The proposed approach heralds a new autonomous, point-to-point constitutive modeling concept for interface mechanics.

preprint, 2022, arXiv

Machine Learning for High-entropy Alloys: Progress, Challenges and Opportunities

High-entropy alloys (HEAs) have attracted extensive interest due to their exceptional mechanical properties and the vast compositional space for new HEAs. However, understanding their novel physical mechanisms and then using these mechanisms to design new HEAs are confronted with their high-dimensional chemical complexity, which presents unique challenges to (i) the theoretical modeling that needs accurate atomic interactions for atomistic simulations and (ii) constructing reliable macro-scale models for high-throughput screening of vast amounts of candidate alloys. Machine learning (ML) sheds light on these problems with its capability to represent extremely complex relations. This review highlights the success and promising future of utilizing ML to overcome these challenges. We first introduce the basics of ML algorithms and application scenarios. We then summarize the state-of-the-art ML models describing atomic interactions and atomistic simulations of thermodynamic and mechanical properties. Special attention is paid to phase predictions, planar-defect calculations, and plastic deformation simulations. Next, we review ML models for macro-scale properties, such as lattice structures, phase formations, and mechanical properties. Examples of machine-learned phase-formation rules and order parameters are used to illustrate the workflow. Finally, we discuss the remaining challenges and present an outlook of research directions, including uncertainty quantification and ML-guided inverse materials design.

preprint, 2022, arXiv

SPTS: Single-Point Text Spotting

Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance. We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective, and can achieve state-of-the-art results on widely used benchmarks. Most significantly, we show that the performance is not very sensitive to the positions of the point annotation, meaning that points can be annotated much more easily, or even generated automatically, compared with bounding boxes that require precise positions. We believe that such a pioneering attempt indicates a significant opportunity for scene text spotting applications of a much larger scale than previously possible. The code is available at https://github.com/shannanyinxiang/SPTS.

preprint, 2021, arXiv

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking and intelligent education. Most existing works decoupled this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignored the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) towards real-world scenarios, which is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction by taking a single document image as input and outputting the structured information. Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and conversely, provides higher-level semantic clues to contribute to the optimization of text spotting. Moreover, regarding the shortage of public benchmarks, we construct a fully-annotated dataset called EPHOIE (https://github.com/HCIILAB/EPHOIE), which is the first Chinese benchmark for both text spotting and visual information extraction. EPHOIE consists of 1,494 images of examination paper heads with complex layouts and backgrounds, including a total of 15,771 Chinese handwritten or printed text instances. Compared with the state-of-the-art methods, our VIES shows significantly superior performance on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used SROIE dataset under the end-to-end scenario.

preprint, 2020, arXiv

A Novel Evolution Strategy with Directional Gaussian Smoothing for Blackbox Optimization

We propose an improved evolution strategy (ES) using a novel nonlocal gradient operator for high-dimensional black-box optimization. Standard ES methods with $d$-dimensional Gaussian smoothing suffer from the curse of dimensionality due to the high variance of Monte Carlo (MC) based gradient estimators. To control the variance, Gaussian smoothing is usually limited to a small region, so existing ES methods lack the nonlocal exploration ability required for escaping from local minima. We develop a nonlocal gradient operator with directional Gaussian smoothing (DGS) to address this challenge. The DGS conducts 1D nonlocal explorations along $d$ orthogonal directions in $\mathbb{R}^d$, each of which defines a nonlocal directional derivative as a 1D integral. We then use Gauss-Hermite quadrature, instead of MC sampling, to estimate the $d$ 1D integrals to ensure high accuracy (i.e., small variance). Our method enables effective nonlocal exploration to facilitate the global search in high-dimensional optimization. We demonstrate the superior performance of our method in three sets of examples, including benchmark functions for global optimization, and real-world science and engineering applications.
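
The idea of estimating each 1D directional derivative with Gauss-Hermite quadrature can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dgs_gradient`, its parameters, and the default choice of the coordinate axes as the orthogonal directions are assumptions made for illustration. For a direction $\xi$, the sketch uses the standard identity that the derivative of the Gaussian-smoothed 1D function at zero equals $\frac{1}{\sigma}\mathbb{E}_{v \sim N(0,1)}[f(x + \sigma v \xi)\, v]$ and evaluates that expectation with quadrature nodes from NumPy.

```python
import numpy as np


def dgs_gradient(f, x, sigma=1.0, num_quad=5, directions=None):
    """Sketch of a directional-Gaussian-smoothing gradient estimate.

    f          : callable mapping a 1D array to a float (black-box objective)
    x          : point at which the smoothed gradient is estimated
    sigma      : smoothing radius (assumed identical for every direction)
    num_quad   : number of Gauss-Hermite nodes per direction
    directions : rows form an orthonormal basis; defaults to coordinate axes
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    if directions is None:
        directions = np.eye(d)

    # Nodes t and weights w for integrals of the form  int exp(-t^2) h(t) dt.
    t, w = np.polynomial.hermite.hermgauss(num_quad)

    grad = np.zeros(d)
    for xi in directions:
        # Evaluate the objective along the line x + sqrt(2) * sigma * t * xi.
        values = np.array([f(x + np.sqrt(2.0) * sigma * tm * xi) for tm in t])
        # Quadrature estimate of (1 / sigma) * E[f(x + sigma * v * xi) * v].
        deriv = np.sum(w * np.sqrt(2.0) * t * values) / (np.sqrt(np.pi) * sigma)
        grad += deriv * xi
    return grad


# Usage: for a quadratic objective the smoothed gradient equals the true one.
x0 = np.array([1.0, -2.0, 0.5])
g0 = dgs_gradient(lambda z: float(np.sum(z ** 2)), x0, sigma=0.5)
```

Each direction costs `num_quad` function evaluations, so one gradient estimate costs `d * num_quad` evaluations and involves no random sampling, which is the contrast with MC estimators that the abstract draws.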

preprint, 2020, arXiv

Antiferromagnetic quantum spin Hall states in iron halogenide

It is widely known that a quantum spin Hall (QSH) insulator can be viewed as two copies of a quantum anomalous Hall (QAH) insulator with opposite local magnetic moments. However, nearly every QSH insulator discovered so far is a nonmagnetic semiconductor. Due to the vanishing local magnetic moment of each copy, the QAH states only conceptually exist in these QSH insulators. In this work, we show a realistic construction of QSH states with finite local magnetic moment by stacking bilayer QAH insulators. Our explicit construction benefits from an effective QAH model with a large topological gap and is further supported by a class of two-dimensional ferromagnetic materials. Our work not only validates the conceptual relationship of QSH and QAH but also provides an ideal material platform for realizing the antiferromagnetic QSH state, which is highly tunable between QAH and QSH states as a function of the number of layers.

preprint, 2020, arXiv

Fast and stable deep-learning predictions of material properties for solid solution alloys

We present a novel deep learning (DL) approach to produce highly accurate predictions of macroscopic physical properties of solid solution binary alloys and magnetic systems. The major idea is to make use of the correlations between different physical properties in alloy systems to improve the prediction accuracy of neural network (NN) models. We use multitasking NN models to simultaneously predict the total energy, charge density and magnetic moment. These physical properties mutually serve as constraints during the training of the multitasking NN, resulting in more reliable DL models because multiple physical properties are correctly learned by a single model. Two binary alloys, copper-gold (CuAu) and iron-platinum (FePt), were studied. Our results show that once the multitasking NNs are trained, they can estimate the material properties for a specific configuration hundreds of times faster than first-principles density functional theory calculations while retaining comparable accuracy. We used a simple measure based on the root-mean-squared errors (RMSE) to quantify the quality of the NN models, and found that the inclusion of charge density and magnetic moment as physical constraints leads to more stable models that exhibit improved accuracy and reduced uncertainty for the energy predictions.

preprint, 2020, arXiv

On the quantification and efficient propagation of imprecise probabilities with copula dependence

This paper addresses the problem of quantification and propagation of uncertainties associated with dependence modeling when data for characterizing probability models are limited. Practically, the system inputs are often assumed to be mutually independent or correlated by a multivariate Gaussian distribution. However, this subjective assumption may introduce bias in the response estimate if the real dependence structure deviates from this assumption. In this work, we overcome this limitation by introducing a flexible copula dependence model to capture complex dependencies. A hierarchical Bayesian multimodel approach is proposed to quantify uncertainty in dependence model-form and model parameters that result from small data sets. This approach begins by identifying, through Bayesian multimodel inference, a set of candidate marginal models and their corresponding model probabilities, and then estimating the uncertainty in the copula-based dependence structure, which is conditional on the marginals and their parameters. The overall uncertainties integrating marginals and copulas are probabilistically represented by an ensemble of multivariate candidate densities. A novel importance sampling reweighting approach is proposed to efficiently propagate the overall uncertainties through a computational model. Through an example studying the influence of constituent properties on the out-of-plane properties of transversely isotropic E-glass fiber composites, we show that the composite property with a copula-based dependence model converges to the true estimate as data set size increases, while an independence or arbitrary Gaussian correlation assumption leads to a biased estimate.
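
The general principle behind importance sampling reweighting can be sketched as follows. This is generic self-normalized importance sampling, not the specific algorithm proposed in the paper: the helper names `normal_pdf` and `reweighted_means`, the 1D Gaussian candidates, and the stand-in model `x**2` are all assumptions chosen for illustration. The point is that the computational model is evaluated once on samples from a single proposal density, and those evaluations are then reweighted for every candidate density in the ensemble.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution (hypothetical candidate family)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def reweighted_means(samples, outputs, proposal_pdf, candidate_pdfs):
    """Estimate the mean model output under each candidate density.

    samples        : draws from the proposal density
    outputs        : model evaluations at those draws (computed only once)
    proposal_pdf   : callable returning the proposal density at the samples
    candidate_pdfs : list of callables, one per candidate density
    """
    q = proposal_pdf(samples)
    estimates = []
    for pdf in candidate_pdfs:
        weights = pdf(samples) / q          # importance ratios p_k / q
        weights = weights / weights.sum()   # self-normalize the weights
        estimates.append(float(np.sum(weights * outputs)))
    return estimates


# Usage: one batch of model runs serves two candidate input densities.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 2.0, size=20000)   # wide proposal N(0, 2^2)
outputs = samples ** 2                       # stand-in for an expensive model
candidates = [
    lambda x: normal_pdf(x, 0.0, 1.0),       # true mean of x^2 is 1
    lambda x: normal_pdf(x, 1.0, 1.0),       # true mean of x^2 is 2
]
estimates = reweighted_means(
    samples, outputs, lambda x: normal_pdf(x, 0.0, 2.0), candidates
)
```

The proposal is deliberately wider than either candidate so that the importance ratios stay bounded; a proposal narrower than a candidate would make the weights, and hence the estimates, unstable.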

preprint, 2020, arXiv

PBGen: Partial Binarization of Deconvolution-Based Generators for Edge Intelligence

This work explores the binarization of the deconvolution-based generator in a GAN for memory saving and speedup of image construction. Our study suggests that, unlike in convolutional neural networks (including the discriminator) where all layers can be binarized, only some of the layers in the generator can be binarized without significant performance loss. Supported by theoretical analysis and verified by experiments, a direct metric based on the dimension of deconvolution operations is established, which can be used to quickly decide which layers in the generator can be binarized. Our results also indicate that both the generator and the discriminator should be binarized simultaneously for balanced competition and better performance. Experimental results based on CelebA suggest that directly applying state-of-the-art binarization techniques to all the layers of the generator will lead to 2.83$\times$ performance loss measured by sliced Wasserstein distance compared with the original generator, while applying them to selected layers only can yield up to 25.81$\times$ saving in memory consumption, and 1.96$\times$ and 1.32$\times$ speedup in inference and training respectively with little performance loss.

preprint, 2019, arXiv

Machine Learning the Effective Hamiltonian in High Entropy Alloys

The development of machine learning sheds new light on the problem of statistical thermodynamics in multicomponent alloys. However, a data-driven approach to construct the effective Hamiltonian requires sufficiently large data sets, which are expensive to calculate with conventional density functional theory (DFT). To solve this problem, we propose to use the atomic local energy as the target variable, and harness the power of the linear-scaling DFT to accelerate the data generating process. Using the resulting large DFT data sets, various complex models are devised and applied to learn the effective Hamiltonians of a range of refractory high entropy alloys (HEAs). The testing $R^2$ scores of the effective pair interaction model are higher than 0.99, demonstrating that the pair interactions within the sixth coordination shell provide an excellent description of the atomic local energies for all four HEAs. This model is further improved by including nonlinear and multi-site interactions. In particular, the deep neural networks (DNNs) built directly in the local configuration space (therefore no hand-crafted features) are employed to model the effective Hamiltonian. The results demonstrate that neural networks are promising for modeling the effective Hamiltonian due to their excellent representation power.