Source author record

Hongmin Wang

Hongmin Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, hep-ph, nucl-th, Artificial Intelligence, math.GM, nucl-ex

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators


preprint, 2020, arXiv

Revisiting Challenges in Data-to-Text Generation with Fact Grounding

Data-to-text generation models struggle to ensure data fidelity, that is, to refer to the correct input source. To inspire studies in this area, Wiseman et al. (2017) introduced the RotoWire corpus on generating NBA game summaries from box- and line-score tables. However, few attempts have been made in this direction and the challenges remain. We observe a prominent bottleneck in the corpus: only about 60% of the summary contents can be grounded to the boxscore records. This information deficiency tends to misguide a conditioned language model into producing unconditioned random facts, which leads to factual hallucinations. In this work, we restore the information balance and revamp the task to focus on fact-grounded data-to-text generation. We introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19 and enriched input tables, in the hope of attracting more research in this direction. Moreover, we achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction as an auxiliary task to boost generation quality.

preprint, 2020, arXiv

TabFact: A Large-scale Dataset for Table-based Fact Verification

The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to unstructured evidence (e.g., natural language sentences, documents, and news), while verification under structured evidence, such as tables, graphs, and databases, remains under-explored. This paper studies fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. TabFact is challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and Latent Program Algorithm (LPA). Table-BERT leverages a state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into programs and executes them against the tables to obtain a binary value for verification. Both methods achieve similar accuracy but still lag far behind human performance. We also perform a comprehensive analysis to demonstrate great future opportunities. The data and code are available at https://github.com/wenhuchen/Table-Fact-Checking.
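The two ideas named in the abstract, linearizing a table into text and executing a program against a table, can be illustrated with a toy sketch. This is not the authors' Table-BERT or LPA code: the table contents, the linearization template, and the names `linearize`, `lookup`, and `statement_program` are all hypothetical, and the program is written by hand rather than parsed from the statement.

```python
# Toy table: one dict per row, keyed by column name (hypothetical data).
table = [
    {"team": "Lions", "wins": 10, "losses": 4},
    {"team": "Bears", "wins": 7, "losses": 7},
]


def linearize(rows: list[dict]) -> str:
    """Flatten a table into a single string, row by row.

    A text encoder could consume this string together with the statement.
    The template used here is an assumption, not the one from the paper.
    """
    parts = []
    for index, row in enumerate(rows, start=1):
        cells = " ; ".join(f"{col} is {val}" for col, val in row.items())
        parts.append(f"row {index} : {cells}")
    return " . ".join(parts)


def lookup(rows: list[dict], key_col: str, key: str, target_col: str):
    """Return the value of target_col in the first row where key_col == key."""
    for row in rows:
        if row[key_col] == key:
            return row[target_col]
    raise KeyError(key)


def statement_program(rows: list[dict]) -> bool:
    """Hand-written program for: 'the Lions won more games than the Bears'."""
    return lookup(rows, "team", "Lions", "wins") > lookup(rows, "team", "Bears", "wins")


# Executing the program against the table yields a binary verdict.
label = "ENTAILED" if statement_program(table) else "REFUTED"
print(linearize(table))
print(label)
```

In this sketch the symbolic route gives a verdict that can be traced to specific cells, while the linearized string is only an input format; any learned encoder on top of it is outside the scope of the example.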

preprint, 2015, arXiv

Energy Dependent Growth of Nucleon and Inclusive Charged Hadron Distributions

In the Color Glass Condensate formalism, charged hadron $p_{T}$ distributions in p+p collisions are studied by considering an energy-dependent broadening of the nucleon's density distribution. Then, in the Glasma flux tube picture, the n-particle multiplicity distributions in different pseudo-rapidity ranges are investigated. Both theoretical results show good agreement with recent experimental data from ALICE and CMS at $\sqrt{s} = 0.9$, $2.36$, and $7$ TeV. Predictions for $p_{T}$ and multiplicity distributions in p+p and p+Pb collisions at the Large Hadron Collider are also given.

preprint, 2014, arXiv

Hadron Multiplicities in p+p and p+Pb Collisions

Experiments at the Large Hadron Collider (LHC) have measured multiplicity distributions in p+p and p+Pb collisions in a new domain of collision energy. By considering an energy-dependent broadening of the nucleon's density distribution, charged hadron multiplicities are studied with the phenomenological saturation model and the evolution-equation-dependent saturation model. Assuming that the saturation scale has a small dependence on the 3-dimensional root mean square (rms) radius at different energies, the theoretical results are in good agreement with the experimental data from the CMS and ALICE collaborations. Predictions for p+p collisions at $\sqrt{s} = 14$ TeV at the LHC are also given.

preprint, 2013, arXiv

Influence of the Nucleon Hard Partons Distribution on J/Ψ Suppression in a GMC Framework

In a Glauber Monte Carlo framework, taking into account the transverse spatial distribution of hard partons in the nucleon, we analyse the nuclear modification factor $R_{dAu}$ for $J/ψ$ in d+Au collisions with the EPS09 shadowing parametrization. Once the influence of the nucleon hard parton distribution is considered, a clear upward correction appears in the dependence of $R_{dAu}$ on $N_{coll}$ in peripheral d+Au collisions, whereas only an inconspicuous correction is seen in the results versus $p_{T}$. The theoretical results are in good agreement with the experimental data from PHENIX.
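For orientation, the nuclear modification factor is conventionally defined as the $J/ψ$ yield in d+Au collisions divided by the p+p yield scaled by the mean number of binary nucleon-nucleon collisions. The abstract does not spell out the definition, so the form below is the standard one and the yield notation $\mathrm{d}N/\mathrm{d}y$ is an assumption about which observable is used:

```latex
R_{dAu} \;=\; \frac{\mathrm{d}N^{dAu}_{J/\psi}/\mathrm{d}y}
                   {\langle N_{coll} \rangle \,\mathrm{d}N^{pp}_{J/\psi}/\mathrm{d}y}
```

With this convention, $R_{dAu} = 1$ corresponds to no nuclear modification relative to binary-scaled p+p collisions, and $R_{dAu} < 1$ indicates suppression.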