Source author record

Jingyang Li

Jingyang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Machine Learning, math.OC, math.ST, Statistics Theory, Artificial Intelligence, astro-ph.IM, Information Theory, math.IT, Methodology, Multimedia, physics.pop-ph

Catalog footprint

What is connected

6 works

12 topics

4 close collaborators


preprint · 2022 · arXiv

A Survey on Neural Open Information Extraction: Current Status and Future Directions

Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora. The technique is well suited to many open-world natural language understanding scenarios, such as automatic knowledge base construction, open-domain question answering, and explicit reasoning. Thanks to rapid developments in deep learning, numerous neural OpenIE architectures have been proposed, achieving considerable performance improvements. In this survey, we provide an extensive overview of state-of-the-art neural OpenIE models, their key design decisions, strengths and weaknesses. Then, we discuss the limitations of current solutions and the open issues in the OpenIE problem itself. Finally, we list recent trends that could help expand its scope and applicability, setting up promising directions for future research in OpenIE. To the best of our knowledge, this paper is the first review on this specific topic.

preprint · 2022 · arXiv

Generalized Low-rank plus Sparse Tensor Estimation by Fast Riemannian Optimization

We investigate a generalized framework to estimate a latent low-rank plus sparse tensor, where the low-rank tensor often captures the multi-way principal components and the sparse tensor accounts for potential model mis-specifications or heterogeneous signals that cannot be explained by the low-rank part. The framework is flexible, covering both linear and non-linear models, and can easily handle continuous or categorical variables. We propose a fast algorithm that integrates Riemannian gradient descent with a novel gradient pruning procedure. Under suitable conditions, the algorithm converges linearly and can simultaneously estimate both the low-rank and sparse tensors. The statistical error bounds of the final estimates are established in terms of the gradient of the loss function. The error bounds are generally sharp under specific statistical models, e.g., robust tensor PCA and community detection in hypergraph networks with outlier vertices. Moreover, our method achieves non-trivial error bounds for heavy-tailed tensor PCA whenever the noise has a finite $2+\varepsilon$ moment. We apply our method to analyze the international trade flow dataset and the statistician hypergraph co-authorship network, both yielding new and interesting findings.
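
The abstract does not spell out the gradient pruning step, so the sketch below is only a rough, hypothetical illustration of the low-rank plus sparse idea in its simplest setting: an order-2 tensor (a matrix) with a rank-1 low-rank part. It alternates a rank-1 fit by power iteration with entrywise hard thresholding of the residual, a generic robust-PCA-style heuristic rather than the authors' Riemannian algorithm; the function names, the rank-1 restriction and the threshold `zeta` are assumptions.

```python
import math

Matrix = list[list[float]]


def rank1_approx(m: Matrix, iters: int = 200) -> Matrix:
    """Rank-1 approximation via power iteration (assumes a dominant singular value)."""
    rows, cols = len(m), len(m[0])
    u, v = [0.0] * rows, [1.0] * cols
    for _ in range(iters):
        # u <- M v / ||M v||, then v <- M^T u / ||M^T u||
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / nu for x in u]
        v = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / nv for x in v]
    sigma = sum(u[i] * m[i][j] * v[j] for i in range(rows) for j in range(cols))
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]


def hard_threshold(m: Matrix, zeta: float) -> Matrix:
    """Keep entries with magnitude above zeta and zero the rest (the sparse-part update)."""
    return [[x if abs(x) > zeta else 0.0 for x in row] for row in m]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def low_rank_plus_sparse(y: Matrix, zeta: float, outer_iters: int = 20):
    """Alternate: fit the low-rank part to y - S, then threshold the residual y - L."""
    sparse = [[0.0] * len(y[0]) for _ in y]
    low = sparse
    for _ in range(outer_iters):
        low = rank1_approx(subtract(y, sparse))
        sparse = hard_threshold(subtract(y, low), zeta)
    return low, sparse
```

A single fixed threshold like `zeta` may fail to isolate outliers whose size is comparable to the low-rank entries, which is one reason more careful methods analyze or adapt this step.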

preprint · 2022 · arXiv

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Building document-grounded dialogue systems has received growing interest, as documents convey a wealth of human knowledge and are common in enterprises. In this setting, comprehending and retrieving information from documents is a challenging research problem. Previous work ignores the visual properties of documents and treats them as plain text, leaving the visual modality unused. In this paper, we propose LIE, a Layout-aware document-level Information Extraction dataset, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62k annotations of three extraction tasks from 4,061 pages of product and official documents, making it, to the best of our knowledge, the largest VRD-based information extraction dataset. We also develop benchmark methods that extend the token-based language model to consider layout features, as humans do. Empirical results show that layout is critical for VRD-based extraction, and a system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
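
The abstract says the benchmark methods extend a token-based language model with layout features but does not say how. One widely used convention, from LayoutLM, pairs each token with its bounding box rescaled to an integer 0 to 1000 grid relative to the page size. The sketch below shows only that preprocessing step; whether LIE's benchmarks use this exact scheme is an assumption, and `Token`, `normalize_box` and `layout_features` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


def normalize_box(box, page_width: float, page_height: float, scale: int = 1000):
    """Rescale a bounding box to an integer 0..scale grid, clamping out-of-page values."""
    x0, y0, x1, y1 = box

    def clamp(value: float) -> int:
        return max(0, min(scale, int(value)))

    return (
        clamp(scale * x0 / page_width),
        clamp(scale * y0 / page_height),
        clamp(scale * x1 / page_width),
        clamp(scale * y1 / page_height),
    )


def layout_features(tokens: list[Token], page_width: float, page_height: float):
    """Pair each token's text with its normalized box for a layout-aware encoder."""
    return [(t.text, normalize_box(t.box, page_width, page_height)) for t in tokens]
```

Fixed-grid integer coordinates do not depend on page resolution, which lets a model look them up in learned 2D position embeddings alongside the usual token embeddings.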

preprint · 2022 · arXiv

Provable Tensor-Train Format Tensor Completion by Riemannian Optimization

The tensor train (TT) format offers appealing advantages in handling structured high-order tensors. The past decade has seen wide application of TT-format tensors across diverse disciplines, among which tensor completion has drawn considerable attention. Numerous fast algorithms, including Riemannian gradient descent (RGrad), have been proposed for TT-format tensor completion. However, theoretical guarantees for these algorithms are largely missing or sub-optimal, partly due to the complicated and recursive algebraic operations in TT-format decomposition. Moreover, existing results established for tensors in other formats, such as Tucker and CP, are inapplicable because algorithms for TT-format tensors are substantially different and more involved. In this paper, we provide, to the best of our knowledge, the first theoretical guarantees for the convergence of the RGrad algorithm for TT-format tensor completion, under a nearly optimal sample size condition. The RGrad algorithm converges linearly with a constant contraction rate that is free of the tensor condition number, without requiring re-conditioning. We also propose a novel approach, referred to as the sequential second-order moment method, to attain a warm initialization under a similar sample size requirement. As a byproduct, our result significantly refines the prior analysis of the RGrad algorithm for matrix completion. Lastly, a statistically (near) optimal rate is derived for the RGrad algorithm when the observed entries contain random sub-Gaussian noise. Numerical experiments confirm our theoretical findings and showcase the computational speedup gained by the TT-format decomposition.
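
In the TT format, an order-$d$ tensor is stored as $d$ cores, and each entry is a product of matrix slices, $T[i_1,\dots,i_d] = G_1[i_1] G_2[i_2] \cdots G_d[i_d]$, where $G_k[i_k]$ is an $r_{k-1} \times r_k$ matrix and $r_0 = r_d = 1$. The sketch below evaluates entries this way and computes the squared error over observed entries, the kind of objective a completion method works with; it does not implement RGrad or the paper's initialization, and the nested-list core layout, function names and toy core values are assumptions.

```python
Core = list[list[list[float]]]  # core[i] is an r_{k-1} x r_k matrix for index value i


def tt_entry(cores: list[Core], index: tuple[int, ...]) -> float:
    """Evaluate T[i_1, ..., i_d] = G_1[i_1] G_2[i_2] ... G_d[i_d] (boundary ranks are 1)."""
    row = [1.0]  # running 1 x r_k row vector, starting from r_0 = 1
    for core, i in zip(cores, index):
        mat = core[i]
        row = [sum(row[a] * mat[a][b] for a in range(len(row))) for b in range(len(mat[0]))]
    return row[0]  # r_d = 1, so a single value remains


def observed_loss(cores: list[Core], observed: dict[tuple[int, ...], float]) -> float:
    """Half the squared error over observed entries only, as in a completion objective."""
    return 0.5 * sum((tt_entry(cores, idx) - val) ** 2 for idx, val in observed.items())


# Toy cores with TT ranks (1, 2, 2, 1) and mode sizes (2, 2, 2); values are arbitrary.
g1 = [[[1.0, 0.0]], [[0.0, 1.0]]]                          # slices are 1 x 2
g2 = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]  # slices are 2 x 2
g3 = [[[1.0], [1.0]], [[1.0], [-1.0]]]                     # slices are 2 x 1
cores = [g1, g2, g3]
```

Storing the cores takes $\sum_k n_k r_{k-1} r_k$ numbers instead of the $\prod_k n_k$ needed for the full tensor, which is where the format's savings on high-order tensors come from.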

preprint · 2019 · arXiv

Rule-Guided Compositional Representation Learning on Knowledge Graphs

Representation learning on a knowledge graph (KG) aims to embed the entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only attend to the structured information encoded in triples, which limits their performance because of the structural sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs, but lack explainability in the process of obtaining path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, rules of length 2 are applied to compose paths accurately, while rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. In addition, the confidence level of each rule is considered in optimization to guarantee the validity of applying each rule in representation learning. Extensive experimental results show that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.
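
The abstract describes length-2 Horn rules, $r_1(x,y) \wedge r_2(y,z) \Rightarrow r_3(x,z)$, being used to compose relation paths. The sketch below shows only that symbolic composition step on a toy rule table: it repeatedly folds the leftmost adjacent pair that matches a rule body and multiplies the applied rules' confidences. The rule table format, the greedy order, the product of confidences and the relation names are illustrative assumptions, not RPJE's actual encoding, which also learns embeddings.

```python
Rules = dict[tuple[str, str], tuple[str, float]]  # (body r1, body r2) -> (head r3, confidence)


def compose_path(path: list[str], rules: Rules) -> tuple[list[str], float]:
    """Greedily fold adjacent relation pairs that match a length-2 rule body.

    Each application r1(x, y) AND r2(y, z) => r3(x, z) shortens the path by one
    relation; the returned confidence is the product of the applied rules' confidences.
    """
    path = list(path)
    confidence = 1.0
    changed = True
    while changed and len(path) > 1:
        changed = False
        for k in range(len(path) - 1):
            match = rules.get((path[k], path[k + 1]))
            if match is not None:
                head, rule_conf = match
                path[k:k + 2] = [head]
                confidence *= rule_conf
                changed = True
                break  # rescan from the left after every rewrite
    return path, confidence


# Hypothetical mined rule: born_in(x, y) AND city_of(y, z) => nationality(x, z)
rules: Rules = {("born_in", "city_of"): ("nationality", 0.9)}
```

The greedy left-to-right order is a simplification; when rule bodies overlap, a different order could produce a different composed path.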

preprint · 2015 · arXiv

Debris Engine: A Potential Thruster for Space Debris Removal

We present a design concept for a space engine that can continuously remove orbital debris by using the debris as propellant. A space robotic cleaner captures the target debris and transfers it into the engine. Larger debris is first broken into small pieces by mechanical means. A planetary ball mill then grinds the pieces into micrometer-scale or finer powder. The energy needed for this process comes from nuclear and solar power. The debris powder is charged either by the gamma-ray photoelectric effect or by tangential rubbing against tungsten needles. The charged powder can then be accelerated in a tandem electrostatic particle accelerator. Ejecting the high-temperature, high-pressure charged powder from the engine nozzle produces continuous thrust. This thrust can be used to perform orbital maneuvers and debris rendezvous for the spacecraft and the robotic cleaner. The ejected charged particles will be blown away from circumterrestrial orbit by the solar wind. By consuming space debris, the engine thus provides not only thrust but also a cleaner orbital environment. In the near future, star trek will no longer be just a dream, and human exploration will extend into deep space. The analysis shows that the specific impulse of the debris engine is determined by the accelerating electrostatic potential and the charge-to-mass ratio of the powder.
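
For a particle of charge $q$ and mass $m$ accelerated from rest through a potential difference $U$, energy conservation gives $qU = \tfrac{1}{2} m v^2$, so the exhaust speed is $v = \sqrt{2 (q/m) U}$ and the specific impulse is $I_{sp} = v / g_0$. The sketch below evaluates this ideal, non-relativistic, single-stage estimate; it ignores the tandem accelerator's details, the thermal contribution and all losses, and the example values of $q/m$ and $U$ are illustrative assumptions rather than figures from the paper.

```python
import math

G0 = 9.80665  # standard acceleration of gravity, m/s^2


def exhaust_speed(charge_to_mass: float, potential: float) -> float:
    """Non-relativistic speed (m/s) after falling through `potential` volts: qU = m v^2 / 2."""
    return math.sqrt(2.0 * charge_to_mass * potential)


def specific_impulse(charge_to_mass: float, potential: float) -> float:
    """Ideal specific impulse in seconds, I_sp = v / g0, with no losses."""
    return exhaust_speed(charge_to_mass, potential) / G0


# Assumed illustrative values: q/m = 0.5 C/kg, U = 1 MV -> v = 1000 m/s, I_sp about 102 s
isp = specific_impulse(0.5, 1.0e6)
```

Because $I_{sp} \propto \sqrt{(q/m)\,U}$, quadrupling either the potential or the charge-to-mass ratio doubles the specific impulse, in line with the abstract's conclusion that these two quantities set its magnitude.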
