Researcher profile

Lihan Wang

Lihan Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
11 works
0 followers
11 topics
4 close collaborators

Published work

11 published item(s)

preprint, 2022, arXiv

A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by relational databases. Early text-to-SQL parsing systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, large pre-trained language models have taken the state of the art of the text-to-SQL parsing task to a new level. In this survey, we present a comprehensive review of deep learning approaches for text-to-SQL parsing. First, we introduce the text-to-SQL parsing corpora, which can be categorized as single-turn and multi-turn. Second, we provide a systematic overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present readers with the challenges faced by text-to-SQL parsing and explore some potential future directions in this field.

preprint, 2022, arXiv

Complexity of zigzag sampling algorithm for strongly log-concave distributions

We study the computational complexity of the zigzag sampling algorithm for strongly log-concave distributions. The zigzag process has the advantage of not requiring time discretization for implementation, and each proposed bouncing event requires only one evaluation of a partial derivative of the potential, while its convergence rate is dimension independent. Using these properties, we prove that the zigzag sampling algorithm achieves $\varepsilon$ error in chi-square divergence with a computational cost equivalent to $O\bigl(\kappa^2 d^{\frac{1}{2}}(\log\frac{1}{\varepsilon})^{\frac{3}{2}}\bigr)$ gradient evaluations in the regime $\kappa \ll \frac{d}{\log d}$ under a warm start assumption, where $\kappa$ is the condition number and $d$ is the dimension.

preprint, 2022, arXiv

Generic properties of Steklov eigenfunctions

Let $M^n$ be a smooth compact manifold with smooth boundary. We show that for a generic $C^k$ metric on $\bar{M^n}$ with $k>n-1$, the nonzero Steklov eigenvalues are simple. Moreover, we prove that the non-constant Steklov eigenfunctions have zero as a regular value and are Morse functions on the boundary for such a generic metric. These results generalize the celebrated results on Laplacians by Uhlenbeck to the Steklov setting.

preprint, 2022, arXiv

Linking-Enhanced Pre-Training for Table Semantic Parsing

Recently, pre-training models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to improve the contextual representation ability of the neural network. Large pre-trained language models have also been applied in the area of table semantic parsing. However, existing pre-training approaches have not carefully explored explicit interaction relationships between a question and the corresponding database schema, which is a key ingredient for uncovering their semantic and structural correspondence. Furthermore, question-aware representation learning in the schema grounding context has received less attention in pre-training objectives. To alleviate these issues, this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. We evaluate our pre-trained framework by fine-tuning it on two benchmarks, Spider and SQUALL. The results demonstrate the effectiveness of our pre-training objective and curriculum compared to a variety of baselines.

preprint, 2022, arXiv

Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQL queries. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences, even when surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework sets new state-of-the-art performance on three benchmarks. We empirically verify that our probing procedure can indeed find desired relational structures through qualitative analysis. Our code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI.

preprint, 2022, arXiv

Rigidity of complete manifolds with weighted Poincaré inequality

We consider complete Riemannian manifolds which satisfy a weighted Poincaré inequality and have the Ricci curvature bounded below in terms of the weight function. When the weight function has a non-zero limit at infinity, the structure of this class of manifolds at infinity is studied and a splitting result is obtained. Our result can be viewed as an improvement of Li-Wang's result in \cite{LW3}.

preprint, 2022, arXiv

S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been successfully used in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, injecting Syntax to question-Schema graph encoder for Text-to-SQL parsers, which effectively leverages the syntactic dependency information of questions in text-to-SQL to improve performance. We also employ a decoupling constraint to induce diverse relational edge embeddings, which further improves the network's performance. Experiments on Spider and the robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used, ranking first on the Spider leaderboard.

preprint, 2021, arXiv

Complexity of randomized algorithms for underdamped Langevin dynamics

We establish an information complexity lower bound of randomized algorithms for simulating underdamped Langevin dynamics. More specifically, we prove that the worst $L^2$ strong error is of order $\Omega(\sqrt{d}\, N^{-3/2})$, for solving a family of $d$-dimensional underdamped Langevin dynamics, by any randomized algorithm with only $N$ queries to $\nabla U$, the driving Brownian motion and its weighted integration, respectively. The lower bound we establish matches the upper bound for the randomized midpoint method recently proposed by Shen and Lee [NIPS 2019], in terms of both parameters $N$ and $d$.

preprint, 2021, arXiv

On explicit $L^2$-convergence rate estimate for piecewise deterministic Markov processes in MCMC algorithms

We establish $L^2$-exponential convergence rate for three popular piecewise deterministic Markov processes for sampling: the randomized Hamiltonian Monte Carlo method, the zigzag process, and the bouncy particle sampler. Our analysis is based on a variational framework for hypocoercivity, which combines a Poincaré-type inequality in time-augmented state space and a standard $L^2$ energy estimate. Our analysis provides explicit convergence rate estimates, which are more quantitative than existing results.

preprint, 2020, arXiv

Dual-comb delay spectroscopy with attometer resolution

Spectroscopy has attracted much attention in molecular detection, biomolecular identification, and chemical analysis for providing accurate measurement. However, it can hardly distinguish different sources with overlapped resonances in mixed analytes. Here, we present dual-comb delay spectroscopy to overcome this problem. The introduction of group delay spectroscopy provides a new tool to identify sources that would lead to overlapped resonances in intensity or phase spectroscopy. To obtain sufficiently high spectral resolution and signal-to-noise ratio for achieving a reliable group delay spectrum, a probe comb with the wavelengths precisely scanned by a microwave source is applied, leading to attometer-level resolution and million-level signal-to-noise ratio. In an experiment, spectroscopy with an optional resolution up to 1 kHz (8 attometers), an average signal-to-noise ratio surpassing 2,000,000, and a span exceeding 33 nm is demonstrated. Two overlapped resonances from two different sources are clearly differentiated. Our work offers a new perspective for exploring the interaction between matter and light.