Source author record

Can Huang

Can Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision, Artificial Intelligence, physics.plasm-ph, physics.space-ph, astro-ph.SR, Computation and Language, Databases, astro-ph.EP, astro-ph.HE, Machine Learning, math.NA

Catalog footprint

What is connected

12 works

11 topics

4 close collaborators

preprint, 2025, arXiv

Grounding Natural Language to SQL Translation with Data-Based Self-Explanations

Natural Language Interfaces for Databases empower non-technical users to interact with data using natural language (NL). Advanced approaches, utilizing either neural sequence-to-sequence models or more recent large-scale language models, typically implement NL to SQL (NL2SQL) translation in an end-to-end fashion. However, like humans, these end-to-end translation models may not always generate the best SQL output on their first try. In this paper, we propose CycleSQL, an iterative framework designed for end-to-end translation models to autonomously generate the best output through self-evaluation. The main idea of CycleSQL is to introduce data-grounded NL explanations of query results as self-provided feedback, and to use that feedback to validate the correctness of the translation iteratively, thereby improving the overall translation accuracy. Extensive experiments, including quantitative and qualitative evaluations, are conducted to study CycleSQL by applying it to seven existing translation models on five widely used benchmarks. The results show that 1) the feedback loop introduced in CycleSQL consistently improves the performance of existing models; in particular, applying CycleSQL to RESDSQL yields a translation accuracy of 82.0% (+2.6%) on the validation set and 81.6% (+3.2%) on the test set of the Spider benchmark; 2) the generated NL explanations also provide insightful information for users, aiding comprehension of translation results and consequently enhancing the interpretability of NL2SQL translation.
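
The feedback loop described in this abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the CycleSQL implementation: the function name `cycle_translate` and the callables `execute`, `explain`, and `verify` are hypothetical stand-ins for the translation model's candidate list, the database, the explanation generator, and the correctness check.

```python
from typing import Any, Callable, Iterable, Optional


def cycle_translate(
    question: str,
    candidates: Iterable[str],
    execute: Callable[[str], Any],
    explain: Callable[[str, str, Any], str],
    verify: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first candidate SQL whose data-grounded explanation passes
    verification; fall back to the model's first candidate otherwise.

    All four callables are hypothetical placeholders (assumptions).
    """
    first: Optional[str] = None
    for sql in candidates:
        if first is None:
            first = sql  # remember the top-ranked candidate as a fallback
        try:
            rows = execute(sql)  # run the candidate against the database
        except Exception:
            continue  # a query that fails to execute cannot be explained
        explanation = explain(question, sql, rows)  # NL explanation of the result
        if verify(question, explanation):  # does the explanation match the question?
            return sql
    return first
```

With stub callables, the loop skips a candidate whose explanation fails verification and returns the next one that passes; if none passes, the original top candidate is kept, so the loop never does worse than returning the base model's first output.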

preprint, 2025, arXiv

UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters

Text and formulas constitute the core informational components of many documents. Accurately and efficiently recognizing both is crucial for developing robust and generalizable document parsing systems. Recently, vision-language models (VLMs) have achieved impressive unified recognition of text and formulas. However, they are large and computationally demanding, which restricts their use in many applications. In this paper, we propose UniRec-0.1B, a unified recognition model with only 0.1B parameters. It is capable of performing text and formula recognition at multiple levels, including characters, words, lines, paragraphs, and documents. To implement this task, we first establish UniRec40M, a large-scale dataset comprising 40 million text, formula, and mixed samples, enabling the training of a powerful yet lightweight model. Second, we identify two challenges in building such a lightweight but unified expert model: structural variability across hierarchies and semantic entanglement between textual and formulaic content. To tackle these, we introduce hierarchical supervision training that explicitly guides structural comprehension, and a semantic-decoupled tokenizer that separates text and formula representations. Finally, we develop a comprehensive evaluation benchmark covering Chinese and English documents from multiple domains and at multiple levels. Experimental results on this and public benchmarks demonstrate that UniRec-0.1B outperforms both general-purpose VLMs and leading document parsing expert models, while achieving a 2-9$\times$ speedup, validating its effectiveness and efficiency. Codebase and Dataset: https://github.com/Topdu/OpenOCR.

preprint, 2024, arXiv

GloTSFormer: Global Video Text Spotting Transformer

Video Text Spotting (VTS) is a fundamental visual task that aims to predict the trajectories and content of texts in a video. Previous works usually conduct local associations and apply IoU-based distance and complex post-processing procedures to boost performance, ignoring the abundant temporal information and the morphological characteristics in VTS. In this paper, we propose a novel Global Video Text Spotting Transformer, GloTSFormer, to model the tracking problem as global associations and utilize the Gaussian Wasserstein distance to guide the morphological correlation between frames. Our main contributions are threefold. (1) We propose a Transformer-based global tracking method, GloTSFormer, for VTS that associates multiple frames simultaneously. (2) We introduce a Wasserstein distance-based method to conduct positional associations between frames. (3) We conduct extensive experiments on public datasets. On the ICDAR2015 video dataset, GloTSFormer achieves 56.0 MOTA, a 4.6 absolute improvement over the previous SOTA method, and outperforms the previous Transformer-based method by a significant 8.3 MOTA.
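
To make the Gaussian Wasserstein distance mentioned above concrete, here is a minimal sketch. It assumes the common convention of modelling a (possibly rotated) box `(cx, cy, w, h, theta)` as a 2D Gaussian with mean at the box centre and covariance `R diag(w^2/4, h^2/4) R^T`; the abstract does not state how GloTSFormer parameterizes boxes, so the function names and this parameterization are assumptions.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta=0.0):
    """Model a box as a 2D Gaussian (assumed convention, not from the paper).

    Returns the mean (cx, cy) and the covariance entries (sxx, sxy, syy).
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def gaussian_wasserstein_sq(box1, box2):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes.

    Uses the 2x2 identity tr(M^(1/2)) = sqrt(tr(M) + 2*sqrt(det(M))) for a
    positive semidefinite M, so no matrix square root routine is needed.
    """
    (mx1, my1), (a1, b1, d1) = box_to_gaussian(*box1)
    (mx2, my2), (a2, b2, d2) = box_to_gaussian(*box2)
    centre_term = (mx1 - mx2) ** 2 + (my1 - my2) ** 2
    trace_of_product = a1 * a2 + 2.0 * b1 * b2 + d1 * d2
    det_of_product = max((a1 * d1 - b1 * b1) * (a2 * d2 - b2 * b2), 0.0)
    cross = math.sqrt(max(trace_of_product + 2.0 * math.sqrt(det_of_product), 0.0))
    return centre_term + (a1 + d1) + (a2 + d2) - 2.0 * cross
```

Unlike IoU, this quantity stays informative when two boxes do not overlap: for identical shapes it reduces to the squared distance between centres, and for co-centred boxes it measures only the difference in shape.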

preprint, 2024, arXiv

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models, typically employ auto-regressive decoding to generate unique SQL queries sequentially. While these translation models have greatly improved the overall translation accuracy, surpassing 70% on NLIDB benchmarks, the use of auto-regressive decoding to generate single SQL queries may result in sub-optimal outputs, potentially leading to erroneous translations. In this paper, we propose Metasql, a unified generate-then-rank framework that can be flexibly incorporated with existing NLIDBs to consistently improve their translation accuracy. Metasql introduces query metadata to control the generation of better SQL query candidates and uses learning-to-rank algorithms to retrieve globally optimized queries. Specifically, Metasql first breaks down the meaning of the given NL query into a set of possible query metadata, representing the basic concepts of the semantics. These metadata are then used as language constraints to steer the underlying translation model toward generating a set of candidate SQL queries. Finally, Metasql ranks the candidates to identify the best matching one for the given NL query. Extensive experiments are performed to study Metasql on two public NLIDB benchmarks. The results show that the performance of the translation models can be effectively improved using Metasql.
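
The three stages named in this abstract (decompose the question into metadata, generate candidates under each metadata constraint, rank the candidates) could be wired together along these lines. This is a hedged sketch only: `generate_then_rank` and the callables `propose_metadata`, `generate`, and `score` are hypothetical placeholders, not Metasql's actual components.

```python
from typing import Callable, Iterable, List


def generate_then_rank(
    question: str,
    propose_metadata: Callable[[str], Iterable[str]],
    generate: Callable[[str, str], Iterable[str]],
    score: Callable[[str, str], float],
) -> List[str]:
    """Return de-duplicated candidate SQL queries, best-scoring first.

    The three callables are assumptions standing in for the metadata
    predictor, the constrained translation model, and the ranking model.
    """
    candidates: List[str] = []
    for meta in propose_metadata(question):      # possible query metadata
        for sql in generate(question, meta):     # metadata-conditioned generation
            if sql not in candidates:            # keep each candidate once
                candidates.append(sql)
    # Rank all candidates against the question; highest score first.
    return sorted(candidates, key=lambda sql: score(question, sql), reverse=True)
```

The first element of the returned list plays the role of the best-matching query; keeping the full ranked list makes it easy to inspect which alternatives the ranker rejected.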

preprint, 2022, arXiv

Knowing Where and What: Unified Word Block Pretraining for Document Understanding

Due to the complex layouts of documents, it is challenging to extract information from documents. Most previous studies develop multimodal pre-trained models in a self-supervised way. In this paper, we focus on the embedding learning of word blocks containing text and layout information, and propose UTel, a language model with Unified TExt and Layout pre-training. Specifically, we propose two pre-training tasks: Surrounding Word Prediction (SWP) for layout learning, and Contrastive learning of Word Embeddings (CWE) for identifying different word blocks. Moreover, we replace the commonly used 1D position embedding with a 1D clipped relative position embedding. In this way, the joint training of Masked Layout-Language Modeling (MLLM) and the two newly proposed tasks enables the interaction between semantic and spatial features in a unified way. Additionally, the proposed UTel can process arbitrary-length sequences by removing the 1D position embedding, while maintaining competitive performance. Extensive experimental results show that UTel learns better joint representations and outperforms previous methods on various downstream tasks, while requiring no image modality. Code is available at https://github.com/taosong2019/UTel.
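
A clipped relative position embedding is usually indexed by the clipped offset between two token positions. The sketch below shows only that indexing step, assuming the common rule `clip(j - i, -k, k)`; the abstract does not give UTel's exact clipping rule, and the name `clipped_relative_positions` and the parameter `max_distance` are illustrative assumptions.

```python
from typing import List


def clipped_relative_positions(length: int, max_distance: int) -> List[List[int]]:
    """Index matrix for a 1D clipped relative position embedding.

    Entry [i][j] is clip(j - i, -max_distance, max_distance) shifted by
    max_distance, so every value lies in 0 .. 2 * max_distance and can index
    an embedding table with 2 * max_distance + 1 rows (assumed convention).
    """
    return [
        [max(-max_distance, min(max_distance, j - i)) + max_distance
         for j in range(length)]
        for i in range(length)
    ]
```

Because the index range depends only on `max_distance` and not on `length`, the same embedding table serves sequences of any length, which is consistent with the arbitrary-length claim in the abstract.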

preprint, 2020, arXiv

M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network

Current multi-view stereo (MVS) methods based on supervised learning networks perform impressively compared with traditional MVS methods. However, the ground-truth depth maps needed for training are hard to obtain and cover only a limited range of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss functions to learn the inherent constraints from different perspectives of matching correspondences. In addition, we incorporate normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M^3VSNet establishes the state of the art among unsupervised methods, achieves performance comparable to the previous supervised MVSNet on the DTU dataset, and demonstrates strong generalization ability on the Tanks and Temples benchmark with effective improvement. Our code is available at https://github.com/whubaichuan/M3VSNet.

preprint, 2016, arXiv

Global explicit particle-in-cell simulations of the nonstationary bow shock and magnetosphere

We carry out two-dimensional global particle-in-cell simulations of the interaction between the solar wind and a dipole field to study the formation of the bow shock and magnetosphere. A self-reforming bow shock ahead of a dipole field is presented by using relatively high temporal-spatial resolutions. We find that (1) the bow shock and the magnetosphere are formed and reach a quasi-stable state after several ion cyclotron periods, and (2) under the $B_z$ southward solar wind condition the bow shock undergoes a self-reformation for low $\beta_i$ and high $M_A$. Simultaneously, magnetic reconnection in the magnetotail is found. For high $\beta_i$ and low $M_A$, the shock becomes quasi-stationary, and the magnetotail reconnection disappears. In addition, (3) the magnetopause deflects the magnetosheath plasmas. The sheath particles injected at the quasi-perpendicular region of the bow shock can be convected to downstream of an oblique shock region. A fraction of these sheath particles can leak out from the magnetosheath at the wings of the bow shock. Hence, the downstream situation is more complicated than that for a planar shock produced in local simulations.

preprint, 2016, arXiv

The Mechanisms of Electron Acceleration During Multiple X Line Magnetic Reconnection with a Guide Field

The interactions between magnetic islands are considered to play an important role in electron acceleration during magnetic reconnection. In this paper, two-dimensional (2-D) particle-in-cell (PIC) simulations are performed to study electron acceleration during multiple X line reconnection with a guide field. The electrons remain almost magnetized, so we can analyze the contributions of the parallel electric field, Fermi, and betatron mechanisms to electron acceleration during the evolution of magnetic reconnection by comparing with guiding-center theory. The results show that as reconnection proceeds, two magnetic islands are formed in the simulation domain. The electrons are accelerated by both the parallel electric field in the vicinity of the X lines and the Fermi mechanism due to the contraction of the two magnetic islands. Then the two magnetic islands begin to merge into one, and in this process electrons can be accelerated by the parallel electric field and betatron mechanisms. During the betatron acceleration, the electrons are locally accelerated in the regions where the magnetic field is piled up by the high-speed flow from the X line. Finally, when the coalescence of the two islands into one large island finishes, electrons can be further accelerated by the Fermi mechanism because of the contraction of the large island. As the guide field increases, the contributions of the Fermi and betatron mechanisms to electron acceleration become less and less important. When the guide field is sufficiently large, the contributions of the Fermi and betatron mechanisms are almost negligible.
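
For orientation, the three mechanisms compared in this abstract correspond to the three terms of the standard nonrelativistic guiding-center energy equation for a single particle. The form below is the textbook expression, given as an assumption; the paper may use a relativistic or fluid-averaged variant with different notation.

```latex
% Assumed notation: \varepsilon is the particle kinetic energy, q its charge
% (q = -e for electrons), \mu = m v_\perp^2 / (2B) the magnetic moment,
% \mathbf{u}_E = \mathbf{E} \times \mathbf{B} / B^2 the E-cross-B drift, and
% \boldsymbol{\kappa} = \mathbf{b} \cdot \nabla \mathbf{b} the field-line curvature.
\begin{equation*}
  \frac{d\varepsilon}{dt}
  = \underbrace{q E_{\parallel} v_{\parallel}}_{\text{parallel electric field}}
  + \underbrace{\mu \left( \frac{\partial B}{\partial t}
      + \mathbf{u}_E \cdot \nabla B \right)}_{\text{betatron}}
  + \underbrace{m v_{\parallel}^{2} \, \left( \mathbf{u}_E \cdot \boldsymbol{\kappa} \right)}_{\text{Fermi}}
\end{equation*}
```

Read this way, the betatron term grows where the field strength increases along the drift path (magnetic pile-up), and the Fermi term grows where curved field lines contract, matching the island contraction and coalescence stages described above.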

preprint, 2015, arXiv

Impact of pickup ions on the shock front nonstationarity and energy dissipation of the heliospheric termination shock: Two-dimensional full particle simulations and comparison with Voyager 2 observations

The transition between the supersonic solar wind and the subsonic heliosheath, the termination shock (TS), was observed by Voyager 2 (V2) on 2007 August 31-September 1 at a distance of 84 AU from the Sun. The data reveal multiple crossings of a complex, quasi-perpendicular supercritical shock. These experimental data are the starting point for a more sophisticated analysis that includes computer modeling of a shock in the presence of pickup ions (PUIs). Here, we present two-dimensional (2-D) particle-in-cell (PIC) simulations of the TS including PUIs self-consistently. We also report the ion velocity distribution across the TS using the Faraday cup data from V2. A relatively complete plasma and magnetic field data set from V2 gives us the opportunity to do a full comparison between the experimental data and PIC simulation results. Our results show that: (1) The nonstationarity of the shock front is mainly caused by the ripples along the shock front, and these ripples form even if the percentage of PUIs is high. (2) PUIs play a key role in the energy dissipation of the TS, and most of the incident ion dynamic energy is transferred to the thermal energy of PUIs instead of solar wind ions (SWIs). (3) The simulated composite heliosheath ion velocity distribution function is a superposition of a cold core formed by transmitted SWIs, the shoulders contributed by the hot reflected SWIs and directly transmitted PUIs, and the wings of the distribution dominated by the very hot reflected PUIs. (4) The V2 Faraday cups observed the cool core of the distribution, so they saw only the tip of the iceberg. For the evolution of the cool core distribution function across the TS, the computed results agree reasonably well with the V2 experimental results.

preprint, 2015, arXiv

Well-Conditioned Fractional Collocation Methods Using Fractional Birkhoff Interpolation Basis

The purpose of this paper is twofold. Firstly, we provide explicit and compact formulas for computing both Caputo and (modified) Riemann-Liouville (RL) fractional pseudospectral differentiation matrices (F-PSDMs) of any order at general Jacobi-Gauss-Lobatto (JGL) points. We show that in the Caputo case, it suffices to compute the F-PSDM of order $\mu \in (0,1)$ to compute that of any order $k+\mu$ with integer $k \ge 0$, while in the modified RL case, it is only necessary to evaluate a fractional integral matrix of order $\mu \in (0,1)$. Secondly, we introduce suitable fractional JGL Birkhoff interpolation problems leading to new interpolation polynomial basis functions with remarkable properties: (i) the matrix generated from the new basis yields the exact inverse of the F-PSDM at "interior" JGL points; (ii) the matrix of the highest fractional derivative in a collocation scheme under the new basis is diagonal; and (iii) the resulting linear system is well-conditioned in the Caputo case, while in the modified RL case, the eigenvalues of the coefficient matrix are highly concentrated. In both cases, the linear systems of the collocation schemes using the new basis can be solved by an iterative solver within a few iterations. Notably, the inverse can be computed in a very stable manner, so this offers optimal preconditioners for the usual fractional collocation methods for fractional differential equations (FDEs). It is also noteworthy that the choice of certain special JGL points with parameters related to the order of the equations can ease the implementation. We highlight that the use of Bateman's fractional integral formulas and fast transforms between Jacobi polynomials with different parameters is essential for our algorithm development.
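
The reduction from order $k+\mu$ to order $\mu$ in the Caputo case follows directly from the standard definitions, written here as a short worked identity. The left endpoint $-1$ (the usual JGL interval $[-1,1]$) and the notation are assumptions; the paper's own conventions may differ.

```latex
% Assumed standard definitions with 0 < \mu < 1 and integer k >= 0.
\begin{align*}
  I^{\mu} u(x) &= \frac{1}{\Gamma(\mu)} \int_{-1}^{x} \frac{u(t)}{(x-t)^{1-\mu}} \, dt
    && \text{(RL fractional integral)} \\
  {}^{C}\!D^{\mu} u(x) &= I^{1-\mu} u'(x)
    && \text{(Caputo derivative of order } \mu\text{)} \\
  {}^{C}\!D^{k+\mu} u(x) &= I^{1-\mu} u^{(k+1)}(x)
    = {}^{C}\!D^{\mu} \bigl( u^{(k)} \bigr)(x)
    && \text{(order } k+\mu \text{ via order } \mu\text{)}
\end{align*}
```

At the discrete level this suggests that an order-$(k+\mu)$ matrix can be obtained by composing the order-$\mu$ matrix with an integer-order differentiation matrix, which is one way to read the abstract's claim; the exact construction used in the paper is not given here.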

preprint, 2012, arXiv

Plasmoid ejection and secondary current sheet generation from magnetic reconnection in laser-plasma interaction

Reconnection of the self-generated magnetic fields in laser-plasma interaction was first investigated experimentally by Nilson et al. [Phys. Rev. Lett. 97, 255001 (2006)] by shining two laser pulses a distance apart on a solid target layer. An elongated current sheet (CS) was observed in the plasma between the two laser spots. In order to more closely model magnetotail reconnection, here two side-by-side thin target layers, instead of a single one, are used. It is found that at one end of the elongated CS a fan-like electron outflow region including three well-collimated electron jets appears. The ($>1$ MeV) tail of the jet energy distribution exhibits a power-law scaling. The enhanced electron acceleration is attributed to the intense inductive electric field in the narrow electron-dominated reconnection region, as well as to additional acceleration of electrons trapped inside the rapidly moving plasmoid formed in and ejected from the CS. The ejection also induces a secondary CS.
