Source author record

Qi Han

Qi Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision cond-mat.mes-hall Computation and Language Machine Learning cond-mat.mtrl-sci Cryptography and Security Human-Computer Interaction math.AP math.CV Programming Languages quant-ph

Catalog footprint

What is connected

17 works

11 topics

4 close collaborators


Preprint · 2026 · arXiv

CktFormalizer: Autoformalization of Natural Language into Circuit Representations

LLMs can generate hardware descriptions from natural language specifications, but the resulting Verilog often contains width mismatches, combinational loops, and incomplete case logic that pass syntax checks yet fail in synthesis or silicon. We present CktFormalizer, a framework that redirects LLM-driven hardware generation through a dependently-typed HDL embedded in Lean 4. Lean serves three roles: (i) type checker: dependent types encode bit-width constraints, case coverage, and acyclicity, turning hardware defects into compile-time errors that guide iterative repair; (ii) correctness firewall: compiled designs are structurally free of defects that cause silent backend failures (the baseline loses 20% of correct designs during synthesis and routing; CktFormalizer preserves all of them); (iii) proof assistant: the agent constructs machine-checked equivalence proofs over arbitrary input sequences and parameterized widths, beyond the reach of bounded SMT-based checking. On VerilogEval (156 problems), RTLLM (50 problems), and ResBench (56 problems), CktFormalizer achieves simulation pass rates competitive with direct Verilog generation while delivering substantially higher backend realizability: 95-100% of compiled designs complete the full synthesis, place-and-route, DRC, and LVS flow. A closed-loop PPA optimization stage yields up to 35% area reduction and 30% power reduction through validated architecture exploration, with automated theorem proving ensuring that each optimized variant remains functionally equivalent to its formal specification.
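CktFormalizer rules out combinational loops at the type level in Lean. As a much weaker runtime analogue, the acyclicity property itself can be illustrated with a small Python sketch that finds a combinational loop by depth-first search. The netlist format (a dict mapping each combinational signal to the signals it reads) and the function name are hypothetical assumptions, not part of CktFormalizer.

```python
from typing import Dict, List, Optional


def find_combinational_loop(netlist: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of signals if the netlist has a combinational loop, else None.

    `netlist` maps each combinational signal to the signals it reads (hypothetical
    format). Registers are assumed to be excluded, since a register breaks a loop.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in netlist}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in netlist.get(node, []):
            # Signals not declared in the netlist (primary inputs) count as finished.
            state = color.get(dep, BLACK)
            if state == GRAY:
                # dep is already on the current path: slice out the cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in netlist:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_combinational_loop({"a": ["b"], "b": ["c"], "c": ["a"]}))  # ['a', 'b', 'c', 'a']
print(find_combinational_loop({"sum": ["x", "y"], "out": ["sum", "z"]}))  # None
```

A check like this only reports a loop after the design exists; encoding acyclicity in the types, as the abstract describes, makes such a design fail to compile in the first place.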

Preprint · 2026 · arXiv

MMFormalizer: Multimodal Autoformalization in the Wild

Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer recursively constructs formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry, covering diverse multimodal autoformalization tasks. Results show that frontier models such as GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning, while geometry remains the most challenging domain. Overall, MMFormalizer provides a scalable framework for unified multimodal autoformalization, bridging perception and formal reasoning. To the best of our knowledge, this is the first multimodal autoformalization method capable of handling classical mechanics (derived from the Hamiltonian), as well as relativity, quantum mechanics, and thermodynamics. More details are available on our project page: MMFormalizer.github.io
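The abstract does not say how "dimensional grounding" is checked. As an assumption, the underlying idea can be illustrated with ordinary dimensional analysis: represent each quantity's dimension as integer exponents over base dimensions (mass M, length L, time T here) and reject a candidate equation whose two sides disagree. The helper names below are hypothetical.

```python
from collections import Counter
from typing import Dict

# A dimension maps a base quantity to its integer exponent, e.g. velocity = {"L": 1, "T": -1}.
Dim = Dict[str, int]


def mul(a: Dim, b: Dim) -> Dim:
    """Dimension of a product: exponents add, and zero exponents are dropped."""
    out = Counter(a)
    out.update(b)  # Counter.update adds counts (negative exponents are kept)
    return {k: v for k, v in out.items() if v != 0}


def power(a: Dim, n: int) -> Dim:
    """Dimension of a quantity raised to the integer power n."""
    return {k: v * n for k, v in a.items() if v * n != 0}


def same_dimension(a: Dim, b: Dim) -> bool:
    """True if both sides of a candidate equation have identical dimensions."""
    return {k: v for k, v in a.items() if v} == {k: v for k, v in b.items() if v}


M, L, T = {"M": 1}, {"L": 1}, {"T": 1}
velocity = mul(L, power(T, -1))
energy = mul(M, mul(power(L, 2), power(T, -2)))

kinetic = mul(M, power(velocity, 2))  # (1/2) m v^2; the numeric factor is dimensionless
momentum = mul(M, velocity)
print(same_dimension(kinetic, energy))   # True
print(same_dimension(momentum, energy))  # False
```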

Preprint · 2026 · arXiv

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a message-passing architecture in multiple rounds. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and ultimately produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains, and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5% on HMMT 2025, surpassing GPT-5's 93.2% by scaling effective TTC to roughly two million tokens. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.
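The round structure described above can be sketched as a short Python loop. Everything model-specific is an assumption: `solve`, `compact`, and `synthesize` are hypothetical callables standing in for LLM calls, the character cap stands in for the context bound on messages, and the toy usage replaces the trained synthesis model with a majority vote.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_coordinated_reasoning(
    question: str,
    solve: Callable[[str, List[str]], str],       # hypothetical: one reasoning trajectory
    compact: Callable[[str], str],                # hypothetical: trajectory -> short message
    synthesize: Callable[[str, List[str]], str],  # hypothetical: messages -> final answer
    rounds: int = 3,
    width: int = 4,
    max_message_chars: int = 200,
) -> str:
    messages: List[str] = []
    for _ in range(rounds):
        # Launch `width` trajectories in parallel; each sees only the question and the
        # bounded messages from the previous round, so its context stays bounded.
        with ThreadPoolExecutor(max_workers=width) as pool:
            trajectories = list(pool.map(lambda _: solve(question, messages), range(width)))
        # Compact each trajectory into a context-bounded message for the next round.
        messages = [compact(t)[:max_message_chars] for t in trajectories]
    return synthesize(question, messages)


# Toy usage: trajectories just report how many messages they were given.
answer = parallel_coordinated_reasoning(
    "toy question",
    solve=lambda q, msgs: f"guess-{len(msgs)}",
    compact=lambda t: t,
    synthesize=lambda q, msgs: Counter(msgs).most_common(1)[0][0],
)
print(answer)  # guess-4
```

The point of the structure is that total compute grows with `rounds * width` while any single call only ever sees the question plus `width` capped messages.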

Preprint · 2026 · arXiv

STEP3-VL-10B Technical Report

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10×-20× larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.

Preprint · 2026 · arXiv

Vision Foundation Models as Generalist Tokenizers for Image Generation

In this work, we investigate the largely unexplored direction of building a generalist image tokenizer directly on top of a frozen vision foundation model (VFM). To build this tokenizer, we use a frozen VFM as the encoder and introduce two key innovations: (1) a region-adaptive quantization framework that eliminates spatial redundancy in standard 2D grid features, and (2) a semantic reconstruction objective that aligns the decoded outputs with the VFM's representations to preserve semantic fidelity. Building on these designs, we propose VFMTok, a generalist visual tokenizer that operates in both discrete and continuous latent spaces. VFMTok substantially improves synthesis quality while greatly improving token efficiency. For discrete autoregressive (AR) generation, it accelerates model convergence by 3 times and achieves a state-of-the-art gFID of 1.36 on ImageNet class-conditional synthesis. Similarly, for continuous-space generation, integrating VFMTok with a denoising model yields a gFID of 1.25. Furthermore, because the latent space inherently captures rich spatial semantics, VFMTok enables high-fidelity class-conditional synthesis without classifier-free guidance (w/o CFG) across both generative paradigms, significantly accelerating inference. Beyond these empirical results, we systematically investigate the underlying mechanisms of our approach. We find that the specific self-supervised learning objectives used during VFM pre-training determine its effectiveness as a tokenizer. Specifically, a VFM jointly optimized with global contrastive learning and latent masked image modeling provides the best representations for image tokenization. These insights establish a strong foundation and offer guidance for the design of future image tokenizers.
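Discrete tokenizers of this kind rest on vector quantization: each feature vector is replaced by the index of its nearest codebook entry. A minimal pure-Python sketch of that standard step follows; it is plain nearest-neighbour VQ over a fixed list of vectors, not VFMTok's region-adaptive quantization, and the codebook values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> Tuple[List[int], List[List[float]]]:
    """Map each feature vector to its nearest codebook entry under squared L2 distance.

    Returns the token indices and the quantized (reconstructed) vectors.
    """
    indices: List[int] = []
    recon: List[List[float]] = []
    for f in features:
        dists = [sum((fi - ci) ** 2 for fi, ci in zip(f, c)) for c in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)  # index of the closest code
        indices.append(k)
        recon.append(list(codebook[k]))
    return indices, recon


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
idx, _ = quantize([[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]], codebook)
print(idx)  # [0, 2, 1]
```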

Preprint · 2022 · arXiv

On the Connection between Local Attention and Dynamic Depth-wise Convolution

Vision Transformer (ViT) attains state-of-the-art performance in visual recognition, and its variant, Local Vision Transformer, makes further improvements. The major component in Local Vision Transformer, local attention, performs attention separately over small local windows. We rephrase local attention as a channel-wise locally-connected layer and analyze it in terms of two network regularization forms, sparse connectivity and weight sharing, as well as dynamic weight computation. Sparse connectivity: there is no connection across channels, and each position is connected only to the positions within a small local window. Weight sharing: the connection weights for one position are shared across channels or within each group of channels. Dynamic weight: the connection weights are dynamically predicted from each image instance. We point out that local attention resembles depth-wise convolution and its dynamic version in sparse connectivity. The main difference lies in weight sharing: depth-wise convolution shares connection weights (kernel weights) across spatial positions. We empirically observe that models based on depth-wise convolution and its dynamic variant, with lower computational complexity, perform on par with or sometimes slightly better than Swin Transformer, an instance of Local Vision Transformer, on ImageNet classification, COCO object detection, and ADE semantic segmentation. These observations suggest that Local Vision Transformer takes advantage of the two regularization forms and dynamic weights to increase network capacity. Code is available at https://github.com/Atten4Vis/DemystifyLocalViT.
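The contrast drawn above can be made concrete with a minimal 1D sketch in pure Python. The simplifications are assumptions, not the paper's setup: one head, no query/key/value projections, and dot-product scores on the raw input. Both functions aggregate each channel only from the same channel inside a local window (sparse connectivity). The depth-wise convolution uses a static per-channel kernel shared across positions; local attention computes its weights from the input at each position and shares them across channels.

```python
import math
from typing import List

Seq = List[List[float]]  # x[position][channel]


def depthwise_conv1d(x: Seq, kernel: List[List[float]]) -> Seq:
    """kernel[c][k] is a static weight per channel, shared across all positions (zero padding)."""
    n_pos, n_ch, size = len(x), len(x[0]), len(kernel[0])
    r = size // 2
    out = [[0.0] * n_ch for _ in range(n_pos)]
    for n in range(n_pos):
        for c in range(n_ch):
            for k in range(size):
                m = n + k - r
                if 0 <= m < n_pos:
                    out[n][c] += kernel[c][k] * x[m][c]  # no mixing across channels
    return out


def local_attention1d(x: Seq, size: int) -> Seq:
    """Weights are computed from the input at each position and shared across channels."""
    n_pos, n_ch = len(x), len(x[0])
    r = size // 2
    out: Seq = []
    for n in range(n_pos):
        window = [m for m in range(n - r, n + r + 1) if 0 <= m < n_pos]
        scores = [sum(a * b for a, b in zip(x[n], x[m])) for m in window]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # dynamic, position-specific weights
        out.append([sum(w * x[m][c] for w, m in zip(weights, window)) for c in range(n_ch)])
    return out
```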

Preprint · 2022 · arXiv

RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks

Temporal/spatial receptive fields of models play an important role in sequential/spatial tasks. Large receptive fields facilitate long-term relations, while small receptive fields help to capture local details. Existing methods construct models with hand-designed receptive fields in each layer. Can we effectively search for receptive field combinations to replace hand-designed patterns? To answer this question, we propose to find better receptive field combinations through a global-to-local search scheme. Our search scheme uses global search to find coarse combinations and local search to further refine them. The global search finds possible coarse combinations beyond human-designed patterns. On top of the global search, we propose an expectation-guided iterative local search scheme to refine combinations effectively. Our RF-Next models, which plug receptive field search into various models, boost performance on many tasks, e.g., temporal action segmentation, object detection, instance segmentation, and speech synthesis. The source code is publicly available at http://mmcheng.net/rfnext.
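A minimal sketch of a global-to-local search over per-layer dilation rates, under stated assumptions: `score` is a hypothetical stand-in for the validation performance of a model built with those rates, the coarse rate set is arbitrary, and the local stage is plain greedy ±1 refinement rather than the paper's expectation-guided iterative search.

```python
import itertools
from typing import Callable, Sequence, Tuple

Combo = Tuple[int, ...]


def global_to_local_search(score: Callable[[Combo], float], num_layers: int,
                           coarse_rates: Sequence[int] = (1, 2, 4, 8, 16),
                           max_rate: int = 32) -> Combo:
    # Global stage: exhaustive search over a coarse grid (only feasible for few layers).
    best = max(itertools.product(coarse_rates, repeat=num_layers), key=score)
    # Local stage: greedily nudge one layer's rate by -1/+1 while the score improves.
    improved = True
    while improved:
        improved = False
        for i in range(num_layers):
            for delta in (-1, 1):
                cand = list(best)
                cand[i] += delta
                if 1 <= cand[i] <= max_rate and score(tuple(cand)) > score(best):
                    best, improved = tuple(cand), True
    return best


def toy_score(rates: Combo) -> float:  # hypothetical objective peaking at (6, 3)
    return -((rates[0] - 6) ** 2 + (rates[1] - 3) ** 2)


print(global_to_local_search(toy_score, num_layers=2))  # (6, 3)
```

Here the coarse grid cannot contain the optimum (6 and 3 are not in the rate set), so the global stage lands on a nearby grid point and the local stage walks the rest of the way.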

Preprint · 2020 · arXiv

Dependency Aware Filter Pruning

Convolutional neural networks (CNNs) are typically over-parameterized, bringing considerable computational overhead and memory footprint during inference. Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost. For this purpose, identifying unimportant convolutional filters is the key to effective filter pruning. Previous work prunes filters according to either their weight norms or the corresponding batch-norm scaling factors, while neglecting the sequential dependency between adjacent layers. In this paper, we further develop norm-based importance estimation by taking the dependency between adjacent layers into consideration. Besides, we propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity. In this way, we can identify unimportant filters and search for the optimal network architecture within certain resource budgets in a more principled manner. Comprehensive experimental results demonstrate that the proposed method performs favorably against existing strong baselines on the CIFAR, SVHN, and ImageNet datasets. The training source code will be made publicly available after the review process.
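The abstract does not give the exact importance formula, so the following is only an assumption-laden sketch of the idea: a channel's importance combines its batch-norm scaling factor with the norm of the next layer's weights that consume it, so a channel with a large scale that is barely used downstream still ranks low. The next layer is treated as a 1×1 convolution (one weight per input/output pair), and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def dependency_aware_importance(bn_gamma: Sequence[float],
                                next_weight: Sequence[Sequence[float]]) -> List[float]:
    """Score channel i by |gamma_i| times the L2 norm of the next-layer weights reading it.

    next_weight[o][i] is the weight from input channel i to output filter o.
    """
    scores = []
    for i in range(len(bn_gamma)):
        col_norm = math.sqrt(sum(row[i] ** 2 for row in next_weight))
        scores.append(abs(bn_gamma[i]) * col_norm)
    return scores


def channels_to_prune(scores: Sequence[float], ratio: float) -> List[int]:
    """Indices of the lowest-scoring `ratio` fraction of channels, in ascending order."""
    k = int(len(scores) * ratio)
    return sorted(sorted(range(len(scores)), key=lambda i: scores[i])[:k])


gamma = [0.9, 0.05, 0.5, 0.8]
next_w = [[0.01, 2.0, 0.0, 0.3],
          [0.0, 1.0, 0.1, 0.4]]
print(channels_to_prune(dependency_aware_importance(gamma, next_w), ratio=0.5))  # [0, 2]
```

With these made-up values, ranking by |gamma| alone would prune channels 1 and 2, whereas the dependency-aware score prunes channel 0 (large scale, almost unused downstream) and channel 2.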

Preprint · 2020 · arXiv

Mathematical Model and Topology Evaluation of Quantum Secure Communication Network

Due to the intrinsic point-to-point nature of quantum key distribution (QKD) systems, it is necessary to study and develop QKD network technology to provide secure communication services for a large number of nodes over a large area. Considering the quality assurance required for such a network and the cost limitations, building an effective mathematical model of a QKD network becomes a critical task. In this paper, a flow-based mathematical model is proposed to describe a QKD network using mathematical concepts and language. In addition, QKD network topology evaluation is investigated using a novel QKD network performance indicator, the Information-Theoretic Secure communication bound, together with a corresponding linear-programming-based calculation algorithm. Extensive simulation results on the SECOQC and NSFNET network topologies validate the effectiveness of the proposed model and indicator.
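As a hedged illustration only: if one assumes the indicator behaves like a single-commodity flow bound, the secret-key rate two nodes can share over trusted relays is capped by the maximum flow through links whose capacities are their key generation rates. The paper computes its indicator with linear programming; the sketch below swaps in the Edmonds-Karp max-flow algorithm, which solves that particular LP exactly, and the link list is made up.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple


def max_key_flow(links: Iterable[Tuple[str, str, float]], src: str, dst: str) -> float:
    """Edmonds-Karp max flow over undirected links with key-rate capacities."""
    cap: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for u, v, rate in links:
        cap[u][v] += rate  # an undirected link can carry key material either way
        cap[v][u] += rate
    flow = 0.0
    while True:
        # Breadth-first search for the shortest augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow
        # Bottleneck capacity along the path.
        path_cap, v = float("inf"), dst
        while parent[v] is not None:
            path_cap = min(path_cap, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= path_cap
            cap[v][u] += path_cap
            v = u
        flow += path_cap


links = [("A", "B", 3.0), ("A", "C", 2.0), ("B", "C", 1.0), ("B", "D", 2.0), ("C", "D", 3.0)]
print(max_key_flow(links, "A", "D"))  # 5.0
```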

Preprint · 2020 · arXiv

Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization

Emerging hardware can support inference of mixed-precision CNN models that assign different bitwidths to different layers. Learning to find an optimal mixed-precision model that preserves accuracy and satisfies specific constraints on model size and computation is extremely challenging, due to the difficulty of training a mixed-precision model and the huge space of all possible bit quantizations. In this paper, we propose a novel soft Barrier Penalty based NAS (BP-NAS) for mixed-precision quantization, which ensures that all searched models lie inside the valid domain defined by the complexity constraint, and can thus return an optimal model under the given constraint with a single search. The proposed soft Barrier Penalty is differentiable and imposes very large losses on models outside the valid domain while applying almost no penalty to models inside it, thereby constraining the search to the feasible domain. In addition, a differentiable Prob-1 regularizer is proposed to ensure that learning with NAS is reasonable. A distribution reshaping training strategy is also used to make training more stable. BP-NAS sets a new state of the art on both classification (CIFAR-10, ImageNet) and detection (COCO), surpassing all efficient mixed-precision methods designed manually and automatically. In particular, BP-NAS achieves higher mAP (up to 2.7% mAP improvement) together with lower bit computation cost compared with the existing best mixed-precision model on COCO detection.
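The abstract describes the barrier only qualitatively: differentiable, nearly zero inside the feasible region, very large outside. One possible function with those properties, offered as an assumption rather than the paper's formula, is a log barrier on the normalized budget slack, continued past the boundary by its tangent line so it stays finite and differentiable. The parameter names `mu` and `eps` are hypothetical.

```python
import math


def soft_barrier_penalty(cost: float, budget: float, mu: float = 0.01, eps: float = 1e-4) -> float:
    """Log barrier on the normalized slack s = 1 - cost / budget (s <= 0 violates the budget)."""
    s = 1.0 - cost / budget
    if s >= eps:
        return -mu * math.log(s)  # close to 0 for models well inside the budget
    # Near or beyond the boundary, continue with the tangent line at s = eps, so the
    # penalty stays finite and differentiable but grows steeply (slope mu / eps).
    return -mu * math.log(eps) + (mu / eps) * (eps - s)


print(round(soft_barrier_penalty(50, 100), 4))   # 0.0069
print(round(soft_barrier_penalty(110, 100), 2))  # 10.1
```

In a search loop this term would be added to the task loss, with `cost` computed as a differentiable expected cost (for example, expected bit operations) of the current architecture distribution; that wiring is not shown.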

Preprint · 2020 · arXiv

The role of Hume-Rothery's rules play in the MAX phases formability

MAX phases are a family of layered, hexagonal-structure ternary carbides or nitrides of a transition metal and an A-group element. What makes this type of material fascinating and potentially useful is its remarkable combination of metallic and ceramic characteristics, as well as its indispensable role in the 'top-down' synthesis of its 2D counterparts, MXenes. To make the search for potential novel MAX phases more efficient, the main efforts could go toward creating an information-prediction system that incorporates all MAX-phase databases, along with generally valid principles and high-quality regularities. In this work, we employ the structure mapping methodology, which has proven to be a useful guide in materials design, together with Hume-Rothery parameters to provide guiding principles in the search for novel MAX phases. The formable/non-formable data on MAX phases can be ordered within a two-dimensional plot by using the proposed expressions for geometrical and electron-concentration factors.

Preprint · 2019 · arXiv

Elliptic variational problems with mixed nonlinearities

In this paper, we study the existence and multiplicity of nontrivial positive solutions to the quasilinear elliptic equation in $\mathbb{R}^N$, $N\geq 2$,
$$-\Delta_p u + u^{p-1} = \lambda\, k(x)\, u^{r-1} - h(x)\, u^{q-1}.$$
Here, $h(x),k(x)>0$ are Lebesgue measurable functions, $1<p<q<\infty$, $p<r<\min\{p^*,q\}$ if $p<N$ (where $p^*=Np/(N-p)$ is the critical Sobolev exponent) while $p<r<q$ if $p\geq N$, and $\lambda>0$ is a parameter.
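Reading the operator on the left-hand side as the $p$-Laplacian term $-\Delta_p u = -\operatorname{div}(|\nabla u|^{p-2}\nabla u)$, and assuming $h$ and $k$ are such that the integrals are finite, positive solutions of equations of this type are usually obtained as critical points of an energy functional. A sketch of that standard variational formulation (not taken from the paper) is:

$$
I_\lambda(u) = \frac{1}{p}\int_{\mathbb{R}^N}\big(|\nabla u|^p + |u|^p\big)\,dx - \frac{\lambda}{r}\int_{\mathbb{R}^N} k(x)\,|u|^r\,dx + \frac{1}{q}\int_{\mathbb{R}^N} h(x)\,|u|^q\,dx,
$$

$$
\langle I_\lambda'(u),\varphi\rangle = \int_{\mathbb{R}^N}\big(|\nabla u|^{p-2}\nabla u\cdot\nabla\varphi + |u|^{p-2}u\,\varphi\big)\,dx - \lambda\int_{\mathbb{R}^N} k(x)\,|u|^{r-2}u\,\varphi\,dx + \int_{\mathbb{R}^N} h(x)\,|u|^{q-2}u\,\varphi\,dx.
$$

Setting $\langle I_\lambda'(u),\varphi\rangle = 0$ for all test functions $\varphi$ and taking $u>0$ recovers the weak form of the equation: the $k$-term enters with $-\lambda$ and the $h$-term with $+1$, matching the signs on the right-hand side once they are moved to the left.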

Preprint · 2016 · arXiv

ActiveCrowd: A Framework for Optimized Multi-Task Allocation in Mobile Crowdsensing Systems

Worker selection is a key issue in Mobile Crowd Sensing (MCS). While previous worker selection approaches mainly focus on selecting a proper subset of workers for a single MCS task, multi-task-oriented worker selection is essential and useful for the efficiency of large-scale MCS platforms. This paper proposes ActiveCrowd, a worker selection framework for multi-task MCS environments.

Preprint · 2016 · arXiv

Electrical control of intervalley scattering in graphene via the charge state of defects

We study the intervalley scattering in defected graphene by low-temperature transport measurements. The scattering rate is strongly suppressed when defects are charged. This finding highlights "screening" of the short-range part of a potential by the long-range part. Experiments on calcium-adsorbed graphene confirm the role of a long-range Coulomb potential. This effect is applicable to other multivalley systems, provided that the charge state of a defect can be electrically tuned. Our result provides a means to electrically control valley relaxation and has important implications in valley dynamics in valleytronic materials.

Preprint · 2014 · arXiv

Absence of a transport signature of spin-orbit coupling in graphene with indium adatoms

Enhancement of the spin-orbit coupling in graphene may lead to various topological phenomena and also find applications in spintronics. Adatom adsorption has been proposed as an effective way to achieve this goal. In particular, great hope has been held for indium in strengthening the spin-orbit coupling and realizing the quantum spin Hall effect. To search for evidence of spin-orbit coupling in graphene decorated with indium adatoms, we carry out extensive transport measurements, namely weak localization magnetoresistance, quantum Hall effect, and non-local spin Hall effect measurements. No signature of spin-orbit coupling is found. Possible explanations are discussed.

Preprint · 2014 · arXiv

Observation of vacancy-induced suppression of electronic cooling in defected graphene

Previous studies of electron-phonon interaction in impure graphene have found that static disorder can give rise to an enhancement of electronic cooling. We investigate the effect of dynamic disorder and observe over an order of magnitude suppression of electronic cooling compared with clean graphene. The effect is stronger in graphene with more vacancies, confirming its vacancy-induced nature. The dependence of the coupling constant on the phonon temperature implies its link to the dynamics of disorder. Our study highlights the effect of disorder on electron-phonon interaction in graphene. In addition, the suppression of electronic cooling holds great promise for improving the performance of graphene-based bolometer and photo-detector devices.

Preprint · 2014 · arXiv

On the uniqueness problems of entire functions and their linear differential polynomials

Uniqueness problems for transcendental meromorphic or entire functions that share at least two values with their derivatives or linear differential polynomials have been studied extensively, and many results have been obtained. In this paper, we study a transcendental entire function $f$ that shares a non-zero polynomial $a$ with $f'$, together with its linear differential polynomials with rational function coefficients.
