Source author record

Mingjie Liu

Mingjie Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computer Vision, cond-mat.mtrl-sci, Machine Learning, cond-mat.mes-hall, Emerging Technologies, physics.chem-ph, Computation and Language, physics.optics

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

Open-Vocabulary Object Detection (OVD) models break the limitations of closed-set detection, enabling the identification of unseen categories through natural language prompts. However, they exhibit notable limitations in fine-grained detection tasks involving attributes like color, material, and texture. We attribute this performance bottleneck in OVD models to a core issue: when category signals dominate, OVD models tend to marginalize attribute information during inference. This leads to incorrect binding between attributes and target objects. To address this, we propose the Dual-Stage Attribute Activation (DSAA) framework, which enhances fine-grained detection capabilities by strengthening attribute semantics at two critical stages. In the text embedding stage, we employ an Attribute Prefix Adapter (APA) module to generate attribute prefixes that inject explicit attribute priors. To further amplify the influence of these attributes, our Key/Value (K/V) Modulator module then intervenes during the BERT encoding phase, selectively enhancing the Key and Value vectors of the corresponding attribute tokens. In addition, we introduce an attribute-aware contrastive loss to improve discrimination among same-category instances with different attributes during training. Experimental results on the FG-OVD benchmark demonstrate the effectiveness of our method across various mainstream open-vocabulary models.

preprint · 2026 · arXiv

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL) pipelines have begun incorporating multiple rewards, each capturing a distinct preference, to guide models toward these desired behaviors. However, recent work has defaulted to applying Group Relative Policy Optimization (GRPO) in the multi-reward setting without examining its suitability. In this paper, we demonstrate that directly applying GRPO to normalize distinct rollout reward combinations causes them to collapse into identical advantage values, reducing the resolution of the training signal and resulting in suboptimal convergence and, in some cases, early training failure. We then introduce Group reward-Decoupled Normalization Policy Optimization (GDPO), a new policy optimization method that resolves these issues by decoupling the normalization of individual rewards, more faithfully preserving their relative differences and enabling more accurate multi-reward optimization, along with substantially improved training stability. We compare GDPO with GRPO across three tasks: tool calling, math reasoning, and coding reasoning, evaluating both correctness metrics (accuracy, bug ratio) and constraint adherence metrics (format, length). Across all settings, GDPO consistently outperforms GRPO, demonstrating its effectiveness and generalizability for multi-reward reinforcement learning optimization.
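The collapse described in this abstract can be illustrated with a toy group of rollouts. The sketch below is not the authors' implementation: the function names, the two example rewards, and the choice to simply sum the per-reward normalized values are assumptions made for illustration, based only on the abstract's description of coupled versus decoupled normalization.

```python
from statistics import mean, pstdev


def zscore(values, eps=1e-8):
    """Normalize values within one rollout group: (x - mean) / (std + eps)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def coupled_advantages(rewards):
    """GRPO-style (assumed form): sum the rewards per rollout, then normalize the sums."""
    totals = [sum(per_rollout) for per_rollout in zip(*rewards)]
    return zscore(totals)


def decoupled_advantages(rewards):
    """Decoupled (assumed form): normalize each reward across the group, then sum."""
    normalized = [zscore(r) for r in rewards]
    return [sum(per_rollout) for per_rollout in zip(*normalized)]


# Hypothetical rewards: rewards[k][i] is reward k for rollout i.
correctness = [1.0, 0.0, 0.0, 0.0]
format_ok = [0.0, 1.0, 1.0, 0.0]
rewards = [correctness, format_ok]

grpo = coupled_advantages(rewards)
gdpo = decoupled_advantages(rewards)

print("coupled:  ", [round(a, 3) for a in grpo])
print("decoupled:", [round(a, 3) for a in gdpo])
```

In this toy group, rollouts 0, 1, and 2 all have the same summed reward, so the coupled normalization assigns them the same advantage even though rollout 0 earned the rarer correctness reward. Normalizing each reward separately keeps rollout 0 distinct from rollouts 1 and 2.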

preprint · 2026 · arXiv

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

Conventional vision-language models (VLMs) struggle to interpret scenes captured under adverse conditions (e.g., low light, high dynamic range, or fast motion) because standard RGB images degrade in such environments. Event cameras provide a complementary modality: they asynchronously record per-pixel brightness changes with high temporal resolution and wide dynamic range, preserving motion cues where frames fail. We propose RE-VLM, the first dual-stream vision-language model that jointly leverages RGB images and event streams for robust scene understanding across both normal and challenging conditions. RE-VLM employs parallel RGB and event encoders together with a progressive training strategy that aligns heterogeneous visual features with language. To address the scarcity of RGB-Event-Text supervision, we further propose a graph-driven pipeline that converts synchronized RGB-Event streams into verifiable scene graphs, from which we synthesize captions and question-answer (QA) pairs. To develop and evaluate RE-VLM, we construct two datasets: PEOD-Chat, targeting illumination-challenged scenes, and RGBE-Chat, covering diverse scenarios. On captioning and VQA benchmarks, RE-VLM consistently outperforms state-of-the-art RGB-only and event-only models with comparable parameter counts, with particularly large gains under challenging conditions. These results demonstrate the effectiveness of event-augmented VLMs in achieving robust vision-language understanding across a wide range of real-world environments. Code and datasets are available at https://github.com/bupt-ai-cz/RE-VLM.

preprint · 2022 · arXiv

ADEPT: Automatic Differentiable DEsign of Photonic Tensor Cores

Photonic tensor cores (PTCs) are essential building blocks for optical artificial intelligence (AI) accelerators based on programmable photonic integrated circuits. PTCs can achieve ultra-fast and efficient tensor operations for neural network (NN) acceleration. Current PTC designs are either manually constructed or based on matrix decomposition theory, which lacks the adaptability to meet various hardware constraints and device specifications. To the best of our knowledge, automatic PTC design methodology is still unexplored. It will be promising to move beyond the manual design paradigm and "nurture" photonic neurocomputing with AI and design automation. Therefore, in this work, for the first time, we propose a fully differentiable framework, dubbed ADEPT, that can efficiently search for PTC designs adapted to various circuit footprint constraints and foundry PDKs. Extensive experiments show the superior flexibility and effectiveness of the proposed ADEPT framework in exploring a large PTC design space. On various NN models and benchmarks, our searched PTC topology outperforms prior manually designed structures with competitive matrix representability, 2-30x higher footprint compactness, and better noise robustness, demonstrating a new paradigm in photonic neural chip design. The code of ADEPT is available at https://github.com/JeremieMelo/ADEPT using the https://github.com/JeremieMelo/pytorch-onn (TorchONN) library.

preprint · 2022 · arXiv

Delving into Effective Gradient Matching for Dataset Condensation

As deep learning models and datasets rapidly scale up, network training becomes extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution. Extensive research has been conducted on dataset condensation, among which gradient matching achieves state-of-the-art performance. The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets. However, there have been few in-depth investigations into the principle and effectiveness of this method. In this work, we delve into the gradient matching method from a comprehensive perspective and answer the critical questions of what, how, and where to match. We propose to match multi-level gradients to involve both intra-class and inter-class gradient information. We demonstrate that the distance function should focus on the angle while also considering the magnitude, in order to delay overfitting. An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps and improve algorithmic efficiency. Ablation and comparison experiments demonstrate that our proposed methodology shows superior accuracy, efficiency, and generalization compared to prior work.
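One way to read the abstract's point about the distance function is sketched below. The abstract does not give the formula, so the specific combination here (a cosine term plus a Euclidean term, with a hypothetical `magnitude_weight` parameter) is an assumption for illustration, not the paper's definition.

```python
import math


def cosine_distance(g_real, g_syn, eps=1e-12):
    """Angle-only distance: 1 - cos(angle between the two gradient vectors)."""
    dot = sum(a * b for a, b in zip(g_real, g_syn))
    norm_real = math.sqrt(sum(a * a for a in g_real))
    norm_syn = math.sqrt(sum(b * b for b in g_syn))
    return 1.0 - dot / (norm_real * norm_syn + eps)


def angle_magnitude_distance(g_real, g_syn, magnitude_weight=1.0):
    """Assumed combined distance: cosine term plus a weighted Euclidean term."""
    euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_real, g_syn)))
    return cosine_distance(g_real, g_syn) + magnitude_weight * euclidean


# Two gradients that point the same way but differ in scale by a factor of 3.
g_real = [1.0, 2.0, 2.0]
g_syn = [3.0, 6.0, 6.0]

angle_only = cosine_distance(g_real, g_syn)
combined = angle_magnitude_distance(g_real, g_syn)

print("angle only:", angle_only)
print("combined:  ", combined)
```

An angle-only distance cannot tell these two gradients apart because they are parallel, while the combined form still penalizes the difference in scale.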

preprint · 2022 · arXiv

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the typical condition, limited research has been done on exploring robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address the challenge, we present RobustAnalog, a robust circuit design framework that incorporates the variation information into the optimization process. Specifically, circuit optimizations under different variations are treated as a set of tasks. Similarities among tasks are leveraged and competition among them is alleviated to realize sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further reduction in simulation cost. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g., gain, bandwidth, and noise) across variations. We compare RobustAnalog with Bayesian optimization, an evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can reduce the required optimization time by 14-30 times. Therefore, our study provides a feasible method to handle various real silicon conditions.

preprint · 2013 · arXiv

Carbyne from first principles: Chain of C atoms, a nanorod or a nanorope?

We report an extensive study of the properties of carbyne using first-principles calculations. We investigate carbyne's mechanical response to tension, bending, and torsion deformations. Under tension, carbyne is about twice as stiff as the stiffest known materials and has an unrivaled specific strength of up to 7.5×10^7 N·m/kg, requiring a force of ~10 nN to break a single atomic chain. Carbyne has a fairly large room-temperature persistence length of about 14 nm. Surprisingly, the torsional stiffness of carbyne can be zero but can be 'switched on' by appropriate functional groups at the ends. Further, under appropriate termination, carbyne can be switched into a magnetic-semiconductor state by mechanical twisting. We reconstruct the equivalent continuum-elasticity representation, providing the full set of elastic moduli for carbyne, showing its extreme mechanical performance (e.g. a nominal Young's modulus of 32.7 TPa with an effective mechanical thickness of 0.772 Å). We also find an interesting coupling between strain and band gap of carbyne, which is strongly increased under tension, from 3.2 to 4.4 eV under a 10% strain. Finally, we study the performance of carbyne as a nanoscale electrical cable, and estimate its chemical stability against self-aggregation, finding an activation barrier of 0.6 eV for the carbyne-carbyne cross-linking reaction and an equilibrium cross-link density for two parallel carbyne chains of 1 cross-link per 17 C atoms (2.2 nm).

preprint · 2013 · arXiv

Feasibility of Lithium Storage on Graphene and Its Derivatives

Nanomaterials are anticipated to be promising storage media, owing to their high surface-to-mass ratio. The high hydrogen capacity achieved by using graphene has reinforced this opinion and motivated investigations of the possibility of using it to store another important energy carrier - lithium (Li). While first-principles computations show that the Li capacity of pristine graphene, limited by Li clustering and phase separation, is lower than that offered by Li intercalation in graphite, we explore the feasibility of modifying graphene for better Li storage. It is found that certain structural defects in graphene can bind Li stably, yet a more efficacious approach is substitutional doping with boron (B). In particular, the layered C3B compound stands out as a promising Li storage medium. The monolayer C3B has a capacity of 714 mAh/g (as Li1.25C3B), and the capacity of stacked C3B is 857 mAh/g (as Li1.5C3B), which is about twice as large as graphite's 372 mAh/g (as LiC6). Our results help clarify the mechanism of Li storage in low-dimensional materials, and shed light on the rational design of nano-architectures for energy storage.
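The capacities quoted in this abstract can be cross-checked from the stoichiometries with the standard gravimetric capacity relation, capacity = n·F / (3.6·M), where n is the number of Li per formula unit, F is the Faraday constant, and M is the molar mass of the host. The sketch below assumes the capacity is taken per gram of host material without the stored Li (the usual convention behind graphite's 372 mAh/g) and uses rounded standard atomic masses; the function name is illustrative.

```python
FARADAY_C_PER_MOL = 96485.332  # Faraday constant, C/mol
COULOMB_PER_MAH = 3.6          # 1 mAh = 3.6 C
ATOMIC_MASS = {"C": 12.011, "B": 10.81}  # g/mol, rounded standard values


def theoretical_capacity_mah_per_g(li_per_formula, host):
    """Gravimetric capacity of a host that stores `li_per_formula` Li per formula unit.

    `host` maps element symbols to atom counts, e.g. {"C": 3, "B": 1} for C3B.
    The mass of the stored Li is not included (assumed convention).
    """
    host_mass = sum(ATOMIC_MASS[element] * count for element, count in host.items())
    return li_per_formula * FARADAY_C_PER_MOL / (COULOMB_PER_MAH * host_mass)


monolayer_c3b = theoretical_capacity_mah_per_g(1.25, {"C": 3, "B": 1})  # Li1.25C3B
stacked_c3b = theoretical_capacity_mah_per_g(1.5, {"C": 3, "B": 1})     # Li1.5C3B
graphite = theoretical_capacity_mah_per_g(1.0, {"C": 6})                # LiC6

for name, value in [("monolayer C3B", monolayer_c3b),
                    ("stacked C3B", stacked_c3b),
                    ("graphite", graphite)]:
    print(f"{name}: {value:.1f} mAh/g")
```

Under these assumptions the computed values agree with the quoted 714, 857, and 372 mAh/g to within about 2 mAh/g; the small differences come from rounding of the atomic masses and constants.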

preprint · 2013 · arXiv

Mechanically induced metal-insulator transition in carbyne

First-principles calculations for carbyne under strain predict that the Peierls transition from symmetric cumulene to broken-symmetry polyyne structure is enhanced as the material is stretched. Interpretation within a simple and instructive analytical model suggests that this behavior is valid for arbitrary 1D metals. Further, numerical calculations of the anharmonic quantum vibrational structure of carbyne show that zero-point atomic vibrations alone eliminate the Peierls distortion in a mechanically free chain, preserving the cumulene symmetry. The emergence and increase of Peierls dimerization under tension then implies a qualitative transition between the two forms, which our computations place around 3% strain. Thus, zero-point vibrations and mechanical strain jointly produce a change in symmetry resulting in the transition from metallic to insulating state. In any practical realization, it is important that the effect is also chemically modulated by the choice of terminating groups. Our findings are promising for applications such as electromechanical switching and band gap tuning via strain, and besides carbyne itself, they directly extend to numerous other systems that show Peierls distortion.
