Source author record

Dexin Wang

Dexin Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence, Computer Vision, eess.SY, Systems and Control, Computation and Language, eess.SP, Machine Learning, Robotics

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators


Preprint, 2026, arXiv

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mismatch that can contribute to this instability: dominant visual-latent models build on pre-norm MLLMs and reuse decoder hidden states as predicted latent inputs, even though these states occupy a substantially different norm regime from the input embeddings the model was trained to consume [xie2025mhc, li2026siamesenorm, team2026attention]. This mismatch can make direct latent feedback unreliable. Motivated by this diagnosis, we propose GAP, a Granular Alignment Paradigm for visual latent modeling. GAP aligns visual latent reasoning at three levels: feature-level alignment maps decoder outputs into input-compatible visual latents through a lightweight PCA-aligned latent head; context-level alignment grounds latent targets with inspectable auxiliary visual supervision; and capacity-guided alignment assigns latent supervision selectively to examples where the base MLLM struggles. On Qwen2.5-VL 7B, the resulting model achieves the best mean aggregate perception and reasoning performance among our supervised variants. Inference-time intervention probing further suggests that generated latents provide task-relevant visual signal beyond merely adding token slots.
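The norm-regime mismatch described in this abstract can be illustrated with a small sketch. It is not GAP's PCA-aligned latent head, whose details the abstract does not give. It only measures the gap between the average L2 norm of input embeddings and decoder hidden states, then applies a naive rescaling as a stand-in. All function names and the toy vectors are assumptions made for illustration.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def mean_norm(vectors):
    """Average L2 norm over a collection of vectors."""
    return sum(l2_norm(v) for v in vectors) / len(vectors)

def rescale_to_norm(vec, target_norm):
    """Scale vec so its L2 norm equals target_norm (naive alignment, assumed)."""
    n = l2_norm(vec)
    if n == 0.0:
        return list(vec)
    return [x * target_norm / n for x in vec]

# Toy values (hypothetical): hidden states are about 100x larger than embeddings.
input_embeddings = [[0.3, -0.4], [0.6, 0.8]]
hidden_states = [[30.0, 40.0], [-60.0, 80.0]]

emb_norm = mean_norm(input_embeddings)
hid_norm = mean_norm(hidden_states)
ratio = hid_norm / emb_norm  # how far apart the two norm regimes are

# Bring each hidden state into the embedding norm regime before feeding it back.
rescaled = [rescale_to_norm(h, emb_norm) for h in hidden_states]
```

Rescaling only corrects magnitude. The abstract's feature-level alignment maps decoder outputs into input-compatible latents, which presumably involves direction as well, so this sketch should be read as a diagnostic aid rather than a replacement.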

Preprint, 2023, arXiv

RIS-Enabled Integrated Sensing and Communication for 6G Systems

This paper proposes a new target localization system design that uses an architecture based on reconfigurable intelligent surfaces (RISs) and passive radars (PRs) for integrated sensing and communications systems. The preamble of the communication signal is exploited to perform target sensing tasks, namely detection and localization. The RIS aids the PR in sensing targets that the PR cannot otherwise see because of the many obstacles in the propagation channel. This work therefore proposes a localization algorithm tailored to the RIS-aided integrated sensing and communications architecture that can uniquely position targets within the scene. The algorithm detects the number of targets and estimates their positions from angles and times of arrival. Our simulation results demonstrate the performance of the localization method across different localization and detection metrics and for increasing RIS sizes.
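As a rough illustration of how an angle of arrival and a time of arrival can jointly fix a position, the sketch below places a target in 2D relative to a single known anchor. It is a strong simplification and not the paper's algorithm: it assumes a one-way line-of-sight delay from the anchor to the target, a known anchor position, and noise-free measurements. The function name `locate` and its parameters are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def locate(anchor_xy, angle_rad, delay_s):
    """Return the (x, y) position at range c * delay along angle_rad from the anchor.

    Assumes a one-way delay measured from the anchor (for example the RIS)
    and an angle measured counter-clockwise from the x axis.
    """
    rng = SPEED_OF_LIGHT * delay_s
    x = anchor_xy[0] + rng * math.cos(angle_rad)
    y = anchor_xy[1] + rng * math.sin(angle_rad)
    return (x, y)

# A target straight "up" from the anchor with a 1 microsecond delay,
# which corresponds to a range of roughly 300 metres.
target = locate((0.0, 0.0), math.pi / 2, 1e-6)
```

In a bistatic setup the measured delay would cover the full transmitter-target-receiver path, so the range would have to be solved from an ellipse constraint instead of read off directly.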

Preprint, 2022, arXiv

Efficient Topology Assessment for Integrated Transmission and Distribution Network with 10,000+ Inverter-based Resources

The proliferation of renewable energy calls on grid operators and planners to systematically evaluate the potential impacts of distributed energy resources (DERs). Given the significant differences among inverter-based resources (IBRs), especially the different capabilities of grid-forming and grid-following inverters, it is crucial to develop an efficient and effective assessment procedure alongside the available co-simulation frameworks, which carry high computational burdens. This paper presents a streamlined graph-based topology assessment for integrated power system transmission and distribution networks. Graph analyses were performed on the integrated graph of a modified miniWECC grid model and the IEEE 8500-node test feeder model. A high-performance computing platform with 40 nodes and a total of 2400 CPUs was used to process this integrated graph, which has 100,000+ nodes and 10,000+ IBRs. The node ranking results not only verified the applicability of the proposed method but also revealed the potential of distributed grid-forming (GFM) and grid-following (GFL) inverters interacting with centralized power plants.
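The abstract does not say which node-ranking measure is used, so the sketch below swaps in PageRank by power iteration purely as an example of graph-based node ranking. The toy graph, its node names, and the `pagerank` helper are assumptions for illustration and do not come from the paper.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Rank nodes of a graph given as {node: [neighbours]} by power iteration."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if not neighbours:
                # Dangling node: spread its rank evenly over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
            else:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new_rank[u] += share
        rank = new_rank
    return rank

# Hypothetical miniature grid: one plant, two substations, three IBRs.
grid = {
    "plant": ["sub_a", "sub_b"],
    "sub_a": ["plant", "ibr_1", "ibr_2"],
    "sub_b": ["plant", "ibr_3"],
    "ibr_1": ["sub_a"],
    "ibr_2": ["sub_a"],
    "ibr_3": ["sub_b"],
}

scores = pagerank(grid)
ranking = sorted(scores, key=scores.get, reverse=True)
```

On a graph with 100,000+ nodes, as in the abstract, a sparse-matrix or distributed implementation would be needed. This dictionary version only shows the idea.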

Preprint, 2021, arXiv

Component Importance and Interdependence Analysis for Transmission, Distribution and Communication Systems

For critical infrastructure restoration planning and the real-time scheduling and coordination of system restoration efforts, the key to decision-making is prioritizing the critical components that are out of service during the restoration. This requires component importance analysis. While such analysis has been investigated extensively for individual systems, component importance under interdependence among transmission, distribution and communication (T&D&C) systems has not been systematically analyzed or widely adopted. In this study, we propose a component importance assessment method in the context of interdependence between T&D&C networks. Analytic methods for multilayer networks and a set of metrics are applied to assess component importance and interdependence between T&D&C networks based on their physical characteristics. The proposed methodology is further validated on integrated synthetic Illinois regional T&D&C systems. The results reveal the unique characteristics of component/node importance, which may be strongly affected by the network topology and cross-domain node mapping.

Preprint, 2020, arXiv

Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding

Visual context provides grounding information for multimodal machine translation (MMT). However, previous MMT models and probing studies on visual features suggest that visual information is less explored in MMT because it is often redundant to textual information. In this paper, we propose an object-level visual context modeling framework (OVC) to efficiently capture and explore visual information for multimodal machine translation. With detected objects, the proposed OVC encourages MMT to ground translation on desirable visual objects by masking irrelevant objects in the visual modality. We equip the proposed OVC with an additional object-masking loss to achieve this goal. The object-masking loss is estimated according to the similarity between masked objects and the source texts so as to encourage masking source-irrelevant objects. Additionally, to generate vision-consistent target words, we further propose a vision-weighted translation loss for OVC. Experiments on MMT datasets demonstrate that the proposed OVC model outperforms state-of-the-art MMT models, and analyses show that masking irrelevant objects helps grounding in MMT.
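The idea of masking source-irrelevant objects can be sketched as a similarity test between each object feature and a source-text feature. This shows only a selection step, not the object-masking loss itself, which the abstract does not specify. The cosine measure, the fixed threshold, the zero-vector masking, and all names here are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mask_irrelevant(objects, text_vec, threshold=0.5):
    """Zero out object features whose similarity to the text falls below threshold."""
    result = {}
    for name, vec in objects.items():
        if cosine(vec, text_vec) >= threshold:
            result[name] = list(vec)           # relevant: keep the feature
        else:
            result[name] = [0.0] * len(vec)    # irrelevant: mask it
    return result

# Hypothetical 2-D features: the source sentence mentions a dog, not a lamp.
text_vec = [1.0, 0.0]
objects = {"dog": [0.9, 0.1], "lamp": [0.0, 1.0]}
masked = mask_irrelevant(objects, text_vec)
```

A trained model would learn which objects to mask through the loss instead of a hard threshold. The threshold here just makes the relevant versus irrelevant split visible.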

Preprint, 2020, arXiv

SGDN: Segmentation-Based Grasp Detection Network For Unsymmetrical Three-Finger Gripper

In this paper, we present the Segmentation-Based Grasp Detection Network (SGDN) to predict a feasible robotic grasp for an unsymmetrical three-finger robotic gripper using RGB images. The feasible grasp of a target should be a collection of grasp regions with the same grasp angle and width. In other words, a simplified planar grasp representation should be pixel-level rather than region-level, as in the five-dimensional grasp representation. Therefore, we propose a pixel-level grasp representation, the oriented base-fixed triangle. It is also more suitable for an unsymmetrical three-finger gripper, which cannot grasp symmetrically when grasping some objects: its grasp angle lies in [0, 2π) instead of the [0, π) of a parallel-plate gripper. To predict the appropriate grasp region and its corresponding grasp angle and width in the RGB image, SGDN uses DeepLabv3+ as a feature extractor and a three-channel grasp predictor to predict a feasible oriented base-fixed triangle grasp representation for each pixel. On the re-annotated Cornell Grasp Dataset, our model achieves accuracies of 96.8% and 92.27% on the image-wise and object-wise splits, respectively, and obtains accurate predictions consistent with state-of-the-art methods.
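The difference between the [0, π) and [0, 2π) angle ranges can be made concrete with a periodic angle error. For a parallel-plate gripper, a grasp rotated by π is the same grasp, so errors wrap with period π. For an unsymmetrical gripper the rotated grasp is different, so errors wrap with period 2π. The helper `angle_error` below is a hypothetical illustration and is not taken from the paper's evaluation code.

```python
import math

def angle_error(pred, truth, period):
    """Smallest absolute difference between two angles under the given period."""
    diff = abs(pred - truth) % period
    return min(diff, period - diff)

pred = math.radians(10)
truth = math.radians(190)  # the same axis, but pointing the opposite way

# Parallel-plate gripper: 10 and 190 degrees describe the same grasp.
err_parallel = angle_error(pred, truth, math.pi)

# Unsymmetrical three-finger gripper: the two grasps are opposite.
err_three_finger = angle_error(pred, truth, 2 * math.pi)
```

Here `err_parallel` is approximately zero while `err_three_finger` is approximately π, which is why a representation restricted to [0, π) cannot distinguish the two three-finger grasps.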
