Source author record

Shizhao Sun

Shizhao Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Human-Computer Interaction, Machine Learning, Robotics

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop

A Computer-Aided Design (CAD) model encodes an object in two coupled forms: a parametric construction sequence and its resulting visible geometric shape. During iterative design, adjustments to the geometric shape inevitably require synchronized edits to the underlying parametric sequence, called geometry-driven parametric CAD editing. The task calls for 1) preserving the original sequence's structure, 2) ensuring each edit's semantic validity, and 3) maintaining high shape fidelity to the target shape, all under scarce editing data triplets. We present CADMorph, an iterative plan-generate-verify framework that orchestrates pretrained domain-specific foundation models during inference: a parameter-to-shape (P2S) latent diffusion model and a masked-parameter-prediction (MPP) model. In the planning stage, cross-attention maps from the P2S model pinpoint the segments that need modification and offer editing masks. The MPP model then infills these masks with semantically valid edits in the generation stage. During verification, the P2S model embeds each candidate sequence in shape-latent space, measures its distance to the target shape, and selects the closest one. The three stages leverage the inherent geometric consciousness and design knowledge in pretrained priors, and thus tackle structure preservation, semantic validity, and shape fidelity respectively. Besides, both P2S and MPP models are trained without triplet data, bypassing the data-scarcity bottleneck. CADMorph surpasses GPT-4o and specialized CAD baselines, and supports downstream applications such as iterative editing and reverse-engineering enhancement.
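The control flow of a plan-generate-verify loop can be sketched in a few lines. This is only an illustration under stated assumptions, not the CADMorph implementation: the function `plan_generate_verify` and the toy stand-ins `toy_plan`, `toy_generate`, and `toy_embed` are hypothetical. In the abstract, planning uses cross-attention maps from the P2S model, generation uses the MPP model, and verification measures distance in the P2S shape-latent space; here each is replaced by a trivial numeric placeholder.

```python
from typing import Callable, List, Sequence

Params = List[float]


def plan_generate_verify(
    params: Params,
    target_latent: Sequence[float],
    plan: Callable[[Params, Sequence[float]], List[int]],
    generate: Callable[[Params, List[int]], List[Params]],
    embed: Callable[[Params], Sequence[float]],
    rounds: int = 3,
) -> Params:
    """Hypothetical skeleton: mask, infill, keep the candidate closest to the target."""

    def dist(p: Params) -> float:
        z = embed(p)
        return sum((a - b) ** 2 for a, b in zip(z, target_latent)) ** 0.5

    best = list(params)
    for _ in range(rounds):
        mask = plan(best, target_latent)       # plan: which positions to edit
        if not mask:
            break
        candidates = generate(best, mask)      # generate: infill masked positions
        challenger = min(candidates + [best], key=dist)  # verify: closest in latent space
        if dist(challenger) >= dist(best):
            break                              # no candidate improves on the current sequence
        best = challenger
    return best


# Toy stand-ins (assumptions for illustration, not the pretrained models).
def toy_embed(p: Params) -> Sequence[float]:
    return p  # identity "latent space"


def toy_plan(p: Params, target: Sequence[float]) -> List[int]:
    errs = [abs(a - b) for a, b in zip(p, target)]
    worst = max(range(len(p)), key=errs.__getitem__)
    return [worst] if errs[worst] > 1e-9 else []


def toy_generate(p: Params, mask: List[int]) -> List[Params]:
    out = []
    for delta in (-1.0, -0.5, 0.5, 1.0):
        c = list(p)
        for i in mask:
            c[i] += delta
        out.append(c)
    return out


edited = plan_generate_verify(
    [1.0, 2.0, 3.0], [1.0, 3.0, 3.5], toy_plan, toy_generate, toy_embed
)
```

Only the masked positions change in each round, which mirrors the structure-preservation goal; the unmasked parameters are copied through untouched.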

Preprint, 2026, arXiv

Orchestrating Spatial Semantics via a Zone-Graph Paradigm for Intricate Indoor Scene Generation

Autonomous 3D indoor scene synthesis breaks down in non-convex rooms with tightly coupled spatial constraints. Data-driven generators lack topological priors for long-horizon planning, while iterative agents fragment semantics and become geometrically brittle. We present ZoneMaestro, a unified framework that shifts the paradigm from object-centric synthesis to Zone-Graph Orchestration. By internalizing a novel zone-based logic, ZoneMaestro translates high-level semantic intent into functional zones and topological constraints, enabling robust adaptation to diverse architectural forms. To support this, we construct Zone-Scene-10K, a large-scale dataset enriched with explicit Zone-Graph annotations. We further introduce an Alternating Alignment Strategy that cycles between reasoning internalization and Zone-Aware Group Relative Policy Optimization (Z-GRPO), effectively reconciling the tension between semantic richness and geometric validity without relying on external physics engines. To rigorously evaluate spatial intelligence beyond convex primitives, we formally define the task of Intricate Spatial Orchestration and release SCALE, a stress-test benchmark for irregular indoor scenarios with complex, dense spatial relations. Extensive experiments demonstrate that ZoneMaestro resolves the density-safety dichotomy, significantly outperforming state-of-the-art baselines in both structural coherence and intent adherence.
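For readers unfamiliar with Group Relative Policy Optimization, the group-relative advantage that standard GRPO computes can be sketched as below. This shows only the generic normalization step; the zone-aware part of Z-GRPO is not described in the abstract and is not modeled here. The function name `group_relative_advantages`, the use of the population standard deviation, and the example reward values are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each sampled output's reward against its own group.

    Generic GRPO-style step: advantage_i = (r_i - mean) / (std + eps).
    Population std is an assumption; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for three scenes sampled from the same prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Outputs that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to supply a baseline.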

Preprint, 2026, arXiv

Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation

Spatial reasoning -- the ability to perceive and reason about relationships in space -- advances vision-language models (VLMs) from visual perception toward spatial semantic understanding. Existing approaches either revisit local image patches, improving fine-grained perception but weakening global spatial awareness, or mark isolated coordinates, which capture object locations but overlook their overall organization. In this work, we integrate the cognitive concept of an object-centric blueprint into VLMs to enhance spatial reasoning. Given an image and a question, the model first constructs a JSON-style blueprint that records the positions, sizes, and attributes of relevant objects, and then reasons over this structured representation to produce the final answer. To achieve this, we introduce three key techniques: (1) blueprint-embedded reasoning traces for supervised fine-tuning to elicit basic reasoning skills; (2) blueprint-aware rewards in reinforcement learning to encourage the blueprint to include an appropriate number of objects and to align final answers with this causal reasoning; and (3) anti-shortcut data augmentation that applies targeted perturbations to images and questions, discouraging reliance on superficial visual or linguistic cues. Experiments show that our method consistently outperforms existing VLMs and specialized spatial reasoning models.
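A JSON-style blueprint of the kind described above might look like the following sketch. The schema is an assumption: the field names (`objects`, `name`, `x`, `y`, `w`, `h`, `attributes`), the example objects, and the helper `is_left_of` are hypothetical, since the abstract only says that the blueprint records positions, sizes, and attributes of relevant objects.

```python
import json

# Hypothetical blueprint for an image; coordinates are pixels, (x, y) is the top-left corner.
blueprint_json = """
{"objects": [
  {"name": "mug", "x": 40, "y": 120, "w": 30, "h": 35, "attributes": ["red"]},
  {"name": "laptop", "x": 110, "y": 90, "w": 160, "h": 100, "attributes": ["open"]}
]}
"""


def center(obj: dict) -> tuple:
    """Center point of an object's bounding box."""
    return (obj["x"] + obj["w"] / 2, obj["y"] + obj["h"] / 2)


def is_left_of(blueprint: dict, a: str, b: str) -> bool:
    """Answer a simple spatial question by reasoning over the structured record."""
    objs = {o["name"]: o for o in blueprint["objects"]}
    return center(objs[a])[0] < center(objs[b])[0]


blueprint = json.loads(blueprint_json)
answer = is_left_of(blueprint, "mug", "laptop")
```

Once the scene is in this form, a relation such as "left of" reduces to a comparison over recorded coordinates rather than a second look at image patches.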

Preprint, 2020, arXiv

Retrieve-Then-Adapt: Example-based Automatic Generation for Proportion-related Infographics

An infographic is a data visualization technique that combines graphic and textual descriptions in an aesthetic and effective manner. Creating infographics is a difficult and time-consuming process which often requires significant attempts and adjustments even for experienced designers, not to mention novice users with limited design expertise. Recently, a few approaches have been proposed to automate the creation process by applying predefined blueprints to user information. However, predefined blueprints are often hard to create, hence limited in volume and diversity. In contrast, good infographics have been created by professionals and accumulated on the Internet rapidly. These online examples often represent a wide variety of design styles, and serve as exemplars or inspiration to people who like to create their own infographics. Based on these observations, we propose to generate infographics by automatically imitating examples. We present a two-stage approach, namely retrieve-then-adapt. In the retrieval stage, we index online examples by their visual elements. For given user information, we transform it to a concrete query by sampling from a learned distribution about visual elements, and then find appropriate examples in our example library based on the similarity between example indexes and the query. For a retrieved example, we generate an initial draft by replacing its content with user information. However, in many cases, user information cannot be perfectly fitted to retrieved examples. Therefore, we further introduce an adaptation stage. Specifically, we propose an MCMC-like approach and leverage recursive neural networks to help adjust the initial draft and improve its visual appearance iteratively, until a satisfactory result is obtained. We implement our approach on proportion-related infographics, and demonstrate its effectiveness by sample results and expert reviews.
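The two stages can be illustrated with a minimal sketch. Everything concrete here is an assumption: the example library, the index vectors, the cosine similarity measure, the Metropolis-style acceptance rule, and the hand-written `toy_score` function are hypothetical placeholders. In the paper, the query is sampled from a learned distribution and the draft is evaluated with the help of recursive neural networks, neither of which is reproduced below.

```python
import math
import random
from typing import Callable, Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: List[float], library: Dict[str, List[float]]) -> str:
    """Retrieval stage: pick the example whose index is most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))


def adapt(
    layout: List[float],
    score: Callable[[List[float]], float],
    steps: int = 200,
    temperature: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Adaptation stage: MCMC-like random perturbations, keeping the best draft seen."""
    rng = random.Random(seed)
    current, best = list(layout), list(layout)
    for _ in range(steps):
        proposal = list(current)
        i = rng.randrange(len(proposal))
        proposal[i] += rng.uniform(-0.1, 0.1)
        delta = score(proposal) - score(current)
        # Always accept improvements; accept worse drafts with a small probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = proposal
            if score(current) > score(best):
                best = list(current)
    return best


# Hypothetical example library indexed by made-up visual-element features.
library = {"pie": [1.0, 0.0, 0.0], "bar": [0.0, 1.0, 0.0], "pictograph": [0.7, 0.0, 0.7]}
chosen = retrieve([0.9, 0.1, 0.2], library)


def toy_score(layout: List[float]) -> float:
    # Stand-in for a learned visual-quality score: prefer positions near (0.5, 0.5).
    return -sum((x - 0.5) ** 2 for x in layout)


draft = [0.2, 0.9]
adapted = adapt(draft, toy_score)
```

Occasionally accepting a worse draft lets the search escape local optima, while tracking `best` separately guarantees the returned draft never scores below the initial one.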

Preprint, 2015, arXiv

On the Depth of Deep Neural Networks: A Theoretical View

People believe that depth plays an important role in the success of deep neural networks (DNNs). However, this belief lacks solid theoretical justification, as far as we know. We investigate the role of depth from the perspective of the margin bound. In the margin bound, the expected error is upper bounded by the empirical margin error plus a Rademacher Average (RA) based capacity term. First, we derive an upper bound for the RA of a DNN and show that it increases with depth. This indicates a negative impact of depth on test performance. Second, we show that deeper networks tend to have larger representation power (measured by Betti-number-based complexity) than shallower networks in the multi-class setting, and thus can lead to a smaller empirical margin error. This implies a positive impact of depth. The combination of these two results shows that for a DNN with a restricted number of hidden units, increasing depth is not always good, since there is a tradeoff between the positive and negative impacts. These results inspire us to seek alternative ways to achieve the positive impact of depth, e.g., imposing margin-based penalty terms on the cross-entropy loss so as to reduce the empirical margin error without increasing depth. Our experiments show that in this way we achieve significantly better test performance.
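One possible form of a margin-based penalty added to the cross-entropy loss is sketched below. The abstract does not give the exact penalty used in the paper, so the hinge on the multi-class margin, the function names, and the hyperparameters `gamma` and `lam` are all assumptions chosen for illustration.

```python
import math
from typing import List


def softmax_cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of softmax(logits) against the true label, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multiclass_margin(logits: List[float], label: int) -> float:
    """Score of the true class minus the highest score among the other classes."""
    runner_up = max(z for k, z in enumerate(logits) if k != label)
    return logits[label] - runner_up


def loss_with_margin_penalty(
    logits: List[float], label: int, gamma: float = 1.0, lam: float = 0.5
) -> float:
    """Cross-entropy plus a hinge penalty whenever the margin falls below gamma.

    The hinge form is an assumed example of a margin-based penalty,
    not necessarily the one used in the paper.
    """
    penalty = max(0.0, gamma - multiclass_margin(logits, label))
    return softmax_cross_entropy(logits, label) + lam * penalty


confident = loss_with_margin_penalty([2.0, 0.5, 0.1], label=0)   # margin 1.5, no penalty
borderline = loss_with_margin_penalty([1.0, 0.8, 0.1], label=0)  # margin 0.2, penalized
```

A correctly classified example with a small margin still incurs a penalty, which pushes the network toward a smaller empirical margin error without adding layers.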
