Researcher profile

Shan Zhao

Shan Zhao contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
8 topics
4 close collaborators


Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.




Published work

8 published items

Preprint | 2026 | arXiv

ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking

Parking is a critical task for autonomous driving systems (ADS), with unique challenges posed by crowded parking slots and GPS-denied environments. However, existing works focus on 2D parking slot perception, mapping, and localization, while 3D reconstruction, which is crucial for capturing the complex spatial geometry of parking scenarios, remains underexplored. Naively improving the visual quality of reconstructed parking scenes does not directly benefit autonomous parking, because the key entry point for parking is the slot perception module. To address these limitations, we curate ParkRecon3D, the first benchmark specifically designed for parking scene reconstruction. It includes sensor data from four surround-view fisheye cameras with calibrated extrinsics and dense parking slot annotations. We then propose ParkGaussian, the first framework that integrates 3D Gaussian Splatting (3DGS) for parking scene reconstruction. To further improve the alignment between reconstruction and downstream parking slot detection, we introduce a slot-aware reconstruction strategy that leverages existing parking perception methods to enhance the synthesis quality of slot regions. Experiments on ParkRecon3D demonstrate that ParkGaussian achieves state-of-the-art reconstruction quality and better preserves perception consistency for downstream tasks. The code and dataset will be released at: https://github.com/wm-research/ParkGaussian

Preprint | 2026 | arXiv

Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

This report presents a unified technical system addressing the two core capabilities of world models for autonomous driving: world representation and world generation. For world representation, we propose WorldRec, a feed-forward reconstruction architecture driven by sparse scene queries. WorldRec initializes structured queries in 3D space and uses them to aggregate cross-view, cross-temporal features, thereby naturally enforcing spatial consistency across frames and yielding compact yet high-fidelity 3D Gaussian scene representations. For world generation, we propose WorldGen, a two-stage training framework: bidirectional pretraining followed by causal fine-tuning through three progressive stages (Teacher Forcing, ODE distillation, and DMD), enabling high-quality online causal video generation in as few as 4 denoising steps. Building on both modules, we further introduce the Joint World Model (JWM), which deeply integrates WorldRec and WorldGen to achieve synergistic gains in generation stability, cross-frame consistency, and visual fidelity, providing a solid foundation for closed-loop simulation, data synthesis, and end-to-end training in autonomous driving.

Preprint | 2022 | arXiv

Reiterative Domain Aware Multi-Target Adaptation

Most domain adaptation methods focus on single-source-single-target adaptation settings. Multi-target domain adaptation is a powerful extension in which a single classifier is learned for multiple unlabeled target domains. Building a multi-target classifier requires two things: a feature extractor that generalizes well across domains, and effective aggregation of features from the labeled source and the different unlabeled target domains. For the first, we use the recently popular Transformer as the feature extraction backbone. For the second, we adopt a co-teaching-based approach with a dual-classifier head, one of whose classifiers is based on a graph neural network. The proposed approach uses a sequential adaptation strategy that adapts to one domain at a time, starting from the target domains that are more similar to the source, on the assumption that the network finds such target domains easier to adapt to. After adapting to each target, samples with a softmax-based confidence score above a threshold are added to the pseudo-source, thus aggregating knowledge from different domains. However, softmax is not an entirely trustworthy confidence score and may assign high scores to unreliable samples if the network is trained for many iterations. To mitigate this effect, we adopt a reiterative approach: we reduce the number of adaptation iterations per target but reiterate multiple times over the target domains. Experimental evaluation on the Office-Home, Office-31 and DomainNet datasets shows significant improvement over existing methods, including a 10.7% average improvement over state-of-the-art methods on the Office-Home dataset.

Preprint | 2022 | arXiv

Revealing sign-reversal $s^{+-}$-wave pairing by quasiparticle interference in the heavy-fermion superconductor CeCu$_2$Si$_2$

Recent observations of two nodeless gaps in superconducting CeCu$_2$Si$_2$ have raised intense debate over whether its gap structure has sign-reversal ($s^{+-}$) or sign-preserving ($s^{++}$) pairing. Here we investigate the quasiparticle interference (QPI) using a realistic Fermi surface topology for both weak and strong interband impurity scattering. Our calculations of the QPI and the integrated antisymmetrized local density of states reveal qualitative distinctions between the $s^{+-}$ and $s^{++}$ pairing states, including the intragap impurity resonance and a significant difference in energy dependence between the two gap energies. Our predictions provide a guide for phase-sensitive QPI measurements to decisively uncover the true pairing symmetry in the heavy-fermion superconductor CeCu$_2$Si$_2$.

Preprint | 2022 | arXiv

Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer

We propose a semi-supervised network for wide-angle portrait correction. Wide-angle images often suffer from skew and distortion caused by perspective distortion, which is especially noticeable in face regions. Previous deep-learning-based approaches require ground-truth correction flow maps for training guidance. However, such labels are expensive because they can only be obtained manually. In this work, we design a semi-supervised scheme and build a high-quality unlabeled dataset with rich scenarios, allowing us to use labeled and unlabeled data simultaneously to improve performance. Specifically, our semi-supervised scheme takes advantage of the consistency mechanism, with several novel components such as direction and range consistency (DRC) and regression consistency (RC). Furthermore, unlike existing methods, we propose the Multi-Scale Swin-Unet (MS-Unet), based on the multi-scale Swin Transformer block (MSTB), which can simultaneously learn short-distance and long-distance information to avoid artifacts. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods and other representative baselines. The source code and dataset are available at: https://github.com/megvii-research/Portraits_Correction.

Preprint | 2020 | arXiv

The Utility of General Domain Transfer Learning for Medical Language Tasks

The purpose of this study is to analyze the efficacy of transfer learning techniques and transformer-based models applied to medical natural language processing (NLP) tasks, specifically radiological text classification. We used 1,977 labeled head CT reports, from a corpus of 96,303 total reports, to evaluate the efficacy of pretraining on general-domain corpora and on a combined general and medical domain corpus with a bidirectional encoder representations from transformers (BERT) model for radiological text classification. Model performance was benchmarked against a logistic regression model using bag-of-words vectorization and a long short-term memory (LSTM) multi-label multi-class classification model, and compared with the published literature on medical text classification. The BERT models using either set of pretrained checkpoints outperformed the logistic regression model; the general-domain model and the combined general and biomedical-domain model each achieved a sample-weighted average F1-score of 0.87. General text transfer learning may be a viable technique for generating state-of-the-art results in medical NLP tasks on radiological corpora, outperforming other deep models such as LSTMs. The efficacy of pretraining and transformer-based models could facilitate the creation of groundbreaking NLP models in the uniquely challenging data environment of medical text.

Preprint | 2019 | arXiv

Modulation of heat transport in two-dimensional group-III chalcogenides

We systematically investigated the modulation of heat transport in experimentally accessible two-dimensional (2D) group-III chalcogenides using first-principles calculations. We found that the intrinsic thermal conductivities (kappa) of the chalcogenides MX (M = Ga, In; X = S, Se) are desirable for efficient heat dissipation. We also showed that long-range anharmonic interactions play an important role in heat transport in these chalcogenides. The differences in kappa among the 2D group-III chalcogenides are well described by the Slack model and can be mainly attributed to phonon group velocity. Based on this, we proposed three methods, namely strain engineering, size effects, and the construction of Janus structures, to effectively modulate the kappa of 2D group-III chalcogenides through different underlying mechanisms. We found that tensile strain and rough boundary scattering can continuously decrease kappa, while compressive strain can increase it. On the other hand, the change in kappa produced by forming Janus structures is permanent and depends on structural details. These results provide guidance for modulating the heat transport properties of 2D group-III chalcogenides for device applications.

Preprint | 2019 | arXiv

Theoretical study of structure and magnetism of Ga$_{1-x}$V$_x$Sb compounds for spintronic applications

In this paper, the structural, electronic and magnetic properties of zinc-blende Ga$_{1-x}$V$_x$Sb compounds, with x ranging from the dilute-doping regime to the extreme doping limit, were systematically investigated by first-principles calculations. V atoms prefer to substitute for Ga atoms, and the formation energy is lower under Sb-rich than under Ga-rich growth conditions. Meanwhile, Sb$_{\mathrm{Ga}}$ antisite defects can effectively decrease the energy barrier of the substitution process from 0.85 eV to 0.53 eV. V atoms diffuse in the GaSb lattice through metastable interstitial sites, with an energy barrier of 0.6 eV. At a low V concentration of x = 0.0625, V atoms prefer a homogeneous distribution with antiferromagnetic coupling among them. However, starting from x = 0.5, the magnetic coupling among V atoms becomes ferromagnetic, owing to enhanced superexchange interaction between the $e_g$ and $t_{2g}$ states of neighbouring V atoms. At the extreme limit of x = 1.00, we found that zinc-blende VSb, as well as its analogs VAs and VP, are intrinsic ferromagnetic semiconductors, with a large change in light absorption at the Curie temperature. These results indicate that Ga$_{1-x}$V$_x$Sb compounds can provide a platform for designing new electronic, spintronic and optoelectronic devices.