Researcher profile

Hui Zhou

Hui Zhou contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
15 works
0 followers
11 topics
4 close collaborators

Published work

15 published item(s)

preprint · 2026 · arXiv

PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography

Commercial-grade poster design demands the seamless integration of aesthetic appeal with precise, informative content delivery. Current automated poster generation systems face significant limitations, including incomplete design workflows, poor text rendering accuracy, and insufficient flexibility for commercial applications. To address these challenges, we propose PosterVerse, a full-workflow, commercial-grade poster generation method that seamlessly automates the entire design process while delivering high-density and scalable text rendering. PosterVerse replicates professional design through three key stages: (1) blueprint creation using fine-tuned LLMs to extract key design elements from user requirements, (2) graphical background generation via customized diffusion models to create visually appealing imagery, and (3) unified layout-text rendering with an MLLM-powered HTML engine to guarantee high text accuracy and flexible customization. In addition, we introduce PosterDNA, a commercial-grade, HTML-based dataset tailored for training and validating poster design models. To the best of our knowledge, PosterDNA is the first Chinese poster generation dataset to introduce HTML typography files, enabling scalable text rendering and fundamentally solving the challenges of rendering small and high-density text. Experimental results demonstrate that PosterVerse consistently produces commercial-grade posters with appealing visuals, accurate text alignment, and customizable layouts, making it a promising solution for automating commercial poster design. The code and model are available at https://github.com/wuhaer/PosterVerse.

preprint · 2026 · arXiv

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism schemes or risk drifting away from a failure-free training trajectory. We propose ReCoVer, a resilient LLM pre-training system that upholds a single invariant: each iteration keeps the number of microbatches constant, ensuring per-iteration gradients remain stochastically equivalent to a failure-free run. The framework is organized as three decoupled protocol layers: (1) fault-tolerant collectives that prevent faults from propagating across replicas; (2) in-step fine-grained recovery that preserves intra-iteration progress and prevents gradient corruption; (3) a versatile-workload policy that dynamically redistributes microbatch quotas across the survivors. The design is parallelism-agnostic, integrating directly with both 3D parallelism and Hybrid Sharded Data Parallel (HSDP) as a drop-in substrate. We evaluate our implementation on end-to-end pre-training tasks with up to 512 GPUs, where ReCoVer preserves the training trajectory of a failure-free reference despite 256 GPUs being lost over the course of the run. Compared with checkpoint-and-restart baselines, ReCoVer demonstrates $2.23\times$ higher effective throughput after successive failures. This advantage results in ReCoVer processing 74.9% more tokens at 234 GPU-hours, with the gap widening as training continues.
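
The invariant and the workload policy in item (3) can be illustrated with a small sketch. This is not ReCoVer's actual policy, which the abstract does not spell out: the function name `redistribute_quotas` and the even-split rule are assumptions made purely for illustration. The sketch only demonstrates the stated invariant, namely that the total number of microbatches per iteration stays constant when replicas are lost.

```python
def redistribute_quotas(total_microbatches, survivors):
    """Split a fixed microbatch budget as evenly as possible across survivors.

    Hypothetical helper: the even-split rule is an assumption, not the
    policy described in the paper.
    """
    survivors = sorted(survivors)
    if not survivors:
        raise ValueError("no surviving replicas to assign work to")
    base, extra = divmod(total_microbatches, len(survivors))
    # The first `extra` survivors (in sorted order) take one additional microbatch.
    return {
        replica: base + (1 if i < extra else 0)
        for i, replica in enumerate(survivors)
    }


# Eight replicas share 32 microbatches; replicas 2, 5 and 7 then fail.
before = redistribute_quotas(32, range(8))
after = redistribute_quotas(32, [0, 1, 3, 4, 6])
```

Both `before` and `after` sum to the same per-iteration budget; only the per-replica share changes.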

preprint · 2022 · arXiv

Analyzing Novel Grant-Based and Grant-Free Access Schemes for Small Data Transmission

Fifth Generation (5G) New Radio (NR) does not support data transmission during random access (RA) procedures, which results in unnecessary control signalling overhead and power consumption, especially for small data transmission (SDT). Motivated by this, 3GPP has proposed 4/2-step SDT RA schemes based on the existing grant-based (4-step) and grant-free (2-step) RA schemes, with the aim of enabling data transmission during RA procedures in the Radio Resource Control (RRC) Inactive state. To compare the 4/2-step SDT RA schemes with the benchmark 4/2-step RA schemes, we provide a spatio-temporal analytical framework to evaluate the RA schemes, which jointly models the preamble detection, Physical Uplink Shared Channel (PUSCH) decoding, and data transmission procedures. Based on this analytical model, we derive analytical expressions for the overall packet transmission success probability and average throughput in each RACH attempt. We also derive the average energy consumption in each RACH attempt. Our results show that the 2-step SDT RA scheme provides the highest overall packet transmission success probability and the lowest average energy consumption, but the performance gain decreases as device intensity increases.

preprint · 2022 · arXiv

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

With the rapid advances of autonomous driving, it becomes critical to equip its sensing system with more holistic 3D perception. However, existing works focus on parsing either the objects (e.g. cars and pedestrians) or scenes (e.g. trees and buildings) from the LiDAR sensor. In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner. As one of the first endeavors towards this new challenging task, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. In particular, DS-Net has three appealing properties: 1) Strong backbone design. DS-Net adopts the cylinder convolution that is specifically designed for LiDAR point clouds. 2) Dynamic Shifting for complex point distributions. We observe that commonly-used clustering algorithms are incapable of handling complex autonomous driving scenes with non-uniform point cloud distributions and varying instance sizes. Thus, we present an efficient learnable clustering module, dynamic shifting, which adapts kernel functions on the fly for different instances. 3) Extension to 4D prediction. Furthermore, we extend DS-Net to 4D panoptic LiDAR segmentation by the temporally unified instance clustering on aligned LiDAR frames. To comprehensively evaluate the performance of LiDAR-based panoptic segmentation, we construct and curate benchmarks from two large-scale autonomous driving LiDAR datasets, SemanticKITTI and nuScenes. Extensive experiments demonstrate that our proposed DS-Net achieves superior accuracies over current state-of-the-art methods in both tasks. Notably, in the single frame version of the task, we outperform the SOTA method by 1.8% in terms of the PQ metric. In the 4D version of the task, we surpass 2nd place by 5.4% in terms of the LSTQ metric.

preprint · 2022 · arXiv

MPIX Stream: An Explicit Solution to Hybrid MPI+X Programming

The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained prominence in the high-performance computing arena. This corresponds to a trend of system architectures growing more heterogeneous. The current MPI standard only specifies the compatibility levels between MPI and threading runtimes. No MPI concept or interface exists for applications to pass thread context or GPU stream context to MPI implementations explicitly. This lack has made performance optimization complicated in some cases and impossible in other cases. We propose a new concept in MPI, called MPIX stream, to represent the general serial execution context that exists in X runtimes. MPIX streams can be directly mapped to threads or GPU execution streams. Passing thread context into MPI allows implementations to precisely map the execution contexts to network endpoints. Passing GPU execution context into MPI allows implementations to directly operate on GPU streams, lowering the CPU/GPU synchronization cost.

preprint · 2022 · arXiv

Observation of one-dimensional Dirac fermions in silicon nanoribbons

Dirac materials, which feature Dirac cones in the reciprocal space, have been one of the hottest topics in condensed matter physics in the past decade. To date, 2D and 3D Dirac Fermions have been extensively studied, while their 1D counterparts are rare. Recently, Si nanoribbons (SiNRs), which are composed of alternating pentagonal Si rings, have attracted intensive attention. However, the electronic structure and topological properties of SiNRs are still elusive. Here, by angle-resolved photoemission spectroscopy, scanning tunneling microscopy/spectroscopy measurements, first-principles calculations, and tight-binding model analysis, we demonstrate the existence of 1D Dirac Fermions in SiNRs. Our theoretical analysis shows that the Dirac cones derive from the armchairlike Si chain in the center of the nanoribbon and can be described by the Su-Schrieffer-Heeger model. These results establish SiNRs as a platform for studying the novel physical properties in 1D Dirac materials.
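
For readers unfamiliar with the Su-Schrieffer-Heeger (SSH) model mentioned above, the following sketch evaluates its textbook two-band dispersion, $E_\pm(k) = \pm\sqrt{t_1^2 + t_2^2 + 2 t_1 t_2 \cos k}$, with the lattice constant set to 1. The hopping values are illustrative assumptions, not parameters fitted to SiNRs. When the two hoppings are equal, the gap closes at $k = \pi$ and the bands cross linearly, which is the 1D analogue of a Dirac point.

```python
import math


def ssh_bands(k, t1, t2):
    """Return (E_minus, E_plus) of the textbook SSH chain at wavevector k."""
    # Clamp at zero to absorb tiny negative values from floating-point rounding.
    radicand = max(0.0, t1 * t1 + t2 * t2 + 2.0 * t1 * t2 * math.cos(k))
    energy = math.sqrt(radicand)
    return -energy, energy


def ssh_gap(t1, t2):
    """Band gap at the zone boundary k = pi, which equals 2 * |t1 - t2|."""
    lower, upper = ssh_bands(math.pi, t1, t2)
    return upper - lower


gapless = ssh_gap(1.0, 1.0)  # equal hoppings: the two bands touch
gapped = ssh_gap(1.0, 0.6)   # dimerized chain: a gap opens
```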

preprint · 2022 · arXiv

Observation of topological flat bands in the kagome semiconductor Nb$_3$Cl$_8$

The destructive interference of wavefunctions in a kagome lattice can give rise to topological flat bands (TFBs) with a highly degenerate state of electrons. Recently, TFBs have been observed in several kagome metals, including Fe$_3$Sn$_2$, FeSn, CoSn, and YMn$_6$Sn$_6$. Nonetheless, kagome materials that are both exfoliable and semiconducting are lacking, which seriously hinders their device applications. Herein, we show that Nb$_3$Cl$_8$, which hosts a breathing kagome lattice, is gapped out because of the absence of inversion symmetry, while the TFBs survive because of the protection of the mirror reflection symmetry. By angle-resolved photoemission spectroscopy measurements and first-principles calculations, we directly observe the TFB and a moderate band gap in Nb$_3$Cl$_8$. By mechanical exfoliation, we successfully obtain monolayers of Nb$_3$Cl$_8$ and confirm that they are stable under ambient conditions. In addition, our calculations show that monolayers of Nb$_3$Cl$_8$ have a magnetic ground state, thus providing opportunities to study the interplay between geometry, topology, and magnetism.

preprint · 2022 · arXiv

Research on spatial information transmission efficiency and capability of safe evacuation signs

Safety evacuation signs are an indispensable indicator of spatial direction information for emergency evacuation, and the spatial relationship between the signs and evacuees affects the response time of evacuees and the evacuation efficiency. This paper takes two common kinds of safety evacuation signs, hangtag-type and embedded, as the research objects and designs a simulation experiment and a fire drill to study the efficiency and capability of their spatial direction information transmission. The results show that the spatial angle of the hangtag-type safety evacuation sign is inversely proportional to the efficiency and capability of spatial direction information transmission, and the fire drill also confirms this conclusion. When the spatial angle of the embedded safety evacuation sign is 5°, the spatial direction information transmission efficiency and capability increase. At the same time, the average escape time of the participants in the fire drill was lower, and the percentage of participants choosing unfamiliar exits increased. The change of spatial angle has no significant effect on the response intention of subjects of different genders; when choosing a direction, males are more easily affected by the change of spatial angle than females, while the confidence level of females' choices is more easily affected by the spatial angle. In addition, based on the research results, corresponding three-dimensional safety evacuation signs are designed. The functional structure of the safety evacuation signs is improved, which can effectively improve the efficiency of fire emergency evacuation.

preprint · 2020 · arXiv

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. The projection methods include spherical projection, bird's-eye view projection, etc. Although this process makes the point cloud suitable for 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. In this work, we first perform an in-depth analysis of different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks on LiDAR segmentation. Then, we develop a framework based on a 3D cylinder partition and a 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e. SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.
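
The cylinder partition named above can be sketched as a change of coordinates followed by binning. The grid extents and bin counts below are illustrative assumptions rather than the values used by Cylinder3D, and the function name `cylinder_voxel_index` is hypothetical.

```python
import math


def cylinder_voxel_index(x, y, z, rho_max=50.0, z_min=-4.0, z_max=2.0,
                         bins=(480, 360, 32)):
    """Map a Cartesian LiDAR point to a (radius, angle, height) voxel index.

    All extents and bin counts are assumed values for illustration.
    Returns None when the point lies outside the assumed grid.
    """
    rho = math.hypot(x, y)                   # distance from the sensor axis
    phi = math.atan2(y, x) % (2 * math.pi)   # azimuth wrapped to [0, 2*pi)
    if rho >= rho_max or not (z_min <= z < z_max):
        return None
    n_rho, n_phi, n_z = bins
    # min(...) guards against rounding that would push an index to the bin count.
    i = min(int(rho / rho_max * n_rho), n_rho - 1)
    j = min(int(phi / (2 * math.pi) * n_phi), n_phi - 1)
    k = min(int((z - z_min) / (z_max - z_min) * n_z), n_z - 1)
    return i, j, k
```

Because the radial bins have a fixed width while the angular bins widen with distance, cells far from the sensor cover more volume, which is one way a cylindrical grid can accommodate the sparser points found at long range.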

preprint · 2020 · arXiv

Epitaxial Growth and Band Structure of Antiferromagnetic Mott Insulator CeOI

The van der Waals material CeOI is predicted to be a layered antiferromagnetic Mott insulator by DFT+U calculations. We successfully grow CeOI films down to a monolayer on a graphene/6H-SiC(0001) substrate by using molecular beam epitaxy. The films are studied by in situ scanning tunneling microscopy and spectroscopy, which show a band gap of 4.4 eV. A metallic phase of unidentified composition also exists. This rare earth oxyhalide adds a new member to the two-dimensional magnetic materials.

preprint · 2020 · arXiv

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Fashion is a complex social phenomenon. People follow fashion styles from demonstrations by experts or fashion icons. However, for a machine agent, learning to imitate fashion experts from demonstrations can be challenging, especially for complex styles in environments with high-dimensional, multimodal observations. Most existing research regarding fashion outfit composition utilizes supervised learning methods to mimic the behaviors of style icons. These methods suffer from distribution shift: because the agent greedily imitates some given outfit demonstrations, it can drift from one style to another given subtle differences. In this work, we propose an adversarial inverse reinforcement learning formulation to recover reward functions based on hierarchical multimodal representation (HM-AIRL) during the imitation process. The hierarchical joint representation can more comprehensively model the expert-composed outfit demonstrations to recover the reward function. We demonstrate that the proposed HM-AIRL model is able to recover reward functions that are robust to changes in multimodal observations, enabling us to learn policies under significant variation between different styles.

preprint · 2020 · arXiv

Recovering Geometric Information with Learned Texture Perturbations

Regularization is used to avoid overfitting when training a neural network; unfortunately, this reduces the attainable level of detail, hindering the ability to capture high-frequency information present in the training data. Even though various approaches may be used to re-introduce high-frequency detail, it typically does not match the training data and is often not time-coherent. In the case of network-inferred cloth, these issues manifest themselves via either a lack of detailed wrinkles or unnaturally appearing and/or time-incoherent surrogate wrinkles. Thus, we propose a general strategy whereby high-frequency information is procedurally embedded into low-frequency data so that when the latter is smeared out by the network the former still retains its high-frequency detail. We illustrate this approach by learning texture coordinates which, when smeared, do not in turn smear out the high-frequency detail in the texture itself but merely smoothly distort it. Notably, we prescribe perturbed texture coordinates that are subsequently used to correct the over-smoothed appearance of inferred cloth, and correcting the appearance from multiple camera views naturally recovers lost geometric information.

preprint · 2020 · arXiv

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

3D vehicle detection based on point clouds is a challenging task in real-world applications such as autonomous driving. Although significant progress has been made, we observe two aspects to be further improved. First, the semantic context information in LiDAR, which may help identify ambiguous vehicles, is seldom explored in previous works. Second, the distribution of the point cloud on vehicles varies continuously with increasing depth, which may not be well modeled by a single model. In this work, we propose a unified model, SegVoxelNet, to address the above two problems. A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view. With this module, suspicious regions can be highlighted while noisy regions are suppressed. To better deal with vehicles at different depths, a novel depth-aware head is designed to explicitly model the distribution differences, and each part of the depth-aware head is made to focus on its own target detection range. Extensive experiments on the KITTI dataset show that the proposed method outperforms the state-of-the-art alternatives in both accuracy and efficiency with only point clouds as input.

preprint · 2020 · arXiv

Skinning a Parameterization of Three-Dimensional Space for Neural Network Cloth

We present a novel learning framework for cloth deformation by embedding virtual cloth into a tetrahedral mesh that parametrizes the volumetric region of air surrounding the underlying body. In order to maintain this volumetric parameterization during character animation, the tetrahedral mesh is constrained to follow the body surface as it deforms. We embed the cloth mesh vertices into this parameterization of three-dimensional space in order to automatically capture much of the nonlinear deformation due to both joint rotations and collisions. We then train a convolutional neural network to recover ground truth deformation by learning cloth embedding offsets for each skeletal pose. Our experiments show significant improvement over learning cloth offsets from body surface parameterizations, both quantitatively and visually, with prior state of the art having a mean error five standard deviations higher than ours. Moreover, our results demonstrate the efficacy of a general learning paradigm where high-frequency details can be embedded into low-frequency parameterizations.
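
Embedding a cloth vertex in a tetrahedron is commonly done with barycentric coordinates. The abstract does not state the exact scheme, so the following is a sketch under that assumption, with hypothetical function names. The weights are computed once in the rest pose and reused to reposition the vertex when the tetrahedron deforms; the learned per-pose offsets described above are not modeled here.

```python
def _det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def _sub(p, q):
    """Component-wise difference p - q."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def barycentric_weights(point, tet):
    """Weights (w0..w3) summing to 1 such that point = sum(w_i * tet[i])."""
    v0, v1, v2, v3 = tet
    e1, e2, e3 = _sub(v1, v0), _sub(v2, v0), _sub(v3, v0)
    volume = _det3(e1, e2, e3)
    if volume == 0:
        raise ValueError("degenerate tetrahedron")
    d = _sub(point, v0)
    # Cramer's rule for d = w1*e1 + w2*e2 + w3*e3.
    w1 = _det3(d, e2, e3) / volume
    w2 = _det3(e1, d, e3) / volume
    w3 = _det3(e1, e2, d) / volume
    return (1.0 - w1 - w2 - w3, w1, w2, w3)


def embed(weights, tet):
    """Reconstruct the embedded point from weights and a (possibly deformed) tet."""
    return tuple(sum(w * v[axis] for w, v in zip(weights, tet))
                 for axis in range(3))


rest_tet = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
weights = barycentric_weights((0.25, 0.25, 0.25), rest_tet)
# Translate and stretch the tetrahedron; the embedded vertex follows it.
deformed_tet = [(1, 0, 0), (3, 0, 0), (1, 1, 0), (1, 0, 1)]
moved = embed(weights, deformed_tet)
```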