Source author record

Hui Zhou

Hui Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; cond-mat.mtrl-sci; cond-mat.stat-mech; Distributed, Parallel, and Cluster Computing; physics.comp-ph; quant-ph; Artificial Intelligence; cond-mat.mes-hall; eess.SY; Information Theory; Machine Learning; math.CO; math.IT; Networking and Internet Architecture; Performance; stat.OT; Systems and Control

Catalog footprint

What is connected

22 works

17 topics

4 close collaborators


preprint, 2026, arXiv

PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography

Commercial-grade poster design demands the seamless integration of aesthetic appeal with precise, informative content delivery. Current automated poster generation systems face significant limitations, including incomplete design workflows, poor text rendering accuracy, and insufficient flexibility for commercial applications. To address these challenges, we propose PosterVerse, a full-workflow, commercial-grade poster generation method that seamlessly automates the entire design process while delivering high-density and scalable text rendering. PosterVerse replicates professional design through three key stages: (1) blueprint creation using fine-tuned LLMs to extract key design elements from user requirements, (2) graphical background generation via customized diffusion models to create visually appealing imagery, and (3) unified layout-text rendering with an MLLM-powered HTML engine to guarantee high text accuracy and flexible customization. In addition, we introduce PosterDNA, a commercial-grade, HTML-based dataset tailored for training and validating poster design models. To the best of our knowledge, PosterDNA is the first Chinese poster generation dataset to introduce HTML typography files, enabling scalable text rendering and fundamentally solving the challenges of rendering small and high-density text. Experimental results demonstrate that PosterVerse consistently produces commercial-grade posters with appealing visuals, accurate text alignment, and customizable layouts, making it a promising solution for automating commercial poster design. The code and model are available at https://github.com/wuhaer/PosterVerse.

preprint, 2026, arXiv

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism schemes or risk drifting away from a failure-free training trajectory. We propose ReCoVer, a resilient LLM pre-training system that upholds a single invariant: each iteration keeps the number of microbatches constant, ensuring per-iteration gradients remain stochastically equivalent to a failure-free run. The framework is organized as three decoupled protocol layers: (1) fault-tolerant collectives that isolate faults and keep them from propagating across replicas; (2) in-step fine-grained recovery that preserves intra-iteration progress and prevents gradient corruption; (3) a versatile-workload policy that dynamically redistributes microbatch quotas across the survivors. The design is parallelism-agnostic, integrating directly with both 3D parallelism and Hybrid Sharded Data Parallel (HSDP) as a drop-in substrate. We evaluate our implementation on end-to-end pre-training tasks on up to 512 GPUs. ReCoVer successfully preserves the training trajectory of a failure-free reference despite the loss of 256 GPUs spread across the run. Compared with checkpoint-and-restart baselines, ReCoVer demonstrates $2.23\times$ higher effective throughput after successive failures. This advantage results in ReCoVer processing 74.9% more tokens at 234 GPU-hours, with the gap widening as training continues.
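The constant-microbatch invariant described in this abstract can be illustrated with a minimal sketch. It is not the ReCoVer implementation: the function name `redistribute_quotas` and the even-split policy are assumptions made here purely for illustration. The sketch shows one way the per-iteration microbatch total could stay fixed while the set of surviving replicas shrinks.

```python
def redistribute_quotas(total_microbatches, survivors):
    """Split a fixed microbatch budget across surviving replicas.

    Hypothetical even-split policy: every survivor gets the floor share and
    the remainder goes one-by-one to the first survivors in sorted order.
    """
    ordered = sorted(survivors)
    if not ordered:
        raise ValueError("no surviving replicas left")
    base, extra = divmod(total_microbatches, len(ordered))
    return {rank: base + (1 if i < extra else 0) for i, rank in enumerate(ordered)}


# Eight replicas share 32 microbatches; replicas 2, 5 and 6 then fail.
before = redistribute_quotas(32, range(8))
after = redistribute_quotas(32, [0, 1, 3, 4, 7])
```

In both cases the quotas sum to 32, so the number of microbatches contributing to each iteration's gradient is unchanged by the failures.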

preprint, 2022, arXiv

Analyzing Novel Grant-Based and Grant-Free Access Schemes for Small Data Transmission

Fifth Generation (5G) New Radio (NR) does not support data transmission during random access (RA) procedures, which results in unnecessary control signalling overhead and power consumption, especially for small data transmission (SDT). Motivated by this, 3GPP has proposed 4/2-step SDT RA schemes based on the existing grant-based (4-step) and grant-free (2-step) RA schemes, with the aim of enabling data transmission during RA procedures in the Radio Resource Control (RRC) Inactive state. To compare the 4/2-step SDT RA schemes with the benchmark 4/2-step RA schemes, we provide a spatio-temporal analytical framework to evaluate the RA schemes, which jointly models the preamble detection, Physical Uplink Shared Channel (PUSCH) decoding, and data transmission procedures. Based on this analytical model, we derive the analytical expressions for the overall packet transmission success probability and average throughput in each RACH attempt. We also derive the average energy consumption in each RACH attempt. Our results show that the 2-step SDT RA scheme provides the highest overall packet transmission success probability and the lowest average energy consumption, but the performance gain decreases as the device intensity increases.

preprint, 2022, arXiv

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

With the rapid advances of autonomous driving, it becomes critical to equip its sensing system with more holistic 3D perception. However, existing works focus on parsing either the objects (e.g. cars and pedestrians) or scenes (e.g. trees and buildings) from the LiDAR sensor. In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner. As one of the first endeavors towards this new challenging task, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. In particular, DS-Net has three appealing properties: 1) Strong backbone design. DS-Net adopts the cylinder convolution that is specifically designed for LiDAR point clouds. 2) Dynamic Shifting for complex point distributions. We observe that commonly-used clustering algorithms are incapable of handling complex autonomous driving scenes with non-uniform point cloud distributions and varying instance sizes. Thus, we present an efficient learnable clustering module, dynamic shifting, which adapts kernel functions on the fly for different instances. 3) Extension to 4D prediction. Furthermore, we extend DS-Net to 4D panoptic LiDAR segmentation by the temporally unified instance clustering on aligned LiDAR frames. To comprehensively evaluate the performance of LiDAR-based panoptic segmentation, we construct and curate benchmarks from two large-scale autonomous driving LiDAR datasets, SemanticKITTI and nuScenes. Extensive experiments demonstrate that our proposed DS-Net achieves superior accuracies over current state-of-the-art methods in both tasks. Notably, in the single frame version of the task, we outperform the SOTA method by 1.8% in terms of the PQ metric. In the 4D version of the task, we surpass 2nd place by 5.4% in terms of the LSTQ metric.

preprint, 2022, arXiv

MPIX Stream: An Explicit Solution to Hybrid MPI+X Programming

The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained prominence in the high-performance computing arena. This corresponds to a trend of system architectures growing more heterogeneous. The current MPI standard only specifies the compatibility levels between MPI and threading runtimes. No MPI concept or interface exists for applications to pass thread context or GPU stream context to MPI implementations explicitly. This lack has made performance optimization complicated in some cases and impossible in other cases. We propose a new concept in MPI, called MPIX stream, to represent the general serial execution context that exists in X runtimes. MPIX streams can be directly mapped to threads or GPU execution streams. Passing thread context into MPI allows implementations to precisely map the execution contexts to network endpoints. Passing GPU execution context into MPI allows implementations to directly operate on GPU streams, lowering the CPU/GPU synchronization cost.

preprint, 2022, arXiv

Observation of one-dimensional Dirac fermions in silicon nanoribbons

Dirac materials, which feature Dirac cones in the reciprocal space, have been one of the hottest topics in condensed matter physics in the past decade. To date, 2D and 3D Dirac Fermions have been extensively studied, while their 1D counterparts are rare. Recently, Si nanoribbons (SiNRs), which are composed of alternating pentagonal Si rings, have attracted intensive attention. However, the electronic structure and topological properties of SiNRs are still elusive. Here, by angle-resolved photoemission spectroscopy, scanning tunneling microscopy/spectroscopy measurements, first-principles calculations, and tight-binding model analysis, we demonstrate the existence of 1D Dirac Fermions in SiNRs. Our theoretical analysis shows that the Dirac cones derive from the armchairlike Si chain in the center of the nanoribbon and can be described by the Su-Schrieffer-Heeger model. These results establish SiNRs as a platform for studying the novel physical properties in 1D Dirac materials.
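As a rough illustration of why the Su-Schrieffer-Heeger (SSH) model can host a 1D Dirac point, the sketch below evaluates the standard two-band SSH dispersion E(k) = ±|v + w e^{ik}|. The hopping values are illustrative placeholders and are not fitted to SiNRs; the function name `ssh_bands` is likewise an assumption of this sketch.

```python
import numpy as np


def ssh_bands(k, v, w):
    """Return the two SSH band energies (lower, upper) at wavevector(s) k.

    v is the intra-cell hopping and w the inter-cell hopping; the lattice
    constant is set to 1. Both values are illustrative placeholders.
    """
    k = np.asarray(k, dtype=float)
    magnitude = np.abs(v + w * np.exp(1j * k))
    return -magnitude, magnitude


k = np.linspace(-np.pi, np.pi, 2001)
lower, upper = ssh_bands(k, v=1.0, w=1.0)   # equal hoppings
gap_equal = np.min(upper - lower)           # closes at the zone boundary

lower, upper = ssh_bands(k, v=1.0, w=0.6)   # dimerized chain
gap_dimerized = np.min(upper - lower)       # equals 2 * |v - w|
```

With equal hoppings the two bands touch at the zone boundary and disperse linearly away from it, which is the 1D analogue of a Dirac cone; any dimerization (v ≠ w) opens a gap of 2|v − w|.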

preprint, 2022, arXiv

Observation of topological flat bands in the kagome semiconductor Nb$_3$Cl$_8$

The destructive interference of wavefunctions in a kagome lattice can give rise to topological flat bands (TFBs) with a highly degenerate state of electrons. Recently, TFBs have been observed in several kagome metals, including Fe$_3$Sn$_2$, FeSn, CoSn, and YMn$_6$Sn$_6$. Nonetheless, kagome materials that are both exfoliable and semiconducting are lacking, which seriously hinders their device applications. Herein, we show that Nb$_3$Cl$_8$, which hosts a breathing kagome lattice, is gapped out because of the absence of inversion symmetry, while the TFBs survive because of the protection of the mirror reflection symmetry. By angle-resolved photoemission spectroscopy measurements and first-principles calculations, we directly observe the TFB and a moderate band gap in Nb$_3$Cl$_8$. By mechanical exfoliation, we successfully obtain monolayers of Nb$_3$Cl$_8$ and confirm that they are stable under ambient conditions. In addition, our calculations show that monolayers of Nb$_3$Cl$_8$ have a magnetic ground state, thus providing opportunities to study the interplay between geometry, topology, and magnetism.

preprint, 2022, arXiv

Research on spatial information transmission efficiency and capability of safe evacuation signs

Safety evacuation signs are indispensable indicators of spatial direction during emergency evacuation, and the spatial relationship between the signs and evacuees affects the evacuees' response time and the evacuation efficiency. This paper takes two common kinds of safety evacuation signs, hangtag-type and embedded, as the research objects and uses a simulation experiment and a fire drill to study the efficiency and capability with which the signs transmit spatial direction information. The results show that the spatial angle of the hangtag-type safety evacuation sign is inversely proportional to its spatial direction information transmission efficiency and capability, and the fire drill confirms this conclusion. When the spatial angle of the embedded safety evacuation sign is 5°, the spatial direction information transmission efficiency and capability increase. At the same time, the average escape time of the participants in the fire drill was lower, and the percentage choosing unfamiliar exits increased. The change in spatial angle has no significant effect on the response intention of subjects of different genders; when choosing a direction, males are more easily affected by changes in spatial angle than females, whereas the confidence level of females' choices is more easily affected by spatial angle. In addition, based on the research results, corresponding three-dimensional safety evacuation signs are designed. The improved functional structure of the signs can effectively improve the efficiency of fire emergency evacuation.

preprint, 2020, arXiv

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. The projection methods include spherical projection, bird's-eye view projection, etc. Although this process makes the point cloud suitable for 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. In this work, we first perform an in-depth analysis of different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks on LiDAR segmentation. Then, we develop a framework based on a 3D cylinder partition and a 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e. SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.
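The idea of a cylinder partition can be sketched as follows: points are converted from Cartesian to cylindrical coordinates and binned on a regular (radius, azimuth, height) grid. This is only a minimal sketch under stated assumptions; the function name `cylinder_voxel_indices`, the grid size, and the coordinate ranges are illustrative and are not taken from the paper.

```python
import numpy as np


def cylinder_voxel_indices(points, grid=(48, 36, 8),
                           rho_max=50.0, z_range=(-4.0, 2.0)):
    """Map Cartesian points of shape (N, 3) to integer (rho, phi, z) cells.

    Grid size and ranges are illustrative assumptions, not the paper's values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)                 # radial distance from the sensor
    phi = np.arctan2(y, x)               # azimuth in [-pi, pi]
    coords = np.stack([rho, phi, z], axis=1)
    lower = np.array([0.0, -np.pi, z_range[0]])
    upper = np.array([rho_max, np.pi, z_range[1]])
    cell = (upper - lower) / np.array(grid)
    idx = np.floor((np.clip(coords, lower, upper) - lower) / cell).astype(int)
    # Points on or beyond the upper boundary fall into the last cell.
    return np.minimum(idx, np.array(grid) - 1)


points = np.array([[10.0, 1.0, -0.9], [0.5, 25.0, 0.5], [-40.0, -40.0, 1.9]])
indices = cylinder_voxel_indices(points)
```

Because cell volume grows with radius, distant regions, where LiDAR returns are sparse, are covered by larger cells than nearby regions, which gives a more balanced point count per cell than a uniform Cartesian grid.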

preprint, 2020, arXiv

Epitaxial Growth and Band Structure of Antiferromagnetic Mott Insulator CeOI

The van der Waals material CeOI is predicted to be a layered antiferromagnetic Mott insulator by DFT+U calculations. We successfully grow CeOI films down to the monolayer on a graphene/6H-SiC(0001) substrate using molecular beam epitaxy. The films are studied by in situ scanning tunneling microscopy and spectroscopy, which show a band gap of 4.4 eV. A metallic phase with unidentified composition also exists. This rare earth oxyhalide adds a new member to the two-dimensional magnetic materials.

preprint, 2020, arXiv

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Fashion is a complex social phenomenon. People follow fashion styles from demonstrations by experts or fashion icons. However, for a machine agent, learning to imitate fashion experts from demonstrations can be challenging, especially for complex styles in environments with high-dimensional, multimodal observations. Most existing research on fashion outfit composition uses supervised learning methods to mimic the behaviors of style icons. These methods suffer from distribution shift: because the agent greedily imitates the given outfit demonstrations, subtle differences can cause it to drift from one style to another. In this work, we propose an adversarial inverse reinforcement learning formulation to recover reward functions based on hierarchical multimodal representation (HM-AIRL) during the imitation process. The hierarchical joint representation can more comprehensively model the expert-composed outfit demonstrations to recover the reward function. We demonstrate that the proposed HM-AIRL model is able to recover reward functions that are robust to changes in multimodal observations, enabling us to learn policies under significant variation between different styles.

preprint, 2020, arXiv

On distance matrices of distance-regular graphs

In this paper, we characterize when the distance matrix of a distance-regular graph is invertible.
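To make the question concrete, the sketch below checks invertibility numerically for cycle graphs, which are a simple family of distance-regular graphs. It does not reproduce the paper's characterization; the helper names `cycle_distance_matrix` and `is_invertible` are assumptions of this example.

```python
import numpy as np


def cycle_distance_matrix(n):
    """Distance matrix of the cycle graph C_n: d(i, j) = min(|i-j|, n-|i-j|)."""
    i = np.arange(n)
    diff = np.abs(i[:, None] - i[None, :])
    return np.minimum(diff, n - diff)


def is_invertible(matrix):
    """Check invertibility numerically via the matrix rank."""
    return bool(np.linalg.matrix_rank(matrix) == matrix.shape[0])


c4_invertible = is_invertible(cycle_distance_matrix(4))  # even cycle
c5_invertible = is_invertible(cycle_distance_matrix(5))  # odd cycle
```

For these two cases the distance matrix of C_4 is singular (it has a zero eigenvalue) while that of C_5 is invertible, so invertibility genuinely depends on the graph.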

preprint, 2020, arXiv

Recovering Geometric Information with Learned Texture Perturbations

Regularization is used to avoid overfitting when training a neural network; unfortunately, this reduces the attainable level of detail hindering the ability to capture high-frequency information present in the training data. Even though various approaches may be used to re-introduce high-frequency detail, it typically does not match the training data and is often not time coherent. In the case of network inferred cloth, these sentiments manifest themselves via either a lack of detailed wrinkles or unnaturally appearing and/or time incoherent surrogate wrinkles. Thus, we propose a general strategy whereby high-frequency information is procedurally embedded into low-frequency data so that when the latter is smeared out by the network the former still retains its high-frequency detail. We illustrate this approach by learning texture coordinates which when smeared do not in turn smear out the high-frequency detail in the texture itself but merely smoothly distort it. Notably, we prescribe perturbed texture coordinates that are subsequently used to correct the over-smoothed appearance of inferred cloth, and correcting the appearance from multiple camera views naturally recovers lost geometric information.

preprint, 2020, arXiv

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

3D vehicle detection based on point clouds is a challenging task in real-world applications such as autonomous driving. Although significant progress has been made, we observe two aspects that can be further improved. First, the semantic context information in LiDAR, which may help identify ambiguous vehicles, is seldom explored in previous works. Second, the distribution of the point cloud on vehicles varies continuously with increasing depth, which may not be well modeled by a single model. In this work, we propose a unified model, SegVoxelNet, to address these two problems. A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view. Suspicious regions can be highlighted while noisy regions are suppressed by this module. To better deal with vehicles at different depths, a novel depth-aware head is designed to explicitly model the distribution differences, and each part of the depth-aware head is made to focus on its own target detection range. Extensive experiments on the KITTI dataset show that the proposed method outperforms the state-of-the-art alternatives in both accuracy and efficiency with point cloud as input only.

preprint, 2020, arXiv

Skinning a Parameterization of Three-Dimensional Space for Neural Network Cloth

We present a novel learning framework for cloth deformation by embedding virtual cloth into a tetrahedral mesh that parametrizes the volumetric region of air surrounding the underlying body. In order to maintain this volumetric parameterization during character animation, the tetrahedral mesh is constrained to follow the body surface as it deforms. We embed the cloth mesh vertices into this parameterization of three-dimensional space in order to automatically capture much of the nonlinear deformation due to both joint rotations and collisions. We then train a convolutional neural network to recover ground truth deformation by learning cloth embedding offsets for each skeletal pose. Our experiments show significant improvement over learning cloth offsets from body surface parameterizations, both quantitatively and visually, with prior state of the art having a mean error five standard deviations higher than ours. Moreover, our results demonstrate the efficacy of a general learning paradigm where high-frequency details can be embedded into low-frequency parameterizations.

preprint, 2016, arXiv

Crafting GBD-Net for Object Detection

The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a more complex way. It is also shown that message passing is not always helpful but depends on individual samples. Gated functions are therefore needed to control message transmission, and their on-or-off states are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, Pascal VOC2007 and Microsoft COCO. This paper also shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD.

preprint, 2016, arXiv

Pure State Tomography with Pauli Measurements

We examine the problem of finding the minimum number of Pauli measurements needed to uniquely determine an arbitrary $n$-qubit pure state among all quantum states. We show that only $11$ Pauli measurements are needed to determine an arbitrary two-qubit pure state compared to the full quantum state tomography with $16$ measurements, and only $31$ Pauli measurements are needed to determine an arbitrary three-qubit pure state compared to the full quantum state tomography with $64$ measurements. We demonstrate that our protocol is robust under depolarizing error with simulated random pure states. We experimentally test the protocol on two- and three-qubit systems with nuclear magnetic resonance techniques. We show that the pure state tomography protocol saves us a number of measurements without considerable loss of fidelity. We compare our protocol with same-size sets of randomly selected Pauli operators and find that our selected set of Pauli measurements significantly outperforms those random sampling sets. As a direct application, our scheme can also be used to reduce the number of settings needed for pure-state tomography in quantum optical systems.
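The measurement counts quoted in this abstract can be put side by side with a short sketch. The full-tomography figures correspond to the 4^n Pauli strings on n qubits; the reduced counts (11 and 31) are copied from the abstract, and the sketch does not attempt to reconstruct which Pauli operators the authors selected. The function name `pauli_strings` is an assumption of this example.

```python
from itertools import product


def pauli_strings(n_qubits):
    """Enumerate all n-qubit Pauli strings over the alphabet I, X, Y, Z."""
    return ["".join(p) for p in product("IXYZ", repeat=n_qubits)]


# Counts reported in the abstract for the reduced measurement sets.
reported_reduced = {2: 11, 3: 31}

for n, reduced in reported_reduced.items():
    full = len(pauli_strings(n))          # 4**n operators for full tomography
    saved_fraction = 1 - reduced / full
    print(f"{n} qubits: {reduced} of {full} Pauli measurements "
          f"({saved_fraction:.0%} fewer)")
```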

preprint, 2014, arXiv

Observation of Lee-Yang zeros

Lee-Yang zeros are points on the complex plane of magnetic field where the partition function of a spin system is zero and therefore the free energy diverges. Lee-Yang zeros and their generalizations are ubiquitous in many-body systems and they fully characterize the analytic properties of the free energy and hence thermodynamics of the systems. Determining the Lee-Yang zeros is not only fundamentally important for conceptual completeness of thermodynamics and statistical physics but also technically useful for studying many-body systems. However, Lee-Yang zeros have never been observed in experiments, due to the intrinsic difficulty that Lee-Yang zeros would occur only at complex values of magnetic field, which are unphysical. Here we report the first observation of Lee-Yang zeros, by measuring quantum coherence of a probe spin coupled to an Ising-type spin bath. As recently proposed, the quantum evolution of the probe spin introduces a complex phase factor, and therefore effectively realizes an imaginary magnetic field on the bath. From the measured Lee-Yang zeros, we reconstructed the free energy of the spin bath and determined its phase transition temperature. This experiment demonstrates quantum coherence probe as a useful approach to studying thermodynamics in the complex plane, which may reveal a broad range of new phenomena that would otherwise be inaccessible if physical parameters are restricted to be real numbers.
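A small numerical sketch can show why Lee-Yang zeros sit at complex field values. For a ferromagnetic Ising ring, the partition function is, up to a prefactor, a polynomial in z = exp(-2βh), and the Lee-Yang theorem places its roots on the unit circle, that is, at purely imaginary h. The system size, the coupling value, and the function name `lee_yang_zeros` are illustrative assumptions and are unrelated to the experimental spin bath.

```python
from itertools import product

import numpy as np


def lee_yang_zeros(n_spins, coupling):
    """Zeros in z = exp(-2*beta*h) of a ferromagnetic Ising ring.

    coupling is the dimensionless beta*J (> 0). Up to the prefactor
    exp(beta*h*n_spins), Z is a polynomial in z whose power counts the
    number of down spins.
    """
    coeffs = np.zeros(n_spins + 1)
    for spins in product((1, -1), repeat=n_spins):
        bonds = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        n_down = spins.count(-1)
        coeffs[n_down] += np.exp(coupling * bonds)
    # np.roots expects the highest power first; reverse for clarity even
    # though the coefficients are symmetric under a global spin flip.
    return np.roots(coeffs[::-1])


zeros = lee_yang_zeros(n_spins=4, coupling=0.5)
moduli = np.abs(zeros)   # all close to 1 for a ferromagnetic coupling
```

Since no root lies on the positive real z axis, no real magnetic field makes the partition function vanish, which is the difficulty the abstract describes.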

preprint, 2013, arXiv

Competing thermodynamic and dynamic factors select molecular assemblies on a gold surface

Controlling the self-assembly of surface-adsorbed molecules into nanostructures requires understanding physical mechanisms that act across multiple length and time scales. By combining scanning tunneling microscopy with hierarchical ab initio and statistical mechanical modeling of 1,4-substituted benzenediamine (BDA) molecules adsorbed on a gold (111) surface, we demonstrate that apparently simple nanostructures are selected by a subtle competition of thermodynamics and dynamics. Of the collection of possible BDA nanostructures mechanically stabilized by hydrogen bonding, the interplay of intermolecular forces, surface modulation, and assembly dynamics selects at low temperature a particular subset: low free energy oriented linear chains of monomers, and high free energy branched chains.

preprint, 2012, arXiv

Cell Association and Handover Management in Femtocell Networks

Although the technology of femtocells is highly promising, many challenging problems should be addressed before fully harvesting its potential. In this paper, we investigate the problem of cell association and handover management in femtocell networks. Two extreme cases for cell association are first discussed and analyzed. Then we propose our algorithm to maximize network capacity while achieving fairness among users. Based on this algorithm, we further develop a handover algorithm to reduce the number of unnecessary handovers using Bayesian estimation. The proposed handover algorithm is demonstrated to outperform a heuristic scheme with considerable gains in our simulation study.

preprint, 2011, arXiv

Visualizing Individual Nitrogen Dopants in Monolayer Graphene

In monolayer graphene, substitutional doping during growth can be used to alter its electronic properties. We used scanning tunneling microscopy (STM), Raman spectroscopy, x-ray spectroscopy, and first principles calculations to characterize individual nitrogen dopants in monolayer graphene grown on a copper substrate. Individual nitrogen atoms were incorporated as graphitic dopants, and a fraction of the extra electron on each nitrogen atom was delocalized into the graphene lattice. The electronic structure of nitrogen-doped graphene was strongly modified only within a few lattice spacings of the site of the nitrogen dopant. These findings show that chemical doping is a promising route to achieving high-quality graphene films with a large carrier concentration.

preprint, 2010, arXiv

Joint Channel Probing and Proportional Fair Scheduling in Wireless Networks

The design of a scheduling scheme is crucial for the efficiency and user-fairness of wireless networks. Assuming that the quality of all user channels is available to a central controller, a simple scheme which maximizes the utility function defined as the sum of the logarithms of all users' throughputs has been shown to guarantee proportional fairness. However, acquiring the channel quality information may consume a substantial amount of resources. In this work, it is assumed that probing the quality of each user's channel takes a fraction of the coherence time, so that the amount of time for data transmission is reduced. As a result, the multiuser diversity gain does not always increase as the number of users increases. When the statistics of the channel quality are available to the controller, the problem of sequential channel probing for user scheduling is formulated as an optimal stopping time problem. A joint channel probing and proportional fair scheduling scheme is developed. This scheme is extended to the case where the channel statistics are not available to the controller, in which case a joint learning, probing and scheduling scheme is designed by studying a generalized bandit problem. Numerical results demonstrate that the proposed scheduling schemes can provide significant gain over existing schemes.
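For context, the baseline that this abstract starts from, proportional fair scheduling with full channel knowledge, can be sketched with the textbook rule: in each slot, serve the user whose instantaneous rate is largest relative to its averaged throughput. This is not the paper's joint probing scheme, which additionally accounts for probing cost; the function name, the averaging window, and the simulated channel model are assumptions of this sketch.

```python
import numpy as np


def proportional_fair_schedule(rates, window=100.0, floor=1e-6):
    """Run the textbook proportional fair rule over a (slots, users) rate matrix.

    Each slot serves the user with the largest ratio of instantaneous rate to
    exponentially averaged throughput. Returns the chosen user per slot and
    the final averaged throughputs.
    """
    n_slots, n_users = rates.shape
    avg = np.full(n_users, floor)
    chosen = np.empty(n_slots, dtype=int)
    for t in range(n_slots):
        user = int(np.argmax(rates[t] / avg))
        chosen[t] = user
        served = np.zeros(n_users)
        served[user] = rates[t, user]
        avg = (1 - 1 / window) * avg + served / window
    return chosen, avg


rng = np.random.default_rng(0)
# Hypothetical channels: user 1 is on average twice as good as user 0.
rates = rng.exponential(scale=[1.0, 2.0], size=(5000, 2))
chosen, avg = proportional_fair_schedule(rates)
share = np.bincount(chosen, minlength=2) / len(chosen)
```

Because the selection metric is a ratio, scaling one user's channel does not change how often that user is served: both users receive roughly half of the slots, and each is served mostly when its own channel is relatively good.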

