Source author record

Zhihui Chen

Zhihui Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Computer Vision, cond-mat.mes-hall, eess.SP, Machine Learning, Multimedia

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

The evolution of visual generative models has long been constrained by fragmented architectures relying on disjoint text encoders and external VAEs. In this report, we present HiDream-O1-Image, a natively unified generative foundation model based on a pixel-space Diffusion Transformer that pioneers a paradigm shift from modular architectures to an end-to-end in-context visual generation engine. By mapping raw image pixels, text tokens, and task-specific conditions into a single shared token space, HiDream-O1-Image achieves a structural unification of multimodal inputs within a Unified Transformer (UiT) architecture. This native encoding paradigm eliminates the need for separate VAEs or disjoint pre-trained text encoders, allowing the model to treat diverse generation and editing tasks as a consistent in-context reasoning process. Extensive experiments show that HiDream-O1-Image excels across various generation tasks, including text-to-image generation, instruction-based editing, and subject-driven personalization. Notably, with only 8B parameters, HiDream-O1-Image (8B) matches or even surpasses established state-of-the-art models with significantly more parameters (e.g., 27B Qwen-Image). Crucially, to validate the scalability of this paradigm, we scale the architecture up to over 200B parameters. Experimental results demonstrate that this massive-scale version, HiDream-O1-Image-Pro (200B+), unlocks unprecedented generative capabilities and superior performance, establishing new state-of-the-art results on benchmarks. Ultimately, HiDream-O1-Image highlights the potential of natively unified architectures and charts a highly scalable path toward next-generation multimodal AI.

Preprint, 2022, arXiv

Clustering Enabled Few-Shot Load Forecasting

While advanced machine learning algorithms are effective in load forecasting, they often suffer from low data utilization, and hence their superior performance relies on massive datasets. This motivates us to design machine learning algorithms with improved data utilization. Specifically, we consider load forecasting for a new user in the system by observing only a few shots (data points) of its energy consumption. This task is challenging since the limited samples are insufficient to exploit the temporal characteristics that are essential for load forecasting. Nonetheless, we notice that residential loads exhibit only a small number of temporal patterns because of the limited variety of human lifestyles. Hence, we propose to utilize the historical load profile data from existing users to conduct effective clustering, which mitigates the challenges brought by the limited samples. Specifically, we first design a feature extraction clustering method for categorizing historical data. Then, inheriting the prior knowledge from the clustering results, we propose a two-phase Long Short-Term Memory (LSTM) model to conduct load forecasting for new users. The proposed method outperforms the traditional LSTM model, especially when the training sample size fails to cover a whole period (i.e., 24 hours in our task). Extensive case studies on two real-world datasets and one synthetic dataset verify the effectiveness and efficiency of our method.
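
The idea of using clusters of historical profiles as a prior for a new user can be illustrated with a small sketch. This is not the paper's method: it swaps the feature extraction clustering for plain k-means with a deterministic farthest-point initialization, and it replaces the two-phase LSTM with a nearest-centroid lookup. The function names, the synthetic profiles, and the peak hours are all hypothetical and chosen only for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iters=20):
    """Plain k-means with farthest-point initialization (hypothetical stand-in
    for the paper's feature extraction clustering)."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Pick the profile farthest from all centroids chosen so far.
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in profiles:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def forecast_from_few_shots(centroids, shots):
    """shots maps hour -> observed load. Match on the observed hours only and
    return the full 24-hour profile of the closest centroid."""
    best = min(centroids,
               key=lambda c: sum((c[h] - v) ** 2 for h, v in shots.items()))
    return list(best)

def make_profile(peak_hour, rng):
    """Synthetic daily profile: base load of 1.0 plus a 3-hour peak and noise."""
    return [1.0 + (3.0 if abs(h - peak_hour) <= 1 else 0.0)
            + rng.uniform(-0.1, 0.1) for h in range(24)]

rng = random.Random(1)
# Two assumed "lifestyles": a morning peak at 08:00 and an evening peak at 19:00.
history = ([make_profile(8, rng) for _ in range(10)]
           + [make_profile(19, rng) for _ in range(10)])
centroids = kmeans(history, k=2)

# A new user with only three observed hours, far less than a full 24-hour period.
shots = {18: 4.0, 19: 4.1, 20: 3.9}
forecast = forecast_from_few_shots(centroids, shots)
```

In this toy setting the three observations are enough to select the evening-peak cluster, whose centroid then supplies values for the 21 unobserved hours. A learned model such as the LSTM described in the abstract would refine that prior rather than copy it.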

Preprint, 2022, arXiv

Switching modulation of spin transport in ferromagnetic tetragonal silicene

We study the band structure and transport properties of ferromagnetic tetragonal silicene nanoribbons by using the non-equilibrium Green's function method. The band structure and spin-dependent conductance are discussed under the combined effect of the external electric field, potential energy, exchange field and the spin-orbit coupling. One can easily realize a phase transition from a semimetallic to a semiconducting state by changing the transverse width of the nanoribbon. Separation of spin-dependent conductances arises from the effect of exchange field and the spin-orbit coupling, while zero-conductance behaviors exhibit spin-dependent band gaps induced by the electric field. We propose a device configuration of four-terminal tetragonal silicene nanoribbon with two central channels. It is found that spin current can be controlled by utilizing two switches. The switch with a high potential barrier can block electrons flowing from the central scattering region into other terminals. Interestingly, applying only one switch can realize spin-dependent zero conductance and large spin polarization. Two switches can provide multiple operations for controlling spin-dependent transport properties. The two-channel ferromagnetic tetragonal silicene nanoribbon can realize an effective separation of spin current, which may be a potential candidate for spintronic devices.
