Source author record

Zhiwei Zhao

Zhiwei Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computation and Language, Computer Vision, Human-Computer Interaction, Machine Learning, Networking and Internet Architecture

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

Advancing Aesthetic Image Generation via Composition Transfer

Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model composition in a semantic-agnostic manner. First, it supports composition transfer by extracting key composition-aware representations from a reference image and leveraging a tailored conditional guidance module to control composition based on pre-trained diffusion models. Second, when users specify only text themes without a composition reference, Composer supports theme-driven composition retrieval by leveraging the in-context learning capabilities of Large Vision-Language Models (LVLMs), achieving explicit composition planning. To enhance composition in a reference-free mode, we conduct text-to-composition fine-tuning on the trained control module to enable implicit composition planning. Furthermore, we curated a high-quality dataset comprising 2 million image-text pairs using state-of-the-art generative models to support model training. Experimental results demonstrate that Composer significantly enhances aesthetic quality in text-to-image tasks and facilitates personalized composition control and transfer, offering users precision and flexibility in the creative process.

Preprint, 2022, arXiv

Implementation of an Automated Learning System for Non-experts

Automated machine learning systems for non-experts could be critical for industries adopting artificial intelligence in their own applications. This paper details the engineering implementation of an automated machine learning system called YMIR, which relies entirely on a graphical interface to interact with users. After importing training/validation data into the system, a user without AI knowledge can label the data, train models, and perform data mining and evaluation by simply clicking buttons. The paper describes: 1) an open implementation of model training and inference through Docker containers; 2) the implementation of task and resource management; 3) the integration of labeling software; 4) the implementation of HCI (human-computer interaction) with a rebuilt collaborative development paradigm. We also provide a subsequent case study on training models with the system. We hope this paper can help the automated machine learning community grow from an industry application perspective. The code of the system has been released on GitHub (https://github.com/industryessentials/ymir).

Preprint, 2020, arXiv

Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion

Task 1 of the DSTC8-track1 challenge aims to develop an end-to-end multi-domain dialogue system that accomplishes users' complex goals in a tourist information desk setting. This paper describes our submitted solution, Hierarchical Context Enhanced Dialogue System (HCEDS), for this task. The main motivation of our system is to comprehensively explore the potential of hierarchical context for understanding complex dialogues. More specifically, we apply BERT to capture token-level information and employ the attention mechanism to capture sentence-level information. The leaderboard results show that our system achieves first place in automatic evaluation and second place in human evaluation.
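The abstract does not spell out how sentence-level attention is computed, so the following is only a minimal sketch of generic dot-product attention pooling over per-sentence vectors. The names `softmax`, `attention_pool`, `sentence_vectors`, and `query` are hypothetical, and the sentence vectors are assumed to come from some encoder (such as BERT) that is not implemented here; this is not the HCEDS architecture itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sentence_vectors, query):
    """Weight each sentence vector by its dot-product similarity to `query`.

    Assumption: `sentence_vectors` are fixed-size embeddings of dialogue
    turns produced elsewhere (e.g. by a token-level encoder). Returns the
    attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in sentence_vectors]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Three toy "sentence" embeddings and a query for the current turn.
    turns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attention_pool(turns, query=[1.0, 0.0])
    print(weights, context)
```

Sentences whose vectors align more closely with the query receive larger weights, which is one simple way a dialogue model can emphasize the most relevant earlier turns.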

Preprint, 2015, arXiv

3D Wireless: Modeling Wireless Performance by Combining Spatial and Temporal Behaviors

Performance characterization is a fundamental issue in wireless networks for real-time routing, wireless network simulation, and so on. Four basic wireless operations need to be modeled: unicast, anycast, broadcast, and multicast. As observed in many recent works, the temporal and spatial distribution of packet receptions can have a significant impact on wireless performance involving multiple links (anycast/broadcast/multicast). However, existing performance models and simulations overlook these two wireless behaviors, leading to biased performance estimation and simulation results. In this paper, we first explicitly identify the necessary "3-Dimension" information for wireless performance modeling, i.e., packet reception rate (PRR), PRR spatial distribution, and temporal distribution. We then propose a comprehensive modeling approach considering 3-Dimension Wireless information (called the 3DW model). Further, we demonstrate the generality and wide applicability of the 3DW model through two case studies: 3DW-based network simulation and a 3DW-based real-time routing protocol. Extensive simulation and testbed experiments have been conducted. The results show that the 3DW model achieves much more accurate performance estimation for both anycast and broadcast/multicast. 3DW-based simulation can effectively preserve the end-to-end performance metrics of the input empirical traces. 3DW-based routing can select more efficient senders, achieving better transmission efficiency.
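To illustrate why per-link PRR alone can give biased multi-link estimates, here is a small sketch comparing anycast success when receptions are independent across receivers versus spatially correlated. The mixing model below (each receiver copies a shared outcome with probability `correlation`) is an assumption chosen for illustration, not the 3DW model from the paper, and the function names are hypothetical.

```python
import random


def anycast_success_independent(prr, n_receivers):
    """Closed form: P(at least one of n independent receivers gets the packet)."""
    return 1.0 - (1.0 - prr) ** n_receivers


def simulate_anycast(prr, n_receivers, correlation, trials=20000, seed=0):
    """Monte Carlo estimate of anycast success under correlated receptions.

    Assumed toy model: for every packet a shared outcome is drawn with
    probability `prr`; each receiver copies it with probability
    `correlation`, otherwise draws its own outcome with probability `prr`.
    Each receiver's marginal PRR therefore stays equal to `prr`.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        shared = rng.random() < prr
        got_any = False
        for _ in range(n_receivers):
            if rng.random() < correlation:
                received = shared
            else:
                received = rng.random() < prr
            got_any = got_any or received
        successes += got_any
    return successes / trials


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(c, simulate_anycast(prr=0.5, n_receivers=3, correlation=c))
```

In this toy model every link keeps the same PRR, yet the chance that at least one receiver gets the packet falls as correlation rises, so a model that assumes independent links would overestimate anycast performance.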