Source author record

Shiyao Wang

Shiyao Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Information Retrieval, Artificial Intelligence, eess.IV, physics.ins-det, physics.optics

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

Unleashing the Native Recommendation Potential: LLM-Based Generative Recommendation via Structured Term Identifiers

Leveraging the vast open-world knowledge and understanding capabilities of Large Language Models (LLMs) to develop general-purpose, semantically aware recommender systems has emerged as a pivotal research direction in generative recommendation. However, existing methods face bottlenecks in constructing item identifiers. Text-based methods expose the LLM's vast output space, leading to hallucination, while methods based on Semantic IDs (SIDs) encounter a semantic gap between SIDs and the LLM's native vocabulary, requiring costly vocabulary expansion and alignment training. To address this, this paper introduces Term IDs (TIDs), defined as a set of semantically rich and standardized textual keywords, to serve as robust item identifiers. We propose GRLM, a novel framework centered on TIDs that employs Context-aware Term Generation to convert item metadata into standardized TIDs and uses Integrative Instruction Fine-tuning to jointly optimize term internalization and sequential recommendation. Additionally, Elastic Identifier Grounding is designed for robust item mapping. Extensive experiments on real-world datasets demonstrate that GRLM significantly outperforms baselines across multiple scenarios, pointing to a promising direction for generalizable, high-performance generative recommendation systems.
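The abstract does not say how Elastic Identifier Grounding maps generated terms back to items. As a purely illustrative sketch of the general idea of grounding a keyword-style identifier, the snippet below picks the catalog item whose term set has the highest Jaccard overlap with the generated terms. The function names, the Jaccard criterion and the toy catalog are all assumptions, not GRLM's actual method.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ground_terms(generated: list[str], catalog: dict[str, list[str]]) -> str | None:
    """Return the catalog item whose terms best overlap the generated terms.

    Returns None when no catalog item shares any term with the query.
    """
    query = {t.strip().lower() for t in generated}
    best_item, best_score = None, 0.0
    for item_id, terms in catalog.items():
        score = jaccard(query, {t.strip().lower() for t in terms})
        if score > best_score:
            best_item, best_score = item_id, score
    return best_item


# Hypothetical catalog: item ID -> keyword identifier (illustrative only).
catalog = {
    "item_1": ["wireless", "headphones", "noise-cancelling"],
    "item_2": ["wired", "earbuds", "sports"],
}

# A generated identifier that is close to, but not identical to, item_1.
match = ground_terms(["Wireless", "headphones", "bluetooth"], catalog)
```

A soft match like this tolerates a generated term that is not in the catalog, which is one plausible reading of "elastic"; an exact-match lookup would simply fail in that case.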

Preprint, 2022, arXiv

Self-Supervised Text Erasing with Controllable Image Synthesis

Recent efforts on scene text erasing have shown promising results. However, existing methods require rich yet costly label annotations to obtain robust models, which limits their use in practical applications. To this end, we study an unsupervised scenario by proposing a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground-truth and to accurately erase texts in the real world. We first design a style-aware image synthesis function to generate synthetic images with diversely styled texts based on two synthetic mechanisms. To bridge the text style gap between the synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms by picking style parameters under the guidance of two specifically designed rewards. The synthetic training images with erasure ground-truth are then used to train a coarse-to-fine erasing network. To produce better erasing outputs, a triplet erasure loss is designed to force the refinement stage to recover background textures. Moreover, we provide a new dataset (called PosterErase), which contains 60K high-resolution posters with texts and is more challenging for the text erasing task. The proposed method has been extensively evaluated on both PosterErase and the widely used SCUT-Enstext dataset. Notably, on PosterErase, our unsupervised method achieves 5.07 in terms of FID, a relative improvement of 20.9% over existing supervised baselines.

Preprint, 2021, arXiv

A Hybrid Bandit Model with Visual Priors for Creative Ranking in Display Advertising

Creatives play an important role in e-commerce for exhibiting products. Sellers usually create multiple creatives for comprehensive demonstrations, so it is crucial to display the most appealing design to maximize the Click-Through Rate (CTR). For this purpose, modern recommender systems dynamically rank creatives when a product is proposed for a user. However, this task suffers more from the cold-start problem than conventional product recommendation. In this paper, we propose a hybrid bandit model with visual priors, which first makes predictions with a visual evaluation and then naturally evolves to focus on the specialities through the hybrid bandit model. Our contributions are three-fold: 1) We present a visual-aware ranking model (called VAM) that incorporates a list-wise ranking loss for ordering the creatives according to their visual appearance. 2) Regarding visual evaluations as a prior, the hybrid bandit model (called HBM) is proposed to evolve consistently and make better posterior estimates by taking more observations into consideration in online scenarios. 3) The first large-scale creative dataset, CreativeRanking, is constructed, which contains over 1.7M creatives of 500k products as well as their real impression and click data. Extensive experiments have also been conducted on both our dataset and the public Mushroom dataset, demonstrating the effectiveness of the proposed method.
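The idea of starting from a visual prior and refining it with click feedback can be illustrated with a much simpler stand-in than the paper's HBM: Beta-Bernoulli Thompson sampling in which each creative's prior pseudo-counts come from a visual score. This is a minimal sketch under stated assumptions. The class name, the `prior_strength` parameter and the example scores are hypothetical, and the model is not the one described in the abstract.

```python
from __future__ import annotations

import random


class PriorSeededBandit:
    """Beta-Bernoulli Thompson sampling seeded with per-creative visual scores."""

    def __init__(self, visual_scores: dict[str, float], prior_strength: float = 10.0):
        # Each visual score must lie strictly in (0, 1); it is converted into
        # Beta pseudo-counts so a new creative starts from the visual estimate.
        self.alpha = {c: s * prior_strength for c, s in visual_scores.items()}
        self.beta = {c: (1.0 - s) * prior_strength for c, s in visual_scores.items()}

    def select(self, rng: random.Random) -> str:
        # Sample a plausible CTR per creative and show the highest sample.
        samples = {c: rng.betavariate(self.alpha[c], self.beta[c]) for c in self.alpha}
        return max(samples, key=samples.get)

    def update(self, creative: str, clicked: bool) -> None:
        # Observed impressions move the posterior away from the visual prior.
        if clicked:
            self.alpha[creative] += 1.0
        else:
            self.beta[creative] += 1.0

    def mean(self, creative: str) -> float:
        # Posterior mean CTR estimate for one creative.
        return self.alpha[creative] / (self.alpha[creative] + self.beta[creative])


# Hypothetical visual scores for two creatives of the same product.
bandit = PriorSeededBandit({"creative_a": 0.8, "creative_b": 0.2})
shown = bandit.select(random.Random(0))
bandit.update("creative_b", clicked=True)
```

With few impressions the ranking follows the visual scores, which addresses the cold-start case; as clicks accumulate, the observed data dominates the pseudo-counts.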

Preprint, 2020, arXiv

Wide-field, high-resolution lensless on-chip microscopy via near-field blind ptychographic modulation

We report a novel lensless on-chip microscopy platform based on near-field blind ptychographic modulation. In this platform, we place a thin diffuser between the object and the image sensor for light wave modulation. By blindly scanning the unknown diffuser to different x-y positions, we acquire a sequence of modulated intensity images for quantitative object recovery. Unlike previous ptychographic implementations, we employ a unit magnification configuration with a Fresnel number of ~50,000, which is orders of magnitude higher than in previous ptychographic setups. The unit magnification configuration allows us to use the entire sensor area, 6.4 mm by 4.6 mm, as the imaging field of view. The ultra-high Fresnel number enables us to directly recover the positional shift of the diffuser in the phase retrieval process, addressing the positioning accuracy issue that plagues regular ptychographic experiments. In our implementation, we use a low-cost, DIY scanning stage to perform blind diffuser modulation. The precise mechanical scanning that is critical in conventional ptychography experiments is no longer needed in our setup. We further employ an up-sampling phase retrieval scheme to bypass the resolution limit set by the imager pixel size and demonstrate a half-pitch resolution of 0.78 micron. We validate the imaging performance via in vitro cell cultures, transparent and stained tissue sections, and a thick biological sample. We show that the recovered quantitative phase map can be used to perform effective cell segmentation of a dense yeast culture. We also demonstrate 3D digital refocusing of the thick biological sample based on the recovered wavefront. The reported platform provides a cost-effective and turnkey solution for large field-of-view, high-resolution, and quantitative on-chip microscopy.
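For readers unfamiliar with the quantity, the Fresnel number is conventionally defined as N_F = a^2 / (lambda * z), where a is a characteristic aperture size, lambda the wavelength and z the propagation distance. The abstract does not state which values give its figure of ~50,000, so the numbers in the sketch below are hypothetical and chosen only to show the calculation.

```python
def fresnel_number(aperture_m: float, wavelength_m: float, distance_m: float) -> float:
    """Fresnel number N_F = a^2 / (lambda * z), with all lengths in metres."""
    if wavelength_m <= 0 or distance_m <= 0:
        raise ValueError("wavelength and distance must be positive")
    return aperture_m ** 2 / (wavelength_m * distance_m)


# Hypothetical values for illustration only (not taken from the paper):
# 1 mm characteristic size, 500 nm light, 1 mm propagation distance.
example = fresnel_number(1e-3, 500e-9, 1e-3)
```

Shrinking the distance or enlarging the aperture raises the value, which is why near-field geometries have large Fresnel numbers.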