Source author record

Zehua Zhang

Zehua Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.AS Information Retrieval Machine Learning Robotics

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

GCRank: A Generative Contextual Comprehension Paradigm for Takeout Ranking Model

The ranking stage serves as the central optimization and allocation hub in advertising systems, governing economic value distribution through eCPM and orchestrating the user-centric blending of organic and advertising content. Prevailing ranking models often rely on fragmented modules and hand-crafted features, limiting their ability to interpret complex user intent. This challenge is further amplified in location-based services such as food delivery, where user decisions are shaped by dynamic spatial, temporal, and individual contexts. To address these limitations, we propose a novel generative framework that reframes ranking as a context comprehension task, modeling heterogeneous signals in a unified architecture. Our architecture consists of two core components: the Generative Contextual Encoder (GCE) and the Generative Contextual Fusion (GCF). The GCE comprises three specialized modules: a Personalized Context Enhancer (PCE) for user-specific modeling, a Collective Context Enhancer (CCE) for group-level patterns, and a Dynamic Context Enhancer (DCE) for real-time situational adaptation. The GCF module then seamlessly integrates these contextual representations through low-rank adaptation. Extensive experiments confirm that our method achieves significant gains in critical business metrics, including click-through rate and platform revenue. We have successfully deployed our method on a large-scale food delivery advertising platform, demonstrating its substantial practical impact. This work pioneers a new perspective on generative recommendation and highlights its practical potential in industrial advertising systems.

preprint2022arXiv

FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network

In recent years, deep learning-based approaches have significantly improved the performance of single-channel speech enhancement. However, due to the limitation of training data and computational complexity, real-time enhancement of full-band (48 kHz) speech signals is still very challenging. Because of the low energy of spectral information in the high-frequency part, it is more difficult to directly model and enhance the full-band spectrum using neural networks. To solve this problem, this paper proposes a two-stage real-time speech enhancement model with extraction-interpolation mechanism for a full-band signal. The 48 kHz full-band time-domain signal is divided into three sub-channels by extracting, and a two-stage processing scheme of `masking + compensation' is proposed to enhance the signal in the complex domain. After the two-stage enhancement, the enhanced full-band speech signal is restored by interval interpolation. In the subjective listening and word accuracy test, our proposed model achieves superior performance and outperforms the baseline model overall by 0.59 MOS and 4.0% WAcc for the non-personalized speech denoising task.

preprint2020arXiv

Category-Specific CNN for Visual-aware CTR Prediction at JD.com

As one of the largest B2C e-commerce platforms in China, JD com also powers a leading advertising system, serving millions of advertisers with fingertip connection to hundreds of millions of customers. In our system, as well as most e-commerce scenarios, ads are displayed with images.This makes visual-aware Click Through Rate (CTR) prediction of crucial importance to both business effectiveness and user experience. Existing algorithms usually extract visual features using off-the-shelf Convolutional Neural Networks (CNNs) and late fuse the visual and non-visual features for the finally predicted CTR. Despite being extensively studied, this field still face two key challenges. First, although encouraging progress has been made in offline studies, applying CNNs in real systems remains non-trivial, due to the strict requirements for efficient end-to-end training and low-latency online serving. Second, the off-the-shelf CNNs and late fusion architectures are suboptimal. Specifically, off-the-shelf CNNs were designed for classification thus never take categories as input features. While in e-commerce, categories are precisely labeled and contain abundant visual priors that will help the visual modeling. Unaware of the ad category, these CNNs may extract some unnecessary category-unrelated features, wasting CNN's limited expression ability. To overcome the two challenges, we propose Category-specific CNN (CSCNN) specially for CTR prediction. CSCNN early incorporates the category knowledge with a light-weighted attention-module on each convolutional layer. This enables CSCNN to extract expressive category-specific visual patterns that benefit the CTR prediction. Offline experiments on benchmark and a 10 billion scale real production dataset from JD, together with an Online A/B test show that CSCNN outperforms all compared state-of-the-art algorithms.

preprint2020arXiv

Interaction Graphs for Object Importance Estimation in On-road Driving Videos

A vehicle driving along the road is surrounded by many objects, but only a small subset of them influence the driver's decisions and actions. Learning to estimate the importance of each object on the driver's real-time decision-making may help better understand human driving behavior and lead to more reliable autonomous driving systems. Solving this problem requires models that understand the interactions between the ego-vehicle and the surrounding objects. However, interactions among other objects in the scene can potentially also be very helpful, e.g., a pedestrian beginning to cross the road between the ego-vehicle and the car in front will make the car in front less important. We propose a novel framework for object importance estimation using an interaction graph, in which the features of each object node are updated by interacting with others through graph convolution. Experiments show that our model outperforms state-of-the-art baselines with much less input and pre-processing.

Zehua Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

GCRank: A Generative Contextual Comprehension Paradigm for Takeout Ranking Model

FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network

Category-Specific CNN for Visual-aware CTR Prediction at JD.com

Interaction Graphs for Object Importance Estimation in On-road Driving Videos