Source author record

Liangjie Hong

Liangjie Hong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, Information Retrieval, Applications, math.ST, Methodology, Social and Information Networks, Statistics Theory

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

SAPO: Step-Aligned Policy Optimization for Reasoning-Based Generative Recommendation

Generative recommendation treats next-item prediction as autoregressive item-identifier generation. Specifically, items are encoded as semantic identifiers (SIDs), which are short coarse-to-fine token sequences whose early tokens capture broad semantics and later tokens refine them. Recent work augments this paradigm with reasoning traces and optimizes them via reinforcement learning with verifiable rewards, typically using an outcome-reward algorithm with exact-match feedback on the generated SID. However, in large-catalog recommendation, exact-match feedback on the generated SID only reports whether the final item is correct; when a generated SID mismatches, an outcome reward cannot identify which SID-token prediction caused the mismatch and may penalize matched SID-token positions together with the mismatched position. We identify that the natural unit of credit assignment in this setting is a single reasoning step (one thinking block paired with one SID token). We instantiate this idea in SAPO (Step-Aligned Policy Optimization): rather than broadcasting one advantage to the whole response, SAPO computes a separate group-relative advantage for each reasoning step and applies it only to the corresponding thinking block and SID token. Across three real-world recommendation datasets, SAPO stabilizes reinforcement-learning training and consistently improves over existing generative recommendation baselines, with the largest gains where sparse exact-match feedback makes reasoning-step credit assignment important. Our results suggest that reinforcement-learning objectives for structured generation should mirror the decoder's own decomposition of the output.
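The contrast between a broadcast outcome advantage and a per-step group-relative advantage can be illustrated with a minimal sketch. This is not the SAPO implementation: the per-step reward used here (1.0 when the SID token at a position matches the target) is an assumption made for illustration, and the function names `outcome_advantages` and `step_advantages` are hypothetical. The sketch only computes the advantage values; applying them to thinking-block tokens in a policy-gradient loss is left out.

```python
from statistics import mean, pstdev


def group_relative(values, eps=1e-6):
    """Normalize a group of rewards to zero mean and (roughly) unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def outcome_advantages(group, target):
    """Baseline: one exact-match reward per response, broadcast to every step."""
    rewards = [1.0 if sid == target else 0.0 for sid in group]
    return [[a] * len(target) for a in group_relative(rewards)]


def step_advantages(group, target):
    """Step-aligned variant: normalize rewards across the group separately per step.

    Assumption: the step reward is 1.0 when the SID token at that position
    matches the target token, else 0.0. The abstract does not specify this.
    """
    per_response = [[1.0 if g == t else 0.0 for g, t in zip(sid, target)]
                    for sid in group]
    columns = [group_relative([r[k] for r in per_response])
               for k in range(len(target))]
    return [[columns[k][g] for k in range(len(target))]
            for g in range(len(group))]


# Four sampled SIDs for one prompt; only the first matches the target exactly.
target = (3, 7, 1)
group = [(3, 7, 1), (3, 7, 4), (3, 2, 1), (5, 7, 1)]
print(outcome_advantages(group, target)[1])
print(step_advantages(group, target)[1])
```

For the second response, which is wrong only in its last token, the outcome variant assigns the same negative value to all three positions, while the step-aligned variant gives positive values to the two matched positions and a negative value only to the mismatched one.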

preprint · 2022 · arXiv

Remote Work Optimization with Robust Multi-channel Graph Neural Networks

The spread of COVID-19 has led to the global shutdown of many corporate offices and has encouraged companies to open more opportunities that allow employees to work from a remote location. As the workplace type expands from onsite offices to remote areas, an emerging challenge for an online hiring marketplace is how these remote opportunities and user intentions to work remotely can be modeled and matched without prior information. Despite the unprecedented number of remote jobs posted amid COVID-19, there is no existing approach that can be directly applied. Introducing a brand new workplace type naturally leads to the cold-start problem, which is particularly common for less active job seekers. It is challenging, if not impossible, to onboard a new workplace type for any predictive model if existing information sources, including data from resumes and job descriptions, provide little information about the new category of jobs. Hence, in this work, we propose a principled approach that jointly models the remoteness of job seekers and job opportunities with limited information, while also meeting the needs of web-scale applications. Existing research on the emerging remote workplace type mainly focuses on qualitative studies, and classic predictive modeling approaches are inapplicable given the problems of cold-start and information scarcity. We aim to close precisely this gap with a novel graph neural architecture. Extensive experiments on large-scale data from real-world applications have been conducted to validate the superiority of the proposed approach over competitive baselines. The improvement may translate to more rapid onboarding of the new workplace type, which can benefit job seekers who are interested in working remotely.

preprint · 2019 · arXiv

The Identification and Estimation of Direct and Indirect Effects in A/B Tests through Causal Mediation Analysis

E-commerce companies have a number of online products, such as organic search, sponsored search, and recommendation modules, to fulfill customer needs. Although each of these products provides a unique opportunity for users to interact with a portion of the overall inventory, they are all similar channels for users and compete for users' limited time and monetary budgets. To optimize users' overall experiences on an e-commerce platform, instead of understanding and improving different products separately, it is important to gain insight into the evidence that a change in one product would induce users to change their behaviors in others, which may be due to the fact that these products are functionally similar. In this paper, we introduce causal mediation analysis as a formal statistical tool to reveal the underlying causal mechanisms. Existing literature provides little guidance on cases where multiple unmeasured causally-dependent mediators exist, which are common in A/B tests. We seek a novel approach to identify the direct and indirect effects of the treatment in those scenarios. Finally, we demonstrate the effectiveness of the proposed method on data from Etsy's real A/B tests and shed light on complex relationships between different products.

preprint · 2016 · arXiv

An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization

One of the missions of personalization and recommender systems is to show content items according to users' personal interests. To achieve this goal, these systems learn user interests over time and try to present content items tailored to user profiles. Recommending items according to users' preferences has been investigated extensively in the past few years, mainly thanks to the popularity of the Netflix competition. In a real setting, users may be attracted by only a subset of those items and interact with them, leaving only partial feedback for the system to learn from in the next cycle. This introduces significant biases into the system and results in a situation where user engagement metrics cannot be improved over time. The problem is not limited to one component of the system. The data collected from users is usually used in many different tasks, including learning ranking functions, building user profiles and constructing content classifiers. Once the data is biased, all these downstream use cases are impacted as well. Therefore, it would be beneficial to gather unbiased data through user interactions. Traditionally, unbiased data collection is done by showing items sampled uniformly from the content pool. However, this simple scheme is not feasible, as it risks hurting user engagement metrics and takes a long time to gather user feedback. In this paper, we introduce a user-friendly unbiased data collection framework that uses methods developed in the exploitation and exploration literature. We discuss how the framework differs from normal multi-armed bandit problems and why such a method is needed. We lay out a novel Thompson sampling method for Bernoulli ranked lists to effectively balance user experience and data collection. The proposed method is validated in a real bucket test, and we show strong results compared to older algorithms.
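For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of generic Beta-Bernoulli Thompson sampling applied to a top-k ranked list. It is not the paper's algorithm, whose details are not given here. The class name `BetaBernoulliRanker`, the uniform Beta(1, 1) prior, and the choice to count every shown-but-unclicked item as a failure (which ignores position bias) are all assumptions made for illustration.

```python
import random


class BetaBernoulliRanker:
    """Generic Thompson sampling over items with Bernoulli (click / no click) feedback."""

    def __init__(self, item_ids, prior_alpha=1.0, prior_beta=1.0, seed=None):
        self.rng = random.Random(seed)
        # One Beta(alpha, beta) posterior per item, stored as [alpha, beta].
        self.params = {item: [prior_alpha, prior_beta] for item in item_ids}

    def rank(self, k):
        """Draw one click-rate sample per item and show the k highest draws."""
        sampled = {item: self.rng.betavariate(a, b)
                   for item, (a, b) in self.params.items()}
        return sorted(sampled, key=sampled.get, reverse=True)[:k]

    def update(self, shown, clicked):
        """Update posteriors for shown items only.

        Assumption: a shown item that was not clicked counts as a failure,
        regardless of its position in the list.
        """
        for item in shown:
            if item in clicked:
                self.params[item][0] += 1.0
            else:
                self.params[item][1] += 1.0


# Simulated bucket: hypothetical true click rates, unknown to the ranker.
true_rates = {"a": 0.30, "b": 0.10, "c": 0.05, "d": 0.02}
ranker = BetaBernoulliRanker(true_rates, seed=0)
env = random.Random(1)
for _ in range(500):
    shown = ranker.rank(k=2)
    clicked = {item for item in shown if env.random() < true_rates[item]}
    ranker.update(shown, clicked)
print(ranker.params)
```

Because the ranking is driven by posterior samples instead of point estimates, items with few observations still get shown occasionally, which is the exploration behavior that replaces uniform random sampling from the content pool.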

preprint · 2016 · arXiv

Learning Optimal Card Ranking from Query Reformulation

Mobile search has recently been shown to be the major contributor to the growing search market. The key difference between mobile search and desktop search is that information presentation is limited to the screen space of the mobile device. Thus, major search engines have adopted a new type of search result presentation, known as information cards, in which each card presents summarized results from one domain/vertical, for a given query, to augment the standard blue-links search results. While it has been widely acknowledged that information cards are particularly suited to the mobile user experience, it is also challenging to optimize such result sets. Typically, user engagement metrics like query reformulation are based on the whole ranked list of cards for each query, whereas most traditional learning-to-rank algorithms require per-item relevance labels. In this paper, we investigate the possibility of interpreting query reformulation as effective relevance labels for query-card pairs. We inherit the concept of conventional learning-to-rank, and propose pointwise, pairwise and listwise interpretations for query reformulation. In addition, we propose a learning-to-label strategy that learns the contribution of each card, with respect to a query, where such contributions can be used as labels for training card ranking models. We utilize a state-of-the-art ranking model and demonstrate the effectiveness of the proposed mechanisms on large-scale mobile data from a major search engine, showing that models trained on labels derived from user engagement can significantly outperform ones trained on human judgment labels.

preprint · 2012 · arXiv

A Tutorial on Probabilistic Latent Semantic Analysis

In this tutorial, I will discuss in detail how Probabilistic Latent Semantic Analysis (PLSA) is formalized and the different learning algorithms that have been proposed to learn the model.