Source author record

Qi Qi

Qi Qi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computer Science and Game Theory Networking and Internet Architecture Artificial Intelligence Computational Complexity econ.TH eess.IV

Catalog footprint

What is connected

10works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition

Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as a Ordered Stochastic Multiple-Choice Knapsack Problem(OS-MCKP). This perspective highlights a meta-cognitive requirement -- anticipating task difficulty, estimating return over investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. Next, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a hard token budget, allowing models to learn long-horizon allocation strategies. Across budgeted mathematical reasoning benchmarks, ROI-Reasoning consistently improves overall score while substantially reducing regret under tight computation budgets.

preprint2026arXiv

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified multimodal ingress pipeline that integrates UI states, real-world visual contexts, and speech inputs, leveraging a temporal alignment module to decompose raw data into structured multimodal intent representations. Omni Memory leverages multimodal memory optimization to enhance personalized intelligence by integrating runtime working memory for task continuity with long-term personal memory distilled from local data, enabling highly context-aware and personalized interactions. Finally, Omni Action employs a hybrid grounding strategy that combines structural XML metadata with visual perception for robust interaction. Through Behavior Cloning and Trajectory Replay, the system captures user navigation as reusable skills, enabling precise direct-access execution. Demonstrations across diverse scenarios show that X-OmniClaw effectively enhances interaction efficiency and task reliability, providing a practical architectural blueprint for the next generation of mobile-native personal assistants.

preprint2022arXiv

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

Temporal grounding aims to locate a target video moment that semantically corresponds to the given sentence query in an untrimmed video. However, recent works find that existing methods suffer a severe temporal bias problem. These methods do not reason the target moment locations based on the visual-textual semantic alignment but over-rely on the temporal biases of queries in training sets. To this end, this paper proposes a novel training framework for grounding models to use shuffled videos to address temporal bias problem without losing grounding accuracy. Our framework introduces two auxiliary tasks, cross-modal matching and temporal order discrimination, to promote the grounding model training. The cross-modal matching task leverages the content consistency between shuffled and original videos to force the grounding model to mine visual contents to semantically match queries. The temporal order discrimination task leverages the difference in temporal order to strengthen the understanding of long-term temporal contexts. Extensive experiments on Charades-STA and ActivityNet Captions demonstrate the effectiveness of our method for mitigating the reliance on temporal biases and strengthening the model's generalization ability against the different temporal distributions. Code is available at https://github.com/haojc/ShufflingVideosForTSG.

preprint2022arXiv

Optimally Integrating Ad Auction into E-Commerce Platforms

Advertising becomes one of the most popular ways of monetizing an online transaction platform. Usually, sponsored advertisements are posted on the most attractive positions to enhance the number of clicks. However, multiple e-commerce platforms are aware that this action may hurt the search experience of users, even though it can bring more incomes. To balance the advertising revenue and the user experience loss caused by advertisements, most e-commerce platforms choose fixing some areas for advertisements and adopting some simple restrictions on the number of ads, such as a fixed number K of ads on the top positions or one advertisement for every N organic searched results. Different from these common rules of treating the allocation of ads separately (from the arrangements of the organic searched items), in this work we build up an integrated system with mixed arrangements of advertisements and organic items. We focus on the design of truthful mechanisms to properly list the advertisements and organic items and optimally trade off the instant revenue and the user experience. Furthermore, for different settings and practical requirements, we extend our optimal truthful allocation mechanisms to cater for these realistic conditions. Finally, we exert several experiments to verify the improvement of our mechanism compared to the common-used advertising mechanism.

preprint2020arXiv

A Simple and Effective Framework for Pairwise Deep Metric Learning

Deep metric learning (DML) has received much attention in deep learning due to its wide applications in computer vision. Previous studies have focused on designing complicated losses and hard example mining methods, which are mostly heuristic and lack of theoretical understanding. In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar. It identifies the most critical issue in this problem--imbalanced data pairs. To tackle this issue, we propose a simple and effective framework to sample pairs in a batch of data for updating the model. The key to this framework is to define a robust loss for all pairs over a mini-batch of data, which is formulated by distributionally robust optimization. The flexibility in constructing the uncertainty decision set of the dual variable allows us to recover state-of-the-art complicated losses and also to induce novel variants. Empirical studies on several benchmark data sets demonstrate that our simple and effective method outperforms the state-of-the-art results. Codes are available at: https://github.com/qiqi-helloworld/A-Simple-and-Effective-Framework-for-Pairewise-Distance-Metric-Learning

preprint2020arXiv

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

In this paper, we propose an adaptive weighting regression (AWR) method to leverage the advantages of both detection-based and regression-based methods. Hand joint coordinates are estimated as discrete integration of all pixels in dense representation, guided by adaptive weight maps. This learnable aggregation process introduces both dense and joint supervision that allows end-to-end training and brings adaptability to weight maps, making the network more accurate and robust. Comprehensive exploration experiments are conducted to validate the effectiveness and generality of AWR under various experimental settings, especially its usefulness for different types of dense representation and input modality. Our method outperforms other state-of-the-art methods on four publicly available datasets, including NYU, ICVL, MSRA and HANDS 2017 dataset.

preprint2020arXiv

Computations and Complexities of Tarski's Fixed Points and Supermodular Games

We consider two models of computation for Tarski's order preserving function f related to fixed points in a complete lattice: the oracle function model and the polynomial function model. In both models, we find the first polynomial time algorithm for finding a Tarski's fixed point. In addition, we provide a matching oracle bound for determining the uniqueness in the oracle function model and prove it is Co-NP hard in the polynomial function model. The existence of the pure Nash equilibrium in supermodular games is proved by Tarski's fixed point theorem. Exploring the difference between supermodular games and Tarski's fixed point, we also develop the computational results for finding one pure Nash equilibrium and determining the uniqueness of the equilibrium in supermodular games.

preprint2020arXiv

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues with convergence. A few variance-reduced on-policy policy gradient algorithms have been recently proposed that use methods from stochastic optimization to reduce the variance of the gradient estimate in the REINFORCE algorithm. However, these algorithms are not designed for the off-policy setting and are memory-inefficient, since they need to collect and store a large ``reference'' batch of samples from time to time. To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples. Empirical studies validate the effectiveness of the proposed approaches.

preprint2020arXiv

Vehicular Edge Computing via Deep Reinforcement Learning

The smart vehicles construct Vehicle of Internet which can execute various intelligent services. Although the computation capability of the vehicle is limited, multi-type of edge computing nodes provide heterogeneous resources for vehicular services.When offloading the complicated service to the vehicular edge computing node, the decision should consider numerous factors.The offloading decision work mostly formulate the decision to a resource scheduling problem with single or multiple objective function and some constraints, and explore customized heuristics algorithms. However, offloading multiple data dependency tasks in a service is a difficult decision, as an optimal solution must understand the resource requirement, the access network, the user mobility, and importantly the data dependency. Inspired by recent advances in machine learning, we propose a knowledge driven (KD) service offloading decision framework for Vehicle of Internet, which provides the optimal policy directly from the environment. We formulate the offloading decision of multi-task in a service as a long-term planning problem, and explores the recent deep reinforcement learning to obtain the optimal solution. It considers the future data dependency of the following tasks when making decision for a current task from the learned offloading knowledge. Moreover, the framework supports the pre-training at the powerful edge computing node and continually online learning when the vehicular service is executed, so that it can adapt the environment changes and learns policy that are sensible in hindsight. The simulation results show that KD service offloading decision converges quickly, adapts to different conditions, and outperforms the greedy offloading decision algorithm.

preprint2013arXiv

Cloud Service-Aware Location Update in Mobile Cloud Computing

Mobile devices are becoming the primary platforms for many users who always roam around when accessing the cloud computing services. From this, the cloud computing is integrated into the mobile environment by introducing a new paradigm, mobile cloud computing. In the context of mobile computing, the battery life of mobile device is limited, and it is important to balance the mobility performance and energy consumption. Fortunately, cloud services provide both opportunities and challenges for mobility management. Taking the activities of cloud services accessing into consideration, we propose a service-aware location update mechanism, which can detect the presence and location of the mobile device without traditional periodic registration update. Analytic model and simulation are developed to investigate the new mechanism. The results demonstrate that the service-aware location update management can reduce the location update times and handoff signaling, which can efficiently save power consumption for mobile devices.

Qi Qi

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

Optimally Integrating Ad Auction into E-Commerce Platforms

A Simple and Effective Framework for Pairwise Deep Metric Learning

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

Computations and Complexities of Tarski's Fixed Points and Supermodular Games

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Vehicular Edge Computing via Deep Reinforcement Learning

Cloud Service-Aware Location Update in Mobile Cloud Computing