Source author record

Ke Zhou

Ke Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.PR Artificial Intelligence Computer Vision cs.CY Information Retrieval Computation and Language physics.app-ph physics.chem-ph physics.class-ph q-fin.PM

Catalog footprint

What is connected

13 works

10 topics

4 close collaborators


preprint, 2026, arXiv

FAVOR: Efficient Filter-Agnostic Vector ANNS Based on Selectivity-Aware Exclusion Distances

Modern retrieval systems increasingly require integrating approximate nearest neighbor search (ANNS) with complex attribute filtering to handle hybrid queries in applications such as recommendation systems and retrieval-augmented generation (RAG). While HNSW-based inline-filtering methods show promise, existing approaches struggle to deliver high throughput under low-selectivity scenarios while balancing search efficiency, filtering generality, and index connectivity. To address these challenges, we propose FAVOR, an efficient filter-agnostic vector ANNS that supports arbitrary filtering conditions while maintaining stable performance across varying selectivity levels. FAVOR introduces three novel features: (1) an integrated architecture that unifies selectivity estimation and filtered ANNS execution, providing a cohesive solution for hybrid vector-attribute queries; (2) an HNSW-based inline-filtering algorithm that introduces an exclusion distance mechanism to dynamically reshape the vector distance distribution, pushing non-target vectors away from the query while promoting valid candidates toward the query, thus improving search efficiency without compromising generality or graph connectivity; and (3) a selectivity-driven search selector that estimates query selectivity and dynamically routes queries between a pre-filtering brute-force algorithm for low-selectivity cases and an optimized HNSW-based search algorithm for other scenarios, ensuring consistent performance. Extensive experiments on real-world datasets demonstrate that FAVOR achieves a 1.3-5$\times$ higher QPS at $Recall@10 = 95\%$ compared to state-of-the-art methods for arbitrary filtering conditions, while maintaining competitive performance even against tailored solutions in some filtering conditions.
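The selectivity-driven routing described in feature (3) can be illustrated with a minimal sketch. This is not the FAVOR implementation: the function names, the sampling-based selectivity estimate, and the 0.05 threshold are all assumptions made for illustration, and the graph search is left as a caller-supplied function rather than a real HNSW index.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_selectivity(attrs, predicate, sample_size=256, seed=0):
    """Estimate the fraction of rows passing the filter from a random sample."""
    n = len(attrs)
    if n == 0:
        return 0.0
    rng = random.Random(seed)
    ids = list(range(n)) if n <= sample_size else rng.sample(range(n), sample_size)
    return sum(1 for i in ids if predicate(attrs[i])) / len(ids)


def prefilter_brute_force(vectors, attrs, predicate, query, k):
    """Apply the filter first, then rank the surviving rows exactly."""
    scored = sorted(
        (sq_dist(vectors[i], query), i)
        for i in range(len(vectors))
        if predicate(attrs[i])
    )
    return [i for _, i in scored[:k]]


def route_query(vectors, attrs, predicate, query, k, graph_search, threshold=0.05):
    """Route to brute force when few rows match, otherwise to the graph search.

    `graph_search(query, predicate, k)` is a hypothetical callable standing in
    for a filtered graph index; `threshold` is an arbitrary illustrative value.
    """
    if estimate_selectivity(attrs, predicate) < threshold:
        return "brute_force", prefilter_brute_force(vectors, attrs, predicate, query, k)
    return "graph", graph_search(query, predicate, k)


# Toy data: 100 one-dimensional vectors whose attribute equals their index.
vectors = [(float(i),) for i in range(100)]
attrs = list(range(100))


def exact_stand_in(query, predicate, k):
    """Stand-in for the graph index so the example runs without one."""
    return prefilter_brute_force(vectors, attrs, predicate, query, k)


rare_path, rare_ids = route_query(
    vectors, attrs, lambda a: a in (10, 90), (12.0,), 1, exact_stand_in
)
common_path, common_ids = route_query(
    vectors, attrs, lambda a: a % 2 == 0, (12.0,), 2, exact_stand_in
)
```

With the rare filter only 2% of rows match, so the query takes the brute-force path; the even-number filter matches half the rows and is passed to the graph-search callable.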

preprint, 2026, arXiv

Ion Clustering Regulated by Extreme Nanoconfinement Enables Mechanosensitive Nanochannels

Mechanosensitive ion nanochannels regulate transport by undergoing conformational changes within nanopores. However, achieving precise control over these conformational states remains a major challenge for both soft and solid artificial pores. Here, we propose an alternative mechanism that modulates the charge carrier density inside nanopores, inspired by transistors in solid-state electronics. This strategy leverages a novel phenomenon of confinement-regulated ion clustering in two-dimensional extremely confined nanochannels, revealed by extensive μs-scale enhanced-sampling molecular simulations based on an ab initio-refined force field and nucleation theory. The resulting force-ion transistor enables mechanically gated control of ion transport and provides a conceptual foundation for designing ionic mechanical logic gates. Our findings offer new insights into piezochannel mechanosensing and electromechanical coupling in biosystems beyond conformational signaling, opening pathways to integrate artificial ion channels with neuromorphic devices for processing mechanical stimuli.

preprint, 2026, arXiv

SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation

Deploying massive diffusion models for real-time, infinite-duration, audio-driven avatar generation presents a significant engineering challenge, primarily due to the conflict between computational load and strict latency constraints. Existing approaches often compromise visual fidelity by enforcing strictly unidirectional attention mechanisms or reducing model capacity. To address this problem, we introduce SoulX-FlashTalk, a 14B-parameter framework optimized for high-fidelity real-time streaming. Diverging from conventional unidirectional paradigms, we use a Self-correcting Bidirectional Distillation strategy that retains bidirectional attention within video chunks. This design preserves critical spatiotemporal correlations, significantly enhancing motion coherence and visual detail. To ensure stability during infinite generation, we incorporate a Multi-step Retrospective Self-Correction Mechanism, enabling the model to autonomously recover from accumulated errors and preventing collapse. Furthermore, we engineered a full-stack inference acceleration suite incorporating hybrid sequence parallelism, Parallel VAE, and kernel-level optimizations. Extensive evaluations confirm that SoulX-FlashTalk is the first 14B-scale system to achieve a sub-second start-up latency (0.87s) while reaching a real-time throughput of 32 FPS, setting a new standard for high-fidelity interactive digital human synthesis.

preprint, 2024, arXiv

Characterizing Fake News Targeting Corporations

Misinformation proliferates in the online sphere, with evident impacts on the political and social realms, influencing democratic discourse and posing risks to public health and safety. The corporate world is also a prime target for fake news dissemination. While recent studies have attempted to characterize corporate misinformation and its effects on companies, their findings often suffer from limitations due to qualitative or narrative approaches and a narrow focus on specific industries. To address this gap, we conducted an analysis using quantitative social media methods and crowd-sourcing studies to investigate corporate misinformation across a diverse array of industries within the S&P 500 companies. Our study reveals that corporate misinformation encompasses topics such as products, politics, and societal issues. We discovered that companies affected by fake news also get reputable news coverage but less social media attention, leading to heightened negativity in social media comments, diminished stock growth, and increased stress mentions among employee reviews. Additionally, we observe that a company is not targeted by fake news all the time, but there are particular times when a critical mass of fake news emerges. These findings hold significant implications for regulators, business leaders, and investors, emphasizing the necessity to vigilantly monitor the escalating phenomenon of corporate misinformation.

preprint, 2022, arXiv

Foreground Object Structure Transfer for Unsupervised Domain Adaptation

Unsupervised domain adaptation aims to train a classification model from the labeled source domain for the unlabeled target domain. Since the data distributions of the two domains are different, the model often performs poorly on the target domain. Existing methods align the feature distributions of the source and target domains and learn domain-invariant features to improve the performance of the model. However, the features are usually aligned as a whole, so the domain adaptation task fails to serve the classification task: class information is ignored, which leads to misalignment. In this paper, we investigate which features should be used for domain alignment, introduce prior knowledge to extract foreground features that guide domain adaptation for classification tasks, and perform alignment on the local structure of objects. We propose a method called Foreground Object Structure Transfer (FOST). The key to FOST is a new clustering-based condition, which incorporates the relative position relationships of foreground objects. Based on this condition, FOST makes the data distribution of each class geometrically more compact. In practice, since the labels of the target domain are not available, we use the clustering information of the source domain to assign pseudo labels to the target domain samples, and then use prior knowledge from the source domain data to guide those positive features to maximize the inter-class distance and minimize the intra-class distance. Extensive experimental results on various benchmarks (i.e., ImageCLEF-DA, Office-31, Office-Home, VisDA-2017) under different domain adaptation settings show that FOST compares favorably against existing state-of-the-art domain adaptation methods.

preprint, 2021, arXiv

The Healthy States of America: Creating a Health Taxonomy with Social Media

Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches of our clusters with 20 official categories, out of 22. Based on the mentions of our taxonomy's sub-categories on Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalence of 18 conditions.

preprint, 2014, arXiv

A note on the passage time of finite state Markov chains

Consider a Markov chain with finite state space $\{0, 1, \ldots, d\}$. We give the generating functions (or Laplace transforms) of the absorption (passage) time in the following two situations: (1) the absorption time at state $d$ when the chain starts from any state $i$ and $d$ is absorbing; (2) the passage time of any state $i$ when the chain starts from the stationary distribution, assuming the chain is time reversible and ergodic. An example shows that this approach is more convenient than existing methods; in particular, the expectation of the absorption time can be calculated directly.

preprint, 2014, arXiv

Dynamic Mean-LPM and Mean-CVaR Portfolio Optimization in Continuous-time

Instead of controlling "symmetric" risks measured by central moments of investment return or terminal wealth, more and more portfolio models have shifted their focus to managing "asymmetric" downside risks, namely that the investment return falls below a certain threshold. Among the existing downside risk measures, the lower-partial moments (LPM) and conditional value-at-risk (CVaR) are probably the most promising. In this paper we investigate the dynamic mean-LPM and mean-CVaR portfolio optimization problems in continuous time, while the current literature has only witnessed their static versions. Our contributions are two-fold: building up tractable formulations and deriving the corresponding analytical solutions. By imposing a limit funding level on the terminal wealth, we conquer the ill-posedness exhibited in the class of mean-downside risk portfolio models. The limit funding level not only enables us to solve both dynamic mean-LPM and mean-CVaR portfolio optimization problems, but also offers the flexibility to tame the aggressiveness of the portfolio policies generated from such mean-downside risk models. More specifically, for a general market setting, we prove the existence and uniqueness of the Lagrangian multipliers, which is a key step in applying the martingale approach, and establish a theoretical foundation for developing efficient numerical solution approaches. Moreover, for situations where the opportunity set of the market setting is deterministic, we derive analytical portfolio policies for both dynamic mean-LPM and mean-CVaR formulations.
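For readers unfamiliar with the two downside risk measures, the following sketch computes static, sample-based estimates of each. It illustrates the standard textbook definitions only, not the dynamic continuous-time formulation studied in the paper; the function names, the sample returns, and the tail-size convention (the worst ceil((1 - alpha) * n) outcomes) are assumptions chosen for the example.

```python
import math


def lower_partial_moment(returns, threshold=0.0, order=1):
    """Sample LPM: the mean of max(threshold - r, 0) ** order over all returns."""
    return sum(max(threshold - r, 0.0) ** order for r in returns) / len(returns)


def conditional_value_at_risk(returns, alpha=0.95):
    """Sample CVaR: the average loss over the worst (1 - alpha) share of outcomes.

    Losses are defined as negative returns. The tail holds the
    ceil((1 - alpha) * n) largest losses, with at least one observation.
    """
    losses = sorted((-r for r in returns), reverse=True)
    # Rounding guards against floating-point noise such as 0.9999999999999998.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(losses), 9)))
    tail = losses[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical sample of five period returns.
sample_returns = [-0.10, -0.05, 0.0, 0.05, 0.10]

lpm1 = lower_partial_moment(sample_returns, threshold=0.0, order=1)
lpm2 = lower_partial_moment(sample_returns, threshold=0.0, order=2)
cvar80 = conditional_value_at_risk(sample_returns, alpha=0.8)
```

Both measures ignore outcomes above the threshold or outside the loss tail, which is what makes them "asymmetric" in contrast to variance.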

preprint, 2014, arXiv

Scaling limit of the local time of the Sinai's random walk

We prove that the local times of a sequence of Sinai's random walks converge to those of Brox's diffusion under proper scaling, which accords with the result of Seignourel (2000). Our proof is based on the convergence of branching processes in random environment established by Kurtz (1979).

preprint, 2013, arXiv

Hitting Time Distribution for Skip-Free Markov Chains: A Simple Proof

A well-known theorem for an irreducible skip-free chain with absorbing state $d$, under some conditions, is that the hitting (absorbing) time of state $d$ starting from state 0 is distributed as the sum of $d$ independent geometric (or exponential) random variables. The purpose of this paper is to present a direct and simple proof of the theorem in the cases of both discrete and continuous time skip-free Markov chains. Our proof directly calculates the generating functions (or Laplace transforms) of the hitting times by an iteration method.
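A small numerical check can make the theorem concrete. The sketch below assumes the standard form of the result, in which the geometric success probabilities are $1 - \theta_j$ with $\theta_j$ the eigenvalues of the transition matrix restricted to the non-absorbing states; this parameterization is not stated in the abstract and is taken as an assumption here. The chain itself is a hypothetical birth-death chain with $d = 3$. The code only verifies the implied identity for the mean, $E[T] = \sum_j 1/(1 - \theta_j)$, against a direct linear solve.

```python
import numpy as np

# Hypothetical birth-death chain on {0, 1, 2, 3} with state 3 absorbing.
# Q is the transition matrix restricted to the transient states 0, 1, 2;
# the missing mass 0.2 in the last row is the probability of jumping 2 -> 3.
Q = np.array(
    [
        [0.7, 0.3, 0.0],
        [0.1, 0.7, 0.2],
        [0.0, 0.1, 0.7],
    ]
)

# Expected hitting times of state 3 from each transient state:
# solve (I - Q) t = 1, the usual first-step equations.
expected_times = np.linalg.solve(np.eye(3) - Q, np.ones(3))
mean_from_linear_solve = expected_times[0]

# Mean of a sum of independent geometric variables with success
# probabilities 1 - theta_j, where theta_j are the eigenvalues of Q.
eigenvalues = np.linalg.eigvals(Q).real
mean_from_eigenvalues = float(np.sum(1.0 / (1.0 - eigenvalues)))
```

For this chain all three eigenvalues lie in (0, 1), so each term corresponds to a genuine geometric random variable, and the two computations of the mean agree.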

preprint, 2013, arXiv

Tail asymptotic of the stationary distribution for the state dependent (1,R)-reflecting random walk: near critical

In this paper, we consider the $(1,R)$ state-dependent reflecting random walk (RW) on the half line, allowing jumps to the right of size at most $R$ and jumps to the left of size 1 only. We provide an explicit criterion for positive recurrence and an explicit expression for the stationary distribution based on the intrinsic branching structure within the walk. As an application, we obtain the tail asymptotic of the stationary distribution in the "near critical" situation.

preprint, 2012, arXiv

Explicit stationary distribution of the $(L,1)$-reflecting random walk on the half line

In this paper, we consider the $(L,1)$ state-dependent reflecting random walk (RW) on the half line, which is a RW allowing jumps to the left of maximal size $L$. For this model, we provide an explicit criterion for (positive) recurrence and an explicit expression for the stationary distribution. As an application, we prove the geometric tail asymptotic behavior of the stationary distribution under certain conditions. The main tool employed in the paper is the intrinsic branching structure within the $(L,1)$-random walk.

preprint, 2012, arXiv

Learning the Gain Values and Discount Factors of DCG

Evaluation metrics are an essential part of a ranking system, and in the past many evaluation metrics have been proposed in information retrieval and Web search. Discounted Cumulated Gains (DCG) has emerged as one of the evaluation metrics widely adopted for evaluating the performance of ranking functions used in Web search. However, the two sets of parameters, gain values and discount factors, used in DCG are determined in a rather ad-hoc way. In this paper we first show that DCG is generally not coherent, meaning that comparing the performance of ranking functions using DCG very much depends on the particular gain values and discount factors used. We then propose a novel methodology that can learn the gain values and discount factors from user preferences over rankings. Numerical simulations illustrate the effectiveness of our proposed methods. Please contact the authors for the full version of this work.
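The dependence on gain values mentioned above can be seen in a tiny example. The sketch below uses the common $1/\log_2(\text{rank} + 1)$ discount and two widely used gain functions; the two rankings and their relevance grades are made up for illustration and do not come from the paper.

```python
import math


def dcg(relevances, gain, discount):
    """DCG of a ranked list: sum of gain(grade) * discount(rank), ranks from 1."""
    return sum(gain(rel) * discount(rank) for rank, rel in enumerate(relevances, start=1))


def log2_discount(rank):
    """Common logarithmic discount, equal to 1 at rank 1."""
    return 1.0 / math.log2(rank + 1)


def linear_gain(rel):
    """Gain equal to the relevance grade itself."""
    return float(rel)


def exponential_gain(rel):
    """Gain of 2^grade - 1, which rewards high grades more strongly."""
    return 2.0 ** rel - 1.0


# Hypothetical graded relevance of the top three results of two rankers.
ranking_a = [2, 0, 0]  # one highly relevant result, then nothing
ranking_b = [1, 1, 1]  # three moderately relevant results

a_linear = dcg(ranking_a, linear_gain, log2_discount)
b_linear = dcg(ranking_b, linear_gain, log2_discount)
a_exponential = dcg(ranking_a, exponential_gain, log2_discount)
b_exponential = dcg(ranking_b, exponential_gain, log2_discount)
```

Under linear gains ranking B scores higher, while under exponential gains ranking A does, so the comparison between the two rankers flips with the choice of gain values alone.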
