Source author record

Xiao Zhou

Xiao Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, astro-ph.SR, Computer Vision, Computational Complexity, cs.CY, Data Structures and Algorithms, Discrete Mathematics, Populations and Evolution, Software Engineering, Biological Physics, Computation and Language, Computer Science and Game Theory, cond-mat.mtrl-sci, eess.IV, Information Retrieval, Social and Information Networks

Catalog footprint

What is connected

19 works

16 topics

4 close collaborators

preprint · 2026 · arXiv

AcademiClaw: When Students Set Challenges for AI Agents

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows (homework, research projects, competitions, and personal projects) that they found current AI agents unable to solve effectively. Curated from 230 student-submitted candidates through rigorous expert review, the final task set spans 25+ professional domains, ranging from olympiad-level mathematics and linguistics problems to GPU-intensive reinforcement learning and full-stack system debugging, with 16 tasks requiring CUDA GPU execution. Each task executes in an isolated Docker sandbox and is scored on task completion by multi-dimensional rubrics combining six complementary techniques, with an independent five-category safety audit providing additional behavioral analysis. Experiments on six frontier models show that even the best achieves only a 55% pass rate. Further analysis uncovers sharp capability boundaries across task domains, divergent behavioral strategies among models, and a disconnect between token consumption and output quality, providing fine-grained diagnostic signals beyond what aggregate metrics reveal. We hope that AcademiClaw and its open-sourced data and code can serve as a useful resource for the OpenClaw community, driving progress toward agents that are more capable and versatile across the full breadth of real-world academic demands. All data and code are available at https://github.com/GAIR-NLP/AcademiClaw.

preprint · 2026 · arXiv

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Multimodal Large Language Models (MLLMs) have recently achieved considerable advances in vision-language tasks, yet they can still produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, the ability of MLLMs to act honestly, especially when faced with visually unanswerable questions, remains largely unexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' responses to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark of 12k+ visual question samples whose quality is ensured by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MLLMs and conducted a comprehensive analysis. Our findings show that (1) most models fail to refuse appropriately when they should, and (2) MLLM honesty is not solely a language-modeling issue but is deeply influenced by visual information, which calls for dedicated methods for multimodal honesty alignment. We therefore implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs. Our data and code are available at https://github.com/yanxuzhu/MoHoBench.

preprint · 2026 · arXiv

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, (3) a task-generation pipeline that synthesizes realistic and machine-checkable desktop tasks, and (4) an evaluation harness that records full trajectories and computes auditable partial-credit rewards. In its current form, OpenComputer covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. Experiments show that OpenComputer's hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation, especially when success depends on fine-grained application state. Frontier agents struggle with end-to-end completion despite partial progress, and open-source models exhibit sharp drops from their OSWorld-Verified scores, exposing a persistent gap in robust computer automation.

preprint · 2026 · arXiv

Why not Collaborative Filtering in Dual View? Bridging Sparse and Dense Models

Collaborative Filtering (CF) remains the cornerstone of modern recommender systems, with dense embedding-based methods dominating current practice. However, these approaches suffer from a critical limitation: our theoretical analysis reveals a fundamental signal-to-noise ratio (SNR) ceiling when modeling unpopular items, where parameter-based dense models experience diminishing SNR under severe data sparsity. To overcome this bottleneck, we propose SaD (Sparse and Dense), a unified framework that integrates the semantic expressiveness of dense embeddings with the structural reliability of sparse interaction patterns. We theoretically show that aligning these dual views yields a strictly superior global SNR. Concretely, SaD introduces a lightweight bidirectional alignment mechanism: the dense view enriches the sparse view by injecting semantic correlations, while the sparse view regularizes the dense model through explicit structural signals. Extensive experiments demonstrate that, under this dual-view alignment, even a simple matrix factorization-style dense model can achieve state-of-the-art performance. Moreover, SaD is plug-and-play and can be seamlessly applied to a wide range of existing recommender models, highlighting the enduring power of collaborative filtering when leveraged from dual perspectives. Further evaluations on real-world benchmarks show that SaD consistently outperforms strong baselines, ranking first on the BarsMatch leaderboard. The code is publicly available at https://github.com/harris26-G/SaD.

preprint · 2022 · arXiv

Effect of compositional fluctuation on the survival of bet-hedging species

Understanding the coexistence of diverse species in a changing environment is an important problem in community ecology. Bet-hedging is a strategy that helps species survive in such changing environments. However, studies of bet-hedging have often focused on the expected long-term growth rate of the species by itself, neglecting competition with other coexisting species. Here we study the extinction risk of a bet-hedging species in competition with others. We show that there are three contributions to the extinction risk. The first is the usual demographic fluctuation due to stochastic reproduction and selection processes in finite populations. The second, due to the fluctuation of population growth rate caused by environmental changes, may counterintuitively reduce the extinction risk for small populations. Besides those two, we reveal a third contribution, which is unique to bet-hedging species that diversify into multiple phenotypes: The phenotype composition of the population will fluctuate over time, resulting in increased extinction risk. We compare such compositional fluctuation to the demographic and environmental contributions, showing how they have different effects on the extinction risk depending on the population size, generation overlap, and environmental correlation.

preprint · 2022 · arXiv

Evaluation of non-pharmaceutical interventions and optimal strategies for containing the COVID-19 pandemic

Given that new COVID-19 variants continue to emerge, non-pharmaceutical interventions remain the primary control strategies for curbing the further spread of the coronavirus. However, implementing strict interventions over extended periods inevitably hurts the economy. To address this multi-objective decision-making problem, we investigate the underlying associations between policies, mobility patterns, and virus transmission. We further evaluate the relative performance of existing COVID-19 control measures and explore potentially optimal strategies that strike the right balance between public health and socio-economic recovery for individual US states. The results highlight the effectiveness of state-of-emergency declarations and face-mask wearing, and emphasize the need for tailor-made strategies for different states and phases of epidemiological transmission. Our framework enables policymakers to design more refined COVID-19 strategies and can be extended to inform policymakers in any country about best practices in pandemic response.

preprint · 2022 · arXiv

Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature

Most computer vision systems assume distortion-free images as inputs. Widely used rolling-shutter (RS) image sensors, however, suffer from geometric distortion when the camera and objects move during capture. Extensive research has been conducted on correcting RS distortions. However, most existing work relies heavily on prior assumptions about scenes or motions. Moreover, its motion estimation steps are either oversimplified or computationally inefficient due to heavy flow warping, which limits their applicability. In this paper, we investigate using a rolling shutter with a global reset feature (RSGR) to restore clean global shutter (GS) videos. This feature lets us turn the rectification problem into a deblur-like one, avoiding inaccurate and costly explicit motion estimation. First, we build an optical system that captures paired RSGR/GS videos. Second, we develop a novel algorithm incorporating spatial and temporal designs to correct the spatially varying RSGR distortion. Third, we demonstrate that existing image-to-image translation algorithms can recover clean GS videos from distorted RSGR inputs, yet our algorithm achieves the best performance thanks to its specific designs. Our rendered results are not only visually appealing but also beneficial to downstream tasks. Compared with the state-of-the-art RS solution, our RSGR solution is superior in both effectiveness and efficiency. Since it is easy to realize without changing the hardware, we believe our RSGR solution could potentially replace the RS solution for capturing distortion-free videos with low noise and on a low budget.

preprint · 2022 · arXiv

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. The challenge includes three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Tracks 2 and 3 target both super-resolution and quality enhancement of HEVC-compressed video, requiring x2 and x4 super-resolution, respectively. The three tracks attracted more than 600 registrations in total. In the test phase, 8, 8 and 12 teams submitted final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state of the art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-source code) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

preprint · 2022 · arXiv

UNet#: A UNet-like Redesigning Skip Connections for Medical Image Segmentation

As an essential prerequisite for developing medical intelligent assistant systems, medical image segmentation has received extensive attention from the neural network community. A series of UNet-like networks with encoder-decoder architectures have achieved extraordinary success; among them, UNet2+ and UNet3+ redesign the skip connections, proposing dense skip connections and full-scale skip connections respectively, and improve dramatically on UNet in medical image segmentation. However, UNet2+ does not explore sufficient information from the full scale, which hampers learning the location and boundaries of organs. Although UNet3+ can obtain full-scale aggregated feature maps, its small number of neurons leaves it unable to segment tiny objects well when the number of samples is small. This paper proposes a novel network structure combining dense skip connections and full-scale skip connections, named UNet-sharp (UNet#) because its shape resembles the symbol #. The proposed UNet# can aggregate feature maps of different scales in the decoder sub-network and capture fine-grained details and coarse-grained semantics from the full scale, which helps it learn the exact location of organs or lesions and segment their boundaries accurately. We apply deep supervision for model pruning to speed up testing and make it possible for the model to run on mobile devices; furthermore, we design two classification-guided modules that reduce false positives and achieve more accurate segmentation results. Various semantic and instance segmentation experiments on datasets of different modalities (EM, CT, MRI) and dimensions (2D, 3D), including nuclei, brain tumor, liver, and lung, demonstrate that the proposed method outperforms state-of-the-art models.

preprint · 2020 · arXiv

Application of Seq2Seq Models on Code Correction

We apply various seq2seq models to programming language correction tasks on the Juliet Test Suite for C/C++ and Java from the Software Assurance Reference Dataset (SARD), and achieve repair rates of 75% for C/C++ and 56% for Java. We introduce a Pyramid Encoder into these seq2seq models, which greatly increases computational and memory efficiency while keeping repair rates similar to those of their non-pyramid counterparts. We successfully carry out an error-type classification task on ITC benchmark examples (only 685 code instances) using transfer learning with models pre-trained on the Juliet Test Suite, pointing to a novel way of processing small programming language datasets.
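
The abstract does not spell out how the Pyramid Encoder works. A common design, and the assumption in this sketch, is the pyramidal reduction used in speech models such as Listen, Attend and Spell: each encoder level concatenates adjacent timesteps, so the next level sees a sequence half as long. The function names `pyramid_reduce` and `pyramid_lengths` are hypothetical, and the code only illustrates the length reduction, not the authors' model.

```python
from typing import List, Sequence

Vector = List[float]


def pyramid_reduce(frames: Sequence[Vector], factor: int = 2) -> List[Vector]:
    """Merge each run of `factor` consecutive timestep vectors into one.

    The sequence becomes `factor` times shorter and each vector `factor` times
    wider. A ragged tail that does not fill a full run is dropped here (an
    assumption; padding the tail is an equally valid choice).
    """
    usable = len(frames) - len(frames) % factor
    return [
        [x for frame in frames[i:i + factor] for x in frame]
        for i in range(0, usable, factor)
    ]


def pyramid_lengths(seq_len: int, levels: int, factor: int = 2) -> List[int]:
    """Sequence length seen by each encoder level after repeated reduction."""
    lengths = [seq_len]
    for _ in range(levels):
        lengths.append(lengths[-1] // factor)
    return lengths


if __name__ == "__main__":
    print(pyramid_reduce([[1.0], [2.0], [3.0], [4.0], [5.0]]))
    print(pyramid_lengths(400, levels=3))
```

With three halving levels, a 400-token input shrinks to 50 encoder states at the top, so each attention step over the encoder touches 8 times fewer states; this is the kind of saving that could explain the efficiency gains the abstract reports.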

preprint · 2020 · arXiv

ASAS J174406+2446.8 is identified as a marginal-contact binary with a possible cool third body

ASAS J174406+2446.8 was originally identified by the ASAS survey as a $δ$ Scuti-type pulsating star with a period of P = 0.189068 days. However, its LAMOST stellar parameters place it far beyond the red edge of the pulsational instability strip on the $\log g-T$ diagram of $δ$ Scuti pulsating stars. To understand the physical properties of the variable star, we observed it with the 1.0-m Cassegrain reflecting telescope at Yunnan Observatories. Multi-color light curves in the B, V, R$_{c}$ and I$_{c}$ bands were obtained and analyzed using the W-D program. The results show that this variable star is a shallow-contact binary with an EB-type light curve and an orbital period of 0.3781\,days, rather than a $δ$ Scuti star. It is a W-subtype contact binary with a mass ratio of $1.135(\pm0.019)$ and a fill-out factor of $10.4(\pm5.6)\,\%$. The situation of ASAS J174406+2446.8 resembles those of other EB-type marginal-contact binaries such as UU Lyn, II Per and GW Tau. All of them are at a key evolutionary phase, predicted by the thermal relaxation oscillation theory, in which a semi-detached configuration evolves into a contact system. The linear ephemeris was corrected using 303 newly determined times of light minimum. The O - C curve shows a sinusoidal variation that could be explained by the light-travel-time effect caused by the presence of a cool red dwarf. The present investigation reveals that some $δ$ Scuti-type stars beyond the red edge of the pulsating instability strip on the $\log g-T$ diagram are misclassified eclipsing binaries. More studies are needed to understand their structures and evolutionary states.
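
The fill-out factors quoted in these contact-binary records (such as the $10.4\,\%$ above) are not defined in the abstracts. Assuming the convention commonly used with W-D light-curve solutions, the fill-out factor measures how far the common surface sits between the inner and outer critical Roche surfaces:

```latex
f = \frac{\Omega_{\mathrm{in}} - \Omega}{\Omega_{\mathrm{in}} - \Omega_{\mathrm{out}}}
```

Here $\Omega$ is the modified Roche potential of the common surface, and $\Omega_{\mathrm{in}}$ and $\Omega_{\mathrm{out}}$ are the potentials of the inner and outer critical Lagrangian surfaces. Under this definition $f = 0$ means the stars just fill the inner critical surface and $f = 1$ means the envelope reaches the outer one, so $f = 10.4\,\%$ describes a shallow (marginal) contact.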

preprint · 2020 · arXiv

Diversifying Dialogue Generation with Non-Conversational Text

Neural network-based sequence-to-sequence (seq2seq) models suffer severely from the low-diversity problem in open-domain dialogue generation. Because bland and generic utterances usually dominate the frequency distribution of daily chitchat, avoiding them to generate more interesting responses requires complex data filtering, sampling techniques, or modifications to the training objective. In this paper, we propose a new perspective for diversifying dialogue generation by leveraging non-conversational text. Compared with bilateral conversations, non-conversational text is easier to obtain, more diverse, and covers a much broader range of topics. We collect a large-scale non-conversational corpus from multiple sources, including forum comments, idioms and book snippets. We further present a training paradigm that effectively incorporates this text via iterative back translation. The resulting model is tested on two conversational datasets and is shown to produce significantly more diverse responses without sacrificing relevance to the context.

preprint · 2019 · arXiv

Photometric investigation on the W-subtype contact binary V1197 Her

Multi-color light curves of V1197 Her were obtained with the 2.4-meter optical telescope at the Thai National Observatory, and the Wilson-Devinney (W-D) program was used to model them. The photometric solutions reveal that V1197 Her is a W-subtype shallow contact binary system with a mass ratio of $q = 2.61$ and a fill-out factor of $f = 15.7\,\%$. The temperature difference between the primary and secondary stars is only $140\,K$ despite the low degree of contact, which means that V1197 Her is not only in a geometrical contact configuration but also already in thermal contact. The orbital inclination of V1197 Her is as high as $i = 82.7^{\circ}$, and the primary star is completely eclipsed at the primary minimum. This total eclipse implies that the derived physical parameters are highly reliable. The masses, radii and luminosities of the primary star (star 1) and secondary star (star 2) are estimated to be $M_{1} = 0.30(1)M_\odot$, $M_{2} = 0.77(2)M_\odot$, $R_{1} = 0.54(1)R_\odot$, $R_{2} = 0.83(1)R_\odot$, $L_{1} = 0.18(1)L_\odot$ and $L_{2} = 0.38(1)L_\odot$. The evolutionary status of the two component stars is shown on the H - R diagram, which indicates that the less massive but hotter primary star is more evolved than the secondary star. The period of V1197 Her is decreasing continuously at a rate of $dP/dt=-2.58\times{10^{-7}}day\cdot year^{-1}$, which can be explained by mass transfer from the more massive star to the less massive one at a rate of $\frac{dM_{2}}{dt}=- 1.61\times{10^{-7}}M_\odot/year$. The light curves of V1197 Her have been reported to show the O'Connell effect, so a cool spot is added to the more massive star to model the light-curve asymmetry.
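
The abstract links the period decrease to mass transfer from the more massive star 2 to star 1. Under the usual assumption of conservative mass transfer (total mass and orbital angular momentum both conserved), $P \propto (M_1 M_2)^{-3}$, which gives $\dot P / P = -3\dot M_2 (1/M_2 - 1/M_1)$. The sketch below evaluates that relation in both directions; it is a minimal illustration under this assumption, not the authors' code, and the function names are hypothetical. The orbital period is not stated in the abstract, so the usage example uses a placeholder value.

```python
"""Mass-transfer rate implied by a secular period change.

Assumes conservative mass transfer: total mass and orbital angular momentum
are constant, so P scales as (M1 * M2) ** -3 and, with dM1 = -dM2,

    (dP/dt) / P = -3 * (dM2/dt) * (1/M2 - 1/M1)
"""


def mass_transfer_rate(p_days: float, dp_dt: float, m1: float, m2: float) -> float:
    """dM2/dt in solar masses per year (negative: star 2 is losing mass).

    p_days: orbital period in days; dp_dt: period change in days per year;
    m1, m2: component masses in solar masses (must differ).
    """
    return dp_dt / (-3.0 * p_days * (1.0 / m2 - 1.0 / m1))


def period_change_rate(p_days: float, dm2_dt: float, m1: float, m2: float) -> float:
    """Inverse relation: dP/dt in days per year implied by dM2/dt."""
    return -3.0 * p_days * dm2_dt * (1.0 / m2 - 1.0 / m1)


if __name__ == "__main__":
    # Masses and dP/dt from the abstract; P = 0.26 d is a placeholder only.
    rate = mass_transfer_rate(0.26, -2.58e-7, m1=0.30, m2=0.77)
    print(f"dM2/dt = {rate:.2e} Msun/yr")  # negative: star 2 is the donor
```

The sign works out as the abstract describes: a shrinking period with $M_2 > M_1$ requires $\dot M_2 < 0$. If instead the less massive star donates, the same relation gives a growing period.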

preprint · 2016 · arXiv

A low-mass-ratio and deep contact binary as the progenitor of the merger V1309 Sco

Nova Sco 2008 (= V1309 Sco) is an example of a V838 Mon-type eruption rather than a typical classical nova. This enigmatic object was recently shown to have resulted from the merger of the two stars of a contact binary. It is the first stellar merger identified as undergoing a common-envelope transient. To understand the properties of its binary progenitor, the pre-outburst light curves were analyzed using the W-D method. The photometric solution of the 2002 light curve shows that it was a deep contact binary ($f = 89.5(\pm40.5)\,\%$) with a mass ratio of 0.094. The asymmetry of the light curve is explained by the presence of a dark spot on the more massive component. The extremely high fill-out factor suggests that the merger of the contact binary was driven by dynamical mass loss from the outer Lagrangian point. However, no solution could be obtained for the 2004 light curve, even at an extremely low mass ratio of q = 0.03. This suggests that the common convective envelope of the binary system disappeared and the secondary component spiraled into the envelope of the primary in 2004. Finally, the ejection of the primary's envelope produced the outburst.

preprint · 2016 · arXiv

The Complexity of (List) Edge-Coloring Reconfiguration Problem

Let $G$ be a graph such that each edge has its own list of available colors, and assume that each list is a subset of a common set of $k$ colors. Suppose that we are given two list edge-colorings $f_0$ and $f_r$ of $G$, and asked whether there exists a sequence of list edge-colorings of $G$ between $f_0$ and $f_r$ such that each list edge-coloring can be obtained from the previous one by changing the color assigned to exactly one edge. This problem is known to be PSPACE-complete for every integer $k \ge 6$ and planar graphs of maximum degree three, but no hardness result was known for the non-list variant. In this paper, we first improve the known result by proving that, for every integer $k \ge 4$, the problem remains PSPACE-complete even if the input graph is planar, of bounded bandwidth, and of maximum degree three. We then give the first hardness result for the non-list variant: for every integer $k \ge 5$, we prove that the non-list variant is PSPACE-complete even if the input graph is planar, of bandwidth linear in $k$, and of maximum degree $k$.
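
The reconfiguration question can be made concrete with an exhaustive search that is only feasible for tiny graphs; PSPACE-completeness means no polynomial-time algorithm is expected unless P = PSPACE. This sketch, with hypothetical function names, treats each list edge-coloring as a node, links two colorings that differ on exactly one edge, and runs breadth-first search from $f_0$ toward $f_r$. It illustrates the problem definition only and is not an algorithm from the paper.

```python
from collections import deque
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def conflict_lists(edges: Sequence[Edge]) -> List[List[int]]:
    """For each edge index, the indices of the other edges sharing an endpoint."""
    adj: List[List[int]] = [[] for _ in edges]
    for i, j in combinations(range(len(edges)), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i].append(j)
            adj[j].append(i)
    return adj


def is_list_edge_coloring(edges, lists, coloring) -> bool:
    """Every edge uses a color from its own list and adjacent edges differ."""
    adj = conflict_lists(edges)
    return all(
        coloring[i] in lists[i] and all(coloring[i] != coloring[j] for j in adj[i])
        for i in range(len(edges))
    )


def reachable(edges, lists, start, target) -> bool:
    """BFS over the reconfiguration graph (exponential in the number of edges)."""
    if not (is_list_edge_coloring(edges, lists, start)
            and is_list_edge_coloring(edges, lists, target)):
        raise ValueError("start and target must both be list edge-colorings")
    adj = conflict_lists(edges)
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return True
        for i in range(len(edges)):
            # Colors currently blocked for edge i by the edges it touches.
            blocked = {cur[j] for j in adj[i]}
            for c in lists[i]:
                if c != cur[i] and c not in blocked:
                    nxt = cur[:i] + (c,) + cur[i + 1:]
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return False
```

On a two-edge path with lists {1, 2, 3}, the coloring (1, 2) can be turned into (2, 1) via (3, 2) and (3, 1). On a star with three edges and the same lists, every edge touches the other two, so each coloring is frozen and `reachable` returns False for any two distinct colorings.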

preprint · 2016 · arXiv

The Photometric Investigation of V921 Her using the Lunar-based Ultraviolet Telescope of Chang'e-3 mission

The light curve of V921 Her in the ultraviolet band observed by the Lunar-based Ultraviolet Telescope (LUT) is analyzed with the Wilson-Devinney code. Our solutions show that V921 Her is an early-type marginal contact binary system with an additional close-in component. The binary system is in poor thermal contact, with a temperature difference of nearly $700\,K$ between the two components. The close-in component contributes about $19\,\%$ of the total luminosity of the triple system. Combining the radial-velocity study with our photometric solutions, we calculate the masses of the primary and secondary stars to be $M_1 = 1.784(\pm0.055)M_\odot$ and $M_2 = 0.403(\pm0.012)M_\odot$. The evolutionary scenario of V921 Her is discussed. All times of light minimum of V921 Her available in the literature are taken into account, and the $O - C$ curve is analyzed for the first time. The most probable fitting results, discussed in the paper, also confirm the existence of a third component ($P_3=10.2$ year) around the binary system. The period of V921 Her is also undergoing a continuous, rapid increase at a rate of $dP/dt=+2.79\times{10^{-7}}day\cdot year^{-1}$, which may be due to mass transfer from the less massive component to the more massive one.

preprint · 2015 · arXiv

Chemomechanical Origin of Hydrogen Trapping at Grain Boundaries in FCC Metals

Hydrogen embrittlement of metals is widely observed, but its atomistic origins remain poorly understood and much debated. Combining a unique identification of interstitial sites through polyhedral tessellation with first-principles calculations, we study hydrogen adsorption at grain boundaries in a variety of face-centered cubic metals (Ni, Cu, γ-Fe and Pd). We uncover the chemomechanical origin of the variation in adsorption energetics for interstitial hydrogen at grain boundaries. A general chemomechanical formula is established that provides accurate assessments of hydrogen trapping and segregation energetics at grain boundaries and offers direct explanations for certain experimental observations. The present study deepens our mechanistic understanding of the role of grain boundaries in hydrogen embrittlement, and promises a viable path towards predictive microstructure engineering against hydrogen embrittlement in structural metals.

preprint · 2014 · arXiv

Computational Complexity of Competitive Diffusion on (Un)weighted Graphs

Consider an undirected graph modeling a social network, where the vertices represent users and the edges represent connections between them. In the competitive diffusion game, each of a number of players chooses a vertex as a seed to propagate his/her opinion, which then spreads along the edges of the graph. The objective of every player is to maximize the number of vertices the opinion infects. In this paper, we investigate the computational problem of deciding whether a pure Nash equilibrium exists in the competitive diffusion game on unweighted and weighted graphs, and present several negative and positive results. We first prove that the problem is W[1]-hard when parameterized by the number of players, even for unweighted graphs. We also show that the problem is NP-hard even for series-parallel graphs with positive integer weights, and NP-hard even for forests with arbitrary integer weights. Furthermore, we show that the problem for forests of paths with arbitrary weights is solvable in pseudo-polynomial time, and in quadratic time if the given graph is unweighted. We also prove that the problem for chain, cochain, and threshold graphs with arbitrary integer weights is solvable in polynomial time.
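
The game can be simulated directly on small graphs. This sketch assumes the rules commonly used for competitive diffusion: a vertex chosen by two or more players turns gray (neutral); at each step, an uncolored vertex adjacent to vertices of exactly one player's color takes that color, while one adjacent to two or more players' colors turns gray; gray vertices never spread. A player's payoff is the total weight of its vertices (weight 1 everywhere for the unweighted case). The equilibrium check enumerates all $n^k$ seed profiles, so it only illustrates the decision problem the paper studies; it is not one of the paper's algorithms, and all names are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence, Tuple

Vertex = Hashable
GRAY = -1  # neutral: claimed by two or more players at the same step


def diffuse(adj: Dict[Vertex, List[Vertex]], seeds: Sequence[Vertex]) -> Dict[Vertex, int]:
    """Spread opinions from the seeds; return {vertex: player index or GRAY}."""
    color: Dict[Vertex, int] = {}
    for v in set(seeds):
        owners = [p for p, s in enumerate(seeds) if s == v]
        color[v] = owners[0] if len(owners) == 1 else GRAY
    frontier = set(color)
    while frontier:
        # Only vertices colored in the previous step can still have
        # uncolored neighbors, so only they propose colors.
        proposals: Dict[Vertex, set] = {}
        for u in frontier:
            if color[u] == GRAY:
                continue  # gray vertices do not propagate
            for w in adj[u]:
                if w not in color:
                    proposals.setdefault(w, set()).add(color[u])
        new = {w: (next(iter(ps)) if len(ps) == 1 else GRAY)
               for w, ps in proposals.items()}
        color.update(new)
        frontier = set(new)
    return color


def payoffs(adj, weights: Dict[Vertex, int], seeds: Sequence[Vertex]) -> List[int]:
    """Total weight of the vertices each player ends up owning."""
    pay = [0] * len(seeds)
    for v, c in diffuse(adj, seeds).items():
        if c != GRAY:
            pay[c] += weights[v]
    return pay


def pure_nash_equilibria(adj, weights, k: int) -> List[Tuple[Vertex, ...]]:
    """Brute force: profiles where no single player gains by moving its seed."""
    vertices = list(adj)
    equilibria = []
    for profile in product(vertices, repeat=k):
        base = payoffs(adj, weights, profile)
        stable = all(
            payoffs(adj, weights, profile[:p] + (v,) + profile[p + 1:])[p] <= base[p]
            for p in range(k)
            for v in vertices
        )
        if stable:
            equilibria.append(profile)
    return equilibria
```

On the path a-b-c with two players and unit weights, the pure Nash equilibria are exactly the profiles in which one player takes the middle vertex and the other an endpoint: seeding both endpoints turns b gray, and either player can then do better by moving to b.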

preprint · 2014 · arXiv

The List Coloring Reconfiguration Problem for Bounded Pathwidth Graphs

We study the problem of transforming one list (vertex) coloring of a graph into another list coloring by changing only one vertex color assignment at a time, while at all times maintaining a list coloring, given a list of allowed colors for each vertex. This problem is known to be PSPACE-complete for bipartite planar graphs. In this paper, we first show that the problem remains PSPACE-complete even for bipartite series-parallel graphs, which form a proper subclass of bipartite planar graphs. We note that our reduction in fact shows PSPACE-completeness for graphs with pathwidth two, and it can be extended to threshold graphs. In contrast, we give a polynomial-time algorithm to solve the problem for graphs with pathwidth one. Thus, this paper gives precise analyses of the problem with respect to pathwidth.
