Researcher profile

Yuxuan Zhu

Yuxuan Zhu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

Optimization modeling translates real decision-making problems into mathematical optimization models and solver-executable implementations. Although language models are increasingly used to generate optimization formulations and solver code, existing benchmarks are almost entirely text-only. This omits many optimization-modeling tasks that arise in operational practice, where requirements are described in text but instance information is conveyed through visual artifacts such as tables, graphs, maps, schedules, and dashboards. We introduce multimodal optimization modeling, a benchmark setting in which models must construct both a mathematical formulation and executable solver code from a text-and-visual problem specification. To evaluate this setting, we develop a solver-grounded framework that generates structured optimization instances, verifies each with an exact solver, and builds both the model-facing inputs and hidden reference files from the same verified source. We instantiate the framework as MM-OptBench, a benchmark of 780 solver-verified instances spanning 6 optimization families, 26 subcategories, and 3 structural difficulty levels. We evaluate 9 multimodal large language models (MLLMs), including 6 frontier general-purpose models and 3 math-specialized models, with aggregate, family-level, difficulty-level, and failure-mode analyses. The results show that the task remains far from solved: the best two models reach 52.1% and 51.3% pass@1, while on average across the six general-purpose MLLMs, pass@1 is 43.4% on easy instances and 15.9% on hard instances. All three math-specialized MLLMs solve 0/780 instances. Failure attribution shows that errors arise both when extracting instance data from text and visuals and when turning extracted data into solver-correct formulations and code. MM-OptBench provides a testbed for solver-grounded, decision-oriented multimodal intelligence.

preprint2026arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks, or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65\% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/ .

preprint2021arXiv

Performance of a focal plane detector for soft X-ray imaging spectroscopy based on back-illuminated sCMOS

Spectroscopy focusing array (SFA) and Polarimetry focusing array (PFA) are the two major payloads of enhanced X-ray Timing and Polarimetry mission (eXTP). Nested Wolter-\RNum{1} X-ray mirror module is implemented in SFA and PFA to achive high effective area. When evaluating the properties of the mirror module, the alignment of the optical axis of the X-ray mirror module and a quasi-parallel X-ray beam is a prerequisite to ensure the accuracy of the results. Hence, to assist the alignment of the X-ray mirror module, an X-ray focal plane detector is designed based on the back-illuminated scientific Complementary Metal-Oxide-Semiconductor Transistor (sCMOS) sensor GSENSE6060BSI, one of the largest detection areas, is produced by \textit{Gpixel Inc}. Then the characteristics of readout noise, dark current, and split-pixel event properties of the detector are studied with the self-developed multi-target fluorescence X-ray source in a 100 m long X-ray test facility. The energy calibration is carried out with the single-pixel event and the energy non-linearity of the detector is also obtained. Eventually, the simulation of the eXTP mirror module based on the optical model is conducted and the alignment test of the Wolter-\RNum{1} X-ray mirror module designed for \textit{EP/FXT} (Einstein Probe/Follow-up X-ray Telescope) with "Burkert test" method is shown.

preprint2019arXiv

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

As China's first X-ray astronomical satellite, the Hard X-ray Modulation Telescope (HXMT), which was dubbed as Insight-HXMT after the launch on June 15, 2017, is a wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with the capability of all-sky monitoring in 0.2-3 MeV. It was designed to perform pointing, scanning and gamma-ray burst (GRB) observations and, based on the Direct Demodulation Method (DDM), the image of the scanned sky region can be reconstructed. Here we give an overview of the mission and its progresses, including payload, core sciences, ground calibration/facility, ground segment, data archive, software, in-orbit performance, calibration, background model, observations and some preliminary results.