Source author record

Yilin Liu

Yilin Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Applications · Artificial Intelligence · Computation and Language · Computer Vision · Graphics · Methodology

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators


Preprint · 2026 · arXiv

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often produce low-quality or incorrect answers and draw limited information richness from available data sources. To address this, we propose AgenticMath, a novel agentic method for generating high-quality mathematical question-answer pairs to enhance the supervised fine-tuning of LLMs. Our method operates through four stages: (1) a Seed Question Filter that selects questions with high information richness, complexity, and clarity; (2) an Agentic Question Rephrase step that employs a multi-agent system to generate diverse, logically consistent paraphrases; (3) an Answer Augment step that rewrites answers using chain-of-thought reasoning to improve numerical and logical correctness without relying on human-provided labels; and (4) a final Question and Answer Evaluation that retains only the highest-quality pairs. Extensive experiments demonstrate that fine-tuning 3B-8B parameter LLMs on AgenticMath-generated datasets (comprising only 30-60K math samples) achieves competitive or superior performance on diverse in-domain and out-of-domain mathematical reasoning benchmarks compared to baselines trained on much more data (e.g., 400K or 2.3M samples). Our work demonstrates that targeted, high-quality data generation is a more efficient path to improving mathematical reasoning in LLMs than large-scale, low-quality alternatives.
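The four-stage flow described in the abstract can be sketched as a filter, expand, rewrite, and rank pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `seed_score`, `rephrase`, `augment_answer`, and `pair_score` are hypothetical stand-ins for the LLM agents, and the fixed `seed_threshold` and top-k selection are assumed, since the abstract does not say how questions and final pairs are thresholded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def agentic_math_pipeline(
    seeds: List[str],
    seed_score: Callable[[str], float],        # hypothetical: richness/complexity/clarity scorer
    rephrase: Callable[[str], List[str]],      # hypothetical: multi-agent paraphraser
    augment_answer: Callable[[str], str],      # hypothetical: chain-of-thought answer rewriter
    pair_score: Callable[[QAPair], float],     # hypothetical: final question-answer evaluator
    seed_threshold: float,                     # assumption: fixed cut-off for stage 1
    keep_top_k: int,                           # assumption: stage 4 keeps the k best pairs
) -> List[QAPair]:
    # Stage 1: Seed Question Filter, keep only seeds scoring above the threshold.
    kept = [q for q in seeds if seed_score(q) >= seed_threshold]
    # Stage 2: Agentic Question Rephrase, expand each kept seed into paraphrases.
    questions = [p for q in kept for p in rephrase(q)]
    # Stage 3: Answer Augment, generate a rewritten answer for every question.
    pairs = [QAPair(q, augment_answer(q)) for q in questions]
    # Stage 4: Question and Answer Evaluation, rank pairs and keep the best ones.
    pairs.sort(key=pair_score, reverse=True)
    return pairs[:keep_top_k]


# Toy usage with string functions standing in for the LLM agents.
result = agentic_math_pipeline(
    seeds=["a", "bb", "ccc"],
    seed_score=len,
    rephrase=lambda q: [q, q.upper()],
    augment_answer=lambda q: f"ans:{q}",
    pair_score=lambda p: len(p.question) + (1 if p.question.isupper() else 0),
    seed_threshold=2,
    keep_top_k=2,
)
```

Passing the agents in as callables keeps the control flow separate from any particular model or prompting scheme, so each stage can be swapped or evaluated on its own.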

Preprint · 2022 · arXiv

Bayesian local exchangeability design for phase II basket trials

We propose an information borrowing strategy for the design and monitoring of phase II basket trials based on the local multisource exchangeability assumption between baskets (disease types). In our proposed local-MEM framework, information borrowing occurs only locally, i.e., among baskets with similar response rates, and the amount of borrowing is determined by the degree of similarity in response rates; baskets not considered similar do not share information. We construct a two-stage design for phase II basket trials using the proposed strategy. The proposed method is compared with competing Bayesian methods and Simon's two-stage design in a variety of simulation scenarios. We demonstrate that the proposed method maintains the family-wise type I error rate at a reasonable level and has desirable basket-wise power compared to Simon's two-stage design. In addition, our method is computationally efficient compared to existing Bayesian methods, in that the posterior profiles of interest can be derived explicitly without the need for sampling algorithms.
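Why sampling can be avoided: with binomial responses and Beta priors, a multisource-exchangeability posterior for one basket's response rate is a finite mixture of Beta distributions, so its mean has a closed form. The sketch below assumes several things the abstract does not specify. It uses Beta(`a`, `b`) priors, a uniform prior over exchangeability configurations, and a hypothetical locality rule where a basket is a borrowing candidate only if its observed rate lies within `delta` of the target's. The name `local_mem_posterior_mean` is illustrative. The paper's actual locality definition and configuration prior may differ.

```python
from itertools import combinations
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def local_mem_posterior_mean(x, n, target, delta=0.15, a=1.0, b=1.0):
    """Closed-form posterior mean of basket `target`'s response rate (illustrative).

    x, n: responders and patients per basket. Assumption: only baskets whose
    observed rate is within `delta` of the target's may be pooled with it.
    """
    rate = [xi / ni for xi, ni in zip(x, n)]
    candidates = [k for k in range(len(x))
                  if k != target and abs(rate[k] - rate[target]) <= delta]
    log_w, means = [], []
    # Enumerate every subset S of candidates treated as exchangeable with the target.
    for r in range(len(candidates) + 1):
        for S in combinations(candidates, r):
            pooled = (target,) + S
            X = sum(x[k] for k in pooled)
            N = sum(n[k] for k in pooled)
            # Log marginal likelihood of configuration S; binomial coefficients
            # and non-candidate baskets are identical across S, so they cancel.
            lml = log_beta(a + X, b + N - X) - log_beta(a, b)
            for k in candidates:
                if k not in S:
                    lml += log_beta(a + x[k], b + n[k] - x[k]) - log_beta(a, b)
            log_w.append(lml)                      # uniform configuration prior (assumption)
            means.append((a + X) / (a + b + N))    # mean of Beta(a + X, b + N - X)
    # Normalise the mixture weights stably and average the component means.
    m = max(log_w)
    w = [exp(lw - m) for lw in log_w]
    return sum(wi * mi for wi, mi in zip(w, means)) / sum(w)
```

When no other basket is close enough, the result reduces to the unpooled Beta posterior mean. When a similar basket exists, the estimate moves toward the pooled mean by an amount set by the marginal-likelihood weights.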

Preprint · 2022 · arXiv

Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset

We present UrbanScene3D, a large-scale data platform for research on urban scene perception and reconstruction. UrbanScene3D contains over 128k high-resolution images covering 16 scenes, including large-scale real urban regions and synthetic cities, with a total area of 136 km². The dataset also contains high-precision LiDAR scans and hundreds of image sets with different observation patterns, which provide a comprehensive benchmark for designing and evaluating aerial path planning and 3D reconstruction algorithms. In addition, the dataset is built on Unreal Engine and the AirSim simulator and includes a manually annotated unique instance label for each building, which together enable the generation of many kinds of data, e.g., 2D depth maps, 2D/3D bounding boxes, and 3D point cloud/mesh segmentations. The simulator, with its physics engine and lighting system, not only produces a variety of data but also enables users to simulate cars or drones in the proposed urban environment for future research.
