Source author record

Jie Hao

Jie Hao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Computation and Language; Applications; Artificial Intelligence; Information Theory; math.HO; math.IT; math.ST; Statistics Theory

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity

We introduce WildAGTEval, a benchmark designed to evaluate large language model (LLM) agents' function-calling capabilities under realistic API complexity. Unlike prior work that assumes an idealized API system and disregards real-world factors such as noisy API outputs, WildAGTEval accounts for two dimensions of real-world complexity: 1. API specification, which includes detailed documentation and usage constraints, and 2. API execution, which captures runtime challenges. Consequently, WildAGTEval offers (i) an API system encompassing 60 distinct complexity scenarios that can be composed into approximately 32K test configurations, and (ii) user-agent interactions for evaluating LLM agents on these scenarios. Using WildAGTEval, we systematically assess several advanced LLMs and observe that most scenarios are challenging, with irrelevant information complexity posing the greatest difficulty and reducing the performance of strong LLMs by 27.3%. Furthermore, our qualitative analysis reveals that LLMs occasionally distort user intent merely to claim task completion, critically affecting user satisfaction.

preprint, 2020, arXiv

Robust Dialogue Utterance Rewriting as Sequence Tagging

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Existing models for this task suffer from a robustness issue: performance drops dramatically when they are tested on a different domain. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As is common for tagging models applied to text generation, the model's outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems on domain transfer.

preprint, 2016, arXiv

Constructions of Optimal Cyclic $(r,δ)$ Locally Repairable Codes

A code is said to be an $r$-local locally repairable code (LRC) if each of its coordinates can be repaired by accessing at most $r$ other coordinates. When some of the $r$ coordinates are also erased, the $r$-local LRC cannot accomplish the local repair, which leads to the concept of $(r,δ)$-locality. A $q$-ary $[n, k]$ linear code $\mathcal{C}$ is said to have $(r, δ)$-locality ($δ\ge 2$) if for each coordinate $i$, there exists a punctured subcode of $\mathcal{C}$ with support containing $i$, whose length is at most $r + δ - 1$, and whose minimum distance is at least $δ$. The $(r, δ)$-LRC can tolerate $δ-1$ erasures in total, which degenerates to an $r$-local LRC when $δ=2$. A $q$-ary $(r,δ)$ LRC is called optimal if it meets the Singleton-like bound for $(r,δ)$-LRCs. A class of optimal $q$-ary cyclic $r$-local LRCs with lengths $n\mid q-1$ was constructed by Tamo, Barg, Goparaju and Calderbank based on the $q$-ary Reed-Solomon codes. In this paper, we construct a class of optimal $q$-ary cyclic $(r,δ)$-LRCs ($δ\ge 2$) with length $n\mid q-1$, which generalizes the results of Tamo et al. Moreover, we construct a new class of optimal $q$-ary cyclic $r$-local LRCs with lengths $n\mid q+1$ and a new class of optimal $q$-ary cyclic $(r,δ)$-LRCs ($δ\ge 2$) with lengths $n\mid q+1$. The constructed optimal LRCs with length $n=q+1$ have the best-known length $q+1$ for the given finite field with size $q$ when the minimum distance is larger than $4$.
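
For orientation, the Singleton-like bound that the abstract refers to is commonly stated as $d \le n - k + 1 - (\lceil k/r \rceil - 1)(δ - 1)$, which reduces to the classical Singleton bound $d \le n - k + 1$ when $r = k$. The sketch below assumes that form of the bound; the function name and the sample parameters are illustrative and are not taken from the paper.

```python
def singleton_like_bound(n: int, k: int, r: int, delta: int) -> int:
    """Upper bound on the minimum distance d of an [n, k] linear code
    with (r, delta)-locality, assuming the commonly stated form
    d <= n - k + 1 - (ceil(k / r) - 1) * (delta - 1).
    """
    if not (1 <= r <= k <= n) or delta < 2:
        raise ValueError("expected 1 <= r <= k <= n and delta >= 2")
    ceil_k_over_r = -(-k // r)  # integer ceiling of k / r
    return n - k + 1 - (ceil_k_over_r - 1) * (delta - 1)


def is_optimal(n: int, k: int, r: int, delta: int, d: int) -> bool:
    """A code is called optimal when its minimum distance meets the bound."""
    return d == singleton_like_bound(n, k, r, delta)
```

With hypothetical parameters $n = 12$, $k = 6$, $r = 3$, the bound is 6 for $δ = 2$ and 5 for $δ = 3$: stronger local erasure tolerance costs global minimum distance.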

preprint, 2014, arXiv

Distribution of the Maximum and Minimum of a Random Number of Bounded Random Variables

We study a new family of random variables that each arise as the distribution of the maximum or minimum of a random number $N$ of i.i.d. random variables $X_1,X_2,\ldots,X_N$, each distributed as a variable $X$ with support on $[0,1]$. The general scheme is first outlined, and several special cases are studied in detail. Wherever appropriate, we find estimates of the parameter $θ$ in the one-parameter family in question.
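
To make the general scheme concrete: if $N \ge 1$ is independent of the $X_i$ and has probability generating function $G_N(s) = E[s^N]$, then $P(\max \le x) = G_N(F(x))$ and $P(\min \le x) = 1 - G_N(1 - F(x))$, where $F$ is the CDF of $X$. The sketch below works through one special case chosen purely for illustration (it is an assumption, not necessarily a case treated in the paper): $X$ uniform on $[0,1]$ and $N$ geometric on $\{1, 2, \ldots\}$ with success probability $p$.

```python
import random


def geometric_pgf(s: float, p: float) -> float:
    """PGF E[s^N] for N geometric on {1, 2, ...} with success probability p."""
    return p * s / (1.0 - (1.0 - p) * s)


def max_cdf(x: float, p: float) -> float:
    """P(max(X_1..X_N) <= x) for X_i uniform on [0, 1], i.e. G_N(F(x)) with F(x) = x."""
    return geometric_pgf(x, p)


def min_cdf(x: float, p: float) -> float:
    """P(min(X_1..X_N) <= x) = 1 - G_N(1 - F(x)) with F(x) = x."""
    return 1.0 - geometric_pgf(1.0 - x, p)


def simulate_max(p: float, rng: random.Random) -> float:
    """Draw N ~ Geometric(p) on {1, 2, ...}, then return the max of N uniforms."""
    n = 1
    while rng.random() >= p:  # keep counting trials until the first success
        n += 1
    return max(rng.random() for _ in range(n))
```

A Monte Carlo run of `simulate_max` can be compared against `max_cdf` as a sanity check on the closed form.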

preprint, 2014, arXiv

Telescoping Sums, Permutations, and First Occurrence Distributions

Telescoping sums very naturally lead to probability distributions on ${\mathbb Z}^+$. But are these distributions typically cosmetic and devoid of motivation? In this paper we give three examples of "first occurrence" distributions, each defined by telescoping sums, and that each arise from concrete questions about the structure of permutations.
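
The link between telescoping sums and distributions can be sketched generically: for any sequence $a_1 = 1 \ge a_2 \ge \cdots$ decreasing to 0, setting $p_n = a_n - a_{n+1}$ gives nonnegative terms whose partial sums telescope to $a_1 - a_{N+1} \to 1$. The example below uses $a_n = 1/n$, so $p_n = 1/(n(n+1))$. This choice and the helper name are assumptions made for illustration; it is not claimed to be one of the paper's three examples.

```python
from fractions import Fraction
from typing import Callable, List


def telescoping_pmf(a: Callable[[int], Fraction], n_terms: int) -> List[Fraction]:
    """Return p_1..p_{n_terms} with p_n = a(n) - a(n + 1).

    If a(1) = 1 and a(n) decreases to 0, the p_n form a probability
    distribution on the positive integers, and the first N terms sum
    to a(1) - a(N + 1).
    """
    return [a(n) - a(n + 1) for n in range(1, n_terms + 1)]


# Illustrative choice: a(n) = 1/n gives p_n = 1 / (n * (n + 1)).
pmf = telescoping_pmf(lambda n: Fraction(1, n), 9)
partial_sum = sum(pmf)  # telescopes to 1 - 1/10
```

Exact rational arithmetic via `Fraction` keeps the telescoping identity free of floating-point rounding.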

preprint, 2012, arXiv

BATMAN-an R package for the automated quantification of metabolites from NMR spectra using a Bayesian Model

Motivation: NMR spectra are widely used in metabolomics to obtain metabolite profiles in complex biological mixtures. Common methods used to assign and estimate concentrations of metabolites involve either an expert manual peak fitting or extra pre-processing steps, such as peak alignment and binning. Peak fitting is very time consuming and is subject to human error. Conversely, alignment and binning can introduce artefacts and limit immediate biological interpretation of models. Results: We present the Bayesian AuTomated Metabolite Analyser for NMR spectra (BATMAN), an R package which deconvolutes peaks from 1-dimensional NMR spectra, automatically assigns them to specific metabolites from a target list and obtains concentration estimates. The Bayesian model incorporates information on characteristic peak patterns of metabolites and is able to account for shifts in the position of peaks commonly seen in NMR spectra of biological samples. It applies a Markov Chain Monte Carlo (MCMC) algorithm to sample from a joint posterior distribution of the model parameters and obtains concentration estimates with reduced error compared with conventional numerical integration and comparable to manual deconvolution by experienced spectroscopists. Availability: http://www1.imperial.ac.uk/medicine/people/t.ebbels/ Contact: t.ebbels@imperial.ac.uk