Source author record

Zhuoyi Huang

Zhuoyi Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

astro-ph.CO, Artificial Intelligence, astro-ph.GA, astro-ph.IM, Computation and Language, Information Retrieval

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

In enterprise search, building high-quality datasets at scale remains a central challenge due to the difficulty of acquiring labeled data. To resolve this challenge, we propose an efficient approach to fine-tune small language models (SLMs) for accurate relevance labeling, enabling high-throughput, domain-specific labeling whose quality is comparable to or even better than that of state-of-the-art large language models (LLMs). To overcome the lack of high-quality and accessible datasets in the enterprise domain, our method leverages synthetic data generation. Specifically, we employ an LLM to synthesize realistic enterprise queries from a seed document, apply BM25 to retrieve hard negatives, and use a teacher LLM to assign relevance scores. The resulting dataset is then distilled into an SLM, producing a compact relevance labeler. We evaluate our approach on a high-quality benchmark consisting of 923 enterprise query-document pairs annotated by trained human annotators, and show that the distilled SLM achieves agreement with human judgments on par with or better than the teacher LLM. Furthermore, our fine-tuned labeler substantially improves throughput, achieving a 17-fold increase while also being 19 times more cost-effective. This approach enables scalable and cost-effective relevance labeling for enterprise-scale retrieval applications, supporting rapid offline evaluation and iteration in real-world settings.
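The BM25 hard-negative step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the tokenizer, BM25 parameters, or selection rule, so everything below is an assumption for illustration: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the hypothetical helper names `bm25_scores` and `hard_negatives`. The toy documents and query are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def hard_negatives(query, docs, seed_index, top_k=2):
    """Return indices of the top-scoring documents other than the seed."""
    scores = bm25_scores(query, docs)
    candidates = [i for i in range(len(docs)) if i != seed_index]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:top_k]


# Toy corpus: document 0 is the seed the query was synthesized from.
docs = [
    "how to reset your vpn password on the corporate laptop",
    "vpn client installation guide for the corporate laptop",
    "quarterly expense report submission deadline",
    "password policy for corporate accounts",
]
query = "reset vpn password"
negatives = hard_negatives(query, docs, seed_index=0, top_k=2)
print(negatives)
```

In this toy corpus the two candidates that share a query term with the seed are picked, while the unrelated expense document is not. In the pipeline the abstract describes, such candidates would then be scored by the teacher LLM rather than assumed irrelevant.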

preprint, 2015, arXiv

The first and second data releases of the Kilo-Degree Survey

The Kilo-Degree Survey (KiDS) is an optical wide-field imaging survey carried out with the VLT Survey Telescope and the OmegaCAM camera. KiDS will image 1500 square degrees in four filters (ugri), and together with its near-infrared counterpart VIKING will produce deep photometry in nine bands. Designed for weak lensing shape and photometric redshift measurements, the core science driver of the survey is mapping the large-scale matter distribution in the Universe back to a redshift of ~0.5. Secondary science cases are manifold, covering topics such as galaxy evolution, Milky Way structure, and the detection of high-redshift clusters and quasars. KiDS is an ESO Public Survey and is dedicated to serving the astronomical community with high-quality data products derived from the survey data, as well as with calibration data. Public data releases will be made on a yearly basis, the first two of which are presented here. For a total of 148 survey tiles (~160 sq. deg.), astrometrically and photometrically calibrated, coadded ugri images have been released, accompanied by weight maps, masks, source lists, and a multi-band source catalog. A dedicated pipeline and data management system based on the Astro-WISE software system, combined with newly developed masking and source classification software, is used to produce the data products described here. The achieved data quality and early science projects based on the data products in the first two data releases are reviewed in order to validate the survey data. Early scientific results include the detection of nine high-z QSOs, fifteen candidate strong gravitational lenses, high-quality photometric redshifts and galaxy structural parameters for hundreds of thousands of galaxies. (Abridged)

preprint, 2012, arXiv

On the shear estimation bias induced by the spatial variation of colour across galaxy profiles

The spatial variation of the colour of a galaxy may introduce a bias in the measurement of its shape if the PSF profile depends on wavelength. We study how this bias depends on the properties of the PSF and the galaxies themselves. The bias depends on the scales used to estimate the shape, which may be used to optimise methods to reduce the bias. Here we develop a general approach to quantify the bias. Although applicable to any weak lensing survey, we focus on the implications for the ESA Euclid mission. Based on our study of synthetic galaxies we find that the bias is a few times 10^-3 for a typical galaxy observed by Euclid. Consequently, it cannot be neglected and needs to be accounted for. We demonstrate how one can do so using spatially resolved observations of galaxies in two filters. We show that HST observations in the F606W and F814W filters allow us to model and reduce the bias by an order of magnitude, sufficient to meet Euclid's scientific requirements. The precision of the correction is ultimately determined by the number of galaxies for which spatially-resolved observations in at least two filters are available. We use results from the Millennium Simulation to demonstrate that archival HST data will be sufficient for the tomographic cosmic shear analysis with the Euclid dataset.

preprint, 2011, arXiv

A weak lensing analysis of the Abell 383 cluster

In this paper we use deep CFHT and SUBARU $uBVRIz$ archival images of the Abell 383 cluster (z=0.187) to estimate its mass by weak lensing. To this end, we first use simulated images to check the accuracy provided by our KSB pipeline. Such simulations include both the STEP 1 and 2 simulations, and more realistic simulations of the distortion of galaxy shapes by a cluster with a Navarro-Frenk-White (NFW) profile. From such simulations we estimate the effect of noise on shear measurement and derive the correction terms. The R-band image is used to derive the mass by fitting the observed tangential shear profile with an NFW mass profile. Photometric redshifts are computed from the $uBVRIz$ catalogs. Different methods for the foreground/background galaxy selection are implemented, namely selection by magnitude, color and photometric redshifts, and results are compared. In particular, we developed a semi-automatic algorithm to select the foreground galaxies in the color-color diagram, based on observed colors. Using color selection or photometric redshifts improves the correction of dilution from foreground galaxies: this leads to higher signals in the inner parts of the cluster. We obtain a cluster mass that is ~20% higher than previous estimates, and is more consistent with the mass expected from X-ray data. The R-band luminosity function of the cluster is finally computed.
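The tangential shear that the abstract fits with an NFW profile is obtained by rotating each galaxy's two shear components into the frame defined by the direction to the cluster centre. The sketch below shows only that standard rotation; it does not reproduce the authors' KSB pipeline, noise corrections, or NFW fit. The function name `tangential_shear` and its arguments are hypothetical.

```python
import math


def tangential_shear(x, y, g1, g2, xc=0.0, yc=0.0):
    """Rotate shear components (g1, g2) of a galaxy at (x, y) into
    tangential and cross components relative to a centre (xc, yc)."""
    # Position angle of the galaxy as seen from the cluster centre.
    phi = math.atan2(y - yc, x - xc)
    g_t = -(g1 * math.cos(2 * phi) + g2 * math.sin(2 * phi))
    g_x = g1 * math.sin(2 * phi) - g2 * math.cos(2 * phi)
    return g_t, g_x


# A galaxy on the x-axis that is elongated along y (g1 < 0) is
# tangentially aligned, so its tangential component is positive.
g_t, g_x = tangential_shear(1.0, 0.0, g1=-0.1, g2=0.0)
print(g_t, g_x)
```

Averaging the tangential component in radial bins around the centre gives the shear profile. The cross component is expected to average to zero for a pure lensing signal and is commonly used as a check on systematics.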