Source author record

Aitzaz Ahmad

Aitzaz Ahmad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Information Theory, math.IT, Artificial Intelligence, Computer Vision

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators


preprint, 2026, arXiv

Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection

We present a novel approach to automatically generate non-trivial task-specific synthetic datasets for hallucination detection. Our approach features a two-step generation-selection pipeline, using hallucination pattern guidance and language style alignment during generation. Hallucination pattern guidance leverages the most important task-specific hallucination patterns, while language style alignment aligns the style of the synthetic dataset with benchmark text. To obtain robust supervised detectors from synthetic datasets, we also adopt a data mixture strategy to improve performance robustness and generalization. Our results on three datasets show that our generated hallucination text is more closely aligned with non-hallucinated text than that of the baselines, which allows us to train hallucination detectors with better generalization. Our hallucination detectors trained on synthetic datasets outperform in-context-learning (ICL)-based detectors by a large margin of 32%. Our extensive experiments confirm the benefits of our approach with cross-task and cross-generator generalization. Our data-mixture-based training further improves the generalization and robustness of hallucination detection.

preprint, 2026, arXiv

Efficient Continual Pre-training for Building Domain Specific Large Language Models

Large language models (LLMs) have demonstrated remarkable open-domain capabilities. LLMs tailored for a domain are typically trained entirely on a domain corpus to excel at handling domain-specific tasks. In this work, we explore an alternative strategy of continual pre-training as a means to develop domain-specific LLMs over an existing open-domain LLM. We introduce FinPythia-6.9B, developed through domain-adaptive continual pre-training on the financial domain. Continually pre-trained FinPythia shows consistent improvements on financial tasks over the original foundation model. We further explore simple but effective data selection strategies for continual pre-training. Our data selection strategies outperform vanilla continual pre-training with just 10% of the corpus size and cost, without any degradation on standard open-domain tasks. Our work proposes an alternative solution to building domain-specific LLMs cost-effectively.

preprint, 2012, arXiv

A Factor Graph Approach to Clock Offset Estimation in Wireless Sensor Networks

The problem of clock offset estimation in a two-way timing message exchange regime is considered when the likelihood function of the observation time stamps is Gaussian, exponential or log-normally distributed. A parametrized solution to the maximum likelihood (ML) estimation of clock offset, based on convex optimization, is presented, which differs from the earlier approaches where the likelihood function is maximized graphically. In order to capture the imperfections in node oscillators, which may render a time-varying nature to the clock offset, a novel Bayesian approach to the clock offset estimation is proposed by using a factor graph representation of the posterior density. Message passing using the max-product algorithm yields a closed form expression for the Bayesian inference problem. Several lower bounds on the variance of an estimator are derived for arbitrary exponential family distributed likelihood functions which, while serving as stepping stones to benchmark the performance of the proposed clock offset estimators, can be useful in their own right in classical as well as Bayesian parameter estimation theory. To corroborate the theoretical findings, extensive simulation results are discussed for classical as well as Bayesian estimators in various scenarios. It is observed that the performance of the proposed estimators is fairly close to the fundamental limits established by the lower bounds.
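As a rough illustration of the estimation problem this abstract describes, the sketch below assumes the commonly used two-way exchange model, in which the uplink and downlink observations are U_k = d + theta + X_k and V_k = d - theta + Y_k, with d an unknown fixed delay, theta the clock offset, and X_k, Y_k independent random delays with the same distribution. Under that assumed model, the well-known closed-form ML estimates of theta are half the difference of the sample means (Gaussian delays) and half the difference of the sample minima (exponential delays). This is not the paper's parametrized convex-optimization solution or its factor graph method, and every function and variable name here is hypothetical.

```python
import random


def simulate_exchange(theta, d, mean_delay, n, rng):
    """Simulate n rounds of a two-way timing exchange (assumed model).

    Uplink:   U_k = d + theta + X_k
    Downlink: V_k = d - theta + Y_k
    X_k and Y_k are i.i.d. exponential random delays with the given mean.
    """
    rate = 1.0 / mean_delay
    u = [d + theta + rng.expovariate(rate) for _ in range(n)]
    v = [d - theta + rng.expovariate(rate) for _ in range(n)]
    return u, v


def offset_gaussian(u, v):
    """Closed-form ML offset estimate when the random delays are Gaussian."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return (mean_u - mean_v) / 2.0


def offset_exponential(u, v):
    """Closed-form ML offset estimate when the random delays are exponential."""
    return (min(u) - min(v)) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    u, v = simulate_exchange(theta=3.0, d=1.0, mean_delay=0.5, n=2000, rng=rng)
    print("sample-mean estimate:", offset_gaussian(u, v))
    print("sample-min estimate: ", offset_exponential(u, v))
```

With exponential delays, the sample-minimum estimate typically lands much closer to the true offset than the sample-mean estimate, which is one reason the assumed delay distribution matters in this setting.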

preprint, 2012, arXiv

Time-varying Clock Offset Estimation in Two-way Timing Message Exchange in Wireless Sensor Networks Using Factor Graphs

The problem of clock offset estimation in a two-way timing exchange regime is considered when the likelihood function of the observation time stamps is exponentially distributed. In order to capture the imperfections in node oscillators, which render a time-varying nature to the clock offset, a novel Bayesian approach to the clock offset estimation is proposed using a factor graph representation of the posterior density. Message passing using the max-product algorithm yields a closed form expression for the Bayesian inference problem.
