Source author record

Jawad Ahmed

Jawad Ahmed appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

2works
5topics
2close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training

Despite remarkable progress in large language models, Urdu-a language spoken by over 230 million people-remains critically underrepresented in modern NLP systems. Existing multilingual models demonstrate poor performance on Urdu-specific tasks, struggling with the language's complex morphology, right-to-left Nastaliq script, and rich literary traditions. Even the base LLaMA-3.1 8B-Instruct model shows limited capability in generating fluent, contextually appropriate Urdu text. We introduce Qalb, an Urdu language model developed through a two-stage approach: continued pre-training followed by supervised fine-tuning. Starting from LLaMA 3.1 8B, we perform continued pre-training on a dataset of 1.97 billion tokens. This corpus comprises 1.84 billion tokens of diverse Urdu text-spanning news archives, classical and contemporary literature, government documents, and social media-combined with 140 million tokens of English Wikipedia data to prevent catastrophic forgetting. We then fine-tune the resulting model on the Alif Urdu-instruct dataset. Through extensive evaluation on Urdu-specific benchmarks, Qalb demonstrates substantial improvements, achieving a weighted average score of 90.34 and outperforming the previous state-of-the-art Alif-1.0-Instruct model (87.1) by 3.24 points, while also surpassing the base LLaMA-3.1 8B-Instruct model by 44.64 points. Qalb achieves state-of-the-art performance with comprehensive evaluation across seven diverse tasks including Classification, Sentiment Analysis, and Reasoning. Our results demonstrate that continued pre-training on diverse, high-quality language data, combined with targeted instruction fine-tuning, effectively adapts foundation models to low-resource languages.

preprint2022arXiv

Monitoring Security of Enterprise Hosts via DNS Data Analysis

Enterprise Networks are growing in scale and complexity, with heterogeneous connected assets needing to be secured in different ways. Nevertheless, virtually all connected assets use the Domain Name System (DNS) for address resolution, and DNS has thus become a convenient vehicle for attackers to covertly perform Command and Control (C&C) communication, data theft, and service disruption across a wide range of assets. Enterprise security appliances that monitor network traffic typically allow all DNS traffic through as it is vital for accessing any web service; they may at best match against a database of known malicious patterns, and are therefore ineffective against zero-day attacks. This thesis focuses on three high-impact cyber-attacks that leverage DNS, specifically data exfiltration, malware C&C communication, and service disruption. Using big data (over 10B packets) of DNS network traffic collected from a University campus and a Government research organization over a 6-month period, we illustrate the anatomy of these attacks, train machines for automatically detecting such attacks, and evaluate their efficacy in the field.