Source author record

Rifat Shahriyar

Rifat Shahriyar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Databases Machine Learning

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration

We present the first systematic study of machine translation for Chakma, an endangered and extremely low-resource Indo-Aryan language, with the goal of supporting language access and preservation. We introduce a new Chakma-Bangla parallel and monolingual dataset, along with a trilingual Chakma-Bangla-English benchmark for evaluation. To address script mismatch and data scarcity, we propose a character-level transliteration framework that exploits the close orthographic and phonological relationship between Chakma and Bangla, preserving semantic content while enabling effective transfer from Bangla and multilingual pretrained models. We benchmark from-scratch MT, fine-tuned pretrained models, and large language models via in-context learning. Results show that transliteration is essential and that fine-tuning and in-context learning substantially outperform from-scratch baselines, with strong asymmetry across translation directions.

preprint2022arXiv

A Crowd-enabled Solution for Privacy-Preserving and Personalized Safe Route Planning for Fixed or Flexible Destinations (Full Version)

Ensuring travelers' safety on roads has become a research challenge in recent years. We introduce a novel safe route planning problem and develop an efficient solution to ensure the travelers' safety on roads. Though few research attempts have been made in this regard, all of them assume that people share their sensitive travel experiences with a centralized entity for finding the safest routes, which is not ideal in practice for privacy reasons. Furthermore, existing works formulate safe route planning in ways that do not meet a traveler's need for safe travel on roads. Our approach finds the safest routes within a user-specified distance threshold based on the personalized travel experience of the knowledgeable crowd without involving any centralized computation. We develop a privacy-preserving model to quantify the travel experience of a user into personalized safety scores. Our algorithms for finding the safest route further enhance user privacy by minimizing the exposure of personalized safety scores with others. Our safe route planner can find the safest routes for individuals and groups by considering both a fixed and a set of flexible destination locations. Extensive experiments using real datasets show that our approach finds the safest route in seconds. Compared to the direct algorithm, our iterative algorithm requires 47% less exposure of personalized safety scores.

preprint2022arXiv

BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla

In this work, we introduce BanglaBERT, a BERT-based Natural Language Understanding (NLU) model pretrained in Bangla, a widely spoken yet low-resource language in the NLP literature. To pretrain BanglaBERT, we collect 27.5 GB of Bangla pretraining data (dubbed `Bangla2B+') by crawling 110 popular Bangla sites. We introduce two downstream task datasets on natural language inference and question answering and benchmark on four diverse NLU tasks covering text classification, sequence labeling, and span prediction. In the process, we bring them under the first-ever Bangla Language Understanding Benchmark (BLUB). BanglaBERT achieves state-of-the-art results outperforming multilingual and monolingual models. We are making the models, datasets, and a leaderboard publicly available at https://github.com/csebuetnlp/banglabert to advance Bangla NLP.

preprint2022arXiv

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each others work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.

Rifat Shahriyar

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration

A Crowd-enabled Solution for Privacy-Preserving and Personalized Safe Route Planning for Fixed or Flexible Destinations (Full Version)

BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code