Researcher profile

Boulbaba Ben Amor

Boulbaba Ben Amor contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

BYOL: Bring Your Own Language Into LLMs

Large Language Models (LLMs) exhibit strong multilingual capabilities, yet remain fundamentally constrained by the severe imbalance in global language resources. While over 7,000 languages are spoken worldwide, only a small subset (fewer than 100) has sufficient digital presence to meaningfully influence modern LLM training. This disparity leads to systematic underperformance, cultural misalignment, and limited accessibility for speakers of low-resource and extreme-low-resource languages. To address this gap, we introduce Bring Your Own Language (BYOL), a unified framework for scalable, language-aware LLM development tailored to each language's digital footprint. BYOL begins with a language resource classification that maps languages into four tiers (Extreme-Low, Low, Mid, High) using curated web-scale corpora, and uses this classification to select the appropriate integration pathway. For low-resource languages, we propose a full-stack data refinement and expansion pipeline that combines corpus cleaning, synthetic text generation, continual pretraining, and supervised finetuning. Applied to Chichewa and Maori, this pipeline yields language-specific LLMs that achieve approximately 12 percent average improvement over strong multilingual baselines across 12 benchmarks, while preserving English and multilingual capabilities via weight-space model merging. For extreme-low-resource languages, we introduce a translation-mediated inclusion pathway, and show on Inuktitut that a tailored machine translation system improves over a commercial baseline by 4 BLEU, enabling high-accuracy LLM access when direct language modeling is infeasible. Finally, we release human-translated versions of the Global MMLU-Lite benchmark in Chichewa, Maori, and Inuktitut, and make our codebase and models publicly available at https://github.com/microsoft/byol .

preprint2022arXiv

ResNet-LDDMM: Advancing the LDDMM Framework using Deep Residual Networks

In deformable registration, the geometric framework - large deformation diffeomorphic metric mapping or LDDMM, in short - has inspired numerous techniques for comparing, deforming, averaging and analyzing shapes or images. Grounded in flows, which are akin to the equations of motion used in fluid dynamics, LDDMM algorithms solve the flow equation in the space of plausible deformations, i.e. diffeomorphisms. In this work, we make use of deep residual neural networks to solve the non-stationary ODE (flow equation) based on a Euler's discretization scheme. The central idea is to represent time-dependent velocity fields as fully connected ReLU neural networks (building blocks) and derive optimal weights by minimizing a regularized loss function. Computing minimizing paths between deformations, thus between shapes, turns to find optimal network parameters by back-propagating over the intermediate building blocks. Geometrically, at each time step, ResNet-LDDMM searches for an optimal partition of the space into multiple polytopes, and then computes optimal velocity vectors as affine transformations on each of these polytopes. As a result, different parts of the shape, even if they are close (such as two fingers of a hand), can be made to belong to different polytopes, and therefore be moved in different directions without costing too much energy. Importantly, we show how diffeomorphic transformations, or more precisely bilipshitz transformations, are predicted by our algorithm. We illustrate these ideas on diverse registration problems of 3D shapes under complex topology-preserving transformations. We thus provide essential foundations for more advanced shape variability analysis under a novel joint geometric-neural networks Riemannian-like framework, i.e. ResNet-LDDMM.

preprint2022arXiv

SHREC'22 Track: Sketch-Based 3D Shape Retrieval in the Wild

Sketch-based 3D shape retrieval (SBSR) is an important yet challenging task, which has drawn more and more attention in recent years. Existing approaches address the problem in a restricted setting, without appropriately simulating real application scenarios. To mimic the realistic setting, in this track, we adopt large-scale sketches drawn by amateurs of different levels of drawing skills, as well as a variety of 3D shapes including not only CAD models but also models scanned from real objects. We define two SBSR tasks and construct two benchmarks consisting of more than 46,000 CAD models, 1,700 realistic models, and 145,000 sketches in total. Four teams participated in this track and submitted 15 runs for the two tasks, evaluated by 7 commonly-adopted metrics. We hope that, the benchmarks, the comparative results, and the open-sourced evaluation code will foster future research in this direction among the 3D object retrieval community.