Source author record

Hasan Mahmud

Hasan Mahmud appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Human-Computer Interaction, Computation and Language, Machine Learning, Artificial Intelligence, Computer Vision, eess.AS, Sound

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

Auxilio and Beyond: Comparative Evaluation, Usability, and Design Guidelines for Head Movement-based Assistive Mouse Controllers

Upper limb disability due to neurological disorders or other factors restricts computer interaction for affected individuals using a generic optical mouse. This work reports the findings of a comparative evaluation of Auxilio, a sensor-based wireless head-mounted Assistive Mouse Controller (AMC), that facilitates computer interaction for such individuals. Combining commercially available, low-cost motion and infrared sensors, Auxilio utilizes head movements and cheek muscle twitches for mouse control. Its performance in pointing tasks with subjects without motor impairments has been juxtaposed against a commercially available and patented vision-based head-tracking AMC developed for similar stakeholders. Furthermore, our study evaluates the usability of Auxilio using the System Usability Scale, supplemented by a qualitative analysis of participant interview transcripts to identify the strengths and weaknesses of both AMCs. Experimental results demonstrate the feasibility and effectiveness of Auxilio, and we summarize our key findings into design guidelines for the development of similar future AMCs.
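For readers unfamiliar with the System Usability Scale named in the abstract, the conventional scoring rule takes ten responses on a 1-5 scale: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The sketch below shows that standard rule only; the function name and the sample responses are illustrative assumptions, not data from the Auxilio study.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for ten Likert responses (each 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical participant, for illustration only.
example = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]
print(sus_score(example))  # 85.0
```

A per-participant score computed this way is typically averaged across participants before two systems are compared.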

preprint · 2026 · arXiv

BanglaLorica: Design and Evaluation of a Robust Watermarking Algorithm for Large Language Models in Bangla Text Generation

As large language models (LLMs) are increasingly deployed for text generation, watermarking has become essential for authorship attribution, intellectual property protection, and misuse detection. While existing watermarking methods perform well in high-resource languages, their robustness in low-resource languages remains underexplored. This work presents the first systematic evaluation of state-of-the-art text watermarking methods (KGW, Exponential Sampling (EXP), and Waterfall) for Bangla LLM text generation under cross-lingual round-trip translation (RTT) attacks. Under benign conditions, KGW and EXP achieve high detection accuracy (>88%) with negligible perplexity and ROUGE degradation. However, RTT causes detection accuracy to collapse to 9-13%, indicating a fundamental failure of token-level watermarking. To address this, we propose a layered watermarking strategy that combines embedding-time and post-generation watermarks. Experimental results show that layered watermarking improves post-RTT detection accuracy by 25-35%, achieving 40-50% accuracy, representing a 3$\times$ to 4$\times$ relative improvement over single-layer methods, at the cost of controlled semantic degradation. Our findings quantify the robustness-quality trade-off in multilingual watermarking and establish layered watermarking as a practical, training-free solution for low-resource languages such as Bangla. Our code and data will be made public.
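The relative improvement quoted in the abstract can be sanity-checked with simple arithmetic: divide the post-RTT accuracy with layered watermarking (40-50%) by the single-layer post-RTT accuracy. The sketch below uses only the figures from the abstract. Which baseline value pairs with which layered result is not stated, so comparing against the upper single-layer figure (13%) is an assumption.

```python
# Detection accuracy figures quoted in the abstract, in percent.
single_layer_post_rtt = (9.0, 13.0)  # single-layer methods after RTT
layered_post_rtt = (40.0, 50.0)      # layered watermarking after RTT


def relative_improvement(after, before):
    """Ratio of layered to single-layer post-attack accuracy."""
    return after / before


# Assumption: compare against the stronger single-layer baseline (13%).
low = relative_improvement(layered_post_rtt[0], single_layer_post_rtt[1])
high = relative_improvement(layered_post_rtt[1], single_layer_post_rtt[1])
print(f"{low:.2f}x to {high:.2f}x")  # 3.08x to 3.85x
```

Under that pairing the ratio falls between roughly 3.1 and 3.8, which is consistent with the stated 3$\times$ to 4$\times$ range.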

preprint · 2026 · arXiv

Coordinates of Capability: A Unified MTMM-Geometric Framework for LLM Evaluation

The evaluation of Large Language Models (LLMs) faces a critical challenge in construct validity, where fragmented benchmarks and ad hoc metrics frequently conflate method variance, such as prompt sensitivity, with true latent capabilities. Concurrently, emerging research suggests that LLM capabilities and outputs can be modeled as continuous geometric manifolds. In this Systematization of Knowledge (SoK), we bridge these paradigms by proposing a generalized Multi-Trait Multi-Method (MTMM) framework for LLM evaluation. We formalize and unify nine evaluation metrics, including Paraphrase Instability, Drift Score, Overton Width, and Pluralism Score, interpreting them not as isolated scalar values but as geometric measurements within a shared latent coordinate space. This spatial unification factorizes model behavior into three orthogonal latent dimensions: (1) Instability and Sensitivity, (2) Position and Alignment, and (3) Coverage and Expressiveness. By systematically separating task-irrelevant perturbations from true capability spans, the framework provides a theoretically grounded and domain-agnostic taxonomy for robust and empirically stable benchmark design.

preprint · 2022 · arXiv

A Case Study on the Independence of Speech Emotion Recognition in Bangla and English Languages using Language-Independent Prosodic Features

A language-agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we performed a step-by-step comparative analysis of Speech Emotion Recognition (SER) using Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. Six emotions were categorized for this study: happy, angry, neutral, sad, disgust, and fear. We employed three Emotional Speech Sets (ESS), of which the first two were developed by native Bengali speakers in Bangla and English languages separately. The third was a subset of the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to carry out our proposition. In the first experiment, we measured the performance of the three speech sets individually, followed by the second experiment, where different ESS pairs were integrated to analyze the impact on SER. Finally, we measured the recognition rate by training and testing the model with different speech sets in the third experiment. Although this study reveals that SER in Bangla and English languages is mostly language-independent, some disparities were observed while recognizing emotional states like disgust and fear in these two languages. Moreover, our investigations revealed that non-native speakers convey emotions through speech much as they do when expressing themselves in their native tongue.

preprint · 2022 · arXiv

Learning Audio Representations with MLPs

In this paper, we propose an efficient MLP-based approach for learning audio representations, namely timestamp and scene-level audio embeddings. We use an encoder consisting of sequentially stacked gated MLP blocks, which accept 2D MFCCs as inputs. In addition, we also provide a simple temporal interpolation-based algorithm for computing scene-level embeddings from timestamp embeddings. The audio representations generated by our method are evaluated across a diverse set of benchmarks at the Holistic Evaluation of Audio Representations (HEAR) challenge, hosted at the NeurIPS 2021 competition track. We achieved first place on the Speech Commands (full), Speech Commands (5 hours), and the Mridangam Tonic benchmarks. Furthermore, our approach is also the most resource-efficient among all the submitted methods, in terms of both the number of model parameters and the time required to compute embeddings.

preprint · 2022 · arXiv

VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam

Human intention is an internal, mental characterization for acquiring desired information. From interactive interfaces containing either textual or graphical information, intention to perceive desired information is subjective and strongly connected with eye gaze. In this work, we determine such intention by analyzing real-time eye gaze data with a low-cost regular webcam. We extracted unique features (e.g., Fixation Count, Eye Movement Ratio) from the eye gaze data of 31 participants to generate a dataset containing 124 samples of visual intention for perceiving textual or graphical information, labeled as either TEXT or IMAGE, having 48.39% and 51.61% distribution, respectively. Using this dataset, we analyzed 5 classifiers, including Support Vector Machine (SVM) (Accuracy: 92.19%). Using the trained SVM, we investigated the variation of visual intention among 30 participants, distributed in 3 age groups, and found that young users leaned more towards graphical content, whereas older adults were more interested in textual content. This finding suggests that real-time eye gaze data can be a potential source for identifying visual intention, and that analyzing it can support the design and development of intention-aware interactive interfaces that facilitate human cognition.
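The dataset figures in the abstract can be cross-checked with a little arithmetic. The sketch below recovers per-class sample counts from the stated total (124) and percentages (48.39% TEXT, 51.61% IMAGE). The resulting counts and the per-participant figure are inferred here, not stated in the abstract, and the variable names are illustrative.

```python
total_samples = 124
participants = 31
share_percent = {"TEXT": 48.39, "IMAGE": 51.61}

# Inferred class counts: nearest whole number of samples for each share.
counts = {
    label: round(total_samples * pct / 100)
    for label, pct in share_percent.items()
}

# Check that the inferred counts reproduce the quoted percentages.
recomputed = {
    label: round(100 * n / total_samples, 2) for label, n in counts.items()
}

samples_per_participant = total_samples / participants

print(counts)                   # {'TEXT': 60, 'IMAGE': 64}
print(recomputed)               # {'TEXT': 48.39, 'IMAGE': 51.61}
print(samples_per_participant)  # 4.0
```

The counts sum to 124 and reproduce both percentages, so the quoted distribution is internally consistent with a near-balanced two-class dataset.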

preprint · 2021 · arXiv

ANTASID: A Novel Temporal Adjustment to Shannon's Index of Difficulty for Quantifying the Perceived Difficulty of Uncontrolled Pointing Tasks

Shannon's Index of Difficulty ($ID$), reputable for quantifying the perceived difficulty of pointing tasks as a logarithmic relationship between movement-amplitude ($A$) and target-width ($W$), is used for modelling the corresponding observed movement-times ($MT_O$) in such tasks in controlled experimental setups. However, real-life pointing tasks are both spatially and temporally uncontrolled, being influenced by factors such as human aspects, subjective behavior, the context of interaction, and the inherent speed-accuracy trade-off, where emphasizing accuracy compromises speed of interaction and vice versa. Effective target-width ($W_e$) is considered a spatial adjustment for compensating accuracy. However, no significant adjustment exists in the literature for compensating speed in different contexts of interaction in these tasks. As a result, without any temporal adjustment, the true difficulty of an uncontrolled pointing task may be inaccurately quantified using Shannon's ID. To verify this, we propose the ANTASID (A Novel Temporal Adjustment to Shannon's ID) formulation with detailed performance analysis. We hypothesized a temporal adjustment factor ($t$) as a binary logarithm of $MT_O$, compensating for speed due to contextual differences and minimizing the non-linearity between movement-amplitude and target-width. Considering spatial and/or temporal adjustments to ID, we conducted regression analysis using our own and Benchmark datasets in both controlled and uncontrolled scenarios of pointing tasks with a generic mouse. The ANTASID formulation showed significantly superior fitness values and throughput in all the scenarios while reducing the standard error. Furthermore, the quantification of ID with ANTASID varied significantly compared to the classical formulations of Shannon's ID, validating the purpose of this study.
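As background for the quantities named in this abstract, the sketch below computes the classical Shannon formulation, $ID = \log_2(A/W + 1)$, the conventional effective width $W_e = 4.133 \times SD$ of selection endpoints, and the temporal factor $t = \log_2(MT_O)$ as the abstract describes it. How ANTASID combines $t$ with $ID$ is not given in the abstract, so the sketch deliberately stops at computing each quantity separately. The function names, the sample values, and the time unit for $MT_O$ are assumptions.

```python
import math
import statistics


def shannon_id(amplitude, width):
    """Classical Shannon index of difficulty, in bits."""
    return math.log2(amplitude / width + 1)


def effective_width(endpoint_deviations):
    """Conventional effective width: 4.133 x SD of endpoint deviations."""
    return 4.133 * statistics.stdev(endpoint_deviations)


def temporal_factor(mt_observed):
    """t = log2(MT_O); the unit of MT_O is an assumption (not stated)."""
    return math.log2(mt_observed)


# Illustrative values only, not taken from the paper's datasets.
print(shannon_id(224, 32))          # 3.0
print(effective_width([-2, 0, 2]))
print(temporal_factor(1024.0))      # 10.0
```

Replacing `width` with the effective width in `shannon_id` gives the spatially adjusted index that the abstract contrasts with its temporal adjustment.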
