
Faisal Ahmed

Faisal Ahmed appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, cond-mat.mtrl-sci, cond-mat.mes-hall, Computation and Language, Machine Learning, physics.optics

Catalog footprint

What is connected: 8 works, 6 topics, 4 close collaborators


Preprint · 2026 · arXiv

Four-Stage Alzheimer's Disease Classification from MRI Using Topological Feature Extraction, Feature Selection, and Ensemble Learning

Accurate and efficient classification of Alzheimer's disease (AD) severity from brain magnetic resonance imaging (MRI) remains a critical challenge, particularly when data are limited and model interpretability is a concern. In this work, we propose TDA-Alz, a novel framework for four-stage Alzheimer's disease severity classification (non-demented, very mild, mild, and moderate dementia) using topological data analysis (TDA) and ensemble learning. Instead of relying on deep convolutional architectures or extensive data augmentation, our approach extracts topological descriptors that capture intrinsic structural patterns of brain MRI, followed by feature selection to retain the most discriminative topological features. These features are then classified using an ensemble learning strategy to achieve robust multiclass discrimination. Experiments conducted on the OASIS-1 MRI dataset demonstrate that the proposed method achieves an accuracy of 98.19% and an AUC of 99.75%, outperforming or matching state-of-the-art deep learning-based methods reported on OASIS and OASIS-derived datasets. Notably, the proposed framework does not require data augmentation, pretrained networks, or large-scale computational resources, making it faster and less computationally demanding than deep neural network approaches. Furthermore, the use of topological descriptors provides greater interpretability, as the extracted features are directly linked to the underlying structural characteristics of brain MRI rather than to opaque latent representations. These results indicate that TDA-Alz offers a powerful, lightweight, and interpretable alternative to deep learning models for MRI-based Alzheimer's disease severity classification, with strong potential for real-world clinical decision-support systems.
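The abstract does not say which topological descriptors TDA-Alz extracts. As a hedged illustration of what a topological descriptor can look like, the sketch below computes a Betti-0 curve: the number of connected components in the sublevel set of a grayscale image (all pixels at or below a threshold) for each intensity threshold. The function names, the 4-connectivity choice, and the thresholds are assumptions, not the authors' method; a vector like this could then feed a feature-selection step and an ensemble classifier, neither of which is shown.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected components of True pixels in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                # Breadth-first flood fill marks the whole component as visited.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def betti0_curve(image, thresholds):
    """Betti-0 number (component count) of the sublevel set {pixel <= t} for each t."""
    return [
        count_components([[value <= t for value in row] for row in image])
        for t in thresholds
    ]
```

For a 3×3 image whose four corners are dark (0) and whose cross-shaped remainder is bright (5), the curve at thresholds -1, 0, and 5 is [0, 4, 1]: no pixels, then four isolated corners, then one merged component.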

Preprint · 2024 · arXiv

Broadband miniaturized spectrometers with a van der Waals tunnel diode

Miniaturized spectrometers are of immense interest for various on-chip and implantable photonic and optoelectronic applications. State-of-the-art conventional spectrometer designs rely heavily on bulky dispersive components (such as gratings, photodetector arrays, and interferometric optics) to capture different input spectral components, which increases their integration complexity. Here, we report a high-performance broadband spectrometer based on a simple and compact van der Waals heterostructure diode, leveraging carefully selected active van der Waals materials (molybdenum disulfide and black phosphorus), their electrically tunable photoresponse, and advanced computational algorithms for spectral reconstruction. We achieve a remarkably high peak wavelength accuracy of ~2 nanometers and a broad operation bandwidth spanning ~500 to 1600 nanometers in a device with a ~30 × 20 μm² footprint. This diode-based spectrometer scheme with broadband operation offers an attractive pathway for various applications, such as sensing, surveillance, and spectral imaging.
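The abstract mentions computational spectral reconstruction without naming the algorithm. A common formulation for bias-tunable detector spectrometers, given here only as an assumed sketch, discretizes the unknown spectrum S(λ) into N wavelength bins and models the photocurrent measured at each of M bias voltages V_i as a projection onto the calibrated responsivity R(λ, V_i). The spectrum is then recovered by non-negative, regularized least squares. The bin width Δλ, the weight γ, and the smoothing operator Γ are symbols introduced here for illustration.

```latex
I(V_i) = \int R(\lambda, V_i)\, S(\lambda)\, d\lambda
\approx \sum_{j=1}^{N} R_{ij}\, S_j\, \Delta\lambda,
\qquad i = 1, \dots, M

\hat{\mathbf{S}} = \arg\min_{\mathbf{S} \ge 0}
\left\lVert \Delta\lambda\, \mathbf{R}\mathbf{S} - \mathbf{I} \right\rVert_2^2
+ \gamma \left\lVert \Gamma \mathbf{S} \right\rVert_2^2
```

The regularization term matters because the number of bias settings M is often smaller than the number of wavelength bins N, which leaves the unregularized system underdetermined.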

Preprint · 2022 · arXiv

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

The canonical approach to video captioning requires a caption generation model to learn from offline-extracted dense video features. These feature extractors usually operate on video frames sampled at a fixed frame rate and are often trained on image/video understanding tasks, without adaptation to video captioning data. In this work, we present SwinBERT, an end-to-end transformer-based model for video captioning, which takes video frame patches directly as inputs and outputs a natural language description. Instead of leveraging multiple 2D/3D feature extractors, our method adopts a video transformer to encode spatial-temporal representations that can adapt to variable lengths of video input without dedicated design for different frame rates. Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames, in contrast to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering). Moreover, to avoid the inherent redundancy in consecutive video frames, we propose adaptively learning a sparse attention mask and optimizing it for task-specific performance improvement through better long-range video sequence modeling. Through extensive experiments on 5 video captioning datasets, we show that SwinBERT achieves across-the-board performance improvements over previous methods, often by a large margin. In addition, the learned sparse attention masks push performance to a new state of the art and can be transferred between different video lengths and between different datasets. Code is available at https://github.com/microsoft/SwinBERT
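The abstract says SwinBERT learns a sparse attention mask and optimizes it for the captioning task, but does not give the parameterization. The sketch below is a generic, assumed version rather than SwinBERT's implementation: a sigmoid-gated soft mask scales each row of softmax attention weights, the row is renormalized, and an L1 penalty on the mask values (weighted by a hypothetical `lam`) pushes most connections toward zero during training.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores, mask_logits):
    """Gate softmax attention by a learnable soft mask, then renormalize each row."""
    weights = []
    for score_row, logit_row in zip(scores, mask_logits):
        probs = softmax(score_row)
        gated = [p * sigmoid(l) for p, l in zip(probs, logit_row)]
        total = sum(gated)
        weights.append([g / total for g in gated])
    return weights


def sparsity_penalty(mask_logits, lam):
    """L1 penalty on the (non-negative) soft mask values."""
    return lam * sum(sigmoid(l) for row in mask_logits for l in row)
```

In this sketch a mask logit of 0 leaves a connection half-gated, large positive logits keep it, and large negative logits effectively remove it; the penalty is added to the captioning loss so that only useful long-range connections survive.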

Preprint · 2022 · arXiv

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

We propose UniTAB, which Unifies Text And Box outputs for grounded vision-language (VL) modeling. Grounded VL tasks such as grounded captioning require the model to generate a text description and align predicted words with object regions. To achieve this, models must generate the desired text and box outputs together while indicating the alignments between words and boxes. In contrast to existing solutions that use multiple separate modules for different outputs, UniTAB represents both text and box outputs with a shared token sequence and introduces a special <obj> token to naturally indicate word-box alignments in the sequence. UniTAB can thus provide a more comprehensive and interpretable image description by freely grounding generated words to object regions. On grounded captioning, UniTAB offers a simpler solution with a single output head and significantly outperforms the state of the art in both grounding and captioning evaluations. On general VL tasks that have different desired output formats (i.e., text, box, or their combination), UniTAB with a single network achieves performance better than or comparable to task-specific state-of-the-art models. Experiments cover 7 VL benchmarks, including grounded captioning, visual grounding, image captioning, and visual question answering. Furthermore, UniTAB's unified multi-task network and task-agnostic output sequence design make the model parameter-efficient and generalizable to new tasks.
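To make the idea of a shared text-and-box token sequence concrete, here is a hypothetical serialization in the spirit of the abstract. Only the `<obj>` opening token comes from the source; the closing `</obj>` token, the `<bin_k>` coordinate tokens, the bin count, and the helper names are assumptions for illustration and may differ from UniTAB's actual format. Box coordinates are quantized into discrete bins so that a single output head can emit words and boxes from one vocabulary.

```python
def quantize(coord, size, num_bins):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    return min(int(coord / size * num_bins), num_bins - 1)


def serialize(words_with_boxes, width, height, num_bins=100):
    """Build one token sequence mixing text tokens and discretized box tokens.

    words_with_boxes: list of (word, box) pairs, where box is
    (x1, y1, x2, y2) in pixels, or None for an ungrounded word.
    """
    tokens = []
    for word, box in words_with_boxes:
        if box is None:
            tokens.append(word)
            continue
        x1, y1, x2, y2 = box
        tokens.append("<obj>")  # marks the start of a grounded word
        tokens.append(word)
        tokens.extend(
            f"<bin_{quantize(v, s, num_bins)}>"
            for v, s in ((x1, width), (y1, height), (x2, width), (y2, height))
        )
        tokens.append("</obj>")  # hypothetical closing marker
    return tokens
```

For example, `serialize([("a", None), ("dog", (0, 0, 50, 100))], 100, 200, num_bins=10)` yields `["a", "<obj>", "dog", "<bin_0>", "<bin_0>", "<bin_5>", "<bin_5>", "</obj>"]`, so the word-box alignment is read directly off the sequence.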

Preprint · 2020 · arXiv

UNITER: UNiversal Image-TExt Representation Learning

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodal inputs are processed simultaneously for joint visual and textual understanding. In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. We design four pre-training tasks: Masked Language Modeling (MLM), Masked Region Modeling (MRM, with three variants), Image-Text Matching (ITM), and Word-Region Alignment (WRA). Unlike previous work, which applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of the image/text). In addition to ITM for global image-text alignment, we also propose WRA via Optimal Transport (OT) to explicitly encourage fine-grained alignment between words and image regions during pre-training. Comprehensive analysis shows that both conditional masking and OT-based WRA contribute to better pre-training. We also conduct a thorough ablation study to find an optimal combination of pre-training tasks. Extensive experiments show that UNITER achieves a new state of the art across six V+L tasks (over nine datasets), including Visual Question Answering, Image-Text Retrieval, Referring Expression Comprehension, Visual Commonsense Reasoning, Visual Entailment, and NLVR². Code is available at https://github.com/ChenRocks/UNITER.
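The abstract describes WRA as optimal transport between words and image regions but does not name the OT solver. This sketch swaps in standard entropic-regularized Sinkhorn iterations on a cosine-distance cost, with uniform marginals; `eps`, `iters`, and the helper names are assumptions, and the paper's own approximation may differ. The resulting transport cost is small when every word embedding has a close region embedding, which is the signal a WRA-style loss would minimize.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def sinkhorn_plan(words, regions, eps=0.1, iters=200):
    """Entropic OT plan between word and region embeddings with uniform marginals."""
    n, m = len(words), len(regions)
    cost = [[cosine_distance(w, r) for r in regions] for w in words]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(words, regions, **kwargs):
    """Transport cost <T, C>; lower means words and regions are better aligned."""
    plan = sinkhorn_plan(words, regions, **kwargs)
    return sum(
        plan[i][j] * cosine_distance(words[i], regions[j])
        for i in range(len(words))
        for j in range(len(regions))
    )
```

With two orthogonal word embeddings matched against identical region embeddings, the plan concentrates on the diagonal and the cost is near 0; if both regions point in the same direction, half of the mass must travel at cost 1 and the distance is 0.5.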

Preprint · 2016 · arXiv

High Electric Field Carrier Transport and Power Dissipation in Multilayer Black Phosphorus Field Effect Transistor with Dielectric Engineering

This study addresses high electric field transport in multilayer black phosphorus (BP) field effect transistors (FETs), with self-heating and thermal spreading controlled by dielectric engineering. Interestingly, we found that a multilayer BP device on a SiO2 substrate exhibited a maximum current density of 3.3 × 10¹⁰ A/m² at an electric field of 5.58 MV/m, several times higher than that of multilayer MoS2. Our breakdown thermometry analysis revealed that self-heating was impeded along the BP-dielectric interface, resulting in a thermal plateau inside the channel and eventual Joule breakdown. Using a size-dependent electro-thermal transport model, we extracted an interfacial thermal conductance of 1 to 10 MW/(m²·K) for the BP-dielectric interfaces. By using hBN as the dielectric material for BP instead of thermally resistive SiO2 (thermal conductivity about 1.4 W/(m·K)), we observed a threefold increase in breakdown power density and a relatively higher electric field endurance, together with efficient and homogeneous thermal spreading, because hBN has superior structural and thermal compatibility with BP. We further confirmed our results with micro-Raman spectroscopy and atomic force microscopy, and observed that BP devices on hBN exhibited centrally localized hotspots with a breakdown temperature of 600 K, while the BP device on SiO2 exhibited a hotspot in the vicinity of the electrode at 520 K.

Preprint · 2015 · arXiv

Carrier Transport at the Metal-MoS2 Interface

This study illustrates the nature of electronic transport, and its transition from one mechanism to another, at the interface between a metal electrode and a MoS2 channel in a field effect transistor (FET) device. Interestingly, measurements of the contact resistance (Rc) as a function of temperature indicate a transition in carrier transport across the energy barrier from thermionic emission at high temperature to tunneling at low temperature. Furthermore, at low temperature, the tunneling mechanism is identified from the current-voltage dependence, which reveals direct tunneling at low bias and Fowler-Nordheim tunneling at high bias for a Pd-MoS2 contact, owing to bias-induced modulation of the effective barrier shape. In contrast, only direct tunneling is observed for a Cr-MoS2 contact over the entire applied bias range. In addition, simple analytical calculations were carried out to extract Rc over the gate-voltage range, and the results are consistent with the experimental data. Our results describe the transition in carrier transport mechanisms across a metal-MoS2 interface, and this information provides guidance for the design of future flexible, transparent electronic devices based on two-dimensional materials.
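The abstract distinguishes direct tunneling from Fowler-Nordheim (FN) tunneling through the current-voltage dependence. A standard way to make that distinction, assumed here rather than taken from the paper, is to plot ln(I/V²) against 1/V. In the usual simplified limits, with barrier height Φ_B, barrier width d, carrier effective mass m*, and elementary charge q (symbols introduced for illustration), the two regimes give:

```latex
\text{Direct } (qV \ll \Phi_B):\quad
I \propto V \exp\!\left(-\frac{2d\sqrt{2m^{*}\Phi_B}}{\hbar}\right)
\;\Rightarrow\;
\ln\frac{I}{V^{2}} = \ln\frac{1}{V} + \text{const}

\text{FN } (qV > \Phi_B):\quad
I \propto V^{2} \exp\!\left(-\frac{4d\sqrt{2m^{*}}\,\Phi_B^{3/2}}{3q\hbar V}\right)
\;\Rightarrow\;
\ln\frac{I}{V^{2}} = -\frac{4d\sqrt{2m^{*}}\,\Phi_B^{3/2}}{3q\hbar}\cdot\frac{1}{V} + \text{const}
```

A logarithmic rise with 1/V at low bias that turns into a straight line with negative slope at high bias is therefore the signature of the direct-to-FN crossover reported for the Pd-MoS2 contact, whereas a contact showing only the logarithmic branch, like the Cr-MoS2 case, stays in the direct-tunneling regime.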

Preprint · 2015 · arXiv

P-type polar transition of chemically doped multilayer MoS2 transistor

The availability of both n-type and p-type MoS2 FETs is necessary for complementary device applications involving MoS2. However, MoS2 PFETs are rarely achieved because Fermi-level pinning results in high Rc at the metal-MoS2 interface and because MoS2 is inherently strongly n-type. In this study, we realized a high-performance multilayer MoS2 PFET via controllable chemical doping, with an excellent on/off ratio of 10⁷ and a maximum hole mobility of 72 cm²/(V·s) at room temperature; these values further increase to 10⁹ and 132 cm²/(V·s) at 133 K. In addition, we revealed that large Rc hindered the polar transition of the MoS2 FET from n-type to p-type, while the channel resistance (Rs) limited the on-current (Ion) of the PFET. This suggests that reducing Rc at the high-work-function metal-MoS2 interface and p-type doping of the channel are necessary for achieving a high-performance MoS2 PFET. Based on the high-performance PFET, we successfully demonstrated a MoS2 CMOS inverter by integrating an NFET and a PFET.
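The abstract reports hole mobility values without stating how they were extracted. A common approach, assumed here, takes the peak transconductance g_m = dI_d/dV_g of a linear-regime transfer curve and computes the field-effect mobility μ = L·g_m / (W·C_ox·|V_ds|), where L and W are the channel length and width and C_ox is the gate capacitance per unit area. The helper names and the synthetic numbers below are hypothetical.

```python
def transconductance(vg, id_):
    """Finite-difference dId/dVg between successive points of a transfer curve."""
    return [(id_[k + 1] - id_[k]) / (vg[k + 1] - vg[k]) for k in range(len(vg) - 1)]


def field_effect_mobility(vg, id_, length, width, c_ox, vds):
    """Peak linear-regime mobility mu = L * gm / (W * C_ox * |Vds|).

    Inputs in SI units (m, F/m^2, V, A); result in m^2/(V*s).
    abs() makes it work for PFETs, whose current rises as Vg decreases.
    """
    gm = max(abs(g) for g in transconductance(vg, id_))
    return length * gm / (width * c_ox * abs(vds))
```

For example, a 1 µm × 1 µm channel with C_ox = 1.2 × 10⁻⁴ F/m², V_ds = 0.1 V, and a drain current that rises by 1.2 × 10⁻⁷ A per volt of gate bias gives μ ≈ 0.01 m²/(V·s), i.e. about 100 cm²/(V·s).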
