Researcher profile

Ibrahim Abdelaziz

Ibrahim Abdelaziz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

ToolRM: Outcome Reward Models for Tool-Calling Large Language Models

As large language models (LLMs) increasingly interact with external tools, reward modeling for tool use has emerged as a critical yet underexplored area of research. Existing reward models, trained primarily on natural language outputs, struggle to evaluate tool-based reasoning and execution. To quantify this gap, we introduce FC-RewardBench, the first benchmark to systematically evaluate reward models in tool-calling scenarios. Our analysis shows that current reward models frequently miss key signals of effective tool use, highlighting the need for domain-specific modeling. We address this by proposing a training framework for outcome reward models using data synthesized from permissively licensed, open-weight LLMs. We introduce ToolRM - a suite of reward models for tool-use ranging from 1.7B to 14B parameters. Across diverse settings, these models consistently outperform general-purpose baselines. Notably, they achieve up to a 25% improvement with Best-of-N sampling, while also improving robustness to input noise, enabling effective data filtering, and supporting RL-training of policy models.

preprint2023arXiv

Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively.

preprint2022arXiv

A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases

Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate sparql queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata. The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd.

preprint2022arXiv

A Scalable AutoML Approach Based on Graph Neural Networks

AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which, (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on its content instead of metadata-based features, (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 126 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems.

preprint2022arXiv

CBR-iKB: A Case-Based Reasoning Approach for Question Answering over Incomplete Knowledge Bases

Knowledge bases (KBs) are often incomplete and constantly changing in practice. Yet, in many question answering applications coupled with knowledge bases, the sparse nature of KBs is often overlooked. To this end, we propose a case-based reasoning approach, CBR-iKB, for knowledge base question answering (KBQA) with incomplete-KB as our main focus. Our method ensembles decisions from multiple reasoning chains with a novel nonparametric reasoning algorithm. By design, CBR-iKB can seamlessly adapt to changes in KBs without any task-specific training or fine-tuning. Our method achieves 100% accuracy on MetaQA and establishes new state-of-the-art on multiple benchmarks. For instance, CBR-iKB achieves an accuracy of 70% on WebQSP under the incomplete-KB setting, outperforming the existing state-of-the-art method by 22.3%.

preprint2022arXiv

Expressive Reasoning Graph Store: A Unified Framework for Managing RDF and Property Graph Databases

Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS) -- a graph store built on top of JanusGraph (a Property Graph store) that also allows storing and querying of RDF datasets. First, we describe how RDF data can be translated into a Property Graph representation and then describe a query translation module that converts SPARQL queries into a series of Gremlin traversals. The converters and translators thus developed can allow any Apache Tinkerpop compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of our proposed approach using JanusGraph as the base Property Graph store and compare its performance with standard RDF systems.

preprint2022arXiv

Large Scale Generation of Labeled Type Data for Python

Recently, dynamically typed languages, such as Python, have gained unprecedented popularity. Although these languages alleviate the need for mandatory type annotations, types still play a critical role in program understanding and preventing runtime errors. An attractive option is to infer types automatically to get static guarantees without writing types. Existing inference techniques rely mostly on static typing tools such as PyType for direct type inference; more recently, neural type inference has been proposed. However, neural type inference is data hungry, and depends on collecting labeled data based on static typing. Such tools, however, are poor at inferring user defined types. Furthermore, type annotation by developers in these languages is quite sparse. In this work, we propose novel techniques for generating high quality types using 1) information retrieval techniques that work on well documented libraries to extract types and 2) usage patterns by analyzing a large repository of programs. Our results show that these techniques are more precise and address the weaknesses of static tools, and can be useful for generating a large labeled dataset for type inference by machine learning methods. F1 scores are 0.52-0.58 for our techniques, compared to static typing tools which are at 0.06, and we use them to generate over 37,000 types for over 700 modules.

preprint2020arXiv

A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving

Automated theorem provers have traditionally relied on manually tuned heuristics to guide how they perform proof search. Deep reinforcement learning has been proposed as a way to obviate the need for such heuristics, however, its deployment in automated theorem proving remains a challenge. In this paper we introduce TRAIL, a system that applies deep reinforcement learning to saturation-based theorem proving. TRAIL leverages (a) a novel neural representation of the state of a theorem prover and (b) a novel characterization of the inference selection process in terms of an attention-based action policy. We show through systematic analysis that these mechanisms allow TRAIL to significantly outperform previous reinforcement-learning-based theorem provers on two benchmark datasets for first-order logic automated theorem proving (proving around 15% more theorems).

preprint2020arXiv

An Experimental Study of Formula Embeddings for Automated Theorem Proving in First-Order Logic

Automated theorem proving in first-order logic is an active research area which is successfully supported by machine learning. While there have been various proposals for encoding logical formulas into numerical vectors -- from simple strings to more involved graph-based embeddings -- little is known about how these different encodings compare. In this paper, we study and experimentally compare pattern-based embeddings that are applied in current systems with popular graph-based encodings, most of which have not been considered in the theorem proving context before. Our experiments show that the advantages of simpler encoding schemes in terms of runtime are outdone by more complex graph-based embeddings, which yield more efficient search strategies and simpler proofs. To support this, we present a detailed analysis across several dimensions of theorem prover performance beyond just proof completion rate, thus providing empirical evidence to help guide future research on neural-guided theorem proving towards the most promising directions.

preprint2020arXiv

Explainable Deep RDFS Reasoner

Recent research efforts aiming to bridge the Neural-Symbolic gap for RDFS reasoning proved empirically that deep learning techniques can be used to learn RDFS inference rules. However, one of their main deficiencies compared to rule-based reasoners is the lack of derivations for the inferred triples (i.e. explainability in AI terms). In this paper, we build on these approaches to provide not only the inferred graph but also explain how these triples were inferred. In the graph words approach, RDF graphs are represented as a sequence of graph words where inference can be achieved through neural machine translation. To achieve explainability in RDFS reasoning, we revisit this approach and introduce a new neural network model that gets the input graph--as a sequence of graph words-- as well as the encoding of the inferred triple and outputs the derivation for the inferred triple. We evaluated our justification model on two datasets: a synthetic dataset-- LUBM benchmark-- and a real-world dataset --ScholarlyData about conferences-- where the lowest validation accuracy approached 96%.

preprint2020arXiv

HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map

3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. The state-of-the-art methods directly regress 3D hand meshes from 2D depth images via 2D convolutional neural networks, which leads to artefacts in the estimations due to perspective distortions in the images. In contrast, we propose a novel architecture with 3D convolutions trained in a weakly-supervised manner. The input to our method is a 3D voxelized depth map, and we rely on two hand shape representations. The first one is the 3D voxelized grid of the shape which is accurate but does not preserve the mesh topology and the number of mesh vertices. The second representation is the 3D hand surface which is less accurate but does not suffer from the limitations of the first representation. We combine the advantages of these two representations by registering the hand surface to the voxelized hand shape. In the extensive experiments, the proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset. Moreover, our augmentation policy for voxelized depth maps further enhances the accuracy of 3D hand pose estimation on real data. Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets compared to the existing approaches.

preprint2020arXiv

Improving Graph Neural Network Representations of Logical Formulae with Subgraph Pooling

Recent advances in the integration of deep learning with automated theorem proving have centered around the representation of logical formulae as inputs to deep learning systems. In particular, there has been a growing interest in adapting structure-aware neural methods to work with the underlying graph representations of logical expressions. While more effective than character and token-level approaches, graph-based methods have often made representational trade-offs that limited their ability to capture key structural properties of their inputs. In this work we propose a novel approach for embedding logical formulae that is designed to overcome the representational limitations of prior approaches. Our architecture works for logics of different expressivity; e.g., first-order and higher-order logic. We evaluate our approach on two standard datasets and show that the proposed architecture achieves state-of-the-art performance on both premise selection and proof step classification.