Researcher profile

Michael Fire

Michael Fire contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Comparative Analysis of Polygon-Based and Global Machine Learning Models for Bus Occupancy Prediction

Accurate forecasting of bus ridership (passengers numbers) is crucial for efficient management and optimization of public transport systems. Traditional forecasting models often fail to capture the unique and localized dynamics of different urban areas by treating the entire city as a single, homogeneous region. This paper introduces a novel framework that enhances bus ridership prediction by integrating a spatial clustering methodology with multi-dimensional feature analysis. The proposed framework utilizes a diverse set of data, including bus ridership data (by route number, time, and bus stop) complemented by a variety of open source data, such as spatial features (e.g., attractive destinations), meteorological conditions (e.g., temperature, rainfall), and temporal patterns (e.g., time of day, day of week). By clustering the urban area into distinct regions, based on the principle that bus stops in close proximity share similar ridership characteristics, a separate local forecasting model is trained for each of these clusters. This localized approach demonstrates an accuracy comparable to that of global models. The findings suggest that a spatially-aware, localized modeling strategy is effective for public transport prediction, paving the way for more targeted and efficient service improvements.

preprint2026arXiv

Studying Illustrations in Manuscripts: An Efficient Deep-Learning Approach

The recent Artificial Intelligence (AI) revolution has opened transformative possibilities for the humanities, particularly in unlocking the visual-artistic content embedded in historical illuminated manuscripts. While digital archives now offer unprecedented access to these materials, the ability to systematically locate, extract, and analyze illustrations at scale remains a major challenge. We present a general and scalable AI-based pipeline for large-scale visual analysis of illuminated manuscripts. The framework integrates modern deep-learning models for page-level illustration detection, illustration extraction, and multimodal description, enabling scholars to search, cluster, and study visual materials and artistic trends across entire corpora. We demonstrate the applicability of this approach on large heterogeneous collections, including the Vatican Library and richly illuminated manuscripts such as the Bible of Borso d'Este. The system reveals meaningful visual patterns and cross-manuscript relationships by embedding illustrations into a shared representation space and analyzing their similarity structure (see figure 4). By harnessing recent advances in computer vision and vision-language models, our framework enables new forms of large-scale visual scholarship in historical studies, art history, and cultural heritage making it possible to explore iconography, stylistic trends, and cultural connections in ways that were previously impractical.

preprint2022arXiv

CompanyName2Vec: Company Entity Matching Based on Job Ads

Entity Matching is an essential part of all real-world systems that take in structured and unstructured data coming from different sources. Typically no common key is available for connecting records. Massive data cleaning and integration processes require completion before any data analytics, or further processing can be performed. Although record linkage is frequently regarded as a somewhat tedious but necessary step, it reveals valuable insights, supports data visualization, and guides further analytic approaches to the data. Here, we focus on organization entity matching. We introduce CompanyName2Vec, a novel algorithm to solve company entity matching (CEM) using a neural network model to learn company name semantics from a job ad corpus, without relying on any information on the matched company besides its name. Based on a real-world data, we show that CompanyName2Vec outperforms other evaluated methods and solves the CEM challenge with an average success rate of 89.3%.

preprint2022arXiv

Ethnic Representation Analysis of Commercial Movie Posters

In the last decades, global awareness towards the importance of diverse representation has been increasing. Lack of diversity and discrimination toward minorities did not skip the film industry. Here, we examine ethnic bias in the film industry through commercial posters, the industry's primary advertisement medium for decades. Movie posters are designed to establish the viewer's initial impression. We developed a novel approach for evaluating ethnic bias in the film industry by analyzing nearly 125,000 posters using state-of-the-art deep learning models. Our analysis shows that while ethnic biases still exist, there is a trend of reduction of bias, as seen by several parameters. Particularly in English-speaking movies, the ethnic distribution of characters on posters from the last couple of years is reaching numbers that are approaching the actual ethnic composition of US population. An automatic approach to monitor ethnic diversity in the film industry, potentially integrated with financial value, may be of significant use for producers and policymakers.

preprint2022arXiv

Large-Scale Shill Bidder Detection in E-commerce

User feedback is one of the most effective methods to build and maintain trust in electronic commerce platforms. Unfortunately, dishonest sellers often bend over backward to manipulate users' feedback or place phony bids in order to increase their own sales and harm competitors. The black market of user feedback, supported by a plethora of shill bidders, prospers on top of legitimate electronic commerce. In this paper, we investigate the ecosystem of shill bidders based on large-scale data by analyzing hundreds of millions of users who performed billions of transactions, and we propose a machine-learning-based method for identifying communities of users that methodically provide dishonest feedback. Our results show that (1) shill bidders can be identified with high precision based on their transaction and feedback statistics; and (2) in contrast to legitimate buyers and sellers, shill bidders form cliques to support each other.

preprint2021arXiv

It Runs in the Family: Searching for Synonyms Using Digitized Family Trees

Searching for a person's name is a common online activity. However, Web search engines provide few accurate results to queries containing names. In contrast to a general word which has only one correct spelling, there are several legitimate spellings of a given name. Today, most techniques used to suggest synonyms in online search are based on pattern matching and phonetic encoding, however they often perform poorly. As a result, there is a need for an effective tool for improved synonym suggestion. In this paper, we propose a revolutionary approach for tackling the problem of synonym suggestion. Our novel algorithm, GRAFT, utilizes historical data collected from genealogy websites, along with network algorithms. GRAFT is a general algorithm that suggests synonyms using a graph based on names derived from digitized ancestral family trees. Synonyms are extracted from this graph, which is constructed using generic ordering functions that outperform other algorithms that suggest synonyms based on a single dimension, a factor that limits their performance. We evaluated GRAFT's performance on three ground truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles and more than 700,000 unique forenames and 500,000 surnames. We compared GRAFT's performance at suggesting synonyms to 10 other algorithms, including phonetic encoding, string similarity algorithms, and machine and deep learning algorithms. The results show GRAFT's superiority with respect to both forenames and surnames and demonstrate its use as a tool to improve synonym suggestion.

preprint2020arXiv

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries containing a name and sending back to the web search engines for finding their will. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Currently, most solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding, however very often, the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach which addresses the similar name suggestion problem by utilizing automated speech generation, and deep learning to produce spoken name embeddings. This sophisticated and innovative embeddings captures the way people pronounce names in any language and accent. Utilizing the name pronunciation can be helpful for both differentiating and detecting names that sound alike, but are written differently. The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. The performance of the proposed approach was found to be superior to 10 other algorithms evaluated in this study, including well used phonetic and string similarity algorithms, and two recently proposed algorithms. The results obtained suggest that the proposed approach could serve as a useful and valuable tool for solving the similar name suggestion problem.

preprint2020arXiv

Using Data Mining for Infrastructure and Safety Violations Discovery in Cities

In city planning and maintenance, the abilty to quickly identify infrastructure violations - such as missing or misplaced fire hydrants - can be crucial for maintaining a safe city; it can even save lives. In this work, we aim to provide an analysis of such violations, and to demonstrate the potential of data-driven approaches for quickly locating and addressing them. We conduct an analytical study based upon data from the city of Beer-Sheva's public records of fire hydrants, bomb shelters, and other public facilities. The result of our analysis are presented along with an interactive exploration tool, which allows for easy exploration and identification of different facilities around the city that viloate regulations.

preprint2019arXiv

Using Data Science to Understand the Film Industry's Gender Gap

Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women`s roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular--albeit flawed--measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.