Source author record

Alper Yilmaz

Alper Yilmaz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · eess.IV · Machine Learning · Human-Computer Interaction · Robotics

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Rethinking the Good Enough Embedding for Easy Few-Shot Learning

The field of deep visual recognition is undergoing a paradigm shift toward universal representations. The Platonic Representation Hypothesis suggests that diverse architectures trained on massive datasets are converging toward a shared, "ideal" latent space. This again raises a critical question: is a "Good Embedding All You Need?" In this paper, we leverage this convergence to demonstrate that off-the-shelf embeddings are inherently "good enough" for complex tasks, rendering intensive task-specific fine-tuning unnecessary. We explore this hypothesis within the few-shot learning framework, proposing a straightforward, non-parametric pipeline that entirely bypasses backpropagation. By utilizing a k-Nearest Neighbor classifier on frozen DINOv2-L features, we conduct a layer-wise characterization to identify an optimal feature extraction. We further demonstrate that manifold refinement via PCA and ICA provides a beneficial regularizing effect. Our results across four major benchmarks demonstrate that our approach consistently surpasses sophisticated meta-learning algorithms, achieving state-of-the-art performance.
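As a rough illustration of the non-parametric idea in the abstract above, the following sketch classifies a query embedding by majority vote among its nearest support embeddings. The toy 3-dimensional vectors stand in for frozen backbone features. The cosine metric, the value of k, and all function names are assumptions made here, not details taken from the paper, and the PCA/ICA refinement step is omitted.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(support, query, k=1):
    """Majority vote among the k support items most similar to the query.

    support: list of (embedding, label) pairs from the few-shot support set.
    query:   embedding of the sample to classify.
    """
    ranked = sorted(
        support,
        key=lambda item: cosine_similarity(item[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-way, 2-shot episode with made-up embeddings.
support = [
    ([0.9, 0.1, 0.0], "cat"),
    ([0.8, 0.2, 0.1], "cat"),
    ([0.1, 0.9, 0.2], "dog"),
    ([0.0, 0.8, 0.3], "dog"),
]
query = [0.7, 0.3, 0.1]
prediction = knn_predict(support, query, k=3)
```

No gradient step appears anywhere in this sketch: the only "training" is storing the support embeddings, which is what makes the approach backpropagation-free.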

preprint · 2025 · arXiv

MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model

In this paper, we introduce MotivNet, a generalizable facial emotion recognition model for robust real-world application. Current state-of-the-art FER models tend to have weak generalization when tested on diverse data, leading to deteriorated performance in the real world and hindering FER as a research domain. Though researchers have proposed complex architectures to address this generalization issue, they require training cross-domain to obtain generalizable results, which is inherently contradictory for real-world application. Our model, MotivNet, achieves competitive performance across datasets without cross-domain training by using Meta-Sapiens as a backbone. Sapiens is a human vision foundational model with state-of-the-art generalization in the real world through large-scale pretraining of a Masked Autoencoder. We propose MotivNet as an additional downstream task for Sapiens and define three criteria to evaluate MotivNet's viability as a Sapiens task: benchmark performance, model similarity, and data similarity. Throughout this paper, we describe the components of MotivNet, our training approach, and our results showing MotivNet is generalizable across domains. We demonstrate that MotivNet can be benchmarked against existing SOTA models and meets the listed criteria, validating MotivNet as a Sapiens downstream task, and making FER more incentivizing for in-the-wild application. The code is available at https://github.com/OSUPCVLab/EmotionFromFaceImages.

preprint · 2022 · arXiv

A GIS Aided Approach for Geolocalizing an Unmanned Aerial System Using Deep Learning

The Global Positioning System (GPS) has become a part of our daily life with the primary goal of providing geopositioning service. For an unmanned aerial system (UAS), geolocalization ability is an extremely important necessity which is achieved using an Inertial Navigation System (INS) with the GPS at its heart. Without geopositioning service, a UAS is unable to fly to its destination or come back home. Unfortunately, GPS signals can be jammed and suffer from a multipath problem in urban canyons. Our goal is to propose an alternative approach to geolocalize a UAS when the GPS signal is degraded or denied. Considering that a UAS has a downward-looking camera on its platform that can acquire real-time images as the platform flies, we apply modern deep learning techniques to achieve geolocalization. In particular, we perform image matching to establish latent feature conjugates between UAS-acquired imagery and satellite orthophotos. A typical application of feature matching suffers from high-rise buildings and new constructions in the field that introduce uncertainties into homography estimation and hence result in poor geolocalization performance. Instead, we extract GIS information from OpenStreetMap (OSM) to semantically segment matched features into building and terrain classes. The GIS mask works as a filter in selecting semantically matched features that enhance coplanarity conditions and the UAS geolocalization accuracy. Once the paper is published our code will be publicly available at https://github.com/OSUPCVLab/UbihereDrone2021.
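The filtering step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the tiny character grid is a hypothetical stand-in for a rasterized OSM building footprint mask, the function names are invented here, and keeping only matches whose orthophoto point lies on terrain is one plausible reading of "semantically matched features", not the authors' confirmed rule.

```python
def filter_matches_by_class(matches, class_at, keep_class="terrain"):
    """Keep matches whose orthophoto point falls on `keep_class`.

    matches:  list of ((u, v), (x, y)) pairs, where (u, v) is a pixel in the
              UAS image and (x, y) is the matched pixel in the orthophoto.
    class_at: callable (x, y) -> class label, standing in for a GIS mask lookup.
    """
    return [m for m in matches if class_at(*m[1]) == keep_class]


# Hypothetical 4x4 mask over the orthophoto: B = building, T = terrain.
MASK = [
    "TTBB",
    "TTBB",
    "TTTT",
    "TTTT",
]


def class_at(x, y):
    """Look up the semantic class at orthophoto pixel (x, y)."""
    return "building" if MASK[y][x] == "B" else "terrain"


matches = [
    ((10, 12), (0, 0)),  # terrain
    ((40, 8), (3, 0)),   # building roof, likely off the ground plane
    ((22, 30), (1, 3)),  # terrain
    ((35, 14), (2, 1)),  # building roof
]
ground_matches = filter_matches_by_class(matches, class_at)
```

The surviving correspondences lie closer to a single plane, which is the condition a homography assumes. A homography needs at least four correspondences, so a real pipeline would have to check the count after filtering.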

preprint · 2022 · arXiv

Engineering deep learning methods on automatic detection of damage in infrastructure due to extreme events

This paper presents a few comprehensive experimental studies for automated Structural Damage Detection (SDD) in extreme events using deep learning methods for processing 2D images. In the first study, a 152-layer Residual network (ResNet) is utilized to classify multiple classes in eight SDD tasks, which include identification of scene levels, damage levels, material types, etc. The proposed ResNet achieved high accuracy for each task while the positions of the damage are not identifiable. In the second study, the existing ResNet and a segmentation network (U-Net) are combined into a new pipeline, cascaded networks, for categorizing and locating structural damage. The results show that the accuracy of damage detection is significantly improved compared to only using a segmentation network. In the third and fourth studies, end-to-end networks are developed and tested as a new solution to directly detect cracks and spalling in the image collections of recent large earthquakes. One of the proposed networks can achieve an accuracy above 67.6% for all tested images at various scales and resolutions, and shows its robustness for these human-free detection tasks. As a preliminary field study, we applied the proposed method to detect damage in a concrete structure that was tested to study its progressive collapse performance. The experiments indicate that these solutions for automatic detection of structural damage using deep learning methods are feasible and promising. The training datasets and codes will be made available for the public upon the publication of this paper.

preprint · 2022 · arXiv

How important are socioeconomic factors for hurricane performance of power systems? An analysis of disparities through machine learning

This paper investigates whether socioeconomic factors are important for the hurricane performance of the electric power system in Florida. The investigation is performed using the Random Forest classifier with Mean Decrease of Accuracy (MDA) for measuring the importance of a set of factors that include hazard intensity, time to recovery from maximum impact, and socioeconomic characteristics of the affected population. The data set (at county scale) for this study includes socioeconomic variables from the 5-year American Community Survey (ACS), as well as wind velocities, and outage data of five hurricanes including Alberto and Michael in 2018, Dorian in 2019, and Eta and Isaias in 2020. The study shows that socioeconomic variables are considerably important for the system performance model. This indicates that social disparities may exist in the occurrence of power outages, which directly impact the resilience of communities and thus require immediate attention.
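To make the Mean Decrease of Accuracy (MDA) measure in the abstract above concrete, here is a minimal permutation-based sketch. It is model-agnostic: a fixed threshold rule stands in for the paper's Random Forest, the data are synthetic, and the function names and parameters are assumptions introduced for illustration only.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: feature 0 decides the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [row[0] > 0.5 for row in rows]


def predict(row):
    """Stand-in classifier (not a Random Forest): thresholds feature 0."""
    return row[0] > 0.5


importances = mean_decrease_accuracy(predict, rows, labels)
```

Shuffling the informative feature drops accuracy to roughly chance level, while shuffling the noise feature leaves it unchanged. Random Forest implementations commonly compute this drop on out-of-bag samples rather than on the training rows used here.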

preprint · 2022 · arXiv

Learning to Drive Using Sparse Imitation Reinforcement Learning

In this paper, we propose Sparse Imitation Reinforcement Learning (SIRL), a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy for the autonomous driving (AD) task in the CARLA simulation environment. The sparse expert is designed based on hand-crafted rules; it is suboptimal but provides a risk-averse strategy by enforcing experience for critical scenarios such as pedestrian and vehicle avoidance, and traffic light detection. As has been demonstrated, training an RL agent from scratch is data-inefficient and time-consuming, particularly for the urban driving task, due to the complexity of situations stemming from the vast size of the state space. Our SIRL strategy addresses these problems by fusing the output distribution of the sparse expert policy and the RL policy to generate a composite driving policy. With the guidance of the sparse expert during the early training stage, the SIRL strategy accelerates the training process, keeps the RL exploration from causing a catastrophic outcome, and ensures safe exploration. To some extent, the SIRL agent is imitating the driving expert's behavior. At the same time, it continuously gains knowledge during training, so it keeps improving beyond the sparse expert and can surpass both the sparse expert and a traditional RL agent. We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator. In addition, we compare the SIRL agent's performance for risk-averse exploration and high learning efficiency with the traditional RL approach. We also demonstrate the SIRL agent's ability to generalize by transferring the driving skill to unseen environments.
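The abstract above does not spell out how the two output distributions are fused. Purely as an assumption for illustration, the sketch below uses a convex mixture over a discrete action set, with the expert silent (returning `None`) outside critical scenarios. The action names, the `expert_weight` parameter, and the function name are all hypothetical.

```python
def fuse_policies(rl_probs, expert_probs, expert_weight):
    """Blend an RL action distribution with a sparse expert's distribution.

    rl_probs:      probabilities over actions from the RL policy.
    expert_probs:  probabilities from the rule-based expert, or None when the
                   expert has no opinion (the "sparse" case).
    expert_weight: value in [0, 1]; an assumed knob that could be decayed as
                   training progresses.
    """
    if expert_probs is None:
        return list(rl_probs)
    mixed = [
        expert_weight * e + (1.0 - expert_weight) * r
        for e, r in zip(expert_probs, rl_probs)
    ]
    total = sum(mixed)
    return [p / total for p in mixed]  # renormalize against rounding drift


ACTIONS = ["brake", "cruise"]  # hypothetical two-action space

# Critical scenario: the expert insists on braking, the RL policy prefers cruising.
composite = fuse_policies([0.2, 0.8], [1.0, 0.0], expert_weight=0.5)

# Non-critical scenario: the expert is silent, so the RL policy is used as is.
unguided = fuse_policies([0.2, 0.8], None, expert_weight=0.5)
```

With these numbers the composite policy shifts most of its probability mass to braking even though the RL policy alone would cruise, which is the kind of risk-averse correction the abstract attributes to the expert early in training.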

preprint · 2022 · arXiv

Network Comparison Study of Deep Activation Feature Discriminability with Novel Objects

Feature extraction has always been a critical component of the computer vision field. More recently, state-of-the-art computer vision algorithms have incorporated Deep Neural Networks (DNN) in feature extracting roles, creating Deep Convolutional Activation Features (DeCAF). The transferability of DNN knowledge domains has enabled the wide use of pretrained DNN feature extraction for applications with novel object classes, especially those with limited training data. This study analyzes the general discriminability of novel object visual appearances encoded into the DeCAF space of six of the leading visual recognition DNN architectures. The results of this study characterize the Mahalanobis distances and cosine similarities between DeCAF object manifolds across two visual object tracking benchmark data sets. The backgrounds surrounding each object are also included as object classes in the manifold analysis, providing a wider range of novel classes. This study found that different network architectures led to different network feature focuses that must be considered in the network selection process. These results are generated from the VOT2015 and UAV123 benchmark data sets; however, the proposed methods can be applied to efficiently compare estimated network performance characteristics for any labeled visual data set.
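The two measures named in the abstract above can be sketched in plain Python as follows. The 2-dimensional points are toy stand-ins for high-dimensional activation features, and the function names are chosen here for illustration; the paper's own protocol for comparing manifolds is not reproduced.

```python
import math


def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def covariance(points):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean_vector(points)
    n, d = len(points), len(mu)
    return [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination; fails if a is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, points):
    """Distance from x to the distribution estimated from `points`."""
    mu = mean_vector(points)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(covariance(points), diff)  # z = covariance^{-1} (x - mu)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy "object" feature cloud and one "background" feature vector.
object_features = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
background_feature = [3.0, 1.0]
distance = mahalanobis(background_feature, object_features)
similarity = cosine_similarity(background_feature, mean_vector(object_features))
```

Unlike the Euclidean distance, the Mahalanobis distance rescales each direction by the spread of the reference class, so a background vector that is far away along a high-variance direction counts as less separated than one equally far along a tight direction.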

preprint · 2022 · arXiv

UAS Navigation in the Real World Using Visual Observation

This paper presents a novel end-to-end Unmanned Aerial System (UAS) navigation approach for long-range visual navigation in the real world. Inspired by the dual-process visual navigation system of human instinct, environment understanding and landmark recognition, we formulate the UAS navigation task as the same two phases. Our system combines reinforcement learning (RL) and image matching approaches. First, the agent learns the navigation policy using RL in the specified environment. To achieve this, we design an interactive UASNAV environment for the training process. Once the agent has learned the navigation policy, that is, once it has 'familiarized itself with the environment', we let the UAS fly in the real world to recognize the landmarks using an image matching method and take actions according to the learned policy. During the navigation process, the UAS carries a single camera as its only visual sensor. We demonstrate that the UAS can learn to navigate to a destination hundreds of meters away from the starting point along the shortest path in a real-world scenario.

preprint · 2020 · arXiv

UAVid: A Semantic Segmentation Dataset for UAV Imagery

Semantic segmentation has been one of the leading research interests in computer vision recently. It serves as a perception foundation for many fields, such as robotics and autonomous driving. The fast development of semantic segmentation is largely attributable to large-scale datasets, especially for deep learning related methods. There already exist several semantic segmentation datasets for comparison among semantic segmentation methods in complex urban scenes, such as the Cityscapes and CamVid datasets, where the side views of the objects are captured with a camera mounted on a driving car. There also exist semantic labeling datasets for airborne images and satellite images, where the top views of the objects are captured. However, only a few datasets capture urban scenes from an oblique Unmanned Aerial Vehicle (UAV) perspective, where both the top view and the side view of the objects can be observed, providing more information for object recognition. In this paper, we introduce our UAVid dataset, a new high-resolution UAV semantic segmentation dataset as a complement, which brings new challenges, including large scale variation, moving object recognition and temporal consistency preservation. Our UAV dataset consists of 30 video sequences capturing 4K high-resolution images in slanted views. In total, 300 images have been densely labeled with 8 classes for the semantic labeling task. We have provided several deep learning baseline methods with pre-training, among which the proposed Multi-Scale-Dilation net performs the best via multi-scale feature extraction. Our UAVid website and the labeling tool have been published at https://uavid.nl/.

preprint · 2019 · arXiv

The Mobile AR Sensor Logger for Android and iOS Devices

In recent years, commodity mobile devices equipped with cameras and inertial measurement units (IMUs) have attracted much research and design effort for augmented reality (AR) and robotics applications. Based on such sensors, many commercial AR toolkits and public benchmark datasets have been made available to accelerate hatching and validating new ideas. To lower the difficulty and enhance the flexibility in accessing the rich raw data of typical AR sensors on mobile devices, this paper presents the mobile AR sensor (MARS) logger for two of the most popular mobile operating systems, Android and iOS. The logger features the best possible synchronization between the camera and the IMU allowed by a mobile device, efficient saving of images at about 30 Hz, and recording of the metadata relevant to AR applications. This logger has been tested on a relatively large spectrum of mobile devices, and the collected data has been used for analyzing the sensor characteristics. We expect that this application will facilitate research and development related to AR and robotics, so it has been open sourced at https://github.com/OSUPCVLab/mobile-ar-sensor-logger.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint