Source author record

Duygu Sarikaya

Duygu Sarikaya appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, eess.IV

Catalog footprint

What is connected

4 works

2 topics

4 close collaborators

preprint, 2022, arXiv

An Efficient Polyp Segmentation Network

Cancer is a disease that occurs as a result of the uncontrolled division and proliferation of cells. Colon cancer is one of the most common types of cancer in the world. Polyps in the large intestine can cause cancer if they are not removed through early intervention. Deep learning and image segmentation techniques are used to minimize the number of polyps that go unnoticed by experts during these interventions. Although these techniques perform well in terms of accuracy, they require too many parameters. We propose a new model to address this problem. Our proposed model requires fewer parameters and also outperforms the state-of-the-art models. We use EfficientNetB0 for the encoder, as it performs well in various tasks while requiring fewer parameters. We use a partial decoder, which reduces the number of parameters while achieving high segmentation accuracy. Since polyps vary in appearance and size, we use an asymmetric convolution block instead of a classic convolution block. Then, we weight each feature map using a squeeze-and-excitation block to improve our segmentation results. We use different splits of the Kvasir and CVC-ClinicDB datasets for training, validation, and testing, and the CVC-ColonDB, ETIS, and EndoScene datasets for testing. Our model outperforms state-of-the-art models with a Dice score of 71.8% on the ColonDB test dataset, 89.3% on the EndoScene test dataset, and 74.8% on the ETIS test dataset while requiring fewer parameters. Our model requires 2,626,337 parameters in total, while the closest state-of-the-art model, U-Net++, has 9,042,177 parameters.
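The Dice metric quoted in this abstract measures the overlap between a predicted mask and a ground-truth mask. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `dice_score`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as polyp in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total number of polyp pixels across the two masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)

    # Assumption: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(round(dice_score(pred, target), 3))  # 2*2 / (3+3) -> 0.667
```

A score of 1.0 means the masks agree exactly and 0.0 means they share no foreground pixels, so the 71.8% reported for ColonDB corresponds to a value of 0.718 on this scale.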

preprint, 2020, arXiv

Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection

Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied in effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. We propose a solution to the open problem of tool detection and localization in RAS video understanding, using a strictly computer vision approach and recent advances in deep learning. We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. To our knowledge, this approach will be the first to incorporate deep neural networks for tool detection and localization in RAS videos. Our architecture applies a Region Proposal Network (RPN) and a multimodal two-stream convolutional network for object detection to jointly predict objectness and localization on a fusion of image and temporal motion cues. Our results, with an Average Precision (AP) of 91% and a mean computation time of 0.1 seconds per test frame, indicate that our approach is superior to conventionally used methods for medical imaging, while also emphasizing the benefits of using an RPN for precision and efficiency. We also introduce a new dataset, ATLAS Dione, for RAS video understanding. Our dataset provides video data of ten surgeons from Roswell Park Cancer Institute (RPCI) (Buffalo, NY) performing six different surgical tasks on the daVinci Surgical System (dVSS®), with annotations of robotic tools per frame.
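Detection metrics such as Average Precision are usually computed by first deciding whether a predicted box matches an annotated box, most often through intersection over union (IoU). The abstract does not state the matching rule used, so the following is only an illustrative sketch; the function name `box_iou` and the `(x1, y1, x2, y2)` box format are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x1 <= x2, y1 <= y2."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Assumption: degenerate boxes with zero total area yield an IoU of 0.
    if union <= 0:
        return 0.0
    return inter / union


predicted_tool = (0, 0, 2, 2)
annotated_tool = (1, 1, 3, 3)
print(round(box_iou(predicted_tool, annotated_tool), 3))  # 1 / (4 + 4 - 1) -> 0.143
```

A detection is then typically counted as correct when its IoU with an annotation exceeds a chosen threshold; that threshold is not given in the abstract.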

preprint, 2020, arXiv

Surgical Gesture Recognition with Optical Flow only

In this paper, we address the open research problem of surgical gesture recognition using motion cues from video data only. We adapt the Optical flow ConvNets initially proposed by Simonyan et al. While Simonyan et al. use both RGB frames and dense optical flow, we use only dense optical flow representations as input to emphasize the role of motion in surgical gesture recognition, and present it as a robust alternative to kinematic data. We also overcome one of the limitations of Optical flow ConvNets by initializing our model with cross modality pre-training. A large number of promising studies that address surgical gesture recognition rely heavily on kinematic data, which requires additional recording devices. To our knowledge, this is the first paper that addresses surgical gesture recognition using dense optical flow information only. We achieve competitive results on the JIGSAWS dataset. Moreover, our model achieves more robust results with a lower standard deviation, which suggests that optical flow information can be used as an alternative to kinematic data for the recognition of surgical gestures.
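In the two-stream ConvNets of Simonyan et al., the motion input is formed by stacking the horizontal and vertical components of several consecutive dense flow fields along the channel axis, giving 2L channels for L flow fields. The following sketch shows that stacking on plain nested lists. It assumes the flow fields have already been computed by some optical flow method, and the function name `stack_flow` and the window length in the example are hypothetical rather than taken from the paper.

```python
def stack_flow(flows):
    """Stack L flow fields into 2L channels ordered dx_1, dy_1, dx_2, dy_2, ...

    flows: list of (dx, dy) pairs, where dx and dy are 2D grids (lists of rows)
    holding the horizontal and vertical displacement of every pixel.
    """
    if not flows:
        raise ValueError("at least one flow field is required")

    height = len(flows[0][0])
    width = len(flows[0][0][0])
    channels = []
    for dx, dy in flows:
        for grid in (dx, dy):
            # Every component must share the same spatial size.
            if len(grid) != height or any(len(row) != width for row in grid):
                raise ValueError("all flow components must have the same shape")
            channels.append(grid)
    return channels


# Three consecutive 2x2 flow fields (L = 3 is an arbitrary illustrative choice).
flows = [
    ([[0.1, 0.0], [0.0, 0.2]], [[0.0, 0.3], [0.1, 0.0]]),
    ([[0.2, 0.1], [0.0, 0.1]], [[0.1, 0.2], [0.0, 0.0]]),
    ([[0.0, 0.0], [0.3, 0.1]], [[0.2, 0.0], [0.1, 0.1]]),
]
stacked = stack_flow(flows)
print(len(stacked))  # 2 * L channels -> 6
```

Because no RGB frames enter this input, a network fed with it sees motion only, which is the setting the abstract describes.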

preprint, 2020, arXiv

Towards Generalizable Surgical Activity Recognition Using Spatial Temporal Graph Convolutional Networks

Modeling and recognition of surgical activities pose an interesting research problem. Although a number of recent works have studied automatic recognition of surgical activities, the generalizability of these works across different tasks and different datasets remains a challenge. We introduce a modality that is robust to scene variation and that is able to infer part information such as orientational and relative spatial relationships. The proposed modality is based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition. To explore its effectiveness, we model and recognize surgical gestures with the proposed modality. We construct spatial graphs connecting the joint pose estimations of surgical tools. Then, we connect each joint to the corresponding joint in the consecutive frames, forming inter-frame edges that represent the trajectory of the joint over time. We then learn hierarchical spatial temporal graph representations using Spatial Temporal Graph Convolutional Networks (ST-GCN). Our experiments show that learned spatial temporal graph representations perform well in surgical gesture recognition even when used individually. We experiment with the Suturing task of the JIGSAWS dataset, where the chance baseline for gesture recognition is 10%. Our results demonstrate 68% average accuracy, which suggests a significant improvement. Learned hierarchical spatial temporal graph representations can be used either individually, in cascades, or as a complementary modality in surgical activity recognition, and therefore provide a benchmark for future studies. To our knowledge, our paper is the first to use spatial temporal graph representations of surgical tools, and pose-based skeleton representations in general, for surgical activity recognition.
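The graph construction described in this abstract, with spatial edges between tool joints in each frame and inter-frame edges linking each joint to itself in the next frame, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function name `build_st_graph`, the node numbering, and the three-joint skeleton in the example are assumptions.

```python
def build_st_graph(num_frames, num_joints, skeleton):
    """Return (spatial_edges, temporal_edges) for a spatial temporal pose graph.

    skeleton: list of (joint_a, joint_b) index pairs connected within a frame.
    Node ids are assigned as frame_index * num_joints + joint_index.
    """
    def node(frame, joint):
        return frame * num_joints + joint

    spatial_edges = []
    temporal_edges = []
    for t in range(num_frames):
        # Spatial edges: connect the joints of the tool within frame t.
        for a, b in skeleton:
            spatial_edges.append((node(t, a), node(t, b)))
        # Inter-frame edges: connect each joint to the same joint in frame t + 1.
        if t + 1 < num_frames:
            for j in range(num_joints):
                temporal_edges.append((node(t, j), node(t + 1, j)))
    return spatial_edges, temporal_edges


# Hypothetical tool with three joints in a chain, tracked over three frames.
spatial, temporal = build_st_graph(num_frames=3, num_joints=3, skeleton=[(0, 1), (1, 2)])
print(len(spatial), len(temporal))  # 2 edges x 3 frames, 3 joints x 2 transitions -> 6 6
```

Following the temporal edges of a single joint from frame to frame traces that joint's trajectory, which is the structure a network such as ST-GCN then convolves over.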
