Source author record

Kris Sankaran

Kris Sankaran appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Applications, Computer Vision, Machine Learning, Computation, eess.IV, Genomics, Methodology, physics.ao-ph, physics.comp-ph

Catalog footprint

What is connected

13 works

9 topics

4 close collaborators


preprint, 2022, arXiv

Bootstrap Confidence Regions for Learned Feature Embeddings

Algorithmic feature learners provide high-dimensional vector representations for non-matrix structured signals, like images, audio, text, and graphs. Low-dimensional projections derived from these representations can be used to explore variation across collections of these data. However, it is not clear how to assess the uncertainty associated with these projections. We adapt methods developed for bootstrapping principal components analysis to the setting where features are learned from non-matrix data. We empirically compare the derived confidence regions in simulations, varying factors that influence both feature learning and the bootstrap. Approaches are illustrated on spatial proteomic data. Code, data, and trained models are released as an R compendium.
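As a rough illustration of the bootstrap idea for principal components, the sketch below resamples rows of a small synthetic matrix, recomputes the leading principal direction on each resample, and aligns its sign to the full-data direction before summarizing. The synthetic data, the function names, the power-iteration solver, and the percentile interval are all assumptions chosen for illustration; this is not the paper's procedure, which targets learned features and full confidence regions.

```python
import math
import random


def top_component(rows, iters=200):
    """Leading principal direction of the rows, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Assumption: the start vector is not orthogonal to the leading direction.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def align_sign(v, ref):
    """Flip v if needed so that it points the same way as ref."""
    dot = sum(a * b for a, b in zip(v, ref))
    return v if dot >= 0 else [-a for a in v]


def bootstrap_loadings(rows, n_boot=200, seed=0):
    """Full-data direction plus sign-aligned directions from row resamples."""
    rng = random.Random(seed)
    ref = top_component(rows)
    draws = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        draws.append(align_sign(top_component(sample), ref))
    return ref, draws


def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap values."""
    s = sorted(values)
    lo = int(math.floor((1 - level) / 2 * (len(s) - 1)))
    hi = int(math.ceil((1 + level) / 2 * (len(s) - 1)))
    return s[lo], s[hi]


# Synthetic data (hypothetical): most variance lies along the first axis.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 3), data_rng.gauss(0, 1)] for _ in range(100)]
ref, draws = bootstrap_loadings(rows)
lo, hi = percentile_interval([d[1] for d in draws])
print("second loading:", ref[1], "interval:", (lo, hi))
```

The sign alignment is the one-dimensional special case of the alignment step that bootstrap PCA needs, since principal directions are only defined up to sign (and, for several components, up to rotation).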

preprint, 2022, arXiv

Discovering Concepts in Learned Representations using Statistical Inference and Interactive Visualization

Concept discovery is one of the open problems in the interpretability literature that is important for bridging the gap between non-deep learning experts and model end-users. Among current formulations, a concept is defined as a direction in a learned representation space. This definition makes it possible to evaluate whether a particular concept significantly influences classification decisions for classes of interest. However, finding relevant concepts is tedious, as representation spaces are high-dimensional and hard to navigate. Current approaches include hand-crafting concept datasets and then converting them to latent space directions; alternatively, the process can be automated by clustering the latent space. In this study, we offer two further approaches to guide user discovery of meaningful concepts, one based on multiple hypothesis testing and another on interactive visualization. We explore the potential value and limitations of these approaches through simulation experiments and a demo visual interface to real data. Overall, we find that these techniques offer a promising strategy for discovering relevant concepts in settings where users do not have predefined descriptions of them, but without completely automating the process.
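When many candidate directions are each tested for influence on a classifier, the resulting p-values need a multiplicity correction. The abstract does not say which correction is used, so the sketch below assumes the standard Benjamini-Hochberg procedure as one common choice; the p-values are made up for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up procedure controlling the false
    discovery rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate concept direction.
candidate_p_values = [0.001, 0.8, 0.012, 0.04, 0.3]
selected = benjamini_hochberg(candidate_p_values, alpha=0.05)
print(selected)
```

Directions flagged `True` would be the ones passed on to a user for closer inspection, which keeps the final judgment with the user rather than fully automating discovery.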

preprint, 2022, arXiv

Generative Models: An Interdisciplinary Perspective

By linking conceptual theories with observed data, generative models can support reasoning in complex situations. They have come to play a central role both within and beyond statistics, providing the basis for power analysis in molecular biology, theory building in particle physics, and resource allocation in epidemiology, for example. We introduce the probabilistic and computational concepts underlying modern generative models and then analyze how they can be used to inform experimental design, iterative model refinement, goodness-of-fit evaluation, and agent-based simulation. We emphasize a modular view of generative mechanisms and discuss how they can be flexibly recombined in new problem contexts. We provide practical illustrations throughout, and code for reproducing all examples is available at https://github.com/krisrs1128/generative_review. Finally, we observe how research in generative models is currently split across several islands of activity, and we highlight opportunities lying at disciplinary intersections.

preprint, 2022, arXiv

Interactive Visualization and Representation Analysis Applied to Glacier Segmentation

Interpretability has attracted increasing attention in earth observation problems. We apply interactive visualization and representation analysis to guide interpretation of glacier segmentation models. We visualize the activations from a U-Net to understand and evaluate the model performance. We build an online interface using the Shiny R package to provide comprehensive error analysis of the predictions. Users can interact with the panels and discover model failure modes. Further, we discuss how visualization can provide sanity checks during data preprocessing and model training.

preprint, 2022, arXiv

Multiscale Analysis of Count Data through Topic Alignment

Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop techniques to study the relationships across models with different $K$. This can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits in two when $K$ increases. This strategy gives more insight into the process generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, which we call a topic alignment, and present three diagnostics based on it. We show the effectiveness of these tools for interpreting the topics on simulated and real data, and we release an accompanying R package, alto, available at https://lasy.github.io/alto.
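To make the idea of cross-model relationships concrete, the sketch below compares topic memberships from two hypothetical models fitted with K = 2 and K = 3. It assumes a simple weight between a topic in one model and a topic in the other, namely the sum over samples of the product of their memberships. The membership matrices are invented, and the alto package's own weight definitions may differ from this assumption.

```python
def alignment_weights(gamma_a, gamma_b):
    """Weights between topics of two models.

    gamma_a[d][k] is the membership of sample d in topic k of model A,
    gamma_b[d][l] the membership of sample d in topic l of model B.
    weight[k][l] = sum over d of gamma_a[d][k] * gamma_b[d][l].
    """
    n_a, n_b = len(gamma_a[0]), len(gamma_b[0])
    weights = [[0.0] * n_b for _ in range(n_a)]
    for row_a, row_b in zip(gamma_a, gamma_b):
        for k in range(n_a):
            for l in range(n_b):
                weights[k][l] += row_a[k] * row_b[l]
    return weights


# Hypothetical memberships for three samples (each row sums to 1).
gamma_k2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gamma_k3 = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.25, 0.25]]

weights = alignment_weights(gamma_k2, gamma_k3)
for k, row in enumerate(weights):
    print("K=2 topic", k, "->", row)
```

In this toy case the second topic of the K = 2 model sends equal weight to the second and third topics of the K = 3 model, which is the kind of split the abstract describes.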

preprint, 2022, arXiv

Sampling Strategy for Fine-Tuning Segmentation Models to Crisis Area under Scarcity of Data

The use of remote sensing in humanitarian crisis response missions is well-established and has repeatedly proven relevant. One problem is that obtaining gold annotations is costly and time-consuming, which makes it almost impossible to fine-tune models to new regions affected by a crisis. Where time is critical, resources are limited, and the environment is constantly changing, models have to evolve and provide flexible ways to adapt to a new situation. The question we want to answer is whether prioritization of samples provides better fine-tuning results than classical sampling methods when annotated data are scarce. We propose a method to guide data collection during fine-tuning, based on estimated model and sample properties, like predicted IOU score. We propose two formulas for calculating sample priority. Our approach blends techniques from interpretability, representation learning, and active learning. We have applied our method to a deep learning model for semantic segmentation, U-Net, in a remote sensing application of building detection, one of the core use cases of remote sensing in humanitarian applications. Preliminary results show the utility of sample prioritization for tuning semantic segmentation models under data scarcity.

preprint, 2022, arXiv

Source data selection for out-of-domain generalization

Models that perform out-of-domain generalization borrow knowledge from heterogeneous source data and apply it to a related but distinct target task. Transfer learning has proven effective for accomplishing this generalization in many applications. However, poor selection of a source dataset can lead to poor performance on the target, a phenomenon called negative transfer. In order to take full advantage of available source data, this work studies source data selection with respect to a target task. We propose two source selection methods that are based on the multi-bandit theory and random search, respectively. We conduct a thorough empirical evaluation on both simulated and real data. Our proposals can also be viewed as diagnostics for the existence of reweighted source subsamples that perform better than random selection of the available samples.
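The random-search flavor of source selection can be sketched in a few lines. The example below is a toy under stated assumptions: the "model" is just the pooled mean of the selected sources, the loss is mean squared error on a small target validation set, and all data and function names are hypothetical. It is not the paper's method, only an instance of searching over source subsets and keeping the one that does best on the target.

```python
import random


def pooled_mean(sources, subset):
    """Toy 'model': the mean of all values in the selected sources."""
    values = [x for i in subset for x in sources[i]]
    return sum(values) / len(values)


def target_loss(sources, subset, target_validation):
    """Mean squared error of the pooled mean on target validation data."""
    estimate = pooled_mean(sources, subset)
    return sum((y - estimate) ** 2 for y in target_validation) / len(target_validation)


def random_search_sources(sources, target_validation, n_trials=100, seed=0):
    """Try random source subsets; keep the one with the lowest target loss."""
    rng = random.Random(seed)
    indices = list(range(len(sources)))
    # Start from the baseline that uses every source.
    best_subset = tuple(indices)
    best_loss = target_loss(sources, best_subset, target_validation)
    for _ in range(n_trials):
        size = rng.randint(1, len(indices))
        subset = tuple(sorted(rng.sample(indices, size)))
        loss = target_loss(sources, subset, target_validation)
        if loss < best_loss:
            best_subset, best_loss = subset, loss
    return best_subset, best_loss


# Hypothetical data: source 2 is far from the target and causes negative transfer.
sources = [[1.0, 1.2, 0.8], [1.1, 0.9], [5.0, 5.5, 4.5]]
target_validation = [1.0, 1.1, 0.9]

best_subset, best_loss = random_search_sources(sources, target_validation)
full_loss = target_loss(sources, (0, 1, 2), target_validation)
print(best_subset, best_loss, full_loss)
```

Comparing `best_loss` with `full_loss` plays the diagnostic role mentioned in the abstract: a clear gap suggests that some subset of the source data transfers better than using everything.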

preprint, 2022, arXiv

Spatial Transcriptomics Dimensionality Reduction using Wavelet Bases

Spatially resolved transcriptomics (ST) measures gene expression along with the spatial coordinates of the measurements. The analysis of ST data involves significant computational complexity. In this work, we propose a gene expression dimensionality reduction algorithm that retains spatial structure. We combine the wavelet transformation with matrix factorization to select spatially-varying genes. We extract a low-dimensional representation of these genes. We consider an Empirical Bayes setting, imposing regularization through the prior distribution of factor genes. Additionally, we provide visualizations of the extracted representation genes that capture the global spatial pattern. We illustrate the performance of our methods by spatial structure recovery and gene expression reconstruction in simulation. In real data experiments, our method identifies the spatial structure of gene factors and outperforms regular decomposition in terms of reconstruction error. We find a connection between the fluctuation of gene patterns and the wavelet technique, which provides smoother visualization. We develop a package and share the workflow for generating reproducible quantitative results and gene visualizations. The package is available at https://github.com/OliverXUZY/waveST.

preprint, 2021, arXiv

Measuring the Stability of Learned Features

Many modern datasets don't fit neatly into $n \times p$ matrices, but most techniques for measuring statistical stability expect rectangular data. We study methods for stability assessment on non-rectangular data, using statistical learning algorithms to extract rectangular latent features. We design controlled simulations to characterize the power and practicality of competing approaches. This motivates new strategies for visualizing feature stability. Our stability curves supplement the direct analysis, providing information about the reliability of inferences based on learned features. Finally, we illustrate our approach using a spatial proteomics dataset, where machine learning tools can augment the scientist's workflow, but where guarantees of statistical reproducibility are still central. Our raw data, packaged code, and experimental outputs are publicly available.

preprint, 2020, arXiv

HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery

Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depends on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.
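The recursive application of a single fusion operator to pairs of views can be sketched as a simple reduction. In the example below the learned operator is replaced by an elementwise mean, views are flat lists of numbers, and an unpaired view is carried to the next round; the pairing order, the handling of odd counts, and the function names are assumptions for illustration and do not reproduce the HighRes-net architecture.

```python
def mean_fuse(a, b):
    """Stand-in for a learned fusion operator: elementwise mean of two views."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def recursive_fuse(views, fuse=mean_fuse):
    """Fuse adjacent pairs of views repeatedly until one view remains."""
    if not views:
        raise ValueError("need at least one view")
    while len(views) > 1:
        fused = [fuse(views[i], views[i + 1])
                 for i in range(0, len(views) - 1, 2)]
        if len(views) % 2 == 1:
            # Assumption: an unpaired view is carried over unchanged.
            fused.append(views[-1])
        views = fused
    return views[0]


# Four hypothetical one-pixel "views" reduce in two rounds.
print(recursive_fuse([[0.0], [4.0], [8.0], [12.0]]))
```

Because the same operator is reused at every round, the number of input views can vary without changing the operator itself, which is the property the abstract highlights.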

preprint, 2020, arXiv

Modeling Cloud Reflectance Fields using Conditional Generative Adversarial Networks

We introduce a conditional Generative Adversarial Network (cGAN) approach to generate cloud reflectance fields (CRFs) conditioned on large scale meteorological variables such as sea surface temperature and relative humidity. We show that our trained model can generate realistic CRFs from the corresponding meteorological observations, which represents a step towards a data-driven framework for stochastic cloud parameterization.

preprint, 2020, arXiv

Nanoscale Microscopy Images Colorization Using Neural Networks

Microscopy images are powerful tools, widely used in most research areas, such as biology, chemistry, physics, and materials science, and acquired with various microscopes (scanning electron microscope (SEM), atomic force microscope (AFM), optical microscope, etc.). However, most microscopy images are colorless due to their unique imaging mechanisms. Through investigating some popular solutions recently proposed for colorizing images, we notice that these methods are usually tedious, complicated, and time-consuming. In this paper, inspired by the achievements of machine learning algorithms in different scientific fields, we introduce two artificial neural networks for gray microscopy image colorization: an end-to-end convolutional neural network (CNN) with a pre-trained model for feature extraction, and a pixel-to-pixel neural style transfer convolutional neural network (NST-CNN), which can colorize gray microscopy images with semantic information learned from a user-provided colorful image at inference time. The results demonstrate that our algorithm not only colorizes microscopy images precisely under complex circumstances but also renders colors naturally, with proper hue and saturation, thanks to training on a massive number of natural images.

preprint, 2016, arXiv

Opioid Atlas: Mapping Access to Pain Medication

Opiates are some of the most effective pain relief medications available for patients suffering from cancer and surgery-related pain. Despite the affordability and effectiveness of these medications, access to opiates is highly geographically variable. Pain researchers have attributed geographic variation to various factors including the fear of opioid addiction, diversion of legal opioids to the underground market and pharmaceutical industry influences. However, the extent to which there is inequity in untreated cancer and surgery-related pain is unknown. To help opioid investigators study these questions, we designed a tool, the Opioid Atlas, for exploring data on legal opioid consumption, by country and time, collected by the International Narcotics Control Board. Our design borrows ideas from the data visualization and multivariate statistics communities, especially the principles of linking and dimensionality reduction. Our work is relevant to policymakers and pain researchers who wish to systematically assess country-level factors that contribute to differences in opioid access for patients with cancer and surgery-related pain. The Opioid Atlas, and the code behind it, is freely available with an open source license.
