Researcher profile

Nadiia Chepurko

Nadiia Chepurko contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Learning Program Representations for Food Images and Cooking Recipes

In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online.

preprint2022arXiv

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of attention in recent years; we obtain sharper bounds for these problems. More significantly, we achieve these improvements by arguing that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise; these are powerful and standard techniques in randomized numerical linear algebra. With this recognition, we are able to employ the large body of work in numerical linear algebra to obtain algorithms for these problems that are simpler or faster (or both) than existing approaches. Our experiments demonstrate that the proposed data structures also work well on real-world datasets.

preprint2020arXiv

ARDA: Automatic Relational Data Augmentation for Machine Learning

Automatic machine learning (\AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation. Automatic data augmentation involves finding new features relevant to the user's predictive task with minimal ``human-in-the-loop'' involvement. We present \system, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented data set such that training a predictive model on this augmented dataset results in improved performance. Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join. We perform an extensive empirical evaluation of different system components and benchmark our feature selection algorithm on real-world datasets.

preprint2020arXiv

Testing Positive Semi-Definiteness via Random Submatrices

We study the problem of testing whether a matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ with bounded entries ($\|\mathbf{A}\|_\infty \leq 1$) is positive semi-definite (PSD), or $ε$-far in Euclidean distance from the PSD cone, meaning that $\min_{\mathbf{B} \succeq 0} \|\mathbf{A} - \mathbf{B}\|_F^2 > εn^2$, where $\mathbf{B} \succeq 0$ denotes that $\mathbf{B}$ is PSD. Our main algorithmic contribution is a non-adaptive tester which distinguishes between these cases using only $\tilde{O}(1/ε^4)$ queries to the entries of $\mathbf{A}$. If instead of the Euclidean norm we considered the distance in spectral norm, we obtain the "$\ell_\infty$-gap problem", where $\mathbf{A}$ is either PSD or satisfies $\min_{\mathbf{B}\succeq 0} \|\mathbf{A}- \mathbf{B}\|_2 > εn$. For this related problem, we give a $\tilde{O}(1/ε^2)$ query tester, which we show is optimal up to $\log(1/ε)$ factors. Our testers randomly sample a collection of principal submatrices and check whether these submatrices are PSD. Consequentially, our algorithms achieve one-sided error: whenever they output that $\mathbf{A}$ is not PSD, they return a certificate that $\mathbf{A}$ has negative eigenvalues. We complement our upper bound for PSD testing with Euclidean norm distance by giving a $\tildeΩ(1/ε^2)$ lower bound for any non-adaptive algorithm. Our lower bound construction is general, and can be used to derive lower bounds for a number of spectral testing problems. As an example of the applicability of our construction, we obtain a new $\tildeΩ(1/ε^4)$ sampling lower bound for testing the Schatten-$1$ norm with a $εn^{1.5}$ gap, extending a result of Balcan, Li, Woodruff, and Zhang [SODA'19]. In addition, it yields new sampling lower bounds for estimating the Ky-Fan Norm, and the cost of the best rank-$k$ approximation.

preprint2020arXiv

Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams

We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., intervals and disks. Let $α$ be the cardinality of the largest independent set. Our goal is to estimate $α$ in a small amount of space, given that the input is received as a one-pass stream. We also consider a generalization of this problem by assigning weights to each object and estimating $β$, the largest value of a weighted independent set. We initialize the study of this problem in the turnstile streaming model (insertions and deletions) and provide the first algorithms for estimating $α$ and $β$. For unit-length intervals, we obtain a $(2+ε)$-approximation to $α$ and $β$ in poly$(\frac{\log(n)}ε)$ space. We also show a matching lower bound. Combined with the $3/2$-approximation for insertion-only streams by Cabello and Perez-Lanterno [CP15], our result implies a separation between the insertion-only and turnstile model. For unit-radius disks, we obtain a $\left(\frac{8\sqrt{3}}π\right)$-approximation to $α$ and $β$ in poly$(\log(n), ε^{-1})$ space, which is closely related to the hexagonal circle packing constant. We provide algorithms for estimating $α$ for arbitrary-length intervals under a bounded intersection assumption and study the parameterized space complexity of estimating $α$ and $β$, where the parameter is the ratio of maximum to minimum interval length.