Researcher profile

Nancy Hitschfeld

Nancy Hitschfeld contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

A Scalable and Energy Efficient GPU Thread Map for m-Simplex Domains

This work proposes a new GPU thread map for $m$-simplex domains, that scales its speedup with dimension and is energy efficient compared to other state of the art approaches. The main contributions of this work are i) the formulation of the new block-space map $\mathcal{H}: \mathbb{Z}^m \mapsto \mathbb{Z}^m$ for regular orthogonal simplex domains, which is analyzed in terms of resource usage, and ii) the experimental evaluation in terms of speedup over a bounding box approach and energy efficiency as elements per second per Watt. Results from the analysis show that $\mathcal{H}$ has a potential speedup of up to $2\times$ and $6\times$ for $2$ and $3$-simplices, respectively. Experimental evaluation shows that $\mathcal{H}$ is competitive for $2$-simplices, reaching $1.2\times \sim 2.0\times$ of speedup for different tests, which is on par with the fastest state of the art approaches. For $3$-simplices $\mathcal{H}$ reaches up to $1.3\times \sim 6.0\times$ of speedup making it the fastest of all. The extension of $\mathcal{H}$ to higher dimensional $m$-simplices is feasible and has a potential speedup that scales as $m!$ given a proper selection of parameters $r, β$ which are the scaling and replication factors, respectively. In terms of energy consumption, although $\mathcal{H}$ is among the highest in power consumption, it compensates by its short duration, making it one of the most energy efficient approaches. Lastly, further improvements with Tensor and Ray Tracing Cores are analyzed, giving insights to leverage each one of them. The results obtained in this work show that $\mathcal{H}$ is a scalable and energy efficient map that can contribute to the efficiency of GPU applications when they need to process $m$-simplex domains, such as Cellular Automata or PDE simulations.

preprint2022arXiv

A Topological Data Analysis Based Classifier

Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, showing advantages for imbalanced datasets. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent Homology (PH) is applied to guide the selection of a sub-complex where unlabeled points obtain the label with the majority of votes from labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap and imbalanced samples per class. On average, the proposed TDABC method was better than KNN and weighted-KNN. It behaves competitively with Local SVM and Random Forest baseline classifiers in balanced datasets, and it outperforms all baseline methods classifying entangled and minority classes.

preprint2022arXiv

GPU Parallel algorithm for the generation of polygonal meshes based on terminal-edge regions

This paper presents a GPU parallel algorithm to generate a new kind of polygonal meshes obtained from Delaunay triangulations. To generate the polygonal mesh, the algorithm first uses a classification system to label each edge of an input triangulation; second it builds polygons (simple or not) from terminal-edge regions using the label system, and third it transforms each non-simple polygon from the previous phase into simple ones, convex or not convex polygons. We show some preliminary experiments to test the scalability of the algorithm and compare it with the sequential version. We also run a very simple test to show that these meshes can be useful for the virtual element method.

preprint2022arXiv

Squeeze: Efficient Compact Fractals for Tensor Core GPUs

This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, $λ(ω)$ and $ν(ω)$, which act as compact-to-expanded and expanded-to-compact space functions, respectively. The cost of the maps is $\mathcal{O}(\log_2 \log_s(n))$ time, with $n$ being the side of a $n \times n$ embedding for the fractal in its expanded form, and $s$ the linear scaling factor. The proposed approach works for any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Experimental results using a discrete Sierpinski Triangle as a case study shows up to $\sim12\times$ of speedup and a memory reduction factor of up to $\sim 315\times$ with respect to a GPU-based expanded-space bounding box approach. These results show that the proposed compact approach will allow the scientific community to efficiently tackle problems that up to now could not fit into GPU memory.

preprint2021arXiv

Classification based on Topological Data Analysis

Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, even imbalanced datasets, without any further ML stage. The proposed algorithm built a filtered simplicial complex on the dataset. Persistent homology is then applied to guide choosing a sub-complex where unlabeled points obtain the label with most votes from labeled neighboring points. To assess the proposed method, 8 datasets were selected with several degrees of class entanglement, variability on the samples per class, and dimensionality. On average, the proposed TDABC method was capable of overcoming baseline classifiers (wk-NN and k-NN) in each of the computed metrics, especially on classifying entangled and minority classes.

preprint2020arXiv

Efficient GPU Thread Mapping on Embedded 2D Fractals

This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $λ: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^\mathbb{H})$ threads with $\mathbb{H}$ being the Hausdorff dimension of the fractal, making it parallel space efficient. When compared to a bounding-box (BB) approach, $λ(ω)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup $n \ge n_0$. The Sierpinski gasket fractal is used as a particular case study and the experimental performance results show that $λ(ω)$ reaches up to $9\times$ of speedup over the bounding-box approach. A tensor-core based implementation of $λ(ω)$ is also proposed for modern GPUs, providing up to $\sim40\%$ of extra performance. The results obtained in this work show that doing efficient GPU thread mapping on fractal domains can significantly improve the performance of several applications that work with this type of geometry.