Source author record

Hayden Jananthan

Hayden Jananthan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Databases Performance Programming Languages Computation and Language Discrete Mathematics math.LO Social and Information Networks Software Engineering

Catalog footprint

What is connected

7works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Improving the Graph Challenge Reference Implementation

The MIT/IEEE/Amazon Graph Challenge provides a venue for individuals and teams to showcase new innovations in large-scale graph and sparse data analysis. The Anonymized Network Sensing Graph Challenge processes over 100 billion network packets to construct privacy-preserving traffic matrices, with a GraphBLAS reference implementation demonstrating how hypersparse matrices can be applied to this problem. This work presents a refactoring and benchmarking of a section of the reference code to improve clarity, adaptability, and performance. The original Python implementation spanning approximately 1000 lines across 3 files has been streamlined to 325 lines across two focused modules, achieving a 67% reduction in code size while maintaining full functionality. Using pMatlab and pPython distributed array programming libraries, the addition of parallel maps allowed for parallel benchmarking of the data. Scalable performance is demonstrated for large-scale summation and analysis of traffic matrices. The resulting implementation increases the potential impact of the Graph Challenge by providing a clear and efficient foundation for participants.

preprint2023arXiv

pPython Performance Study

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Windows, Linux, or MacOS operating systems or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface so that pPython can be executed in a massively parallel computing environment. It is interesting to see what performance pPython can achieve compared to the traditional socket-based MPI communication because of its unique file-based messaging implementation. In this paper, we present the point-to-point and collective communication performances of pPython and compare them with those obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates comparable performance as compared to mpi4py.

preprint2022arXiv

Complexity and Avoidance

In this dissertation we examine the relationships between the several hierarchies, including the complexity, $\mathrm{LUA}$ (Linearly Universal Avoidance), and shift complexity hierarchies, with an eye towards quantitative bounds on growth rates therein. We show that for suitable $f$ and $p$, there are $q$ and $g$ such that $\mathrm{LUA}(q) \leq_\mathrm{s} \mathrm{COMPLEX}(f)$ and $\mathrm{COMPLEX}(g) \leq_\mathrm{s} \mathrm{LUA}(p)$, as well as quantify the growth rates of $q$ and $g$. In the opposite direction, we show that for certain sub-identical $f$ satisfying $\lim_{n \to \infty}{f(n)/n}=1$ there is a $q$ such that $\mathrm{COMPLEX}(f) \leq_\mathrm{w} \mathrm{LUA}(q)$, and for certain fast-growing $p$ there is a $g$ such that $\mathrm{LUA}(p) \leq_\mathrm{s} \mathrm{COMPLEX}(g)$, as well as quantify the growth rates of $q$ and $g$. Concerning shift complexity, explicit bounds are given on how slow-growing $q$ must be for any member of $\rm{LUA}(q)$ to compute $δ$-shift complex sequences. Motivated by the complexity hierarchy, we generalize the notion of shift complexity to consider sequences $X$ satisfying $\operatorname{KP}(τ) \geq f(|τ|) - O(1)$ for all substrings $τ$ of $X$ where $f$ is any order function. We show that for sufficiently slow-growing $f$, $f$-shift complex sequences can be uniformly computed by $g$-complex sequences, where $g$ grows slightly faster than $f$. The structure of the $\mathrm{LUA}$ hierarchy is examined using bushy tree forcing, with the main result being that for any order function $p$, there is a slow-growing order function $q$ such that $\mathrm{LUA}(p)$ and $\mathrm{LUA}(q)$ are weakly incomparable. Using this, we prove new results about the filter of the weak degrees of deep nonempty $Π^0_1$ classes and the connection between the shift complexity and $\mathrm{LUA}$ hierarchies.

preprint2022arXiv

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hyperspace traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet) enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).

preprint2022arXiv

Temporal Correlation of Internet Observatories and Outposts

The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic are essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emit Internet traffic and provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hyperspace matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period 70\% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high frequency beam of sources that drifts on a time scale of a month.

preprint2020arXiv

AI Data Wrangling with Associative Arrays

The AI revolution is data driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data representations supporting the many query and analysis steps found in an AI pipeline. Rigorous mathematical representations of these data enables data translation and analysis optimization within and across steps. Associative array algebra provides a mathematical foundation that naturally describes the tabular structures and set mathematics that are the basis of databases. Likewise, the matrix operations and corresponding inference/training calculations used by neural networks are also well described by associative arrays. More surprisingly, a general denormalized form of hierarchical formats, such as XML and JSON, can be readily constructed. Finally, pivot tables, which are among the most widely used data analysis tools, naturally emerge from associative array constructors. A common foundation in associative arrays provides interoperability guarantees, proving that their operations are linear systems with rigorous mathematical properties, such as, associativity, commutativity, and distributivity that are critical to reordering optimizations.

preprint2016arXiv

Associative Array Model of SQL, NoSQL, and NewSQL Databases

The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL. These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and execution queries.

Hayden Jananthan

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Improving the Graph Challenge Reference Implementation

pPython Performance Study

Complexity and Avoidance

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Temporal Correlation of Internet Observatories and Outposts

AI Data Wrangling with Associative Arrays

Associative Array Model of SQL, NoSQL, and NewSQL Databases