Researcher profile

Hayden Jananthan

Hayden Jananthan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

Improving the Graph Challenge Reference Implementation

The MIT/IEEE/Amazon Graph Challenge provides a venue for individuals and teams to showcase new innovations in large-scale graph and sparse data analysis. The Anonymized Network Sensing Graph Challenge processes over 100 billion network packets to construct privacy-preserving traffic matrices, with a GraphBLAS reference implementation demonstrating how hypersparse matrices can be applied to this problem. This work presents a refactoring and benchmarking of a section of the reference code to improve clarity, adaptability, and performance. The original Python implementation spanning approximately 1000 lines across 3 files has been streamlined to 325 lines across two focused modules, achieving a 67% reduction in code size while maintaining full functionality. Using pMatlab and pPython distributed array programming libraries, the addition of parallel maps allowed for parallel benchmarking of the data. Scalable performance is demonstrated for large-scale summation and analysis of traffic matrices. The resulting implementation increases the potential impact of the Graph Challenge by providing a clear and efficient foundation for participants.

preprint2023arXiv

pPython Performance Study

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Windows, Linux, or MacOS operating systems or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface so that pPython can be executed in a massively parallel computing environment. It is interesting to see what performance pPython can achieve compared to the traditional socket-based MPI communication because of its unique file-based messaging implementation. In this paper, we present the point-to-point and collective communication performances of pPython and compare them with those obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates comparable performance as compared to mpi4py.

preprint2022arXiv

Complexity and Avoidance

In this dissertation we examine the relationships between the several hierarchies, including the complexity, $\mathrm{LUA}$ (Linearly Universal Avoidance), and shift complexity hierarchies, with an eye towards quantitative bounds on growth rates therein. We show that for suitable $f$ and $p$, there are $q$ and $g$ such that $\mathrm{LUA}(q) \leq_\mathrm{s} \mathrm{COMPLEX}(f)$ and $\mathrm{COMPLEX}(g) \leq_\mathrm{s} \mathrm{LUA}(p)$, as well as quantify the growth rates of $q$ and $g$. In the opposite direction, we show that for certain sub-identical $f$ satisfying $\lim_{n \to \infty}{f(n)/n}=1$ there is a $q$ such that $\mathrm{COMPLEX}(f) \leq_\mathrm{w} \mathrm{LUA}(q)$, and for certain fast-growing $p$ there is a $g$ such that $\mathrm{LUA}(p) \leq_\mathrm{s} \mathrm{COMPLEX}(g)$, as well as quantify the growth rates of $q$ and $g$. Concerning shift complexity, explicit bounds are given on how slow-growing $q$ must be for any member of $\rm{LUA}(q)$ to compute $δ$-shift complex sequences. Motivated by the complexity hierarchy, we generalize the notion of shift complexity to consider sequences $X$ satisfying $\operatorname{KP}(τ) \geq f(|τ|) - O(1)$ for all substrings $τ$ of $X$ where $f$ is any order function. We show that for sufficiently slow-growing $f$, $f$-shift complex sequences can be uniformly computed by $g$-complex sequences, where $g$ grows slightly faster than $f$. The structure of the $\mathrm{LUA}$ hierarchy is examined using bushy tree forcing, with the main result being that for any order function $p$, there is a slow-growing order function $q$ such that $\mathrm{LUA}(p)$ and $\mathrm{LUA}(q)$ are weakly incomparable. Using this, we prove new results about the filter of the weak degrees of deep nonempty $Π^0_1$ classes and the connection between the shift complexity and $\mathrm{LUA}$ hierarchies.

preprint2022arXiv

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hyperspace traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet) enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).

preprint2022arXiv

Temporal Correlation of Internet Observatories and Outposts

The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic are essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emit Internet traffic and provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hyperspace matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period 70\% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high frequency beam of sources that drifts on a time scale of a month.

preprint2020arXiv

AI Data Wrangling with Associative Arrays

The AI revolution is data driven. AI &#34;data wrangling&#34; is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data representations supporting the many query and analysis steps found in an AI pipeline. Rigorous mathematical representations of these data enables data translation and analysis optimization within and across steps. Associative array algebra provides a mathematical foundation that naturally describes the tabular structures and set mathematics that are the basis of databases. Likewise, the matrix operations and corresponding inference/training calculations used by neural networks are also well described by associative arrays. More surprisingly, a general denormalized form of hierarchical formats, such as XML and JSON, can be readily constructed. Finally, pivot tables, which are among the most widely used data analysis tools, naturally emerge from associative array constructors. A common foundation in associative arrays provides interoperability guarantees, proving that their operations are linear systems with rigorous mathematical properties, such as, associativity, commutativity, and distributivity that are critical to reordering optimizations.