Source author record

Danny Hendler

Danny Hendler appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Machine Learning Social and Information Networks Databases physics.soc-ph

Catalog footprint

What is connected

8works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Machine-Learning Based Objective Function Selection for Community Detection

NECTAR, a Node-centric ovErlapping Community deTection AlgoRithm, presented in 2016 by Cohen et. al, chooses dynamically between two objective functions which function to optimize, based on the network on which it is invoked. This approach, as shown by Cohen et al., outperforms six state-of-the-art algorithms for overlapping community detection. In this work, we present NECTAR-ML, an extension of the NECTAR algorithm that uses a machine-learning based model for automating the selection of the objective function, trained and evaluated on a dataset of 15,755 synthetic and 7 real-world networks. Our analysis shows that in approximately 90% of the cases our model was able to successfully select the correct objective function. We conducted a competitive analysis of NECTAR and NECTAR-ML. NECTAR-ML was shown to significantly outperform NECTAR's ability to select the best objective function. We also conducted a competitive analysis of NECTAR-ML and two additional state-of-the-art multi-objective community detection algorithms. NECTAR-ML outperformed both algorithms in terms of average detection quality. Multiobjective EAs (MOEAs) are considered to be the most popular approach to solve MOP and the fact that NECTAR-ML significantly outperforms them demonstrates the effectiveness of ML-based objective function selection.

preprint2022arXiv

Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM

HPC systems are a critical resource for scientific research. The increased demand for computational power and memory ushers in the exascale era, in which supercomputers are designed to provide enormous computing power to meet these needs. These complex supercomputers consist of numerous compute nodes and are consequently expected to experience frequent faults and crashes. Mathematical solvers, in particular, iterative linear solvers are key building block in numerous large-scale scientific applications. Consequently, supporting the recovery of distributed solvers is necessary for scaling scientific applications to exascale platforms. Previous recovery methods for iterative solvers are based on Checkpoint-Restart (CR), which incurs high fault tolerance overhead, or intrinsic fault tolerance, which require extra computation time to converge after failures. Exact state reconstruction (ESR) was proposed as an alternative mechanism to alleviate the impact of frequent failures on long-term computations. ESR has been shown to provide exact reconstruction of the computation state while avoiding the need for costly checkpointing. However, ESR currently relies on volatile memory for fault tolerance, and must therefore maintain redundancies in the RAM of multiple nodes, incurring high memory and network overheads. Recent supercomputer designs feature emerging non-volatile RAM (NVRAM) technology. This paper investigates how NVRAM can be utilized to devise an enhanced ESR-based recovery mechanism that is more efficient and provides full resilience. Our mechanism, called in-NVRAM ESR, is based on a novel MPI One-Sided Communication (OSC) over RDMA implementation, and provides full resiliency while significantly reducing both the memory footprint and the time overhead in comparison with the original ESR design (in-RAM ESR).

preprint2020arXiv

Upper and Lower Bounds on the Space Complexity of Detectable Object

The emergence of systems with non-volatile main memory (NVM) increases the interest in the design of \emph{recoverable concurrent objects} that are robust to crash-failures, since their operations are able to recover from such failures by using state retained in NVM. Of particular interest are recoverable algorithms that, in addition to ensuring object consistency, also provide \emph{detectability}, a correctness condition requiring that the recovery code can infer if the failed operation was linearized or not and, in the former case, obtain its response. In this work, we investigate the space complexity of detectable algorithms and the external support they require. We make the following three contributions. First, we present the first wait-free bounded-space detectable read/write and CAS object implementations. Second, we prove that the bit complexity of every $N$-process obstruction-free detectable CAS implementation, assuming values from a domain of size at least $N$, is $Ω(N)$. Finally, we prove that the following holds for obstruction-free detectable implementations of a large class of objects: their recoverable operations must be provided with \emph{auxiliary state} -- state that is not required by the non-recoverable counterpart implementation -- whose value must be provided from outside the operation, either by the system or by the caller of the operation. In contrast, this external support is, in general, not required if the recoverable algorithm is not detectable.

preprint2016arXiv

Node-Centric Detection of Overlapping Communities in Social Networks

We present NECTAR, a community detection algorithm that generalizes Louvain method's local search heuristic for overlapping community structures. NECTAR chooses dynamically which objective function to optimize based on the network on which it is invoked. Our experimental evaluation on both synthetic benchmark graphs and real-world networks, based on ground-truth communities, shows that NECTAR provides excellent results as compared with state of the art community detection algorithms.

preprint2013arXiv

Exploiting Locality in Lease-Based Replicated Transactional Memory via Task Migration

We present Lilac-TM, the first locality-aware Distributed Software Transactional Memory (DSTM) implementation. Lilac-TM is a fully decentralized lease-based replicated DSTM. It employs a novel self- optimizing lease circulation scheme based on the idea of dynamically determining whether to migrate transactions to the nodes that own the leases required for their validation, or to demand the acquisition of these leases by the node that originated the transaction. Our experimental evaluation establishes that Lilac-TM provides significant performance gains for distributed workloads exhibiting data locality, while typically incurring no overhead for non-data local workloads.

preprint2013arXiv

Lightweight Contention Management for Efficient Compare-and-Swap Operations

Many concurrent data-structure implementations use the well-known compare-and-swap (CAS) operation, supported in hardware by most modern multiprocessor architectures for inter-thread synchronization. A key weakness of the CAS operation is the degradation in its performance in the presence of memory contention. In this work we study the following question: can software-based contention management improve the efficiency of hardware-provided CAS operations? Our performance evaluation establishes that lightweight contention management support can greatly improve performance under medium and high contention levels while typically incurring only small overhead when contention is low.

preprint2012arXiv

Fast Randomized Model Generation for Shapelet-Based Time Series Classification

Time series classification is a field which has drawn much attention over the past decade. A new approach for classification of time series uses classification trees based on shapelets. A shapelet is a subsequence extracted from one of the time series in the dataset. A disadvantage of this approach is the time required for building the shapelet-based classification tree. The search for the best shapelet requires examining all subsequences of all lengths from all time series in the training set. A key goal of this work was to find an evaluation order of the shapelets space which enables fast convergence to an accurate model. The comparative analysis we conducted clearly indicates that a random evaluation order yields the best results. Our empirical analysis of the distribution of high-quality shapelets within the shapelets space provides insights into why randomized shapelets sampling is superior to alternative evaluation orders. We present an algorithm for randomized model generation for shapelet-based classification that converges extremely quickly to a model with surprisingly high accuracy after evaluating only an exceedingly small fraction of the shapelets space.

preprint2011arXiv

A Dynamic Elimination-Combining Stack Algorithm

Two key synchronization paradigms for the construction of scalable concurrent data-structures are software combining and elimination. Elimination-based concurrent data-structures allow operations with reverse semantics (such as push and pop stack operations) to "collide" and exchange values without having to access a central location. Software combining, on the other hand, is effective when colliding operations have identical semantics: when a pair of threads performing operations with identical semantics collide, the task of performing the combined set of operations is delegated to one of the threads and the other thread waits for its operation(s) to be performed. Applying this mechanism iteratively can reduce memory contention and increase throughput. The most highly scalable prior concurrent stack algorithm is the elimination-backoff stack. The elimination-backoff stack provides high parallelism for symmetric workloads in which the numbers of push and pop operations are roughly equal, but its performance deteriorates when workloads are asymmetric. We present DECS, a novel Dynamic Elimination-Combining Stack algorithm, that scales well for all workload types. While maintaining the simplicity and low-overhead of the elimination-bakcoff stack, DECS manages to benefit from collisions of both identical- and reverse-semantics operations. Our empirical evaluation shows that DECS scales significantly better than both blocking and non-blocking best prior stack algorithms.