Source author record

Tien Tuan Anh Dinh

Tien Tuan Anh Dinh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Distributed, Parallel, and Cluster Computing Cryptography and Security Machine Learning Performance Artificial Intelligence Systems and Control

Catalog footprint

What is connected

9works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adopting t. To this end, we conduct extensive cost and performance evaluation of two permissioned blockchains, namely Hyperledger Fabric and ConsenSys Quorum, on five different types of hardware covering both x86/64 and ARM architecture, as well as, both cloud and edge computing. The hardware nodes include servers with Intel Xeon CPU, servers with ARM-based Amazon Graviton CPU, and edge devices with ARM-based CPU. Our results reveal a diverse profile of the two blockchains across different settings, demonstrating the impact of hardware choices on the overall performance and cost. We find that Graviton servers outperform Xeon servers in many settings, due to their powerful CPU and high memory bandwidth. Edge devices with ARM architecture, on the other hand, exhibit low performance. When comparing the cloud with the edge, we show that the cost of the latter is much smaller in the long run if manpower cost is not considered.

preprint2022arXiv

Securing Smart Grids Through an Incentive Mechanism for Blockchain-Based Data Sharing

Smart grids leverage the data collected from smart meters to make important operational decisions. However, they are vulnerable to False Data Injection (FDI) attacks in which an attacker manipulates meter data to disrupt the grid operations. Existing works on FDI are based on a simple threat model in which a single grid operator has access to all the data, and only some meters can be compromised. Our goal is to secure smart grids against FDI under a realistic threat model. To this end, we present a threat model in which there are multiple operators, each with a partial view of the grid, and each can be fully compromised. An effective defense against FDI in this setting is to share data between the operators. However, the main challenge here is to incentivize data sharing. We address this by proposing an incentive mechanism that rewards operators for uploading data, but penalizes them if the data is missing or anomalous. We derive formal conditions under which our incentive mechanism is provably secure against operators who withhold or distort measurement data for profit. We then implement the data sharing solution on a private blockchain, introducing several optimizations that overcome the inherent performance limitations of the blockchain. Finally, we conduct an experimental evaluation that demonstrates that our implementation has practical performance.

preprint2022arXiv

Serverless Data Science -- Are We There Yet? A Case Study of Model Serving

Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their works available to end-users. Systems of model serving require high performance, low cost, and ease of management. Cloud providers are already offering model serving choices, including managed services and self-rented servers. Recently, serverless computing, whose advantages include high elasticity and a fine-grained cost model, brings another option for model serving. Our goal in this paper is to examine the viability of serverless as a mainstream model serving platform. To this end, we first conduct a comprehensive evaluation of the performance and cost of serverless against other model serving systems on Amazon Web Service and Google Cloud Platform. We find that serverless outperforms many cloud-based alternatives. Further, there are settings under which it even achieves better performance than GPU-based systems. Next, we present the design space of serverless model serving, which comprises multiple dimensions, including cloud platforms, serving runtimes, and other function-specific parameters. For each dimension, we analyze the impact of different choices and provide suggestions for data scientists to better utilize serverless model serving. Finally, we discuss challenges and opportunities in building a more practical serverless model serving system.

preprint2021arXiv

Blockchains vs. Distributed Databases: Dichotomy and Fusion

Blockchain has come a long way: a system that was initially proposed specifically for cryptocurrencies is now being adapted and adopted as a general-purpose transactional system. As blockchain evolves into another data management system, the natural question is how it compares against distributed database systems. Existing works on this comparison focus on high-level properties, such as security and throughput. They stop short of showing how the underlying design choices contribute to the overall differences. Our work fills this important gap and provides a principled framework for analyzing the emerging trend of blockchain-database fusion. We perform a twin study of blockchains and distributed database systems as two types of transactional systems. We propose a taxonomy that illustrates the dichotomy across four dimensions, namely replication, concurrency, storage, and sharding. Within each dimension, we discuss how the design choices are driven by two goals: security for blockchains, and performance for distributed databases. To expose the impact of different design choices on the overall performance, we conduct an in-depth performance analysis of two blockchains, namely Quorum and Hyperledger Fabric, and two distributed databases, namely TiDB, and etcd. Lastly, we propose a framework for back-of-the-envelope performance forecast of blockchain-database hybrids.

preprint2020arXiv

ForkBase: Immutable, Tamper-evident Storage Substrate for Branchable Applications

Data collaboration activities typically require systematic or protocol-based coordination to be scalable. Git, an effective enabler for collaborative coding, has been attested for its success in countless projects around the world. Hence, applying the Git philosophy to general data collaboration beyond coding is motivating. We call it Git for data. However, the original Git design handles data at the file granule, which is considered too coarse-grained for many database applications. We argue that Git for data should be co-designed with database systems. To this end, we developed ForkBase to make Git for data practical. ForkBase is a distributed, immutable storage system designed for data version management and data collaborative operation. In this demonstration, we show how ForkBase can greatly facilitate collaborative data management and how its novel data deduplication technique can improve storage efficiency for archiving massive data versions.

preprint2020arXiv

On Exploiting Transaction Concurrency To Speed Up Blockchains

Consensus protocols are currently the bottlenecks that prevent blockchain systems from scaling. However, we argue that transaction execution is also important to the performance and security of blockchains. In other words, there are ample opportunities to speed up and further secure blockchains by reducing the cost of transaction execution. Our goal is to understand how much we can speed up blockchains by exploiting transaction concurrency available in blockchain workloads. To this end, we first analyze historical data of seven major public blockchains, namely Bitcoin, Bitcoin Cash, Litecoin, Dogecoin, Ethereum, Ethereum Classic, and Zilliqa. We consider two metrics for concurrency, namely the single-transaction conflict rate per block, and the group conflict rate per block. We find that there is more concurrency in UTXO-based blockchains than in account-based ones, although the amount of concurrency in the former is lower than expected. Another interesting finding is that some blockchains with larger blocks have more concurrency than blockchains with smaller blocks. Next, we propose an analytical model for estimating the transaction execution speed-up given an amount of concurrency. Using results from our empirical analysis, the model estimates that 6x speed-ups in Ethereum can be achieved if all available concurrency is exploited.

preprint2016arXiv

Deep Learning At Scale and At Ease

Recently, deep learning techniques have enjoyed success in various multimedia applications, such as image classification and multi-modal data analysis. Large deep learning models are developed for learning rich representations of complex data. There are two challenges to overcome before deep learning can be widely adopted in multimedia and other applications. One is usability, namely the implementation of different models and training algorithms must be done by non-experts without much effort especially when the model is large and complex. The other is scalability, that is the deep learning system must be able to provision for a huge demand of computing resources for training large models with massive datasets. To address these two challenges, in this paper, we design a distributed deep learning platform called SINGA which has an intuitive programming model based on the common layer abstraction of deep learning models. Good scalability is achieved through flexible distributed training architecture and specific optimization techniques. SINGA runs on GPUs as well as on CPUs, and we show that it outperforms many other state-of-the-art deep learning systems. Our experience with developing and training deep learning models for real-life multimedia applications in SINGA shows that the platform is both usable and scalable.

preprint2013arXiv

Streamforce: outsourcing access control enforcement for stream data to the clouds

As tremendous amount of data being generated everyday from human activity and from devices equipped with sensing capabilities, cloud computing emerges as a scalable and cost-effective platform to store and manage the data. While benefits of cloud computing are numerous, security concerns arising when data and computation are outsourced to a third party still hinder the complete movement to the cloud. In this paper, we focus on the problem of data privacy on the cloud, particularly on access controls over stream data. The nature of stream data and the complexity of sharing data make access control a more challenging issue than in traditional archival databases. We present Streamforce - a system allowing data owners to securely outsource their data to the cloud. The owner specifies fine-grained policies which are enforced by the cloud. The latter performs most of the heavy computations, while learning nothing about the data. To this end, we employ a number of encryption schemes, including deterministic encryption, proxy-based attribute based encryption and sliding-window encryption. In Streamforce, access control policies are modeled as secure continuous queries, which entails minimal changes to existing stream processing engines, and allows for easy expression of a wide-range of policies. In particular, Streamforce comes with a number of secure query operators including Map, Filter, Join and Aggregate. Finally, we implement Streamforce over an open source stream processing engine (Esper) and evaluate its performance on a cloud platform. The results demonstrate practical performance for many real-world applications, and although the security overhead is visible, Streamforce is highly scalable.

preprint2012arXiv

Stream on the Sky: Outsourcing Access Control Enforcement for Stream Data to the Cloud

There is an increasing trend for businesses to migrate their systems towards the cloud. Security concerns that arise when outsourcing data and computation to the cloud include data confidentiality and privacy. Given that a tremendous amount of data is being generated everyday from plethora of devices equipped with sensing capabilities, we focus on the problem of access controls over live streams of data based on triggers or sliding windows, which is a distinct and more challenging problem than access control over archival data. Specifically, we investigate secure mechanisms for outsourcing access control enforcement for stream data to the cloud. We devise a system that allows data owners to specify fine-grained policies associated with their data streams, then to encrypt the streams and relay them to the cloud for live processing and storage for future use. The access control policies are enforced by the cloud, without the latter learning about the data, while ensuring that unauthorized access is not feasible. To realize these ends, we employ a novel cryptographic primitive, namely proxy-based attribute-based encryption, which not only provides security but also allows the cloud to perform expensive computations on behalf of the users. Our approach is holistic, in that these controls are integrated with an XML based framework (XACML) for high-level management of policies. Experiments with our prototype demonstrate the feasibility of such mechanisms, and early evaluations suggest graceful scalability with increasing numbers of policies, data streams and users.

Tien Tuan Anh Dinh

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

Securing Smart Grids Through an Incentive Mechanism for Blockchain-Based Data Sharing

Serverless Data Science -- Are We There Yet? A Case Study of Model Serving

Blockchains vs. Distributed Databases: Dichotomy and Fusion

ForkBase: Immutable, Tamper-evident Storage Substrate for Branchable Applications

On Exploiting Transaction Concurrency To Speed Up Blockchains

Deep Learning At Scale and At Ease

Streamforce: outsourcing access control enforcement for stream data to the clouds

Stream on the Sky: Outsourcing Access Control Enforcement for Stream Data to the Cloud