Researcher profile

Yingjie Shi

Yingjie Shi contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 17 - Baseline
4 works
0 followers
3 topics
4 close collaborators


Published work

4 published items

preprint · 2014 · arXiv

BigDataBench: a Big Data Benchmark Suite from Internet Services

As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. Considering the broad use of big data systems, big data benchmarks must include a diversity of data and workloads. Most state-of-the-art big data benchmarking efforts target specific types of applications or system software stacks, and hence they are not suited to the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench . We also comprehensively characterize 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, Intel Xeon E5645, we make the following observations. First, in comparison with traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity. Second, the volume of data input has a non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache misses per 1000 instructions of the big data applications are higher than in the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.

preprint · 2013 · arXiv
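The cache observation in this abstract is stated in misses per 1000 instructions (often abbreviated MPKI). As a minimal sketch of how that figure is derived from two raw counter values, the snippet below uses a hypothetical helper name and made-up counter values; none of the numbers come from the paper.

```python
def misses_per_kilo_instructions(cache_misses: int, instructions: int) -> float:
    """Return cache misses per 1000 retired instructions (MPKI).

    Both arguments are raw event counts collected over the same run.
    The function name and the sample values below are illustrative only.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cache_misses * 1000.0 / instructions


# Hypothetical counts: 23,000 L1 instruction cache misses
# observed while retiring 1,000,000 instructions.
l1i_mpki = misses_per_kilo_instructions(23_000, 1_000_000)
print(f"L1I MPKI: {l1i_mpki:.1f}")
```

Normalizing by instruction count is what makes the figure comparable across workloads that run for different lengths of time.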

AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers

The rapid growth of data volume poses major challenges for data center computing, and energy efficiency is one of the most pressing concerns. Researchers from various fields are now proposing solutions to green data center operations. The power usage effectiveness metric plays an important role in energy-saving research. However, the existing usage effectiveness metrics focus on measuring the relationship between the total facility energy consumed and the IT equipment energy consumed, without reflecting the energy efficiency of applications. In this paper, we analyze the requirements of application-level metrics for the power usage efficiency of data centers, and propose two novel energy efficiency metrics to provide strong guidance and useful insight for data center design and optimization. We conduct comprehensive experiments in practical data centers using BigDataBench, a big data benchmark suite, and the results demonstrate the rationality and efficiency of AxPUE in measuring the actual computation energy consumption in data centers.

preprint · 2013 · arXiv
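For context, the conventional facility-level metric that this abstract contrasts with is usually computed as total facility energy divided by IT equipment energy. The sketch below shows only that conventional ratio; the function name and sample figures are hypothetical, and the two application-level AxPUE metrics are not defined in the abstract, so they are not reproduced here.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Conventional PUE: total facility energy / IT equipment energy.

    A value of 1.0 would mean every unit of energy reaches IT equipment;
    larger values indicate more overhead (cooling, power distribution, ...).
    The name and the sample values below are illustrative assumptions.
    """
    if it_equipment_kwh <= 0:
        raise ValueError("it_equipment_kwh must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical readings over the same measurement window.
pue = power_usage_effectiveness(total_facility_kwh=1500.0, it_equipment_kwh=1000.0)
print(f"PUE: {pue:.2f}")
```

Because the ratio stops at the IT equipment boundary, it says nothing about how much useful application work that energy performs, which is the gap the abstract describes.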

The Implications from Benchmarking Three Big Data Systems

Along with today's data explosion and application diversification, a variety of hardware platforms for big data are emerging, attracting interest from both industry and academia. The existing hardware platforms represent a wide range of implementation approaches, and different hardware platforms have different strengths. In this paper, we conduct comprehensive evaluations of three representative big data systems: Intel Xeon, Atom (low-power processors), and many-core Tilera, using BigDataBench, a big data benchmark suite. We then explore the relative performance of the three implementation approaches by running BigDataBench, and provide strong guidance for constructing big data systems. Through our experiments, we infer that a big data system based on specific hardware performs differently depending on the application and data volume. When we construct a system, we should take into account not only the performance or energy consumption of the hardware alone, but also the characteristics of the applications running on it. Data scale, application type, and complexity should be considered comprehensively when researchers or architects choose fundamental components for their big data systems.

preprint · 2013 · arXiv

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

We now live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short, big data systems) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, we find two major results through experiments. First, the data scale has a significant impact on the performance of big data systems, so we must provide scalable volumes of data sets in big data benchmarks. Second, for the four applications, even though all of them use simple algorithms, the performance trends differ with increasing data scales, and hence we must consider not only a variety of data sets but also a variety of applications in benchmarking big data systems.