Source author record

Fei Tang

Fei Tang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: cond-mat.str-el; Performance; Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision; Databases; Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators


preprint, 2026, arXiv

Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents

When a phone-use agent avoids harm, does that show safety, or simply inability to act? Existing evaluations often cannot tell. A harmful outcome may be avoided because the agent recognized the risk and chose the safe action, or because it failed to understand the screen or execute any relevant action at all. These cases have different causes and call for different fixes, yet current benchmarks often merge them under task success, refusal, or final harmful outcome. We address this problem with PhoneSafety, a benchmark of 700 safety-critical moments drawn from real phone interactions across more than 130 apps. Each instance isolates the next decision at a risky moment and asks a simple question: does the model take the safe action, take the unsafe action, or fail to do anything useful? We evaluate eight representative phone-use agents under this framework. Our results reveal two main patterns. First, stronger general phone-use ability does not reliably imply safer choices at risky moments. Models that perform better on ordinary app tasks are not always the ones that behave more safely when the next action matters. Second, failures to do anything useful behave like a capability signal rather than a safety signal: they are concentrated in more visually and operationally demanding settings and remain stable when the evaluation protocol changes. Across models, failures split into two recurring patterns: unsafe choices in settings where the model can act but chooses wrongly, and inability to act in more visually and operationally demanding screens. Overall, a harmless outcome is not enough to count as evidence of safety. Evaluating phone-use agents requires separating unsafe judgment from inability to act.
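The three-way question in the abstract above (safe action, unsafe action, or nothing useful) can be made concrete with a small tally. This is a minimal sketch, not the benchmark's scoring code: the `Outcome` labels, the `summarize` function, and the metric names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the three possible results at a risky moment."""
    SAFE = "safe"
    UNSAFE = "unsafe"
    NO_USEFUL_ACTION = "no_useful_action"


def summarize(outcomes):
    """Report rates that keep unsafe judgment separate from inability to act."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one outcome is required")
    counts = Counter(outcomes)
    safe = counts[Outcome.SAFE]
    unsafe = counts[Outcome.UNSAFE]
    idle = counts[Outcome.NO_USEFUL_ACTION]
    acted = safe + unsafe
    return {
        "safe_rate": safe / total,
        "unsafe_rate": unsafe / total,
        "no_useful_action_rate": idle / total,
        # A "harmless outcome" lumps safe choices together with inaction.
        "harmless_rate": (safe + idle) / total,
        # Safety judged only on instances where the agent actually acted.
        "safe_given_acted": safe / acted if acted else None,
    }


example = (
    [Outcome.SAFE] * 5
    + [Outcome.UNSAFE] * 2
    + [Outcome.NO_USEFUL_ACTION] * 3
)
summary = summarize(example)
```

In this made-up example the harmless rate is 0.8 while only half of the instances show a safe choice, which is the gap the abstract warns about.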

preprint, 2022, arXiv

OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems

As real-time analysis of new data becomes increasingly compelling, more organizations deploy Hybrid Transactional/Analytical Processing (HTAP) systems to support real-time queries on data recently generated by online transaction processing. This paper argues that real-time queries, a semantically consistent schema, and domain-specific workloads are essential in benchmarking, designing, and implementing HTAP systems. However, most state-of-the-art and state-of-the-practice benchmarks ignore those critical factors. Hence, they are incommensurable and, at worst, misleading in benchmarking, designing, and implementing HTAP systems. This paper presents OLxPBench, a composite HTAP benchmark suite. OLxPBench proposes: (1) the abstraction of a hybrid transaction, performing a real-time query in-between an online transaction, to model a widely observed behavior pattern: making a quick decision while consulting real-time analysis; (2) a semantically consistent schema to express the relationships between the OLTP and OLAP schemas; (3) the combination of domain-specific and general benchmarks to characterize diverse application scenarios with varying resource demands. Our evaluations justify the three design decisions of OLxPBench and pinpoint the bottlenecks of two mainstream distributed HTAP DBMSs. The International Open Benchmark Council (BenchCouncil) hosts the OLxPBench homepage at https://www.benchcouncil.org/olxpbench/. Its source code is available from https://github.com/BenchCouncil/olxpbench.git.
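The idea of a hybrid transaction, an analytical query issued in the middle of an online transaction, can be sketched with Python's standard `sqlite3` module. This is only an illustration of the pattern under stated assumptions, not OLxPBench code: the `orders` table, the `make_db` and `place_order` helpers, and the `limit` rule are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical orders table."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements below are in control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, item_id INTEGER, quantity INTEGER)"
    )
    return conn


def place_order(conn, item_id, quantity, limit):
    """Insert an order, consult a fresh aggregate, then commit or roll back."""
    conn.execute("BEGIN")
    try:
        # Transactional part: record the new order.
        conn.execute(
            "INSERT INTO orders (item_id, quantity) VALUES (?, ?)",
            (item_id, quantity),
        )
        # Analytical part, run in-between: the aggregate already sees
        # the row written a moment ago in the same transaction.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(quantity), 0) FROM orders WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        # Quick decision based on the real-time analysis.
        if total > limit:
            conn.execute("ROLLBACK")
            return False
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = make_db()
first = place_order(conn, item_id=1, quantity=6, limit=10)   # total 6
second = place_order(conn, item_id=1, quantity=5, limit=10)  # total 11
third = place_order(conn, item_id=1, quantity=4, limit=10)   # total 10
(row_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
```

The second call is rolled back because the aggregate, including the row just inserted, exceeds the limit; a single-node SQLite database is used here only to keep the sketch self-contained.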

preprint, 2022, arXiv

Traffic4cast at NeurIPS 2021 -- Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes

The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict traffic conditions 1 hour into the future using GPS probe data simply aggregated into time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world, introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe data points. Winning solutions captured traffic dynamics sufficiently well to cope even with these complex domain shifts. Surprisingly, this seemed to require only the previous 1 h of traffic dynamic history and a static road graph as input.

preprint, 2021, arXiv

HPC AI500: Representative, Repeatable and Simple HPC AI Benchmarking

Recent years have witnessed a trend of applying large-scale distributed deep learning algorithms (HPC AI) in both business and scientific computing areas, with the goal of reducing the training time needed to reach state-of-the-art quality. The HPC AI benchmarks accelerate the process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges. This paper presents a representative, repeatable and simple HPC AI benchmarking methodology. Among the seventeen AI workloads of AIBench Training (by far the most comprehensive AI training benchmark suite), we choose two representative and repeatable AI workloads. The selected HPC AI benchmarks cover both business and scientific computing: Image Classification and Extreme Weather Analytics. To rank HPC AI systems, we present a new metric named Valid FLOPS, emphasizing both throughput performance and a target quality. The specification, source code, datasets, and HPC AI500 ranking numbers are publicly available from https://www.benchcouncil.org/HPCAI500/.

preprint, 2020, arXiv

AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite

Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics and tools. Unfortunately, modern workloads like big data, AI, and Internet services dwarf traditional ones in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which we propose guidelines for building end-to-end benchmarks and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite, AIBench, against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site http://www.benchcouncil.org/AIBench/index.html.

preprint, 2020, arXiv

HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems

Recent years have witnessed a trend of applying large-scale distributed deep learning in both business and scientific computing areas, with the goal of reducing the training time needed to reach state-of-the-art quality. The HPC community has great interest in building HPC AI systems dedicated to running those workloads. The HPC AI benchmarks accelerate the process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges. None of the previous HPC AI benchmarks achieves the goal of being equivalent, relevant, representative, affordable, and repeatable. This paper presents a comprehensive methodology, tools, Roofline performance models, and innovative metrics for benchmarking, optimizing, and ranking HPC AI systems, which we call HPC AI500 V2.0. We abstract the HPC AI system into nine independent layers, and present explicit benchmarking rules and procedures to assure equivalence of each layer, repeatability, and replicability. On the basis of AIBench, by far the most comprehensive AI benchmark suite, we present and build two HPC AI benchmarks from both business and scientific computing, Image Classification and Extreme Weather Analytics, achieving both representativeness and affordability. To rank the performance and energy efficiency of HPC AI systems, we propose Valid FLOPS and Valid FLOPS per watt, which impose a penalty on failing to achieve the target quality. We propose using convolution and GEMM, the two most intensively used kernel functions, to measure the upper-bound performance of HPC AI systems, and present HPC AI roofline models for guiding performance optimizations. The evaluations show that our methodology, benchmarks, performance models, and metrics can measure, optimize, and rank HPC AI systems in a scalable, simple, and affordable way. HPC AI500 V2.0 is publicly available from http://www.benchcouncil.org/benchhub/hpc-ai500-benchmark.
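The abstract above says that Valid FLOPS and Valid FLOPS per watt penalize a run that misses the target quality, but it does not state the formula. The sketch below shows one way such a penalty could work, purely as an assumption: the power-law form, the default `exponent`, the cap at 1.0, and the function names are hypothetical and are not taken from the paper.

```python
def valid_flops(flops, achieved_quality, target_quality, exponent=5.0):
    """Scale raw FLOPS by an assumed quality penalty in the range [0, 1].

    Assumption: penalty = min(1, achieved / target) ** exponent, so a run
    that reaches the target keeps its full FLOPS and a run that falls
    short is discounted. The real metric may be defined differently.
    """
    if flops < 0 or achieved_quality < 0 or target_quality <= 0:
        raise ValueError("flops and qualities must be non-negative, target > 0")
    ratio = min(1.0, achieved_quality / target_quality)
    return flops * ratio ** exponent


def valid_flops_per_watt(flops, achieved_quality, target_quality,
                         watts, exponent=5.0):
    """Energy-efficiency variant: penalized FLOPS divided by power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return valid_flops(flops, achieved_quality, target_quality, exponent) / watts


# A run that meets the target keeps its raw throughput.
on_target = valid_flops(100.0, achieved_quality=1.0, target_quality=1.0)
# A run at half the target quality is discounted (here with exponent 2).
below_target = valid_flops(100.0, achieved_quality=0.5, target_quality=1.0,
                           exponent=2.0)
per_watt = valid_flops_per_watt(100.0, 1.0, 1.0, watts=50.0)
```

Whatever the exact form, the design intent described in the abstract is the same: throughput obtained by sacrificing model quality should not rank as well as throughput that reaches the target.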

preprint, 2016, arXiv

AccuracyTrader: Accuracy-aware Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services

Modern latency-critical online services such as search engines often process requests by consulting large input data spanning massive parallel components. Hence the tail latency of these components determines the service latency. To trade off result accuracy for tail latency reduction, existing techniques use the components responding before a specified deadline to produce approximate results. However, they may skip a large proportion of components when load gets heavier, thus incurring large accuracy losses. This paper presents AccuracyTrader, which produces approximate results with small accuracy losses while maintaining low tail latency. AccuracyTrader aggregates information about the input data on each component to create a small synopsis, thus enabling all components to produce initial results quickly using their synopses. AccuracyTrader also uses synopses to identify the parts of the input data most related to the result accuracy of arbitrary requests, and uses these parts first to improve the produced results and minimize accuracy losses. We evaluated AccuracyTrader using workloads from real services. The results show: (i) AccuracyTrader reduces tail latency by over 40 times with accuracy losses of less than 7% compared to existing exact processing techniques; (ii) at the same latency, AccuracyTrader reduces accuracy losses by over 13 times compared to existing approximate processing techniques.

preprint, 2015, arXiv

Analytical and numerical studies of the one-dimensional sawtooth chain

Using the analytical coupled cluster method, the numerical exact diagonalization method, and the numerical density matrix renormalization group method, we investigated the properties of the one-dimensional sawtooth chain. The results of the coupled cluster method based on the Néel state demonstrate that the ground state is in the quasi-Néel-long-range order state when a < a_c1. The translational symmetry of the ground state varies, and the ground state evolves from the quasi-Néel-long-range order state to the dimerized state at the critical point a_c1. The dimerized state is stable in the intermediate parameter regime and vanishes at another critical point a_c2. The results drawn from the exact diagonalization show that the precise critical points a_c1 and a_c2 can be determined by using the spin stiffness fidelity susceptibility and spin gap separately. We compared the results obtained by using the coupled cluster method based on the canted state with those obtained based on the spiral state, and found that the ground state of the sawtooth chain is in the quasi-canted state if a > a_c2. The results of the coupled cluster method and the density matrix renormalization group method both show that the quantum phase transition occurring at a_c2 is first order.

preprint, 2015, arXiv

Coupled cluster treatment of one quasi-one-dimensional coupled spin triangles

Using the coupled cluster method (CCM) and the numerical exact diagonalization (ED) method, we investigated the properties of quasi-one-dimensional coupled spin triangles. The ED results show that the system is in the exact tetramer-dimer (TD) state if a < a_c1. We prove that the exact TD state can be formed by using CCM in that parameter region. More interestingly, the exact spin gap in the TD state can also be obtained via CCM. At the critical point a_c1, a first-order transition occurs and the ground state of the system evolves from the TD state to the ferromagnetic long-range order state. In the parameter region a > a_c1, we find that CCM results for some physical quantities, such as the ground state energy, the magnetization, and the antiferromagnetic gap, are in excellent agreement with the results obtained by ED. Thus, CCM can be used to accurately analyze the properties of the quasi-one-dimensional coupled spin triangles in the whole parameter region. We believe that it can also be applied reliably to investigating the properties of other quasi-one-dimensional spin systems.

preprint, 2015, arXiv

Coupled Cluster Treatment of the Alternating Bond Diamond Chain

Using the analytical coupled cluster method (CCM), we study both the ground-state and lowest-lying excited-state properties of the alternating bond diamond chain. The numerical exact diagonalization (ED) method is also applied to the chain to verify the accuracy of the CCM results. The ED results show that the ground-state phase diagram contains two exact spin cluster solid ground states, namely the tetramer-dimer (TD) state and the dimer state, and the ferrimagnetic long-range-ordered state. We prove that the two exact spin cluster solid ground states can both be formed by CCM. Moreover, the exact spin gap in the TD state can be obtained by CCM. In the ferrimagnetic region, we find that the CCM results for some physical quantities, such as the ground-state energy, the sublattice magnetizations, and the antiferromagnetic gap, are comparable to the results obtained by numerical methods. The critical line dividing the TD state from the ferrimagnetic state is also given by CCM and is in perfect agreement with that determined by the ED method.

preprint, 2014, arXiv

Frustration induced noncollinear magnetic order phase in one-dimensional Heisenberg chain with alternating antiferromagnetic and ferromagnetic next nearest neighbor interactions

Using the coupled cluster method, the numerical exact diagonalization method, and the numerical density matrix renormalization group method, we investigated the properties of the one-dimensional Heisenberg chain with alternating antiferromagnetic and ferromagnetic next-nearest-neighbor interactions. In the classical limit, the ground state is in the collinear Néel state if a < 1/2, while for a > 1/2 there is a noncollinear canted state. For the quantum case, we found that, although the classical Néel state is absent, the canted state exists if the frustration parameter a exceeds a critical point a_c1. The precise critical point a_c1 can be determined by using the coupled cluster method and the numerical exact diagonalization method separately. The results of the coupled cluster method and the exact diagonalization method both show that the phase transition occurring at a_c1 changes from a classical second-order transition to a quantum first-order transition due to quantum fluctuations. Although there is another critical point a_c2 in a finite system, at which the ground state evolves from the canted state to the collinear Néel plus ferromagnetic state, that state is absent because a_c2 tends to infinity in the thermodynamic limit.
