Source author record

Fa Zhang

Fa Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Computer Vision; Data Structures and Algorithms; Machine Learning; Computation; eess.IV; physics.chem-ph

Catalog footprint

What is connected

15 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

FocalOrder: Focal Preference Optimization for Reading Order Detection

Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout regions. In this work, we challenge this assumption by revealing a critical flaw: Positional Disparity, a phenomenon where models demonstrate mastery over the deterministic start and end regions but suffer a performance collapse in the complex intermediate sections. This degradation arises because standard training allows the massive volume of easy patterns to drown out the learning signals from difficult layouts. To address this, we propose FocalOrder, a framework driven by Focal Preference Optimization (FPO). Specifically, FocalOrder employs adaptive difficulty discovery with an exponential moving average mechanism to dynamically pinpoint hard-to-learn transitions, while introducing a difficulty-calibrated pairwise ranking objective to enforce global logical consistency. Extensive experiments demonstrate that FocalOrder establishes new state-of-the-art results on OmniDocBench v1.0 and Comp-HRDoc. Our compact model not only outperforms competitive specialized baselines but also significantly surpasses large-scale general VLMs. These results demonstrate that aligning the optimization with the intrinsic structural ambiguity of documents is critical for mastering complex document structures.
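The abstract above mentions an exponential moving average used to find hard-to-learn transitions. As a rough illustration of that general idea only, the sketch below keeps a running average of loss per bucket and turns it into relative weights. The class name, the bucket labels, the decay value and the normalization are all assumptions made for illustration; this is not the FocalOrder implementation.

```python
class EmaDifficultyTracker:
    """Keeps an exponential moving average (EMA) of loss per bucket.

    Hypothetical helper: buckets could be positions or transition types.
    """

    def __init__(self, decay=0.9):
        self.decay = decay
        self.ema = {}

    def update(self, bucket, loss):
        """Blend a new loss value into the running average for one bucket."""
        previous = self.ema.get(bucket)
        if previous is None:
            self.ema[bucket] = loss  # first observation seeds the average
        else:
            self.ema[bucket] = self.decay * previous + (1 - self.decay) * loss
        return self.ema[bucket]

    def weights(self):
        """Return per-bucket weights with mean 1; harder buckets weigh more."""
        if not self.ema:
            return {}
        mean = sum(self.ema.values()) / len(self.ema)
        if mean == 0:
            return {bucket: 1.0 for bucket in self.ema}
        return {bucket: value / mean for bucket, value in self.ema.items()}


tracker = EmaDifficultyTracker(decay=0.5)
tracker.update("start", 0.2)
tracker.update("middle", 1.0)
tracker.update("middle", 2.0)
difficulty_weights = tracker.weights()
```

In this toy run the "middle" bucket ends up with a larger weight than "start", so a training loop could scale its loss terms accordingly.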

preprint · 2026 · arXiv

MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here, we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real-world applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on simple linear models to combine auxiliary information with the main data matrix, we propose to integrate gradient boosted trees in the probabilistic matrix factorization framework to effectively leverage auxiliary information (MFAI). Thus, MFAI naturally inherits several salient features of gradient boosted trees, such as the capability of flexibly modeling nonlinear relationships and robustness to irrelevant features and missing values in auxiliary information. The parameters in MFAI can be automatically determined under the empirical Bayes framework, making it adaptive to the utilization of auxiliary information and immune to overfitting. Moreover, MFAI is computationally efficient and scalable to large datasets by exploiting variational inference. We demonstrate the advantages of MFAI through comprehensive numerical results from simulation studies and real data analyses. Our approach is implemented in the R package mfair available at https://github.com/YangLabHKUST/mfair.

preprint · 2026 · arXiv

PARL: Position-Aware Relation Learning Network for Document Layout Analysis

Document layout analysis aims to detect and categorize structural elements (e.g., titles, tables, figures) in scanned or digital documents. Popular methods often rely on high-quality Optical Character Recognition (OCR) to merge visual features with extracted text. This dependency introduces two major drawbacks: propagation of text recognition errors and substantial computational overhead, limiting the robustness and practical applicability of multimodal approaches. In contrast to the prevailing multimodal trend, we argue that effective layout analysis depends not on text-visual fusion, but on a deep understanding of documents' intrinsic visual structure. To this end, we propose PARL (Position-Aware Relation Learning Network), a novel OCR-free, vision-only framework that models layout through positional sensitivity and relational structure. Specifically, we first introduce a Bidirectional Spatial Position-Guided Deformable Attention module to embed explicit positional dependencies among layout elements directly into visual features. Second, we design a Graph Refinement Classifier (GRC) to refine predictions by modeling contextual relationships through a dynamically constructed layout graph. Extensive experiments show PARL achieves state-of-the-art results. It establishes a new benchmark for vision-only methods on DocLayNet and, notably, surpasses even strong multimodal models on M6Doc. Crucially, PARL (65M) is highly efficient, using roughly four times fewer parameters than large multimodal models (256M), demonstrating that sophisticated visual structure modeling can be both more efficient and robust than multimodal fusion.

preprint · 2022 · arXiv

SHREC 2021: Classification in cryo-electron tomograms

Cryo-electron tomography (cryo-ET) is an imaging technique that allows three-dimensional visualization of macromolecular assemblies under near-native conditions. Cryo-ET comes with a number of challenges, mainly a low signal-to-noise ratio and the inability to obtain images from all angles. Computational methods are key to analyzing cryo-electron tomograms. To promote innovation in computational methods, we generate a novel simulated dataset to benchmark different methods of localization and classification of biological macromolecules in tomograms. Our publicly available dataset contains ten tomographic reconstructions of simulated cell-like volumes. Each volume contains twelve different types of complexes, varying in size, function and structure. In this paper, we have evaluated seven different methods of finding and classifying proteins. Seven research groups present results obtained with learning-based methods trained on the simulated dataset, as well as a baseline template matching (TM), a traditional method widely used in cryo-ET research. We show that learning-based approaches can achieve notably better localization and classification performance than TM. We also experimentally confirm that there is a negative relationship between particle size and performance for all methods.

preprint · 2022 · arXiv

The consistent behavior of negative Poisson's ratio with interlayer interactions

Negative Poisson's ratio (NPR) is of great interest due to its novel applications in many fields. Films are the most commonly used form in practical applications, and they involve multiple layers. However, the effect of interlayer interactions on the NPR is still unclear. In this study, based on first-principles calculations, we systematically investigate the effect of interlayer interactions on the NPR by comparatively studying single-layer graphene, few-layer graphene, h-BN, and graphene-BN heterostructure. It is found that they have almost the same geometry-strain response. Consequently, the NPR in bilayer graphene, triple-layer graphene, and graphene-BN heterostructure is consistent with that in single-layer graphene and h-BN. The fundamental mechanism is that the response of the orbital coupling to strain is consistent under the effect of interlayer interactions. The deep understanding of the NPR with the effect of interlayer interactions as achieved in this study is beneficial for the future design and development of micro-/nanoscale electromechanical devices with novel functions based on nanostructures.

preprint · 2020 · arXiv

DWMD: Dimensional Weighted Orderwise Moment Discrepancy for Domain-specific Hidden Representation Matching

Knowledge transfer from a source domain to a different but semantically related target domain has long been an important topic in the context of unsupervised domain adaptation (UDA). A key challenge in this field is establishing a metric that can exactly measure the data distribution discrepancy between two homogeneous domains and adopt it in distribution alignment, especially in the matching of feature representations in the hidden activation space. Existing distribution matching approaches can be interpreted as failing to either explicitly orderwise align higher-order moments or satisfy the prerequisite of certain assumptions in practical uses. We propose a novel moment-based probability distribution metric termed dimensional weighted orderwise moment discrepancy (DWMD) for feature representation matching in the UDA scenario. Our metric function takes advantage of a series for high-order moment alignment, and we theoretically prove that our DWMD metric function is error-free, which means that it can strictly reflect the distribution differences between domains and is valid without any feature distribution assumption. In addition, since the discrepancies between probability distributions in each feature dimension are different, dimensional weighting is considered in our function. We further calculate the error bound of the empirical estimate of the DWMD metric in practical applications. Comprehensive experiments on benchmark datasets illustrate that our method yields state-of-the-art distribution metrics.

preprint · 2016 · arXiv

Green Data Centers: A Survey, Perspectives, and Future Directions

At present, a major concern regarding data centers is their extremely high energy consumption and carbon dioxide emissions. However, because of the over-provisioning of resources, the utilization of existing data centers is, in fact, remarkably low, leading to considerable energy waste. Therefore, over the past few years, many research efforts have been devoted to increasing efficiency for the construction of green data centers. The goal of these efforts is to efficiently utilize available resources and to reduce energy consumption and thermal cooling costs. In this paper, we provide a survey of the state-of-the-art research on green data center techniques, including energy efficiency, resource management, thermal control and green metrics. Additionally, we present a detailed comparison of the reviewed proposals. We further discuss the key challenges for future research and highlight some future research issues for addressing the problem of building green data centers.

preprint · 2016 · arXiv

Multi-resource Energy-efficient Routing in Cloud Data Centers with Networks-as-a-Service

With the rapid development of software defined networking and network function virtualization, researchers have proposed a new cloud networking model called Network-as-a-Service (NaaS) which enables both in-network packet processing and application-specific network control. In this paper, we revisit the problem of achieving network energy efficiency in data centers and identify some new optimization challenges under the NaaS model. Particularly, we extend the energy-efficient routing optimization from single-resource to multi-resource settings. We characterize the problem through a detailed model and provide a formal problem definition. Due to the high complexity of direct solutions, we propose a greedy routing scheme to approximate the optimum, where flows are selected progressively to exhaust residual capacities of active nodes, and routing paths are assigned based on the distributions of both node residual capacities and flow demands. By leveraging the structural regularity of data center networks, we also provide a fast topology-aware heuristic method based on hierarchically solving a series of vector bin packing instances. Our simulations show that the proposed routing scheme can achieve significant gain on energy savings and the topology-aware heuristic can produce comparably good results while reducing the computation time to a large extent.
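The abstract above refers to a heuristic built on vector bin packing instances. As a rough illustration of that underlying problem only, the sketch below packs multi-resource demands into identical bins with a generic first-fit decreasing rule. The function name, the sort key and the sample numbers are assumptions made for illustration; this is not the paper's topology-aware method.

```python
def first_fit_vector_packing(demands, capacity):
    """Pack multi-resource demands into bins using first-fit decreasing.

    demands:  list of tuples, one value per resource (e.g. bandwidth, CPU).
    capacity: tuple with the per-resource capacity of every bin.
    Returns a list of bins, each a list of indices into `demands`.
    """
    # Place the "largest" items first, measured by their dominant resource share.
    order = sorted(
        range(len(demands)),
        key=lambda i: max(d / c for d, c in zip(demands[i], capacity)),
        reverse=True,
    )
    bins = []   # item indices per bin
    loads = []  # current per-resource load per bin
    for i in order:
        demand = demands[i]
        if any(d > c for d, c in zip(demand, capacity)):
            raise ValueError(f"demand {i} exceeds the bin capacity")
        for b, load in enumerate(loads):
            # The item fits only if every resource stays within capacity.
            if all(l + d <= c for l, d, c in zip(load, demand, capacity)):
                bins[b].append(i)
                loads[b] = [l + d for l, d in zip(load, demand)]
                break
        else:
            bins.append([i])
            loads.append(list(demand))
    return bins


# Hypothetical flows with (bandwidth, processing) demands on nodes of capacity (10, 10).
flow_demands = [(6, 2), (5, 5), (4, 7), (2, 3)]
packed = first_fit_vector_packing(flow_demands, capacity=(10, 10))
```

In an energy-saving setting, fewer bins would loosely correspond to fewer active nodes, which is why packing quality matters.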

preprint · 2014 · arXiv

A Joint Optimization of Operational Cost and Performance Interference in Cloud Data Centers

Virtual machine (VM) scheduling is an important technique to efficiently operate the computing resources in a data center. Previous work has mainly focused on consolidating VMs to improve resource utilization and thus to optimize energy consumption. However, the interference between collocated VMs is usually ignored, which can result in severe performance degradation for the applications running in those VMs due to contention for the shared resources. Based on this observation, we aim at designing efficient VM assignment and scheduling strategies that optimize both the operational cost of the data center and the performance degradation of running applications. We then propose a general model which captures the inherent tradeoff between the two contradictory objectives. We present offline and online solutions for this problem by exploiting the spatial and temporal information of VMs, where VM scheduling is done by jointly considering the combinations and the life-cycle overlapping of the VMs. Evaluation results show that the proposed methods can generate efficient schedules for VMs, achieving low operational cost while significantly reducing the performance degradation of applications in cloud data centers.

preprint · 2014 · arXiv

Energy-Efficient Flow Scheduling and Routing with Hard Deadlines in Data Center Networks

The power consumption of the enormous number of network devices in data centers has emerged as a big concern to data center operators. Despite many traffic-engineering-based solutions, very little attention has been paid to performance-guaranteed energy saving schemes. In this paper, we propose a novel energy-saving model for data center networks by scheduling and routing "deadline-constrained flows" where the transmission of every flow has to be accomplished before a rigorous deadline, which is the most critical requirement in production data center networks. Based on speed scaling and power-down energy saving strategies for network devices, we aim to explore the most energy-efficient way of scheduling and routing flows on the network, as well as determining the transmission speed for every flow. We consider two general versions of the problem. For the version of only flow scheduling where routes of flows are pre-given, we show that it can be solved in polynomial time and we develop an optimal combinatorial algorithm for it. For the version of joint flow scheduling and routing, we prove that it is strongly NP-hard and cannot have a Fully Polynomial-Time Approximation Scheme (FPTAS) unless P=NP. Based on a relaxation and randomized rounding technique, we provide an efficient approximation algorithm which can guarantee a provable performance ratio with respect to a polynomial of the total number of flows.

preprint · 2014 · arXiv

Improving the Load Balance of MapReduce Operations based on the Key Distribution of Pairs

Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations. Each operation represents one invocation of the Map or Reduce function. Scheduling MapReduce operations is difficult due to highly skewed operation loads, no support for collecting workload statistics, and the high complexity of the scheduling problem. Current implementations therefore adopt simple strategies, leading to poor load balance. To address these difficulties, we design an algorithm to schedule operations based on the key distribution of intermediate pairs. The algorithm involves a sub-problem of selecting operations for task slots, which we name the Balanced Subset Sum (BSS) problem. We discuss properties of BSS and design exact and approximation algorithms for it. To transparently incorporate these algorithms into MapReduce, we design a communication mechanism to collect statistics, and a pipeline within Reduce tasks to increase resource utilization. To the best of our knowledge, this is the first work on scheduling MapReduce workload at this fine-grained level. Experiments on PUMA [T+12] benchmarks show consistent performance improvement. The job duration can be reduced by up to 37%, compared with standard MapReduce.
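The abstract above schedules Reduce operations using the key distribution of intermediate pairs. As a rough illustration of why per-key load statistics help, the sketch below assigns key groups to slots with a generic greedy rule (heaviest key first, always onto the least-loaded slot). This is a stand-in heuristic, not the paper's Balanced Subset Sum algorithms, and the function name and sample loads are assumptions made for illustration.

```python
import heapq


def greedy_balance(key_loads, num_slots):
    """Assign each key group to the least-loaded slot, heaviest keys first.

    key_loads: dict mapping a key to its estimated load (e.g. number of pairs).
    Returns (assignment, slot_loads).
    """
    # Min-heap of (current load, slot id) so the lightest slot pops first.
    heap = [(0, slot) for slot in range(num_slots)]
    heapq.heapify(heap)
    assignment = {}
    for key, load in sorted(key_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, slot = heapq.heappop(heap)
        assignment[key] = slot
        heapq.heappush(heap, (total + load, slot))
    slot_loads = [0] * num_slots
    for key, slot in assignment.items():
        slot_loads[slot] += key_loads[key]
    return assignment, slot_loads


# Hypothetical skewed key distribution spread over two Reduce slots.
sample_loads = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 3}
assignment, slot_loads = greedy_balance(sample_loads, num_slots=2)
```

A hash-based assignment ignores these loads entirely, so with skewed keys one slot can end up far heavier than the others.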

preprint · 2014 · arXiv

OS4M: Achieving Global Load Balance of MapReduce Workload by Scheduling at the Operation Level

The efficiency of MapReduce is closely related to its load balance. Existing works on MapReduce load balance focus on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations, with each operation representing one invocation of the Map or Reduce function. By default, MapReduce adopts the hash-based method to schedule Reduce operations, which often leads to poor load balance. In addition, the copy phase of Reduce tasks overlaps with Map tasks, which significantly hinders the progress of Map tasks due to I/O contention. Moreover, the three phases of Reduce tasks run in sequence, while consuming different resources, thereby under-utilizing resources. To overcome these problems, we introduce a set of mechanisms named OS4M (Operation Scheduling for MapReduce) to improve MapReduce's performance. OS4M achieves load balance by collecting statistics of all Map operations and calculating a globally optimal schedule to distribute Reduce operations. With OS4M, the copy phase of Reduce tasks no longer overlaps with Map tasks, and the three phases of Reduce tasks are pipelined based on their operation loads. OS4M has been transparently incorporated into MapReduce. Evaluations on standard benchmarks show that OS4M shortens job duration by up to 42% compared with a Hadoop baseline.

preprint · 2013 · arXiv

Energy-Efficient Scheduling with Time and Processors Eligibility Restrictions

While previous work on energy-efficient algorithms focused on the assumption that tasks can be assigned to any processor, we study the problem of task scheduling on restricted parallel processors. The objective is to minimize the overall energy consumption, where the speed scaling (SS) method is used to reduce energy consumption under the execution time constraint (makespan $C_{max}$). In this work, we discuss the speed setting in the continuous model, in which processors can run at arbitrary speeds in $[s_{min},s_{max}]$. The energy-efficient scheduling problem, involving task assignment and speed scaling, is inherently complicated as it is proved to be NP-Complete. We formulate the problem as an Integer Programming (IP) problem. Specifically, we devise a polynomial time optimal scheduling algorithm for the case where tasks have a uniform size. Our algorithm runs in $O(mn^3 \log n)$ time, where $m$ is the number of processors and $n$ is the number of tasks. We then present a polynomial time algorithm that achieves an approximation factor of $2^{α-1}(2-\frac{1}{m^α})$ ($α$ is the power parameter) when the tasks have arbitrary sizes. Experimental results demonstrate that our algorithm provides an efficient schedule for the problem of task scheduling on restricted parallel processors.
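The abstract above relies on speed scaling under a makespan constraint. As a rough illustration of why running at a steady speed saves energy, the sketch below assumes the common power model in which power equals speed raised to the power α. That model, the function names and the sample numbers are assumptions made for illustration and are not taken from the paper.

```python
def uniform_speed_energy(work, deadline, alpha=3.0, s_min=0.0, s_max=float("inf")):
    """Energy used when `work` runs at one constant speed and meets `deadline`.

    Assumes power = speed ** alpha (an illustrative model).
    """
    if work == 0:
        return 0.0
    required = work / deadline  # slowest speed that still meets the deadline
    if required > s_max:
        raise ValueError("deadline cannot be met within the speed range")
    speed = max(required, s_min)  # never run below the minimum speed
    duration = work / speed
    return speed ** alpha * duration


def piecewise_energy(pieces, alpha=3.0):
    """Energy of a schedule given as (work, duration) pieces at constant speed each."""
    return sum((w / t) ** alpha * t for w, t in pieces)


# Same total work (8) and same total time (4), two different speed profiles.
uniform = uniform_speed_energy(work=8, deadline=4)
uneven = piecewise_energy([(4, 1), (4, 3)])
```

Because the assumed power function is convex for α > 1, the uneven profile costs more energy than the uniform one for the same work and finishing time.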

preprint · 2013 · arXiv

GreenDCN: a General Framework for Achieving Energy Efficiency in Data Center Networks

The popularization of cloud computing has raised concerns over the energy consumption that takes place in data centers. In addition to the energy consumed by servers, the energy consumed by large numbers of network devices emerges as a significant problem. Existing work on energy-efficient data center networking primarily focuses on traffic engineering, which is usually adapted from traditional networks. We propose a new framework to embrace the new opportunities brought by combining some special features of data centers with traffic engineering. Based on this framework, we characterize the problem of achieving energy efficiency with a time-aware model, we prove its NP-hardness, and we propose a solution that has two steps. First, we solve the problem of assigning virtual machines (VMs) to servers to reduce the amount of traffic and to generate favorable conditions for traffic engineering. The solution reached for this problem is based on three essential principles that we propose. Second, we reduce the number of active switches and balance traffic flows, depending on the relation between power consumption and routing, to achieve energy conservation. Experimental results confirm that, by using this framework, we can achieve up to 50 percent energy savings. We also provide a comprehensive discussion on the scalability and practicability of the framework.

preprint · 2013 · arXiv

Routing for Energy Minimization with Discrete Cost Functions

Energy saving is becoming an important issue in the design and use of computer networks. In this work we propose a problem that considers the use of rate adaptation as the energy saving strategy in networks. The problem is modeled as an integral demand-routing problem in a network with discrete cost functions at the links. The discreteness of the cost function comes from the different states (bandwidths) at which links can operate and, in particular, from the energy consumed at each state. This in turn leads to the non-convexity of the cost function, and thus makes the problem harder to solve. We formulate this routing problem as an integer program, and we show that the general case of this problem is NP-hard, and even hard to approximate. For the special case when the step ratio of the cost function is bounded, we show that effective approximations can be obtained. Our main algorithm executes two processes in sequence: relaxation and rounding. The relaxation process eliminates the non-convexity of the cost function, so that the problem is transformed into a fractional convex program solvable in polynomial time. After that, a randomized rounding process is used to get a feasible solution for the original problem. This algorithm provides a constant approximation ratio for uniform demands and an approximation ratio of $O(\log^{β-1} d)$ for non-uniform demands, where $β$ is a constant and $d$ is the largest demand.
