Researcher profile

Yue Song

Yue Song contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
12 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

14 published items

Preprint · 2026 · arXiv

A Short Note on Batch-efficient Divide-and-Conquer Algorithm for EigenDecomposition

EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. One crucial bottleneck limiting its usage is the expensive computation cost, particularly for a mini-batch of matrices in deep neural networks. Our previous work proposed a dedicated QR-based ED algorithm for batched small matrices ($\text{dim}<32$). This short paper addresses that size limitation and proposes a batch-efficient Divide-and-Conquer-based ED algorithm for larger matrices. Numerical tests show that for a mini-batch of matrices whose dimensions are smaller than $64$, our method can be much faster than the PyTorch SVD function.

Preprint · 2022 · arXiv

Batch-efficient EigenDecomposition for Small and Medium Matrices

EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. One crucial bottleneck limiting its usage is the expensive computation cost, particularly for a mini-batch of matrices in deep neural networks. In this paper, we propose a QR-based ED method dedicated to computer vision application scenarios. Our method performs the ED entirely by batched matrix/vector multiplication, which processes all the matrices simultaneously and thus fully utilizes the power of GPUs. Our technique is based on explicit QR iterations using Givens rotations with double Wilkinson shifts. With several acceleration techniques, the time complexity of the QR iterations is reduced from $O(n^5)$ to $O(n^3)$. Numerical tests show that for small and medium batched matrices (e.g., $\text{dim}<32$), our method can be much faster than the PyTorch SVD function. Experimental results on visual recognition and image generation demonstrate that our methods also achieve competitive performance.
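To illustrate the core idea of shifted QR iteration, here is a minimal single-matrix sketch in Python with NumPy. It is not the paper's batched algorithm: it assumes a real symmetric input, uses `numpy.linalg.qr` (Householder-based) in place of explicit Givens rotations, applies a single Wilkinson shift rather than double shifts, and skips the usual reduction to tridiagonal form, so each iteration costs more than in a production routine. The names `wilkinson_shift` and `qr_eigvalsh` are hypothetical.

```python
import numpy as np


def wilkinson_shift(a: float, b: float, c: float) -> float:
    """Eigenvalue of the trailing 2x2 block [[a, b], [b, c]] that is closer to c."""
    d = (a - c) / 2.0
    if d == 0.0:
        return c - abs(b)
    return c - b * b / (d + np.copysign(np.hypot(d, b), d))


def qr_eigvalsh(A, tol=1e-12, max_iter=10_000):
    """Eigenvalues of one real symmetric matrix via Wilkinson-shifted QR iteration."""
    H = np.array(A, dtype=float)
    scale = max(float(np.abs(H).max()), 1.0)
    found = []
    for _ in range(max_iter):
        n = H.shape[0]
        if n == 1:
            found.append(H[0, 0])
            break
        # Deflate: once the last row is (numerically) decoupled, its diagonal
        # entry is an eigenvalue and the problem shrinks by one.
        if np.linalg.norm(H[n - 1, : n - 1]) < tol * scale:
            found.append(H[n - 1, n - 1])
            H = H[: n - 1, : n - 1]
            continue
        mu = wilkinson_shift(H[n - 2, n - 2], H[n - 1, n - 2], H[n - 1, n - 1])
        eye = np.eye(n)
        Q, R = np.linalg.qr(H - mu * eye)
        H = R @ Q + mu * eye      # equals Q^T H Q, so the eigenvalues are preserved
        H = 0.5 * (H + H.T)       # remove rounding-induced asymmetry
    else:
        raise RuntimeError("QR iteration did not converge")
    return np.sort(np.array(found))


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
S = M + M.T
print(qr_eigvalsh(S))
print(np.linalg.eigvalsh(S))  # reference values for comparison
```

Deflation is what keeps the iteration practical: once the last row has decoupled, its diagonal entry is accepted as an eigenvalue and only the remaining leading block is iterated further.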

Preprint · 2022 · arXiv

Convex Relaxation of AC Optimal Power Flow with Flexible Transmission Line Impedances

Flexible transmission line impedances are, on the one hand, a promising control resource for facilitating grid flexibility but, on the other hand, add considerable complexity to the associated optimization problems. This paper develops a convexification method for AC optimal power flow with flexible line impedances. First, it is shown that a flexible-impedance line is equivalent to a constant-impedance line linking a pair of transformers with correlated and continuously adjustable tap ratios. Then, using this circuit equivalent, the original optimization problem is reformulated as a semidefinite program under the existing convex relaxation framework, which improves solution tractability and optimality in an easy-to-implement manner. The proposed method is verified by numerical tests on the IEEE 118-bus system.

Preprint · 2022 · arXiv

Disentangle Saliency Detection into Cascaded Detail Modeling and Body Filling

Salient object detection has long been studied to identify the most visually attractive objects in images and videos. Recently, a growing number of approaches have been proposed, all of which rely on contour/edge information to improve detection performance. The edge labels are either put into the loss directly or used as extra supervision; alternatively, the edge and body can be learned separately and fused afterward. These approaches either lead to high prediction errors near the edge or cannot be trained end-to-end. Another problem is that existing methods may fail to detect objects of various sizes due to the lack of efficient and effective feature fusion mechanisms. In this work, we propose to decompose the saliency detection task into two cascaded sub-tasks, i.e., detail modeling and body filling. Specifically, detail modeling focuses on capturing object edges under the supervision of an explicitly decomposed detail label consisting of the pixels on and near the edge. Body filling then learns the body part, which is filled into the detail map to generate a more accurate saliency map. To effectively fuse the features and handle objects at different scales, we also propose two novel multi-scale blocks, detail attention and body attention, for precise detail and body modeling. Experimental results show that our method achieves state-of-the-art performance on six public datasets.

Preprint · 2022 · arXiv

Fast Differentiable Matrix Square Root

Computing the matrix square root or its inverse in a differentiable manner is important in a variety of computer vision tasks. Previous methods either adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix or use the Newton-Schulz iteration (NS iteration) to derive an approximate solution. However, neither method is computationally efficient enough in the forward pass or the backward pass. In this paper, we propose two more efficient variants for computing the differentiable matrix square root. For forward propagation, one method uses the Matrix Taylor Polynomial (MTP) and the other uses Matrix Padé Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield considerable speed-ups compared with the SVD or the NS iteration. Experimental results on de-correlated batch normalization and the second-order vision transformer demonstrate that our methods also achieve competitive and even slightly better performance. The code is available at https://github.com/KingJamesSong/FastDifferentiableMatSqrt.
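The forward and backward ideas in this abstract can be sketched for a single symmetric positive definite matrix in NumPy. This is a hedged illustration, not the released code: `sqrtm_taylor` truncates the binomial series of $(I - Z)^{1/2}$ after an assumed fixed degree, `sqrtm_newton_schulz` is the standard coupled NS baseline, and `lyapunov_via_sign` solves $XG + GX = C$ from the block identity $\mathrm{sign}\begin{pmatrix} X & C \\ 0 & -X \end{pmatrix} = \begin{pmatrix} I & 2G \\ 0 & -I \end{pmatrix}$ using the classical inverse-based Newton iteration for the sign function, which may differ from the iteration used in the paper. MPA is not shown, and all function names are hypothetical.

```python
import numpy as np


def sqrtm_taylor(A, degree=100):
    """Matrix Taylor polynomial (MTP) sketch for sqrt(A), A symmetric positive definite."""
    n = A.shape[0]
    norm = np.linalg.norm(A)           # Frobenius norm: spectrum of A / norm lies in (0, 1]
    Z = np.eye(n) - A / norm           # sqrt(A / norm) = (I - Z)^(1/2), spectrum of Z in [0, 1)
    coeff, power, S = 1.0, np.eye(n), np.eye(n)
    for k in range(1, degree + 1):
        coeff *= (0.5 - (k - 1)) / k   # binomial coefficient C(1/2, k)
        power = power @ Z
        S = S + coeff * (-1) ** k * power
    return np.sqrt(norm) * S


def sqrtm_newton_schulz(A, iters=20):
    """Coupled Newton-Schulz iteration: approximations of sqrt(A) and inv(sqrt(A))."""
    n = A.shape[0]
    norm = np.linalg.norm(A)
    eye = np.eye(n)
    Y, Z = A / norm, eye.copy()
    for _ in range(iters):
        T = 0.5 * (3.0 * eye - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return np.sqrt(norm) * Y, Z / np.sqrt(norm)


def lyapunov_via_sign(X, C, iters=30):
    """Solve X G + G X = C via sign([[X, C], [0, -X]]) = [[I, 2G], [0, -I]].

    Assumes the eigenvalues of X have positive real part (true for sqrt of an SPD matrix).
    """
    n = X.shape[0]
    S = np.block([[X, C], [np.zeros((n, n)), -X]])
    for _ in range(iters):
        S = 0.5 * (S + np.linalg.inv(S))  # Newton iteration for the matrix sign function
    return 0.5 * S[:n, n:]


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)          # a well-conditioned SPD test matrix
X = sqrtm_taylor(A)
grad_X = rng.standard_normal((4, 4))   # stand-in for dL/dX arriving from later layers
grad_A = lyapunov_via_sign(X, grad_X)  # dL/dA for X = sqrt(A)
```

For $X = A^{1/2}$ with symmetric $X$, the gradient $\partial L/\partial A$ is the solution $G$ of $XG + GX = \partial L/\partial X$, which is why a Lyapunov solver can serve as the backward pass.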

Preprint · 2022 · arXiv

Formulating Connectedness in Security-Constrained Optimal Transmission Switching Problems

This paper focuses on the issue of network connectedness (NC) in security-constrained optimal transmission switching (SCOTS) problems, which is complicated by branch contingencies and corrective line switching. Two criteria are first proposed, following the principle of preserving NC as much as possible within reasonable limits. By extending electrical-flow-based NC constraints, a proposition is derived that associates different cases of NC with the optimum of a linear program, yielding a mathematical formulation of the NC criteria. Using the Karush-Kuhn-Tucker (KKT) conditions, this formulation is further transformed into a tractable version that can be incorporated into existing SCOTS models without affecting the applicability of the original solution approaches. Finally, case studies on various networks and SCOTS models demonstrate the efficacy of the proposed approach.
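As a point of reference for what connectedness under branch contingencies means, the following Python sketch enumerates single-branch outages and reports those that split the network. It is a plain breadth-first-search check on a hypothetical 4-bus example, not the paper's electrical-flow/KKT formulation, which encodes connectedness as constraints inside the optimization model rather than testing it after the fact. Function names are illustrative.

```python
from collections import deque


def is_connected(n_buses, branches):
    """True if every bus 0..n_buses-1 is reachable from bus 0 over the given branches."""
    if n_buses == 0:
        return True
    adjacency = {bus: [] for bus in range(n_buses)}
    for u, v in branches:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        bus = queue.popleft()
        for nxt in adjacency[bus]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == n_buses


def islanding_outages(n_buses, branches):
    """Indices of branches whose single outage (N-1) splits the network."""
    return [
        k
        for k in range(len(branches))
        if not is_connected(n_buses, branches[:k] + branches[k + 1:])
    ]


# Hypothetical 4-bus network: a 0-1-2 loop plus a radial spur 2-3.
branches = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(islanding_outages(4, branches))  # [3]: losing branch 2-3 isolates bus 3
```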

Preprint · 2022 · arXiv

Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality

Inserting an SVD meta-layer into neural networks tends to make the covariance ill-conditioned, which can harm training stability and generalization. In this paper, we systematically study how to improve covariance conditioning by enforcing orthogonality on the Pre-SVD layer. Existing orthogonal treatments of the weights are investigated first; these techniques improve the conditioning but hurt performance. To avoid this side effect, we propose the Nearest Orthogonal Gradient (NOG) and the Optimal Learning Rate (OLR). The effectiveness of our methods is validated in two applications: decorrelated Batch Normalization (BN) and Global Covariance Pooling (GCP). Extensive experiments on visual recognition demonstrate that our methods simultaneously improve covariance conditioning and generalization. Moreover, combining them with orthogonal weights can further boost performance.
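Two facts behind this abstract can be checked numerically: the orthogonal matrix nearest to a square matrix $G$ in Frobenius norm is $UV^\top$ from the SVD $G = U\Sigma V^\top$, and an orthogonal weight $Q$ leaves the covariance spectrum unchanged, since $Q\Sigma Q^\top$ has the same eigenvalues as $\Sigma$. The NumPy sketch below shows only these two facts; how NOG and OLR use them during training is not reproduced, and all names and shapes are illustrative assumptions.

```python
import numpy as np


def nearest_orthogonal(G):
    """Closest orthogonal matrix to a square matrix G in Frobenius norm: U V^T from the SVD of G."""
    U, _, Vt = np.linalg.svd(G)
    return U @ Vt


def covariance_condition(features):
    """Condition number of the sample covariance of (num_samples, dim) features."""
    return np.linalg.cond(np.cov(features, rowvar=False))


rng = np.random.default_rng(0)
X = rng.standard_normal((512, 8))   # hypothetical input features
W = rng.standard_normal((8, 8))     # hypothetical unconstrained pre-SVD weight
Q = nearest_orthogonal(W)
print(covariance_condition(X @ W.T))  # depends on the singular values of W
print(covariance_condition(X @ Q.T))  # matches the next line up to rounding
print(covariance_condition(X))
```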

Preprint · 2022 · arXiv

Joule-Thomson expansion of d-dimensional charged AdS black holes with cloud of strings and quintessence

We study the Joule-Thomson expansion of a d-dimensional charged AdS black hole with a cloud of strings and quintessence. We first obtain the corresponding solution and investigate some of its thermodynamic properties. We then evaluate its Joule-Thomson expansion from four aspects: the Joule-Thomson coefficient, the inversion curve, the isenthalpic curve, and the ratio $T_{i}^{\min}/T_{c}$. The analysis shows that the dimension, the cloud-of-strings parameter and the quintessence parameter each have different effects on the Joule-Thomson coefficient, and the same holds for the inversion curve, the isenthalpic curve and the ratio $T_{i}^{\min}/T_{c}$.

Preprint · 2022 · arXiv

On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition

Fine-Grained Visual Categorization (FGVC) is challenging because subtle inter-class variations are difficult to capture. One notable research line uses the Global Covariance Pooling (GCP) layer to learn powerful representations with second-order statistics, which can effectively model inter-class differences. In our previous conference paper, we showed that truncating small eigenvalues of the GCP covariance can attain smoother gradients and improve performance on large-scale benchmarks. However, on fine-grained datasets, truncating the small eigenvalues makes the model fail to converge. This observation contradicts the common assumption that small eigenvalues merely correspond to noisy, unimportant information, so that ignoring them should have little influence on performance. To diagnose this peculiar behavior, we propose two attribution methods whose visualizations demonstrate that the seemingly unimportant small eigenvalues are crucial, as they are in charge of extracting discriminative class-specific features. Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues. Without introducing any additional parameters, this branch simply amplifies the small eigenvalues and achieves state-of-the-art performance among GCP methods on three fine-grained benchmarks. Furthermore, its performance is also competitive against other FGVC approaches on larger datasets. Code is available at https://github.com/KingJamesSong/DifferentiableSVD.
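For readers unfamiliar with GCP, a minimal NumPy sketch of covariance pooling with matrix square-root normalization follows, including an optional `keep` argument that truncates the smallest eigenvalues, the operation the abstract reports as harmful on fine-grained data. The function name `gcp_sqrt` and the feature shapes are assumptions, and the paper's eigenvalue-amplifying branch is not reproduced.

```python
import numpy as np


def gcp_sqrt(features, keep=None):
    """Global covariance pooling followed by matrix square-root normalization.

    features: (num_positions, channels) array, e.g. a flattened H x W feature map.
    keep: if given, zero out all but the `keep` largest eigenvalues (truncation).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative round-off
    if keep is not None and keep < len(eigvals):
        eigvals[: len(eigvals) - keep] = 0.0     # drop the smallest eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T  # V diag(sqrt(lambda)) V^T


rng = np.random.default_rng(0)
feats = rng.standard_normal((49, 16))  # hypothetical 7x7 map with 16 channels
full = gcp_sqrt(feats)
truncated = gcp_sqrt(feats, keep=8)    # discards the 8 smallest eigenvalues
print(np.linalg.norm(full - truncated))
```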

Preprint · 2022 · arXiv

Optimal Topology Transition

Network topology has a significant impact on the operational performance of power systems. While extensive research has been devoted to optimizing network topology to improve various aspects of system performance, the problem of how to transition from the initial topology to the desired optimal topology still requires study. To address this problem, we propose the concept of optimal topology transition (OTT), which aims to find a topology transition trajectory from an initial topology to a desired terminal topology that optimizes a given transition performance and satisfies operational constraints. The OTT problem is then formulated as a mixed-integer program under certain assumptions. Next, we propose a transition-embedded topology optimization formulation capable of optimizing network topology and its transition trajectory simultaneously. Given the time complexity of directly solving the mixed-integer programs, an efficient problem-specific solution algorithm is developed. Finally, numerical studies demonstrate the effectiveness of the proposed OTT and transition-embedded topology optimization models, as well as the superiority of the obtained optimal transition trajectories over ad hoc ones.

Preprint · 2022 · arXiv

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Two trends have recently emerged: 1) recommender systems must be trained in a timely manner to cope with ever-growing new products and ever-changing user interests from online marketing and social networks; 2) state-of-the-art (SOTA) recommendation models incorporate DNN modules to improve prediction accuracy. Traditional CPU-based recommender systems cannot keep up with these two trends, and GPU-centric training has become a popular approach. However, we observe that GPUs are underutilized when training recommender systems and do not attain the throughput improvements they deliver in CV and NLP. This can be explained by two characteristics of these recommendation models: first, they contain up to a thousand input feature fields, introducing fragmentary and memory-intensive operations; second, their multiple feature interaction submodules introduce a large number of small compute kernels. To remove this roadblock, we propose PICASSO, a novel framework that accelerates the training of recommendation models on commodity hardware. Specifically, we conduct a systematic analysis to reveal the bottlenecks encountered in training recommendation models, and we leverage the model structure and data distribution to unleash the potential of the hardware through packing, interleaving and caching optimizations. Experiments show that PICASSO increases hardware utilization by an order of magnitude over SOTA baselines and brings up to 6x throughput improvement for a variety of industrial recommendation models. Using the same hardware budget in production, PICASSO shortens the walltime of daily training tasks by 7 hours on average, significantly reducing the delay of continuous delivery.

Preprint · 2020 · arXiv

Reducing BESS Capacity for Accommodating Renewables in Subtransmission Systems with Power Flow Routers

Widespread utilization of renewable energy sources (RESs) in subtransmission systems causes serious power quality problems, such as voltage violations, leading to significant curtailment of renewables. This is due to the inherent variability of renewables and the high R/X ratio of subtransmission systems. To achieve full utilization of renewables, battery energy storage systems (BESSs) are commonly used to mitigate the negative effects of large RES fluctuations. The power flow router (PFR), which can be regarded as a general type of network-side controller, has also been shown to enhance grid flexibility for accommodating renewables. In this paper, we investigate the value of PFRs in helping BESSs accommodate renewable power. The performance of PFRs is evaluated by the minimum BESS capacity required for zero renewable power curtailment with and without PFRs. The operational constraints of BESSs and the terminal voltage property of PFRs are considered in a multi-period optimization model. The proposed model is tested through numerical simulations on a modified IEEE 30-bus subtransmission system, and the results show that installing PFRs on a single line can reduce the required BESS capacity by 15%.

Preprint · 2020 · arXiv

Robust Transient Stability Constrained Optimal Power Flow with Power Flow Routers Considering Renewable Uncertainties

This paper proposes a robust transient stability constrained optimal power flow (OPF) problem that addresses renewable uncertainties through the coordination of generation re-dispatch and power flow router (PFR) tuning. PFR refers to a general type of network-side controller that enlarges the feasible region of the OPF problem. The coordination between network-side and generator-side control in the proposed model is more general than traditional methods, which focus on generation dispatch only. An offline-online solution framework is developed to solve the problem efficiently. Under this framework, the original problem is significantly simplified, so that only a low-dimensional deterministic problem needs to be solved at the online stage, achieving real-time implementation with a high level of robustness. The proposed method is verified on the modified New England 39-bus system. Numerical results demonstrate that the proposed method is efficient and performs well in terms of economy and robustness.

Preprint · 2019 · arXiv

Radiality Constraints for Resilient Reconfiguration of Distribution Systems: Formulation and Application to Microgrid Formation

Network reconfiguration is an effective strategy for various purposes in distribution systems (DSs), e.g., resilience enhancement. In particular, DS automation, distributed generation integration and microgrid (MG) technology development are enabling much more flexible reconfiguration and operation of the system, e.g., DSs or MGs with flexible boundaries. However, formulating DS reconfiguration-related optimization problems to include these new flexibilities is non-trivial, especially with respect to topology, which must remain radial. Existing methods of formulating radiality constraints can cause DS flexibilities to be underutilized. This work therefore proposes a new method for formulating radiality constraints that fully enables the topological and some other related flexibilities of DSs, so that reconfiguration-related optimization problems can have extended feasibility and enhanced optimality. Graph-theoretic support is provided to certify its theoretical validity. As integer variables are involved, we also analyze the tightness and compactness issues. The proposed radiality constraints are specifically applied to post-disaster MG formation, which is involved in many DS resilience-oriented service restoration and/or infrastructure recovery problems. The resulting MG formation model, which allows more flexible merging and/or separation of sub-grids, outperforms the models in the literature. Case studies are conducted on two test systems.
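Radiality, in graph terms, means the energized branches contain no loop: a single radial network is a spanning tree, and several radial sub-grids (for example, microgrids after formation) form a forest. The Python sketch below tests a given switch configuration with union-find. It is a textbook after-the-fact check on hypothetical data, not the paper's radiality constraint formulation, which expresses the same property with integer variables inside the optimization model; the name `is_radial` is illustrative.

```python
def is_radial(n_buses, closed_branches, single_tree=True):
    """Check radiality of the energized topology with union-find.

    single_tree=True: one connected radial network (a spanning tree).
    single_tree=False: any forest, e.g. several radial microgrids.
    """
    parent = list(range(n_buses))

    def find(bus):
        while parent[bus] != bus:
            parent[bus] = parent[parent[bus]]  # path halving
            bus = parent[bus]
        return bus

    for u, v in closed_branches:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this branch would close a loop
        parent[root_u] = root_v
    # An acyclic graph on n buses is connected exactly when it has n - 1 branches.
    return len(closed_branches) == n_buses - 1 if single_tree else True


feeder = [(0, 1), (1, 2), (2, 3)]
print(is_radial(4, feeder))                                # True
print(is_radial(4, feeder + [(3, 0)]))                     # False: loop 0-1-2-3-0
print(is_radial(4, [(0, 1), (2, 3)]))                      # False: two islands
print(is_radial(4, [(0, 1), (2, 3)], single_tree=False))   # True: two radial microgrids
```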