Source author record

Gene Cheung

Gene Cheung appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning eess.SP Multimedia eess.IV Computer Vision Neurons and Cognition Quantitative Methods

Catalog footprint

What is connected

25works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Sparse Graph Learning from Sparse Data via Fiedler Number Maximization

We aim to learn a sparse and connected graph from sparse data, where the number of observations K can be substantially smaller than the signal dimension N for signals x in R^N, and the underlying distribution is unknown. In this severely ill-posed setting, we incorporate Fiedler number (the second eigenvalue of the graph Laplacian matrix that quantifies connectedness) as a robust regularization term in the sparse graph learning objective. We first develop a greedy algorithm that iteratively selects one edge globally for weakening/removal to reduce the objective, leveraging eigenvalue perturbation theorems that bound the adverse effect of an edge change to the Fiedler number. Next, we design a parallel variant, based on the Cheeger's inequality, that recursively partitions an input graph into two sub-graphs using an approximate Cheeger cut to distributedly find an optimal edge. Simulation experiments show that Fiedler number maximization robustifies sparse graph estimates, outperforming previous sparse graph learning algorithms.

preprint2024arXiv

Signal Processing in the Retina: Interpretable Graph Classifier to Predict Ganglion Cell Responses

It is a popular hypothesis in neuroscience that ganglion cells in the retina are activated by selectively detecting visual features in an observed scene. While ganglion cell firings can be predicted via data-trained deep neural nets, the networks remain indecipherable, thus providing little understanding of the cells' underlying operations. To extract knowledge from the cell firings, in this paper we learn an interpretable graph-based classifier from data to predict the firings of ganglion cells in response to visual stimuli. Specifically, we learn a positive semi-definite (PSD) metric matrix $\mathbf{M} \succeq 0$ that defines Mahalanobis distances between graph nodes (visual events) endowed with pre-computed feature vectors; the computed inter-node distances lead to edge weights and a combinatorial graph that is amenable to binary classification. Mathematically, we define the objective of metric matrix $\mathbf{M}$ optimization using a graph adaptation of large margin nearest neighbor (LMNN), which is rewritten as a semi-definite programming (SDP) problem. We solve it efficiently via a fast approximation called Gershgorin disc perfect alignment (GDPA) linearization. The learned metric matrix $\mathbf{M}$ provides interpretability: important features are identified along $\mathbf{M}$'s diagonal, and their mutual relationships are inferred from off-diagonal terms. Our fast metric learning framework can be applied to other biological systems with pre-chosen features that require interpretation.

preprint2023arXiv

Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment

A basic premise in graph signal processing (GSP) is that a graph encoding pairwise (anti-)correlations of the targeted signal as edge weights is exploited for graph filtering. However, existing fast graph sampling schemes are designed and tested only for positive graphs describing positive correlations. In this paper, we show that for datasets with strong inherent anti-correlations, a suitable graph contains both positive and negative edge weights. In response, we propose a linear-time signed graph sampling method centered on the concept of balanced signed graphs. Specifically, given an empirical covariance data matrix $\bar{\bf{C}}$, we first learn a sparse inverse matrix (graph Laplacian) $\mathcal{L}$ corresponding to a signed graph $\mathcal{G}$. We define the eigenvectors of Laplacian $\mathcal{L}_B$ for a balanced signed graph $\mathcal{G}_B$ -- approximating $\mathcal{G}$ via edge weight augmentation -- as graph frequency components. Next, we choose samples to minimize the low-pass filter reconstruction error in two steps. We first align all Gershgorin disc left-ends of Laplacian $\mathcal{L}_B$ at smallest eigenvalue $λ_{\min}(\mathcal{L}_B)$ via similarity transform $\mathcal{L}_p = §\mathcal{L}_B §^{-1}$, leveraging a recent linear algebra theorem called Gershgorin disc perfect alignment (GDPA). We then perform sampling on $\mathcal{L}_p$ using a previous fast Gershgorin disc alignment sampling (GDAS) scheme. Experimental results show that our signed graph sampling method outperformed existing fast sampling schemes noticeably on various datasets.

preprint2022arXiv

Fast Computation of Generalized Eigenvectors for Manifold Graph Embedding

Our goal is to efficiently compute low-dimensional latent coordinates for nodes in an input graph -- known as graph embedding -- for subsequent data processing such as clustering. Focusing on finite graphs that are interpreted as uniform samples on continuous manifolds (called manifold graphs), we leverage existing fast extreme eigenvector computation algorithms for speedy execution. We first pose a generalized eigenvalue problem for sparse matrix pair $(\A,\B)$, where $\A = Ł- μ\Q + ε\I$ is a sum of graph Laplacian $Ł$ and disconnected two-hop difference matrix $\Q$. Eigenvector $\v$ minimizing Rayleigh quotient $\frac{\v^{\top} \A \v}{\v^{\top} \v}$ thus minimizes $1$-hop neighbor distances while maximizing distances between disconnected $2$-hop neighbors, preserving graph structure. Matrix $\B = \text{diag}(\{\b_i\})$ that defines eigenvector orthogonality is then chosen so that boundary / interior nodes in the sampling domain have the same generalized degrees. $K$-dimensional latent vectors for the $N$ graph nodes are the first $K$ generalized eigenvectors for $(\A,\B)$, computed in $\cO(N)$ using LOBPCG, where $K \ll N$. Experiments show that our embedding is among the fastest in the literature, while producing the best clustering performance for manifold graphs.

preprint2022arXiv

Hybrid Model-based / Data-driven Graph Transform for Image Coding

Transform coding to sparsify signal representations remains crucial in an image compression pipeline. While the Karhunen-Loève transform (KLT) computed from an empirical covariance matrix $\bar{C}$ is theoretically optimal for a stationary process, in practice, collecting sufficient statistics from a non-stationary image to reliably estimate $\bar{C}$ can be difficult. In this paper, to encode an intra-prediction residual block, we pursue a hybrid model-based / data-driven approach: the first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST), for stability, while the remaining $N-K$ are computed from $\bar{C}$ for performance. The transform computation is posed as a graph learning problem, where we seek a graph Laplacian matrix minimizing a graphical lasso objective inside a convex cone sharing the first $K$ eigenvectors in a Hilbert space of real symmetric matrices. We efficiently solve the problem via augmented Lagrangian relaxation and proximal gradient (PG). Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieved better energy compaction than default discrete cosine transform (DCT) and better stability than KLT.

preprint2022arXiv

Landmarking for Navigational Streaming of Stored High-Dimensional Media

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, who in response transmits the corresponding pre-encoded media data units (MDU) to the client one-by-one in sequence. Intra-coding an MDU (I-MDU) would result in a large bitrate but I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order where the predictor must be first transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are intercoded, resulting in a predictive MDU structure with controlled coding cost. It means that any requested MDU can be decoded by at most transmitting a landmark and an inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport adaptive 360 images as illustrative applications, and I-, P- and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.

preprint2022arXiv

Pre-demosaic Graph-based Light Field Image Compression

An unfocused plenoptic light field (LF) camera places an array of microlenses in front of an image sensor in order to separately capture different directional rays arriving at an image pixel. Using a conventional Bayer pattern, data captured at each pixel is a single color component (R, G or B).The sensed data then undergoes demosaicking (interpolation of RGB components per pixel) and conversion to an array of sub-aperture images (SAIs). In this paper, we propose a new LF image coding scheme based on graph lifting transform (GLT), where the acquired sensor data are coded in the original captured form without pre-processing. Specifically, we directly map raw sensed color data to the SAIs, resulting in sparsely distributed color pixels on 2D grids, and perform demosaicking at the receiver after decoding. To exploit spatial correlation among the sparse pixels, we propose a novel intra-prediction scheme, where the prediction kernel is determined according to the local gradient estimated from already coded neighboring pixel blocks. We then connect the pixels by forming a graph, modeling the prediction residuals statistically as a Gaussian Markov Random Field (GMRF). The optimal edge weights are computed via a graph learning method using a set of training SAIs. The residual data is encoded via low-complexity GLT. Experiments show that at high PSNRs -- important for archiving and instant storage scenarios -- our method outperformed significantly a conventional light field image coding scheme with demosaicking followed by High Efficiency Video Coding (HEVC).

preprint2022arXiv

Projection-free Graph-based Classifier Learning using Gershgorin Disc Perfect Alignment

In semi-supervised graph-based binary classifier learning, a subset of known labels $\hat{x}_i$ are used to infer unknown labels, assuming that the label signal $\mathbf{x}$ is smooth with respect to a similarity graph specified by a Laplacian matrix. When restricting labels $x_i$ to binary values, the problem is NP-hard. While a conventional semi-definite programming relaxation (SDR) can be solved in polynomial time using, for example, the alternating direction method of multipliers (ADMM), the complexity of projecting a candidate matrix $\mathbf{M}$ onto the positive semi-definite (PSD) cone ($\mathbf{M} \succeq 0$) per iteration remains high. In this paper, leveraging a recent linear algebraic theory called Gershgorin disc perfect alignment (GDPA), we propose a fast projection-free method by solving a sequence of linear programs (LP) instead. Specifically, we first recast the SDR to its dual, where a feasible solution $\mathbf{H} \succeq 0$ is interpreted as a Laplacian matrix corresponding to a balanced signed graph minus the last node. To achieve graph balance, we split the last node into two, each retains the original positive / negative edges, resulting in a new Laplacian $\bar{\mathbf{H}}$. We repose the SDR dual for solution $\bar{\mathbf{H}}$, then replace the PSD cone constraint $\bar{\mathbf{H}} \succeq 0$ with linear constraints derived from GDPA -- sufficient conditions to ensure $\bar{\mathbf{H}}$ is PSD -- so that the optimization becomes an LP per iteration. Finally, we extract predicted labels from converged solution $\bar{\mathbf{H}}$. Experiments show that our algorithm enjoyed a $28\times$ speedup over the next fastest scheme while achieving comparable label prediction performance.

preprint2022arXiv

Unsupervised Graph Spectral Feature Denoising for Crop Yield Prediction

Prediction of annual crop yields at a county granularity is important for national food production and price stability. In this paper, towards the goal of better crop yield prediction, leveraging recent graph signal processing (GSP) tools to exploit spatial correlation among neighboring counties, we denoise relevant features via graph spectral filtering that are inputs to a deep learning prediction model. Specifically, we first construct a combinatorial graph with edge weights that encode county-to-county similarities in soil and location features via metric learning. We then denoise features via a maximum a posteriori (MAP) formulation with a graph Laplacian regularizer (GLR). We focus on the challenge to estimate the crucial weight parameter $μ$, trading off the fidelity term and GLR, that is a function of noise variance in an unsupervised manner. We first estimate noise variance directly from noise-corrupted graph signals using a graph clique detection (GCD) procedure that discovers locally constant regions. We then compute an optimal $μ$ minimizing an approximate mean square error function via bias-variance analysis. Experimental results from collected USDA data show that using denoised features as input, performance of a crop yield prediction model can be improved noticeably.

preprint2021arXiv

Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection

Learning a suitable graph is an important precursor to many graph signal processing (GSP) pipelines, such as graph spectral signal compression and denoising. Previous graph learning algorithms either i) make some assumptions on connectivity (e.g., graph sparsity), or ii) make simple graph edge assumptions such as positive edges only. In this paper, given an empirical covariance matrix $\bar{C}$ computed from data as input, we consider a structural assumption on the graph Laplacian matrix $L$: the first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria, such as computation requirement, and the remaining eigenvectors are then learned from data. One example use case is image coding, where the first eigenvector is pre-chosen to be constant, regardless of available observed data. We first prove that the subspace of symmetric positive semi-definite (PSD) matrices $H_{u}^+$ with the first $K$ eigenvectors being $\{u_k\}$ in a defined Hilbert space is a convex cone. We then construct an operator to project a given positive definite (PD) matrix $L$ to $H_{u}^+$, inspired by the Gram-Schmidt procedure. Finally, we design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L^* \in H_{u}^+$ given $\bar{C}$. Experimental results show that given the first $K$ eigenvectors as a prior, our algorithm outperforms competing graph learning schemes using a variety of graph comparison metrics.

preprint2020arXiv

3D Point Cloud Enhancement using Graph-Modelled Multiview Depth Measurements

A 3D point cloud is often synthesized from depth measurements collected by sensors at different viewpoints. The acquired measurements are typically both coarse in precision and corrupted by noise. To improve quality, previous works denoise a synthesized 3D point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements on the sensed images a priori, exploiting inherent 3D geometric correlation across views, before synthesizing a 3D point cloud from the improved measurements. By enhancing closer to the actual sensing process, we benefit from optimization targeting specifically the depth image formation model, before subsequent processing steps that can further obscure measurement errors. Mathematically, for each pixel row in a pair of rectified viewpoint depth images, we first construct a graph reflecting inter-pixel similarities via metric learning using data in previous enhanced rows. To optimize left and right viewpoint images simultaneously, we write a non-linear mapping function from left pixel row to the right based on 3D geometry relations. We formulate a MAP optimization problem, which, after suitable linear approximations, results in an unconstrained convex and differentiable objective, solvable using fast gradient method (FGM). Experimental results show that our method noticeably outperforms recent denoising algorithms that enhance after 3D point clouds are synthesized.

preprint2020arXiv

Fast Graph Sampling Set Selection Using Gershgorin Disc Alignment

Graph sampling set selection, where a subset of nodes are chosen to collect samples to reconstruct a smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least-squares (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead, we assume a biased graph Laplacian regularization (GLR) based scheme that solves a system of linear equations for reconstruction. We then choose samples to minimize the condition number of the coefficient matrix---specifically, maximize the smallest eigenvalue $λ_{\min}$. Circumventing explicit eigenvalue computation, we maximize instead the lower bound of $λ_{\min}$, designated by the smallest left-end of all Gershgorin discs of the matrix. To achieve this efficiently, we first convert the optimization to a dual problem, where we minimize the number of samples needed to align all Gershgorin disc left-ends at a chosen lower-bound target $T$. Algebraically, the dual problem amounts to optimizing two disc operations: i) shifting of disc centers due to sampling, and ii) scaling of disc radii due to a similarity transformation of the matrix. We further reinterpret the dual as an intuitive disc coverage problem bearing strong resemblance to the famous NP-hard set cover (SC) problem. The reinterpretation enables us to derive a fast approximation scheme from a known SC error-bounded approximation algorithm. We find an appropriate target $T$ efficiently via binary search. Extensive simulation experiments show that our disc-based sampling algorithm runs substantially faster than existing sampling schemes and outperforms other eigen-decomposition-free sampling schemes in reconstruction error.

preprint2020arXiv

Feature Graph Learning for 3D Point Cloud Denoising

Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $\mathbf{L}$, but require a large number of signal observations $\mathbf{z}$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $\mathbf{M}$ by minimizing the graph Laplacian regularizer (GLR) $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$, where edge weight is $w_{i,j} = \exp\{-(\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j) \}$, given a single observation $\mathbf{z}$. We optimize diagonal entries via proximal gradient (PG), where we constrain $\mathbf{M}$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $\mathbf{M}$. To keep $\mathbf{M}$ PD, we constrain the Schur complement of sub-matrix $\mathbf{M}_{2,2}$ of $\mathbf{M}$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $\mathbf{M}$, thus ensuring fast computation speed even when feature vector $\mathbf{f}_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.

preprint2020arXiv

Graph Metric Learning via Gershgorin Disc Alignment

We propose a fast general projection-free metric learning framework, where the minimization objective $\min_{\textbf{M} \in \mathcal{S}} Q(\textbf{M})$ is a convex differentiable function of the metric matrix $\textbf{M}$, and $\textbf{M}$ resides in the set $\mathcal{S}$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important positive-diagonal-only matrices as a special case in the limit. The key idea for fast optimization is to rewrite the positive definite cone constraint in $\mathcal{S}$ as signal-adaptive linear constraints via Gershgorin disc alignment, so that the alternating optimization of the diagonal and off-diagonal terms in $\textbf{M}$ can be solved efficiently as linear programs via Frank-Wolfe iterations. We prove that the Gershgorin discs can be aligned perfectly using the first eigenvector $\textbf{v}$ of $\textbf{M}$, which we update iteratively using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as diagonal / off-diagonal terms are optimized. Experiments show that our efficiently computed graph metric matrices outperform metrics learned using competing methods in terms of classification tasks.

preprint2020arXiv

Graph Neural Net using Analytical Graph Filters and Topology Optimization for Image Denoising

While convolutional neural nets (CNNs) have achieved remarkable performance for a wide range of inverse imaging applications, the filter coefficients are computed in a purely data-driven manner and are not explainable. Inspired by an analytically derived CNN by Hadji et al., in this paper we construct a new layered graph neural net (GNN) using GraphBio as our graph filter. Unlike convolutional filters in previous GNNs, our employed GraphBio is analytically defined and requires no training, and we optimize the end-to-end system only via learning of appropriate graph topology at each layer. In signal filtering terms, it means that our linear graph filter at each layer is always intrepretable as low-pass with known biorthogonal conditions, while the graph spectrum itself is optimized via data training. As an example application, we show that our analytical GNN achieves image denoising performance comparable to a state-of-the-art CNN-based scheme when the training and testing data share the same statistics, and when they differ, our analytical GNN outperforms it by more than 1dB in PSNR.

preprint2020arXiv

Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regularization

To compose a 360 image from a rig with multiple fisheye cameras, a conventional processing pipeline first performs demosaicking on each fisheye camera's Bayer-patterned grid, then translates demosaicked pixels from the camera grid to a rectified image grid---thus performing two image interpolation steps in sequence. Hence interpolation errors can accumulate, and acquisition noise in the captured pixels can pollute neighbors in two consecutive processing stages. In this paper, we propose a joint processing framework that performs demosaicking and grid-to-grid mapping simultaneously---thus limiting noise pollution to one interpolation. Specifically, we first obtain a reverse mapping function from a regular on-grid location in the rectified image to an irregular off-grid location in the camera's Bayer-patterned image. For each pair of adjacent pixels in the rectified grid, we estimate its gradient using the pair's neighboring pixel gradients in three colors in the Bayer-patterned grid. We construct a similarity graph based on the estimated gradients, and interpolate pixels in the rectified grid directly via graph Laplacian regularization (GLR). Experiments show that our joint method outperforms several competing local methods that execute demosaicking and rectification in sequence, by up to 0.52 dB in PSNR and 0.086 in SSIM on the publicly available dataset, and by up to 5.53dB in PSNR and 0.411 in SSIM on the in-house constructed dataset.

preprint2019arXiv

Graph Sampling for Matrix Completion Using Recurrent Gershgorin Disc Shift

Matrix completion algorithms fill missing entries in a large matrix given a subset of observed samples. However, how to best pre-select informative matrix entries given a sampling budget is largely unaddressed. In this paper, we propose a fast sample selection strategy for matrix completion from a graph signal processing perspective. Specifically, we first regularize the matrix reconstruction objective using a dual graph signal smoothness prior, resulting in a system of linear equations for solution. We then select appropriate samples to maximize the smallest eigenvalue $λ_{\min}$ of the coefficient matrix, thus maximizing the stability of the linear system. To efficiently solve this combinatorial problem, we derive a greedy sampling strategy, leveraging on Gershgorin circle theorem, that iteratively selects one sample (equivalent to shifting one Gershgorin disc) at a time corresponding to the largest magnitude entry in the first eigenvector of a modified graph Laplacian matrix. Our algorithm benefits computationally from warm start as the first eigenvectors of incremented Laplacian matrices are computed recurrently for more samples. To achieve computation scalability when sampling large matrices, we further rewrite the coefficient matrix as a sum of two separate components, each of which exhibits block-diagonal structure that we exploit for alternating block-wise sampling. Extensive experiments on both synthetic and real-world datasets show that our graph sampling algorithm substantially outperforms existing sampling schemes for matrix completion and reduces the completion error, when combined with a range of modern matrix completion algorithms.

preprint2016arXiv

Context Tree based Image Contour Coding using A Geometric Prior

If object contours in images are coded efficiently as side information, then they can facilitate advanced image / video coding techniques, such as graph Fourier transform coding or motion prediction of arbitrarily shaped pixel blocks. In this paper, we study the problem of lossless and lossy compression of detected contours in images. Specifically, we first convert a detected object contour composed of contiguous between-pixel edges to a sequence of directional symbols drawn from a small alphabet. To encode the symbol sequence using arithmetic coding, we compute an optimal variable-length context tree (VCT) $\mathcal{T}$ via a maximum a posterior (MAP) formulation to estimate symbols' conditional probabilities. MAP prevents us from overfitting given a small training set $\mathcal{X}$ of past symbol sequences by identifying a VCT $\mathcal{T}$ that achieves a high likelihood $P(\mathcal{X}|\mathcal{T})$ of observing $\mathcal{X}$ given $\mathcal{T}$, and a large geometric prior $P(\mathcal{T})$ stating that image contours are more often straight than curvy. For the lossy case, we design efficient dynamic programming (DP) algorithms that optimally trade off coding rate of an approximate contour $\hat{\mathbf{x}}$ given a VCT $\mathcal{T}$ with two notions of distortion of $\hat{\mathbf{x}}$ with respect to the original contour $\mathbf{x}$. To reduce the size of the DP tables, a total suffix tree is derived from a given VCT $\mathcal{T}$ for compact table entry indexing, reducing complexity. Experimental results show that for lossless contour coding, our proposed algorithm outperforms state-of-the-art context-based schemes consistently for both small and large training datasets. For lossy contour coding, our algorithms outperform comparable schemes in the literature in rate-distortion performance.

preprint2016arXiv

Object Shape Approximation & Contour Adaptive Depth Image Coding for Virtual View Synthesis

A depth image provides partial geometric information of a 3D scene, namely the shapes of physical objects as observed from a particular viewpoint. This information is important when synthesizing images of different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that depth images can be efficiently coded using contour-adaptive codecs that preserve edge sharpness, resulting in visually pleasing DIBR-synthesized images. However, contours are typically losslessly coded as side information (SI), which is expensive if the object shapes are complex. In this paper, we pursue a new paradigm in depth image coding for color-plus-depth representation of a 3D scene: we pro-actively simplify object shapes in a depth and color image pair to reduce depth coding cost, at a penalty of a slight increase in synthesized view distortion. Specifically, we first mathematically derive a distortion upper-bound proxy for 3DSwIM---a quality metric tailored for DIBR-synthesized images. This proxy reduces interdependency among pixel rows in a block to ease optimization. We then approximate object contours via a dynamic programming (DP) algorithm to optimally trade off coding cost of contours using arithmetic edge coding (AEC) with our proposed view synthesis distortion proxy. We modify the depth and color images according to the approximated object contours in an inter-view consistent manner. These are then coded respectively using a contour-adaptive image codec based on graph Fourier transform (GFT) for edge preservation and HEVC intra. Experimental results show that by maintaining sharp but simplified object contours during contour-adaptive coding, for the same visual quality of DIBR-synthesized virtual views, our proposal can reduce depth image coding rate by up to 22% compared to alternative coding strategies such as HEVC intra.

preprint2016arXiv

Optimal Lagrange Multipliers for Dependent Rate Allocation in Video Coding

In a typical video rate allocation problem, the objective is to optimally distribute a source rate budget among a set of (in)dependently coded data units to minimize the total distortion of all units. Conventional Lagrangian approaches convert the lone rate constraint to a linear rate penalty scaled by a multiplier in the objective, resulting in a simpler unconstrained formulation. However, the search for the "optimal" multiplier, one that results in a distortion-minimizing solution among all Lagrangian solutions that satisfy the original rate constraint, remains an elusive open problem in the general setting. To address this problem, we propose a computation-efficient search strategy to identify this optimal multiplier numerically. Specifically, we first formulate a general rate allocation problem where each data unit can be dependently coded at different quantization parameters (QP) using a previous unit as predictor, or left uncoded at the encoder and subsequently interpolated at the decoder using neighboring coded units. After converting the original rate constrained problem to the unconstrained Lagrangian counterpart, we design an efficient dynamic programming (DP) algorithm that finds the optimal Lagrangian solution for a fixed multiplier. Finally, within the DP framework, we iteratively compute neighboring singular multiplier values, each resulting in multiple simultaneously optimal Lagrangian solutions, to drive the rates of the computed Lagrangian solutions towards the bit budget. We terminate when a singular multiplier value results in two Lagrangian solutions with rates below and above the bit budget. In extensive monoview and multiview video coding experiments, we show that our DP algorithm and selection of optimal multipliers on average outperform comparable rate control solutions used in video compression standards such as HEVC that do not skip frames in Y-PSNR.

preprint2016arXiv

Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors---Laplacian prior for DCT coefficients, sparsity prior and graph-signal smoothness prior for image patches---to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.

preprint2015arXiv

In-Network View Synthesis for Interactive Multiview Video Systems

To enable Interactive multiview video systems with a minimum view-switching delay, multiple camera views are sent to the users, which are used as reference images to synthesize additional virtual views via depth-image-based rendering. In practice, bandwidth constraints may however restrict the number of reference views sent to clients per time unit, which may in turn limit the quality of the synthesized viewpoints. We argue that the reference view selection should ideally be performed close to the users, and we study the problem of in-network reference view synthesis such that the navigation quality is maximized at the clients. We consider a distributed cloud network architecture where data stored in a main cloud is delivered to end users with the help of cloudlets, i.e., resource-rich proxies close to the users. In order to satisfy last-hop bandwidth constraints from the cloudlet to the users, a cloudlet re-samples viewpoints of the 3D scene into a discrete set of views (combination of received camera views and virtual views synthesized) to be used as reference for the synthesis of additional virtual views at the client. This in-network synthesis leads to better viewpoint sampling given a bandwidth constraint compared to simple selection of camera views, but it may however carry a distortion penalty in the cloudlet-synthesized reference views. We therefore cast a new reference view selection problem where the best subset of views is defined as the one minimizing the distortion over a view navigation window defined by the user under some transmission bandwidth constraints. We show that the view selection problem is NP-hard, and propose an effective polynomial time algorithm using dynamic programming to solve the optimization problem. Simulation results finally confirm the performance gain offered by virtual view synthesis in the network.

preprint2015arXiv

Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

The ability to efficiently switch from one pre-encoded video stream to another (e.g., for bitrate adaptation or view switching) is important for many interactive streaming applications. Recently, stream-switching mechanisms based on distributed source coding (DSC) have been proposed. In order to reduce the overall transmission rate, these approaches provide a "merge" mechanism, where information is sent to the decoder such that the exact same frame can be reconstructed given that any one of a known set of side information (SI) frames is available at the decoder (e.g., each SI frame may correspond to a different stream from which we are switching). However, the use of bit-plane coding and channel coding in many DSC approaches leads to complex coding and decoding. In this paper, we propose an alternative approach for merging multiple SI frames, using a piecewise constant (PWC) function as the merge operator. In our approach, for each block to be reconstructed, a series of parameters of these PWC merge functions are transmitted in order to guarantee identical reconstruction given the known side information blocks. We consider two different scenarios. In the first case, a target frame is first given, and then merge parameters are chosen so that this frame can be reconstructed exactly at the decoder. In contrast, in the second scenario, the reconstructed frame and merge parameters are jointly optimized to meet a rate-distortion criteria. Experiments show that for both scenarios, our proposed merge techniques can outperform both a recent approach based on DSC and the SP-frame approach in H.264, in terms of compression efficiency and decoder complexity.

preprint2013arXiv

Navigation domain representation for interactive multiview imaging

Enabling users to interactively navigate through different viewpoints of a static scene is a new interesting functionality in 3D streaming systems. While it opens exciting perspectives towards rich multimedia applications, it requires the design of novel representations and coding techniques in order to solve the new challenges imposed by interactive navigation. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data since the server can generally not transmit images for all possible viewpoints due to resource constrains. In this paper, we propose a novel multiview data representation that permits to satisfy bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely in the segment without further data request to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments, under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation; namely, our system leads to similar compression performance as classical inter-view coding, while it provides the high level of flexibility that is required for interactive streaming. Hence, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.

preprint2012arXiv

Collaborative P2P Streaming of Interactive Live Free Viewpoint Video

We study an interactive live streaming scenario where multiple peers pull streams of the same free viewpoint video that are synchronized in time but not necessarily in view. In free viewpoint video, each user can periodically select a virtual view between two anchor camera views for display. The virtual view is synthesized using texture and depth videos of the anchor views via depth-image-based rendering (DIBR). In general, the distortion of the virtual view increases with the distance to the anchor views, and hence it is beneficial for a peer to select the closest anchor views for synthesis. On the other hand, if peers interested in different virtual views are willing to tolerate larger distortion in using more distant anchor views, they can collectively share the access cost of common anchor views. Given anchor view access cost and synthesized distortion of virtual views between anchor views, we study the optimization of anchor view allocation for collaborative peers. We first show that, if the network reconfiguration costs due to view-switching are negligible, the problem can be optimally solved in polynomial time using dynamic programming. We then consider the case of non-negligible reconfiguration costs (e.g., large or frequent view-switching leading to anchor-view changes). In this case, the view allocation problem becomes NP-hard. We thus present a locally optimal and centralized allocation algorithm inspired by Lloyd's algorithm in non-uniform scalar quantization. We also propose a distributed algorithm with guaranteed convergence where each peer group independently make merge-and-split decisions with a well-defined fairness criteria. The results show that depending on the problem settings, our proposed algorithms achieve respective optimal and close-to-optimal performance in terms of total cost, and outperform a P2P scheme without collaborative anchor selection.

Gene Cheung

What is connected

Connect this record

See the researcher in context

Building this map preview

25 published item(s)

Sparse Graph Learning from Sparse Data via Fiedler Number Maximization

Signal Processing in the Retina: Interpretable Graph Classifier to Predict Ganglion Cell Responses

Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment

Fast Computation of Generalized Eigenvectors for Manifold Graph Embedding

Hybrid Model-based / Data-driven Graph Transform for Image Coding

Landmarking for Navigational Streaming of Stored High-Dimensional Media

Pre-demosaic Graph-based Light Field Image Compression

Projection-free Graph-based Classifier Learning using Gershgorin Disc Perfect Alignment

Unsupervised Graph Spectral Feature Denoising for Crop Yield Prediction

Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection

3D Point Cloud Enhancement using Graph-Modelled Multiview Depth Measurements

Fast Graph Sampling Set Selection Using Gershgorin Disc Alignment

Feature Graph Learning for 3D Point Cloud Denoising

Graph Metric Learning via Gershgorin Disc Alignment

Graph Neural Net using Analytical Graph Filters and Topology Optimization for Image Denoising

Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regularization

Graph Sampling for Matrix Completion Using Recurrent Gershgorin Disc Shift

Context Tree based Image Contour Coding using A Geometric Prior

Object Shape Approximation & Contour Adaptive Depth Image Coding for Virtual View Synthesis

Optimal Lagrange Multipliers for Dependent Rate Allocation in Video Coding

Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

In-Network View Synthesis for Interactive Multiview Video Systems

Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

Navigation domain representation for interactive multiview imaging

Collaborative P2P Streaming of Interactive Live Free Viewpoint Video