Source author record

Haijian Zhang

Haijian Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, eess.AS, eess.SP, Multimedia

Catalog footprint

What is connected

7 works

5 topics

4 close collaborators

preprint · 2020 · arXiv

A Time-Frequency Perspective on Audio Watermarking

Existing audio watermarking methods usually treat the host audio signal as a function of time or of frequency individually, while treating it in the joint time-frequency (TF) domain has received less attention. This paper proposes an audio watermarking framework from the perspective of TF analysis. The proposed framework treats the host audio signal in the 2-dimensional (2D) TF plane and selects a series of patches within the 2D TF image. These patches correspond to the TF clusters with minimum averaged energy and are used to form the feature vectors for watermark embedding. Classical spread spectrum embedding schemes are incorporated in the framework. The feature patches that carry the watermarks occupy only a few TF regions of the host audio signal, which improves imperceptibility. In addition, since each feature patch covers a neighborhood of the TF representation of the audio samples, the correlations among the samples within a single patch can be exploited for improved robustness against a series of processing attacks. Extensive experiments are carried out to illustrate the effectiveness of the proposed system compared with its counterpart systems. The aim of this work is to shed some light on the notion of audio watermarking in the TF feature domain, which may potentially lead to watermarking solutions that are more robust against malicious attacks.
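The classical spread spectrum embedding mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook additive form s = x + alpha * b * u, where u is a shared pseudo-random ±1 sequence. The function names, the seed, the strength `alpha`, and the random `host` vector standing in for one low-energy TF patch are all hypothetical; the paper's patch selection and its actual embedding rules are not reproduced here.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-random +/-1 spreading sequence shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed_bit(features, bit, alpha, seed):
    """Additive spread spectrum: s = x + alpha * b * u, with b in {-1, +1}."""
    b = 1.0 if bit else -1.0
    u = pn_sequence(len(features), seed)
    return [x + alpha * b * ui for x, ui in zip(features, u)]

def detect_bit(features, seed):
    """Correlate with the same sequence; the sign of the correlation is the bit."""
    u = pn_sequence(len(features), seed)
    corr = sum(x * ui for x, ui in zip(features, u))
    return 1 if corr >= 0 else 0

# Hypothetical low-energy feature vector standing in for one TF patch.
host_rng = random.Random(0)
host = [host_rng.uniform(-0.1, 0.1) for _ in range(64)]

marked = embed_bit(host, bit=1, alpha=0.5, seed=42)
recovered = detect_bit(marked, seed=42)
```

In this toy setting the host values are bounded by 0.1 in magnitude, so the host's contribution to the correlation can never outweigh the watermark term `alpha * 64`, and detection is error-free. With real audio features the host acts as interference and the strength must be traded against imperceptibility.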

preprint · 2020 · arXiv

Improved Source Counting and Separation for Monaural Mixture

Single-channel speech separation in the time domain and the frequency domain has been widely studied for voice-driven applications over the past few years. However, most previous works assume that the number of speakers is known in advance, which is not easily obtained from a monaural mixture in practice. In this paper, we propose a novel model of single-channel multi-speaker separation that jointly learns the time-frequency feature and the unknown number of speakers. Specifically, our model integrates the time-domain convolution-encoded feature map and the frequency-domain spectrogram through an attention mechanism, and the integrated features are projected into high-dimensional embedding vectors, which are then clustered with a deep attractor network to modify the encoded feature. Meanwhile, the number of speakers is counted by computing the Gerschgorin disks of the embedding vectors, which are orthogonal for different speakers. Finally, the modified encoded feature is inverted to the sound waveform using a linear decoder. Experimental evaluation on the GRID dataset shows that the proposed method, with a single model, can accurately estimate the number of speakers with a 96.7% probability of success while achieving state-of-the-art separation results on multi-speaker mixtures in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
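As a small illustration of the Gerschgorin disks mentioned in this abstract, the sketch below computes the disks of a Gram matrix of embedding vectors: each disk is centered at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. The abstract does not say how the disks are turned into a speaker count, so that step is not attempted; the helper names and the toy vectors are hypothetical.

```python
def gram_matrix(vectors):
    """Matrix of pairwise inner products between embedding vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def gerschgorin_disks(matrix):
    """Return (center, radius) per row: center a_ii, radius sum of |a_ij|, j != i."""
    disks = []
    for i, row in enumerate(matrix):
        center = row[i]
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        disks.append((center, radius))
    return disks

# Hypothetical embeddings: an orthogonal pair and a correlated pair.
orthogonal = [[1.0, 0.0], [0.0, 2.0]]
correlated = [[1.0, 0.0], [1.0, 1.0]]

disks_orthogonal = gerschgorin_disks(gram_matrix(orthogonal))
disks_correlated = gerschgorin_disks(gram_matrix(correlated))
```

For mutually orthogonal embeddings the Gram matrix is diagonal, so every disk collapses to a point at the squared norm of its vector. Correlated embeddings give disks with nonzero radius.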

preprint · 2020 · arXiv

Robust Time-Frequency Reconstruction by Learning Structured Sparsity

Time-frequency distributions (TFDs) play a vital role in providing descriptive analysis of non-stationary signals involved in realistic scenarios. It is well known that low time-frequency (TF) resolution and the emergence of cross-terms (CTs) are two main issues that make it difficult to analyze and interpret practical signals using TFDs. To address these issues, we propose the U-Net aided iterative shrinkage-thresholding algorithm (U-ISTA) for reconstructing a near-ideal TFD by exploiting structured sparsity in the signal TF domain. Specifically, the signal ambiguity function is first compressed, and the ISTA is then unfolded as a recurrent neural network. To account for the continuously distributed characteristics of signals, a structured sparsity constraint is incorporated into the unfolded ISTA by regarding the U-Net as an adaptive threshold block, in which structure-aware thresholds are learned from a large amount of training data to exploit the underlying dependencies among neighboring TF coefficients. The proposed U-ISTA model is trained on both non-overlapped and overlapped synthetic signals, including closely located and widely separated non-stationary components. Experimental results demonstrate that the robust U-ISTA achieves superior performance compared with state-of-the-art algorithms and attains a high TF resolution with CTs greatly reduced, even in low signal-to-noise ratio (SNR) environments.
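For orientation, here is a minimal sketch of the plain ISTA iteration that U-ISTA unfolds, assuming the standard objective 0.5 * ||y - A x||^2 + lam * ||x||_1 and a fixed scalar threshold. The abstract describes replacing that threshold with a U-Net block that learns structure-aware thresholds; the learned part is not reproduced here, and the function names and toy inputs are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, step, n_iter=100):
    """Plain ISTA with a fixed threshold step * lam.

    `step` should not exceed 1 / L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual y - A x
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step direction A^T (y - A x)
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by element-wise shrinkage
        x = [soft_threshold(x[j] + step * grad[j], step * lam) for j in range(n)]
    return x

# Toy case: with A equal to the identity, ISTA reduces to thresholding y once.
identity = [[1.0, 0.0], [0.0, 1.0]]
estimate = ista(identity, [3.0, 0.5], lam=1.0, step=1.0)
```

Each loop pass corresponds to one layer of an unfolded network, which is why a fixed number of iterations maps naturally onto a recurrent architecture.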

preprint · 2016 · arXiv

Frequency Estimation of Multiple Sinusoids with Sub-Nyquist Sampling Sequences

In some applications of frequency estimation, the frequencies of multiple sinusoids must be estimated from sub-Nyquist sampling sequences. In this paper, we propose a novel method based on subspace techniques to estimate the frequencies from undersampled data. We analyze the impact of undersampling and demonstrate that three sub-Nyquist sequences are sufficient to estimate the frequencies under certain conditions. The frequencies estimated from one sequence are unfolded in the frequency domain, and the other two sequences are then used to pick the correct frequencies from all possible candidates. Simulations illustrate the validity of the theory. Numerical results show that this method is feasible and accurate at quite low sampling rates.
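The unfold-then-screen idea in this abstract can be sketched on a toy case. The sketch assumes noiseless complex sinusoids with exactly known aliased frequencies, expressed as normalized frequencies in [0, 1) relative to the Nyquist-rate grid. The undersampling ratios 2, 3 and 5 and the two tone frequencies are hypothetical choices, and the paper's subspace estimation step is not reproduced.

```python
from fractions import Fraction

def aliases(freqs, ratio):
    """Frequencies observed after keeping every `ratio`-th sample."""
    return {(f * ratio) % 1 for f in freqs}

def unfold(observed, ratio):
    """All frequencies in [0, 1) that would alias onto the observed ones."""
    return {(g + k) / ratio for g in observed for k in range(ratio)}

def screen(candidates, observed, ratio):
    """Keep candidates whose alias at this ratio was actually observed."""
    return {f for f in candidates if (f * ratio) % 1 in observed}

# Hypothetical pair of tones chosen so that two sequences are ambiguous.
true_freqs = {Fraction(1, 4), Fraction(1, 12)}
a, b, c = 2, 3, 5
obs_a, obs_b, obs_c = (aliases(true_freqs, r) for r in (a, b, c))

candidates = unfold(obs_a, a)                  # unfold the first sequence
after_two = screen(candidates, obs_b, b)       # screen with the second
after_three = screen(after_two, obs_c, c)      # screen with the third
```

Here the second sequence alone cannot reject the spurious candidates 3/4 and 7/12, because each matches an alias produced by the other tone. The third sequence removes them and leaves only the true pair.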

preprint · 2016 · arXiv

Line Spectral Estimation Based on Compressed Sensing with Deterministic Sub-Nyquist Sampling

As an alternative to traditional sampling theory, compressed sensing allows acquiring a much smaller amount of data while still estimating the spectra of frequency-sparse signals accurately. However, compressed sensing usually requires random sampling in data acquisition, which is difficult to implement in hardware. In this paper, we propose a simple deterministic sampling scheme: sampling at three sub-Nyquist rates that have coprime undersampling ratios. Numerical experiments show that this sampling method is valid. A complex-valued multitask algorithm based on variational Bayesian inference is proposed to estimate the spectra of frequency-sparse signals after sampling. Simulations show that this method is feasible and robust at quite low sampling rates.

preprint · 2015 · arXiv

A Class of Deterministic Sensing Matrices and Their Application in Harmonic Detection

In this paper, a class of deterministic sensing matrices is constructed by selecting rows from Fourier matrices. These matrices perform better in sparse recovery than random partial Fourier matrices. The coherence and restricted isometry property of these matrices are given to evaluate their capacity as compressive sensing matrices. In general, compressed sensing requires random sampling in data acquisition, which is difficult to implement in hardware. By using these sensing matrices in harmonic detection, a deterministic sampling method is provided. The frequencies and amplitudes of the harmonic components are estimated from undersampled data. Simulations show that this undersampling method is feasible and valid in noisy environments.
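To make the coherence criterion concrete, the sketch below computes the mutual coherence of a partial Fourier matrix for a given set of rows. The row set {1, 2, 4} of the 7-point Fourier matrix is a hypothetical example (a cyclic difference set, which is known to meet the Welch lower bound) and is not claimed to be the construction used in the paper. The function names are illustrative.

```python
import cmath
from math import sqrt

def partial_fourier_coherence(n, rows):
    """Largest normalized inner product between distinct columns.

    Columns of the row-selected n-point Fourier matrix differ only by a
    column offset d, so it is enough to scan d = 1 .. n - 1.
    """
    rows = list(rows)
    k = len(rows)
    worst = 0.0
    for d in range(1, n):
        inner = sum(cmath.exp(2j * cmath.pi * r * d / n) for r in rows)
        worst = max(worst, abs(inner) / k)
    return worst

def welch_bound(n, k):
    """Lower bound on the coherence of any k x n matrix with unit-norm columns."""
    return sqrt((n - k) / (k * (n - 1)))

mu_difference_set = partial_fourier_coherence(7, [1, 2, 4])
mu_consecutive = partial_fourier_coherence(7, [0, 1, 2])
bound = welch_bound(7, 3)
```

Lower coherence is generally associated with better sparse recovery guarantees. In this example the consecutive rows give a noticeably larger coherence than the difference-set rows for the same number of measurements.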

preprint · 2015 · arXiv

Joint Frequency Estimation with Two Sub-Nyquist Sampling Sequences

In many applications of frequency estimation, the frequencies of the signals are so high that data sampled at the Nyquist rate are hard to acquire due to hardware limitations. In this paper, we propose a novel method based on subspace techniques to estimate the frequencies from two sub-Nyquist sample sequences, provided that the two undersampling ratios are relatively prime integers. We analyze the impact of undersampling and expand the estimated frequencies, which suffer from aliasing. By combining the estimates from the two sequences, the candidates that match the frequency components actually present in the signals are selected. The method requires little hardware and computation. Numerical results show that this method is valid and accurate at quite low sampling rates.
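Why relatively prime ratios matter can be seen in a single-tone sketch. It assumes a noiseless complex sinusoid whose aliased frequency is known exactly in each undersampled sequence, with normalized frequencies in [0, 1). Each aliased estimate is expanded into every frequency that could have produced it, and the two candidate sets are intersected. The function names, the ratios 3 and 4, and the tone at 7/20 are hypothetical; the paper's subspace estimator is not reproduced.

```python
from fractions import Fraction
from math import gcd

def alias(f, ratio):
    """Normalized frequency observed after keeping every `ratio`-th sample."""
    return (f * ratio) % 1

def expand(observed, ratio):
    """All frequencies in [0, 1) that alias to `observed` at this ratio."""
    return {(observed + k) / ratio for k in range(ratio)}

def joint_estimate(obs_a, a, obs_b, b):
    """Intersect the two expanded candidate sets; requires coprime ratios."""
    if gcd(a, b) != 1:
        raise ValueError("undersampling ratios must be relatively prime")
    return expand(obs_a, a) & expand(obs_b, b)

true_f = Fraction(7, 20)   # hypothetical tone
a, b = 3, 4                # hypothetical coprime undersampling ratios
estimate = joint_estimate(alias(true_f, a), a, alias(true_f, b), b)
```

With coprime ratios, two candidates can agree in both sequences only if they differ by an integer, so for a single tone the intersection contains exactly one frequency in [0, 1). With several tones, candidates from different tones can coincide, so this simple intersection is not guaranteed to be unambiguous.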
