Source author record

Pengcheng Guo

Pengcheng Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Sound, eess.AS, astro-ph.EP, astro-ph.SR, Computation and Language, astro-ph

Catalog footprint

What is connected

12 works

6 topics

4 close collaborators


preprint, 2022, arXiv

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate this historical information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on the historical linguistic information through a conditional decoder framework. We show the effectiveness of our proposed method on several open-source dialogue corpora, where it consistently improves performance over utterance-level Transformer-based ASR models.

preprint, 2022, arXiv

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

General accent recognition (AR) models tend to extract low-level information directly from spectra and, as a result, significantly overfit to speakers or channels. Since an accent can be regarded as a series of shifts relative to native pronunciation, distinguishing accents becomes an easier task with the accent shift as input. However, without a native utterance to serve as an anchor, estimating the accent shift is difficult. In this paper, we propose linguistic-acoustic similarity based accent shift (LASAS) for AR tasks. For an accented speech utterance, after mapping the corresponding text vector to multiple accent-associated spaces as anchors, its accent shift can be estimated by the similarities between the acoustic embedding and those anchors. Then, we concatenate the accent shift with a dimension-reduced text vector to obtain a linguistic-acoustic bimodal representation. Compared with a pure acoustic embedding, the bimodal representation is richer and clearer because it takes full advantage of both linguistic and acoustic information, which can effectively improve AR performance. Experiments on the Accented English Speech Recognition Challenge (AESRC) dataset show that our method achieves 77.42% accuracy on the test set, a 6.94% relative improvement over a competitive system in the challenge.
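The pipeline described in this abstract (text vector mapped to several anchor spaces, similarity to the acoustic embedding, concatenation with a reduced text vector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the dimensions, the random linear maps `anchor_maps` and `reduce_map`, and the choice of cosine similarity as the similarity measure are all hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical sizes; the abstract does not give any dimensions.
D_TEXT, D_AC, N_SPACES, D_RED = 16, 8, 4, 6

def rand_matrix(rows, cols):
    """Random matrix standing in for a learned linear map."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply a rows x cols matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def cosine(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

# One hypothetical map per accent-associated space, plus a reduction map.
anchor_maps = [rand_matrix(D_AC, D_TEXT) for _ in range(N_SPACES)]
reduce_map = rand_matrix(D_RED, D_TEXT)

def bimodal_frame(text_vec, acoustic_vec):
    # Map the text vector into each accent-associated space to get anchors.
    anchors = [matvec(m, text_vec) for m in anchor_maps]
    # Accent shift: similarity between the acoustic embedding and each anchor.
    shift = [cosine(a, acoustic_vec) for a in anchors]
    # Dimension-reduced text vector.
    reduced = matvec(reduce_map, text_vec)
    # Concatenate into the linguistic-acoustic bimodal representation.
    return shift + reduced

text_vec = [random.gauss(0, 1) for _ in range(D_TEXT)]
acoustic_vec = [random.gauss(0, 1) for _ in range(D_AC)]
feature = bimodal_frame(text_vec, acoustic_vec)
```

In this sketch the first `N_SPACES` entries of `feature` are the accent shift and the remaining `D_RED` entries are the reduced text vector; in a trained system the maps would be learned rather than random.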

preprint, 2022, arXiv

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Recent developments in speech processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition, have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to the advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction to the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.

preprint, 2022, arXiv

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and most challenging scenarios for speech technologies. The M2MeT challenge sets up two tracks, speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Mandarin meeting speech data with manual annotation, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. We briefly describe the released dataset, track setups and baselines, and summarize the challenge results and the major techniques used in the submissions.

preprint, 2022, arXiv

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which cover a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition (OCR) based method is introduced to generate the audio/text segmentation candidates for the YouTube data from its corresponding video captions, while a high-quality ASR transcription system is used to generate audio/text pair candidates for the Podcast data. Then we propose a novel end-to-end label error detection approach to further validate and filter the candidates. We also provide three manually labelled high-quality test sets along with WenetSpeech for evaluation: Dev, for cross-validation purposes in training; Test_Net, collected from the Internet for a matched test; and Test_Meeting, recorded from real meetings for a more challenging mismatched test. Baseline systems trained with WenetSpeech are provided for three popular speech recognition toolkits, namely Kaldi, ESPnet, and WeNet, and recognition results on the three test sets are also provided as benchmarks. To the best of our knowledge, WenetSpeech is currently the largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition.

preprint, 2020, arXiv

Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition

Speaker recognition is a popular topic in biometric authentication, and many deep learning approaches have achieved extraordinary performance. However, it has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we aim to exploit this weakness to perform targeted adversarial attacks against an x-vector based speaker recognition system. We propose to generate inaudible adversarial perturbations that achieve targeted white-box attacks on a speaker recognition system, based on the psychoacoustic principle of frequency masking. Specifically, we constrain the perturbation to lie under the masking threshold of the original audio, instead of using a common l_p norm to measure the perturbations. Experiments on the Aishell-1 corpus show that our approach yields up to a 98.5% attack success rate against speaker targets of arbitrary gender, while remaining indistinguishable to listeners. Furthermore, we also achieve an effective speaker attack when applying the proposed approach to a completely irrelevant waveform, such as music.

preprint, 2020, arXiv

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural sequence-to-sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as extracting multiple sequential sources from a mixture sequence. We extend the standard sequence-to-sequence model to a conditional multi-sequence model, which explicitly models the relevance between multiple output sequences with the probabilistic chain rule. Based on this extension, our model can conditionally infer output sequences one by one by making use of both the input and previously estimated contextual output sequences. This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer a variable number of output sequences. We take speech data as a primary test field to evaluate our methods, since observed speech data is often composed of multiple sources due to the superposition principle of sound waves. Experiments on several different tasks, including speech separation and multi-speaker speech recognition, show that our conditional multi-sequence models lead to consistent improvements over the conventional non-conditional models.
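The chain-rule factorization mentioned in this abstract can be written out as follows. The notation is an assumption chosen for illustration, not taken from the paper: $X$ is the input mixture sequence, $Y_1, \dots, Y_N$ are the output sequences, and $N$ is the (variable) number of outputs.

```latex
p(Y_1, \dots, Y_N \mid X) \;=\; \prod_{i=1}^{N} p\bigl(Y_i \mid X,\, Y_1, \dots, Y_{i-1}\bigr)
```

Read this way, each output sequence is inferred conditioned on the input and on all previously estimated outputs, whereas a non-conditional model would treat each factor as $p(Y_i \mid X)$ alone.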

preprint, 2016, arXiv

Very Low-Mass Stellar and Substellar Companions to Solar-like Stars From MARVELS VI: A Giant Planet and a Brown Dwarf Candidate in a Close Binary System HD 87646

We report the detections of a giant planet (MARVELS-7b) and a brown dwarf candidate (MARVELS-7c) around the primary star in the close binary system HD 87646. To the best of our knowledge, it is the first close binary system discovered with more than one substellar circum-primary companion. The detection of this giant planet was accomplished using the first multi-object Doppler instrument (KeckET) at the Sloan Digital Sky Survey (SDSS) telescope. Subsequent radial velocity observations using ET at Kitt Peak National Observatory, HRS at HET, the "Classic" spectrograph at the Automatic Spectroscopic Telescope at Fairborn Observatory, and MARVELS from SDSS-III confirmed this giant planet discovery and revealed the existence of a long-period brown dwarf in this binary. HD 87646 is a close binary with a separation of $\sim22$ AU between the two stars, estimated using the Hipparcos catalogue and our newly acquired AO image from PALAO on the 200-inch Hale Telescope at Palomar. The primary star in the binary, HD 87646A, has Teff = 5770$\pm$80K, log(g)=4.1$\pm$0.1 and [Fe/H] = $-0.17\pm0.08$. The derived minimum masses of the two substellar companions of HD 87646A are 12.4$\pm$0.7M$_{\rm Jup}$ and 57.0$\pm3.7$M$_{\rm Jup}$. The periods are 13.481$\pm$0.001 days and 674$\pm$4 days, and the measured eccentricities are 0.05$\pm$0.02 and 0.50$\pm$0.02, respectively. Our dynamical simulations show the system is stable if the binary orbit has a large semi-major axis and a low eccentricity, which can be verified with future astrometry observations.

preprint, 2011, arXiv

Eclipsing Binary Science Via the Merging of Transit and Doppler Exoplanet Survey Data - A Case Study With the MARVELS Pilot Project and SuperWASP

Exoplanet transit and Doppler surveys discover many binary stars during their operation that can be used to conduct a variety of ancillary science. Specifically, eclipsing binary stars can be used to study the stellar mass-radius relationship and to test predictions of theoretical stellar evolution models. By cross-referencing 24 binary stars found in the MARVELS Pilot Project with SuperWASP photometry, we find two new eclipsing binaries, TYC 0272-00458-1 and TYC 1422-01328-1, which we use as case studies to develop a general approach to eclipsing binaries in survey data. TYC 0272-00458-1 is a single-lined spectroscopic binary for which we calculate a mass of the secondary and radii for both components using reasonable constraints on the primary mass through several different techniques. For a primary mass of M_1 = 0.92 +/- 0.1 M_solar, we find M_2 = 0.610 +/- 0.036 M_solar, R_1 = 0.932 +/- 0.076 R_solar and R_2 = 0.559 +/- 0.102 R_solar, and find that both stars have masses and radii consistent with model predictions. TYC 1422-01328-1 is a triple-component system for which we can directly measure the masses and radii of the eclipsing pair. We find that the eclipsing pair consists of an evolved primary star (M_1 = 1.163 +/- 0.034 M_solar, R_1 = 2.063 +/- 0.058 R_solar) and a G-type dwarf secondary (M_2 = 0.905 +/- 0.067 M_solar, R_2 = 0.887 +/- 0.037 R_solar). We provide the framework necessary to apply this analysis to much larger datasets.

preprint, 2010, arXiv

Discovery of a Low-Mass Companion to a Metal-Rich F Star with the MARVELS Pilot Project

We report the discovery of a low-mass companion orbiting the metal-rich, main sequence F star TYC 2949-00557-1 during the MARVELS (Multi-object APO Radial Velocity Exoplanet Large-area Survey) Pilot Project. The host star has an effective temperature T_eff = 6135 +/- 40 K, log(g) = 4.4 +/- 0.1 and [Fe/H] = 0.32 +/- 0.01, indicating a mass of M = 1.25 +/- 0.09 M_\odot and R = 1.15 +/- 0.15 R_\odot. The companion has an orbital period of 5.69449 +/- 0.00023 days and straddles the hydrogen burning limit with a minimum mass of 64 M_J, and may thus be an example of the rare class of brown dwarfs orbiting at distances comparable to those of "Hot Jupiters." We present relative photometry that demonstrates the host star is photometrically stable at the few millimagnitude level on time scales of hours to years, and rules out transits for a companion of radius greater than 0.8 R_J at the 95% confidence level. Tidal analysis of the system suggests that the star and companion are likely in a double synchronous state where both rotational and orbital synchronization have been achieved. This is the first low-mass companion detected with a multi-object, dispersed, fixed-delay interferometer.

preprint, 2010, arXiv

MARVELS-1b: A Short-Period, Brown Dwarf Desert Candidate from the SDSS-III MARVELS Planet Search

We present a new short-period brown dwarf candidate around the star TYC 1240-00945-1. This candidate was discovered in the first year of the Multi-object APO Radial Velocity Exoplanets Large-area Survey (MARVELS), which is part of the third phase of the Sloan Digital Sky Survey (SDSS-III), and we designate the brown dwarf as MARVELS-1b. MARVELS uses the technique of dispersed fixed-delay interferometry to simultaneously obtain radial velocity measurements for 60 objects per field using a single, custom-built instrument that is fiber fed from the SDSS 2.5-m telescope. From our 20 radial velocity measurements spread over a ~370 d time baseline, we derive a Keplerian orbital fit with semi-amplitude K=2.533+/-0.025 km/s, period P=5.8953+/-0.0004 d, and eccentricity consistent with circular. Independent follow-up radial velocity data confirm the orbit. Adopting a mass of 1.37+/-0.11 M_Sun for the slightly evolved F9 host star, we infer that the companion has a minimum mass of 28.0+/-1.5 M_Jup, a semimajor axis 0.071+/-0.002 AU assuming an edge-on orbit, and is probably tidally synchronized. We find no evidence for coherent intrinsic variability of the host star at the period of the companion at levels greater than a few millimagnitudes. The companion has an a priori transit probability of ~14%. Although we find no evidence for transits, we cannot definitively rule them out for companion radii ~<1 R_Jup.
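As a rough cross-check of how the quoted minimum mass and semimajor axis follow from K, P and the adopted stellar mass, the sketch below applies the standard circular-orbit radial velocity relation and Kepler's third law. It is not the authors' analysis: the physical constants are rounded reference values, the function names are hypothetical, and uncertainties are ignored.

```python
import math

# Rounded physical constants in SI units (assumed reference values)
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
M_JUP = 1.898e27     # Jupiter mass, kg
AU = 1.496e11        # astronomical unit, m
DAY = 86400.0        # seconds per day

# Central values quoted in the abstract
K = 2.533e3              # radial velocity semi-amplitude, m/s
P = 5.8953 * DAY         # orbital period, s
M_STAR = 1.37 * M_SUN    # adopted host star mass, kg

def minimum_mass(k, period, m_star, iterations=20):
    """Solve m sin i = K (P / (2 pi G))^(1/3) (M + m)^(2/3) for a
    circular orbit by fixed-point iteration, starting from m = 0."""
    m = 0.0
    for _ in range(iterations):
        m = k * (period / (2 * math.pi * G)) ** (1 / 3) * (m_star + m) ** (2 / 3)
    return m

def semimajor_axis(period, total_mass):
    """Kepler's third law: a^3 = G M_total P^2 / (4 pi^2)."""
    return (G * total_mass * period ** 2 / (4 * math.pi ** 2)) ** (1 / 3)

m_min = minimum_mass(K, P, M_STAR)
a = semimajor_axis(P, M_STAR + m_min)
print(f"minimum mass: {m_min / M_JUP:.1f} M_Jup, semimajor axis: {a / AU:.3f} AU")
```

With these inputs the results should land close to the abstract's quoted values, within its stated uncertainties; small differences come from the rounded constants.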

preprint, 2006, arXiv

The First Extrasolar Planet Discovered with a New Generation High Throughput Doppler Instrument

We report the detection of the first extrasolar planet, ET-1 (HD 102195b), using the Exoplanet Tracker (ET), a new generation Doppler instrument. The planet orbits HD 102195, a young star with solar metallicity that may be part of the local association. The planet imparts radial velocity variability to the star with a semiamplitude of $63.4\pm2.0$ m s$^{-1}$ and a period of 4.11 days. The planetary minimum mass ($m \sin i$) is $0.488\pm0.015$ $M_J$.

