Researcher profile

Bin Ma

Bin Ma contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
16works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

16 published item(s)

preprint2026arXiv

A Highly Magnetic Ultra Massive White Dwarf with a 23-minute Rotation Period

We present a physical characterization of TMTS J00063798+3104160 (J0006), a rapidly rotating,ultra-massive white dwarf (WD) identified in high-cadence light curves from the Tsinghua University-Ma Huateng Telescope for Survey (TMTS). A coherent 23-minute periodicity is detected in TMTS, TESS, and ZTF photometry. A time series of low-resolution spectra with the Keck-I 10 m telescope reveals broad, shallow hydrogen absorption features indicative of an extreme magnetic field and shows no evidence for radial-velocity variations. Atmospheric modeling yields a magnetic field strength of $\sim$ 250 MG, while Gaia astrometry and photometry imply a mass of 1.06 $\pm$ 0.01 M$_{\odot}$. A significant infrared excess is detected in the WISE W1 band and is well fitted by a 550 K blackbody, likely arising from residual material of a merger. We interpret the 23-minute photometric modulation as the rotation period of an isolated, massive WD formed likely through the merger of a double WD binary. With one of the shortest rotation periods known among candidate merger remnants and with constraints from a deep Einstein Probe X-ray nondetection, J0006 provides a rare and important observational window into the poorly explored intermediate stages of post-merger evolution.

preprint2022arXiv

Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings

Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development. Unfortunately, NMR data can cost thousands of dollars per sample to collect and it can take a specialist weeks to assign the observed resonances to specific chemical groups. There has thus been growing interest in the NMR community to use deep learning to automate NMR data annotation. Due to similarities between NMR and audio data, we propose that methods used in acoustic signal processing can be applied to NMR as well. Using a simulated amino acid dataset, we show that by swapping out filter banks with a trainable convolutional encoder, acoustic signal embeddings from speaker verification models can be used for amino acid classification in 2D NMR spectra by treating each amino acid as a unique speaker. On an NMR dataset comparable in size with of 46 hours of audio, we achieve a classification performance of 97.7% on a 20-class problem. We also achieve a 23% relative improvement by using an acoustic embedding model compared to an existing NMR-based model.

preprint2022arXiv

End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension based on the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilized the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.

preprint2022arXiv

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Noise robustness in keyword spotting remains a challenge as many models fail to overcome the heavy influence of noises, causing the deterioration of the quality of feature embeddings. We proposed a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noises. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen out-of-domain noises and unseen variant noise SNRs. This indicates the significance of our work with the overall refinement in noise robustness.

preprint2022arXiv

Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization

Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies. Recently, disentangled representation learning methods that decompose covariates into three latent factors, including instrumental, confounding and adjustment factors, have witnessed great success in treatment effect estimation. However, it remains an open problem how to learn the underlying disentangled factors precisely. Specifically, previous methods fail to obtain independent disentangled factors, which is a necessary condition for identifying treatment effect. In this paper, we propose Disentangled Representations for Counterfactual Regression via Mutual Information Minimization (MIM-DRCFR), which uses a multi-task learning framework to share information when learning the latent factors and incorporates MI minimization learning criteria to ensure the independence of these factors. Extensive experiments including public benchmarks and real-world industrial user growth datasets demonstrate that our method performs much better than state-of-the-art methods.

preprint2022arXiv

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Recent development of speech processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle for the advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by 8-channel microphone array as well as near-field data collected by headset microphone. Each meeting session is composed of 2-4 speakers with different speaker overlap ratio, recorded in rooms with different size. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dateset, challenge rules, evaluation methods and baseline systems.

preprint2022arXiv

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech technologies. The M2MeT challenge has particularly set up two tracks, speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Mandarin meeting speech data with manual annotation, including far-field data collected by 8-channel microphone array as well as near-field data collected by each participants' headset microphone. We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions.

preprint2021arXiv

Cloud Cover and Aurora Contamination at Dome A in 2017 from KLCAM

Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz. Quantitative site testing is still needed to confirm the site's properties. In this paper, we present a statistical analysis of cloud cover and aurora contamination from the Kunlun Cloud and Aurora Monitor (KLCAM). KLCAM is an automatic, unattended all-sky camera aiming for long-term monitoring of the usable observing time and optical sky background at Dome~A. It was installed at Dome~A in January 2017, worked through the austral winter, and collected over 47,000 images over 490 days. A semi-quantitative visual data analysis of cloud cover and auroral contamination was carried out by five individuals. The analysis shows that the night sky was free of cloud for 83 per cent of the time, which ranks Dome~A highly in a comparison with other observatory sites. Although aurorae were detected somewhere on an image for nearly 45 per cent of the time, the strongest auroral emission lines can be filtered out with customized filters.

preprint2021arXiv

Sampling Subgraph Network with Application to Graph Classification

Graphs are naturally used to describe the structures of various real-world systems in biology, society, computer science etc., where subgraphs or motifs as basic blocks play an important role in function expression and information processing. However, existing research focuses on the basic statistics of certain motifs, largely ignoring the connection patterns among them. Recently, a subgraph network (SGN) model is proposed to study the potential structure among motifs, and it was found that the integration of SGN can enhance a series of graph classification methods. However, SGN model lacks diversity and is of quite high time complexity, making it difficult to widely apply in practice. In this paper, we introduce sampling strategies into SGN, and design a novel sampling subgraph network model, which is scale-controllable and of higher diversity. We also present a hierarchical feature fusion framework to integrate the structural features of diverse sampling SGNs, so as to improve the performance of graph classification. Extensive experiments demonstrate that, by comparing with the SGN model, our new model indeed has much lower time complexity (reduced by two orders of magnitude) and can better enhance a series of graph classification methods (doubling the performance enhancement).

preprint2021arXiv

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages. In this paper, we build upon the neural text-to-speech (TTS) model, i.e., FastSpeech, and LPCNet neural vocoder to design a new cross-lingual VC framework named FastSpeech-VC. We address the mismatches of the phonetic set and the speech prosody by applying Phonetic PosteriorGrams (PPGs), which have been proved to bridge across speaker and language boundaries. Moreover, we add normalized logarithm-scale fundamental frequency (Log-F0) to further compensate for the prosodic mismatches and significantly improve naturalness. Our experiments on English and Mandarin languages demonstrate that with only mono-lingual corpus, the proposed FastSpeech-VC can achieve high quality converted speech with mean opinion score (MOS) close to the professional records while maintaining good speaker similarity. Compared to the baselines using Tacotron2 and Transformer TTS models, the FastSpeech-VC can achieve controllable converted speech rate and much faster inference speed. More importantly, the FastSpeech-VC can easily be adapted to a speaker with limited training utterances.

preprint2020arXiv

AstroCatR: a Mechanism and Tool for Efficient Time Series Reconstruction of Large-Scale Astronomical Catalogues

Time series data of celestial objects are commonly used to study valuable and unexpected objects such as extrasolar planets and supernova in time domain astronomy. Due to the rapid growth of data volume, traditional manual methods are becoming extremely hard and infeasible for continuously analyzing accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series datasets. To support the high-performance parallel processing of large-scale datasets, AstroCatR uses the extract-transform-load (ETL) preprocessing module to create sky zone files and balance the workload. The matching module uses the overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or be transformed other into formats as needed. Simultaneously, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from The three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3X faster than methods using relational database management systems at matching massive catalogues.

preprint2020arXiv

Automation of the AST3 optical sky survey from Dome~A, Antarctica

The 0.5\,m Antarctic Survey Telescopes (AST3) were designed for time-domain optical/infrared astronomy. They are located in Dome~A, Antarctica, where they can take advantage of the continuous dark time during winter. Since the site is unattended in winter, everything for the operation, from observing to data reduction, had to be fully automated. Here, we present a brief overview of the AST3 project and some of its unique characteristics due to its location in Antarctica. We summarise the various components of the survey, including the customized hardware and software, that make complete automation possible.

preprint2020arXiv

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under cross-lingual transfer learning setting. To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data due to the LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained using limited labeled data. Starting from this, we obtain further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference compared to both LSTM and Transformer architectures.

preprint2020arXiv

Night-time measurements of astronomical seeing at Dome A in Antarctica

Seeing, the angular size of stellar images blurred by atmospheric turbulence, is a critical parameter used to assess the quality of astronomical sites. Median values at the best mid-latitude sites are generally in the range of 0.6--0.8\,arcsec. Sites on the Antarctic plateau are characterized by comparatively-weak turbulence in the free-atmosphere above a strong but thin boundary layer. The median seeing at Dome C is estimated to be 0.23--0.36 arcsec above a boundary layer that has a typical height of 30\,m. At Dome A and F, the only previous seeing measurements were made during daytime. Here we report the first direct measurements of night-time seeing at Dome A, using a Differential Image Motion Monitor. Located at a height of just 8\,m, it recorded seeing as low as 0.13\,arcsec, and provided seeing statistics that are comparable to those for a 20\,m height at Dome C. It indicates that the boundary layer was below 8\,m 31\% of the time. At such times the median seeing was 0.31\,arcsec, consistent with free-atmosphere seeing. The seeing and boundary layer thickness are found to be strongly correlated with the near-surface temperature gradient. The correlation confirms a median thickness of approximately 14\,m for the boundary layer at Dome A, as found from a sonic radar. The thinner boundary layer makes it less challenging to locate a telescope above it, thereby giving greater access to the free-atmosphere.

preprint2020arXiv

Optical turbulence at Ali, China -- Results from the first year of lunar scintillometer observations

The location of an astronomical observatory is a key factor that affects its scientific productivity. The best astronomical sites are generally those found at high altitudes. Several such sites in western China and the Tibetan plateau are presently under development for astronomy. One of these is Ali, which at over 5000 m is one of the highest astronomical sites in the world. In order to further investigate the astronomical potential of Ali, we have installed a lunar scintillometer, for the primary purpose of profiling atmospheric turbulence. This paper describes the instrument and technique, and reports results from the first year of observations. We find that ground layer (GL) turbulence at Ali is remarkably weak and relatively thin. The median seeing, from turbulence in the range 11- 500 m above ground is 0.34 arcsec, with seeing better than 0.26 arcsec occurring 25 per cent of the time. Under median conditions, half of the GL turbulence lies below a height of 62 m. These initial results, and the high altitude and relatively low temperatures, suggest that Ali could prove to be an outstanding site for ground-based astronomy.

preprint2019arXiv

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR using only monolingual data. Our method encourages the distributions of output token embeddings of monolingual languages to be similar, and hence, promotes the ASR model to easily code-switch between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former will enforce output embeddings of monolingual languages to possess similar distributions, while the later simply brings the centroids of two distributions to be close to each other. Experimental results demonstrate high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on Mandarin-English code-switching ASR task.