Source author record

Mark D. Robinson

Mark D. Robinson appears in the imported research catalog. Authorship, coauthor and topic links are available, although the profile has not yet been claimed by its owner.

Researcher · Unclaimed source record

Genomics, Quantitative Methods, Applications, Digital Libraries, Other Quantitative Biology

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint · 2023 · arXiv

Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies

Metadata, often termed "data about data," is crucial for organizing, understanding, and managing vast omics datasets. It aids efficient data discovery, integration, and interpretation, enabling users to access, comprehend, and utilize data effectively. Its significance extends across scientific research, where it facilitates data reproducibility, reusability, and secondary analysis. However, numerous perceptual and technical barriers hinder the sharing of metadata among researchers. These barriers compromise the reliability of research results and hinder integrative meta-analyses of omics studies. This study highlights the key barriers to metadata sharing, including the lack of uniform standards, privacy and legal concerns, limitations in study design, limited incentives, inadequate infrastructure, and a dearth of well-trained personnel for metadata management and reuse. Proposed solutions emphasize promoting standardization, educational efforts, the role of journals and funding agencies, incentives and rewards, and improved infrastructure. If the scientific community addresses these barriers, more accurate, reliable, and impactful research outcomes are achievable.

preprint · 2014 · arXiv

Robustly detecting differential expression in RNA sequencing data using observation weights

A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g., batch effects). Often, these methods include some form of "sharing of information" across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and with simulated data that reflect properties of real datasets (e.g., the dispersion-mean trend), and we develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details are available from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/
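
The following minimal sketch shows what "observation weights" can look like. It is not the edgeR implementation. It assumes a negative binomial variance μ + φμ² and fitted means that are already available. It computes Pearson residuals and turns them into Huber-style weights with tuning constant k = 1.345, an assumed conventional choice. The function names and example numbers are hypothetical.

```python
import numpy as np


def pearson_residuals(counts, fitted_means, dispersion):
    """Pearson residuals, assuming a negative binomial variance mu + phi * mu**2."""
    y = np.asarray(counts, dtype=float)
    mu = np.asarray(fitted_means, dtype=float)
    return (y - mu) / np.sqrt(mu + dispersion * mu**2)


def huber_weights(residuals, k=1.345):
    """Weight 1 for |r| <= k and k / |r| beyond it, so extreme observations count less."""
    abs_r = np.abs(np.asarray(residuals, dtype=float))
    # np.maximum keeps the denominator >= k, avoiding division by zero when r = 0
    return np.where(abs_r <= k, 1.0, k / np.maximum(abs_r, k))


# Hypothetical example: one gene, five replicates, one suspiciously large count.
counts = [10, 12, 9, 11, 80]
fitted = [12.0] * 5  # assumed fitted means from some count model
weights = huber_weights(pearson_residuals(counts, fitted, dispersion=0.1))
print(np.round(weights, 3))
```

In an iterative scheme, such weights would be passed back into the model fit, and the residuals would then be recomputed until the weights stabilize.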

preprint · 2013 · arXiv

Agreeing to disagree, some ironies, disappointing scientific practice and a call for better: reply to "The poor performance of TMM on microRNA-Seq"

This letter is a response to a Divergent Views article entitled "The poor performance of TMM on microRNA-Seq" (Garmire and Subramaniam 2013), which was itself a response to our Divergent Views article entitled "miRNA-seq normalization comparisons need improvement" (Zhou et al. 2013). Using reproducible code examples, we showed that they used our normalization method incorrectly, and we highlighted additional concerns with their study. Here, I wish to debunk several untrue or misleading statements made by the authors (hereafter GS) in their response. Unlike GS's claims, mine are supported by R code, citations and email correspondence. I finish by making a call for better practice.
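
The dispute concerns how TMM (trimmed mean of M-values) normalization should be applied. As background only, here is a minimal sketch of the idea. Compare two libraries gene by gene on the log scale (M-values), trim the most extreme log-ratios and abundances, and take a precision-weighted mean of what remains. This is a simplification, not edgeR's `calcNormFactors`:

- trimming here uses quantiles rather than ranks;
- no reference sample is chosen automatically;
- the 30% and 5% trim fractions are assumed defaults.

```python
import numpy as np


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()

    keep = (obs > 0) & (ref > 0)  # log-ratios are undefined for zero counts
    y_o, y_r = obs[keep], ref[keep]
    p_o, p_r = y_o / n_obs, y_r / n_ref  # within-library proportions

    m = np.log2(p_o / p_r)  # M-values: log fold changes
    a = 0.5 * np.log2(p_o * p_r)  # A-values: average log abundance
    # approximate variance of each M-value; its inverse is used as a weight
    var = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)

    # drop the most extreme M- and A-values (quantile-based trimming)
    m_lo, m_hi = np.quantile(m, [log_ratio_trim, 1 - log_ratio_trim])
    a_lo, a_hi = np.quantile(a, [sum_trim, 1 - sum_trim])
    core = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)

    w = 1.0 / var[core]
    return 2 ** (np.sum(w * m[core]) / np.sum(w))


# Hypothetical composition effect: one gene soaks up most of the reads in `obs`.
ref = np.full(20, 100.0)
obs = ref.copy()
obs[0] = 10_000.0
factor = tmm_factor(obs, ref)
print(factor, obs.sum() * factor)  # factor and effective library size of `obs`
```

In this example, gene 0 dominates `obs`. After naive library-size scaling, the other genes therefore look down-regulated. The resulting factor (about 0.17) shrinks the effective library size of `obs` back to roughly that of `ref`.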

preprint · 2013 · arXiv

BayMeth: Improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach

DNA methylation (DNAme) is a critical component of the epigenetic regulatory machinery, and aberrations in DNAme patterns occur in many diseases, such as cancer. Mapping and understanding DNAme profiles offers considerable promise for reversing the aberrant states. There are several approaches to analyze DNAme, which vary widely in cost, resolution and coverage. Affinity capture and high-throughput sequencing of methylated DNA strike a good balance between the high cost of whole genome bisulphite sequencing (WGBS) and the low coverage of methylation arrays. However, existing methods cannot adequately differentiate between hypomethylation patterns and low capture efficiency, and they do not offer flexibility to integrate copy number variation (CNV). Furthermore, they provide no uncertainty estimates, which could be useful for combining data from multiple protocols or for propagating uncertainty into downstream analysis. We propose an empirical Bayes framework that uses a fully methylated (i.e., SssI-treated) control sample to transform observed read densities into regional methylation estimates. In our model, inefficient capture can be distinguished from low methylation levels by means of larger posterior variances. Furthermore, we can integrate CNV by introducing a multiplicative offset into our Poisson model framework. Notably, our model offers analytic expressions for the mean and variance of the methylation level and thus is fast to compute. Our algorithm outperforms existing approaches in terms of bias, mean-squared error and coverage probabilities, as illustrated on multiple reference datasets. Although our method provides advantages even without the SssI control, its incorporation yields considerable improvement. Our method can be applied to methylated DNA affinity enrichment assays (e.g., MBD-seq, MeDIP-seq), and a software implementation is available in the Bioconductor Repitools package.
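
The abstract attributes the method's speed to closed-form posterior moments. It also describes copy number entering as a multiplicative offset in a Poisson model. The sketch below illustrates only those two ideas, using the simplest conjugate pair: a Gamma prior on a Poisson rate. It is not BayMeth's model, which estimates methylation levels with the help of a fully methylated SssI control. The prior parameters, function names and example counts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GammaPosterior:
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        return self.shape / self.rate**2

    @property
    def cv(self) -> float:
        # coefficient of variation: sd / mean = 1 / sqrt(shape)
        return 1.0 / math.sqrt(self.shape)


def update(count: int, offset: float, prior_shape: float = 1.0, prior_rate: float = 1.0) -> GammaPosterior:
    """Conjugate update for y ~ Poisson(offset * lam) with lam ~ Gamma(prior_shape, prior_rate).

    The multiplicative offset (e.g. a relative copy number) scales the expected
    count and simply adds to the rate parameter of the posterior.
    """
    return GammaPosterior(prior_shape + count, prior_rate + offset)


# Hypothetical regions: same offset, very different read counts.
deep = update(count=200, offset=2.0)
shallow = update(count=2, offset=2.0)
print(deep.mean, deep.variance, deep.cv)
print(shallow.mean, shallow.variance, shallow.cv)
```

Both moments come out in closed form, with no sampling. The region with few reads has a posterior that is much wider relative to its mean (larger coefficient of variation). This is the general sense in which sparse evidence shows up as greater posterior uncertainty.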

preprint · 2013 · arXiv

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations), while optionally adjusting for other systematic factors that affect the data collection process. There are a number of subtle yet critical aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a "state-of-the-art" computational and statistical RNA-seq differential expression analysis workflow, largely based on the free, open-source R language and Bioconductor software, in particular two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be under 1 hour, with computation time under 1 day on a standard desktop PC.
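
A common preliminary quality-control step in count-based workflows is to remove features with very low counts before model fitting. The sketch below shows one typical filter of that kind, based on counts per million (CPM). It is an illustration under assumptions, not a transcription of the protocol. The thresholds `min_cpm=1` and `min_samples=2` are hypothetical defaults; in practice, `min_samples` is often tied to the size of the smallest experimental group.

```python
import numpy as np


def cpm(counts):
    """Counts per million: divide each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6


def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes (rows) whose CPM exceeds min_cpm in at least min_samples samples."""
    keep = (cpm(counts) > min_cpm).sum(axis=1) >= min_samples
    return np.asarray(counts)[keep], keep


# Hypothetical 3-gene x 4-sample count matrix (rows = genes, columns = samples).
counts = np.array([
    [1_000_000, 1_000_000, 1_000_000, 1_000_000],
    [100, 100, 100, 100],
    [0, 0, 0, 1],
])
filtered, keep = filter_low_counts(counts)
print(keep)
```

Filtering on CPM rather than raw counts keeps the threshold comparable across samples with different sequencing depths.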
