Source author record

Ali Madani

Ali Madani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Biomolecules cond-mat.mtrl-sci Quantitative Methods

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

ProGen2: Exploring the Boundaries of Protein Language Models

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters and trained on different sequence datasets drawn from over a billion proteins from genomic, metagenomic, and immune repertoire databases. ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences, generating novel viable sequences, and predicting protein fitness without additional finetuning. As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model. We release the ProGen2 models and code at https://github.com/salesforce/progen.

preprint2020arXiv

ProGen: Language Modeling for Protein Generation

Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences conditioned on taxonomic and keyword tags such as molecular function and cellular component. This provides ProGen with an unprecedented range of evolutionary sequence diversity and allows it to generate with fine-grained control as demonstrated by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy.

preprint2013arXiv

Bottom-up Graphene Nanoribbon Field-Effect Transistors

Recently developed processes have enabled bottom-up chemical synthesis of graphene nanoribbons (GNRs) with precise atomic structure. These GNRs are ideal candidates for electronic devices because of their uniformity, extremely narrow width below 1 nm, atomically perfect edge structure, and desirable electronic properties. Here, we demonstrate nanoscale chemically synthesized GNR field-effect transistors, made possible by development of a new layer transfer process. We observe strong environmental sensitivity and unique transport behavior characteristic of sub-1nm width GNRs.