Protein sequence design with deep generative models.
Zachary Wu, Kadina E Johnston, Frances H Arnold, Kevin K Yang
Author Information
Zachary Wu: Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA.
Kadina E Johnston: Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA.
Frances H Arnold: Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA; Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA.
Kevin K Yang: Microsoft Research New England, 1 Memorial Drive, Cambridge, MA 02142, USA. Electronic address: yang.kevin@microsoft.com.
Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.