Biotech labs are using AI inspired by DALL-E to invent new drugs
The explosion in text-to-image AI models such as OpenAI’s DALL-E 2 (programs that can generate images of almost any subject you request) has caused ripples in the creative industries. These programs can produce striking images on demand and have been taken up by many industries, including filmmaking and fashion.
The technology behind these programs is also making waves in biotech labs, which are using this type of generative AI, known as a diffusion model, to create designs for new types of protein that have never been seen in nature.
Today, two labs separately announced programs that use diffusion models to generate designs for novel proteins with more precision than ever before. Generate Biomedicines, a Boston-based startup, has revealed a program named Chroma, which the company calls the “DALL-E 2 of biology.”
At the same time, a team from the University of Washington, led by biologist David Baker, has created a comparable program called RoseTTAFold Diffusion. Baker and his colleagues show that their model can produce precise designs for new proteins that can then be brought to life in the lab. Brian Trippe, a co-developer of RoseTTAFold Diffusion, says that they are generating proteins with no similarity to any existing proteins.
These protein generators can be directed to create designs for proteins with particular properties, such as size, shape, or function. This makes it possible to create new proteins that do specific jobs on demand. Researchers believe this will eventually lead to new, more effective drugs. “We can discover in minutes something that took evolution millions of years to do,” says Gevorg Grigoryan, CEO of Generate Biomedicines.
“What is remarkable about this work is the generation of proteins according to desired constraints,” says Ava Amini, a biophysicist at Microsoft Research in Cambridge, Massachusetts.
Proteins are the fundamental building blocks of living systems. Animals use proteins to digest food, contract muscles, sense light, and drive the immune system. Proteins also play an important role in the development of disease.
Proteins are therefore prime targets for drugs, and many of today’s newest drugs are themselves protein-based. Grigoryan says that nature uses proteins for almost everything. “The promise of therapeutic interventions is really immense.” But drug designers currently have to draw from a list made up of natural proteins. The goal of protein generation is to expand that list with a nearly unlimited pool of computer-designed proteins.
Computational techniques for designing proteins are not new. However, previous methods were slow and not very good at designing large proteins or protein complexes, the molecular machines made up of multiple proteins linked together. Such proteins are often crucial in the treatment of diseases.
The two programs announced today are also not the first use of diffusion models for protein generation. A few studies from Amini and others in the last few weeks have shown that diffusion models are a promising technique for protein generation, but these were only proof-of-concept prototypes. Building on this work, Chroma and RoseTTAFold Diffusion are the first programs to produce precise designs for a wide range of proteins.
Namrata Anand, who co-developed one of the first diffusion models for protein generation in May 2022, thinks the main significance of Chroma and RoseTTAFold Diffusion is that they have taken the technique and supersized it, training on more data with more computers. She says it may be fair to call this more like DALL-E because of how they have scaled things up.
Diffusion models are neural networks trained to remove “noise,” meaning random perturbations added to data. In image generation, a diffusion model can transform a random collection of pixels into a recognizable image. In Chroma, noise is added by unraveling the amino acid chains from which a protein is formed. Given a random collection of these chains, Chroma attempts to put them together to make a protein. Guided by specified constraints, Chroma can create novel proteins with specific properties.
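To make the idea of adding and removing noise concrete, here is a minimal toy sketch in Python. It is not Chroma’s or RoseTTAFold Diffusion’s method: the linear schedule, the function names, and the four-number “data” are all assumptions made for illustration, and the noise estimate that a trained neural network would supply is replaced by the true noise so that the arithmetic can be checked.

```python
import math
import random

def signal_kept(t, num_steps):
    """Hypothetical linear schedule: 1.0 at t=0 (clean data), 0.0 at t=num_steps (pure noise)."""
    return 1.0 - t / num_steps

def add_noise(x0, t, num_steps, rng):
    """Forward process: blend clean data with Gaussian noise."""
    a = signal_kept(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt, eps_estimate, t, num_steps):
    """Undo the blend given an estimate of the noise (requires t < num_steps).

    In a real diffusion model, eps_estimate is predicted by a trained network.
    """
    a = signal_kept(t, num_steps)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_estimate)]

rng = random.Random(0)
clean = [0.0, 1.0, 2.0, 3.0]  # stand-in for pixel values or chain coordinates
noisy, true_eps = add_noise(clean, t=5, num_steps=10, rng=rng)

# Assumption for illustration: a "perfect" denoiser that knows the true noise.
recovered = remove_noise(noisy, true_eps, t=5, num_steps=10)
print(noisy)
print(recovered)
```

With a perfect noise estimate the clean data comes back exactly. A trained model only approximates the noise, which is why real systems denoise gradually over many small steps rather than in one jump.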
RoseTTAFold Diffusion takes a different approach, though the end result is similar. It starts with a more complicated structure. Another key difference is that RoseTTAFold Diffusion uses information about how the pieces of a protein fit together, provided by a separate neural network trained to predict protein structure (as DeepMind’s AlphaFold does). This helps to guide the overall generative process.
Generate Biomedicines and Baker’s team both show impressive results. They can generate proteins with multiple degrees of symmetry, including hexagonal, triangular, and circular proteins. To illustrate the versatility of its program, Generate Biomedicines generated proteins shaped like the 26 letters of the Latin alphabet and the numerals 0 to 10. Both teams can also create pieces of proteins that match existing structures.
Most of these demonstrated structures would serve no purpose in practice. However, a protein’s function depends on its structure, so it is important to be able to create different structures on request.
Creating strange designs on a computer is one thing. The ultimate goal, however, is to turn these designs into real proteins. To check whether Chroma’s designs could actually be made, Generate Biomedicines took the sequences for some of the designs (the amino acid strings that make up the proteins) and ran them through a separate AI program. It found that 55% of them were predicted to fold into the structure generated by Chroma, which suggests that these are designs for viable proteins.
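The kind of check described above can be sketched as a simple tally: for each design, compare the designed structure with the structure predicted from its sequence and count how many agree. The article does not say how agreement was measured, so the metric (root-mean-square deviation between already superimposed coordinates), the threshold, and the toy coordinates below are all assumptions for illustration, not Generate Biomedicines’ procedure or data.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two structures have already been superimposed.
    """
    assert len(a) == len(b)
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))

def fraction_consistent(pairs, threshold):
    """Share of (designed, predicted) pairs whose RMSD is at or below the threshold."""
    hits = sum(1 for designed, predicted in pairs if rmsd(designed, predicted) <= threshold)
    return hits / len(pairs)

# Toy, made-up coordinates: one prediction matches its design, one is far off.
design = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
good_prediction = list(design)
bad_prediction = [(x + 5.0, y, z) for x, y, z in design]

pairs = [(design, good_prediction), (design, bad_prediction)]
share = fraction_consistent(pairs, threshold=2.0)  # hypothetical cutoff
print(share)  # 0.5
```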
Baker’s team ran a similar test. But Baker and his colleagues went further than Generate Biomedicines in evaluating their model: they have made some of RoseTTAFold Diffusion’s designs in their laboratory. (Generate Biomedicines says that it is also performing lab tests but is not yet ready to share the results.) “This is more than just proof of concept,” says Trippe. “We’re actually using this to make really great proteins.”
For Baker, the headline result is the generation of a new protein that attaches to the parathyroid hormone, which controls calcium levels in the blood. He says, “We basically gave it the hormone and told it to make a protein which binds to it.” The team found that the new protein attached to the hormone more tightly than anything that could be generated by other computational methods. It also held onto the hormone better than existing drugs. Baker says, “It created this protein design from thin air.”
Grigoryan recognizes that inventing new proteins is only the first step. He points out that Generate Biomedicines is a drug company. “What matters most is whether we can make medicines which work.” Protein-based drugs must be manufactured in large quantities, then tested in the laboratory and finally in humans. This can take many years. He believes that his company and others will find ways for AI to speed up these steps as well. “Scientific progress is slow and steady,” Baker says. “But right now we’re in the middle of what can only be called a technological revolution.”