November 19, 2021
Science Direct

De novo molecular design and generative models

Joshua Meyers, Benedek Fabian, Nathan Brown


Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.


Our short review describes our experience with de novo molecular design methods, which help us to discover novel medicines.

De novo molecular design

Since the 1990s, researchers in industry and academia have sought computational models to assist chemists with the complex task of designing “better” molecules to treat disease. In the ensuing decades, practitioners in the field of de novo molecular design have learned many lessons on how best to apply these models to generate actionable molecules for active drug discovery programmes.

De novo molecular design is the art of designing molecules that optimally satisfy a desired objective. In our case, this objective is producing better drugs, which involves balancing a number of molecular properties and is, therefore, a multiobjective problem. We previously published GuacaMol, an open benchmark for measuring the aptitude of de novo design algorithms.

In our review, “De novo molecular design and generative models”, we categorise existing methods for de novo design according to a new paradigm, one related to the practicality of using these methods on real projects. Namely, we classify generative chemistry algorithms by the coarseness of their molecular representations, whether atom-based, fragment-based or reaction-based, each of which has profound implications for the types of molecular design that can be achieved. We also distinguish between modern AI methods (gradient-based) and traditional chemoinformatics approaches (metaheuristic) and emphasize that, while older methods are out of vogue, they can offer competitive performance and practical advantages.
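To make the metaheuristic idea concrete, here is a minimal sketch of a greedy hill climb over a fragment-style representation. Everything in it is an illustrative assumption rather than a method from the review: the fragment labels are placeholders with no chemical meaning, `toy_score` is a stand-in objective, and no validity checks are performed. The point it shows is that a metaheuristic needs only a score for each candidate, not gradients, and that it can start from a molecule the user supplies.

```python
import random

# Hypothetical fragment library; the labels are placeholders, not real chemistry.
FRAGMENTS = ["A", "B", "C", "D", "E"]


def toy_score(molecule):
    """Stand-in objective: number of distinct fragments (higher is better)."""
    return len(set(molecule))


def mutate(molecule, rng):
    """Return a copy with one random position replaced by a library fragment."""
    child = list(molecule)
    child[rng.randrange(len(child))] = rng.choice(FRAGMENTS)
    return child


def hill_climb(start, score_fn, steps=200, seed=0):
    """Greedy search: accept a mutation only if it does not lower the score."""
    rng = random.Random(seed)
    best, best_score = list(start), score_fn(start)
    for _ in range(steps):
        child = mutate(best, rng)
        child_score = score_fn(child)
        if child_score >= best_score:
            best, best_score = child, child_score
    return best, best_score


# Start from a user-supplied "molecule" and let the search modify it.
start = ["A", "A", "A", "A"]
best, best_score = hill_climb(start, toy_score)
```

Swapping the placeholder labels for real fragments, or the mutation step for reaction templates, changes the coarseness of the representation without changing the search loop.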

Beyond the choice of automated molecular design algorithm, we offer our perspective on the practical usage of these algorithms. For example, some algorithms for de novo design allow the practitioner to “grow” from a starting molecule more easily than others. Algorithms that allow users to ask a broad range of questions are certainly advantageous in practice.

While there has been much focus on developing new algorithms for de novo design, we emphasize our belief that the design of a suitable fitness objective remains the main challenge for most de novo design endeavours. Automated design algorithms are able to exploit loopholes in calculated scores, which can result in less useful outputs being generated. It is therefore sensible to take steps to close such loopholes, in addition to providing a number of competing objectives.
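One way to see why competing objectives help is a small sketch of score aggregation. It assumes each property has already been mapped to a desirability between 0 and 1 and combines them with a geometric mean, so a candidate that maximizes one score while failing another receives a low overall score. The property names, thresholds and candidates below are hypothetical and are not taken from the review.

```python
import math


def desirability(value, low, high):
    """Linearly map value to [0, 1]: 0 at `low`, 1 at `high`, clamped outside.

    Passing low > high reverses the direction, for properties where
    smaller values are better.
    """
    if high == low:
        raise ValueError("low and high must differ")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def aggregate(scores):
    """Geometric mean of desirabilities; any zero objective gives zero overall."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def score(props):
    """Combine a hypothetical potency prediction with a synthesis penalty."""
    potency = desirability(props["potency"], low=5.0, high=9.0)
    # Reversed range: a penalty of 8 or more scores 0, 2 or less scores 1.
    ease = desirability(props["sa_penalty"], low=8.0, high=2.0)
    return aggregate([potency, ease])


# Hypothetical candidates: one balanced, one exploiting the potency score alone.
candidates = {
    "balanced": {"potency": 7.5, "sa_penalty": 3.0},
    "exploit": {"potency": 10.0, "sa_penalty": 9.0},
}
ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
```

Here the "exploit" candidate saturates the potency term but scores zero overall because it fails the synthesis term, so it ranks below the balanced candidate. An arithmetic mean would instead still give it a middling score of 0.5 for maximizing a single term.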

Medicinal chemists now have a variety of tools at their disposal that are proficient generators of sensible molecular structures. The challenge now is to evaluate whether our generators and optimization objectives are useful for the tasks at hand. De novo molecular design and generative chemistry models remain a controversial topic in the field, but we believe there is strong evidence to support adding atom-based generators, fragment-based methods and reaction-based de novo design tools to the medicinal chemistry toolbox.