Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

by Philippe Schwaller, Benjamin Hoover, Jean-Louis Reymond, Hendrik Strobelt, Teodoro Laino
IBM Research Zürich, University of Bern, MIT-IBM Watson AI Lab


Knowing how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This labelling is known as atom-mapping and is an NP-hard problem. Current solutions use a combination of graph-theoretical approaches, heuristics, and rule-based systems. Unfortunately, the existing mappings and algorithms are often prone to errors and quality issues, which limit the effectiveness of supervised approaches. Self-supervised neural networks called Transformers, on the other hand, have recently shown tremendous potential when applied to textual representations of different domain-specific data, such as chemical reactions. Here we demonstrate that attention weights learned by a Transformer, without supervision or human labelling, encode atom rearrangement information between products and reactants. We build a chemically agnostic attention-guided reaction mapper that shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced reactions. Our work suggests that unannotated collections of chemical reactions contain all the relevant information to construct coherent sets of reaction rules. This finding provides the missing link between data-driven and rule-based approaches and will stimulate machine-assisted discovery in the chemical domain.
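To make the idea of attention-guided mapping concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the mapper described in the paper: the function name `greedy_atom_map`, the toy attention matrix, and the greedy one-to-one assignment restricted to atoms of the same element are all hypothetical choices made for this example.

```python
def greedy_atom_map(attention, product_atoms, reactant_atoms):
    """Assign each product atom to at most one reactant atom.

    attention[i][j] is an assumed (hypothetical) attention weight from
    product atom i to reactant atom j. Only atoms with the same element
    symbol may be paired, and each reactant atom is used at most once.
    """
    # Collect every element-compatible (score, product index, reactant index).
    candidates = [
        (attention[i][j], i, j)
        for i, p in enumerate(product_atoms)
        for j, r in enumerate(reactant_atoms)
        if p == r
    ]
    # Highest attention first; ties broken by index for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    mapping = {}
    used_reactant_atoms = set()
    for _score, i, j in candidates:
        if i in mapping or j in used_reactant_atoms:
            continue
        mapping[i] = j
        used_reactant_atoms.add(j)
    return mapping


# Toy, made-up example: three product atoms, four reactant atoms
# (an imbalanced reaction where one reactant oxygen has no partner).
product_atoms = ["C", "O", "C"]
reactant_atoms = ["C", "C", "O", "O"]
attention = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.10, 0.20, 0.60],
    [0.60, 0.30, 0.05, 0.05],
]
mapping = greedy_atom_map(attention, product_atoms, reactant_atoms)
```

In this toy case the second carbon of the product prefers reactant atom 0, but that atom is already claimed by a higher-scoring pair, so it falls back to its next-best carbon. Reactant atoms left unassigned correspond to atoms that do not appear in the product.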

Feedback & Citation

Please leave feedback on social media using the hashtag #rxnmapper, use GitHub issues/PRs, or just write us an email. Please feel free to cite our work:

  title={Extraction of organic chemistry grammar from unsupervised learning of chemical reactions},
  author={Schwaller, Philippe and Hoover, Benjamin and Reymond, Jean-Louis and Strobelt, Hendrik and Laino, Teodoro},
  journal={Science Advances},
  publisher={American Association for the Advancement of Science}