Computational chemistry/materials: Two separate challenges
Computer simulations are an indispensable part of modern engineering and science. Take, for instance, the Burj Khalifa, the tallest building in the world. A computer was surely used to fully design it and analyze its properties before any ground was broken. Today, architects and engineers can be essentially 100% sure that their building will not fall down, without having actually constructed any portion of it. Sufficiently accurate computational analysis can be performed for the building’s structural integrity, maximum allowable load per floor, resistance to high winds, robustness to earthquakes, and even vulnerability to explosions (deliberate or not). Once all the due diligence and simulations have been completed, what you see is basically what you will get.
But what may not be obvious is that “designing” substances (molecules, materials, liquid mixtures, …) is an entirely different grizzly bear. If you want to design a new battery material, solar-cell material, or structural material, for example, you cannot have anywhere near the same confidence that your computer-generated “design” will behave as predicted. There are two distinct reasons that computational design of substances is difficult. You could call these the ‘analysis’ and ‘design’ obstacles, and each is interesting and challenging for entirely separate reasons.
Challenge 1 [analysis]. Accurate simulations of even a single molecule or material.
Even other physicists are often surprised to find out how pitiful the accuracy of typical chemical or materials simulations is. For instance, a molecule as simple as the two-atom chromium dimer (Cr2) cannot be simulated accurately enough to match experiment! This is simply a result of the fact that chemistry and materials are governed by quantum physics (as opposed to Newtonian physics), and calculating quantum physics to high precision is extremely costly. Even the world’s biggest supercomputers are nowhere close to being able to give quantitatively precise results for most substances we care to study.
To reiterate: even the simulation of a single substance is difficult. This means that often, the best one can get is an approximate answer that guides the solution. For example, a drug company might run approximate analyses for 10,000 drug candidates, go to the lab and synthesize the 80 that the analysis suggested would work well, and then discover that only 3 of them actually work as expected. The holy grail would be the ability to calculate better answers for each of those drugs, so that if your computer predicts that these 80 will work, then all 80 do indeed work well in the laboratory.
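The screening funnel described above can be sketched as a toy simulation (all numbers, the noise model, and the threshold are invented for illustration): when predictions carry large error bars, the top-ranked picks end up dominated by false positives, which is exactly why only a handful of the 80 synthesized candidates pan out.

```python
import random

random.seed(0)

# Hypothetical illustration (numbers invented): each candidate drug has a
# true underlying quality, but our approximate simulation only sees a
# noisy estimate of it.
N_CANDIDATES = 10_000
N_SYNTHESIZED = 80
TRUE_THRESHOLD = 2.5   # a candidate "actually works" above this quality

true_quality = [random.gauss(0, 1) for _ in range(N_CANDIDATES)]
# Approximate analysis: large error bars (noise sigma = 2) on every score
predicted = [q + random.gauss(0, 2.0) for q in true_quality]

# "Go to the lab": synthesize the candidates with the best *predicted* scores
ranked = sorted(range(N_CANDIDATES), key=lambda i: predicted[i], reverse=True)
picked = ranked[:N_SYNTHESIZED]

# Count how many of the picks truly work
hits = sum(true_quality[i] > TRUE_THRESHOLD for i in picked)
print(f"{hits} of {N_SYNTHESIZED} synthesized candidates actually work")
```

Shrinking the noise sigma toward zero in this sketch makes nearly every pick a true hit, which is the “holy grail” scenario the text describes.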
Challenge 2 [design]. Searching through molecular space.
When you are designing anything, effectively what you are doing is starting from a desired set of properties, and then working backwards to find a design that gives you those properties. In civil engineering, your desired property might be the ability to hold up 1,000 tons. You can simply cast a steel beam of sufficient width, and you have met the design constraint. (I’m not claiming all mechanical or structural engineering design constraints are this easy to satisfy—but generally there are fewer easy wins like this in substance design.)
Contrast this macroscopic steel beam with molecules and materials, which are (usually) defined over a discrete set. This means that we don’t have fine-grained tunability. Say you need a molecule with a light-absorption maximum at a particular wavelength, say 510 nm, perhaps for some sort of imaging or communication application. You may be able to find a molecule that absorbs at 498 nm, and another at 520 nm. But you cannot simply pick a molecule “in between” them; the properties cannot be dialed in by turning a continuous knob like the width of a steel beam. And herein lies the “design” difficulty.
The number of possible molecules is enormous (by some estimates it exceeds the number of particles in the universe), and one could argue that because of this, it is quite likely that there *is* a molecule that absorbs at your desired wavelength. But the enormity of the space is a problem as well, because you cannot “work backwards” from the property to the “dimensions” of your substance, as you can with macroscopic things. You basically have to add or remove some atoms, test the result, and then move on to the next candidate. With the steel beam, by contrast, you often can directly “work backwards”: if you need a column 1.5x as strong, you can just make it 50% thicker (again, I’m not claiming all engineering tuning is this easy, but many important engineering tasks are).
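The contrast between the two design modes can be made concrete with a small sketch (the beam formula, the candidate molecules, and their absorption wavelengths are all invented for illustration): a continuous design variable can be inverted in one step, while a discrete candidate set can only be enumerated and tested.

```python
# Continuous design ("work backwards"): if strength scales linearly with
# width, the relationship can be inverted directly.
def beam_width_for_strength(target_strength, strength_per_cm=100.0):
    """Invert strength = strength_per_cm * width for the required width."""
    return target_strength / strength_per_cm

# Discrete design (enumerate and test): candidates form a finite set with
# no knob to turn between them; all we can do is evaluate each one and
# keep the closest match. Wavelengths in nm, invented for illustration.
candidate_absorptions = {"mol_A": 498, "mol_B": 520, "mol_C": 507, "mol_D": 483}

def best_candidate(target_nm):
    """Return the candidate whose absorption maximum is nearest the target."""
    return min(candidate_absorptions,
               key=lambda m: abs(candidate_absorptions[m] - target_nm))

print(beam_width_for_strength(1500))  # solved in one algebraic step
print(best_candidate(510))            # best we can do: nearest discrete option
```

Note that the discrete search gives only the *nearest* available molecule (here, one 3 nm off target); hitting 510 nm exactly would mean searching a much larger candidate set, which is the enormity problem described above.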
These two challenges really are completely “decoupled”; they are effectively unrelated. If the periodic table had just four elements, we might live in a universe where Challenge 2 wasn’t difficult while Challenge 1 still was. Conversely, if we lived in a universe where quantum mechanics was actually easy to simulate, Challenge 1 would be solved but Challenge 2 would not be, as the space to “search” for good molecules would still be astronomical.
(An aside on the promise of quantum computing.)
(In the public perception, there is a false impression that quantum computing is good for combinatorial problems “because a quantum computer can run many calculations in parallel.” This is not correct, at least not in the sense the sentence suggests. I point this out because, when it comes to materials discovery, quantum computing is poised to effectively solve Challenge 1, but not to be particularly transformative for Challenge 2.)
These issues of “analysis” (i.e. simulation) and “design” (i.e. search) are distinct, and both are very challenging and therefore very fun problems. Challenge 1 has been the focus of theoretical chemists and physicists for a hundred years, with every decade showing leaps and bounds in progress. Challenge 2 has been studied intensely only more recently, for example with machine learning methods that “generate” the next molecule for you to consider. It is important that they be viewed as separate problems, so that people are not given the impression that algorithmic improvement on one challenge implies improvement on the other.