Foundation models for metallurgy?

Abstract

Foundation models appear to promise precision atomic modeling across the periodic table, requiring little more than “fine-tuning” with a few density functional theory calculations. However, it is not clear whether they are sufficiently accurate for materials modeling, even with fine-tuning, to justify their high compute cost. Here, we compare state-of-the-art foundation models against a collection of “bespoke” neural network potentials for the Al–Cu–Mg–Zn system. While we find that many foundation models give poor or very poor results, some, such as GRACE2L-OAM, offer extremely good accuracy in most cases. We find that models trained on the defect-containing and higher k-point density “Alexandria” data set performed much better than those trained on Materials Project data alone. Our results also indicate that thermal conductivity scores are a much better indicator of metallurgical performance than energy errors on the convex hull.

Impact statement

GPT4 demonstrated to the broad community that foundation models can achieve results that are impossible with hand-crafted or “bespoke” language models. There has since been a rush to see whether this is also true for atomic modeling. Indeed, even large corporations such as Microsoft, Meta, and Google have begun to develop large training sets and publish results for universal atomic potentials. The promise is that these models should be accurate enough out of the box for most applications or, at most, require a small amount of “fine-tuning” on targeted data. But how accurate are these models, and can they be used in a production context for metallurgy? Here, we carefully validate a variety of top-performing foundation models on a comprehensive set of benchmarks applicable to aluminum metallurgy. While we find that many are completely unsuitable, some, including the very recently released GRACE2L-OAM model, do indeed have sufficient accuracy for most applications.

Our results show that the GRACE2L-OAM model offers accuracy similar to, and sometimes somewhat better than, that of our previously developed neural network potentials. While foundation models do indeed seem to have found their place in the atomic modeling toolkit, they remain much more computationally expensive. Thus, proper model selection is a tradeoff between desired accuracy, compute time, and user setup time.

Language

English

Author(s)

Daniel Marchand

Affiliation

SINTEF Industry / Materials and Nanotechnology

Year

2025

Published in

MRS Bulletin

ISSN

0883-7694

Volume

Page(s)

805 - 818

DOI

https://doi.org/10.1557/s43577-025-00911-0
