On the value of popular crystallographic databases for machine learning prediction of space groups

Abstract

Predicting crystal structure information is a challenging problem in materials science that clearly benefits from artificial intelligence approaches. The leading strategies in machine learning are notoriously data-hungry, and although a handful of large crystallographic databases are currently available, their predictive quality has never been assessed. In this article, we have employed composition-driven machine learning models, as well as deep learning, to predict space groups from well-known experimental and theoretical databases. The results generated by comprehensive testing indicate that data-abundant repositories such as COD (Crystallography Open Database) and OQMD (Open Quantum Materials Database) do not provide the best models, even for heavily populated space groups. Classification models trained on databases such as the Pearson Crystal Database and ICSD (Inorganic Crystal Structure Database), and to a lesser extent the Materials Project, generally outperform their data-richer counterparts due to more balanced distributions of the representative classes. Experimental validation with novel high-entropy compounds was used to confirm the predictive value of the different databases and showcase the scope of the machine learning approaches employed.

Category

Academic article

Language

English

Author(s)

Affiliation

  • SINTEF Industry / Sustainable Energy Technology
  • Catholic University of Portugal
  • Norwegian University of Science and Technology

Year

2022

Published in

Acta Materialia

ISSN

1359-6454

Volume

240

Page(s)

1 - 14
