Taylor Sparks, Professor of Materials Science & Engineering and co-host of the Materialism Podcast
LinkedIn, https://www.linkedin.com/ • 11-20-24
Are there differences in large language model embeddings for materials? We investigate and make some careful comparisons. Excellent collaborative work with the pros over at Trinity College Dublin.

My footnote: Embeddings are numerical representations of real-world objects that machine learning (ML) and artificial intelligence (AI) systems use to understand complex knowledge domains much as humans do.
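To make that footnote concrete, here is a minimal sketch of what an embedding looks like in practice. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration; the paper itself works with embeddings extracted from large language models, which are obtained differently.

```python
# Illustrative only: encode short materials descriptions into fixed-length
# numerical vectors ("embeddings") using an off-the-shelf text encoder.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not from the paper

texts = [
    "SiO2 is a wide-band-gap insulating oxide.",
    "Cu is a ductile metal with high electrical conductivity.",
]
vectors = model.encode(texts)   # numpy array of shape (2, 384)
print(vectors.shape)            # each row is the embedding of one description
```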
Luke Gilligan, PhD • Postdoctoral Research Fellow at Trinity College Dublin
LinkedIn • 11-20-24
It's great to share that our latest paper is officially out!
In this work, we examine the untapped potential of large language models (LLMs) and explore how much information about compounds and materials might already be "hidden" within their parameters. More importantly, we discuss promising strategies for unlocking and using this information effectively.
This is pretty novel work, and we're excited about the new questions and possibilities it opens up for the integration of LLMs into the domain of materials science. Huge thanks to the collaborators from Trinity College Dublin and The University of Utah:
Matteo Cobelli, Stefano Sanvito, Hasan Sayeed & Taylor Sparks
The full paper can be found here: https://lnkd.in/eFyNhq83

Abstract
Vector embeddings derived from large language models (LLMs) show promise in capturing latent information from the literature. Interestingly, these can be integrated into material embeddings, potentially useful for data-driven predictions of materials properties. We investigate the extent to which LLM-derived vectors capture the desired information and their potential to provide insights into material properties without additional training. Our findings indicate that, although LLMs can be used to generate representations reflecting certain property information, extracting the embeddings requires identifying the optimal contextual clues and appropriate comparators.
Despite this restriction, it appears that LLMs still have the potential to be useful in generating meaningful materials-science representations.

Conclusion:
All in all, we conclude that LLMs, taken without any training or optimization step, are unlikely to provide an immediate estimate of relative material property rankings. However, a common-sense choice of contextualization term and query key may be useful in certain contexts. This work highlights the potential strengths and drawbacks of LLMs for constructing valuable materials representations for data-driven discovery.
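As an illustration of the kind of probing the abstract and conclusion describe, the sketch below ranks a few compounds against a property query by cosine similarity between contextualized compound embeddings and a query-key embedding. The library, encoder model, contextualization phrase, query wording, and compounds are all assumptions chosen for illustration; this is not the paper's actual pipeline or its results.

```python
# Hypothetical sketch: rank compounds by cosine similarity between a
# contextualized compound embedding and a property "query key" embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed encoder, not the paper's LLM

compounds = ["NaCl", "W", "Fe2O3"]                # illustrative compounds
context = "the bulk crystalline material {}"      # assumed contextualization term
query = "a material with a high melting point"    # assumed query key

compound_vecs = model.encode([context.format(c) for c in compounds])
query_vec = model.encode(query)

# Cosine similarity between each compound embedding and the query embedding.
sims = compound_vecs @ query_vec / (
    np.linalg.norm(compound_vecs, axis=1) * np.linalg.norm(query_vec)
)

# Higher similarity is taken here as a (rough) proxy for stronger association
# with the queried property; the paper examines how fragile this signal is.
for compound, sim in sorted(zip(compounds, sims), key=lambda t: -t[1]):
    print(f"{compound}: {sim:.3f}")
```

The choice of contextualization phrase and query wording materially changes these similarity scores, which is precisely the sensitivity the conclusion warns about.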