Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities, but there has been limited research into their applicability to engineering applications. This work assesses the potential of LLMs, specifically OpenAI’s GPT-4, as an interface for the Numerical Propulsion System Simulation (NPSS). Training data and code were provided to the LLM in-context, and it was able to process and understand basic engineering concepts related to NPSS. In most cases it answered user queries satisfactorily and provided useful suggestions. Despite the evident capability of the model, it also exhibited several deficiencies, including errors, oversights, and high operational costs. The most insidious problems occurred when the LLM answered confidently but incorrectly. As such, LLMs cannot yet be relied on for complex engineering work. Potential areas for further development and testing to mitigate these issues were explored. An analysis of LLM temperature variation for simple recall and reasoning tasks was conducted, showing good repeatability and reliability, albeit on a limited test set. Future work could explore the development of a standardized NPSS test suite for evaluating LLM performance. With improvements, LLMs could have transformative implications for future users of complex engineering software.