Abstract
The conditions under which data wrangling is undertaken have a profound impact on the quality of the results of both the wrangling and the subsequent analysis. This paper presents an analysis of the sociotechnical aspects of a data wrangling activity at a large, multi-site global manufacturer. The activity was technically demanding, as operational data from multiple sources and in multiple formats needed to be integrated, but it also involved interaction with multiple stakeholders in different parts of the world, each with their own ways of collecting and structuring the data. The data had originally been captured for a different purpose. The clients were not aware that the data followed a different logic at the various sites and in some cases needed to be manually extracted and interpreted. The paper describes the data wrangling process and analyses the assumptions, goals, and biases of the different stakeholders. The analysis raises questions and offers insights about how data can be trusted, and suggests that human intervention with data along the data wrangling process is often unintentional, tacit, and easily overlooked. It is suggested that contextual factors, such as data quality and the assessment of consequences when acting or making decisions on the new data set, be given greater attention during the specification of data wrangling assignments. The paper concludes with recommendations for data wrangling practitioners.