This article explores the notion of the ‘Gray Box’ to symbolize the idea of providing sufficient information about the learning technology to establish trust. The term system is used throughout this article to represent an intelligent agent, robot, or other form of automation that possesses both decision initiative and authority to act. The article also discusses a proposed and tested Situation Awareness-based Agent Transparency (SAT) model, which posits that users need to understand the system’s perception, comprehension, and projection of a situation. One of the key challenges is that a learning system may adopt behavior that is difficult to understand and challenging to condense to traditional if-then statements. Without a shared semantic space, the system will have little basis for communicating with the human. One of the key recommendations of this article is that there is a need to provide learning systems with transparency as to the state of the human operator, including their momentary capabilities and potential impact of changes in task allocation and teaming approach.
Current trends in learning systems have favored methods such as deep learning that have had high-profile successes, including IBM's Watson and the DeepMind AlphaGo system. These systems are developed via extensive training rather than being explicitly designed, and as such many of the capabilities, behaviors, and limitations of learning systems are emergent properties of interaction and experience. Given appropriate training, this can result in systems that are robust and meet or exceed human capabilities [e.g., 1]. However, this training process can have unpredictable results or produce apparently inexplicable behavior, which has been described as the “black box” problem of such systems. Indeed, the widely reported “Move 37” that AlphaGo selected in its second game against Lee Sedol was regarded as very unpredictable, a move that no human would have made, yet critical in the system's eventual win. This has resulted in a popular notion that without complete knowledge and predictability of a learning system, one cannot fully understand, and thus partner with, such technology.
There are at least three reasons why learning systems can create challenges for human interaction. First, a learning system may adopt behavior that is difficult to understand and challenging to condense to traditional if-then statements. Without a shared semantic space, the system will have little basis for communicating with the human. As a result, what a human may perceive as an error may be fully logical to the system. Second, an actual error on the part of the system may be difficult for the human to detect if the human does not understand the system's basis for decision making and the data/environmental state. Third, by definition, a learning system should exhibit some degree of dynamic behavior, which challenges the notion of predictability. This article adopts the perspective that learning systems may never be completely “knowable,” much like humans; yet they may well be trusted if users are given information that reduces uncertainty and increases understanding of rationale, and if lessons learned are shared through peer and informal networks. In this paper we explore the notion of the “Gray Box” to symbolize the idea of providing sufficient information about the learning technology to establish trust. Much as with humans, we trust based on a synthesis of predictability, feasibility, and inference of intent drawn from our knowledge of the system's goals and values and our interaction with it. The term system is used throughout this brief paper to represent an intelligent agent, robot, or other form of automation that possesses both decision initiative and authority to act.
Trust and Transparency in Learning Systems
Paramount in the notion of the Gray Box is the idea of reducing uncertainty. Predictability is an essential antecedent to trust of complex systems, and the same will hold true of learning systems, perhaps even more so. Yet, by their very nature, learning systems are believed to be unpredictable, both because their future behavior is contingent on past experience and because many systems incorporate sources of randomness or random sampling in generating and selecting courses of action. While this is true, we contend that there are still ways to reduce uncertainty associated with such systems. In particular, we will discuss one method in detail: the use of transparency methods.
In general, transparency refers to a set of methods to establish shared awareness and shared intent between a human and a machine [3,4]. This may include information about the current and future state of the system and information related to the system's intent in order to allow the human to develop a clear and efficient mental model of the system. Chen and colleagues [5,6] have proposed and tested the Situation Awareness-based Agent Transparency (SAT) model, which posits that users need to understand the system's perception, comprehension, and projection of a situation. Guided by the SAT model, Mercado and colleagues found that added transparency increased user performance and trust, notably without increasing workload. Lyons offers a broader conceptualization of transparency to include features of the intent, environment, task, analytics, teamwork, human state, and social intent as it relates to the human. For learning systems, transparency will likely need to include some fusion of information from the SAT model and information from the various transparency facets discussed by Lyons.
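To make the three SAT levels concrete, the following is a minimal sketch of a transparency message that carries one field per level. The class name `SATReport`, its field names, and the `render` method are hypothetical and chosen only for illustration; they are not taken from the SAT model publications or from any fielded interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SATReport:
    """Hypothetical transparency message with one field per SAT level."""

    perception: str      # what the system currently perceives
    comprehension: str   # how the system interprets what it perceives
    projection: str      # what the system expects to happen next

    def render(self) -> str:
        """Return a plain-text summary, one line per level."""
        return "\n".join(
            [
                f"Perception: {self.perception}",
                f"Comprehension: {self.comprehension}",
                f"Projection: {self.projection}",
            ]
        )


if __name__ == "__main__":
    # Illustrative values only; the scenario is invented for the example.
    report = SATReport(
        perception="Obstacle detected 40 m ahead on the planned route",
        comprehension="Current route is blocked; an alternate route exists",
        projection="Rerouting is expected to add about 2 minutes",
    )
    print(report.render())
```

Keeping the three levels as separate fields, rather than one free-text message, would let an interface show or hide each level independently when studying how added transparency affects trust and workload.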
Humans interacting with learning systems will need to understand how the system senses the environment, how it makes decisions and acts, how it teams with the human, and how this teaming strategy changes over time based on changing situational constraints or goals (i.e., the notable autonomy paradox of transfer of authority). To the first point, the human needs to understand how the system interacts with its environment. This may include understanding how the system ingests and perceives data, what kind of sensors it has, and the limitations of those sensors; where possible, the system should communicate its understanding of the environment to the human. This will help the human understand the mental model of the system in relation to the environment and, notably, how this mental model changes as the system adapts to novel situations. Second, the human should understand how the system makes decisions and how these decisions translate into actions. Research has shown that transparency methods in the form of decision rationale can increase trust for recommender systems in commercial aviation. A replication of Lyons and colleagues using a high-fidelity simulation found that added rationale increases user trust and reliance on the decision aid while reducing verification (i.e., second guessing) of the automation's recommendation. Humans need to understand the logic behind any recommendations by a complex system. With a learning system, the human will need to understand if and how the decision logic of the system changes and why it changes (i.e., what conditions drive the strategy change, what are the thresholds for such changes, what are the underlying assumptions of the system?). Perhaps most importantly for a learning system, the human needs to understand how the system will team with the human and how this teaming strategy changes based on human states and situational constraints.
The teaming strategy of the system may include the division of labor between the human and the system, the intent of the system toward the human, and meaningful exposure of the human and system to events to jointly experience and react to novel stimuli. Future human-machine teaming paradigms will likely involve some division of labor between humans and intelligent machines. The human needs to understand, both in real time and in future projections, how that division of labor is perceived by the system, how it will change, and what triggers the change. The system should visually represent the division of labor for a particular task or set of tasks. This will allow the human and system to develop shared awareness of the current and future teamwork context. Further, it is plausible that advances in physiological assessment and intelligent algorithms will allow systems to transfer authority between the human and the system as required by situational demands. For instance, the Air Force has fielded an advanced automated system called the Automatic Ground Collision Avoidance System (AGCAS) on the F-16 platform that will take control away from the pilot when it detects an imminent collision with the ground. This system only activates at the last possible moment to avoid nuisance activations and interference with the pilot. It was this innovative design to consider the pilot's perceived nuisance threshold that drove much of its success, and it is this understanding that has positively influenced pilots' trust of the system.
Intent and Consistency
Humans must also understand the intent of the system in relation to the human. This will require that humans fully understand the goals of the system and how the system prioritizes multiple goals across a variety of situational constraints. Understanding this goal prioritization and how priorities fluctuate across situations will be an important antecedent of trust for learning systems. This forms the basis for understanding what “motivates” the system's behavior. Humans can gain exposure to these nuances through systematic joint training sessions where the human and system jointly interact across a range of scenarios. These scenarios will comprise meaningful tests or stretches of the system's intent across the various situations that will be needed to foster appropriate trust of a learning system. While the “values” of the system may be opaque, we can structure scenarios that test the behavioral consistency of the system across a range of demanding constraints. Thus, while we can never test every possible scenario to achieve full understanding, much like humans, we must infer behavioral consistency based on demonstrated consistency and predictability in a variety of challenging scenarios. I do not know with full certainty how a close friend will react to positive or negative news, but such outcomes are generally predictable based on prior experiences that we have shared. The same will hold true for a learning system. Prior experiences should serve as information to guide future predictions of consistency. The value of such information will depend not on the total time of interaction, but rather on the meaningfulness of the interactions that are jointly experienced. Ultimately, humans may not need to know in detail exactly how a learning system will react to a novel stimulus; however, it would be effective to know that the system has reacted to other novel stimuli encountered in the past in ways that support their own goals.
Understanding the rules that govern the behavior of the system and having experienced behavioral consistency that is in accordance with those rules should be a sufficient starting point for teaming with a learning system.
The rules or values that are encoded in a learning system are thus critical to overall success, and especially to successful teaming with humans. Much like learning in humans, learning systems require explicit and implicit direction to instill those values. For example, in many implementations, explicit feedback in the form of negative rewards can be attached to behaviors (e.g., damaging a friendly asset), while implicit direction is provided by careful construction of training scenarios wherein violations of values lead to failure. This process may then be validated during a cooperative training process; as humans gain experience with the system's behavior, they should have the opportunity to provide feedback to the system and reinforce those values.
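The explicit form of direction described above, attaching negative rewards to behaviors that violate a value, can be sketched as follows. This is a minimal illustration under stated assumptions: the event name, the penalty magnitude, and the function `shaped_reward` are all hypothetical and do not describe any particular fielded system or training framework.

```python
from typing import Iterable

# Hypothetical penalty table: event name -> negative reward.
# The event and its magnitude are invented for illustration.
VALUE_PENALTIES = {
    "damaged_friendly_asset": -100.0,
}


def shaped_reward(task_reward: float, events: Iterable[str]) -> float:
    """Add value-violation penalties to the reward earned on the task.

    Events that do not appear in VALUE_PENALTIES contribute nothing.
    """
    penalty = sum(VALUE_PENALTIES.get(event, 0.0) for event in events)
    return task_reward + penalty


if __name__ == "__main__":
    # The task succeeded (+10) but a friendly asset was damaged (-100).
    print(shaped_reward(10.0, ["damaged_friendly_asset"]))
```

Keeping the penalties in an explicit table, separate from the task reward, is one way to make the encoded values inspectable by the human teammate, which is in keeping with the transparency aims discussed earlier.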
System Understanding of Human State
In a human-machine team, transparency of the system is not the sole key for the human in understanding the learning system. The human partner must be transparent as well, in the sense that the system should monitor the state and inputs of the human partner in order to adapt and team most effectively, a concept that has been termed Robot-of-Human transparency to refer to the bidirectional nature of transparency. Ideally, this monitoring is passive and nonintrusive so as to minimize additional workload burdens on the human partner. There are at least two potential ways in which data from human monitoring can be utilized by a learning system. One is error detection or expectation mismatch; signals such as verbal expressions of surprise or the P300 (an electroencephalographic waveform indicating that the human has perceived something as unexpected) can provide the system with evidence that its behavior has violated operator expectations, and can trigger reevaluation, changes in behavior, and potentially queries to the human operator for clarification. This concept has been demonstrated in online control with no overt response from the operator.
A second way in which human data from physiology and behavior can be utilized is as a system input to adapt teaming behavior. This concept has been extensively discussed in the literature under adaptive automation and coadaptive aiding [12–15]. The key concept is that accurate real-time monitoring of human state (e.g., cognitive workload, stress, and fatigue) can become part of the total system environment and directly impact optimal function allocation and teaming strategies. This approach has the potential both to improve overall system performance by optimizing utilization of human and system resources, and to improve the likelihood of user acceptance and adoption. Users have historically resisted adoption of aiding systems that can arbitrarily take control of tasks; however, if the system only does so when the user is overloaded, then the system can be framed and communicated as more of a partner and aid than an unpredictable, inflexible machine. There are a few key challenges to successful implementation of this approach. One is the limitation in accuracy of state assessment (generally on the order of 80-90% correct over time); confidence and error probability must be understood and quantified in the system's representation of the human partner and in behavior selection. Another challenge is effective handoff or transfer of tasks; task set changes and sudden transitions in workload levels can have negative impacts on human performance [e.g., 17, 18].
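One way to quantify confidence in an imperfect state assessment is to treat each classifier reading as evidence and update a belief that the operator is overloaded, taking over a task only when that belief is high. The sketch below is illustrative and rests on loudly stated assumptions: the classifier is taken to be equally accurate for both states (85%, within the 80-90% range cited above), successive readings are treated as conditionally independent, and the prior (0.2) and decision threshold (0.9) are arbitrary. None of these values or function names come from the cited studies.

```python
from typing import Iterable, Tuple


def update_overload_belief(
    prior: float, reading_overloaded: bool, accuracy: float = 0.85
) -> float:
    """Bayes update of P(operator overloaded) after one classifier reading.

    Assumes a symmetric classifier: it reports the true state with
    probability `accuracy`, whichever state the operator is in.
    """
    like_if_overloaded = accuracy if reading_overloaded else 1.0 - accuracy
    like_if_not = 1.0 - accuracy if reading_overloaded else accuracy
    numerator = like_if_overloaded * prior
    return numerator / (numerator + like_if_not * (1.0 - prior))


def should_offload(
    readings: Iterable[bool],
    prior: float = 0.2,
    accuracy: float = 0.85,
    threshold: float = 0.9,
) -> Tuple[bool, float]:
    """Fold readings into a belief; offload only above `threshold`.

    Treats readings as conditionally independent, which is an assumption
    that real physiological signals may well violate.
    """
    belief = prior
    for reading in readings:
        belief = update_overload_belief(belief, reading, accuracy)
    return belief >= threshold, belief


if __name__ == "__main__":
    for readings in ([True], [True, True, True], [True, False]):
        decision, belief = should_offload(readings)
        print(readings, decision, round(belief, 3))
```

Under these assumptions a single "overloaded" reading is not enough to justify taking a task from the operator, whereas several consistent readings are. Requiring accumulated evidence is one possible way to avoid the arbitrary takeovers that users have historically resisted, at the cost of a slower response.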
In addition to providing transparency, we can reduce uncertainty of learning systems by facilitating knowledge sharing between peer groups. Stories related to both successful instances of interaction and failures can be a useful way to reduce uncertainty for a novel system. Stories shared among operators have been shown to influence trust of fielded automated systems in the Air Force. These stories help to fill in the gaps of uncertainty as different users encounter disparate environmental constraints, and as a result a wider variety of experience with the system (under various conditions) is shared throughout the social network. A critical consideration, however, is to ensure that the systems being considered are indeed the same, lest users share stories based on different versions of a system or a different system altogether. Designers should expect that users will transfer both optimistic and pessimistic expectations from one system to another when the users perceive the systems as similar.
In summary, we have three recommendations that will shed light on the function of learning systems, resulting in systems that can be described as gray boxes:
Provide human operators with maximum transparency as to the inputs, process, and potential outputs of the learning system, as well as values encoded in that system.
Train humans and learning systems together using challenging and realistic scenarios to increase mutual understanding, improve teaming, and enable human operators to gain experience and insight into system performance.
Provide learning systems with transparency as to the state of the human operator, including their momentary capabilities and potential impact of changes in task allocation and teaming approach.