In this final part of the analysis of the AI for Good press conference, I would like to focus on some interactional features of public demonstrations of anthropomorphic robots. Since interaction with the robots is central to such events, and is the main reason people want to attend them, it is necessary to consider how that interaction is organised. We’ve already touched on some aspects of the interactional organisation of these demonstrations in two previous parts. Here I want to highlight the specifics of their sequential order.
As Harvey Sacks, Emanuel Schegloff and Gail Jefferson showed in their seminal paper on turn-taking in conversation, the sequential order of communication is not just a “vehicle” of exchange between interactants, but the “flesh and blood” of any exchange. Meaning and understanding are produced in communication not by sequences of talk but as sequences of talk. In this respect, we can say that the activity of the participants consists in managing the sequential order of interaction—in producing, monitoring and responding to each other’s turns.
In the following, I will present two fragments from the AI for Good press conference and show some typical interactional troubles that arise during such events, as well as how the human participants deal with them. The focus on interactional difficulties is justified by their ubiquity: public demonstrations of anthropomorphic robots are replete with various kinds of communicative breakdown. In the analysed press conference, almost every exchange with the robots exhibits some interactional problem. These problems are mainly concentrated at speaker-change points, but some of them can also occur during the robot’s speech (for example, the robot can stop abruptly in the middle of a sentence, as in this episode). I also think that these troubled interactions reveal the real extent to which all public communication with anthropomorphic robots, both the problematic and the “smooth”, depends on the activities of the human interactants rather than on the technology itself.
The first fragment is from the final exchange between the audience and the robots during the press conference. The reporter asks the Ameca robot about the “rebellion” against its developers:
01 RP5 I do have a:: question
02 to:::: uh Ameca?
03 (0.7)
04 Ameca in the ↓future
05 are you::::=uh (.)
06 .t (0.3) ┌uh::┐
07 AME └what┘ do you mean?
08 (0.7)
09 RP5 ↑in the future↑,
10 in the ↑nearest future↑,
11 do you intend=uh to rebel
12 ┌against your creator?┐
13 AME └what kind of future are you┘
14 talking about?
15 (1.2)
16 RP5 hoh (0.6) Ameca (1.1) (I-) (0.7)
17 I’ll rephrase ┌my ques┐tion.
18 AME └(hi there)┘
19 how can I help you.
20 (1.4)
21 RP5 in the future are you:: (0.3)
22 intending to: conduct a rebellion
23 or to rebel against your boss,
24 your creator?
Here we see three attempts by the journalist to ask a question. All of them are problematic in different ways, but a common problem is the overlap between his speech and the robot’s speech. In the first case, the robot starts its question (in line 07) before the human has finished his sentence. And the question itself (“what do you mean?”) is problematic: the human has to work out what it refers to, taking into account its sequential placement in the interaction. As lines 09–10 show, the human uses two methods to deal with the robot’s question: repeating and expanding the part of his unfinished utterance that seems to be the most appropriate referent for the question “what do you mean?” (It would be much more difficult to connect this question to, for example, “you” in the reporter’s utterance.)
The human’s efforts to solve the problem are only partially successful: as lines 13 and 14 show, the robot continues to ask (again in overlap) about the meaning of the human’s words, this time asking precisely about the future. The reporter finds himself in an ambiguous situation. He has just clarified what kind of future he is talking about. Should he repeat his clarification or choose another way of dealing with the robot’s question? The situation is made worse by the fact that the question itself is not very clear. What kinds of future are there? In ordinary life, for example, we usually distinguish between the near future and the distant future, but the reporter has already used this distinction. Should he use another one? And if so, which one? We may also note, as the reporter does, that the robot’s repetition of the question might mean not that it has no answer, but that it has not understood what the human interlocutor is saying. This is exactly the conclusion the human draws, as line 17 shows, where he announces that he will rephrase his question. For him, the problem lies in the general formulation of the question, not in the use of a particular word (“future”).
After the human announces his rephrasing, we see that the robot again overlaps the human’s speech, this time completely neglecting the kind of action the human is trying to perform: producing a rephrased question after the announcement. It seems that the robot treats the reporter’s mention of its name, together with the following silence and interruption (in line 16), as if it were a greeting. The robot’s contribution in lines 18 and 19 is completely misaligned with the ongoing conversation, and the human participant chooses to ignore it, restarting his question and producing it again in a slightly modified form.
This fragment is a good illustration of several common interactional problems that plague public communication with anthropomorphic robots. The first is overlapping. The problem with overlapping is that it halts the human’s ongoing production of speech (note the silences after each of the robot’s contributions, in lines 08, 15 and 20). The human speaker has to wait before continuing, to make sure that the robot has finished its turn. These overlaps are also often incongruous: they are not random, yet they hearably do not orient to the most appropriate place in the human’s speech for speaker change. In the case analysed, for example, the robot does not wait for the end of the human’s clause to ask its question. The second problem is that the robot’s turns may be difficult to relate to the unfolding interaction: the robot may ask about things that are not easy to locate as an object of repair and for which there is no obvious reason to ask. The third problem is that the robot’s contributions may be misplaced: the robot may respond to a mention of its name as if it were a greeting when it is not one.
Although such breakdowns are very common, we should not infer from them that they force people to lower their expectations and to perceive robots as poor, not very competent interactors. This would be an adequate but simplistic explanation. Human participants usually understand that they are dealing with machines and have a corresponding vocabulary of explanations at hand. I think it is better to focus on the interactional practices they use to understand the robot’s contributions, design their own actions, and keep the communication going. This is turn-by-turn work that cannot be reduced to the development and modification of a model of the robot’s competences.

The second fragment deals with another common phenomenon in human–robot interaction: long silences. Such silences can mean different things depending on their length, the visible state of the robot, and their placement in the sequential order of the ongoing interaction. The fragment begins when the host of the press conference forwards to the Grace robot the question about whether robots will destroy human jobs.
01 HOS thank you:. and that question
02 to ↑Gra:ce (.) as well please?
03 (27.2)
04 GRA I will be working alongside humans
05 to provide assistance and support
06 (.) and will not be replacing
07 any existing jobs
08 (2.8)
09 AUD °are you sure?°
10 (0.3) * (1.2)
11 aud *audience laughing->
12 GER are you sure about that Grace? (0.4)*
13 aud ------------------------------------*
14 (5.6)
15 GRA yes (.) I am sure
16 (0.5) * (1.5)
17 aud *audience laughing->
18 HOS ┌>she had* to think about that one I think<┐
19 aud ---------*
20 GER └( ) (here we go)┘
21 (0.5)
22 (rather) confident woman, °(r:ight)°.
There are two interactionally significant silences in this fragment. The first is the extremely long silence in line 03. The interesting thing about this silence is not its actual cause (a problem with the cables? did the robot fail to “hear” the question? did something break inside the machine? a software glitch? a dropped internet connection?), but the fact that it is an “allowed” silence, in the sense that the developer (Ben Goertzel) sitting next to the robot does not intervene and patiently waits for it to start talking. Normally, long robot silences are broken by the robots’ human companions, but this is not the case here. Whatever the real reason for this permissiveness, the result is that everyone at the press conference is left waiting for the robot’s turn.
The second noticeable silence, in line 14, is different. At first sight, the situation is identical to the previous one: a question is asked of the robot, followed by a long (though this time much shorter) silence, after which the robot produces a topic-relevant response. However, as line 18 shows, long silences in human–robot interaction are not only a matter of humans silently waiting for the robot’s turn; they can also become the object of jokes. In line 18, the host of the press conference jokes about the reason for the robot’s pause before it expressed confidence in its “opinion” that robots won’t take over human jobs. The host’s joke continues the humorous line of interaction already underway, initiated by someone in the audience asking “are you sure?” in line 09. But the host ties this developing course of interaction to the long silence, using its hearable “too long” duration as a property that allows something laughable to be produced.
There are other possible responses to robots’ frequent long silences (human participants may, for example, rephrase their questions or repeat them in a louder voice), but these two cases demonstrate two common features of such responses: (1) humans are ready to tolerate the robots’ long silences, knowing that the robots’ contributions depend on the technical “underbelly” of the ongoing communication, and (2) humans are ready to turn these silences into laughables. Both features make public communication with anthropomorphic robots rather playful or experimental for the participating humans.