Are Digital Assistants Smart Enough to Do Their Jobs?
Apple has Siri, Microsoft has Cortana, Amazon has Alexa — and the list goes on. Today’s tech titans all offer “digital assistants” integrated into their devices, to help our lives run just a little more smoothly.
Justine Cassell, an artificial intelligence researcher and associate dean at Carnegie Mellon’s School of Computer Science, says that speech recognition is one area for development (accents in particular are still difficult for the technology). But for her, the crux of the matter is social: Our digital assistants need to be able to read our conversational cues — not just our words — and draw on an ongoing relationship for context.
“Siri or Cortana or M or Alexa, they all talk to us as if it’s the first time they’ve met us,” Cassell says. “And it’s no fun to have a personal assistant with amnesia. That’s not what we want from our assistant. We want them to say, ‘You remember that thing you were looking at online last week? Well, the price has gone down. And [I] know what trouble you got in last year when you forgot your husband’s birthday, so I suggest you look it up today and you buy it now, while the price is low.’
“That’s the kind of not just context, but also the kind of pushy conversation you might expect from someone you’ve known for a long time.”
Cassell is working with a team to build “socially aware” artificial intelligence that can respond to us the way a friend or colleague does. As part of their research, they videotaped pairs of people doing tasks together for an hour each week, noting the differences between the pairs as time passed.
“We found, for example, that teasing went up from week to week,” she says. “And referring to shared experience went up from week to week. … But positivity and praise went down from week to week.”
Cassell’s team built those findings into Sara — their “socially aware robot assistant.” Right now, Sara helps attendees at conferences get introduced to other participants and find the sessions that best fit their interests.
“We find that people are surprisingly engaged. Surprisingly because they themselves are surprised by how engaged they are and by how compelling they find the experience,” she says.
But if digital assistants can become better conversationalists — more like a person — who do we want them to be most like? A friend? A colleague? A butler?
“I talk about it in Jarvis and Mary Poppins terms,” says Carolina Milanesi, a tech analyst at Creative Strategies in San Jose. “From a family perspective, you see a lot of advertising for Google Home, of these things that come in, and they are your Mary Poppins in the home and helping out and reminding everybody of everything.”
But Milanesi says that at the moment, many of these caretaker assistants operate on just one family member’s account.
“And that’s not really, you know, what Mary Poppins does,” she adds. “She needs to know everything about everybody to be able to help the family.”
In contrast, Milanesi says that assistants like Siri are more suited for a Jarvis, or butler, experience. (Edwin Jarvis is the butler character in several Marvel Comics titles.)
“They are my assistant,” she says. “You know, with what Apple is doing with AirPods, adding in Siri’s whisper in your ear like you see in the movies for the ambassador, [where] that person behind their shoulder is just saying, ‘Oh, this is so and so, you met him two years ago, he has a daughter, and that’s what he does.’ That is a little bit the idea, but it’s not clear how we are going to get there.”
For now, Milanesi says, “advertising is way overselling what, in general … these assistants do.” But that may be experienced differently in the next generation of digital assistants — or by the next generation of humans.
When Milanesi was setting up Amazon’s Echo speaker in her home, a device powered by Alexa that is “this kind of jar-looking thing that you have on your shelf,” her 8-year-old checked in to see what was going on.
“And I said well, ‘This is Echo, there’s Alexa,’” Milanesi says. “And she goes, ‘Well, is she like Siri?’ And she used ‘she.’ So it was, ‘Is she like Siri?’ And I said, ‘Yeah.’ And she asked the question, and she got her answer, and she said, ‘Well, if she’s smarter than Siri, can we keep her?’”
“Five minutes in, there was that emotional connection that she created with this thing.”
Carolina Milanesi is an analyst at Creative Strategies in San Jose, California.
Justine Cassell is Associate Dean of Technology Strategy and Impact in the School of Computer Science at Carnegie Mellon University in Pittsburgh, Pennsylvania.
IRA FLATOW: This is “Science Friday.” I’m Ira Flatow. Do you find yourself, like I do, dictating into your smartphone? I use the dictation feature of my iPhone all the time. You tap that little microphone, and it translates my speech to words into a text. But as accurate as the dictation is, I got to admit I have never really become great friends with Siri, the built-in digital assistant. And I don’t know how many people, or many people who have.
Just a little example. I want to give you an example of how difficult this is. Our producer Christopher, trying to talk with her, talk with Siri, just a few hours ago.
CHRISTOPHER: What time is my appointment on Sunday evening?
SIRI: You have four appointments last Tuesday at 12:00 AM.
CHRISTOPHER: I said, what time is my appointment on Sunday evening?
SIRI: You have four appointments last Tuesday at 12:00 AM.
CHRISTOPHER: I’m not asking about Tuesday.
SIRI: I don’t know what you mean by “I’m not asking about Tuesday.” How about a web search for it?
IRA FLATOW: And just like with Christopher, that’s what happens to me, usually. Siri says, I’ll search for it on the web. Typical, right? So why not just open the calendar app and save the time, if you ask somebody to look at the calendar for you? Forgo the assistance of this not very helpful assistant.
Well, I’m not trying to pick on Siri here, because all this leads up to the question, what is it about the way I talk or about human conversation that makes this such a challenge for artificial intelligence developers? The tech titans are all developing these assistants. You’ve got Google, and Apple, and Facebook, and Microsoft and Amazon. Who’s going to win the prize? Who’s going to come out on top? Well, that’s what we’re going to be talking about.
Carolina Milanesi is an analyst at Creative Strategies in San Jose, California. Welcome to “Science Friday.”
CAROLINA MILANESI: Thank you for having me.
IRA FLATOW: Justine Cassell, Associate Dean of Technology Strategy and Impact for the School of Computer Science at Carnegie Mellon University in Pittsburgh, welcome to “Science Friday.”
JUSTINE CASSELL: Thank you. Thanks for the invitation.
IRA FLATOW: You’re welcome. Carolina, what’s your experience using Siri and Alexa? And so am I giving these assistants an undeserved bad rap here?
CAROLINA MILANESI: Well, I’m Italian. I lived in the UK for 17 years, moved to Silicon Valley four years ago. I’m married to a New Yorker. So my accent is not [? backing ?] the Siri very [INAUDIBLE].
So one of the difficulties is definitely how voice, accents, jargon, different words that we use all the time. And we want to be talking to this digital assistant in the same way that you and I are talking. And a lot of times, that is not possible.
Now, it has been interesting to see the different approaches that vendors have taken to this. So Alexa has a very detailed list of things that you can ask her. And she’s very precise about how you need to ask her things. And if you do it the right way, she’s very helpful. But most of us don’t want to learn that. We just want to talk to them.
IRA FLATOW: Yeah, we want to like they do on Star Trek, or they do in “2001: A Space Odyssey.” You just say a word or you ask the computer to do something and it knows exactly what you’re saying, knows exactly what you want to do. Justine, you agree that it’s very difficult to teach these computers to learn how to speak English, or Spanish or Italian?
JUSTINE CASSELL: What we’re talking about right now is how difficult it is to get them to understand English, and English spoken by people with a variety of accents. And that’s because there’s an entire pipeline that goes into an assistant like this understanding what you say. It has to understand your accent, and translate it into text. And then it has to understand the meaning of that text. And then it has to reason about what an appropriate reply is. And then it has to say that appropriate reply in a way that’s understandable.
And you’ve given examples, lots of examples, and I have many more of what’s called automatic speech recognition failures. That is, not understanding what you’re saying. And you also gave an example of it not understanding the meaning, looking up Tuesday when what you meant was Sunday.
But you started out the show by saying that it was difficult to be friends with these assistants. And that’s what I think is really the crux of the matter. Because these assistants are advertised as the future of our interaction with computing in the same way we interact with our human assistants or our human colleagues. And do you think that’s the case? Most people don’t.
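The pipeline Cassell describes above (speech to text, text to meaning, meaning to a reply, reply to speech) can be sketched in a few lines of Python. This is only an illustrative toy, not how Siri, Alexa, or any real assistant is built: every function name, the `Intent` type, and the sample calendar are hypothetical, the two speech stages are stubbed with plain strings, and "understanding" is reduced to keyword matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")


@dataclass
class Intent:
    """Hypothetical structured meaning extracted from the user's words."""
    action: str
    day: Optional[str]


def recognize_speech(audio: str) -> str:
    # Stage 1 (stub): a real system decodes audio into text.
    # Here the "audio" is already a string, so we only normalize it.
    return audio.lower().strip()


def understand(text: str) -> Intent:
    # Stage 2: map text to meaning. Keyword matching is an assumption
    # made for brevity; it is far cruder than real language understanding.
    day = next((d for d in DAYS if d in text), None)
    action = "lookup_appointment" if "appointment" in text else "unknown"
    return Intent(action, day)


def decide_reply(intent: Intent, calendar: Dict[str, List[str]]) -> str:
    # Stage 3: reason about what an appropriate reply is.
    if intent.action != "lookup_appointment" or intent.day is None:
        return "Sorry, I did not understand that."
    events = calendar.get(intent.day, [])
    if not events:
        return f"You have nothing scheduled on {intent.day.title()}."
    return f"On {intent.day.title()} you have: " + ", ".join(events) + "."


def speak(reply: str) -> str:
    # Stage 4 (stub): a real system synthesizes speech from the reply text.
    return reply


def assistant(audio: str, calendar: Dict[str, List[str]]) -> str:
    # Each stage consumes the previous stage's output.
    return speak(decide_reply(understand(recognize_speech(audio)), calendar))


if __name__ == "__main__":
    sample_calendar = {"sunday": ["dinner at 7 PM"]}
    print(assistant("What time is my appointment on Sunday evening?",
                    sample_calendar))
```

Because each stage feeds the next, an error early in the chain carries through: if `understand` extracted the wrong day, the later stages would answer the wrong question with full confidence, much like the Tuesday reply in the exchange with Siri.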
IRA FLATOW: I think you’re right. I think that we want actually maybe not to become close friends, but friends enough that we can carry on a conversation, and even have context, so that when I say, can you make me a reservation at this restaurant? And then I come back later and I say, did you make the reservation? I don’t hear, what restaurant? What reservation? So it remembers my former conversation.
JUSTINE CASSELL: Right. I heard a great example from Alexa where someone asked Alexa to set an alarm. And when the alarm went off, the person said, “Thanks.” But Alexa didn’t turn off the alarm, because she didn’t know that “thanks” meant, “Thanks for turning on the alarm. Now, turn off the alarm.” Because what we mean is not always contained in what we say.
And our way of talking to different people changes over time. So the way I talk to you is the way I talk to somebody I just met. Nice to meet you. I’m being fairly polite. As we get to know each other, I may be a little less polite, but that’s because I’m a New Yorker. And you’re a New Yorker too, so you’re going to get it. All three of us are New Yorkers. We’ll pick up on that fast.
IRA FLATOW: Now, wait a minute here. Forget about it.
JUSTINE CASSELL: Exactly. But Siri, or Cortana, or M or Alexa, they all talk to us as if it’s the first time they’ve met us. And it’s no fun to have a personal assistant with amnesia. That’s not what we want from our assistant. We want them to say, you remember that thing you were looking at online last week? Well, the price has gone down. And we know what trouble you got in last year when you forgot your husband’s birthday. So I suggest you look it up today and you buy it now while the price is low.
That’s the kind of not just context, but also the kind of pushy conversation you might expect from someone you’ve known for a long time. And that’s actually what I’ve been working on is building in those social smarts as well as linguistic smarts into personal assistants, virtual personal assistants.
IRA FLATOW: I know that’s something called SARA. Is that the name of it?
JUSTINE CASSELL: Exactly, right.
IRA FLATOW: And how does SARA learn about what the right way to do things is?
JUSTINE CASSELL: For the moment, SARA has learned from 60 hours of human-human conversation over the course of five weeks. So we put groups, pairs of people together, and we had them do a task together, and we videotaped them. And they did that task for an hour each week. And we looked at the differences between the pairs, but also the differences across weeks.
And we found, for example, that teasing went up from week to week, and referring to shared experience went up from week to week. So I might say, remember how we did that last week? Well, let’s try that again this week. But positivity and praise went down from week to week. And that’s the kind of thing that we’ve built into SARA, which stands for the Socially-Aware Robot Assistant.
And we find that people are surprisingly engaged. Surprisingly because they themselves are surprised by how engaged they are, and by how compelling they find the experience.
IRA FLATOW: Carolina, there’s a lot of money at stake in getting this right, isn’t there?
CAROLINA MILANESI: There is. A lot of people see this as the next computing platform, is the invisible platform to some extent. Voice is very powerful. But I think there’s still a lot of questions there as far– not just the social side, as we’ve just been discussing, but what kind of assistance do we want?
I talk about it in JARVIS and Mary Poppins terms. From a family perspective, you see a lot of advertising for Alexa and Google Home, of these things that come in, and they are your Mary Poppins in the home, and helping out, and reminding everybody about things. The reality is that most of them are [INAUDIBLE] with just one account. So they know everything about one person in the family. And that’s not really what Mary Poppins does, because she needs to know everything about everybody to help the family.
And others like Siri are more suited for a JARVIS experience. So they are my assistant. With what Apple is doing with the AirPods and adding Siri’s whisper in your ear, like you’ve seen in the movies for the ambassador, and that person behind their shoulders just saying, oh this is so-and-so. You met him two years ago. They have a daughter, and that’s what he does. That [INAUDIBLE] but it’s unclear how we’re going to get there.
And for now, I think advertising is way overselling in general what these assistants do. There’s a lot of market already. But I think trying to talk about them in that context of, oh wow, they really are helping out my life, and my life is going to be so much easier now, is overselling for now.
It would let consumers down once they try, and then that frustration comes in because it doesn’t understand that when you say “her,” you mean your daughter, your spouse, your friend, and you still need to say who you’re referring to.
And just every time you speak, starting with, hey Siri, or Alexa, or Google, that’s not natural. I don’t keep on saying your name every time I engage with you.
JUSTINE CASSELL: That’s that same amnesia that makes us feel as if they’ve just met us for the first time.
IRA FLATOW: On the other hand, Mary Poppins, to take this analogy a little further, she lived in my house. She was around all the time. She heard my most intimate conversations. She’s dealing with the kids. You, for this to become more like Mary Poppins, you have to give up your privacy, don’t you?
JUSTINE CASSELL: Haven’t you already given up your privacy?
IRA FLATOW: Well Mary’s not moved in yet, but–
CAROLINA MILANESI: No, but a lot of other technology has.
IRA FLATOW: But seriously, if you don’t say Alexa or Siri, they are listening or watching all the time, even when you’re not talking to them. They could be collecting your conversation with someone else.
JUSTINE CASSELL: Right. That’s the case. And that’s also the case with the emails you write and the text messages you send. And most of the technology today is based on machine-learning algorithms that require a tremendous amount of data. And to get that quantity of data, they have to be collecting a lot of what you say.
And so in some sense, unless you read the small print, you may already be giving a lot of your talk to science, so to speak. Rather than giving your body to science, you’re giving a lot of details about your life to machine-learning.
IRA FLATOW: I’m Ira Flatow. This is “Science Friday” from PRI, Public Radio International. Talking with Carolina Milanesi and Justine Cassell about these new digital assistants.
So where do we stand? Should we expect to be seeing new and improved stuff coming out soon?
JUSTINE CASSELL: Every day.
IRA FLATOW: Every day?
JUSTINE CASSELL: Yes. This really is, as Carolina said, this is expected to be the computing platform of the future. And really, all the tech giants are trying to compete in this space. They’re trying to make it so that while you’re driving, for example, your hands are busy. You can be talking to a personal assistant. While you’re at home, you can be talking to a personal assistant that will segue into that same personal assistant on another platform in your office. Now, how long it’s going to take to achieve that vision is another question.
IRA FLATOW: Now let me go to the phones. Our number, 844-724-8255. Katie in Kansas City, Missouri. Hi, Katie.
KATIE: Hi, there. My question is my husband and I don’t use this, but we were over at a friend’s house, and we were testing the Alexa app. And I think one of the things we noticed is we found throughout the evening, we were frustrated at times. She didn’t understand what we were saying. We had to keep repeating ourselves.
And what was really fascinating was two things. It’s one, we have friends named Alexa. And so we’re curious the fact that there’s this new service you’re working on called SARA, what is the feedback around using names that are still common in our society today? And my second question is the fact that these are overwhelmingly female voices and female names, do you think that’s creating a new dynamic or bringing back the old school feeling of our assistants tend to be female? Why aren’t there more apps with male voices, I guess, is my question.
JUSTINE CASSELL: Excellent question. And actually in an article in the “Times” earlier this week, I talked about whether we push the boundaries of gender stereotypes or whether we stay within those boundaries. And SARA, we only called it SARA and we only made it female because we worked very hard to find an acronym, and Socially-Aware Robot Assistant was the only one we could come up with. If it had been a male name, it would have been a male-looking agent with a male body.
IRA FLATOW: BOB, Big Old something. Bob could’ve been it.
JUSTINE CASSELL: I know. I tried really hard. I couldn’t find anything. But it is the case that different cultures have different stereotypes. And so Siri in the UK is a male voice.
IRA FLATOW: Can you make Alexa be a male voice if you want?
JUSTINE CASSELL: I think you can.
IRA FLATOW: Does that answer your question? Thanks for calling. Carolina, how do you react to that?
CAROLINA MILANESI: Well, it’s interesting to start with we think, does the system need to be personified or not? And I argue that you are creating more of a relationship with it. And as I think about when we got Echo, and you can actually call Echo, Echo. You don’t have to call Echo Alexa if you don’t want to. But Alexa, you are personifying this jar-looking thing that you have on your shelf.
And when we were setting it up, my eight-year-old asked me, well, what are you doing? I said, well, this is Echo. There’s Alexa. And she goes, well, is she like Siri? And she used “she.” So it was, “Is she like Siri?” And I said, yeah. And then she asked a question, and she got her answer. And she said, well, she’s smarter than Siri. Can we keep her?
And there was immediately five minutes in, there was the emotional connection that she created with this thing. And I think that’s important for consumers to learn to trust the assistant, and then engage in that conversation.
Now, the female or male is a whole different conversation. We were watching a program the other day on AI in general. And there was Watson from IBM. And again, my eight-year-old is a blessing as far as tweets and everything else that I can use. But well, is it like Alexa but is a boy?
IRA FLATOW: Yeah. Well, I’ve got– I have to wrap up, because we’ve run out of time. Maybe we’ll call it Hal. How about that? Or we don’t want to call it Hal. Allison Van– I want to thank Carolina Milanesi, from Creative Strategies in San Jose, and Justine Cassell, an Associate Dean of Technology Strategy and Impact for the School of Computer Science at Carnegie Mellon University in Pittsburgh.