My research background is in artificial intelligence for human-robot collaboration, with emphasis on dialogue systems and manipulation in a shared workspace. I have extensive experience in design and implementation of robot behavior architectures. I also previously worked on interactive learning and learning from demonstration.
Over the years, my research has been featured in the New York Times, Nova ScienceNow, and CBS News.
Past Projects
[Human-Robot Collaboration]
[Conversational Turn-Taking Styles]
[Task Learning from Demonstration]
[Embodied Active Learning]
Human-Robot Collaboration
The culminating domain of my PhD thesis was... Mega Bloks. Isn't blocks world so three decades ago? Actually, I was motivated by the great work that Herb Clark did in analyzing human communication and common ground using block models, and hoped to find similarities between his data and my own.
My PhD work focused on developing an interaction framework called CADENCE: Control Architecture for the Dynamics of Embodied Natural Coordination and Engagement. I needed a domain that could show off the multimodal aspects of the behavior and resource management. So I came up with a collaborative block assembly task in a shared manipulation workspace, with the catch that the robot and the human each know half the information about how to complete the model. This forces them to talk to each other to solve it!
A big part of collaboration is the ability to repair common ground. Errors can happen at the level of an acoustic signal, a semantic reference, a plan of action — up a hierarchy of what Herb Clark calls "levels of understanding." Part of the work involved enabling the robot to take initiative to repair at these different levels.
A couple of journal articles about this work are still in the pipeline, but most of the details are in my PhD dissertation.
Conversational Turn-Taking Styles
The idea for this work came out of discussions with Professor Brian Magerko and his PhD student Mikhail Jacob. They were developing a computational improvisation system for acting and were interested in cues that communicated social status in speech, gesture, and posture. The question that naturally evolved was, can a robot control its status in an interaction?
The system I subsequently developed had general-purpose parameters for social status, such as response delay, speaking speed, time between speech acts, self-interruptions, and how often to take turns or backchannel relative to a conversational partner. We called this parameterized floor regulation. These parameters existed independently of the contents of the domain. The idea was that the domain may dictate a social role for the robot, which would be a parameter setting, but you would not have to design these behaviors into each individual dialogue act; they were automatically generated from the parameter setting, and kept separate from the semantic content of the interaction.
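To make the separation between timing parameters and dialogue content concrete, here is a minimal sketch. It is not the actual system: the class names, field names, units, and values are all hypothetical, and only two of the parameters are actually used by the toy scheduler. The point it illustrates is that the scheduler attaches timing to a speech act without ever looking at what the act says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorRegulationParams:
    """Hypothetical parameter set; names, units, and ranges are illustrative."""
    response_delay_s: float     # pause before answering the partner
    speaking_rate: float        # multiplier on the default speech speed
    inter_act_gap_s: float      # pause between the robot's own speech acts
    self_interrupt_prob: float  # chance of cutting off its own speech
    turn_ratio: float           # robot turns per partner turn
    backchannel_ratio: float    # robot backchannels per partner turn


@dataclass(frozen=True)
class TimedSpeechAct:
    content: str
    start_delay_s: float
    duration_s: float


def schedule_speech_act(content, base_duration_s, params, is_response):
    """Attach timing to a speech act without inspecting its content."""
    if is_response:
        delay = params.response_delay_s
    else:
        delay = params.inter_act_gap_s
    return TimedSpeechAct(content, delay, base_duration_s / params.speaking_rate)


# Two made-up settings; the values are placeholders, not the published ones.
SETTING_A = FloorRegulationParams(0.2, 1.25, 0.3, 0.05, 1.5, 0.2)
SETTING_B = FloorRegulationParams(1.0, 0.8, 1.2, 0.4, 0.6, 1.0)

act_a = schedule_speech_act("<utterance>", 2.0, SETTING_A, is_response=True)
act_b = schedule_speech_act("<utterance>", 2.0, SETTING_B, is_response=True)
print(act_a.start_delay_s, act_a.duration_s)
print(act_b.start_delay_s, act_b.duration_s)
```

The same utterance comes out with different timing under each setting, which is the sense in which a role can be a parameter setting rather than something designed into each dialogue act.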
When we investigated different parameter settings with users, we wanted to dissociate the effects of task semantics from the experimental outcomes as much as possible. So we had the robot speak a fake language! We published this work in the Journal of HRI.
I would say the system succeeded at something somewhat different from the original intent of controlling for status. Realistically, status has way more to do with the competency of the robot. Instead, the different parameters had strong effects on how the resulting interaction felt structured to the human. Depending on the robot's turn-taking style, people would either take the lead and assume control over the interaction, or just passively respond to whatever the robot was doing.
Action Interruptions
At this point in my PhD, I had worked on several autonomous interactions in the space of interactive learning, and Andrea was looking to make progress on a grant she had on turn-taking. My biggest frustration with robot turn-taking was that all of our robot actions (such as a spoken utterance or gesture) ran to completion. It was so annoying having to wait for the robot to finish, especially when the action was incorrect or no longer relevant. This annoyance was exacerbated in interactions with repetitive loops of actions, like teaching interactions where you have to give a bunch of examples to the robot. Shouldn't there be a way for the interaction to be faster and more fluent?
This idea of interrupting actions to yield control to the person became a big part of my PhD thesis work. From my very first hacked-up implementation of the idea, it was immediately apparent that interruptibility completely changed the dynamics of an interaction. This is something we looked at again and again across the years in multiple domains. The later years of my PhD research focused on how to make this idea scalable for multimodal behavior within a general interaction architecture using timed Petri nets.
In our initial foray into this problem space, we used the imitation game Simon Says as an interaction setting. I wizarded Simon to play with participants we invited into our lab. It was clear from the data that the pacing was slow and awkward without action interruptions. I then coded up an autonomous version with action interruptions to demonstrate the contrast.
A key problem when dealing with interruptible actions is, when is it appropriate to interrupt? It has to do with the information structure of the interaction. When you have enough information to make progress, you can go on — we call this special threshold the point of minimum necessary information. Before that point, it's just a grapple for resources. We published on this idea in RO-MAN 2011 and AI Magazine.
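As a rough illustration of the threshold idea, here is a toy sketch. It is not the published formulation: it assumes that an action releases information in timestamped increments and that "enough" can be expressed as a single number, and the function names and example values are made up.

```python
def minimum_necessary_information_time(increments, required):
    """Earliest time at which cumulative information reaches `required`.

    `increments` is a list of (time_s, amount) pairs sorted by time.
    Returns None if the action never conveys enough information.
    """
    total = 0.0
    for time_s, amount in increments:
        total += amount
        if total >= required:
            return time_s
    return None


def may_yield(now_s, mni_time_s):
    """An action can be cut short once the threshold has passed."""
    return mni_time_s is not None and now_s >= mni_time_s


# Made-up utterance: only the segment at 1.3 s carries what the
# partner needs in order to act; the rest adds nothing new.
utterance = [(0.4, 0.0), (0.8, 0.0), (1.3, 1.0), (2.0, 0.0)]
mni = minimum_necessary_information_time(utterance, required=1.0)
print(mni, may_yield(1.0, mni), may_yield(1.5, mni))
```

In this toy model the remainder of the utterance after the threshold is the part that could be interrupted without losing anything the partner needs.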
The next domain I used was a collaborative version of solving the Towers of Hanoi. I wanted to investigate the same phenomenon in a more goal-driven task. The weird thing here is that the robot might be better at the mental problem of solving the task, but the human is just so much faster than the robot at manipulation actions. When the robot interrupts its manipulation all the time, it can't really help out. It's difficult to strike a good balance in the collaboration without more transparency about what the human and robot are thinking. These results were published in the inaugural issue of the Journal of HRI.
Task Learning from Demonstration
The roots of this project came from Andrea's work on task learning with Leonardo at MIT, which focused very much on discrete states. The next question we had was, how do we enable the same types of generalizations but also encapsulate values on continuous manifolds, like locations, rotations, or sizes of objects?
I came up with the idea for this framework while learning about categorization in a cognitive psychology course taught by Professor Eric Schumacher. The idea is that all sensory inputs are intrinsically continuous-valued, but they can be clustered into discrete concepts for higher-level reasoning. The basis for a meaningful clustering from sparse data is through demonstrations from an end user while achieving a task goal. The interaction structures the segmentation. I worked with Maya Cakmak on the implementation and evaluation of this learning framework, and we published the work in ICDL 2011.
We also did a project wherein we spent a week working with Victor Ng-Thow-Hing at Honda Research Institute in Mountain View to port the framework to the Honda humanoid robot. Generalization across robots!
Embodied Active Learning
For this project, I collaborated with my labmate Maya Cakmak, who is now a Professor at the University of Washington. We had been reading machine learning literature on active learning at the time. I had a very simple idea, which was to have the robot act as an embodied active learner by using pointing gestures to select instances in the physical workspace to be labeled by the human. In one week before the paper deadline for HRI 2010, we devised a domain based on tangrams, I coded up the concept learning and active learning algorithms, Maya coded up the interaction control, and I ran a quick pilot study. The paper got in!
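The query-selection step can be sketched in a few lines. This is only an illustration under assumptions: it uses plain uncertainty sampling, a standard active learning strategy that is not necessarily the one used in the paper, and the object names, positions, and probabilities are made up.

```python
def select_query(unlabeled, predict_positive_prob):
    """Pick the object the learner is least sure about (uncertainty sampling)."""
    return min(unlabeled, key=lambda obj: abs(predict_positive_prob(obj) - 0.5))


def pointing_target(obj):
    """Workspace coordinates the robot would point at to ask for a label."""
    return obj["position_xy"]


# Toy workspace: names, positions, and probabilities are all hypothetical.
workspace = [
    {"name": "piece_1", "position_xy": (0.10, 0.30)},
    {"name": "piece_2", "position_xy": (0.25, 0.45)},
    {"name": "piece_3", "position_xy": (0.40, 0.20)},
]
toy_probs = {"piece_1": 0.95, "piece_2": 0.55, "piece_3": 0.10}

query = select_query(workspace, lambda obj: toy_probs[obj["name"]])
print(query["name"], pointing_target(query))
```

The embodied part is the last step: instead of returning an index into a dataset, the learner turns its choice into a location in the shared workspace that a pointing gesture can indicate.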
We conducted a more polished version of the user study in a follow-up journal article for TAMD, with four conditions across which the robot exhibited a spectrum of initiative in its behavior. The qualitative takeaway is, straight-up active learning on every turn is actually pretty annoying. People stop complying with the robot after a while of being spammed with so many questions. And it's not even theoretically better than an optimal teacher! Using my learning framework, Maya went on to show this analysis in a follow-up ICDL 2010 paper.