Research

Integrate Multimodal Input to Offer Tailored Feedback in Robotic Tutoring

Current robotic tutoring systems mainly use computer vision to monitor student activities or rely on touch-based interfaces such as buttons and touchscreens for interaction. This narrow range of input limits how well the robot can gauge a student’s learning progress, which in turn can make its feedback less effective.

In research guided by Nicole Salomons, we are exploring the addition of verbal input to robotic feedback systems. By combining visual and verbal data (observing students’ physical interactions and listening to their verbal responses), we aim to build a more comprehensive understanding of student learning. This approach uses a large language model (LLM) to process verbal input, allowing the robot to offer more relevant and timely feedback and thus enhancing the learning experience.

Develop a Novel AAC Text Generation System Powered by Image Recognition and LLM

Communication remains a fundamental challenge for individuals with motor disabilities. Historically, they’ve had to rely on traditional augmentative and alternative communication (AAC) systems, which come with inherent limitations: symbol-based AAC confines users to a restricted vocabulary, and text-entry methods are painstakingly slow, hampering fluid conversations.

Recognizing these challenges, we developed ImageTalk. Our approach integrates image recognition models with large language models (LLMs). The premise is straightforward: by interpreting information from user-selected images together with minimal text input, ImageTalk can rapidly generate nuanced stories that reflect a user’s intent and context.

The ImageTalk system achieves keystroke savings of 94.4%, far higher than traditional text-entry methods. This efficiency translates to faster, richer, and more meaningful interactions for AAC users. Our research also offers insights and design guidelines for further optimizing this human-AI collaboration, in service of our mission to make communication limitations a relic of the past.
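For readers unfamiliar with the metric, keystroke savings is commonly defined in the AAC literature as the share of keystrokes a user avoids compared with typing every character of the final text. The sketch below assumes that common definition; the function name and the sample numbers are illustrative only and are not taken from the ImageTalk study.

```python
def keystroke_savings(keystrokes_used: int, chars_in_output: int) -> float:
    """Return the percentage of keystrokes saved relative to typing
    every character of the output text (assumed standard definition)."""
    if chars_in_output <= 0:
        raise ValueError("output text must contain at least one character")
    return (1 - keystrokes_used / chars_in_output) * 100


# Hypothetical numbers chosen only to illustrate the arithmetic:
# 14 keystrokes entered to produce a 250-character message.
print(round(keystroke_savings(14, 250), 1))  # 94.4
```

Under this definition, a savings of 94.4% means the user enters roughly one keystroke for every eighteen characters of generated text.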

Puming (Oscar) Jiang

Additional research projects are listed in my CV, though they are less related to HCI.