LLM-Powered Social Robot System for Non-Technical End Users
Building an LLM-powered social robot framework that enables non-technical users to author, customize, and control robot interactions without engineering support.
LLM Integration · End-User Authoring · Socially Assistive Robotics · Human-AI Interaction · ROS · Agentic Systems
Why this work: A 6-month real-world deployment of a pre-scripted robot system revealed that fixed interactions create "interaction fatigue" — children disengage as content fails to adapt over time. Therapists had no mechanism to update content themselves. This system directly addresses that limitation: replacing rigid, pre-scripted interactions with LLM-generated, therapist-authored ones.
System Overview
What we're building
The system integrates large language models into a social robot platform, enabling dynamic, contextually appropriate interactions to be generated in real time. Rather than relying on pre-written scripts, the robot generates responses, stories, and therapy activities on the fly — guided by therapist-defined goals and child-specific parameters.
A central design goal is the therapist-facing authoring dashboard: a UI that allows non-technical clinicians to create and customize LLM-driven interactions without any engineering intervention.
System pipeline: the therapist configures interactions via the authoring UI → the LLM generates speech, gesture, prosody, and facial expression → safety guardrails are applied → the therapist previews and approves → the robot delivers in session.
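The pipeline above can be sketched in a few lines of Python. This is an illustrative outline only, not the project's code: the `InteractionPlan` schema, the function names, and the banned-word guardrail are all assumptions, and `generate_plan` is a stand-in where a real system would call the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class InteractionPlan:
    """One multimodal robot turn (hypothetical schema, not the project's)."""
    speech: str
    gesture: str = "neutral"
    prosody: str = "calm"
    facial_expression: str = "smile"
    approved: bool = False
    rejection_reason: Optional[str] = None


def generate_plan(goal: str, child_name: str) -> InteractionPlan:
    # Stand-in for the LLM call; a real system would prompt a model here.
    return InteractionPlan(
        speech=f"Hi {child_name}! Today we will practice {goal}.",
        gesture="wave",
    )


def apply_guardrails(plan: InteractionPlan, banned_words: Set[str]) -> InteractionPlan:
    # Assumed guardrail: flag the plan if any banned word appears in the speech.
    words = {w.strip(".,!?").lower() for w in plan.speech.split()}
    hits = words & banned_words
    if hits:
        plan.rejection_reason = f"banned words: {sorted(hits)}"
    return plan


def run_pipeline(
    goal: str,
    child_name: str,
    banned_words: Set[str],
    therapist_approves: Callable[[InteractionPlan], bool],
    deliver: Callable[[InteractionPlan], None],
) -> InteractionPlan:
    # configure -> generate -> guardrails -> therapist approval -> deliver
    plan = apply_guardrails(generate_plan(goal, child_name), banned_words)
    if plan.rejection_reason is None and therapist_approves(plan):
        plan.approved = True
        deliver(plan)
    return plan
```

In this sketch nothing reaches `deliver` unless it has passed the guardrail and the therapist callback has returned `True`, which mirrors the preview-and-approve step in the pipeline.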
System Architecture
Three-layer pipeline
The system is organized into three layers — perception, cognition, and expression — with a Flask-based therapist dashboard.
Perception
Speech recognition via NVIDIA Riva ASR, voice activity detection via Silero VAD, facial expression recognition via DeepFace, and physical object recognition via Gemini Robotics-ER.
Cognition
LLM-based reasoning with LLaMA 3.1 run through Ollama, using retrieval-augmented generation (RAG) to pull therapy-grounded context for each client, plus Gemini 2.5 Flash for content generation.
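As a rough illustration of per-client retrieval, the sketch below ranks a client's therapy notes against a request and prepends the best matches to the prompt. It is a toy under stated assumptions: it uses bag-of-words cosine similarity in place of whatever embedding model and vector store the system actually uses, and `retrieve`, `build_prompt`, and the note store layout are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; a real system would use an embedding model instead.
    return re.findall(r"[a-z']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, notes: List[str], k: int = 2) -> List[str]:
    # Score every note against the query and keep the top-k with nonzero overlap.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(n))), n) for n in notes]
    scored = [(s, n) for s, n in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored[:k]]


def build_prompt(client_id: str, query: str, store: Dict[str, List[str]], k: int = 2) -> str:
    # Only the requesting client's notes are searched, so context never
    # leaks between clients.
    context = retrieve(query, store.get(client_id, []), k)
    lines = ["Context from this client's therapy notes:"]
    lines += [f"- {c}" for c in context] or ["- (none found)"]
    lines.append(f"Request: {query}")
    return "\n".join(lines)
```

Keying the store by client is the part that matters for the design described here: the LLM sees context grounded in one child's therapy record rather than a shared pool.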
Expression
Text-to-speech with prosody and lip-sync, IK-based motion planning, and a gesture and emotion library.
Design Goals
What this system is designed to do
01
Enable non-technical authoring. Speech-language pathologists (SLPs) can define therapy goals, child profiles, and interaction parameters through a dashboard, without writing code or modifying system internals.
02
Replace pre-scripted content with dynamic generation. The robot generates new stories, questions, and responses in real time based on the child's current session context — avoiding interaction fatigue.
03
Support multi-modal interactions. The robot synchronizes its lip movements, gestures, and facial expressions with the generated narrative, conversational, and responsive content.
04
Keep the therapists in the loop. The system is designed for therapist-guided autonomy — the robot acts within boundaries set by the therapists, not independently of them.
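One way to picture therapist-guided autonomy is as a small decision rule: the robot proposes, and therapist-set boundaries decide whether it acts, defers, or stops. The sketch below is hypothetical; the `Boundaries` fields and the three outcomes are assumptions chosen for illustration, not the system's actual policy.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Boundaries:
    """Limits a therapist might set before a session (hypothetical fields)."""
    allowed_activities: FrozenSet[str]
    max_session_minutes: int


def decide(proposed_activity: str, elapsed_minutes: int, bounds: Boundaries) -> str:
    # The time limit takes priority over everything else.
    if elapsed_minutes >= bounds.max_session_minutes:
        return "end_session"
    # Inside the boundaries the robot may act on its own.
    if proposed_activity in bounds.allowed_activities:
        return "execute"
    # Anything outside the boundaries is deferred to the therapist.
    return "ask_therapist"


bounds = Boundaries(
    allowed_activities=frozenset({"storytelling", "turn_taking_game"}),
    max_session_minutes=20,
)
```

With these example boundaries, `decide("storytelling", 5, bounds)` returns `"execute"`, while an activity the therapist did not list returns `"ask_therapist"` instead of running.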
Early Demo
System in progress
The following demo shows an early version of the LLM-integrated system. This is a work-in-progress prototype — not a final deployment.
Early prototype demo of the LLM-integrated robot system — work in progress.
Technical Stack
Technologies
Python · Flask · ROS · Gemini Robotics-ER · TTS with prosody/lip-sync