Skip to main content

AIUI Node

Roban utilizes the iFlytek AIUI interface to implement functions such as speech recognition and speech synthesis. After the node is launched, AIUI is in a standby state, waiting to be awakened by the wake-up phrase "Lu Ban Lu Ban" to enter the working state. When the server issues an end command or a timeout occurs while waiting for upload, AIUI re-enters the standby state, waiting to be awakened again.

Obtaining the Direction of Voice Wake-Up

The wake-up phrase for voice interaction is "Lu Ban Lu Ban." The recognition of this phrase is implemented on the sound card module. Once the wake-up phrase is recognized, the system calculates the direction of the sound source based on the audio pickup situation of each microphone in the six-microphone array and publishes the direction through a ROS message.

Voice Wake-Up Message

  • Topic Name

    • /micarrays/wakeup
  • Message Type

    • std_msgs/String
  • Message Definition

    • string data

Subscribing to Voice Wake-Up Messages (Python Implementation)

#!/usr/local/bin/python
# -*- coding: utf-8 -*-
import rospy
from std_msgs.msg import String # Import message type
import json

class WakeUp():
def __init__(self):
rospy.Subscriber("/micarrays/wakeup", String, self.wakeup_callback)
rospy.spin() # Control the ROS message loop

def wakeup_callback(self, msg):
data = msg.data # Get a string, e.g., {u'key_word': u'lu3', u'score': u'1338', u'angle': u'324'}
data = json.loads(msg.data.replace("'", '"')) # Replace single quotes with double quotes in the string to directly convert it to a dict using json, resulting in:{u'key_word': u'lu3', u'score': u'1338', u'angle': u'324'}
angle = float(data['angle']) # Get the angle of the sound source
rospy.loginfo("Sound is coming from {}°".format(angle))

if __name__ == "__main__":
rospy.init_node("wakeup")
wakeup = WakeUp()
  • Meaning of the angle Value

    • There are six holes on the top of the robot, each containing a microphone. When the sound source is directly in front of Roban, the angle value is 300, increasing clockwise by 60 degrees for each hole.

Speech-to-Text

The AIUI Node provides the functionality for converting user speech to text, publishing the recognized text through ROS messages.

Speech-to-Text Message

  • Topic Name

    • /aiui/nlp
  • Message Type

    • std_msgs/String
  • Message Definition

    • string data

Subscribing to Speech-to-Text Messages (Python Implementation)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import rospy
from std_msgs.msg import String # Import message type

class S2T():
def __init__(self):
rospy.Subscriber("/aiui/nlp", String, self.nlp_callback)
rospy.spin() # Control the ROS message loop

def nlp_callback(self, msg):
rospy.loginfo(msg.data) # Output the result using ROS log

if __name__ == '__main__':
rospy.init_node("speak_to_text")
s2t = S2T()

Text-to-Speech

The AIUI node also provides the capability for converting text to speech, allowing users to request this service through ROS. The AIUI node uploads the text and parameters to the server and plays the result returned by the server.

Text-to-Speech Service

  • Service Name

    • /aiui/text_to_speak_multiple_options
  • Service Type

    • ros_AIUI_node/textToSpeakMultipleOptions
  • Service Definition

    string text         # Text to be converted to speech
    string vcn # Voice character (currently supports 'qige', 'dangdang', 'xiaojuan')
    uint8 speed # Speech speed (0-100)
    uint8 pitch # Speech pitch (0-100)
    uint8 volume # Speech volume (0-100)
    ---
    bool is_success # Request success flag

Requesting Text-to-Speech Service

Ubuntu Command Line Implementation

source ~/robot_ros_application/catkin_ws/devel/setup.bash   # Load workspace environment variables
rosservice call /aiui/text_to_speak_multiple_options "{text: 'hello, i am Roban', vcn: 'qige', speed: 50, pitch: 50, volume: 50}"

Python Implementation

#!/usr/bin/python
# -*- coding: utf-8 -*-
import rospy
from ros_AIUI_node.srv import textToSpeakMultipleOptions

class T2S():
def __init__(self):
self.tts_param = { # Define the text to be converted and synthesis parameters
'text': 'hello I am Roban',
'vcn': 'qige',
'speed': 50,
'pitch': 5,
'volume': 20
}

def tts(self):
rospy.wait_for_service("/aiui/text_to_speak_multiple_options", timeout=2) # Wait for the service to be available. Timeout is set to 2s here. By default, it will wait indefinitely, and a rospy.ROSException will be thrown on timeout.
tts_client = rospy.ServiceProxy("/aiui/text_to_speak_multiple_options", textToSpeakMultipleOptions) # Create a ROS service client
tts_client(self.tts_param['text'], self.tts_param['vcn'], self.tts_param['speed'], self.tts_param['pitch'], self.tts_param['volume']) # The client makes a request, with parameters matching the service definition

if __name__ == '__main__':
rospy.init_node("text_to_speak")
t2s = T2S()
t2s.tts()