AIUI Node
Roban utilizes the iFlytek AIUI interface to implement functions such as speech recognition and speech synthesis. After the node is launched, AIUI is in a standby state, waiting to be awakened by the wake-up phrase "Lu Ban Lu Ban" to enter the working state. When the server issues an end command or a timeout occurs while waiting for upload, AIUI re-enters the standby state, waiting to be awakened again.
Obtaining the Direction of Voice Wake-Up
The wake-up phrase for voice interaction is "Lu Ban Lu Ban." The recognition of this phrase is implemented on the sound card module. Once the wake-up phrase is recognized, the system calculates the direction of the sound source based on the audio pickup situation of each microphone in the six-microphone array and publishes the direction through a ROS message.
Voice Wake-Up Message
Topic Name
- /micarrays/wakeup
Message Type
- std_msgs/String
Message Definition
- string data
Subscribing to Voice Wake-Up Messages (Python Implementation)
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
import rospy
from std_msgs.msg import String # Import message type
import json
class WakeUp():
def __init__(self):
rospy.Subscriber("/micarrays/wakeup", String, self.wakeup_callback)
rospy.spin() # Control the ROS message loop
def wakeup_callback(self, msg):
data = msg.data # Get a string, e.g., {u'key_word': u'lu3', u'score': u'1338', u'angle': u'324'}
data = json.loads(msg.data.replace("'", '"')) # Replace single quotes with double quotes in the string to directly convert it to a dict using json, resulting in:{u'key_word': u'lu3', u'score': u'1338', u'angle': u'324'}
angle = float(data['angle']) # Get the angle of the sound source
rospy.loginfo("Sound is coming from {}°".format(angle))
if __name__ == "__main__":
rospy.init_node("wakeup")
wakeup = WakeUp()
Meaning of the
angle
Value- There are six holes on the top of the robot, each containing a microphone. When the sound source is directly in front of Roban, the
angle
value is 300, increasing clockwise by 60 degrees for each hole.
- There are six holes on the top of the robot, each containing a microphone. When the sound source is directly in front of Roban, the
Speech-to-Text
The AIUI Node provides the functionality for converting user speech to text, publishing the recognized text through ROS messages.
Speech-to-Text Message
Topic Name
- /aiui/nlp
Message Type
- std_msgs/String
Message Definition
- string data
Subscribing to Speech-to-Text Messages (Python Implementation)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import rospy
from std_msgs.msg import String # Import message type
class S2T():
def __init__(self):
rospy.Subscriber("/aiui/nlp", String, self.nlp_callback)
rospy.spin() # Control the ROS message loop
def nlp_callback(self, msg):
rospy.loginfo(msg.data) # Output the result using ROS log
if __name__ == '__main__':
rospy.init_node("speak_to_text")
s2t = S2T()
Text-to-Speech
The AIUI node also provides the capability for converting text to speech, allowing users to request this service through ROS. The AIUI node uploads the text and parameters to the server and plays the result returned by the server.
Text-to-Speech Service
Service Name
- /aiui/text_to_speak_multiple_options
Service Type
- ros_AIUI_node/textToSpeakMultipleOptions
Service Definition
string text # Text to be converted to speech
string vcn # Voice character (currently supports 'qige', 'dangdang', 'xiaojuan')
uint8 speed # Speech speed (0-100)
uint8 pitch # Speech pitch (0-100)
uint8 volume # Speech volume (0-100)
---
bool is_success # Request success flag
Requesting Text-to-Speech Service
Ubuntu Command Line Implementation
source ~/robot_ros_application/catkin_ws/devel/setup.bash # Load workspace environment variables
rosservice call /aiui/text_to_speak_multiple_options "{text: 'hello, i am Roban', vcn: 'qige', speed: 50, pitch: 50, volume: 50}"
Python Implementation
#!/usr/bin/python
# -*- coding: utf-8 -*-
import rospy
from ros_AIUI_node.srv import textToSpeakMultipleOptions
class T2S():
def __init__(self):
self.tts_param = { # Define the text to be converted and synthesis parameters
'text': 'hello I am Roban',
'vcn': 'qige',
'speed': 50,
'pitch': 5,
'volume': 20
}
def tts(self):
rospy.wait_for_service("/aiui/text_to_speak_multiple_options", timeout=2) # Wait for the service to be available. Timeout is set to 2s here. By default, it will wait indefinitely, and a rospy.ROSException will be thrown on timeout.
tts_client = rospy.ServiceProxy("/aiui/text_to_speak_multiple_options", textToSpeakMultipleOptions) # Create a ROS service client
tts_client(self.tts_param['text'], self.tts_param['vcn'], self.tts_param['speed'], self.tts_param['pitch'], self.tts_param['volume']) # The client makes a request, with parameters matching the service definition
if __name__ == '__main__':
rospy.init_node("text_to_speak")
t2s = T2S()
t2s.tts()