[Back to Skill Manifest Guide]
In addition to understanding the user, we would also like Almond to reply in natural language when the user issues a command. The replies from Almond are controlled by the canonical forms, using templates similar to those used to generate commands for training. This is often sufficient, but sometimes the replies are clunky. In that case, you can provide additional annotations to control how Almond replies.
If a command is missing a required input parameter, Almond will ask the user for the value with a slot filling question. Similarly, if a query returns too many results and it is pointless to list them all, Almond will ask the user if they want to refine the query by adding a new filter. By default, the question looks like "What <parameter-name> are you interested in?" (e.g. "What artist are you interested in?"). Users might not understand such a question, especially when the parameter name is not informative enough.
To give users a better experience, you can provide a customized slot filling question with a `prompt` annotation for each parameter in your function. For example, in Spotify we declare `play_song` as:

```tt
action play_song(in req song: Entity(com.spotify:song)
                 #_[prompt=["what song do you want to play"]]
                 ...);
```

By default, the slot filling question would be "What song are you interested in?". With the `prompt` provided, Almond will ask "What song do you want to play?" instead.
`prompt` annotations should be lowercase and should not include a question mark; the question mark will be added automatically (this allows Almond to ask two questions at once).
You must provide a `prompt` annotation on every required input parameter, or the device will not be valid. This restriction might be lifted in the future.
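As a hypothetical sketch (the `add_song_to_playlist` action and its parameters are invented for illustration), an action with two required inputs carries one `prompt` per parameter:

```tt
action add_song_to_playlist(in req song: Entity(com.spotify:song)
                            #_[prompt=["what song do you want to add"]],
                            in req playlist: String
                            #_[prompt=["which playlist do you want to add it to"]]);
```

If the user omits both values, Almond can slot fill each one in turn using the corresponding question.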
In addition to the agent asking questions during the dialogue, the user might also have questions about the current result. For example, when searching for songs, the agent might recommend some tracks to play, and the user might want to know what genre they are in, or how popular they are.
By default, the following commands are used to train follow-up questions: "what is the $param of the $query" (e.g. "what is the genre of the song", "what is the popularity of the song") and "is that a $filter $query" (e.g. "is that a pop song", "is that a song with popularity greater than 70").
To understand more follow-up questions, you can add the `#_[questions]` annotation. For example, for Spotify we might have:

```tt
query song(...,
           out popularity: Number
           #_[questions=["how popular is that song", "is that a popular song"]],
           ...);
```
When the agent replies to a question from the user, it needs to form a coherent sentence that describes the answer (a database row). Similarly, when the agent executes an action, it needs to describe to the user what just happened, so the user has confidence that the agent executed the right action.
For questions, the default is to form a phrase that describes each found item using the canonical forms. For example, when searching for songs, the agent might reply with "I have found Delicate. It is a song by Taylor Swift released in 2017." or "I have found Shake It Off and Bad Blood. Both are songs by Taylor Swift.". For actions, the default is to use a verb phrase for the action and convert it to past tense; for example, "I played Welcome To New York for you.". These kinds of descriptions are often appropriate for queries and actions that operate over named entities (songs, movies, restaurants, hotels, etc.) but do not cover all possible skills. For example, it would be very weird for a weather forecast skill to reply with "I have found Today's Weather Forecast. It is a cloudy forecast with a temperature of 90F".
To overcome the limitations of the generic reply templates, developers can provide customized result phrases using the `#_[result]` annotation. This annotation describes a single result from a query (the top result, if the query returns multiple results), or the successful execution of an action. For example, the weather device is declared as:
```tt
class @org.thingpedia.weather {
  monitorable query current(in opt location: Location,
                            out temperature: Measure(C),
                            out wind_speed: Measure(mps),
                            out humidity: Number,
                            out cloudiness: Number,
                            out fog: Number,
                            out status: Enum(raining,cloudy,sunny,snowy,sleety,drizzling,windy),
                            out icon: Entity(tt:picture))
  #_[result=["the current weather in ${location} is ${status} . the temperature is ${temperature} and the humidity is ${humidity} % .",
             "the current weather in ${location} is ${status}",
             "the weather in ${location} is ${status}",
             "it is ${status} today in ${location} and the temperature is ${temperature}"]]
  #[minimal_projection=["status"]];
}
```
Result phrases can use placeholders to refer to input or output parameters. The syntax is `$name` or `${name}` (similar to primitive templates). Unlike primitive templates, no options are valid for placeholders. The agent uses the phrase with the most parameters that have a valid value (not `null` or `undefined`). Which parameters have a value depends on the projection applied to the command. For example, if the user asks explicitly for the weather temperature, only the "temperature" field will be projected and the agent will choose a phrase that uses only the "temperature". Input parameters are always available in a result phrase. You can ensure that certain output parameters are part of the result phrase regardless of projection with a `#[minimal_projection]` annotation. In the example, the `minimal_projection` is set to "status", so the agent can always talk about the weather status. If `minimal_projection` is unspecified, it defaults to `["id"]` if an "id" parameter is present, and to the empty list otherwise.
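As an illustration of phrase selection, consider a hypothetical song query (the parameter names are invented; because it has an `id` parameter and no explicit annotation, `minimal_projection` defaults to `["id"]`):

```tt
query song(out id: Entity(com.spotify:song),
           out artist: Entity(com.spotify:artist),
           out release_date: Date)
#_[result=["i found ${id} by ${artist} , released in ${release_date}",
           "i found ${id} by ${artist}",
           "i found ${id}"]];
```

Because `id` is always projected, the last phrase can always be used; when the command also projects "artist" or "release_date", the agent prefers the phrase that mentions the most valid parameters.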
If an error occurs while calling a query or action API (in the form of a JavaScript exception), the agent will display an error to the user. By default, the raw exception message will be displayed. Often, the exception message will be cryptic and unsuitable for displaying directly to the user. It will also not be translated.
Instead, "normal" errors that are to be expected in the course of using the device should be specified using an #_[on_error]
annotation on the query or action, describing both the error codes and associated message. For example, in the Twitter skill:
```tt
action post(in req status : String)
#_[on_error={
  too_long="your tweet exceeds 240 characters",
  duplicate="you already tweeted this"
}];
```
Given that annotation, in case of an overlong tweet, Genie will generate a reply based on the "your tweet exceeds 240 characters" phrase, instead of showing the raw exception message.
The format of the annotation should be an object whose keys are error codes, and whose values are phrases describing the error condition. You can use placeholders to refer to input parameters (but not output parameters, because no output was generated). You should not include a description of the action that was attempted, or an invitation to try different inputs, because both of those will be added automatically.
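For instance, here is a sketch of an error message that interpolates an input parameter (the wording is illustrative):

```tt
action post(in req status : String)
#_[on_error={
  duplicate="you already tweeted ${status}"
}];
```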
At runtime, your JS code should catch any low-level API error and throw an exception whose `code` property is equal to one of the declared error codes. For example:
```javascript
async do_post() {
    try {
        await callTwitterPostAPI();
    } catch (e) {
        if (isTweetTooLongError(e)) {
            const newError = new Error("Tweet too long"); // included in debugging logs
            newError.code = 'too_long';
            throw newError;
        }
        // rethrow unexpected errors here
        throw e;
    }
}
```
Error codes starting with `E`, `E_`, and `ERR_` are reserved for predefined Node.js errors and internal errors.
You should not catch network connectivity or authentication errors, unless you are able to recover from them. Instead, propagate the low-level error as-is and the agent will handle it appropriately.
In addition to a textual reply, you can specify that your agent should show a graphical or interactive element when it answers. You do so with the `formatted` annotation. The annotation takes a list of messages, using object syntax. For each result from your query that the agent presents to the user, all messages specified in the annotation will be instantiated. Each property of a message is a string with placeholders, which are replaced based on the results of the query. Five types of messages are supported: `rdl`, `sound`, `picture`, `audio`, and `video`.
Not all platforms support all types of non-textual output. You should design your skill so that the textual reply is sufficient for the user, or account for all supported platforms.
An RDL message is a clickable link with a title, an optional description, and an optional picture. It is suitable for website links and news articles. It has the following properties:

- `webCallback` (the link, required)
- `displayTitle` (the title, required)
- `displayText` (the description, optional)
- `pictureUrl` (the picture, optional)

See Tutorial 2 for an example of this format type.
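As a minimal sketch, assuming a hypothetical news query with `link`, `title`, and `description` output parameters, an `rdl` message might be declared as:

```tt
#_[formatted=[{
  type="rdl",
  webCallback="${link}",
  displayTitle="${title}",
  displayText="${description}"
}]]
```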
A `sound` message plays a predefined sound effect, specified with the `name` property. At the moment, the available sound effects include those in the Freedesktop sound theme specification.
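For example, a hypothetical sketch using the "alarm-clock-elapsed" effect from the Freedesktop sound naming specification:

```tt
#_[formatted=[{ type="sound", name="alarm-clock-elapsed" }]]
```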
A picture message shows a picture to the user. It has only one property, `url`; e.g. `{ type="picture", url="${picture_url}" }` (see Tutorial 3).
An audio message plays an audio file to the user, specified by the `url` property, which should point to a publicly accessible URL for the audio stream. To maximize compatibility, it is recommended to use a patent-free format such as Ogg/Vorbis.
On voice platforms, the audio is played in the background after the agent is done speaking. If multiple audio files are played for the same agent reply, they are played consecutively. The user can say "stop" to stop playing audio. Audio messages can also be interleaved with sound effect messages, and will be played sequentially.
On supported web-based and graphical platforms, the message will appear as an interactive audio player.
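A hypothetical sketch, assuming the query exposes an `audio_url` output parameter:

```tt
#_[formatted=[{ type="audio", url="${audio_url}" }]]
```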
A video message plays a video to the user, specified by the `url` property, which should point to a publicly accessible URL for the video stream. To maximize compatibility, it is recommended to use a patent-free format such as WebM.
On supported web-based and graphical platforms, the message will appear as an interactive video player. On voice-based platforms, this message type is not supported and has no effect.
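Similarly, a hypothetical sketch assuming a `video_url` output parameter:

```tt
#_[formatted=[{ type="video", url="${video_url}" }]]
```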
[Back to Skill Manifest Guide]