The NLP API provides low-level access to the speech and natural language capabilities of Genie. It is suitable for building custom experiences that need more control than the dialog API provides.
At the moment, there is no authentication or rate limiting in the API. This might change in the future.
This API is experimental and may be significantly modified in the future. Please use with caution.
The API is available at https://nlp.genie.stanford.edu. You must append the desired locale to the URL, for example: https://nlp.genie.stanford.edu/en-US. Currently, only en-US
is officially supported; other locales are in development and will be added in the future.
NOTE: Genie client libraries will append the locale automatically, so you should pass the URL without it.
All APIs use the POST request method. Except where noted, APIs expect a JSON request body, with the appropriate Content-Type header.
The most basic API: given a sentence, return the corresponding ThingTalk code.
POST /en-US/query
Host: nlp.genie.stanford.edu
Content-Type: application/json
{
"q": "get a cat picture",
"thingtalk_version": "1.9.0",
"store": "yes"
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"result": "ok",
"candidates": [
{
"code": ["now", "=>", "@com.thecatapi.get", "=>", "notify"],
"score": 1.0
}
],
"tokens": ["get", "a", "cat", "picture"],
"entities": {},
"intent": {
"question": 0,
"command": 1,
"chatty": 0,
"other": 0
}
}
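As a sketch of how a client might call this endpoint from Python using only the standard library (the helper names and the 0.5 threshold below are illustrative choices, not part of the API):

```python
import json
import urllib.request

NLP_URL = "https://nlp.genie.stanford.edu/en-US"  # base URL plus locale

def build_query_request(q, thingtalk_version="1.9.0", store="no", **extra):
    """Build the POST request for /query; extra keyword arguments are
    passed through as additional request parameters (limit, expect, ...)."""
    payload = {"q": q, "thingtalk_version": thingtalk_version, "store": store}
    payload.update(extra)
    return urllib.request.Request(
        NLP_URL + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def best_candidate(response, min_command_score=0.5):
    """Return the most likely parse, or None: per the field descriptions,
    candidates are unreliable when intent.command is low.
    The 0.5 threshold is an arbitrary choice, not part of the API."""
    if response.get("result") != "ok":
        return None
    if response.get("intent", {}).get("command", 0) < min_command_score:
        return None
    candidates = response.get("candidates", [])
    return candidates[0] if candidates else None

# usage (performs a network call):
# with urllib.request.urlopen(build_query_request("get a cat picture")) as resp:
#     print(best_candidate(json.load(resp)))
```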
All parameters are optional except for `q`. `thingtalk_version` is optional for compatibility reasons, but strongly recommended.

- `q`: the input from the user
- `thingtalk_version`: the version of ThingTalk used by the client application; use this parameter to ensure that the produced code is compatible with the client
- `store`: one of `yes`, `no`; controls whether the sentence can be stored for analysis and research; defaults to `no` if not provided
- `limit`: maximum number of candidate parses to return; note that, depending on the model, the actual number might be lower
- `expect`: what type the client is expecting; currently, the only recognized values are `Location` and `MultipleChoice`
- `choices`: an array of strings indicating the possible options the user is choosing from; this is ignored unless `expect` is set to `MultipleChoice`
- `context`: the current state of the dialogue agent, as a ThingTalk string in neural network syntax; this is used only for contextual (multi-turn) NLP models, which are experimental and not yet supported
- `entities`: entities present in the context
- `tokenized` (boolean): if specified, the input from the user is assumed to be already tokenized; this is used primarily to evaluate a trained model against a dataset that was already preprocessed
- `skip_typechecking` (boolean): if specified, the server will not check syntax and types of the produced parses, returning the raw result from the neural model; this is only useful during evaluation, and you must have an admin-level developer key to use this option
- `access_token`: access token to control access to a private NLP model
- `developer_key`: Thingpedia developer key to use to access unpublished devices

- `result`: either `ok`, or absent in case of an error
- `candidates`: an array of candidate parses, sorted from the most to the least likely
- `candidates[].code`: the ThingTalk code of the candidate parse, as an array of tokens
- `candidates[].score`: the likelihood score of the parse; the special value `Infinity` indicates that the sentence was matched exactly, instead of using a neural model
- `tokens`: the tokenized input from the user
- `entities`: entities extracted from the user's input
- `intent`: high-level intent of the user's input
- `intent.command`: likelihood that the user's input was a command or question that can be interpreted in ThingTalk; `candidates` should be considered unreliable unless `intent.command` has a high score
- `intent.question`: likelihood that the user's input was an open-domain question suitable for a search engine
- `intent.chatty`: likelihood that the user's input was chatty text (unsupported)
- `intent.other`: likelihood that the user's input was not in any of the other categories

Converts an audio file containing speech to its text representation.
POST /en-US/voice/stt
Host: nlp.genie.stanford.edu
Content-Type: multipart/form-data; boundary=XXXX

--XXXX
Content-Type: audio/x-wav
Content-Disposition: form-data; name="audio"; filename="audio.wav"

... raw audio data ...
--XXXX--
HTTP/1.1 200 OK
Content-Type: application/json
{
"result": "ok",
"text": "Recognized text."
}
The body of the request must contain a .wav
file with the correct MIME type audio/x-wav
in a field named audio
. The filename must be specified, but can have any value. The wav file needs to have a sample rate of 16000 Hz, and must be in PCM mono format, encoded as 16 bit signed little-endian.
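The format requirements can be verified client-side before uploading; a sketch using Python's standard `wave` module (the helper name is illustrative):

```python
import wave

def check_wav_for_stt(file):
    """Verify that a .wav file (path or file object) satisfies the
    /voice/stt requirements: 16000 Hz sample rate, mono, 16-bit
    samples (the wave module handles PCM as little-endian)."""
    with wave.open(file, "rb") as w:
        if w.getframerate() != 16000:
            raise ValueError("expected a 16000 Hz sample rate, got %d" % w.getframerate())
        if w.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if w.getsampwidth() != 2:
            raise ValueError("samples must be 16-bit signed")
```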
- `result`: either `ok`, or absent in case of an error
- `text`: the recognized text, capitalized and punctuated correctly

Converts an audio file containing speech to its ThingTalk interpretation, in one step. This combines the `/voice/stt` and `/query` APIs.
POST /en-US/voice/query
Host: nlp.genie.stanford.edu
Content-Type: multipart/form-data; boundary=XXXX

--XXXX
Content-Type: audio/x-wav
Content-Disposition: form-data; name="audio"; filename="audio.wav"

... raw audio data ...
--XXXX
Content-Disposition: form-data; name="metadata"

{"thingtalk_version": "1.9.0", "store": "yes"}
--XXXX--
HTTP/1.1 200 OK
Content-Type: application/json
{
"result": "ok",
"text": "Recognized text.",
"candidates": [
{
"code": ["now", "=>", "@com.thecatapi.get", "=>", "notify"],
"score": 1.0
}
],
"tokens": ["get", "a", "cat", "picture"],
"entities": {},
"intent": {
"question": 0,
"command": 1,
"chatty": 0,
"other": 0
}
}
The body of the request must contain a .wav
file with the correct MIME type audio/x-wav
in a field named audio
. The filename must be specified, but can have any value. The wav file needs to have a sample rate of 16000 Hz, and must be in PCM mono format, encoded as 16 bit signed little-endian.
The request must also contain a field called metadata
, containing a JSON payload with the request parameters to the /query
endpoint (except for q
). The meaning of the parameters is the same.
The response returns the same parameters as /query
, with the addition of text
, which is the raw extracted text from the sound file.
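Clients without a multipart library can assemble the request body by hand; a minimal sketch (the helper name is illustrative, not part of any client library):

```python
import json
import uuid

def build_voice_query_body(wav_bytes, metadata):
    """Assemble a multipart/form-data body for /voice/query: an `audio`
    part carrying the raw .wav data, and a `metadata` part carrying the
    JSON parameters for the underlying /query call."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = crlf.join([
        b"--" + boundary.encode(),
        b"Content-Type: audio/x-wav",
        b'Content-Disposition: form-data; name="audio"; filename="audio.wav"',
        b"",
        wav_bytes,
        b"--" + boundary.encode(),
        b'Content-Disposition: form-data; name="metadata"',
        b"",
        json.dumps(metadata).encode(),
        b"--" + boundary.encode() + b"--",
        b"",
    ])
    return body, "multipart/form-data; boundary=" + boundary
```

The returned content type (with its generated boundary) must be sent as the request's Content-Type header.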
Convert text to an audio file.
POST /en-US/voice/tts
Host: nlp.genie.stanford.edu
Content-Type: application/json
{
"text": "Text to convert to speech."
}
HTTP/1.1 200 OK
Content-Type: audio/x-wav
... raw audio data...
The request returns the generated audio file directly.
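A sketch of building this request in Python, assuming the text-to-speech endpoint path is `/voice/tts` (the helper name is illustrative):

```python
import json
import urllib.request

NLP_URL = "https://nlp.genie.stanford.edu/en-US"  # base URL plus locale

def build_tts_request(text):
    """Build the POST request for text-to-speech; unlike the other
    endpoints, the response body is raw audio/x-wav data, not JSON."""
    return urllib.request.Request(
        NLP_URL + "/voice/tts",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# usage (performs a network call):
# with urllib.request.urlopen(build_tts_request("Hello!")) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```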
Tokenizes and preprocesses a sentence, extracting numbers, dates, times, etc.
POST /en-US/tokenize
Host: nlp.genie.stanford.edu
Content-Type: application/json
{
"q": "wake me up at 7 am with 30 cat pictures"
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"result": "ok",
"tokens": ["wake", "me", "up", "at", "TIME_0", "with", "NUMBER_0", "cat", "pictures"],
"entities": {
"TIME_0": { "hour": 7, "minute": 0, "second": 0 },
"NUMBER_0": 30
},
"raw_tokens": ["wake", "me", "up", "at", "7:00", "with", "30", "cat", "pictures"]
}
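A client can substitute the placeholder tokens back into a readable string using the `entities` object; a sketch (the `detokenize` helper and its rendering rules are illustrative, not part of the API):

```python
def detokenize(tokens, entities):
    """Rebuild a readable sentence from /tokenize output by substituting
    placeholder tokens (TIME_0, NUMBER_0, ...) with display strings."""
    def render(token):
        if token not in entities:
            return token
        value = entities[token]
        if token.startswith("TIME_"):
            # time entities are objects with hour/minute/second keys
            return "{hour}:{minute:02d}".format(**value)
        return str(value)
    return " ".join(render(t) for t in tokens)
```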
All parameters are optional except for `q`.

- `q`: the input from the user
- `expect`: what type the client is expecting; the values are the same as for the `/query` API

- `result`: either `ok`, or absent in case of an error
- `tokens`: the tokenization of the input sentence
- `entities`: entities extracted from the sentence; this is an object with one key for each upper-case token in `tokens`
- `raw_tokens`: the tokenization of the input sentence, before recognizing entities

This API trains the model interactively, and stores the new sentence for later retraining.
POST /en-US/learn
Host: nlp.genie.stanford.edu
Content-Type: application/json
{
"q": "get a cat picture",
"target": "now => @com.thecatapi.get => notify",
"thingtalk_version": "1.9.0",
"store": "online"
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"result": "ok",
"message": "Learnt successfully",
"example_id": 123456
}
- `q` (required): the input from the user
- `target` (required): the ThingTalk code corresponding to this input from the user, in neural network syntax, with tokens separated by a single space
- `thingtalk_version` (required): the version of ThingTalk used by the client application; this must be exactly the same version as the server is using, or the request will have no effect
- `store`: one of `no`, `automatic`, `online`, `commandpedia`; indicates the provenance of the sentence, which affects how it is stored and how it is used for training; if `store` is `no`, then the request checks that the code is compatible with the sentence, but has no persistent effects; defaults to `automatic`
- `access_token`: access token to control access to a private NLP model
- `developer_key`: Thingpedia developer key to use to access unpublished devices
- `owner`: an opaque string identifying the user that wrote this sentence; this can be used to support deletion of sentences from the training set, for example for compliance purposes

- `result`: `ok` on success, or absent on failure
- `message`: a human-readable string indicating what actually happened to the sentence
- `example_id`: if the sentence was added to the database, this is the ID of the newly created training example
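A `/learn` request can be built the same way as a `/query` request; a sketch (the helper name is illustrative), defaulting to `store` set to `no` so that a sentence/code pair is only validated, not persisted:

```python
import json
import urllib.request

NLP_URL = "https://nlp.genie.stanford.edu/en-US"  # base URL plus locale

def build_learn_request(q, target, thingtalk_version, store="no", **extra):
    """Build the POST request for /learn. With store="no" the server only
    checks that `target` is compatible with `q`, without storing anything;
    extra keyword arguments (owner, access_token, ...) pass through."""
    payload = {"q": q, "target": target,
               "thingtalk_version": thingtalk_version, "store": store}
    payload.update(extra)
    return urllib.request.Request(
        NLP_URL + "/learn",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```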