
You should always start by defining example sentences that show how a user might trigger your code.

Training data lets your skill declare which sentences trigger which intents, so you must provide enough examples for the interpreter to pick up the patterns.


You should define training data in all languages that you wish to support in your skill.


Training data is written in an interpreter-agnostic format called chatl, which I also maintain. Its goal is to be easy for humans to read and write.

This tiny DSL is then transformed into a format that your interpreter of choice understands.

So, going back to our skill, let’s define some training data:

from pytlas import training

# Register this training data for the English language
@training('en')
def my_data(): return """
%[lights_on]
  turn the @[room]'s lights on would you
  turn lights on in the @[room]
  lights on in @[room] please
  turn on the lights in @[room]
  turn the lights on in @[room]
  enlight me in @[room]

%[lights_off]
  turn the @[room]'s lights off would you
  turn lights off in the @[room]
  lights off in @[room] please
  turn off the lights in @[room]
  turn the lights off in @[room]

@[room]
  living room
  ~[basement]

~[basement]
  cellar
"""

Where %[lights_on] and %[lights_off] define intents, @[room] is an entity and ~[basement] is a synonym.
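To make the roles of these three markers more concrete, here is a toy parser that groups indented lines under the %[...], @[...] or ~[...] header above them. It is only a sketch: parse_training_data is a hypothetical helper, not the real chatl parser, it handles only the subset of the syntax shown in this section, and the sample values (including "cellar") are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical helper, NOT the real chatl parser: it only understands
# unindented headers such as %[name], @[name] and ~[name].
HEADER = re.compile(r'^([%@~])\[([^\]]+)\]\s*$')
KINDS = {'%': 'intents', '@': 'entities', '~': 'synonyms'}


def parse_training_data(text):
    """Group each indented line under the closest header above it."""
    result = {kind: defaultdict(list) for kind in KINDS.values()}
    current = None  # list receiving the lines of the current block

    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        match = HEADER.match(line)
        if match:
            # An unindented header opens a new intent, entity or synonym
            marker, name = match.groups()
            current = result[KINDS[marker]][name]
        elif current is not None:
            # Indented lines (even "~[basement]") are values of the block
            current.append(line.strip())

    return {kind: dict(blocks) for kind, blocks in result.items()}


# Illustrative sample only, shorter than real training data should be
sample = """
%[lights_on]
  turn the lights on in @[room]
  lights on in @[room] please

@[room]
  living room
  ~[basement]

~[basement]
  cellar
"""

parsed = parse_training_data(sample)
```

Here parsed['intents'] holds the sentences of each intent, parsed['entities'] the possible values of each entity, and parsed['synonyms'] the alternative wordings of each synonym. A real adapter would go one step further and emit the dataset format expected by the chosen interpreter.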

Best practices

Here are some thoughts on writing great training data.

  • Use lowercase
  • Avoid punctuation
  • Give at least 10 sentences per intent
  • Provide variety in your samples
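These guidelines can be checked mechanically before training. The sketch below is one possible way to do it, not something pytlas provides: check_intent, MIN_SENTENCES and ALLOWED are hypothetical names, and the duplicate check is only a rough stand-in for "variety".

```python
import string

# Hypothetical threshold taken from the list above; adjust to taste.
MIN_SENTENCES = 10
# Characters used by the DSL itself (@[room], ~[basement], room's) are allowed.
ALLOWED = set("@[]~'_")


def check_intent(name, sentences):
    """Return a list of warnings for the sentences of a single intent."""
    warnings = []

    if len(sentences) < MIN_SENTENCES:
        warnings.append(
            f"{name}: only {len(sentences)} sentences, "
            f"aim for at least {MIN_SENTENCES}")

    # Exact duplicates add nothing; this is a crude proxy for variety
    if len(set(sentences)) < len(sentences):
        warnings.append(f"{name}: duplicated sentences add no variety")

    for sentence in sentences:
        if sentence != sentence.lower():
            warnings.append(f"{name}: not lowercase: {sentence!r}")
        if any(c in string.punctuation and c not in ALLOWED for c in sentence):
            warnings.append(f"{name}: contains punctuation: {sentence!r}")

    return warnings


# Illustrative input: too few sentences, one capitalised and punctuated
issues = check_intent('lights_on', [
    "Turn the lights on in @[room]!",
    "lights on in @[room] please",
])
for issue in issues:
    print(issue)
```

Running such a check on every intent before fitting the interpreter catches the most common mistakes early, though it cannot judge whether the samples are really varied enough.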