When I first started delving into bots, it was more of a programming challenge than a serious venture. Bots are fun to hack together and a great way to flex programming muscles.
Then Facebook announced bots for Messenger at F8, and suddenly bots became serious business. It’s funny how quickly trends in the software world gain critical mass. We are barely four months into the bot hype cycle, but it seems like it’s been four years. At FactorDaily, my mandate is to figure out new ways of news interaction and distribution, and everything else in tech that affects publishing. Obviously, bots are now a BIG part of my job.
When we started thinking about bots as a long-term product, we began thinking of intents, actions and dialogs. We began thinking of context, user sessions for bots and bot vocabulary. And we began thinking of decoupling the bot from the brain.
This post outlines all these things and introduces Athena, the platform we have created for rapidly rolling out bots on any network for any use case.
Our hope is to share our learning and also to get feedback / suggestions on how we might have done things better or differently.
The bot and the brain
This isn’t about AI; software can be smart without being AI. We use the term brain to distinguish between the action part and the logic part of a bot.
When users interact with a bot on any medium, say Messenger or Slack, it’s not just the conversation that matters. A good, meaningful bot does a few things well, and does them intelligently. The part of the bot experience that actually “understands, interprets and does things based on commands” is what we call the brain.
Athena is the “brain platform” we built at FactorDaily for powering our bots. It isn’t AI for sure. But it’s certainly not dumb.
Athena uses a JSON-based request/response mechanism and provides simple ways to:
- Sign up users.
- Maintain context and detect returning users.
- Broadcast events over sockets to enable push notifications.
- Create conversation flows and change conversations on the fly.
- Create new intents and actions at runtime.
- Plug in any intent management system you choose, at runtime.
In short, Athena lets you write bots rapidly without having to worry about a server-side framework.
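To make "plug in any intent management system at runtime" concrete, here is a minimal sketch of a parser registry, assuming every adapter exposes a single parse(text) method returning { intent, entities }. The names registerParser, useParser and the two stub adapters are hypothetical; a real adapter would wrap a service such as Recast or API.ai, whose client APIs are not shown here.

```javascript
// Sketch of a runtime-swappable intent parser registry.
// Assumption: each adapter implements parse(text) -> { intent, entities }.
// The adapter names and the keyword stub are hypothetical, for illustration only.

const parsers = new Map();
let activeParser = null;

// Register an adapter under a name.
function registerParser(name, adapter) {
  if (!adapter || typeof adapter.parse !== "function") {
    throw new TypeError(`Parser "${name}" must implement parse(text)`);
  }
  parsers.set(name, adapter);
}

// Switch the active parser while the process keeps running.
function useParser(name) {
  if (!parsers.has(name)) throw new Error(`Unknown parser "${name}"`);
  activeParser = parsers.get(name);
}

function parseIntent(text) {
  if (!activeParser) throw new Error("No intent parser selected");
  return activeParser.parse(text);
}

// Hypothetical stub: naive keyword matching instead of a real NLU service.
registerParser("keyword-stub", {
  parse(text) {
    const city = /\bin ([A-Z][a-z]+)/.exec(text);
    if (/\btime\b/i.test(text)) {
      return { intent: "ask_time", entities: city ? { city: city[1] } : {} };
    }
    return { intent: "unknown", entities: {} };
  },
});

// A second hypothetical stub standing in for another provider.
registerParser("fallback-stub", {
  parse() {
    return { intent: "unknown", entities: {} };
  },
});

useParser("keyword-stub");
console.log(parseIntent("What's the time right now in Paris?"));
// → { intent: 'ask_time', entities: { city: 'Paris' } }
```

Because the rest of the system only ever calls parseIntent, switching providers is a single useParser call rather than a redeploy.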
Since Athena was designed for the news and publishing world, it works best for such use cases, but it is not limited to them. The following sections describe each part of Athena’s architecture.
Intents are a well-known concept in the bot world. The core idea behind an intent is that when a user interacts with a bot, the bot needs to find out what the user wants (the intention) and then perform an action accordingly.
Early on, we realised that it would be impossible to plan for every kind of use case that might arise in the future, and therefore for the hundreds of different intents those use cases would need. Instead, we strove to make the system as future-proof as possible. So we developed a simple protocol that maps intent names to intent logic files. Let’s understand this with an example.
Assume you want to create a bot that tells the local time in any city in the world. The first thing you need to do is teach the bot to understand the user’s intent when the user says something like
What's the time right now in Paris?
Without delving into how this is done, let’s say you mapped it to an intent called ask_time.
So, given the phrase "What's the time right now in Paris?", your intent parser returns the intent ask_time and the entity Paris.
Next, you need to write some code that finds the current time in Paris and returns it to the user.
With Athena, you put this code in ask_time.js and place it in the /intents folder. The only other thing you need to do is ensure that you maintain the correct JSON response signature, which we discuss below. This is a trivial example, but it illustrates a crucial aspect of Athena.
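A minimal sketch of what an ask_time.js handler could look like, assuming the handler receives the parsed result ({ entities }) and returns a small response object. The handler signature, the { intent, reply } response fields and the city-to-time-zone table are all hypothetical; only the time formatting relies on the standard Intl.DateTimeFormat API with its timeZone option.

```javascript
// ask_time.js: a sketch of an intent handler for the ask_time intent.
// Assumptions: the handler signature and the { intent, reply } response shape
// are illustrative only, not Athena's actual contract.

// Hypothetical lookup from a city entity to an IANA time zone.
const CITY_TIME_ZONES = {
  paris: "Europe/Paris",
  london: "Europe/London",
  tokyo: "Asia/Tokyo",
  bengaluru: "Asia/Kolkata",
};

// `now` is injectable so the handler is easy to test.
function askTime({ entities = {} }, now = new Date()) {
  const city = entities.city;
  const timeZone = city && CITY_TIME_ZONES[city.toLowerCase()];
  if (!timeZone) {
    return { intent: "ask_time", reply: "Sorry, I don't know that city yet." };
  }
  // en-GB gives a 24-hour "HH:MM" string in the requested zone.
  const time = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
  }).format(now);
  return { intent: "ask_time", reply: `It's ${time} in ${city} right now.` };
}

// In a real intents/ask_time.js file this would be exported,
// e.g. with module.exports = askTime;
console.log(askTime({ entities: { city: "Paris" } }).reply);
```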
This mechanism lets us create any number of intents and actions like these, deploy them at runtime (literally), and make our bots more capable as new requirements surface.
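One way to get this "drop a file in, it works" behaviour in Node.js is convention-based loading: map the intent name to intents/<name>.js and require it on demand. The sketch below is an assumption about how such a router might work, not Athena’s code; createIntentRouter and the handler shape are hypothetical. It clears the require cache on every call so an updated file takes effect immediately, and it rejects non-trivial names so an intent cannot point outside the folder.

```javascript
// Sketch: resolve an intent name to <intentsDir>/<name>.js and load it on demand,
// so new handlers can be deployed without restarting the process.
// Assumption: each intent file exports a function (parsed) -> response.
const fs = require("fs");
const os = require("os");
const path = require("path");

function createIntentRouter(intentsDir) {
  return function route(parsed) {
    // Only allow simple names so an intent cannot escape the folder.
    if (!/^[a-z][a-z0-9_]*$/.test(parsed.intent)) {
      throw new Error(`Invalid intent name "${parsed.intent}"`);
    }
    const file = path.join(intentsDir, `${parsed.intent}.js`);
    if (!fs.existsSync(file)) {
      return { intent: parsed.intent, reply: "Sorry, I can't do that yet." };
    }
    // Drop any cached copy so an updated file is picked up immediately.
    delete require.cache[require.resolve(file)];
    const handler = require(file);
    return handler(parsed);
  };
}

// Demo: "deploy" a new intent file while the router is already running.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "intents-"));
const route = createIntentRouter(dir);

console.log(route({ intent: "say_hello", entities: {} }).reply);
// → Sorry, I can't do that yet.

fs.writeFileSync(
  path.join(dir, "say_hello.js"),
  'module.exports = (p) => ({ intent: p.intent, reply: "Hello!" });\n'
);
console.log(route({ intent: "say_hello", entities: {} }).reply);
// → Hello!
```

Re-reading the file on every request trades some performance for freshness; a production version might cache handlers and invalidate them with a file watcher instead.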
JSON based I/O
We don’t know what kind of bots we will build in the future. We don’t know if they will be text based or voice based, we don’t know which channels we will build them for, and we don’t know what interactions they might have. So we decided to create a JSON-based input/output signature format. It works like this:
Prospective bot: Hey Athena, I want to send you some information about a conversation I had with a human.
Athena: No problem. Just package it in a nice JSON and label it with the following:
- your bot’s unique identifier
- what the human said to you
- who the human is
- anything else the human sent you
Prospective bot: Awesome, I will do that. But how will I know what to say to the human in return?
Athena: No worries. I will take this JSON, do my magic and reply with another JSON that has the following information:
- what the human really wants
- what they wanted the last time they came
- anything else, as an attachment, that you might send to the human
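The exchange above can be sketched as a pair of JSON shapes. The field names below (botId, user.id, text, attachments, intent, previousIntent, reply) are hypothetical stand-ins for the items in the dialogue, not Athena’s real schema; the point is that the brain validates a channel-neutral request and returns a channel-neutral response.

```javascript
// Sketch of the JSON request/response exchange between a channel bot and the brain.
// All field names are hypothetical stand-ins for the items in the dialogue.

// Request: what a channel bot sends to the brain.
const exampleRequest = {
  botId: "factorbot-messenger",                 // the bot's unique identifier
  user: { id: "1234567890" },                   // who the human is
  text: "What's the time right now in Paris?",  // what the human said
  attachments: [],                              // anything else the human sent
};

// Returns the names of missing or malformed fields; empty means well-formed.
function validateRequest(req) {
  const errors = [];
  if (typeof req.botId !== "string" || req.botId === "") errors.push("botId");
  if (!req.user || typeof req.user.id !== "string") errors.push("user.id");
  if (typeof req.text !== "string") errors.push("text");
  if (req.attachments !== undefined && !Array.isArray(req.attachments)) {
    errors.push("attachments");
  }
  return errors;
}

// Response: what the brain sends back to the channel bot.
function buildResponse({ intent, previousIntent = null, reply, attachments = [] }) {
  return {
    intent,          // what the human really wants
    previousIntent,  // what they wanted the last time they came
    reply,           // text for the bot to show
    attachments,     // anything else to pass on to the human
  };
}

const errors = validateRequest(exampleRequest);
console.log(errors.length === 0 ? "valid" : `invalid: ${errors.join(", ")}`);
// → valid
console.log(JSON.stringify(buildResponse({ intent: "ask_time", reply: "It's 14:00 in Paris." })));
```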
It is then the calling bot’s job to take this response and render it appropriately on its own interface. This signature-based approach has allowed us to write multiple bots for Slack and Facebook (Telegram and Skype coming soon) and run all of them from a single backend.
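As a rough illustration of "render it on its own interface", the sketch below turns one channel-neutral response into two channel payloads. It assumes the response carries a reply string (a hypothetical field); the Slack body uses the channel and text arguments of chat.postMessage, and the Messenger body follows the Send API’s recipient/message shape. Only the payloads are built; authentication and HTTP calls are deliberately left out, and the IDs are placeholders.

```javascript
// Sketch: render one brain response as channel-specific payloads.
// Assumption: the response has a `reply` string. Sending the payloads
// (Slack chat.postMessage, Messenger Send API) is not shown.

const renderers = {
  // Body for Slack's chat.postMessage (channel + text).
  slack: (response, target) => ({ channel: target, text: response.reply }),
  // Body for Messenger's Send API (recipient id + message text).
  messenger: (response, target) => ({
    recipient: { id: target },
    message: { text: response.reply },
  }),
};

function render(channel, response, target) {
  const renderer = renderers[channel];
  if (!renderer) throw new Error(`No renderer for channel "${channel}"`);
  return renderer(response, target);
}

// Placeholder channel and user IDs, for illustration only.
const response = { intent: "ask_time", reply: "It's 14:00 in Paris right now." };
console.log(JSON.stringify(render("slack", response, "C123")));
console.log(JSON.stringify(render("messenger", response, "1234567890")));
```

With this split, supporting a new channel means adding one renderer entry; the brain itself does not change.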
Externalised Dialog files
This is an old trick, famously used by internationalisation (i18n) and localisation (l10n) experts, and known in the software world as externalisation of content. Here is a simple example:
When you want a button to say "Click Here for action", you never hard-code that text in your code. Instead, you put a unique, meaningful string identifier such as "BUTTON_ACTION_TEXT" in your code, and then create a text (or JSON) file that says BUTTON_ACTION_TEXT: "Click Here for action".
Now, if you have two different applications running on the same code base with similar logic, say a "simple action button" version and a "complex action button" version, you can create two separate files, each containing the same key but different content.
Simple version: BUTTON_ACTION_TEXT: "Click Here for simple action"
Complex version: BUTTON_ACTION_TEXT: "Click Here for complex action"
We extended the same concept to Athena and externalised all bot conversation into dialogs.json files. This allows us to drive different conversation profiles for different kinds of bots.
For example, we might want to build a serious bot whose welcome message is very different from that of a bot aimed at teenagers. With Athena, we can do this simply by loading a different dialog file based on the identifier assigned to each bot.
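A minimal sketch of per-bot dialog selection, assuming each bot identifier maps to its own dialog profile. The in-memory DIALOG_PROFILES object stands in for the contents of separate dialogs.json files (in practice each could be read with JSON.parse over the file contents); the bot IDs, keys and the fall-back-to-default rule are hypothetical design choices, not Athena’s documented behaviour.

```javascript
// Sketch: pick a dialog profile by bot identifier and look up message keys.
// DIALOG_PROFILES stands in for per-bot dialogs.json files; bot IDs and keys
// are hypothetical. Keys missing from a bot's profile fall back to "default".

const DIALOG_PROFILES = {
  default: {
    WELCOME_TEXT: "Welcome. Ask me about today's news.",
    UNKNOWN_TEXT: "Sorry, I didn't understand that.",
  },
  "news-serious": {
    WELCOME_TEXT: "Good morning. Here are today's top stories.",
  },
  "news-teen": {
    WELCOME_TEXT: "Hey! Want to see what's trending?",
  },
};

// Own-property check so keys like "toString" are never picked up by accident.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function getDialog(botId, key) {
  const profile = has(DIALOG_PROFILES, botId) ? DIALOG_PROFILES[botId] : {};
  if (has(profile, key)) return profile[key];
  if (has(DIALOG_PROFILES.default, key)) return DIALOG_PROFILES.default[key];
  throw new Error(`Missing dialog key "${key}"`);
}

console.log(getDialog("news-teen", "WELCOME_TEXT"));   // → Hey! Want to see what's trending?
console.log(getDialog("news-serious", "UNKNOWN_TEXT")); // → Sorry, I didn't understand that.
```

The fallback keeps bot-specific files small: they only override the lines whose tone differs.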
Maintaining user context
Often in a bot’s conversation flow, you need to know what was said in earlier steps of the discussion. Sometimes you also want to know what a user discussed on their last visit. Another tricky area in the bot world is the concept of a session: since a user never logs in or logs out, we never know when a session starts or ends unless we track the context and duration of the conversation.
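Tracking conversation duration makes sessions inferable. A common approach, sketched below under the assumption of a fixed inactivity threshold, is to start a new session whenever the gap since the user’s previous message exceeds that threshold. The 30-minute value and the splitSessions helper are hypothetical, not taken from Athena.

```javascript
// Sketch: infer session boundaries from message timestamps.
// Assumption: a 30-minute inactivity gap ends a session (hypothetical value).

const SESSION_GAP_MS = 30 * 60 * 1000;

// Split a user's sorted message timestamps (in ms) into sessions. A new
// session starts whenever the gap since the previous message exceeds gapMs.
function splitSessions(timestamps, gapMs = SESSION_GAP_MS) {
  const sessions = [];
  let current = [];
  for (const t of timestamps) {
    const last = current[current.length - 1];
    if (current.length > 0 && t - last > gapMs) {
      sessions.push(current);
      current = [];
    }
    current.push(t);
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}

const min = 60 * 1000;
const sessions = splitSessions([0, 5 * min, 12 * min, 90 * min, 95 * min]);
console.log(sessions.length); // → 2
```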
We solved all these problems through a simple idea of maintaining conversation logs.
We drew an analogy with how our brains work when we meet and interact with people in our daily lives.
While conversing with people, we store information in our short-term memory all the time. This is akin to maintaining a conversation log for each person and consulting it the next time we meet them. Similarly, when we meet someone new, we have no conversation log for that person, and hence no context, which signals our brain to gather information about them and start a new log.
We applied this concept in Athena and developed a system of scratch files that keep track of users’ conversations. When a bot sends Athena an input JSON, we check the user’s scratch file and factor in the context of the conversation to figure out what the user really intends.
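The scratch-file idea can be sketched as a per-user conversation log. In this sketch a Map stands in for the files, "no log yet" signals a new user, and one illustrative rule uses context to resolve intent: if the parser finds entities but no intent (as with a follow-up like "And in London?"), the previous intent is reused with the new entities. The ScratchLog class and that follow-up rule are hypothetical assumptions, not a description of Athena’s actual logic.

```javascript
// Sketch: a per-user conversation log ("scratch file") used to detect
// returning users and resolve follow-up messages. A Map stands in for the
// files; the follow-up rule is an illustrative assumption.

class ScratchLog {
  constructor() {
    this.logs = new Map(); // userId -> array of { text, intent, entities, at }
  }

  // No log means we have never seen this user: no context yet.
  isReturningUser(userId) {
    return this.logs.has(userId);
  }

  lastTurn(userId) {
    const log = this.logs.get(userId);
    return log && log.length > 0 ? log[log.length - 1] : null;
  }

  record(userId, turn) {
    if (!this.logs.has(userId)) this.logs.set(userId, []);
    this.logs.get(userId).push({ ...turn, at: Date.now() });
  }

  // If the parser found entities but no intent, assume the user is
  // continuing the previous request with new details.
  resolveIntent(userId, parsed) {
    const last = this.lastTurn(userId);
    const hasEntities = Object.keys(parsed.entities || {}).length > 0;
    if (parsed.intent === "unknown" && hasEntities && last) {
      return { intent: last.intent, entities: { ...last.entities, ...parsed.entities } };
    }
    return parsed;
  }
}

const scratch = new ScratchLog();
const userId = "1234567890";
console.log(scratch.isReturningUser(userId)); // → false

scratch.record(userId, {
  text: "What's the time right now in Paris?",
  intent: "ask_time",
  entities: { city: "Paris" },
});

// "And in London?" might parse to no intent but a city entity.
const followUp = scratch.resolveIntent(userId, {
  intent: "unknown",
  entities: { city: "London" },
});
console.log(followUp); // → { intent: 'ask_time', entities: { city: 'London' } }
console.log(scratch.isReturningUser(userId)); // → true
```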
Summing it up
This post is already much longer than I intended, but it’s hard to capture every aspect of the platform briefly. There’s more to cover, which I will do in part 2 of this post.
The astute reader will point out that there are many bot platforms out there that allow all of this and have a lot more features.
Our aim in building Athena was to have complete control over every aspect of our end goal, publishing and news delivery, and we believe that Athena does that well.
We also believe that conversational UI is here to stay, and that a simple bot built on a third-party platform isn’t enough to give users a meaningful experience. We wanted to own the core of the experience. We don’t do everything in-house; for example, intent parsing is left to the experts, and we use Recast and API.ai for it. Similarly, we use many other APIs for various functions.
But owning the core platform has helped us fine-tune the end-user experience and be more confident about the long-term play. Factorbot, our simple Q&A and notifications bot, is powered by Athena, and we are about to release our first set of bot-powered news experiences.
Our bucket list of next steps is a long one, and we plan to improve the “brain” aspects of Athena more, based on what we learn from the field.
We are open to sharing the Athena platform with others in the publishing and news industry. We aren’t ready to fully open-source Athena just yet (though we might do so soon), but if you are interested in building your own bot experience and want to try out Athena, do ping us. We are a tiny engineering team, but always happy to share and help.
If you have read this far, thanks! We would be happy to hear your thoughts, feedback and observations. This is a brave new world, and we will only learn by sharing and from our mistakes.
End note: My editorial team insists I disclose that our project is internally named Cassandra (on GitHub). We chose to release it publicly as Athena to avoid confusion with the open-source Cassandra project.