raffle.ai makes AI tools to realize our vision of giving employees and end-users seamless access to company information. We use machine learning for natural language processing (NLP) so that users can search with natural text, posing questions the same way they would to another person.
Machine learning needs training data to work, and the more the better. Creating that data, for example by labeling historical queries with their correct answers, is time-consuming and therefore expensive, and it delays the point at which your AI solution performs well enough to deploy.
So it sounds as if natural text search is out of reach for companies that don’t have the resources to create sufficient training data. Fortunately, this is no longer the case.
NLP has made a lot of progress in recent years because of what we call pre-training. This has been a real game changer for achieving good performance with a smaller investment. To explain pre-training, we need to be a bit more specific about what we mean by training data in NLP: labeled data, such as historical queries paired with their correct answers, is scarce and expensive to create, whereas unlabeled raw text exists in enormous quantities and can be used to pre-train a language model.
Once trained, a language model can “understand” the meaning of sentences. Or, more precisely, if we take two sentences that carry the same meaning, then their representations will be similar.
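To make this concrete, here is a minimal sketch of comparing sentence representations from a pre-trained model. The model name, the mean pooling over token representations, and the example sentences are assumptions made for illustration, not raffle.ai’s production setup.

```python
# Sketch: sentences with similar meaning get similar representations.
# Model choice and mean pooling are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence: str) -> torch.Tensor:
    """Encode a sentence and mean-pool its token representations."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

a = embed("How do I reset my password?")
b = embed("I forgot my password, how can I change it?")
c = embed("What are your opening hours?")

cos = torch.nn.functional.cosine_similarity
print(cos(a, b, dim=0))  # higher: the two sentences mean the same thing
print(cos(a, c, dim=0))  # lower: different meaning
```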
This is a very good foundation for building other NLP applications, such as a question-answering system, because we now have representations of questions that robustly reflect their meaning regardless of exactly how we phrase them.
So the NLP application recipe à la 2020 is to:

1. Pre-train a language model on large amounts of unlabeled text.
2. Fine-tune it on a small labeled dataset for the specific task, such as question answering.
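As a rough illustration of this recipe, here is a minimal sketch using the Hugging Face transformers library. The model name, the two-label setup, and the task framing are assumptions made for this example, not raffle.ai’s actual pipeline.

```python
# Minimal sketch of the "pre-train, then fine-tune" recipe (hypothetical
# setup, not raffle.ai's pipeline).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Step 1 is usually already done for us: load a language model that was
# pre-trained on large amounts of unlabeled text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Step 2: fine-tune on a small labeled dataset, e.g. question/passage pairs
# labeled as "answers the question" or "does not answer the question".
# The training loop itself is omitted; any standard PyTorch loop or the
# transformers Trainer API can be used.
inputs = tokenizer("How do I reset my password?",
                   "Go to Settings and click 'Reset password'.",
                   return_tensors="pt")
logits = model(**inputs).logits  # relevance scores (before fine-tuning)
```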
But how do we leverage large unlabeled datasets to get representations that capture the meaning of sentences? The key here is context: a single word in a sentence gets part of its meaning from the surrounding text.
So if we train a model to predict a word given its context, either from the preceding words: “Josef walks his ___”, or from the surrounding words: “the cat ___ the mouse”, then the model is forced to learn a representation that is context-aware.
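As a hands-on illustration of predicting from the preceding words, here is a small sketch. GPT-2 and the text-generation pipeline are used purely as a convenient, publicly available example of a model trained this way; they are not part of the original setup described here.

```python
# Sketch: predicting the next word from the preceding context.
# GPT-2 is just a convenient public example of a model trained this way.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
completions = generator("Josef walks his", max_new_tokens=1,
                        num_return_sequences=3, do_sample=True)
for c in completions:
    print(c["generated_text"])
# Plausible continuations such as "dog" show that the model has learned
# which words fit this context.
```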
There are many language models on the market. An early, famous one is word2vec. A fascinating property of its representations is that you can do approximate arithmetic with them, such as: “king” - “man” + “woman” ≈ “queen”.
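If you want to reproduce this arithmetic yourself, a small sketch using gensim’s pre-trained GoogleNews word2vec vectors looks roughly like this (the model name and download are as provided by gensim’s downloader API):

```python
# Word vector arithmetic with word2vec, using gensim's downloader API.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # large download (~1.6 GB)
result = vectors.most_similar(positive=["king", "woman"],
                              negative=["man"], topn=1)
print(result)  # top hit is "queen": "king" - "man" + "woman" ≈ "queen"
```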
Today, the most popular one is BERT, which is short for Bidirectional Encoder Representations from Transformers. BERT is a masked language model, which means that the model’s task is to predict one or more words that have been masked out of the input, as shown in the example below.
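Here is a concrete sketch of this masked-word task using the Hugging Face fill-mask pipeline; it is an illustration rather than raffle.ai’s own code.

```python
# Sketch of BERT's masked-language-model objective: predict the word
# hidden behind the [MASK] token.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The cat [MASK] the mouse."):
    print(prediction["token_str"], round(prediction["score"], 3))
# The top predictions are typically verbs like "chased" or "caught":
# the surrounding context tells the model what belongs in the gap.
```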
As is often the case in deep learning, more data and larger models help performance. The standard pre-trained BERT models are transformers with roughly 110 million (BERT-base) to 340 million (BERT-large) parameters, trained on all of English Wikipedia and other sources.
It sounds gigantic, but it is actually possible to put such a model into production and run it without noticeable lag for the user. You can try it out with raffle.ai’s own AutoPilot.
In the next post in this series, we’ll take a closer look at how we fine-tune these models to solve question-answering tasks. We will also look at a larger trend in how search is changing.