In short, applying a data-driven approach means that we use training data instead of hand-written rules. For example, Google Duplex learns from example conversations to handle natural interactions such as booking a table at a restaurant or making an appointment at a hairdresser, and Tesla uses rain and non-rain images to decide when to turn on the windshield wipers. Taken to the extreme, this approach has been called Software 2.0. The paradigm has become the leading one because, given enough training data, it performs better than the rule-based approach. The downside is that “enough data” is often quite a lot. However, better methods, such as word embeddings and few-shot learning, are becoming available all the time, so the amount of data needed to reach “enough” will decrease. We will use similar ideas so that models trained on one task can be applied, with little adaptation, in new domains or for new customers. This is an example of few-shot learning.
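
To make the contrast between rules and training data concrete, here is a minimal sketch in Python. It is a toy word-count classifier, not how Google Duplex or any production system works; the intent names, the example sentences, and the functions `rule_based_intent`, `train`, and `predict` are all made up for illustration.

```python
from collections import Counter, defaultdict

# Rule-based: a hand-written keyword rule (hypothetical example).
def rule_based_intent(utterance: str) -> str:
    return "book_table" if "table" in utterance.lower() else "book_haircut"

# Data-driven: count which words appear with which intent in labelled examples.
def train(examples):
    counts = defaultdict(Counter)
    for text, intent in examples:
        counts[intent].update(text.lower().split())
    return counts

def predict(counts, utterance: str) -> str:
    words = utterance.lower().split()
    # Score each intent by how often its training examples contained these words.
    return max(counts, key=lambda intent: sum(counts[intent][w] for w in words))

# Tiny, invented training set; a real system would need far more data.
TRAINING_DATA = [
    ("i would like a table for two tonight", "book_table"),
    ("can we get dinner reservations at seven", "book_table"),
    ("i need a haircut on friday", "book_haircut"),
    ("do you have time for a trim tomorrow", "book_haircut"),
]

model = train(TRAINING_DATA)
query = "dinner reservations for four please"
print("rule-based: ", rule_based_intent(query))
print("data-driven:", predict(model, query))
```

The keyword rule only covers the phrasing its author thought of, so it misses a request that never mentions the word "table". The learned model picks up "dinner" and "reservations" from the examples, and it can be improved by adding more examples rather than by rewriting the rule.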

When adopting this paradigm in the natural language setting, we have to set up data collection schemes carefully so that we can improve the system’s intelligence over time. For example, we can make sure that we collect data every time our currently deployed system does something that does not make sense to the user. By focusing on the errors, the system will improve faster.
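
One way such an error-focused collection scheme could look is sketched below in Python. This is only an illustration under assumptions: the `Interaction` record, the `user_flagged_error` signal (for instance, the user rephrasing or giving up), and the `ErrorCollector` class are hypothetical names, not part of any particular system.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Interaction:
    user_input: str
    system_output: str
    user_flagged_error: bool  # assumed signal that the response made no sense

class ErrorCollector:
    """Keeps only the interactions where the deployed system failed."""

    def __init__(self) -> None:
        self.queue: List[Interaction] = []

    def record(self, interaction: Interaction) -> None:
        # Only failures are stored, so labelling effort goes where the system is weak.
        if interaction.user_flagged_error:
            self.queue.append(interaction)

    def export_jsonl(self) -> str:
        # One JSON object per line, ready to hand over for labelling and retraining.
        return "\n".join(json.dumps(asdict(i)) for i in self.queue)

collector = ErrorCollector()
collector.record(Interaction("table for two at eight", "Booked for 8 pm.", False))
collector.record(Interaction("same time next week", "Sorry, I did not understand.", True))
print(collector.export_jsonl())
```

Only the second interaction ends up in the export, because it is the one where the system failed. Those examples can then be labelled and added to the training data.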