To Get Better Customer Data, Build Feedback Loops into Your Products

The combination of user data and AI often creates data feedback loops. This means that as a firm gathers more customer data, it can feed that data into machine learning algorithms to improve its product or service, thereby attracting more customers, generating even more customer data. Think, for example, of search engines: the more people search on Google and click on the links provided, the more data Google gathers, which allows its algorithms to provide more accurate and relevant search results, attracting even more users and searches, and so on.

Such data feedback loops can help create a sustainable competitive advantage, provided that certain conditions exist. But the strength of these data feedback loops can vary greatly, and companies can make deliberate choices in their products or services to strengthen them. How to do so is the focus of this piece.

Not All Data Feedback Loops Are Created Equal

Some products have naturally very strong data feedback loops. Think of smart thermostats, where every temperature adjustment by a user provides a valuable data signal that the device can use to achieve better personalization. Or Google Maps, where every user’s choice of route and the time taken to reach the destination help the algorithm improve its route recommendations and traffic predictions. Or Spotify, whose recommender system learns directly from users’ choices of which recommended songs to include in their playlists and how often they listen to those songs. The reason these feedback loops are so strong is that users reveal clear and unambiguous signals of their preferences in the natural process of consuming the product, which are leveraged to further improve the product or the service for those users.

At the other end of the spectrum are products with naturally weak data feedback loops. Their usage is hard to track or does not reveal useful information about user preferences, or it is difficult and slow to gather informative feedback from users.

This is obviously true of traditional “dumb” products like cars, furniture, and clothes: They are not digitally connected, so the only way to create any data feedback loops from customers is to manually collect feedback via focus groups and surveys, which can only help for future product releases. And it is also true of products where the feedback loops involve very long cycles of learning and improvement such as financial institutions’ credit scoring systems (feedback mainly comes from defaults, which take years to materialize), or venture investing (it takes years to figure out which startups succeed and which fail).

Less obvious is that even some digital products that collect lots of user data may have weak data feedback loops. This means most of their value comes from pre-programming and in-house data training and does not increase much, if at all, via learning from users. For instance, popular wearables such as Fitbit, Whoop, Nutrisense, and Oura have fairly weak data feedback loops, even though they provide lots of data insights.

Consider the latest Fitbit tracker, the Charge 5, which has an impressive list of features. The tracker measures a wide array of data from the user (heart rate, speed and distance of movement, sleep, skin temperature) and provides valuable insights such as heart rate variability, time spent in different heart rate zones, a cardio fitness score, a readiness score for working out, and a sleep quality score. However, these insights do not appear to get better with more usage or more users. They are either straightforward summary statistics of what the tracker measures, or the result of comparing the user data it measures against reference points drawn from pre-existing research, which are pre-programmed into the Fitbit system.

For example, Fitbit explains that “the cardio fitness score is determined by your resting heart rate, age, sex, weight, and other personal information.” Similarly, the readiness score is based on the user’s recent sleep patterns as measured by the tracker — which are presumably compared to some stable reference points. In other words, these are estimates of the “true” state of a user, but they do not improve with more usage data, simply because Fitbit has no way of figuring out how close its estimates came to the “true” state and adjusting them accordingly.

Of course, there may be some limited ways in which wearables could improve the value provided to each user based on usage data from that user or other users. At the most basic level, any connected product can learn which user interface is more engaging (e.g., via A/B testing), which is technically a form of data feedback loop, albeit a limited one — table stakes for most products.

And there are correlations that wearables could analyze to recommend behavior. For instance, a correlation between the time users go to bed and the quality of their sleep, which can be used to recommend an optimal time to go to bed; or between the time users exercise and the quality of their sleep, which can be used to recommend an optimal time to exercise. In these cases, as users adjust their behavior based on the recommendations, the wearable gets some feedback on whether this is helping or not, and so can further improve by learning from users. Still, this type of process requires lots of data from lots of users, and the wearable provider may never really know the extent to which correlation means causation.
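As a rough illustration of the kind of analysis described above, the following Python sketch computes the Pearson correlation between bedtime and a sleep score. The nightly log, the variable names, and the 0 to 100 score scale are hypothetical and are not taken from any wearable's actual data or API.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical nightly log for one user (illustrative values only):
# bedtime in hours after 8 p.m., and a sleep score on an assumed 0-100 scale.
bedtimes = [1.0, 2.0, 3.0, 4.0, 5.0]
sleep_scores = [90, 85, 80, 70, 65]

r = pearson(bedtimes, sleep_scores)
print(f"correlation between bedtime and sleep score: {r:.2f}")
```

A strongly negative value here would only suggest that later bedtimes go along with worse sleep in this log. As noted above, it says nothing on its own about whether one causes the other.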

Finally, consider the data feedback loops of large language models (LLMs) such as OpenAI’s ChatGPT or Google’s Bard. These models ingest enormous amounts of data from the web and use machine learning to generate answers to user questions. In their early versions, the models’ capabilities are largely determined by pre-launch “in-house” training and testing; the quality of their answers improves only to a moderate extent with more users or more usage. Indeed, as of this writing, there are two main mechanisms that create data feedback loops around LLMs:

  1. Users click on the thumbs-up or thumbs-down buttons at the end of the answers. (Most users probably ignore this most of the time.)
  2. Users ask follow-up questions, which could signal whether an answer was satisfactory or not. (This inference is likely difficult in most cases.)

Note the difference with regular search (on Bing or Google), where the user’s choice of which displayed links to click on provides a much clearer signal of the relevance of the search results. Of course, things may change drastically as the LLMs are updated and start incorporating more reliable ways to generate data feedback loops — a topic we discuss in the next section.

How to Enhance Data Feedback Loops

The key question is then, for those products where the data-enabled learning feedback loops are not naturally strong, what can be done to enhance them?

(Re)design the product to create natural data feedback loops.

Ideally, one would want to (re)design the product or service in such a way that customers, in the natural course of using the product, are creating data that signals how useful/effective the product is for users. This data can then be used by the provider to improve the quality of the product or service.

For example, LLMs could add new features that allow users to save and organize the responses they found most helpful into folders of favorites (akin to bookmarks) and delete those they don’t want to keep. They could add a document-creation feature (akin to Microsoft Word) where users could copy and edit LLM responses. They could also create challenge games (AI vs. users) and leaderboards where the AI and users seek to answer questions, and users vote on the answers. And so on. The idea is to create opportunities for users to provide reliable signals of the perceived quality of the LLMs’ answers, which can then be used to improve their algorithms.

Fitbit and other wearables could add standardized fitness tests or challenges (e.g., run one mile or do three sets of 20 sit-ups) and measure users’ total time, heart rate, and other biometrics before and after. This would allow the wearables’ AI to predict users’ readiness for working out more accurately as it observes more data from more users, which in turn should induce users to rely on the wearables more and more for deciding when to work out. Of course, this still relies on assumptions about unobserved user behavior, but the key point is that the devices need to perform some sort of quasi-experiments on users in order to obtain data that can help them learn and improve.

Integrate with other products to create data feedback loops.

In many cases, redesigning the actual product or service to engineer feedback loops may be challenging (as illustrated by the wearables example above). An alternative way of achieving the same goal is to integrate your offering with other existing products that customers already use or could use.

For example, Fitbit (or Whoop or Oura) could create an integration with smart thermostats to enable the wearable to automatically control the ambient temperature during a user’s sleep. This would allow the wearable’s AI to adjust the temperature and determine its effect on the quality of a user’s sleep as measured by the wearable. The more a customer uses the wearable (i.e., wears it to sleep), the closer the wearable can come to figuring out the ideal temperature pattern for any given user. This can be achieved by automatically experimenting with many different temperature patterns as the user sleeps.
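One simple way to picture this kind of automatic experimentation is an epsilon-greedy strategy: most nights the system uses the temperature that has worked best so far, and occasionally it tries another one. The Python sketch below is only an illustration under stated assumptions. The class name, the candidate temperatures, and the simulated sleep score are all hypothetical, and nothing here reflects how any wearable or thermostat actually works.

```python
import random


class TemperatureExperimenter:
    """Epsilon-greedy choice among candidate bedroom temperatures (hypothetical)."""

    def __init__(self, temperatures, epsilon=0.1, seed=None):
        self.temperatures = list(temperatures)
        self.epsilon = epsilon  # share of nights spent exploring
        self.rng = random.Random(seed)
        self.counts = {t: 0 for t in self.temperatures}
        self.mean_score = {t: 0.0 for t in self.temperatures}

    def choose(self):
        """Pick tonight's temperature."""
        # Try every candidate once before comparing averages.
        untried = [t for t in self.temperatures if self.counts[t] == 0]
        if untried:
            return untried[0]
        # Occasionally explore a random candidate; otherwise exploit the best.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.temperatures)
        return self.best()

    def record(self, temperature, sleep_score):
        """Update the running average sleep score for a temperature."""
        self.counts[temperature] += 1
        n = self.counts[temperature]
        self.mean_score[temperature] += (sleep_score - self.mean_score[temperature]) / n

    def best(self):
        """Return the temperature with the highest average score so far."""
        return max(self.temperatures, key=lambda t: self.mean_score[t])


def simulated_sleep_score(temperature_c):
    # Made-up stand-in for the score a wearable would measure overnight;
    # it assumes, purely for illustration, that 19 degrees C is ideal.
    return 100 - 5 * abs(temperature_c - 19)


experimenter = TemperatureExperimenter([17, 18, 19, 20, 21], epsilon=0.2, seed=0)
for _ in range(50):  # 50 simulated nights
    tonight = experimenter.choose()
    experimenter.record(tonight, simulated_sleep_score(tonight))

print("best temperature so far:", experimenter.best())
```

In practice the measured sleep score would be noisy and would depend on many things other than temperature, so a real system would need far more nights, and more careful statistics, than this toy simulation suggests.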

Or the wearables could integrate with Peloton or Tonal, which provide standardized workouts, so that Fitbit’s AI could correlate a user’s biometrics with the type and intensity of workouts. The advantage relative to the option described above (where the wearables are simply asking users to perform specific workouts) is that here the actual workout behavior can be observed.

Similarly, LLMs could integrate with whatever software or tools their answers are being used in. For example, they could integrate with content creation/editing software (e.g., Google Docs, Substack, Salesforce), which would allow them to observe which parts of their answers end up being used in the content created by users and use that data to improve their answers.
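To make the idea of observing which parts of an answer end up being used more concrete, here is a minimal Python sketch that estimates the share of an LLM answer's words that survive into a user's final text. It uses the standard library's difflib.SequenceMatcher. The function name and the example strings are hypothetical, and a real integration would need a far more robust comparison, as well as the user's consent to observe the document.

```python
from difflib import SequenceMatcher


def retained_fraction(llm_answer: str, final_text: str) -> float:
    """Estimate the share of the answer's words that appear, in order, in the final text."""
    answer_words = llm_answer.split()
    final_words = final_text.split()
    if not answer_words:
        return 0.0
    # Compare word sequences; autojunk is disabled so that common words
    # are not silently ignored in longer documents.
    matcher = SequenceMatcher(a=answer_words, b=final_words, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(answer_words)


# Hypothetical example: the user kept most of the answer but changed one word.
answer = "the quick brown fox"
final_document = "the quick red fox"
print(retained_fraction(answer, final_document))
```

A high retained fraction could be read as a sign that the answer was useful, and a low one as a sign that it was heavily rewritten or discarded, although both readings are only rough proxies for quality.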

Ask users for feedback in a minimally intrusive way that makes the benefits clear to them.

Still, for many products, finding such ways of making user feedback inherent to product usage — directly or via integration — may be hard or impossible. Short of that, the next-best option is to explicitly ask users for feedback. Most online products and services do this to a certain extent. As mentioned above, LLMs ask users for thumbs-up or down after every answer generated; Netflix asks users for a thumbs-up or down after each piece of content they watch in order to improve its recommendation system; Amazon asks buyers to rate the products they purchased; Airbnb asks both travelers and hosts to rate each other, and so on.

Of course, the difficulty with asking for explicit feedback is obtaining useful information while not inconveniencing users too much. Nobody likes being badgered with surveys that “only take three minutes of your time,” or seeing feedback polls (“how likely are you to recommend us to your friends?”) pop up every other screen.

In addition to making requests for user feedback as easy and painless as possible, it helps to clearly communicate to users, whenever possible, how that feedback might benefit them personally (e.g., “by rating this movie, Netflix will be able to give you better recommendations of which other movies you are likely to enjoy”). This gives them an incentive to provide honest feedback.

Include humans in the loop.

An important way to manufacture a data feedback loop while minimizing the feedback burden on users is to include humans in the loop to complement (or even replace) user feedback. A good example of this is Alexi, an AI-powered legal research service offered to law firms. Customers can submit legal questions along with any relevant case facts, and Alexi sends back a legal research memo containing an answer, complete with summaries of the relevant case law and litigation. The memos are generated by Alexi’s AI but then reviewed and amended (when necessary) by Alexi’s in-house legal team, who are the humans in the loop. Thus, Alexi’s AI gets the benefit of learning from customer queries and the implicit feedback provided by the in-house team’s corrections, without burdening customers. This works well here because customers do not expect instant answers. (Alexi promises an answer within 24 hours, a very reasonable turnaround in this context.) Nor do they ask hundreds of questions a day like ChatGPT or Bard users might.

Another example of good use of humans in the loop is Grammarly, an AI-powered writing assistant that helps users improve grammar, spelling, punctuation, and style. Grammarly provides suggestions in real time, while users are writing in any application or site (e.g., word processing, email, social media, communication apps). Users have the option to accept or reject the suggested edits, which helps improve the algorithm and creates a data feedback loop. However, instead of asking users for feedback, Grammarly also uses human reviewers to check the suggestions made by the AI model and review ambiguous cases (for example, when a user rejects a suggested edit but it is unclear why they did so).

Other examples of AI-based products or services that use humans in the loop to complement or replace user feedback include content moderation on social media platforms (human moderators get involved in the most complex cases that the AI has trouble with), AI-powered radiology services such as Gleamer (human specialists complement AI diagnostics and would correct any issues in the diagnosis the system automatically writes up), and AI-powered security services such as Deep Sentinel (the AI detects threats and decides when to escalate to human guards, who can then decide whether the AI made the correct call).

Of course, the main weakness of the humans-in-the-loop approach is that it does not scale well to hundreds of thousands or millions of customers or to services with very high frequency of usage and expectation of quick turnaround times. Not many companies can afford to hire armies of over 15,000 content moderators as Meta does for Facebook. This is why most companies employing humans in the loop try to minimize the time required from humans in the process. Nevertheless, the humans-in-the-loop approach can be very effective in the early stages of a product/service, when the learning curve is steepest. The hope is that the need to involve humans decreases rapidly over time as the AI system learns.

• • •

The increasing availability of artificial intelligence, including machine learning algorithms, means that deliberately creating data feedback loops is now possible for most products and services. For some products, it is easy; for others, one needs to find more creative ways to engineer the data feedback loops — via integrations or minimally intrusive requests for user feedback that provide benefits for the users. When they are strong, these feedback loops can create a form of network effect (more users bring more data, which makes the product better, in turn attracting more users, and so on) and compounding competitive advantage.
