How to evaluate the result of a feature

When building features for an application, it's important to evaluate each feature's results. Doing so helps you keep track of how effectively the team is working, and it motivates teammates by showing the impact of their work on users.

The most important tool is the A/B test. I'm currently using Statsig as the A/B testing tool. There are several categories of things to measure:

  1. How much the feature is used
    • For example, usage can be tracked by the number of tap events on the new button.
  2. How much the feature increased the use of the screen it is embedded in
  3. How much the feature increased the use of other key features
    • For example, in a finance app, the number of stocks users register as assets can be increased by a feature that improves the analysis of those assets.
  4. DAU (daily active users)
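
To make the comparison concrete, here is a minimal sketch of computing these four metrics from a raw event log, one value per A/B group. The flat schema (user_id, event_name, screen, ts, ab_group) and the event names (new_button_tap, asset_analysis, register_stock) are assumptions for illustration, not Statsig's actual export format.

```python
import pandas as pd

# A minimal sketch, assuming a flat event-log export with one row per event.
# The column names and event names are assumptions, not Statsig's real schema.
events = pd.read_csv("events.csv", parse_dates=["ts"])
g = events.groupby("ab_group")

summary = pd.DataFrame({
    # 1. How much the feature is used: taps on the new button.
    "feature_taps": g.apply(lambda x: (x["event_name"] == "new_button_tap").sum()),
    # 2. Use of the screen the feature is embedded in.
    "screen_events": g.apply(lambda x: (x["screen"] == "asset_analysis").sum()),
    # 3. Another key feature: stock registrations in a finance app.
    "stock_registrations": g.apply(lambda x: (x["event_name"] == "register_stock").sum()),
    # 4. DAU: average number of distinct active users per day.
    "avg_dau": g.apply(lambda x: x.groupby(x["ts"].dt.date)["user_id"].nunique().mean()),
})
print(summary)  # one row per A/B group; compare test vs. control
```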

However, an A/B test cannot always accompany a feature. For this case, there are some workarounds. But because the results depend on the time period, they can be affected by things other than the feature itself. In a finance app, for example, market events that occur during the measurement window can affect the results. Likewise, if other features were released during the measurement window, the results can be affected. Still, there are ways to measure in this case, and they are quite similar to the metrics used when there is an A/B test.

  1. Divide the timeline. If the release date is 5/3/23, the period to compare against as the period without the feature is the two weeks before the release, 4/19/23 to 5/3/23. It's important not to vary the day-of-week mix between the two periods, because users' app behavior is often affected by the day of the week; comparing two whole weeks against two whole weeks keeps the mix identical.
  2. Then compare the four metrics above between the two periods.
  3. If that comparison doesn't show a clear effect, there's another way to measure: look only at the activity of users who actually experienced the new feature. Do this by limiting the analysis to users who performed the new feature's event after the release date. Also add one more period, from 4 weeks before the release date to 2 weeks before it, and keep only users who had any active event during that earlier period. This restricts the comparison to users who were already using the app before the pre-release period; otherwise the results will be biased, because the post-release period would include activity from users who joined after the release. With both restrictions in place, the statistics show how existing users' activity changed after they experienced the new feature (see the sketch after this list).
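
Below is a minimal sketch of that period setup and cohort filtering, using the same hypothetical event log and event names as the earlier snippet. The dates follow the 5/3/23 example above.

```python
from datetime import date, timedelta

import pandas as pd

# Release date from the example; both comparison windows are exactly 14 days,
# so the day-of-week mix is identical between them.
release = date(2023, 5, 3)
post_start, post_end = release, release + timedelta(days=14)
pre_start, pre_end = release - timedelta(days=14), release
# Extra earlier window, used only to identify "existing" users.
base_start, base_end = release - timedelta(days=28), release - timedelta(days=14)

events = pd.read_csv("events.csv", parse_dates=["ts"])
d = events["ts"].dt.date

# Existing users: anyone with any active event in the base window.
existing = set(events.loc[(d >= base_start) & (d < base_end), "user_id"])

# Users who actually experienced the new feature after release
# (hypothetical event name, as before).
experienced = set(
    events.loc[(events["event_name"] == "new_button_tap") & (d >= post_start),
               "user_id"]
)

cohort = existing & experienced
in_cohort = events["user_id"].isin(cohort)

pre = events[in_cohort & (d >= pre_start) & (d < pre_end)]
post = events[in_cohort & (d >= post_start) & (d < post_end)]

# Compare any of the four metrics between the two slices, e.g. per-user
# stock registrations before vs. after the release.
for name, slice_ in (("pre", pre), ("post", post)):
    regs = (slice_["event_name"] == "register_stock").sum()
    print(name, "stock registrations per cohort user:", regs / max(len(cohort), 1))
```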

One more note: when writing statistics requirements, it's important to be consistent about what to log. It's also good to collect all users' tap data from the start; even if you log nothing else, tap events alone can imply which screens were exposed.
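
For example, if every tap is logged with a consistent schema that records the screen it happened on, screen exposures can be recovered later without ever logging exposure events explicitly. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass
from datetime import datetime

# A consistent tap-event schema; field names are assumptions for
# illustration, not an existing SDK's API.
@dataclass
class TapEvent:
    user_id: str
    screen: str    # the screen the tap happened on
    element: str   # which button or element was tapped
    ts: datetime

def screen_exposures(taps: list[TapEvent]) -> set[tuple[str, str]]:
    """Infer screen exposure from taps alone: if a user tapped anything
    on a screen, that screen was exposed to that user."""
    return {(t.user_id, t.screen) for t in taps}

taps = [
    TapEvent("u1", "asset_analysis", "new_button", datetime(2023, 5, 4, 9, 0)),
    TapEvent("u1", "home", "portfolio_tab", datetime(2023, 5, 4, 9, 1)),
]
print(screen_exposures(taps))  # two exposed (user, screen) pairs for u1
```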