Bot mitigation programs often look stronger on dashboards than they feel in operations.

Every threshold spends something

False positives are not an abstract statistic. They become:

  • checkout friction
  • login escalations
  • support load
  • conversion loss

If you do not model that cost explicitly, the system will drift toward invisible harm.

Tune against outcomes, not only scores

A production review should ask:

  1. Which enforcement decisions were reversed?
  2. Which customer journeys degraded?
  3. Which features created the most analyst disagreement?

Minimum review loop

That is what separates an interesting research detector from a production control.