Human Mirror Test: Should we use AI?

Do I get to oversee humans if I am right more often?

The Human Mirror Test

Designers and Product Managers often come to me with questions about whether we should use AI, given issues like bias, hallucinations, and inaccuracies. These are serious concerns that deserve careful thought. However, I believe that avoiding AI altogether usually does more harm than good. To help us reflect on this question, I devised a thought experiment called the Human Mirror Test.

The test works as follows. First, identify a specific standard by which AI’s performance or behavior is judged, along with the consequence that follows from failing to meet it. Then replace ‘AI’ with ‘Humans’ in the argument. If the transposed standard is acceptable when applied to humans, we affirm the validity of the standard. If not, we should reconsider the standard or reject it as inappropriate.

Example 1:

AI Evaluation Standard: ‘AI is biased, and so we should not use AI for hiring decisions’.

Human Mirror Test: ‘Humans are biased, and so we should not use Humans for hiring decisions’.

Example 2:

AI Evaluation Standard: ‘Language Models are known to have produced scientifically inaccurate content that sounds plausible, and so LLMs should not be used in critical sectors’.

Human Mirror Test: ‘Humans are known to have produced scientifically inaccurate content that sounds plausible, and so Humans should not be used in critical sectors’.

Most of us would find a blanket decision not to use humans for hiring or in critical sectors absurd or unacceptable. The Human Mirror Test asks that we apply a consistent standard when assessing the reliability of intelligence, whether artificial or human.

Human fallibility

I started thinking about this inconsistency when someone argued that we should not use AI in judicial matters because it might be biased. In essence, their point was, ‘AI can learn our biases, so it’s safer to stick with humans.’ The thought of a biased AI making wrong decisions is worrying. Yet relying solely on human judgment, a system we already know to be flawed, did not feel right either.

When we encounter human bias or errors, we don’t give up; instead, we create systems to manage these issues. Similarly, I believe that we should approach AI with the mindset of improvement, not abandonment.

Data scientists are actively working on algorithmic approaches to improve the reliability of AI systems, and there has been meaningful progress in debiasing algorithmic decisions. Although much remains to be done, it’s reassuring to know that further advances are in the pipeline.

I also believe that designers of technical systems can help put today’s imperfect AI to good use by building systems that blend human and machine intelligence, so that each compensates for the limitations of the other. To me, it’s a win if we improve on the current state of affairs, even with a system that is not flawless.

The Human Mirror Test recognizes that people are flawed as well and that we have devised systems to overcome, or at least mitigate, those flaws. To avoid relying on one person’s biases, we often require committees. In critical areas, we require specialized training and testing before a person is allowed to practice in that domain. Strategies like these can be applied to AI as well.

Ultimately, I hope the test will lead us to think not about AI in general but about the use of specific models in specific cases. It may turn out, for example, that a particular model provides consistently good accounting advice in the USA but fails to do so in Luxembourg. If so, we can use it where it works.

Or we may find a model useful but reliable only 90% of the time. In such cases, we would want to design for human verification and oversight. My argument, then, is that despite their imperfections, language models could be useful in several domains. Discarding them altogether may do more harm than good, especially given that humans are imperfect too. The ideal scenario is to design processes that combine our intelligence with that of machines to create a system that works better overall.

About Vivek Srinivasan

I work with the Program on Liberation Technology at Stanford University. Before this, I worked with the Right to Food Campaign and other rights-based campaigns in India.
