Automation is coming to HR. By automating the collection and analysis of large datasets, AI and other analytics tools offer the promise of improving every phase of the HR pipeline, from recruitment and compensation to promotion, training, and evaluation. These systems, however, can reflect historical biases and discriminate on the basis of race, gender, and class. Managers should consider that 1) models are likely to perform best with regard to individuals in majority demographic groups but worse with less well represented groups; 2) there is no such thing as a truly “race-blind” or “gender-blind” model, and omitting race or gender explicitly from a model can even make things worse; and 3) if demographic categories aren’t evenly distributed in your organization (and in most they aren’t), even carefully built models will not lead to equal outcomes across groups.
People analytics, the application of scientific and statistical methods to behavioral data, traces its origins to Frederick Winslow Taylor’s classic The Principles of Scientific Management in 1911, which sought to apply engineering methods to the management of people. But it wasn’t until a century later — after advances in computer power, statistical methods, and especially artificial intelligence (AI) — that the field truly exploded in power, depth, and widespread application, especially, but not only, in Human Resources (HR) management. By automating the collection and analysis of large datasets, AI and other analytics tools offer the promise of improving every phase of the HR pipeline, from recruitment and compensation to promotion, training, and evaluation.
Now, algorithms are being used to help managers measure productivity and make important decisions in hiring, compensation, promotion, and training opportunities — all of which may be life-changing for employees. Firms are using this technology to identify and close pay gaps across gender, race, or other important demographic categories. HR professionals routinely use AI-based tools to screen resumes to save time, improve accuracy, and uncover hidden patterns in qualifications that are associated with better (or worse) future performance. AI-based models can even be used to suggest which employees might quit in the near future.
And yet, for all the promise of people analytics tools, they may also lead managers seriously astray.
Amazon had to throw away a resume screening tool built by its engineers because it was biased against women. Or consider LinkedIn, which is used all over the world by professionals to network and search for jobs and by HR professionals to recruit. The platform’s auto-complete feature for its search bar was found to be suggesting that female names such as “Stephanie” be replaced with male names like “Stephen.” Finally, on the recruiting side, a social media ad for Science, Technology, Engineering and Math (STEM) field opportunities that had been carefully designed to be gender neutral was shown disproportionately to men by an algorithm designed to maximize value for recruiters’ ad budgets, because women are generally more responsive to advertisements and thus ads shown to them are more expensive.
In each of these examples, a breakdown in the analytical process arose and produced an unintended — and at times severe — bias against a particular group. Yet, these breakdowns can and must be prevented. To realize the potential of AI-based people analytics, companies must understand the root causes of algorithmic bias and how they play out in common people analytics tools.
The Analytical Process
Data isn’t neutral. People analytics tools are generally built off an employer’s historical data on the recruiting, retention, promotion, and compensation of its employees. Such data will always reflect the decisions and attitudes of the past. Therefore, as we attempt to build the workplace of tomorrow, we need to be mindful of how our retrospective data may reflect both old and existing biases and may not fully capture the complexities of people management in an increasingly diverse workforce.
Data can have explicit bias baked directly into it. For example, performance evaluations at your firm may have been historically biased against a particular group. Even if you have corrected that problem over the years, an AI tool trained on the biased evaluations will inherit those biases and propagate them forward.
There are also subtler sources of bias. For example, undergraduate GPA might be used as a proxy for intelligence, or occupational licenses or certificates may be a measure of skills. However, these measures are incomplete and often contain biases and distortions. For instance, job applicants who had to work during college — who are more likely to come from lower-income backgrounds — may have gotten lower grades, but they may in fact make the best job candidates because they have demonstrated the drive to overcome obstacles. Understanding potential mismatches between what you want to measure (e.g., intelligence or ability to learn) and what you actually measure (e.g., performance on scholastic tests) is important in building any people analytics tool, especially when the goal is to build a more diverse workplace.
How a people analytics tool performs is a product of both the data it’s fed and the algorithm it uses. Here, we offer three takeaways that you should bear in mind when managing your people.
First, a model that maximizes the overall quality of the prediction — the most common approach — is likely to perform best with regard to individuals in majority demographic groups but worse with less well represented groups. This is because the algorithms are typically maximizing overall accuracy, and therefore the performance for the majority population has more weight than the performance for the minority population in determining the algorithm’s parameters. An example might be an algorithm used on a workforce comprising mostly people who are either married or single and childless; the algorithm may determine that a sudden increase in the use of personal days indicates a high likelihood of quitting, but this conclusion may not apply to single parents who need to take off from time to time because their child is ill.
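The arithmetic behind this first point can be illustrated with a small sketch. The group sizes and counts of correct predictions below are invented for illustration and do not come from any real workforce; the point is only that a headline accuracy figure is a size-weighted average, so it can look strong while the smaller group is served poorly.

```python
# Hypothetical counts: how many people are in each group and how many
# of the model's predictions for that group were correct.
groups = {
    "majority": {"n": 900, "correct": 855},
    "minority": {"n": 100, "correct": 60},
}

# Accuracy within each group.
per_group = {name: g["correct"] / g["n"] for name, g in groups.items()}

# Overall accuracy weights every person equally, so the larger group
# dominates the result.
total_n = sum(g["n"] for g in groups.values())
total_correct = sum(g["correct"] for g in groups.values())
overall = total_correct / total_n

print(per_group)
print(overall)
```

With these made-up numbers the model is right 95% of the time for the majority group and 60% of the time for the minority group, yet the overall figure is 91.5%, which is why performance should be reported by group as well as in aggregate.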
Second, there is no such thing as a truly “race-blind” or “gender-blind” model. Indeed, omitting race or gender explicitly from a model can even make things worse.
Consider this example: Imagine that your AI-based people analytics tool, to which you have carefully avoided giving information on gender, develops a strong track record of predicting which employees are likely to quit shortly after being hired. You aren’t sure exactly what the algorithm has latched onto — AI frequently functions like a black box to users — but you avoid hiring people that the algorithm tags as high risk and see a nice drop in the numbers of new hires who quit shortly after joining. After some years, however, you are hit with a lawsuit for discriminating against women in your hiring process. It turns out that the algorithm was disproportionately screening out women from a particular zip code that lacks a daycare facility, creating a burden for single mothers. Had you only known, you might have solved the problem by offering daycare near work, not only avoiding the lawsuit but even giving yourself a competitive advantage in recruiting women from this area.
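One rough way to look for this kind of proxy is to ask how well a single input, such as zip code, lets you guess the attribute you left out. The sketch below is a simplified illustration rather than a standard method: the function name `proxy_strength`, the field names, and the data are all hypothetical. It compares a baseline guess (always pick the most common gender) with a guess made separately within each zip code.

```python
from collections import Counter, defaultdict

def proxy_strength(records, feature, protected):
    """Return (baseline, with_feature): the share of records whose protected
    attribute is guessed correctly without and with knowledge of `feature`."""
    overall = Counter(r[protected] for r in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    # Within each feature value, guess that value's most common category.
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature]][r[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / len(records)

# Invented applicant data: zip "A" is mostly women, zip "B" mostly men.
applicants = (
    [{"zip": "A", "gender": "F"}] * 8
    + [{"zip": "A", "gender": "M"}] * 2
    + [{"zip": "B", "gender": "F"}] * 2
    + [{"zip": "B", "gender": "M"}] * 8
)

baseline, with_zip = proxy_strength(applicants, "zip", "gender")
print(baseline, with_zip)
```

In this toy data, knowing the zip code raises the share of correct guesses from 50% to 80%. A jump like that suggests the feature carries gender information into the model even though gender itself was never supplied.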
Third, if demographic categories like gender and race are unevenly distributed in your organization, as is typical (for example, if most managers in the past have been male while most workers have been female), even carefully built models will not lead to equal outcomes across groups. That’s because, in this example, a model that identifies future managers is more likely to misclassify women as unsuitable for management and to misclassify men as suitable for management, even if gender is not part of the model’s criteria. The reason, in short, is that the model’s selection criteria are likely to be correlated with both gender and managerial aptitude, so the model will tend to be “wrong” in different ways for women and men.
How to Get It Right
For the above reasons (and others), we need to be especially aware of the limitations of AI-based models and monitor their application across demographic groups. This is especially important for HR, because, in stark contrast to general AI applications, the data that organizations use to train AI tools will very likely reflect imbalances that HR is currently working to correct. As such, firms should pay close attention to who is represented in the data when creating and monitoring AI applications. More pointedly, they should look at how the makeup of training data may be warping the AI’s recommendations in one direction or another.
One tool that can be helpful in that respect is a bias dashboard that separately analyzes how a people analytics tool performs across different groups (e.g., race), allowing early detection of possible bias. This dashboard highlights, across different groups, both the statistical performance and the impact. As an example, for an application supporting hiring, the dashboard may summarize the accuracy and the type of mistakes the model makes, as well as the fraction from each group that got an interview and was eventually hired.
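The numbers behind such a dashboard can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `bias_dashboard`, the record fields, and the sample data are hypothetical, and it assumes you have a ground-truth label for each person, which in practice is often missing for candidates who were never interviewed.

```python
from collections import defaultdict

def bias_dashboard(records):
    """Summarize model performance and impact separately for each group.

    Each record is a dict with hypothetical keys:
      group     - demographic group label
      predicted - True if the model recommended an interview
      actual    - True if the person was in fact a good hire
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        if r["predicted"]:
            key = "tp" if r["actual"] else "fp"
        else:
            key = "fn" if r["actual"] else "tn"
        counts[r["group"]][key] += 1

    def ratio(num, den):
        # None signals "not enough data" instead of dividing by zero.
        return num / den if den else None

    report = {}
    for group, c in counts.items():
        n = sum(c.values())
        report[group] = {
            "n": n,
            "accuracy": ratio(c["tp"] + c["tn"], n),
            "false_positive_rate": ratio(c["fp"], c["fp"] + c["tn"]),
            "false_negative_rate": ratio(c["fn"], c["fn"] + c["tp"]),
            "selection_rate": ratio(c["tp"] + c["fp"], n),
        }
    return report

def make(group, predicted, actual, n):
    return [{"group": group, "predicted": predicted, "actual": actual}] * n

# Invented data for two groups, "X" (large) and "Y" (small).
records = (
    make("X", True, True, 40) + make("X", True, False, 10)
    + make("X", False, True, 5) + make("X", False, False, 45)
    + make("Y", True, True, 2) + make("Y", True, False, 1)
    + make("Y", False, True, 4) + make("Y", False, False, 3)
)

report = bias_dashboard(records)
for group, row in report.items():
    print(group, row)
```

Accuracy and the two error rates cover the statistical performance and the type of mistakes, while the selection rate stands in for the fraction of each group that received an interview. Small groups will produce noisy rates, so the group size is reported alongside them.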
In addition to monitoring performance metrics, managers can explicitly test for bias. One way to do this is to exclude a particular demographic variable (e.g., gender) in training the AI-based tool but then explicitly include that variable in a subsequent analysis of outcomes. If gender is highly correlated with outcomes (for example, if one gender is disproportionately likely to be recommended for a raise), that is a sign that the AI tool might be implicitly incorporating gender in an undesirable way. It may be that the tool disproportionately identified women as candidates for raises because women tend to be underpaid in your organization. If so, the AI tool is helping you solve an important problem. But it could also be that the AI tool is reinforcing an existing bias. Further investigation will be required to determine the underlying cause.
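A minimal sketch of this after-the-fact check follows, assuming the tool’s output is a set of employee IDs recommended for a raise and that gender is stored in a separate HR table the tool never saw. The function name `recommendation_rates` and all data are hypothetical.

```python
def recommendation_rates(recommended_ids, demographics):
    """Share of each gender that the tool recommended for a raise.

    recommended_ids: set of employee IDs flagged by the tool
    demographics:    dict of employee ID -> gender, kept out of training
    """
    totals, hits = {}, {}
    for emp_id, gender in demographics.items():
        totals[gender] = totals.get(gender, 0) + 1
        if emp_id in recommended_ids:
            hits[gender] = hits.get(gender, 0) + 1
    return {g: hits.get(g, 0) / totals[g] for g in totals}

# Invented example: eight employees, four recommended for a raise.
demographics = {1: "F", 2: "F", 3: "F", 4: "F", 5: "M", 6: "M", 7: "M", 8: "M"}
recommended = {1, 2, 3, 5}

rates = recommendation_rates(recommended, demographics)
gap = rates["F"] - rates["M"]
print(rates, gap)
```

Here 75% of women and 25% of men are recommended. A gap of that size does not by itself show that the tool is biased. It is a prompt to investigate which of the explanations above applies.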
It’s important to remember that no model is complete. For instance, an employee’s personality likely affects their success at your firm without necessarily showing up in your HR data on that employee. HR professionals need to be alert to these possibilities and document them to the extent possible. While algorithms can help interpret past data and identify patterns, people analytics is still a human-centered field, and in many cases, especially the difficult ones, the final decisions are still going to be made by humans, as reflected in the currently popular phrase “human-in-the-loop analytics.”
To be effective, these humans need to be aware of machine learning bias and the limitations of the model, monitor the models’ deployment in real time, and be prepared to take necessary corrective action. A bias-aware process incorporates human judgement into each analytical step, including awareness of how AI tools can amplify biases through feedback loops. A concrete example is when hiring decisions are based on “cultural fit,” and each hiring cycle brings more similar employees to the organization, which in turn makes the cultural fit even narrower, potentially working against diversity goals. In this case, broadening the hiring criteria may be called for in addition to refining the AI tool.
People analytics, especially when based on AI, is an incredibly powerful tool that has become indispensable in modern HR. But quantitative models are intended to assist, not replace, human judgment. To get the most out of AI and other people analytics tools, you will need to consistently monitor how the application is working in real time, what explicit and implicit criteria are being used to make decisions and train the tool, and whether outcomes are affecting different groups differently in unintended ways. By asking the right questions of the data, the model, the decisions, and the software vendors, managers can successfully harness the power of people analytics to build the high-achieving, equitable workplaces of tomorrow.