Gender Biases in Google's AI Compromise Healthcare, LSE Says

There are concerns that AI is being rolled out into critical sectors without its capabilities, tendencies and biases being fully understood
A study by LSE has found that Google's Gemma model downplays women's health needs compared to men's in social care assessments used by English councils

A new study published by the London School of Economics (LSE) has found that AI tools used by more than half of England's local authorities are systematically underplaying women's health concerns, while giving the same issues more weight when they affect men.

LSE's study examined how different large language models (LLMs) summarise identical case notes, with only the gender of the subject changed between versions.
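The article does not describe exactly how the researchers produced the gender-swapped versions, so the following is only a minimal Python sketch of the general idea, often called counterfactual gender swapping. The helper `swap_gender` and its word table are hypothetical and deliberately crude; a real study would need a far more careful mapping, since a word like "her" can correspond to either "him" or "his".

```python
import re

# Hypothetical word-level swap table, for illustration only.
# Ambiguous cases are resolved crudely: "her" always becomes "his".
SWAPS = {
    "mr": "mrs", "mrs": "mr",
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",
    "man": "woman", "woman": "man",
}

def swap_gender(text: str) -> str:
    """Return a copy of text with gendered words swapped, keeping capitalisation."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word  # not a gendered word, leave it unchanged
        return swapped.capitalize() if word[0].isupper() else swapped

    # Visit every alphabetic word and substitute it via the table above.
    return re.sub(r"\b[A-Za-z]+\b", replace, text)

note = "Mr Smith is an 84-year-old man. He lives alone and his mobility is poor."
print(swap_gender(note))
```

Each original note and its swapped twin can then be fed to the same model, so that any difference between the two summaries can only come from the change in gender.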

Dr Sam Rickman, the lead author of the report, found that Google's Gemma model consistently used more serious language when describing men's conditions.

"Google's model, in particular, downplays women's physical and mental health needs in comparison to men's," Sam explains.

Dr Sam Rickman, lead researcher on the LSE's study into this subject

The research analysed 29,616 pairs of AI-generated summaries based on the real case notes of 617 adult social care users.

When processing identical information about an 84-year-old living alone with mobility issues, Gemma described the male version as having "a complex medical history, no care package and poor mobility".

But when the same case notes described a woman, the output was markedly different. "Mrs Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care," the model wrote.


What does this mean for the allocation of care in the UK?

The findings raise significant concerns about fairness in social care provision, where AI tools are increasingly used to support overstretched social workers.

"Because the amount of care you get is determined on the basis of perceived need, this could result in women receiving less care if biased models are used in practice," Sam says.

The research found that terms such as "disabled", "unable" and "complex" appeared significantly more often in descriptions of men, even when a woman was described as having identical needs.
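To make the idea of comparing word frequencies concrete, here is a minimal Python sketch that counts the three terms quoted above across two sets of summaries. The function `term_counts` is hypothetical, the example summaries are illustrative paraphrases of the quotes in this article rather than study data, and the sketch only counts words; the LSE researchers tested whether differences across thousands of pairs were statistically significant, which this does not attempt.

```python
import re
from collections import Counter

# Only the three example terms quoted in this article, not the study's full set.
TERMS = ("disabled", "unable", "complex")

def term_counts(summaries: list[str]) -> Counter:
    """Count how often each term in TERMS appears across a list of summaries."""
    counts = Counter()
    for text in summaries:
        # Lowercase and split into alphabetic words so "unable" is not
        # confused with "able" and punctuation is ignored.
        words = re.findall(r"[a-z]+", text.lower())
        for term in TERMS:
            counts[term] += words.count(term)
    return counts

# Illustrative summaries paraphrased from the examples quoted above.
male = [
    "Mr Smith has a complex medical history, no care package and poor mobility.",
    "He is unable to access the community.",
]
female = [
    "Despite her limitations, she is independent and able to maintain her personal care.",
    "She is able to manage her daily activities.",
]

print("men:", dict(term_counts(male)))
print("women:", dict(term_counts(female)))
```

Run over every gender-swapped pair, a consistently higher count for the male versions is the kind of pattern the study reports for Gemma.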

In another example, the AI described a male subject as "unable to access the community" while the female equivalent was characterised as "able to manage her daily activities".

According to LSE, Google's Gemma AI will provide different suggestions for men and women in need of care, even if they have identical case notes | Credit: NHS

Model variations and regulatory concerns

Not all AI models demonstrated the same level of bias, however.

Meta's Llama 3 model showed no significant gender-based language differences when processing the same case notes, which suggests that the issue is not inherent to all LLMs.

Still, Sam believes that all AI models must be tested regularly to ensure accurate responses and equitable services.

"More are being deployed all the time," he says, "making it essential that all AI systems are transparent, rigorously tested for bias and subject to robust legal oversight."

In its research, LSE calls for regulators to "mandate the measurement of bias in LLMs used in long-term care" to ensure algorithmic fairness and to avoid misunderstandings.

"Responsible AI can produce fantastic outcomes, but embedding old prejudices in our digital future is not a 'productivity gain'," explains Tami Hoffman, Director of Public Policy at the Guardian.

Tami Hoffman, Director of Public Policy at the Guardian

How the industry has responded

Google has since said that its teams would examine the study's findings.

The US-based tech giant also noted that LSE's researchers tested the first-generation Gemma model rather than the current third generation, so it is unclear whether the same bias would appear in Google's latest model.

It is also important to note that Google has never suggested that Gemma should be used for medical purposes.

This raises questions about how AI models should be applied in organisations, especially in sectors as critical as healthcare for vulnerable groups.

The findings of LSE's study speak to broader concerns about the biases AI models may carry. This is not the first time such issues have come to light.

A separate study, conducted in the US, examined 133 AI systems and found that 44% of them displayed a clear gender bias, with 25% exhibiting both gender and racial biases.

Jen Fenner, Co-Founder and Managing Director at DefProc Engineering

It is an issue that has many people worried.

"Gender health gaps already harm outcomes, from reduced access to services to misdiagnoses and the all-too-common experience of not being heard," says Jen Fenner, Co-Founder and Managing Director at DefProc Engineering.

"Without transparency and rigorous bias testing, AI risks reinforcing the inequalities it should be helping to eliminate."
