
Assessing Gender Biases in Machine Translation

Millions of people use machine translation (MT) every day.

However, these machines are only as good as the data that we provide to them.

Researchers recently discovered that the majority of the data used to train MT engines contains more male gender associations than female ones.

While MT engines are fast, easy to use, and always improving, they are not the most accurate form of translation, especially when it comes to gender.

Dreaming of Machines

For hundreds of years, humans have imagined the means of using knowledge and technology to bridge language gaps.

In the 17th century, French philosopher René Descartes proposed the concept of a universal language featuring words in different languages represented with symbols.

By the time the complex global wars and politics of the 20th century were ushered in, many nations were hard at work attempting to develop the technology to automatically generate translations.

After many landmark advances, such as the Georgetown–IBM experiment in 1954, Noam Chomsky’s theory of generative linguistics, and Canada’s METEO System in 1977, machine translation has come a long way from its roots.

Many engines used today have been upgraded from statistical or rule-based systems to neural systems, which are better able to understand the context of a sentence and translate more accurately.

Is Translation Technology Gender-Biased?

Despite these advances, the global community has pointed out a glaring issue in many commonly used machine translation engines, such as Google Translate or SYSTRAN: there is a clear gender bias toward men and male pronouns.

Accurate translation can be a nuanced task depending on the language pairing: some languages rely heavily on gendered pronouns and inflections, while in others they are almost entirely absent, meaning there can be more than one accurate translation for a term or phrase.

French

In French, a strongly gender-inflected language, the gender-ambiguous English phrase “my friend” can be translated as either “mon ami” or “mon amie” depending on the friend in question.

This can present an issue in many Indo-European languages, as machines may assign a gender to a noun based on the cultural associations attached to the word.

Terms like “doctor” that have been historically associated with men are often translated with male pronouns, and “nurse” with female pronouns, even if the text is discussing someone of the opposite gender.

Turkish

The same issue can arise in the reverse scenario.

Some languages like Turkish use gendered terms very minimally. The gender-neutral pronoun “o” can be translated in English as “she,” “he,” or “it” depending on the context.

This has resulted in a variety of biased translations, depending on the term in question, including “o bir aşçı” translated as “she is a cook” and “o bir mühendis” translated as “he is an engineer,” following the gender roles society has prescribed for such professions.
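You can observe this pattern for yourself with an open-source model. The short script below is a minimal sketch that assumes the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-tr-en Turkish-to-English model as a stand-in for a commercial engine; the pronouns it produces depend on the model and version, so your results may differ from the examples above.

```python
# A minimal sketch: translate gender-neutral Turkish sentences into English
# and inspect which pronoun the model chooses. Assumes the Hugging Face
# `transformers` library and the public Helsinki-NLP/opus-mt-tr-en model;
# outputs vary by model and version.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")

sentences = [
    "O bir aşçı.",       # "o" is gender-neutral in Turkish; English forces a choice
    "O bir mühendis.",
    "O bir doktor.",
    "O bir hemşire.",    # "hemşire" means "nurse"
]

for sentence in sentences:
    translation = translator(sentence)[0]["translation_text"]
    print(f"{sentence} -> {translation}")
```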

Pro-Drop Languages

Some languages, like Spanish, Chinese, and Japanese, are pro-drop languages, meaning they often omit subject pronouns entirely. This gender ambiguity can lead to a similar issue of biased translations.

For example, the Spanish sentence “‘Me encanta el conocimiento,’ dice” literally reads “‘I love knowledge,’ says,” with no subject pronoun at all. Based on context, it should be translated as either “‘I love knowledge,’ she says” or “‘I love knowledge,’ he says.”

The crucial factor is being able to analyze and apply the context of the full text, which some machines cannot do for every language.

Training Machines

The question then is, why do MT engines, which are supposedly unbiased, unemotional machines, behave this way?

The answer lies in the way they are trained.

The enormous data sets used to train and configure MT engines are compilations of human-produced data, information, and language, all of which contain our own cultural and societal biases.

One study of pronouns in U.S. English books found that masculine pronouns appear significantly more often than feminine pronouns (the ratio peaked at 4:1 in 1968 and was still 2:1 in 2000), which could lead machines trained on these books to absorb the same sexist biases.
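The ratio itself is straightforward to measure on any text collection. The snippet below is a rough sketch of how masculine and feminine pronouns might be counted in a plain-text corpus; the file name and pronoun lists are illustrative assumptions, not details of the study cited above.

```python
# A rough sketch of measuring the masculine-to-feminine pronoun ratio in a corpus.
# The file path and pronoun lists are illustrative; the study cited above used a
# much larger collection of U.S. English books.
import re
from collections import Counter

MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}

def pronoun_ratio(path: str) -> float:
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(w for w in words if w in MASCULINE | FEMININE)
    masculine = sum(counts[w] for w in MASCULINE)
    feminine = sum(counts[w] for w in FEMININE)
    return masculine / feminine if feminine else float("inf")

print(pronoun_ratio("corpus.txt"))  # a value of 2.0 corresponds to a 2:1 ratio
```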

The result of these gender biases is not only inaccurate translations, but also the perpetuation of harmful gender stereotypes that place men in positions of societal power above women.

Fixing the Imbalance

In 2018, Google announced that it recognized the gender-related shortcomings of its platform and implemented a three-step plan to correct the issue: detecting gender-neutral queries, generating gender-specific translations, and checking each candidate for accuracy before deciding what to show the user.
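Google has not published the production code behind this feature, so the sketch below is only a hypothetical illustration of how such a three-step pipeline might be structured. Every function here, including its toy logic, is an assumption made for illustration rather than an actual Google API.

```python
# A hypothetical sketch of the three-step approach described above.
# None of these functions correspond to real Google APIs; the toy
# implementations only illustrate the flow of the pipeline.

SOURCE_GENDERED_WORDS = {"he", "she", "him", "her", "his", "hers"}  # toy list

def is_gender_neutral(source_text: str) -> bool:
    # Step 1: detect gender-neutral queries (toy heuristic; a real system
    # would analyze the source language properly).
    return not (set(source_text.lower().split()) & SOURCE_GENDERED_WORDS)

def translate(source_text: str, gender: str) -> str:
    # Step 2: generate a gender-specific translation (placeholder output).
    return f"[{gender} translation of: {source_text}]"

def is_accurate(source_text: str, translation: str) -> bool:
    # Step 3: check a candidate translation for accuracy (always passes here).
    return True

def translate_with_gender_options(source_text: str) -> list[str]:
    if not is_gender_neutral(source_text):
        return [translate(source_text, "as-written")]
    candidates = [translate(source_text, "feminine"), translate(source_text, "masculine")]
    # Show both options only when both pass the accuracy check;
    # otherwise fall back to a single translation.
    accurate = [t for t in candidates if is_accurate(source_text, t)]
    return accurate if len(accurate) == 2 else candidates[:1]

print(translate_with_gender_options("o bir doktor"))
```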

Google Translate began showing both female and male options for translations. Recently, the site even started presenting the female option first, before the male translation.

To improve machine translation in both accuracy and equality, steps must be taken to rectify the present gender issues and to monitor translation quality closely to see the results.

The need for machines to unlearn their implicit biases reflects our own march toward dismantling systems of oppression and ensuring inclusivity in the modern world.

At G3 Life Sciences, we understand how important fluent, accurate, and culturally adapted translations are to bringing the global community closer together.

To learn more about how you can work with us and our expert linguists toward the highest quality translations possible, contact us.
