Transcript
Artificial intelligence has been making headlines in various fields, from healthcare to autonomous vehicles. However, today, we're diving into a less obvious yet profoundly impactful domain: language preservation, particularly the revitalization of endangered languages. With AI stepping in, there's potential for a cultural renaissance among communities whose languages were on the brink of disappearing.
First, let's explore the recent advancements in AI translation for rare languages. A study by RWS TrainAI has brought to light significant improvements in how large language models (LLMs) handle these languages. Google's Gemini Pro model scores above 4.5 out of 5 in translating Kinyarwanda, a language spoken by about 12 million people in East Africa. This is a big deal because it demonstrates that AI can effectively translate languages that have, until now, received little attention in the tech world.
This progress is largely due to cross-lingual learning and enhanced tokenizer efficiency. Cross-lingual learning allows AI to leverage similarities between languages to improve translation accuracy, even when it has limited data for a given language. Tokenizer efficiency, on the other hand, is about how text gets broken into units: the fewer and more meaningful the pieces each word is split into, the easier it is for the model to process the language accurately. These technological advances mean that AI can now serve communities speaking languages that are at risk of being lost.
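For readers following along in writing, the idea of tokenizer efficiency can be made concrete with a small sketch. This is only an illustration under stated assumptions: the greedy longest-match splitter, the two tiny vocabularies, and the function names are all hypothetical, and none of this is how Gemini Pro or any production tokenizer is implemented. It measures "fertility," the average number of tokens per word, which is one common way to describe how well a tokenizer covers a language.

```python
def greedy_subword_tokenize(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right.

    Falls back to a single character when no vocabulary piece matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No piece matched: emit one character as its own token.
            pieces.append(word[i])
            i += 1
    return pieces


def fertility(text, vocab):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    total = sum(len(greedy_subword_tokenize(w, vocab)) for w in words)
    return total / len(words)


# Hypothetical vocabularies, invented for this illustration only.
SPARSE_VOCAB = {"mu", "ra"}           # poor coverage of the sample words
RICH_VOCAB = {"mura", "ho", "koze"}   # better coverage of the sample words

SAMPLE_TEXT = "muraho murakoze"

print(fertility(SAMPLE_TEXT, SPARSE_VOCAB))  # 5.0 tokens per word
print(fertility(SAMPLE_TEXT, RICH_VOCAB))    # 2.0 tokens per word
```

With the sparse vocabulary, each word shatters into many small pieces; with the richer one, the same words need far fewer tokens. A lower tokens-per-word figure is the kind of efficiency gain the episode is describing.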
The significance of AI in language preservation is highlighted by initiatives like Google's Woolaroo. In July 2025, this AI experiment was expanded to include 30 endangered languages, adding 10 new African languages, as well as languages from Brazil, Mexico, Turkey, and Scotland. Woolaroo acts as an interactive learning tool. It allows users to point their device at an object and learn the name of the object in their language. This tool is not just about preserving words; it's about connecting people to their culture and heritage in everyday life.
But why does all this matter? Let's consider the broader implications. When a language disappears, it takes with it unique worldviews, histories, and cultural practices. Each language embodies a distinct way of interpreting the world. By preserving these languages through AI, we are not just saving words; we're safeguarding cultural identities and preventing the homogenization of global culture. This is crucial for cultural diversity, which is as important to humanity as biodiversity is to nature.
The University of Hawaiʻi at Mānoa is also making strides in this area. In September 2025, researchers there introduced FORMOSANBENCH, a benchmark for evaluating AI performance on low-resource Austronesian languages. They focused on languages like Atayal, Amis, and Paiwan, uncovering significant gaps in AI capabilities. The university's work underscores the challenges AI still faces. While tools are improving, there's a long way to go before AI can fully support all endangered languages.
Dr. Jacqueline Brixey, a computer scientist hailing from the Choctaw Nation of Oklahoma, is another key figure in this field. Since January 2025, she has been developing AI tools to support Indigenous language preservation. Her work includes creating "ChoCo," a Choctaw language corpus, and "Masheli," a conversational AI. These tools aim to make language learning accessible and engaging for younger generations who might not have had the opportunity to learn their ancestral language from family.
In Brazil, IBM and the University of São Paulo have teamed up to develop AI-powered writing tools aimed at promoting endangered languages like Nheengatu. This collaboration, initiated in May 2024, involves direct engagement with Indigenous communities to ensure the tools meet their needs. The ability to write in one's native language can be empowering, fostering a sense of pride and ownership over one's cultural heritage.
Another fascinating project comes from Jared Coleman, a Ph.D. graduate from the University of Southern California. In June 2024, he developed AI tools to aid in revitalizing the critically endangered Owens Valley Paiute language. By creating digital resources, Coleman hopes to provide speakers with the tools they need to document and teach their language. This initiative highlights the role of individual efforts in the broader movement to preserve endangered languages.
There's a broader perspective to consider here. Recent research suggests that, as AI continues to develop, it could become the "King of Babel," mastering even rare and obscure languages. This would mark a transformative moment in the history of language preservation. The potential for AI to democratize language access means that even the most isolated communities could preserve their linguistic heritage.
But this journey is not without its challenges. The digital preservation of languages raises questions about data privacy and the ethical use of technology. For AI to learn a language effectively, it requires data, often sourced from native speakers. Ensuring this data is used ethically and that communities retain control over their linguistic assets is crucial. Additionally, there's the risk of commercialization, where tech companies might prioritize languages based on profitability rather than cultural significance.
Moreover, AI's role in language preservation isn't just about the tools themselves but also about fostering a community around these languages. Technology can facilitate learning and documentation, but the human element, meaning community involvement and cultural engagement, remains essential. AI can provide the means, but it is the people who must carry the torch of their linguistic heritage forward.
So, where do we go from here? The path forward involves collaboration between tech companies, academic institutions, and cultural communities. By working together, these groups can ensure that AI tools are developed in a way that respects and promotes cultural diversity. It also involves continued investment in research to bridge the gaps that still exist in AI's capabilities for low-resource languages.
In conclusion, AI-assisted language preservation is a burgeoning field with enormous potential. The advancements we've seen, from Google's Gemini Pro to the University of Hawaiʻi's FORMOSANBENCH, demonstrate that AI can play a pivotal role in revitalizing endangered languages. These technologies offer hope for communities seeking to preserve their linguistic heritage and, with it, their cultural identities. As we look to the future, the challenge will be to harness AI's power responsibly and inclusively, ensuring it serves as a tool for cultural empowerment rather than erasure.
Ultimately, AI's role in language preservation is about more than technology. It's about bridging generations, connecting cultural dots, and ensuring that the diversity of human expression continues to thrive in a digital age. As these initiatives progress, they promise not only to save languages but also to enrich the cultural tapestry of our world, one language at a time.