Breaking language barriers in science through semantic multilingual search – COAR
peter.suber's bookmarks 2025-09-08
Summary:
"Every day, researchers around the world publish knowledge in hundreds of languages — Spanish in Argentina, Portuguese in Brazil, Arabic in Egypt, Japanese in Japan, Swahili in Kenya. This linguistic diversity is not a side note; it is the lifeblood of global scholarship. And yet, when we go looking for that knowledge, the tools at our disposal behave as if only a handful of languages truly matter. A vast amount of valuable research remains hidden simply because it was written in another tongue. This is because most discovery systems still rely on keyword search — matching the exact words in your query with the exact words in an index. That works fine in a monolingual setting, but it breaks down in a multilingual world.
But what if search worked differently? What if you could type a query in your own language (cambio climático, énergies renouvelables, 再生可能エネルギー) and find relevant results in English, French, Spanish, Japanese, or beyond, without ever translating a word?
That’s the promise of semantic multilingual search: searching not by exact words, but by meaning. In June 2025, COAR embarked on a project to investigate the potential of semantic multilingual search for scholarly literature and to develop a conceptual model for applying this technology in repositories and their full-text aggregations. This work involved interviews with experts in the field, a review of current technical options, and a short survey about current practices in the scholarly ecosystem. These efforts build on the foundational work undertaken over the last few years by the COAR Task Force on Supporting Multilingualism and non-English Content, and on early proofs of concept recently undertaken in Latin America by LA Referencia and IBICT (Instituto Brasileiro de Informação em Ciência e Tecnologia). The blog post presents an overview of our initial deliberations and conclusions and the next phase of our work."
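
As a rough illustration of the contrast the summary draws between keyword search and search by meaning, here is a minimal Python sketch. It is not COAR's conceptual model or any repository's implementation. The document titles, the two-dimensional "meaning" vectors, and the function names are all hypothetical and hand-made for this example; a real system would presumably obtain its vectors from a multilingual embedding model.

```python
import math

# Hypothetical document titles in three languages (illustration only).
DOCS = {
    "en": "renewable energy policy",
    "es": "política de energías renovables",
    "fr": "histoire de la musique baroque",
}

# Hand-assigned toy "meaning" vectors: axis 0 ~ energy topic, axis 1 ~ music topic.
# These numbers are made up; a real system would compute them with an embedding model.
TOY_VECTORS = {
    "en": (0.90, 0.10),
    "es": (0.95, 0.05),
    "fr": (0.05, 0.95),
}


def keyword_search(query, docs):
    """Return ids of documents that share at least one exact token with the query."""
    query_tokens = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in docs.items()
        if query_tokens & set(text.lower().split())
    ]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(query_vec, vectors, top_k=2):
    """Rank document ids by cosine similarity to the query vector."""
    ranked = sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)
    return ranked[:top_k]


# A French query about renewable energy, with a made-up vector close to the energy axis.
QUERY_TEXT = "énergies renouvelables"
QUERY_VEC = (0.92, 0.08)

keyword_hits = keyword_search(QUERY_TEXT, DOCS)
semantic_hits = semantic_search(QUERY_VEC, TOY_VECTORS)
```

With these toy inputs, the exact-token match finds nothing, because "énergies" and "energías" are different strings, while the vector comparison ranks the English and Spanish energy titles above the French music title. The point is only the mechanism: similarity is computed on a shared representation of meaning rather than on surface words.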