Home » Latam-GPT: The Free, Open Source, and Collaborative AI of Latin America

Latam-GPT: The Free, Open Source, and Collaborative AI of Latin America

by Carl Nash
0 comments


Latam-GPT is new large language model being developed in and for Latin America. The project, led by the nonprofit Chilean National Center for Artificial Intelligence (CENIA), aims to help the region achieve technological independence by developing an open source AI model trained on Latin American languages and contexts.

“This work cannot be undertaken by just one group or one country in Latin America: It is a challenge that requires everyone’s participation,” says Álvaro Soto, director of CENIA, in an interview with WIRED en Español. “Latam-GPT is a project that seeks to create an open, free, and, above all, collaborative AI model. We’ve been working for two years with a very bottom-up process, bringing together citizens from different countries who want to collaborate. Recently, it has also seen some more top-down initiatives, with governments taking an interest and beginning to participate in the project.”

The project stands out for its collaborative spirit. “We’re not looking to compete with OpenAI, DeepSeek, or Google. We want a model specific to Latin America and the Caribbean, aware of the cultural requirements and challenges that this entails, such as understanding different dialects, the region’s history, and unique cultural aspects,” explains Soto.

Thanks to 33 strategic partnerships with institutions in Latin America and the Caribbean, the project has gathered a corpus of data exceeding eight terabytes of text, the equivalent of millions of books. This information base has enabled the development of a language model with 50 billion parameters, a scale that makes it comparable to GPT-3.5 and gives it a medium to high capacity to perform complex tasks such as reasoning, translation, and associations.

Latam-GPT is being trained on a regional database that compiles information from 20 Latin American countries and Spain, with an impressive total of 2,645,500 documents. The distribution of data shows a significant concentration in the largest countries in the region, with Brazil the leader with 685,000 documents, followed by Mexico with 385,000, Spain with 325,000, Colombia with 220,000, and Argentina with 210,000 documents. The numbers reflect the size of these markets, their digital development, and the availability of structured content.

“Initially, we’ll launch a language model. We expect its performance in general tasks to be close to that of large commercial models, but with superior performance in topics specific to Latin America. The idea is that, if we ask it about topics relevant to our region, its knowledge will be much deeper,” Soto explains.

The first model is the starting point for developing a family of more advanced technologies in the future, including ones with image and video, and for scaling up to larger models. “As this is an open project, we want other institutions to be able to use it. A group in Colombia could adapt it for the school education system or one in Brazil could adapt it for the health sector. The idea is to open the door for different organizations to generate specific models for particular areas like agriculture, culture, and others,” explains the CENIA director.



Source link

You may also like

Leave a Comment

About Us

Advertisement

Latest Articles

© 2024 Technewsupdate. All rights reserved.