Remember the popular Chandamama magazine, which used to publish much-loved mythological stories and local folklore? While the magazine is no longer printed, these stories are now getting a Generative AI push. As many as 10,000 volunteers, mostly from different engineering colleges, have helped a small team of techies build a repository of stories in Telugu for a Small Language Model (SLM).
While Generative AI models such as ChatGPT and Bard are built on Large Language Models (LLMs) and can churn out lengthy output, SLMs generate focused and smaller amounts of content. Students from 30 engineering colleges participated in a four-hour hackathon that saw 40,000 pages of stories from Chandamama uploaded. This dataset has just been released to the public.
A model for tiny stories
As it gets the feed (digital text), the SLM, AI Chandamama Kathalu, has started learning. “For now it’s churning out meaningful content, small though. We’re planning to release an LLM model in March, which could have the potential to generate lengthy outputs,” Kiran Chandra, Founder of Swecha and a Free Software Movement of India activist, told businessline.
Kiran teamed up with Chaitanya (Chief Product Officer and Co-Founder, Ozonetel) and Gaurav Raina (Professor at IIT Madras) to work on the SLM, with the aim of building a language model for native Indian languages and an AI solution for short stories.
“To build a story-oriented AI language model, we don’t need a large language model, which is very resource intensive; a small language model (SLM) should be enough. Our aim is to bring back the moral and ethical values embedded in ‘Chandamama Kathalu’ using a new and creative AI approach,” he said.
“The stories are available in PDF form. It would have taken several months to digitise them. But we enrolled volunteers through the Swecha community to digitise the content. We could easily finish it off in four hours,” he said.
Now it is all available on the internet for anyone to download and/or improve upon. The old popular stories can thus get a new twist. Says Chandra, “This whole effort reminds me of the effort we put in 20 years ago in creating the first Telugu Operating System, creating the font and the glossary. This seems a logical step in our journey towards democratising technology and we will continue to take up more work in this area.”
After getting the data ready, the team fed it into the AI model for training so that it could develop its own content. Chaitanya, whose company is also involved in deploying Generative AI for building customer-relationship management solutions, chipped in to make this happen.
Gaurav Raina said that the learnings would help the team work on developing Generative AI solutions for other Indian languages.