On documenting low-resourced Indian languages: insights from the Kanauji speech corpus

Dwivedi, P.; Kar, S.

dc.contributor.author	Dwivedi, P.
dc.contributor.author	Kar, S.
dc.date.accessioned	2021-10-09T11:56:33Z
dc.date.available	2021-10-09T11:56:33Z
dc.date.issued	2021-10-09
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/2967
dc.description.abstract	Well-designed and well-developed corpora can help considerably in bridging the gap between theory and practice in language documentation and revitalization, in building language technology applications, in testing linguistic hypotheses, and in numerous other important areas. Developing a corpus for an under-resourced or endangered language raises several problems. The present study starts with an overview of the role that corpora (speech corpora in particular) can play in language documentation and revitalization. It then gives a brief account of the situation of endangered languages and of corpus development efforts in India. Thereafter, it discusses the issues involved in constructing a speech corpus for low-resourced languages. Insights are drawn from the speech database of Kanauji of Kanpur, an endangered variety of Western Hindi spoken in Uttar Pradesh. The Kanauji speech database is being developed at the Indian Institute of Technology Ropar, Punjab. © Universitat de Barcelona	en_US
dc.language.iso	en_US	en_US
dc.subject	Endangered language	en_US
dc.subject	Kanauji	en_US
dc.subject	Language documentation	en_US
dc.subject	Speech corpus	en_US
dc.subject	Western Hindi	en_US
dc.title	On documenting low-resourced Indian languages: insights from the Kanauji speech corpus	en_US
dc.type	Article	en_US