INSTITUTIONAL DIGITAL REPOSITORY

On documenting low-resourced Indian languages: insights from Kanauji speech corpus

dc.contributor.author Dwivedi, P.
dc.contributor.author Kar, S.
dc.date.accessioned 2021-10-09T11:56:33Z
dc.date.available 2021-10-09T11:56:33Z
dc.date.issued 2021-10-09
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/2967
dc.description.abstract Well-designed and well-developed corpora can be considerably helpful in bridging the gap between theory and practice in the language documentation and revitalization process, in building language technology applications, in testing language hypotheses, and in numerous other important areas. Developing a corpus for an under-resourced or endangered language involves several problems and issues. The present study starts with an overview of the role that corpora (speech corpora in particular) can play in the language documentation and revitalization process. It then provides a brief account of the situation of endangered languages and corpus development efforts in India. Thereafter, it discusses the various issues involved in the construction of a speech corpus for low-resourced languages. Insights are drawn from the speech database of Kanauji of Kanpur, an endangered variety of Western Hindi spoken in Uttar Pradesh. The Kanauji speech database is being developed at the Indian Institute of Technology Ropar, Punjab. © Universitat de Barcelona en_US
dc.language.iso en_US en_US
dc.subject Endangered language en_US
dc.subject Kanauji en_US
dc.subject Language documentation en_US
dc.subject Speech corpus en_US
dc.subject Western Hindi en_US
dc.title On documenting low-resourced Indian languages: insights from Kanauji speech corpus en_US
dc.type Article en_US

