There are a large number of languages around the world, and knowing all of them is very difficult for any person. At the same time, unfamiliarity with a language hinders communication. Language identification is the process of determining the language(s) in which a text is written, based on its writing style and on the diacritics unique to each language. When many languages are spoken in a given setting, identifying the language is the first step in communication. Several techniques, including machine learning and deep learning, have been used for language detection and have been applied to languages such as German. In India, numerous languages are spoken, and we therefore propose a model that distinguishes between two languages: Kannada and Sanskrit (written in the Devanagari script). In this study, Support Vector Machine (SVM) classifiers were used for classification, and an accuracy of 99% was achieved.
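The abstract does not state which features, kernel, or library the study used, so the following is only a minimal sketch under stated assumptions. It assumes character unigram and bigram counts as features and a linear SVM trained with a Pegasos-style stochastic sub-gradient update on the L2-regularised hinge loss, written in plain Python. Because Kannada (U+0C80 to U+0CFF) and Devanagari (U+0900 to U+097F) occupy separate Unicode blocks, the consonants and vowel signs (diacritics) of each script become distinct features. The function names, the label encoding, and the example sentences are hypothetical illustrations, not the authors' code or dataset.

```python
import math
import random
from collections import Counter

# Hypothetical label encoding for a binary SVM: +1 = Kannada, -1 = Sanskrit.
KANNADA, SANSKRIT = 1, -1


def char_ngram_features(text, n_values=(1, 2)):
    """Character unigram and bigram counts per whitespace token, L2-normalised."""
    counts = Counter()
    for token in text.split():
        for n in n_values:
            for i in range(len(token) - n + 1):
                counts[token[i:i + n]] += 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def train_linear_svm(texts, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss (assumed setup).

    No bias term and no projection step, to keep the sketch short.
    """
    rng = random.Random(seed)
    data = [(char_ngram_features(t), y) for t, y in zip(texts, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos learning-rate schedule
            violates_margin = y * dot(w, x) < 1.0
            # Regularisation: shrink every weight by (1 - eta * lam).
            shrink = 1.0 - eta * lam
            for k in w:
                w[k] *= shrink
            # Hinge-loss sub-gradient: only samples inside the margin move w.
            if violates_margin:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Sign of the decision function; a score of exactly 0 maps to Sanskrit."""
    return KANNADA if dot(w, char_ngram_features(text)) > 0 else SANSKRIT


# Illustrative toy sentences (not the study's data).
train_texts = [
    "ಕನ್ನಡ ನನ್ನ ಭಾಷೆ",
    "ನಮಸ್ಕಾರ ಹೇಗಿದ್ದೀರಿ",
    "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ",
    "ನಾನು ಪುಸ್ತಕ ಓದುತ್ತೇನೆ",
    "संस्कृतम् देववाणी",
    "नमस्ते मित्र",
    "विद्या ददाति विनयम्",
    "धर्मो रक्षति रक्षितः",
]
train_labels = [KANNADA] * 4 + [SANSKRIT] * 4

w = train_linear_svm(train_texts, train_labels)
print(predict(w, "ಕನ್ನಡ ಪುಸ್ತಕ"))  # 1 (Kannada)
print(predict(w, "रामः पठति"))     # -1 (Sanskrit)
```

A linear kernel is assumed here because sparse character n-gram vectors are typically close to linearly separable; a library implementation such as scikit-learn's `LinearSVC` could replace the hand-written trainer. Text containing only characters never seen in training scores 0 and falls back to the Sanskrit label, which a real system would need to handle explicitly.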
Cite this article
Shashank Simha B K, Rahul M, Jyoti R Munavalli, and Prajwal Anand, “Dual-Language Detection using Machine Learning,” Advances in Intelligent Systems and Technologies, pp. 177–180, December 2022. doi: 10.53759/aist/978-9914-9946-1-2_32