Tamil Document Classification and Topic Identification

Authors

  • Professor Dr. S. Saraswathi, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
  • R. Santhiya, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
  • B.L. Sanjeev Prasad, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
  • G. Gnanaprakasam, Department of Information Technology, Pondicherry Engineering College, Puducherry, India

Keywords:

text classification, naïve Bayes, SVM, LDA, machine learning

Abstract

The number of documents in electronic form is already huge and grows day by day with the rapid development of the internet,
so organizing these documents by topic is extremely important.
This is commonly achieved with classification techniques. Document classification is an important tool for applications such as
web search engines. This work deals with the classification of Tamil documents. Classification is a supervised learning process that
organizes documents or text files into distinct groups. Stop words are removed from the input text document to reduce the size of
the document to be processed. In this project we used the naïve Bayes and support vector machine algorithms to classify the
documents, and latent Dirichlet allocation for topic modelling.
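
As an illustration of the pipeline the abstract describes, the following is a minimal sketch of stop-word removal followed by multinomial naïve Bayes classification with add-one smoothing. It is not the authors' implementation: the stop-word list, the toy training sentences, the class labels "sports" and "cinema", and the function names tokenize, train and classify are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, very small stop-word list; a real system would use a fuller Tamil list.
STOP_WORDS = {"ஒரு", "மற்றும்", "இந்த", "அந்த"}


def tokenize(text):
    """Split on whitespace and drop stop words to shrink the document."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def train(labelled_docs):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [tok for tok in tokenize(text) if tok in vocab]  # ignore unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch (not taken from the paper).
TRAIN = [
    ("இந்த கிரிக்கெட் அணி போட்டி வெற்றி", "sports"),
    ("ஒரு கிரிக்கெட் போட்டி", "sports"),
    ("அந்த திரைப்படம் நடிகர் பாடல்", "cinema"),
    ("ஒரு திரைப்படம் மற்றும் பாடல்", "cinema"),
]
model = train(TRAIN)
```

Calling classify("இந்த அணி வெற்றி", model) scores the input against both classes after the stop word "இந்த" has been dropped. A support vector machine or a latent Dirichlet allocation topic model would start from the same tokenized, stop-word-filtered representation.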

Published

2019-04-25

How to Cite

Professor Dr. S. Saraswathi, R. Santhiya, B.L. Sanjeev Prasad, & G. Gnanaprakasam. (2019). Tamil Document Classification and Topic Identification. International Journal of Advance Research in Engineering, Science & Technology, 6(4), 19–25. Retrieved from https://ijarest.org/index.php/ijarest/article/view/1921