Tamil Document Classification and Topic Identification

Professor Dr. S. Saraswathi; R. Santhiya; B.L. Sanjeev Prasad; G. Gnanaprakasam

Authors

Professor Dr. S. Saraswathi, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
R. Santhiya, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
B.L. Sanjeev Prasad, Department of Information Technology, Pondicherry Engineering College, Puducherry, India
G. Gnanaprakasam, Department of Information Technology, Pondicherry Engineering College, Puducherry, India

Keywords:

text classification, naïve Bayes, SVM, LDA, machine learning

Abstract

The number of documents in electronic form is already huge and grows daily with the rapid development of the internet.
Organizing these documents by topic is therefore extremely important.
This is commonly achieved with classification techniques, and document classification is an important tool for applications such as
web search engines. This proposal deals with the classification of Tamil documents. Classification is a supervised learning process that
organizes documents or text files into distinct groups. Stop words are removed from the input text document to reduce the size of
the document to be processed. In this project we use the naïve Bayes and support vector machine algorithms to classify the
documents, and latent Dirichlet allocation for topic modelling.
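
The pipeline summarized above (stop-word removal followed by a supervised classifier) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the tiny stop-word list, the whitespace tokenizer, the four toy training sentences, the "sports" and "politics" labels, and the names tokenize and NaiveBayes are all hypothetical. Only the naïve Bayes step is shown, as a multinomial model with add-one smoothing; the support vector machine and latent Dirichlet allocation steps are not covered here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list for illustration only; a real system
# would use a curated Tamil stop-word list.
STOP_WORDS = {"ஒரு", "இந்த", "மற்றும்", "என்று", "அது"}


def tokenize(text):
    """Split on whitespace and drop stop words (assumed preprocessing)."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not taken from the paper).
train_docs = [
    "இந்த அணி கிரிக்கெட் போட்டி வெற்றி",
    "வீரர் ஒரு போட்டி விளையாட்டு வெற்றி",
    "அரசு தேர்தல் மற்றும் கட்சி அறிவிப்பு",
    "முதல்வர் தேர்தல் கட்சி கூட்டம்",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("கிரிக்கெட் போட்டி வெற்றி"))
print(model.predict("தேர்தல் கட்சி"))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and add-one smoothing keeps a token that is absent from one class from driving that class's probability to zero.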
