Documents Subject Identification and Clustering based on Subject

Authors

  • Shruthi S N, PG Student, Department of Computer Science and Engineering, U B D T College of Engineering
  • Kavitha G, Assistant Professor, Department of Computer Science and Engineering, U B D T College of Engineering

Keywords:

Data mining; Text mining; Subject identification; Clustering; PDF parsing

Abstract

With the dramatic growth of textual information on the Internet and in databases, there is an increasing need
for systems that can automatically discover useful knowledge from text. Text mining is the process of applying
automatic methods to analyze and structure textual data in order to create usable knowledge from previously
unstructured information. Standard text mining techniques for text documents usually rely on word matching. This paper
describes how to recognize the subject of each document in a directory and move the document into the related subject
directory. mPDF and PDF parser, two powerful PHP libraries, are used in this work to recognize the subject. Document
clustering is a technique used to group similar documents. This work proposes a tool for maintaining large sets of
PDF documents, which has many applications.
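The subject-recognition and grouping steps described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (which uses the PHP libraries mPDF and PDF parser). It assumes the text of each PDF has already been extracted. The subject names and keyword sets in `SUBJECT_KEYWORDS` are hypothetical, and scoring a subject by how many of its keywords occur in the document is an assumed form of the word matching the abstract mentions. The helper names `identify_subject`, `cluster_by_subject` and `move_into_subject_dirs` are illustrative only.

```python
import re
import shutil
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical subject vocabularies; the paper does not list its keywords.
SUBJECT_KEYWORDS = {
    "data_mining": {"mining", "clustering", "classification", "association"},
    "networking": {"router", "protocol", "packet", "tcp"},
    "databases": {"sql", "query", "transaction", "index"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def identify_subject(text, vocab=SUBJECT_KEYWORDS, default="unclassified"):
    """Return the subject whose keywords occur most often in the text.

    Word matching only: each subject scores the total count of its keywords.
    Ties go to the subject listed first; a zero score yields `default`.
    """
    counts = Counter(tokenize(text))
    scores = {subject: sum(counts[w] for w in words) for subject, words in vocab.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def cluster_by_subject(documents):
    """Group {filename: extracted_text} into {subject: [filenames]}."""
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[identify_subject(text)].append(name)
    return dict(groups)


def move_into_subject_dirs(groups, src_dir, dest_root):
    """Move each file from src_dir into dest_root/<subject>/, creating folders as needed."""
    for subject, names in groups.items():
        target = Path(dest_root) / subject
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.move(str(Path(src_dir) / name), str(target / name))


if __name__ == "__main__":
    docs = {
        "a.pdf": "Clustering and classification are core data mining tasks.",
        "b.pdf": "Each TCP packet is forwarded by a router.",
        "c.pdf": "A short note with no matching terms.",
    }
    print(cluster_by_subject(docs))
    # {'data_mining': ['a.pdf'], 'networking': ['b.pdf'], 'unclassified': ['c.pdf']}
```

Keeping the grouping step (`cluster_by_subject`) separate from the filesystem step (`move_into_subject_dirs`) lets the subject assignment be checked before any files are moved. Documents with no matching keywords are collected under a fallback `unclassified` group instead of being forced into a subject.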

Published

2016-06-25

How to Cite

Shruthi S N, & Kavitha G. (2016). Documents Subject Identification and Clustering based on Subject. International Journal of Advance Research in Engineering, Science & Technology, 3(6), 296–299. Retrieved from https://ijarest.org/index.php/ijarest/article/view/824