Clustering Based Approach for Novelty Detection in Text Documents

Authors

  • Sushil Kumar, Assistant Professor, Department of Computer Engineering, J.C. Bose University of Science and Technology, YMCA Faridabad, Haryana, India
  • Komal Kumar Bhatia, Professor, Department of Computer Engineering, J.C. Bose University of Science and Technology, YMCA Faridabad, Haryana, India

DOI:

https://doi.org/10.51983/ajcst-2019.8.2.2130

Keywords:

Novelty Detection, Information Retrieval, Clustering, Cluster Head, Jupyter Notebook, Python

Abstract

Because of information overload on the internet, retrieving information for a given query returns redundant and irrelevant information. It is therefore necessary to retrieve information that is both relevant and novel with respect to the user's query, so that the user needs minimal effort to satisfy an information need. In this work we propose a clustering-based approach to novelty detection that provides relevant and novel documents for the information need. Based on the user query, the incoming stream of documents is clustered using the k-means algorithm. Cluster heads are then selected from the clusters on the basis of minimum distance. These cluster heads, drawn from different clusters that lie at large distances from one another, are the novel documents in the collection. The proposed technique can also be applied more broadly in the field of information retrieval.
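The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, not the authors' implementation, built on several assumptions the abstract does not state: documents are represented as L2-normalised TF-IDF vectors (with the IDF variant log(n/df) + 1), distances are Euclidean, k-means is seeded by farthest-first traversal rather than at random, documents are processed as a batch rather than as a stream, and the cluster head of each cluster is taken to be the document closest to its centroid. All function names (`tfidf_vectors`, `kmeans`, `cluster_heads`, `novel_documents`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (list of floats) per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # assumed IDF variant
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard empty documents
        vectors.append([x / norm for x in vec])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_init(vectors, k):
    """Assumed deterministic seeding: start at document 0, then repeatedly add
    the document farthest from its nearest existing seed."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member documents.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_heads(vectors, labels, centroids):
    """Index of the document closest to its centroid, one per non-empty cluster."""
    best = {}
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        d = euclidean(v, centroids[lab])
        if lab not in best or d < best[lab][1]:
            best[lab] = (i, d)
    return sorted(i for i, _ in best.values())


def novel_documents(docs, k):
    """Cluster the documents and return (labels, cluster-head indices)."""
    vectors = tfidf_vectors(docs)
    labels, centroids = kmeans(vectors, k)
    return labels, cluster_heads(vectors, labels, centroids)


# Toy corpus: three topics, two near-duplicate documents per topic.
docs = [
    "stock market shares fall",
    "stock market shares rise",
    "football match goal scored",
    "football match goal missed",
    "rain storm weather forecast",
    "rain storm weather warning",
]
labels, heads = novel_documents(docs, k=3)
print(labels, heads)
```

In the toy corpus each topic pair ends up in its own cluster, so only one member of each pair is reported as novel and the other is treated as redundant. Farthest-first seeding is used here only to make the run deterministic; handling a genuine document stream would also require updating clusters incrementally, which this sketch omits.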

Published

20-05-2019

How to Cite

Kumar, S., & Bhatia, K. K. (2019). Clustering Based Approach for Novelty Detection in Text Documents. Asian Journal of Computer Science and Technology, 8(2), 116–121. https://doi.org/10.51983/ajcst-2019.8.2.2130