Document Classification Using Artificial Neural Network

Authors

  • Kshitij Tripathi Department of Computer Applications, The Maharaja Sayajirao University of Baroda, Vadodara, Gujarat, India
  • Rajendra G. Vyas Department of Mathematics, The Maharaja Sayajirao University of Baroda, Vadodara, Gujarat, India
  • Anil K. Gupta Department of Computer Applications, Barkatullah University, Bhopal, Madhya Pradesh, India

DOI:

https://doi.org/10.51983/ajcst-2019.8.2.2140

Keywords:

N-Fold Cross Validation, Validation, Classification, Neural Network, Bag of Words

Abstract

Document classification is a field of data mining in which data are represented using the bag-of-words (BoW) or document vector model, and the task is to build a classifier that, after learning the characteristics of a given data set, predicts the category of the document to which a word vector belongs. In this approach, a document is represented as a BoW in which every word occurring in the document is used as a feature. This article presents an artificial neural network approach for data classification that is a hybrid of n-fold cross validation and the training-validation-test approach.
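The abstract names two ingredients: the BoW document representation and a hybrid of n-fold cross validation with a training-validation-test split. It does not spell out how they are combined, so the Python sketch below is one possible reading, not the authors' implementation. `build_vocabulary`, `bag_of_words` and `hybrid_splits` are hypothetical helpers. Tokenization is a naive lower-case whitespace split with no punctuation handling, and the 20% validation fraction is an arbitrary assumption. The neural network itself is omitted. Each (train, validation, test) triple would drive one training run. The validation subset would serve, for example, for model selection, and the held-out fold would provide the final score.

```python
import random
from collections import Counter


def build_vocabulary(documents):
    """Collect every distinct lower-cased word in the corpus, sorted so feature order is stable."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def bag_of_words(document, vocabulary):
    """Map a document to a count vector with one feature per vocabulary word.

    Assumption: naive whitespace tokenization; Counter returns 0 for absent words.
    """
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


def hybrid_splits(n_samples, n_folds, validation_fraction=0.2, seed=0):
    """Yield (train, validation, test) index lists (hypothetical helper).

    Each of the n_folds folds is held out once as the test set (n-fold cross
    validation). The remaining indices are split again into a training set and
    a validation set (training-validation-test). The exact scheme used in the
    article may differ.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_val = max(1, int(len(rest) * validation_fraction))
        yield rest[n_val:], rest[:n_val], test


if __name__ == "__main__":
    docs = ["the cat sat", "The dog sat down"]
    vocab = build_vocabulary(docs)
    vectors = [bag_of_words(d, vocab) for d in docs]
    print(vocab, vectors)
    for train, val, test in hybrid_splits(10, n_folds=5):
        print(len(train), len(val), len(test))
```

Averaging the scores obtained on the n test folds gives the cross-validated estimate. Meanwhile, the validation subset carved out inside each fold keeps model selection away from the data used for that fold's final evaluation.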

Published

28-04-2019

How to Cite

Tripathi, K., Vyas, R. G., & Gupta, A. K. (2019). Document Classification Using Artificial Neural Network. Asian Journal of Computer Science and Technology, 8(2), 55–58. https://doi.org/10.51983/ajcst-2019.8.2.2140