Study of text categorization using feature weight learning
Abstract
Text categorization (TC) is an essential step in many information organisation and management tasks. Two major issues in TC are feature coding and classifier construction. The K-nearest neighbour (KNN) classification method usually takes the Euclidean distance as its similarity metric, which treats every feature of a sample vector as equally important. In practice, features differ in how well they characterise samples, and feature weight learning can be used to estimate the contribution of each feature. This article discusses a KNN text classification method based on feature weight learning. Numerical experiments indicate that the method is accurate.
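To make the role of feature weights concrete, the following minimal sketch shows a K-nearest neighbour classifier whose Euclidean distance is scaled per feature. It is an illustration under stated assumptions, not the method evaluated in the article: the function names, the toy two-dimensional vectors, the class labels, and both weight vectors are hypothetical, and the weights are fixed by hand rather than learned.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    """Euclidean distance in which each squared difference is scaled by a feature weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def knn_predict(train, query, weights, k=3):
    """Return the majority label among the k training samples closest to the query."""
    neighbours = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each document is a two-feature vector with a class label.
train = [
    ([1.0, 0.0], "sports"),
    ([0.9, 0.1], "sports"),
    ([0.0, 1.0], "finance"),
    ([0.1, 0.9], "finance"),
]
query = [0.7, 0.9]

uniform = [1.0, 1.0]      # plain Euclidean distance: all features count equally
hand_picked = [1.0, 0.0]  # assumed weights that ignore the second feature

print(knn_predict(train, query, uniform))      # finance
print(knn_predict(train, query, hand_picked))  # sports
```

The same query receives a different label once the second feature is down-weighted, which is why the choice of weights matters for a distance-based classifier. How the weights are actually learned is not specified in this abstract, so the sketch leaves that step out.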