Telfor Journal
2018, vol. 10, no. 2, pp. 123-128
language: English
article type: unclassified
doi:10.5937/telfor1802123A


Automatic complaint classification system using classifier ensembles
Brawijaya University, Faculty of Computer Science, Malang, Indonesia

e-mail: moch.ali.fauzi@ub.ac.id

Abstract

Sambat Online is an online complaint system run by the city government of Malang, Indonesia. Because most citizens do not know which work unit (Satuan Kerja Pemerintah Daerah, SKPD) should receive their complaint, the system administrator must manually sort all incoming complaints and assign each one to the appropriate SKPD. This study empirically evaluated an automated system to replace the manual classification process. The experiments, which used Sambat Online data, involved five individual classification algorithms (Naïve Bayes, Maximum Entropy, K-Nearest Neighbors, Random Forest, and Support Vector Machines) and two ensemble strategies (hard voting and soft voting). Of the five individual classifiers, Multinomial Naïve Bayes performed best, with an accuracy of 80.7%. The ensemble methods generally performed better than the individual classifiers, and almost all of them reached the same accuracy of 81.2%. When all five classifiers were combined, soft voting was slightly more accurate than hard voting; when only the three best classifiers were combined, both strategies reached the same accuracy.
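
The difference between the two ensemble strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two unit labels ("roads", "health"), and the probability values are hypothetical and chosen only to show a case where hard voting and soft voting disagree.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the labels predicted by each classifier.

    Ties are broken in favor of the tied label that appears first
    in the input list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def soft_vote(probabilities):
    """Average the per-class probabilities and return (label, averages)."""
    n = len(probabilities)
    labels = probabilities[0].keys()
    avg = {label: sum(p[label] for p in probabilities) / n for label in labels}
    return max(avg, key=avg.get), avg


# Hypothetical class probabilities from three classifiers for one complaint.
# The labels are illustrative and are not SKPD names from the study.
probs = [
    {"roads": 0.55, "health": 0.45},
    {"roads": 0.60, "health": 0.40},
    {"roads": 0.10, "health": 0.90},
]

# Each classifier's hard prediction is its most probable label.
labels = [max(p, key=p.get) for p in probs]

hard = hard_vote(labels)      # two of the three classifiers pick "roads"
soft, avg = soft_vote(probs)  # the confident third classifier shifts the average

print("hard voting:", hard)
print("soft voting:", soft)
```

In this made-up case two classifiers weakly prefer "roads" while one strongly prefers "health", so the majority vote and the averaged probabilities select different labels. When the classifiers largely agree, as the equal accuracies reported for most ensembles suggest, the two strategies return the same label.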

Keywords

Ensemble Learning; E-Government; Machine Learning; Hard Voting; Soft Voting; Complaint Classification
