Telfor Journal
2014, vol. 6, no. 1, pp. 64-68
article language: English
article type: unclassified
doi:10.5937/telfor1401064B


Evaluation and classification of syntax usage in determining short-text semantic similarity
University of Belgrade, School of Electrical Engineering

e-mail: bv115045p@student.etf.rs, bojic@etf.rs

Project

Development of hardware, software and telecommunication infrastructure of e-systems for the control of turnover and taxes (MPNTR - 32047)

Abstract

This paper outlines and categorizes ways of using syntactic information in a number of algorithms for determining the semantic similarity of short texts. We consider the use of word order information, part-of-speech tagging, parsing and semantic role labeling. We analyze and evaluate the effects of syntax usage on algorithm performance by utilizing the results of a paraphrase detection test on the Microsoft Research Paraphrase Corpus. We also propose a new classification of algorithms based on their applicability to languages with scarce natural language processing tools.
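
As an illustration of the simplest kind of syntactic information mentioned above, word order, the following is a minimal sketch of one word-order similarity formulation found in the literature. It is not taken from this paper: the function names, the whitespace tokenization, and the choice of formula (one minus the norm of the difference of two position vectors divided by the norm of their sum) are assumptions made for the example.

```python
import math


def order_vector(tokens, joint):
    """Return the 1-based position of each joint-set word in `tokens` (0 if absent)."""
    first_pos = {}
    for i, tok in enumerate(tokens, start=1):
        # Keep only the first occurrence of a repeated word.
        first_pos.setdefault(tok, i)
    return [first_pos.get(word, 0) for word in joint]


def word_order_similarity(s1, s2):
    """Hypothetical helper: word-order similarity of two short texts, at most 1.0."""
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    t1, t2 = s1.lower().split(), s2.lower().split()
    # Joint word set, preserving first-seen order.
    joint = list(dict.fromkeys(t1 + t2))
    r1, r2 = order_vector(t1, joint), order_vector(t2, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    # Two empty texts are treated as identical.
    return 1.0 - diff / total if total else 1.0


if __name__ == "__main__":
    print(word_order_similarity("the dog bit the man", "the dog bit the man"))
    print(word_order_similarity("the dog bit the man", "the man bit the dog"))
```

Two texts with the same words in the same order score 1.0, while reordering the words lowers the score. A measure like this needs only a tokenizer, which is why word order is usable even for languages with few natural language processing tools; part-of-speech tagging, parsing and semantic role labeling each require progressively heavier tooling.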

Keywords

natural language processing; MSRPC; parsing; part-of-speech tagging; semantic role labeling; short-text semantic similarity; syntax; word order
