A MULTILEVEL APPROACH FOR KANNADA STOPWORDS GENERATION USING SENTENCE FEATURES

Authors

  • Sowmya M S, Dr. Panduranga Rao M V, Dr. Ashok Kumar P S

Abstract

Generating Kannada stopwords from e-newspaper text over multiple iterations is a logical task. This work proposes an approach that uses POS tagging and stemming to preprocess Kannada text with NLP algorithms. Text summarization is the process of condensing the content of an original text into a shorter version that still gives the reader the essential information. The summarizer presented in this research generates extractive summaries of Kannada text documents. The proposed system identifies the significant sentences in a document using five features: sentence length, sentence location, keyword feature, term frequency, and term frequency-inverse sentence frequency. The value of each feature is calculated, and the average of all feature values gives the score of each sentence in the document. The extracted summary consists of the sentences with the highest scores. After TF and IDF are applied, ML algorithms normalize the e-newspaper text to select the appropriate stopwords. Experiments on a specially constructed dataset of fifty Kannada text documents show notably superior extractive summarization performance when compared with human summaries. After evaluation, an improved stopword set was obtained for use in further Kannada NLP research.
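
The sentence scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract names the five features and the averaging step but gives no formulas, so every feature definition below is an assumption made for illustration: whitespace tokenization, length relative to the longest sentence, a location score that favors earlier sentences, the fraction of tokens found in a caller-supplied keyword set, and TF and TF-ISF values normalized by their maximum. The function names `score_sentences` and `summarize` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def score_sentences(sentences, keywords):
    """Return one score per sentence: the mean of five features, each in [0, 1].

    All feature formulas are illustrative assumptions, not the paper's definitions.
    """
    if not sentences:
        return []
    tokens = [s.split() for s in sentences]  # assumed: whitespace tokenization
    n = len(tokens)
    doc_tf = Counter(w for t in tokens for w in t)          # term counts over the document
    sent_freq = Counter(w for t in tokens for w in set(t))  # number of sentences containing each term
    max_len = max(len(t) for t in tokens) or 1

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    # Raw per-sentence TF and TF-ISF, averaged over the sentence's tokens.
    tf_raw = [mean([doc_tf[w] for w in t]) for t in tokens]
    tfisf_raw = [
        mean([doc_tf[w] * math.log(n / sent_freq[w]) for w in t]) for t in tokens
    ]
    max_tf = max(tf_raw) or 1.0
    max_tfisf = max(tfisf_raw) or 1.0

    scores = []
    for i, t in enumerate(tokens):
        features = [
            len(t) / max_len,                                  # sentence length
            1.0 - i / n,                                       # sentence location (earlier is higher)
            mean([1.0 if w in keywords else 0.0 for w in t]),  # keyword feature
            tf_raw[i] / max_tf,                                # term frequency
            tfisf_raw[i] / max_tfisf,                          # term frequency-inverse sentence frequency
        ]
        scores.append(mean(features))  # average of all feature values
    return scores


def summarize(sentences, keywords, k):
    """Pick the k highest-scoring sentences and return them in document order."""
    scores = score_sentences(sentences, keywords)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, keywords, k)` on a list of already-segmented Kannada sentences returns the `k` top-scoring sentences in their original order. In practice the tokens would first pass through the stemming and stopword-removal steps mentioned above, which this sketch leaves out.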

Published

2023-01-20

Section

Articles