Comprehensive Processing for Arabic Texts to Extract Their Roots
Arabic is a highly inflectional language in which a single root can produce many word forms with different interpretations. The inflection arises from prefixes, suffixes, and infix vowels, which are combined through complex processes. Words therefore require careful processing in information retrieval solutions, yet until now there has been no standard approach to obtaining the fully correct root. Applications on Arabic words show that around 99% are derived from a combination of biliteral, triliteral, and quadriliteral roots.
Stemming a word in order to extract its root is the process of removing all additional affixes. Where matching between a word and proper names is available, the affixes are stripped according to patterns and rules, with reference to root dictionaries.
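As an illustration of stripping by pattern with reference to a root dictionary, the following is a minimal sketch and not the procedure used in this research. It assumes the conventional notation in which the letters ف, ع, and ل in a pattern mark the root slots and every other pattern letter must match the word literally. The names `match_pattern`, `root_from_patterns`, `PATTERNS`, and `ROOT_DICTIONARY`, as well as the sample entries, are hypothetical.

```python
# Hypothetical toy resources; a real system would load full pattern and root lists.
ROOT_SLOTS = "فعل"                      # placeholder letters marking root positions
PATTERNS = ["فاعل", "مفعول", "فعال"]    # a few sample patterns (assumed)
ROOT_DICTIONARY = {"كتب", "درس"}        # sample root dictionary (assumed)


def match_pattern(word, pattern):
    """Return the letters in the root slots if the word fits the pattern, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)      # slot position: this letter belongs to the root
        elif w != p:
            return None         # literal pattern letter does not match the word
    return "".join(root)


def root_from_patterns(word):
    """Try each pattern and accept the first candidate found in the root dictionary."""
    for pattern in PATTERNS:
        candidate = match_pattern(word, pattern)
        if candidate in ROOT_DICTIONARY:
            return candidate
    return None
```

Under these assumptions, `root_from_patterns("مكتوب")` fits the pattern مفعول and yields the root كتب, while a word that fits no pattern returns `None` and would be passed to later processing.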
This research presents a new series of steps that uses a new way of scanning affixes, vowels, and patterns across three stages of stemming. If a match is not found, vowels are replaced and the patterns are readjusted and checked again; if there is still no match, the word is kept unmodified.
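The staged fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the split into affix stripping, infix vowel removal, and vowel replacement is one possible reading of the three stages, and the function names and the tiny `ROOTS`, `PREFIXES`, and `SUFFIXES` lists are hypothetical stand-ins for the real resources.

```python
# Hypothetical toy resources (assumed for illustration only).
ROOTS = {"كتب", "درس", "قول"}
PREFIXES = ["ال", "و", "ي"]
SUFFIXES = ["ون", "ات", "ة"]
VOWELS = "اوي"


def strip_affixes(word):
    """Return candidates with at most one prefix and one suffix removed."""
    candidates = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            core = rest[:len(rest) - len(suffix)]
            if len(core) >= 2:
                candidates.append(core)
    return candidates


def drop_infix_vowels(word):
    """Return variants with one internal vowel letter removed."""
    return [word[:i] + word[i + 1:]
            for i in range(1, len(word) - 1) if word[i] in VOWELS]


def replace_vowels(word):
    """Return variants with one internal vowel swapped for another vowel."""
    return [word[:i] + v + word[i + 1:]
            for i in range(1, len(word) - 1) if word[i] in VOWELS
            for v in VOWELS if v != word[i]]


def extract_root(word):
    candidates = strip_affixes(word)
    # Stage 1: affix stripping followed by a direct dictionary lookup.
    for c in candidates:
        if c in ROOTS:
            return c
    # Stage 2: remove an infix vowel, then look up again.
    # Stage 3: replace a vowel, then look up again.
    for stage in (drop_infix_vowels, replace_vowels):
        for c in candidates:
            for variant in stage(c):
                if variant in ROOTS:
                    return variant
    # No match in any stage: the word is kept unmodified.
    return word
```

With these toy lists, `extract_root("يدرسون")` resolves in the first stage to درس, `extract_root("الكاتب")` needs the vowel removal stage to reach كتب, `extract_root("قال")` reaches قول only through vowel replacement, and an unmatched word such as تلفزيون is returned unchanged.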
Search engines, indexing, file classification, clustering, and similar applications depend on further development of root extraction; the researcher introduces recommendations and solutions that contribute to improving Arabic root extraction.
The research applies comprehensive processing, carried out gradually, to a general collection of documents in order to improve root extraction by 96%.