The Productivity of Derivational Affixes in Sorani Kurdish: A Corpus-based Approach (Scientific article, Ministry of Science)
Academic rank: Scientific journal (Ministry of Science)
Abstract
Using the Kurdpress corpus(1), the present study examines the productivity of affixes in Sorani Kurdish. In morphology, the productivity of an affix has been expressed as the number of new words that the affix has formed. Accordingly, we examined the productivity of suffixes with a corpus-based approach. In the first step, a list of Kurdish suffixes was compiled. Data were then collected from the website of the Kurdpress news network using a web crawler written in Python. The productivity of each suffix was measured with Baayen's P. The affixes «-چی، -دان، -ین، -زن، -یلانە» were found to be among the most productive affixes of Kurdish, with productivity values above 50%. Another issue discussed in morphology is the difference between the productivity of an affix and its importance. The importance of an affix is taken to be the number of words formed with it that are used in the vocabulary of the language, and morphology treats these two concepts as distinct.
Productivity of Derivational Affixes in Surani Kurdish Language: Corpus-based Approach
The present research investigates the productivity of derivational affixes in Sorani Kurdish, based on the Kurdpress corpus. In morphology, the productivity of an affix is commonly expressed as the number of new words formed with it. Accordingly, this research takes a corpus-based approach to the productivity of Kurdish derivational suffixes, that is, affixes in word-final position. First, a list of Kurdish suffixes was compiled. Then, the required data were collected from the Kurdpress news network with a web crawler written in Python. The productivity of each suffix was measured with Baayen's P. The suffixes “-چی، -دان، -ین، -زن، -یلانە” proved to be among the most productive in Kurdish, each with a productivity value above 50%. The study also discusses the difference between the productivity of an affix and its importance, where importance is the number of words formed with the affix that are used in the vocabulary of the language.
Extended abstract
1. Introduction
Affixes in a language differ in their productivity in word formation. Some affixes generate more new words than others and can be considered more active in the process, and their productivity may change over time, shifting how they are used. This study is distinctive in that it examines Kurdish derivational affixes systematically and measures their productivity on a large body of data. It also examines how the measured variables (type frequency, token frequency, and hapax counts) relate to the productivity of these affixes. The primary objective is to identify the most productive derivational affixes of Kurdish, filling a gap in the literature through a detailed analysis of this aspect of the language.
2. Theoretical framework
Studies in morphology typically address topics such as affixation, the position and types of affixes, affix productivity, affix importance, and change in affixes over time. The two types of affixes found in most languages are prefixes and suffixes. A prefix is a bound morpheme attached to the beginning of a base; a suffix is a bound morpheme attached to the end of a base. Some affixes are more productive in word formation and create a larger number of new words, while others are less productive. The productivity of an affix may also change over time: an affix that is highly productive in one period may become less productive in another. Affix productivity is defined as the number of new words that an affix creates, whereas the importance of an affix is measured by the number of words that have been formed with it over time. The token frequency of a unit is the total number of its occurrences in the corpus, counting every repetition; its type frequency is the number of distinct forms in which it occurs, each counted once. Affix productivity is computed by dividing the number of hapax legomena (words occurring only once in the corpus) that contain the affix by the total number of tokens containing that affix. The formulas for affix productivity and importance are given below.
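Hapax Token Rate (productivity): $\mathrm{HTR} = \dfrac{n_1}{N}$
Type Token Rate (importance): $\mathrm{TTR} = \dfrac{V}{N}$
where $n_1$ is the number of hapax legomena formed with the affix, $V$ is the affix's type frequency, and $N$ is its token frequency.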
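The two ratios can be made concrete with a short Python sketch (Python being the language the authors report using for their scripts). It assumes the corpus is already a list of word tokens and, as a simplification not stated in the article, counts every word that ends in the suffix string as a derivative of that suffix, so accidental look-alike endings would be over-counted. The function name `affix_measures` and the toy words are hypothetical.

```python
from collections import Counter


def affix_measures(tokens, suffix):
    """Return N, V, n1, HTR and TTR for word forms ending in `suffix`.

    Assumption: any word longer than the suffix that ends in it is
    counted as a derivative of that suffix (naive string matching).
    """
    freq = Counter(tokens)  # token frequency of every word form
    # Word forms (types) carrying the suffix, with their token counts.
    derived = {w: c for w, c in freq.items()
               if w.endswith(suffix) and len(w) > len(suffix)}
    n_tokens = sum(derived.values())                      # N
    n_types = len(derived)                                # V
    n_hapax = sum(1 for c in derived.values() if c == 1)  # n1
    if n_tokens == 0:
        return {"N": 0, "V": 0, "n1": 0, "HTR": 0.0, "TTR": 0.0}
    return {
        "N": n_tokens,
        "V": n_types,
        "n1": n_hapax,
        "HTR": n_hapax / n_tokens,  # productivity (hapax token rate)
        "TTR": n_types / n_tokens,  # importance (type token rate)
    }


# Invented toy tokens, not data from the study.
tokens = ["darchi", "darchi", "kavchi", "bachi", "xandan"]
print(affix_measures(tokens, "chi"))
# {'N': 4, 'V': 3, 'n1': 2, 'HTR': 0.5, 'TTR': 0.75}
```

Here three types end in the toy suffix, four tokens in total, and two of the types occur only once, so HTR is 2/4 and TTR is 3/4.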
3. Methodology
For corpus-based studies of Kurdish, the most important issue is the availability of suitable data for analysis. The first step was therefore to collect sufficient data, drawing on the Kurdish corpus and web crawling techniques. The initial version of the corpus consisted of 69,000 news documents from different categories, collected by a web crawler written in Python 3.4 that targeted news sources.
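The article does not publish its crawler. As a rough sketch under stated assumptions, a crawler of this kind could be written with the Python standard library alone: a breadth-first crawl that stays on the start page's host and keeps the text of `<p>` elements. The class and function names, the `max_pages` limit, UTF-8 decoding, and the assumption that article text sits in `<p>` tags are all hypothetical, since the article does not describe the Kurdpress page structure.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class NewsPageParser(HTMLParser):
    """Collect the text of <p> elements and the absolute URLs of <a> links.

    Assumption (hypothetical): the news text of a page sits in <p> tags.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.paragraphs = []
        self.links = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._buf = True, []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # normalise whitespace
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def crawl(start_url, max_pages=100):
    """Breadth-first crawl that stays on start_url's host; returns page texts."""
    host = urlparse(start_url).netloc
    queue, seen, documents = deque([start_url]), {start_url}, []
    while queue and len(documents) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as response:
                # Assumption: pages are UTF-8 encoded.
                html = response.read().decode("utf-8", errors="replace")
        except (OSError, ValueError):
            continue  # skip pages that cannot be fetched
        parser = NewsPageParser(url)
        parser.feed(html)
        if parser.paragraphs:
            documents.append("\n".join(parser.paragraphs))
        for link in parser.links:
            link = link.split("#")[0]  # ignore in-page anchors
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return documents
```

A call such as `crawl("https://example.org/", max_pages=50)` (placeholder URL) would return up to 50 page texts; the parser can be tested offline by feeding it an HTML string.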
In the second step, a list of Kurdish derivational suffixes was compiled from Kurdish grammar books, including a book describing the structure of the Sorani dialect. Because counts of hapax legomena (words occurring only once) are commonly used to measure suffix productivity, the frequency of every word in the corpus was then calculated with Python scripts. Words with a frequency of one were identified, and the number of such words formed with each suffix was counted. To obtain more reliable results, suffixes with fewer than five hapax legomena were discarded.
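A minimal sketch of this counting step follows, assuming the corpus is already a list of word tokens. Two details are assumptions not stated in the article: matching is by plain string comparison on the word ending, and a hapax is credited only to the longest listed suffix it ends in, so a word is not also counted for a shorter listed suffix contained in that ending. The function name and toy data are hypothetical; the threshold of five follows the text.

```python
from collections import Counter


def hapax_per_suffix(tokens, suffixes, min_hapax=5):
    """Count hapax legomena ending in each suffix; drop sparse suffixes."""
    freq = Counter(tokens)
    # Try longer suffixes first so each hapax is credited to one suffix only.
    ordered = sorted(suffixes, key=len, reverse=True)
    counts = Counter()
    for word, count in freq.items():
        if count != 1:
            continue  # only words occurring exactly once are hapax legomena
        for suffix in ordered:
            if word.endswith(suffix) and len(word) > len(suffix):
                counts[suffix] += 1
                break
    # Keep only suffixes with at least `min_hapax` hapax legomena (five in the study).
    return {s: n for s, n in counts.items() if n >= min_hapax}


# Invented toy tokens: 5 hapaxes in -chi, 6 in -ylane, 2 in -dan, plus a repeated word.
tokens = ([f"w{i}chi" for i in range(5)] + [f"w{i}ylane" for i in range(6)]
          + ["w0dan", "w1dan", "rchi", "rchi"])
print(hapax_per_suffix(tokens, ["chi", "dan", "lane", "ylane"]))
# {'chi': 5, 'ylane': 6}
```

In the toy run, -dan is dropped because it has only two hapaxes, and -lane receives nothing because every candidate already matched the longer -ylane.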
4. Results and discussion
After the derivational suffixes were identified, the type frequency, token frequency, and number of hapax legomena were determined for each word-final morpheme examined. These counts, together with the productivity and importance values of the morphemes, are reported in a table in the article. Productivity is generally viewed as a spectrum, with less productive units at the left end and more productive units at the right end. Another point worth noting is that, in studies of word structure, the frequency of the examined units alone cannot yield accurate results: many factors influence word frequency, so focusing solely on word frequency is not enough.
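Read this way, the spectrum is the suffixes sorted by their productivity value. The sketch below, with hypothetical suffix labels and invented values, orders suffixes from least to most productive (left to right), lists those above the 50% cut-off used in the abstract, and computes the mean productivity; it does not reproduce the article's table.

```python
from statistics import mean


def productivity_spectrum(htr_by_suffix, threshold=0.5):
    """Order suffixes from least to most productive and summarise them."""
    ordered = sorted(htr_by_suffix.items(), key=lambda item: item[1])
    above = [suffix for suffix, value in ordered if value > threshold]
    return ordered, above, mean(htr_by_suffix.values())


# Hypothetical productivity values, not the study's results.
ordered, above, average = productivity_spectrum({"a": 0.2, "b": 0.6, "c": 0.4})
print(ordered)            # [('a', 0.2), ('c', 0.4), ('b', 0.6)]
print(above)              # ['b']
print(round(average, 2))  # 0.4
```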
5. Conclusion and Suggestions
This study centers on the concept of productivity. The suffixes “chi”, “dan”, “yan”, “zan”, and “ylaneh” were found to be the most productive, each with a productivity value above 50%. The average productivity of Kurdish suffixes is 37.25%. Future research could complement this study by examining Kurdish prefixes, the order of affixes, and the hierarchy that affects the productivity and importance of affixes.