
Seminars of the Linguistic Convergence Laboratory

If you are interested in participating in the laboratory seminars, please register here.

Seminar schedule 2026

23 June

Natalia Kuznetsova (Università Cattolica del Sacro Cuore, Milan)

Can the locus of stress-related lengthening be placed outside the stressed syllable (the Finnic case)?

Abstract

Lengthening of segments has been recognised as the cross-linguistically most common cue for lexical stress (Gordon & Roettger 2017; van Heuven & Turk 2020; Himmelmann 2023). The typical locus is the stressed vowel, but surrounding consonants may also lengthen (Giavazzi 2010; White & Turk 2010; González 2013) as a result of localised hyper-articulation near stress (de Jong 1995; Shao et al. 2025; Katsika et al. 2026). In a few languages, however, the post-stressed or pre-stressed syllable nucleus has been reported to lengthen as much as, more than, or even instead of the stressed vowel. This talk will discuss one such case, attested in varieties of Finnic (Uralic) and illustrated with the author’s experimental phonetic data from Soikkola Ingrian. In a di- or trisyllabic foot with a stressed light syllable CV, the post-stressed vowel undergoes varying degrees of stress-induced lengthening, e.g. kana [ˈkɑnɑː] ‘hen’, harakat [ˈhɑrɑːk̬ɑt̬] ‘magpie:PL’. In Finnic studies, this has been known as the “half-long vowel” since Setälä (1882) and Hakulinen (1922). Given the non-trivial relationship between the presumed phonological position of lexical stress (foot-initial) and the locus of its phonetic realisation (the post-stressed syllable), both phonetic and phonological aspects are discussed. The effect is demonstrated phonetically on a dataset of 22 trisyllabic and 4 disyllabic structural types of feet (each represented by 5 sentence-final words; each sentence recorded 4–7 times by five female speakers born in the 1930s; 4,241 tokens in total), cf. Kuznetsova et al. (2023). The effect is highlighted in blue on the post-stressed vowels of the (C)VCV and (C)VCVCV feet with a light first syllable. By contrast, the foot with a heavy first syllable, exemplified in Figure 1 by the (C)V:CVCˑV structure, shows post-stressed vowel reduction (in pink).
The phonological part of the talk draws on available phonetic and phonological studies of Finnic languages. It aims to answer the following questions: (1) why the lengthened vowel is phonologically unstressed; (2) why it is not a long vowel; (3) why this lengthening is a manifestation of lexical stress; (4) whether it manifests foot stress, word stress, or both. The evidence includes the convergence of the following phenomena: (a) consonant gemination around the lexically stressed syllable; (b) vowel shortening, reduction, mergers, and loss in lexically unstressed syllables; (c) stress-distinguished minimal pairs; (d) shifts in word recognition triggered by changes in stress position; (e) anchoring of intonational high tones and modern poetic metre ictuses to lexically stressed syllables. Finnic and similar cases suggest that the theory and typology of lexical stress should treat a maximally trisyllabic window (the stressed, the pre-stressed, and the post-stressed syllable) as the default potential stress-bearing unit, rather than just the stressed vowel or the stressed syllable rhyme.

16 June

Irina Politova (HSE University)

Network analysis as a way to explore (hidden) linguistic patterns in grammar

Abstract

In contemporary linguistics, there has been a considerable shift towards the use of computational tools to automate parts of linguistic research. At the last reading group, we discussed Östling (2016), in which the author proposed a method for investigating colexification patterns by applying a number of mathematical algorithms to extract the relevant lexemes from large parallel corpora. In a similar vein, I propose using network analysis tools to examine the co-occurrence of linguistic glosses and thereby automatically reveal morphological patterns in languages.

During the talk, I will present two case studies. In the first study, I explored the co-occurrence of linguistic glosses inside morphemes, words, and sentences of individual languages, based on corpora of glossed texts. I believe that a number of questions can be answered this way (Which grammatical categories can or cannot be expressed together within one word in a given language? Is it possible to automatically separate nominal categories from verbal ones? What can be learned about the tense and aspect system of a language if no description of it exists yet? etc.). The second study analyzed the co-occurrence of glosses in linguistic articles. I examined their distribution in short abstracts on typological topics to determine whether it is possible to automatically identify differences between language families and their grammatical systems, since abstracts are most often dedicated to a single language or language family.

As this study is still a work in progress, I invite you to evaluate the methodology and its usefulness for linguistic research, as well as the plausibility of the results these techniques can produce.

9 June

Marina Chumakina, Steven Kaye (Surrey Morphology Group, University of Surrey)

Attributivization in Nakh-Daghestanian: exploring the possibilities

Abstract

This talk looks at the formal and functional properties of Nakh-Daghestanian attributivization – the morphosyntactic process found widely across the family which produces adnominal modifiers out of a range of bases, from entire syntactic phrases down to units below the word level.

Beginning with a characterization of the core morphosyntactic phenomenon based on examples from different Nakh-Daghestanian branches, we then survey some of the ‘variations on a theme’ that emerge from a more detailed look at the family, highlighting diversity of various kinds in both the input and the output of attributive formation across languages. Finally, the talk focuses in on one language in particular: Archi, an outlying member of the Lezgic branch. While in many ways its characteristics fit neatly into the general picture built up so far, attributivization in Archi is remarkable, because its productivity as a source of modifiers has all but done away with the need for a true adjective class.

2 June

Nicola Lampitelli, Neige Rochant (Université Paris Nanterre) in collaboration with Lasha Kvlividze and Ketevani Lomidze

Increasing valency in Tsova-Tush (Nakh-Daghestanian)

Abstract

Tsova-Tush has two valency-increasing, agent-introducing operations (Wichers-Schreur 2024). The first is a complex suffix -CM-i glossed TR, which is highly productive in creating transitive verbs from non-verbal stems but has limited productivity when applied to verbs. The second is a suffix -it glossed CAUS, which can apply to virtually any verb. We inquire into the distribution of the two suffixes in order to understand their respective semantics and functions. More specifically, we address the following questions: (a) How do these suffixes partition the verbal lexicon? Relatedly, do they sometimes overlap (i.e., apply to the same verb), and can they combine, and if so, how? (b) When they overlap, how do their readings differ? The issues raised in (a) require identifying all bases. This is complicated by the very strong propensity of the TR suffix to apply to bound roots of unknown meaning and/or category (e.g., *tag-CM-i ‘do’), including borrowings from Georgian verbs. We intend to address this through corpus exploration (Hauk 2016-2019, Kakashvili & Skopeteas 2025, Wichers-Schreur 2026) and elicitation. As regards question (b), preliminary field research conducted in 2025 suggests a direct vs. indirect distinction, which translates into an intentional vs. unintentional reading.

19 May

Yury Lander & Vasiliy Zerzele (HSE University)

West Circassian distributive constructions: how do they do it?

Abstract

West Circassian distributive constructions display many properties that set them apart, at the very least, from distributive constructions in Standard Average European languages. This concerns both their alleged distributive quantifiers and their alleged distributive numerals. In this talk, we will discuss these constructions (partly described in Nikolaeva 2012, Lander 2012, Arkadiev & Lander 2013; see also Minor 2005 on similar constructions in closely related Kabardian) and offer a hypothesis about how they may function.

Arkadiev, P., Lander, Yu. 2013. Non-quantificational distributive quantifiers in Besleney Kabardian. Snippets 27: 5–7.
Lander, Yu. 2012. Релятивизация в полисинтетическом языке: адыгейские относительные конструкции в типологической перспективе [Relativization in a polysynthetic language: West Circassian relative clause constructions in a typological perspective]. Cand. (PhD) diss. Moscow: Institute of Oriental Studies RAS.
Minor, S. 2005. Кабардино-черкесский язык как полисинтетический [Kabardian as a polysynthetic language]. Diploma thesis. Moscow: Moscow State University.
Nikolaeva, L. 2012. Quantifiers in Adyghe. In: E.L. Keenan, D. Paperno (eds), Handbook of Quantifiers in Natural Language. Dordrecht: Springer. Pp. 21–82.

12 May

Maria Ermolova (HSE University)

On the Question of the Origin of the Masculine Singular Past Tense Form in Ukrainian

Abstract

In this talk, I will focus on the historical origins of the masculine singular past tense form in Ukrainian. Analyzing Old Ukrainian texts from the early 16th to the late 17th centuries, as well as 18th-century texts, allows us to conclude that Ukrainian past tense forms like ходив became widespread by the end of the 18th century, at least in the central and eastern territories of Ukraine (although they had not yet completely displaced the forms ending in -л-!). In the analyzed texts from the early 16th to late 17th centuries, forms ending in -в are practically absent. Proponents of the morphological hypothesis derive forms like ходив directly from participial forms ending in -въ. However, such forms are absent from the texts for several centuries before the new past tense form emerged, so this derivation is unlikely. The basic participial form in the analyzed period is the form ending in -(в)ши. This form can be used as an independent predicate and, like the -л- forms, can be employed in the subjunctive mood with the particle бы. Such contexts are absent from the texts of preceding centuries. This indicates a strengthening of the predicative properties of the participial form and its transformation into a finite form. The -л- and -(в)ши- forms came to be used in parallel, in the same functions and with the same grammatical content. As a result, contaminated forms ending in -лши appear in the texts. Considering all these facts, it seems logical to assume that forms ending in -в also resulted from contamination of these forms, albeit along a different path.

5 May

Jacob Lvovski (HSE University)

Reading group: Östling, R. (2016). 6. Studying colexification through massively parallel corpora. In P. Juvonen & M. Koptjevskaja-Tamm (Eds.), The Lexical Typology of Semantic Shifts (pp. 157-176). Berlin, Boston: De Gruyter Mouton.

Abstract

Large-sample studies in lexical typology are limited by whatever lexical information is available or can be obtained for all the languages in the study. Various types of word lists, from simple Swadesh lists to large dictionaries, can be used for this purpose. Unfortunately, these resources often present only a very fragmentary view of a given language’s vocabulary. As a complement, we propose an additional source of lexical information: parallel texts. Books such as the New Testament have been translated into thousands of languages, and it is possible to automatically extract word lists from their vocabulary, which can then be applied to lexical typological studies. In particular, we focus on studying colexification using a sample of 1001 different languages, based on 1142 translations of the New Testament. We find that although the automatically extracted word lists contain errors, their quality can be sufficiently good to find real areal patterns, such as the ‘tree’/‘fire’ colexification that is widespread in the Sahul area.

28 April

Anastasia Alekseeva (HSE University)

Negation in Chicham: preliminary findings

Abstract

Chicham (also known as Jivaroan) is a small language family of four languages (Aguaruna, Shuar, Shiwiar, and Wampis) spoken in the foothills of the Andes in Peru and Ecuador. Their genealogical relationship is not very deep, yet they show some notable differences, which allows for productive comparison. One of these differences lies in the domain of negation. All Chicham languages have two negation markers, -cha and -tsu. They do not differ in meaning but are used in different contexts, e.g. -cha in the past tenses and -tsu in the present tense. However, their contextual distribution varies across the Chicham languages. In this talk, I will present the distribution of the negation markers across Chicham as described in the grammars, provide a more detailed analysis of it, and show some preliminary results based on a small corpus compiled from examples in the grammars.

21 April

Anastasia Alekseeva, Akhmed Dugrichilov, Lilya Fayzeeva, Konstantin Filatov, Diana Khayaleeva, Yury Koryakov, Timur Maisak, Maksim Melenchenko, Lena Mironova, Varvara Nikolaeva (HSE University)

Field trip report: the Kusur dialect of Avar

Abstract

We report on a 2026 field trip to Kutan Kambulat (Rutulsky district, Republic of Daghestan) to document the Kusur dialect of Avar, a highly endangered variety spoken by a small, seasonally mobile community. The trip took place in early April (11 days), when most speakers are in the lowland settlement before moving to the mountain village of Kusur for summer pastoral work. The main aim of the trip was to compile a spoken corpus. We recorded spontaneous speech, focusing on narratives and conversation, and collected basic sociolinguistic metadata. A key interest is the dialect’s unusual contact setting, shaped by long-term interaction with Tsakhur and Azerbaijani. We briefly discuss the results of the field trip and our sociolinguistic data, as well as findings on some previously undescribed aspects of grammatical structure. The corpus resulting from the field trip is intended for future analysis of micro-isoglosses and contact-driven variation.

George Moroz, Asya Antsupova, Viktoria Zubkova (HSE University)

Field trip report: Zilo Andi

Abstract

We present a report on a field trip to Zilo (Botlikhskiy district, Republic of Daghestan) conducted in April 2026. We had several documentation-focused goals:

expansion of the online dictionary of Zilo Andi
clarification of some morphology-related information
resolution of several morphological questions
rewriting of the morphological analyzer (which currently processes pronouns, numerals, and adjectives)

We also refined some details of the grammatical description of Zilo Andi in [Kaye et al., to appear].

Additionally, in pursuing the goal of documenting previously undescribed aspects of Zilo Andi, we addressed several topics, including the syntax and semantics of temporal clauses, verbal actionality, and stress patterns.

14 April

Aleksey Starchenko (HSE University)

GNMCCs are not always semantically based relativization: Evidence from Northern Khanty

Abstract

A general noun-modifying clause construction (GNMCC) is “a single construction covering a wide range of semantic relations between the head noun and the clause” [Matsumoto et al. 2017]. These relations include relativization of various positions and extended NMCCs. Extended NMCCs cover modification of valency-bearing nouns and constructions in which the relation cannot be deduced from any syntactic cues. Beyond the distributional criteria, GNMCCs in the strict sense are claimed to be built on semantic/pragmatic mechanisms rather than being subject to syntactic constraints. This property brings GNMCCs close to semantically based relativization [Comrie 1998]. On the other hand, constructions described as GNMCCs exhibit syntactic restrictions, primarily in terms of island effects [Kornfilt, Vinokurova 2017; Kim, Sells 2017; Nikolaeva 2017].

In the talk, I will focus on the Northern Khanty adnominal modification constructions with non-finite forms in -ti and -əm. Introducing novel data on extended NMCCs, I will show that Northern Khanty non-finite forms meet the distributional criteria of the GNMCC. Nevertheless, in relativization contexts, they show syntactic restrictions of various kinds that can be related to the presence of a gap [Bikina 2019]. Extended NMCCs in Khanty, by contrast, show the semantic/pragmatic effects expected from the classical NMCC. I argue that Northern Khanty non-finite forms constitute a single noun-modifying construction, that is, they share the same external syntax. The differences in syntactic restrictions stem from the presence of a gap, while the absence of a gap gives way to semantic/pragmatic mechanisms. This interpretation of the Northern Khanty data indicates that the GNMCC is not equivalent to semantically based relativization. Taken together with data on island effects in other languages, it suggests that a purely distributional definition is preferable for building a typology of GNMCCs.

31 March

Anna Panova (HSE University)

Towards a typology of nominal coordination systems

Abstract

This talk investigates systems of nominal coordination in the languages of the world. Although coordination has been examined from a functional-typological point of view (e.g., Mithun 1988, Stassen 2000, Haspelmath 2007, Mauri 2008), some questions regarding the syntax and semantics of coordinating constructions remain unanswered. Based on a sample of 336 languages (118 families, 25 isolates, 4 macroareas), we consider several factors. These are the number of constructions for nominal coordination, other functions of the coordinator, the position of the coordinator in a construction with two conjuncts, and the possibility of using the same coordinator for coordinating verb phrases and clauses. We will discuss the minimum, average, and maximum number of coordinating constructions in one language, the frequency of mono- and bisyndetic constructions, and attested grammaticalization paths of coordinators. We will consider which sets of coordinators are most common and whether they are “aligned” by syndetism in languages with multiple coordinating constructions. We will also consider which coordinators can and cannot be used in clausal coordination, and which coordinators are most often mono- or bisyndetic. We tentatively identify five types of coordinating constructions: purely logical; introducing an additional participant; referring to participants as a set; describing a change of state; and (not) fully covering all participants. Additionally, we will discuss areal tendencies.

24 March

Olga Alieva (HSE University)

Testing Plato’s Chronology with Phylogenetic Methods

Abstract

This project critically reexamines the long-standing stylometric basis for the standard tripartite chronology of Plato’s dialogues (early–middle–late), arguing that core assumptions underpinning this model are methodologically dubious. Stylometric analysis has often been portrayed as a ‘scientific’ foundation for dating dialogues, but the clustering patterns it reveals are not reliably correlated with any temporal sequence. Combining insights from Classics with methodologies drawn from evolutionary biology and computational stylometry, I apply modern phylogenetic tools to the entire Platonic corpus, including tree-based and network-based models. For the first time, this analysis integrates the lesser-studied spuria from the Appendix Platonica. Using high-dimensional distance metrics (e.g., cosine similarity) across the most frequent features and incorporating robustness checks via bootstrapping, I demonstrate that only two stylistic groupings emerge as stable under various models. The first consists of what scholars would traditionally label ‘late’ dialogues (e.g., Laws, Timaeus, Philebus); the second is the Republic (except for book 1). However, no statistically robust cluster corresponds to the so-called ‘early’ dialogues and the rest of the ‘middle’ ones, while some of the later dubia and spuria exhibit stylometric proximity to allegedly early texts. This suggests that the stylometric features thought to define philosophical ‘youth’ in fact correspond to the ‘Socratic’ genre: stylometry measures style, not time. This critique is of dual interest. First, it underscores the need for philological caution when engaging with statistical claims about authorial development. Second, it offers a cautionary tale about interpretability, domain assumptions, and the transfer of methods from bioinformatics to historical linguistics and literary studies.

17 March

Aigul Zakirova (University of Potsdam)

The expression of necessity in the Volga-Kama area: argument marking and (im)personality

Abstract

The Volga–Kama (VK) area encompasses Turkic (Tatar, Bashkir, Chuvash) and Uralic (Moksha, Erzya, Meadow Mari, Hill Mari, Udmurt) languages, all spoken in the Middle Volga region of Russia (Johanson 2000; Helimski 2003).

Drawing on spoken corpora, grammatical descriptions, and literature on modality in the VK area, I will present the types of constructions found in these languages, which include but are not limited to the following:

‘Need’-predicates, compatible both with infinitives and NPs (Hill Mari keleš, Chuvash kirlë, Tatar kiräk, Moksha er’avi).
Non-finite future/necessity forms (Chuvash -mAllA, Tatar -As- EXIST, Udmurt -ono).
Finite verbs from other semantic domains, grammaticalized into necessity meanings (Bashkir tura kilew ‘come straight’, Hill Mari väreštäš, Meadow Mari logalaš ‘end up’, Udmurt lunə̑ ‘be’).
Personal ‘must’-predicates, often borrowed (Tatar tiješ, Bashkir teješ; cf. also Kipchak-borrowed tijə̑š in Southern Udmurt and Russian-borrowed dolžen in Moksha).

After discussing the diachronic sources of these constructions, I will focus on argument marking and on the expression vs. omission of arguments in these constructions. Using data from spoken texts, I will show that the majority of VK necessity constructions, in terms of both type and token frequency, do not express their A- or S-argument and are (largely) impersonal. However, there appears to be a diachronic tendency in the Uralic languages of the area to develop new personal constructions from already existing or borrowed material.

10 March

Eva Poliakova (HSE University)

Reading group: Levshina, N. (2022). Corpus-based typology: applications, challenges and some solutions. Linguistic Typology, 26(1), 129-160.

Abstract

Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style, that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks, being particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes has not only advantages and opportunities, but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept.

3 March

Lena Mironova (HSE University)

Verbal plurality in Papuan languages

Abstract

Despite its global distribution and increasing interest in recent years, verbal plurality (VPL) remains an inconsistently treated grammatical domain. This talk investigates patterns in the (co)expression and formal encoding of VPL functions in order to clarify its internal structure. The analysis draws on a large convenience sample of Papuan languages (164 languages from 42 families and 19 isolates), that is, the non-Austronesian languages of New Guinea and the surrounding islands, where VPL is widely attested but has not yet been examined systematically.

In this study, VPL is defined as the verbal encoding of plurality independent of other grammatical categories such as person or gender. Four major functional types are distinguished: collective (non-individuated participant plurality), distributive (individuated participant plurality), intRAoccasional (repetitions within a single occasion), and intERoccasional (repetitions across separate occasions). The results highlight VPL as a heterogeneous yet internally structured functional domain. They suggest a continuum of functional adjacency from collective to intERoccasional plurality and reveal distinct tendencies in the coexpression and formal marking across the different functional types.

24 February

Maksim Melenchenko (HSE University)

Omission of the light verb kin- ‘do’ in Shughni complex verbs

Abstract

In the Shughni language (‹ Eastern Iranian), spoken in the Pamir mountains, the majority of verbal lexical meanings are expressed with multiword constructions called “complex verbs”. The talk focuses on a puzzling phenomenon: in complex verbs with the light verb kin- ‘do’ (for example, rāng kin- ‘paint’, lit. ‘color-do’) the root of the light verb can sometimes be omitted. In such instances, the subject agreement suffix on the verb attaches to the non-verbal component of the complex verb (for example, rāng kin-um ‘I color [smth]’ → rāng-um). This raises many interesting questions about morphosyntactic properties of the resulting construction (for example, about the phrasal / lexical status of the non-verbal component). In the talk, I will discuss these questions and their relation to the general phenomenon of complex verbs in Shughni, its diachronic development and the role of language contact in this process, as well as draw unexpected typological parallels (for example, with the Lezgic language [‹ East Caucasian]).

17 February

Nikita Muravyev (HSE University)

Light verbs as a missing link in grammaticalization: a case study of Russian posture verbs and a brief typological overview

Abstract

Posture verbs frequently undergo grammaticalization across languages. In the literature, their typical grammaticalization path is described as developing from literal posture use to an aspectual marker and/or a locative or existential copula. However, what is often overlooked is the wide range of light verb constructions (LVCs) — idiomatic expressions consisting of a semantically bleached verb and a predicatively used syntactic constituent (usually NP or PP), e.g. German unter Druck stehen ‘be (lit. stand) under pressure’ or Russian sidet’ na diete ‘be (lit. sit) on a diet’. In this talk, drawing on Russian corpus data, I demonstrate that such uses cannot simply be treated as ordinary copular constructions, as they convey more specific meanings rooted in residual semantic components of posture. I argue that these meanings are quasi-grammatical and pre-grammatical, in that they reflect a lower degree of grammaticalization and possibly represent an intermediate stage in the diachronic development of posture verbs. Finally, I briefly compare posture-based LVCs across twelve Eurasian languages and discuss their typological variation.

10 February

Mark Stoneking (Biométrie et Biologie Évolutive, UMR 5558, CNRS & Université de Lyon)

The Genetic History of the Caucasus

Abstract

To paraphrase Tolstoy, all populations are alike in having interesting histories, but each history is interesting in its own way. In the case of the Caucasus, interest centers on three factors. The first is the extensive linguistic diversity of the region, in particular the relationship of Caucasian populations speaking Indo-European or Turkic languages to those speaking Caucasian languages. The second is the position of the Caucasus as a potential crossroads for contact between East and West. The third is the impact of the Caucasus Mountains on the genetic diversity and structure of populations living in the mountainous regions. In this presentation, which I shall endeavour to make accessible to non-geneticists, I will take a historical approach. First, I will describe early studies of genetic variation in Caucasian populations that I carried out with my colleague Ivane (Vano) Nasidze. These studies focused mostly on analyses of maternally inherited mitochondrial DNA (mtDNA) and the paternally inherited Y chromosome. I will then discuss the more detailed insights into population history provided by analyses of genome-wide variation in modern human populations from the Caucasus, followed by the additional insights arising from recent studies of ancient DNA. The results obtained to date for this relatively under-studied region indicate a complex history of both contact and continuity. They also point to a major impact of the mountainous regions on the genetic structure of the populations living there.

27 January

Petr Rossyaykin (Lomonosov Moscow State University)

Indefinites, scalar particles, and question semantics

Abstract

According to a hardly controversial generalization, the licensing of (at least some) negative polarity items (NPIs) depends on a particular entailment pattern between the assertion and its alternatives, viz. (entailment) scale reversal or downward entailingness (Fauconnier 1975, 1978; Ladusaw 1979; et seq.). Questions are one of the environments in which NPIs are licensed (e.g. Did you eat anything?), yet there is no obvious entailment relation between questions. This raises the puzzle of why NPIs are acceptable in questions. In this talk, I will present and discuss a cross-linguistic dataset concerning the distribution of scalar particles (like English even). The dataset shows that indefinite NPIs and NPIs with scalar particles behave differently with respect to their acceptability in polar questions (PQs). In particular, PQs are not a scale-reversal environment for scalar particles (contra some earlier proposals). I will discuss the consequences of this observation for the theory of NPI licensing.

20 January

Anastasia Panova (Stockholm University)

Subordination strategies in Gawarbati (Indo-Aryan): an areal-typological perspective

Abstract

Gawarbati is an under-described Indo-Aryan language spoken by approximately 20,000 people in the border area between Pakistan and Afghanistan. Since 2021, it has been documented by a Swedish-Pakistani team under the supervision of Henrik Liljegren (Stockholm University). One of the main outputs of the documentation project is a spoken corpus containing more than 20 hours of transcribed, glossed and translated speech from various genres. The focus of this talk will be on the use of finite subordination strategies in the Gawarbati corpus. I will start by presenting an overview of finite subordination strategies in neighboring Indo-Iranian languages. Against this background, I will describe the functions of each of the subordinators attested in the Gawarbati corpus. On the basis of the analysis of the functional distribution of various subordinators, I will try to reconstruct the possible stages in the evolution of subordination strategies in Gawarbati and discuss the role of language contact in this process.

Seminar schedule 2025

23 December

Leah Finkelberg, Polina Nasledskova, Johanna Nichols (HSE University)

Progress on causative alternations and noun synthesis databases

Abstract

In this talk, two ongoing projects will be discussed. The causative alternations project is dedicated to collecting and analyzing data on causal-noncausal alternations in verbs. Eighteen verb pairs have been collected and coded across 238 languages from various language families and continents. The noun synthesis project is dedicated to collecting and analyzing inflectional noun categories in 172 languages. We will present typological findings and the geographic distribution of the features in the collected data. We will address the following questions: 1) Is there a tradeoff between noun and verb synthesis, or between causativization and decausativization (worldwide or continent by continent)? 2) Is lower morphological complexity a feature of more archaic patterns? 3) Are there resemblances in causal/noncausal patterns or in noun inflectional categories between the languages of Western North America and Australasian languages, on the one hand, and between the languages of Eastern North America and Siberian languages, on the other? We will compare our observations with the conclusions of [Sokur & Nichols, 2018], [Hartmann & Nichols, forthcoming], and [Nichols 2024].

16 December

Maria Kholodilova (Institute for Linguistic Studies, HSE University)

Locative relativization in Slavic languages and beyond

Abstract

In most European languages, locative relativization involves competition between at least two relativization strategies, roughly corresponding to English house in which I live vs. house where I live. The former strategy is more explicit, as it specifies the spatial relation, while the latter neutralizes at least the distinction between ‘in’ and ‘on’. Based on my current sample of 12 Slavic and 8 non-Slavic European languages, I will discuss the corpus distribution of these strategies, with particular attention to the impact of head noun semantics. I propose that there is a consistent tendency toward greater explicitness of marking along the following hierarchy of head nouns: ‘place’ < ‘house’ < ‘book’, i.e., marking is more explicit with nouns that are less likely to appear in locative expressions. These findings align with a broad range of phenomena showing more explicit marking in less frequent configurations, both in relative clauses (Keenan & Comrie 1977; Fox & Thompson 2007; Cristofaro & Ramat 2007) and in non-relative locative expressions (Stolz et al. 2014; Haspelmath 2019).

9 December

Maria Ermolova (HSE University)

Gerunds in the Russian language of the 17th century: a transitional period in the history of their grammatical development

Abstract

I will present the results of a corpus study of gerunds in 17th-century Russian. Comparing these findings with 18th-century data makes it possible to trace the stages in the evolution of the grammatical meaning of gerunds and to refine existing theories. The use of gerunds remains relatively stable throughout the 17th century. Continuing the Old Russian tradition, gerunds can be used in the vernacular as finite forms for both past and present tenses. Compared to the earliest period, 17th-century gerunds become even more similar to the -л- form, as evidenced by their use with the particle бы in the subjunctive mood, which is not documented in early texts. The 17th-century data show how gerunds lost their tense meaning and acquired a relative one, depending on the tense of the main predicate, while still remaining formally autonomous predicates. The gerund becomes established as the predicate of a dependent adverbial clause only in the 18th century.

2 December

Maria Pupynina (Institute for Linguistic Studies)

Multilingualism in Northeastern Siberia

Abstract

In this talk, I will present the results of an ongoing study of small-scale multilingualism in Northeastern Siberia (2017 to present). The study focuses on languages that have been in contact for more than 1.5 centuries: Tundra Yukaghir, Yakut, Chukchi, Even, Naukan and Chaplino Yupik Eskimo. I will discuss small-scale multilingual areas in the north of Yakutia (Lower Kolyma) and Chukotka (Chukchi Peninsula) and touch upon the linguistic outcomes of long-lasting multilingualism. Both multilingual areas involve unrelated languages, and individual language repertoires in Lower Kolyma can consist of five unrelated lects. Possible ways of measuring the degree of distance/similarity/convergence between the languages of these areas will be discussed.

25 November

Zaira Khalilova (Institute of Linguistics, HSE University)

Typology of verbal borrowings in Tsezic and beyond

Abstract

Khwarshi, which is distant from the other Tsezic languages, geographically separated from them, and surrounded by Avar- and Andic-speaking villages, combines all three strategies identified by Wohlgemuth (2009) for integrating borrowed verbs: a light verb strategy is used for Russian borrowings, while both direct and indirect insertion are used for borrowings from Avar and Andic. Combining several strategies within a single language is typologically rare; the other Tsezic languages use only one integration strategy. The talk explains the factors underlying the distribution of verbal borrowing strategies within the Tsezic languages. The crucial factor accounting for this distribution is the variation across the family in the degree of bilingualism and contact with donor languages.

18 November

Anna Grishanova (HSE University)

Stress variation in the speech of L2 Russian and dialectal speakers: the case of past indicative verbs

Abstract

Stress variation in standard and dialectal Russian is an interesting and well-researched phenomenon. While A. Zaliznjak (1985) attributes stress variation to pragmatic factors, W. Lehfeldt (2006) suggests that lexeme frequency plays an important role as well. Data presented by D. Savinov, E. Skachedubova, and A. Somova (2020) indicate that sociolinguistic factors such as age are crucial for a thorough description of stress variation in Standard Russian. Differences in stress patterns across Russian dialects are often explained by the history of the dialect in question. To our knowledge, stress variation in the speech of L2 Russian speakers has not been discussed before. This study aims to identify the factors that influence stress variation in past indicative verbs among dialectal and L2 Russian speakers. The data come from the dialectal and bilingual corpora of the Linguistic Convergence Laboratory. Specifically, I investigate the verbs previously identified by D. Savinov, E. Skachedubova, and A. Somova (2020).

11 November

Anna Panova, Yury Lander (HSE University)

Competing coordinating constructions in the languages of the North Caucasus

Abstract

In this talk we discuss systems with several (two or three) constructions for nominal coordination in eleven languages of the North Caucasus. Our sample includes representatives of West Caucasian, East Caucasian, Indo-European and Turkic languages. The data come both from corpora and from elicitation. We tentatively propose a syntactic prototype of nominal coordination based on collective contexts and certain other formal properties. We suggest that the variation observed among these systems results from competition between coordinate constructions covering contexts closer to this prototype and additive constructions extending from less prototypical contexts.

28 October

Timofei Dedov, Alexander Letuchiy (HSE University)

Distant negative concord in Ashkharywa Abaza

Abstract

Our talk focuses on the phenomenon of negative concord (NC) in the Ashkharywa dialect of Abaza, a West Caucasian language and a close relative of Abkhaz. The class of negative concord items (NCIs) contains items like aʒ̂-g’ə́ ‘nobody’. Although these elements do not contain a negative marker proper, they fit the definition of NCIs because they are usually licensed by predicate negation (see, for example, Zeijlstra 2004 and Giannakidou 2006 for general analyses of negative concord).

The talk will mostly focus on distant negative concord: this notion covers cases where a negative concord item is licensed by a predicate negation in a higher clause (‘I do not want to see anyone’), rather than by a clausemate one (‘I do not see anyone’). While local negative concord has been described in detail in descriptive, typological, and theoretical studies, not all relevant parameters of distant NC have been analyzed. The main question to be considered is which factors facilitate distant NC or make it problematic. In Russian, for instance, the finiteness of the embedded verb seems to affect the availability of this type of negative concord: distant NC is possible in most nonfinite constructions (Я не хочу никого обидеть ‘I do not want to offend anyone’), but marginal or highly colloquial in finite ones (?Я не хочу, чтобы ты никому звонил ‘I do not want you to call anyone’).

In Abaza, finiteness and the opposition between finite and nonfinite forms are organized differently from the European finite vs. nonfinite opposition (see a recent paper by Arkadiev (2023) for details). As will be demonstrated in the talk, finiteness itself cannot be regarded as the main factor determining the (im)possibility of distant NC, although different complement types sometimes behave differently with respect to distant NC.

The main factor is the semantic type of the matrix verb. It turns out that factive verbs of knowledge and emotional attitude, modal verbs, opinion verbs, and other classes differ in their ability to license negative concord items in subordinate clauses. In our talk, we will discuss the semantic parameters of matrix verbs that can account for these differences.

21 October

Irina Politova (HSE University)

Reading group: Ploeger, Esther, Wessel Poelman, Miryam de Lhoneux, and Johannes Bjerva. 2024. What is “typological diversity” in NLP? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 5681–5700. Association for Computational Linguistics, Miami, Florida, USA.

Abstract

The NLP research community has devoted increased attention to languages beyond English, resulting in considerable improvements for multilingual NLP. However, these improvements only apply to a small subset of the world’s languages. An increasing number of papers aspires to enhance generalizable multilingual performance across languages. To this end, linguistic typology is commonly used to motivate language selection, on the basis that a broad typological sample ought to imply generalization across a broad range of languages. These selections are often described as being ‘typologically diverse’. In this meta-analysis, we systematically investigate NLP research that includes claims regarding typological diversity. We find there are no set definitions or criteria for such claims. We introduce metrics to approximate the diversity of resulting language samples along several axes and find that the results vary considerably across papers. Crucially, we show that skewed language selection can lead to overestimated multilingual performance. We recommend future work to include an operationalization of typological diversity that empirically justifies the diversity of language samples. To help facilitate this, we release the code for our diversity measures.

14 October

Konstantin Filatov (HSE University)

More evidence for withdrawal effects: the case of Andic future marking

Abstract

The talk is a revised and expanded version of the author’s SLE 2025 presentation. It discusses the system of future grams in the Anchiq dialect of Karata (< Andic < Nakh-Daghestanian). The two core future forms can be described as marking future certainty vs. future possibility. While this type of system is not typologically uncommon, the diachrony of similar systems in closely related Andic languages (Godoberi and Bagvalal) poses a challenge for the Bybeean source determination principle. This principle requires attributing all semantic differences between grams to differences between their grammaticalization sources. However, the principle cannot fully account for the emergence of Andic future systems: the same source seems to have developed into the certain future gram in Anchiq, but into the uncertain future gram in Godoberi. The talk presents a diachronic scenario for the three Andic systems and explains their differences using the notion of withdrawal effects (as defined by Reinöhl and Himmelmann, 2017). This notion refers to a situation where the semantic network of an earlier gram is deformed by the “intrusion” of a newer one.

7 October

Marina Gasanova (Daghestan State University)

On the work of the Center for the Study of Native Languages at Daghestan State University

Abstract

The seminar is devoted to the activities of the Center for the Study of Native Languages at Daghestan State University, established in 2016 with the support of the Ministry for National Policy and Religious Affairs of the Republic of Daghestan. The talk will present the Center’s main areas of work: popularization and outreach, organizing seminars for teachers, and carrying out research projects. Particular attention will be paid to the preservation and development of the native languages of the peoples of Daghestan.

30 September

Alina Russkikh (HSE University)

From Additivity to Optative? Evidence from the Avar Language

Abstract

In Avar, there are two functionally distinct markers that are synchronically homonymous. The first is the multifunctional additive particle =gi. The second is the optative suffix -gi. In [Zhirkov 1936: 157–158], the particle =gi and the optative marker -gi are treated as one and the same marker. Z. Mallaeva notes that these two forms may have originated from a common source, despite their synchronic functional differences. N. Dobrushina discusses additive particles as one of the possible sources of optatives, with material parallels attested in four languages of the Eastern Caucasus, including Avar [Dobrushina 2024]. However, in typological works on additives [Forker 2016; Gast & van der Auwera 2011], the use of additive markers in optative constructions is not attested.

All this suggests, on the one hand, that in several languages of the area additive and optative markers coincide, while on the other hand the semantic link between the optative and the additive is far from straightforward. In this talk, I will present field data and discuss possible motivations for the development of optative meaning from the additive particle, as well as counterarguments to this hypothesis.

24 June

Polina Nasledskova (HSE University)

Compatibility of ordinal numerals with nouns of various semantic classes

Abstract

In this study, I investigate the compatibility of ordinal numerals with nouns of various semantic classes in five languages: Russian, English, Spanish, Indonesian and Rutul. The comparison is based on parallel translations of the New Testament. Nouns of four semantic classes are present in the data: names of living creatures, inanimate objects, time periods and abstract concepts. Additionally, I analyze constructions of the type “for the X-th time”. According to the data, nouns of different semantic classes are used with ordinal numerals with varying frequency. The talk will discuss the results and their possible theoretical implications.

3 June

Elena Shvedova, Elizaveta Zabelina, Yuri Koryakov (HSE University)

Quantifying lexical distances among Nudiz, Mahmudi, and Verin Dvin Urmi (North-Eastern Neo-Aramaic)

Abstract

Our study documents and analyzes lexical data from four Christian North-Eastern Neo-Aramaic varieties: Mahmudi, Nudiz, Verin Dvin Urmi, and Urmia Urmi, focusing on the previously undescribed Mahmudi and Nudiz. We provide correspondences from these lects for an extended 226-item basic vocabulary list collected for this study, with etymologies, cognates from earlier Aramaic, and loanword sources. Cognate share calculations reveal that all four varieties belong to a single language. Notably, Mahmudi and Verin Dvin Urmi, spoken in the same village for 67 years, converge more strongly with each other than with their genealogical relatives (Nudiz and Urmia Urmi, respectively), highlighting contact-driven divergence from inherited patterns.

27 May

Timur Maisak (Institute of Linguistics RAS & HSE University)

Numeral ‘one’ + additive ‘also, even’: one source structure for two Udi particles

Abstract

In Udi, a Nakh-Daghestanian language of the Lezgic branch, two function words sal and saal seem to share the same source structure: both appear to represent combinations of the numeral sa ‘one’ with the additive clitic =(a)l ‘and, also, even’. At the same time, synchronically the two are both formally and functionally distinct. The word sal is an emphatic negative polarity item ‘not a (single one)’, ‘(not) at all’. The word saal can be used with the meaning ‘again, one more time’, but even more often, one finds it as a coordinating conjunction ‘and’. Cross-linguistically, the numeral ‘one’ is a common grammaticalization source: for example, the World Lexicon of Grammaticalization lists nine paths leading from ‘one’ to a grammatical marker. Additive markers (‘and, also, even’) are also known to take part in the derivation of various grammatical forms or classes of forms. What makes the two Udi particles unusual is the fact that two very different words go back to one and the same combination of two grammaticalization sources. In the talk, I plan to illustrate the uses of both sal and saal (mainly based on textual data from the Nizh dialect). I will also discuss some structural and functional parallels of the two Udi words found in the languages of the area.

20 May

Ellie Wren-Hardin (The Ohio State University)

Computer-Assisted Differentiation of Loans and Cognates: Possibilities and Pitfalls

Abstract

The past several decades have seen a dramatic rise in the creation of computational and computer-assisted approaches to cognate and loanword detection. However, many cognate and loanword detection methods rely on identifying surface lexical similarity, creating a challenge for the differentiation of family-internal loanwords from cognates. While true cognates typically demonstrate higher lexical similarity than non-cognates due to shared genetic inheritance, loanwords also demonstrate higher lexical similarity than non-borrowed words, meaning lexical similarity is an insufficient metric on its own. In this talk, I will discuss how computational and qualitative methods can be combined to tackle the challenge of differentiating cognates from family-internal loanwords in the Northeast Caucasian language family. First, I will discuss the sociolinguistic factors in the Northeast Caucasian language family that make it useful for studies of family-internal contact. Then, I will talk through several computational methods for cognate and borrowing detection and explain why they alone are insufficient for this specific challenge. Lastly, I will demonstrate how utilizing computational methods in conjunction with knowledge of the languages and sociolinguistic factors involved in a contact situation provides improved results over computational methods alone, exemplifying the benefits of a “computer-assisted” approach.

13 May

Maria Ermolova (HSE University)

On the grammaticalization of -no/-to-forms in the history of the Polish language in comparison with Middle Russian

Abstract

I will examine the history of indefinite personal -no/-to-forms by analyzing two Polish texts from different centuries (the Bible of Queen Sophia from the mid-15th century and “Roczne dzieje kościelne” from the early 17th century). I will analyze contexts with past passive participles (PPP) in predicative position that describe actions of the preterite type, as well as examples with PPP in the subjunctive mood, i.e. the contexts in which contemporary Polish uses -no/-to-forms. Based on this analysis, the following conclusions can be drawn about the stages in the formation of the -no/-to-form. The first stage in the evolution of PPP was the loss of the copula that accompanied the participial form: PPP began to be used as a finite past-tense form (on zabit instead of on był zabit). At the same time, as it grammaticalized and lost its participial properties, PPP gradually lost agreement with the semantic object, with which it originally agreed, and became fixed in the neuter form. Losing its nominal properties and retaining exclusively verbal ones, PPP in the form of -no/-to ceased to agree with the semantic object and began to govern it (zabito go). The analyzed texts show that from the 15th century to the early 17th century, contexts like on zabit become less frequent as contexts like zabito go become more frequent. During this period, intransitive verbs also enter the paradigm of -no/-to-forms, since the former PPP loses the characteristics of passive participles, which could only be formed from transitive verbs.

6 May

Viktoria Zubkova, George Moroz, Chiara Naccarato (HSE University)

Phonological adaptation of Russian borrowings in Avar-Andic languages

Abstract

In this talk we will discuss processes of phonological adaptation of Russian borrowings in languages of the Andic branch of East Caucasian. Dictionary data from eight Andic languages are compared to data from Avar, the closest relative of Andic within the family and a major lingua franca in northern Daghestan. We will illustrate the process of data annotation, our qualitative analysis of the correspondences, and their modeling with a mixed-effects logistic regression. As we will show, modeling the probability of loanword adaptation yields a hierarchy of languages that is partially explained by each language’s history of direct contact with Russian and by the authorship of the dictionaries, but does not fully match geographic distances, phylogenetic classifications, or population sizes. Factors known to play a role in loanword adaptation, i.e., time depth and frequency of use, show the expected effects, but their predictive strength is not statistically significant.

29 April

George Moroz, Chiara Naccarato, Natalia Koshelyuk, Maya Artyukh (HSE University)

The DiaL2 project: progress and future plans

Abstract

In this talk, we will discuss the progress and future plans of the DiaL2 project, which aims to study linguistic variation in spoken corpora of bilingual and dialectal Russian. In particular, we will discuss the following topics: (1) non-standard negative existential constructions in L2 Russian; (2) preposition drop in the L2 Russian of Khanty and Mansi speakers; (3) non-standard word order in noun phrases with a genitive modifier in L2 Russian.

22 April

Asya Alekseeva (HSE University)

Verb system of Aguaruna

Abstract

Aguaruna is a language spoken in Peru and Ecuador, in the foothills of the Andes. It belongs to the Jivaroan language family, which consists of only four languages (Shuar, Achuar-Shiwiar, Huambisa, and Aguaruna). These languages are located between two areas, the languages of the Andes and the languages of the Amazon, and they exhibit properties similar to both. Aguaruna can be characterized as an agglutinative language with a high degree of fusion and complex morphophonology. In this talk, I will present work in progress that is part of a study of the verb systems of the Jivaroan languages. I will provide an overview of the Aguaruna verb system and discuss some of its peculiar features, such as the orientation of the verb system toward illocutionary force/modality and the discourse-oriented nature of the tense system.

15 April

Egor Kashkin, Irina Khomchenkova (Vinogradov Russian Language Institute of the Russian Academy of Sciences)

Russian in contact: projects of the Language Contact Group at the Vinogradov Russian Language Institute

Abstract

The presentation will outline the research on language contact which is carried out at the Vinogradov Russian Language Institute. We are interested in contact situations where Russian is either the source or the target language.

First, we will discuss the influence of Russian on neighbouring languages, particularly instances of code-switching and borrowings. We will elaborate on corpus-based case studies of Russian conjunctions, which interact with the local system of clause combining.

Second, we will present our projects on non-standard varieties of Russian (mainly those used by speakers of different Uralic languages, with some parallels drawn from the study of Russian speech in Kyrgyzstan and preliminary typological observations). In addition to evidence from participant observation, these projects draw on data from a corpus specifically designed to annotate contact-induced features. The main principles of the development of such a corpus will be summarized. Selected corpus-based case studies will be presented (e.g. prepositional phrases in the Russian variety used by Nganasan speakers).

8 April

Timofey Timkin (Institute of Philology, Siberian Branch of the Russian Academy of Sciences)

Studying vowel duration in the languages of the peoples of Siberia: new methods and materials

Abstract

Duration is one of the key features of vowels involved in the organization of the vowel system. However, high variability in speech rate, together with syllabic and rhythmic factors, makes it difficult to analyze this feature from a typological perspective. To account for these factors and to build a dynamic model of vowel duration, the Institute of Philology SB RAS is collecting phonetic data on Turkic and Finno-Ugric varieties of Siberia. The comprehensive approach involves experimental phonetic techniques as well as the collection of somatic data using ultrasound and MRI. The talk will present preliminary results.

1 April

Oxana Goncharova (Pyatigorsk State University)

Emotion Recognition in Bilingual Speech: A Comprehensive Deep Learning-Based Method

Abstract

This study explores emotion recognition in bilingual speech through a comparative analysis of machine learning (ML) and deep learning (DL) techniques. Initially, a hybrid framework was implemented, combining Mel-frequency cepstral coefficients (MFCCs) with prosodic features (e.g., pitch, intensity, speech rate) and conventional ML algorithms. While preliminary results were encouraging, the approach suffered from overfitting and limited robustness to minor data variations. To overcome these limitations, we propose a deep learning architecture that integrates a CNN-based autoencoder with an embedding network. Experimental evaluations demonstrate a significant enhancement in performance metrics compared to traditional methods, highlighting the potential of multimodal frameworks for emotion analysis in bilingual speech.

25 March

Eva Poliakova (HSE University)

Field notes on Khwarshi: the biabsolutive construction and information structure (PART 2)

Abstract

This talk will be dedicated to two topics that were the focus of my research during a field trip to study the Khwarshi language (Nakh-Daghestanian) in January of this year. Accordingly, the talk will be divided into two parts.

First, I will discuss the biabsolutive construction in Khwarshi. In this construction, both arguments of a transitive verb are marked with the absolutive case, and the verb form is restricted to the (periphrastic) progressive. I will discuss some of its properties, including ones that have not been discussed before (e.g. its behavior in embedded clauses). I will also show that some speakers allow the biabsolutive construction not only with a progressive form but also with a resultative one, though in this case some additional restrictions seem to hold.

Second, I will discuss some findings about information structure in Khwarshi. This topic was investigated mostly based on question-answer tasks and tasks involving picture description. I will show that different word orders can be used to mark focus in Khwarshi, including insertion of the focused constituent inside a periphrastic verb form and inversion of lexical verb and auxiliary.

18 March

Eva Poliakova (HSE University)

Field notes on Khwarshi: the biabsolutive construction and information structure

Abstract

11 March

Masha Volina (HSE University)

Demonstrative Pronouns in Khwarshi

Abstract

The Khwarshi language (Nakh-Daghestanian) has a rich system of demonstrative pronouns. Three series of demonstratives can be distinguished: žu — idu, o-CL-žu — a-CL-du and hobo-žu — hobo-du. Each series includes proximal and distal pronouns (which can be used both attributively and substantively), a demonstrative adjective with a meaning close to such and several adverbs. Pronouns from all three series can function deictically and anaphorically, although there is a ‘primarily anaphoric’ series žu — idu and a ‘primarily deictic’ series o-CL-žu — a-CL-du. Also, the paradigmatic structure of the Khwarshi demonstrative system is quite complex.

In this talk, mainly based on my fieldwork data, I will describe the morphological structure of Khwarshi demonstratives and the syntactic differences in their usage (mostly outlined in ‘A language for Guinness World Records: Fifteen (or more?) reflexive pronouns in Khwarshi’ by Yakov Testelets). I will also briefly discuss the spatial and discourse factors that influence the choice of a demonstrative, as well as my hypotheses about the differences in their semantics and, accordingly, their functions.

4 March

Timofey Mukhin (HSE University, University of Liège), Michael Daniel (University of Tübingen)

From space to anaphora: there and back again

Abstract

In this talk, we consider the anaphoric uses of demonstratives in Mehweb Dargwa. The main goal is to explore how the primary spatial deictic meaning of demonstratives is refracted in the textual (anaphoric) dimension.

We found that the choice of demonstratives cannot be fully explained in terms of discourse dimensions such as anaphoric distance (Givón 1983). In narrative uses of elevational demonstratives, the center relative to which the referent’s position is determined shifts compared to their deictic uses. In deictic uses of the elevational demonstratives, the deictic center is primarily associated with the speaker. In their anaphoric uses, the elevation value is calculated with respect to the most topical/activated referent.

We suggest that the deictic uses of demonstratives, defined by a spatial relation to the deictic center, do not fully convert into the textual dimension of anaphora when the same demonstratives are used in narratives. While this is easily seen with elevational demonstratives, the question remains whether the same factor is also present in elevation-neutral demonstratives, in Mehweb and cross-linguistically, for which researchers attempt to provide a full account in purely anaphoric terms.

25 February

Lena Mironova (HSE University), Yury Lander (HSE University), Shamset Unarokova (Adyghe State University)

One or two approaches to West Caucasian demonstratives

Abstract

In this talk, we discuss the demonstrative systems in (Temirgoi) West Circassian and (Ashkharawa) Abaza, which represent two branches of the West Caucasian family. Our data come from an experiment based on the questionnaire of Wilkins (2018), which helps establish the parameters that affect the choice of a demonstrative in its exophoric non-contrastive function. Both West Circassian and Abaza have tripartite demonstrative systems. Our data show the relevance of both the distance parameter and the speaker- or addressee-anchoring parameter, as well as of visibility, the presence of spatial boundaries, and the presence of gestures. However, there are significant differences between the systems of the two languages. We will present our experimental results, describe the meanings of each demonstrative, outline the structural differences between the systems, and suggest some generalizations. Finally, we will discuss two possible interpretations of our data: it remains to be determined whether these treatments are complementary or whether they should be strictly differentiated.

18 February

Maksim Stepanyants (HSE University)

An attempt at a comprehensive description of the Modern Eastern Armenian additive marker ēl

Abstract

Modern Eastern Armenian (MEA) discourse markers have generally been neglected in the typological literature. One of them, however, was included in the sample of Forker’s (2016) influential paper on the polysemy of additives, namely ēl (էլ). A closer look at the semantics and morphosyntactic properties of this marker reveals its broad polysemy, which can contribute to the theory of additive markers (cf. also Gast & van der Auwera 2013). Its diachronic development also needs to be addressed: it presents a case of divergent development of multiple specialized markers (with different morphosyntactic properties) from a conceivable common source, possibly under areal influence. The marker ēl stands out among MEA focus markers (cf. Giorgi & Haroutyunian 2016) due to its almost unique enclitic status. This talk will attempt to address all these issues in a holistic, typologically anchored approach.

11 February

Masha Krivolap, Maksim Melenchenko (HSE University)

Predicting Shughni gender with machine learning

Abstract

Our study investigates the influence of various factors on gender assignment in the Shughni language (Eastern Iranian) using machine learning. We trained several models to predict grammatical gender (feminine or masculine) on a dataset of 2,390 nouns from the Shughni-Russian dictionary. For training, we used both semantic features (semantic classes and vectorized Russian definitions) and formal features (word endings and the last vowel of the stem). Our results show that semantics plays a primary role in gender assignment in Shughni: the proposed semantic features correctly predict the gender of ≈80% of nouns in our sample. Formal features seem less significant and correctly predict the gender of only ≈70% of nouns in the dataset. The two types of gender predictors are highly correlated (especially for feminine gender), so combining them does not yield significantly better results.

4 February

Ivan Olkhov (HSE University)

Gender agreement slots in East Caucasian verbs: An areal-typological study and a case study of Andic

Abstract

In this talk I will discuss the findings of two studies of gender agreement on verbs in East Caucasian languages. In most languages of this family, gender agreement on verbs is sporadic: some verb lexemes have an agreement slot in the root, while others do not. The first part of my talk will focus on a typological investigation of sporadic agreement on verbs across the East Caucasian family, conducted for a chapter of the Typological Atlas of the Languages of Daghestan. I will provide the number of agreeing lexemes for those languages where such data are available. These numbers vary significantly: some languages, like Rutul and Tsakhur, have agreement on all verbs, while others, like Agul and Lezgian, have none. Additionally, I will explore the possible positions of non-root agreement slots. The second part of my presentation is a case study of the Andic branch. I will examine verbs with the same meanings in the languages of the sample; for each meaning, I check in how many languages it is expressed by agreeing verbs and in how many languages these verbs are cognate. This analysis allows us to draw conclusions about how verbal agreement slots have been preserved in Andic.

28 January

Alexander Letuchiy (HSE University)

Abaza masdars: what regulates the choice of marking?

Abstract

In this talk, I focus on the types and properties of masdars (nominalizations) in Abaza (a language of the West Caucasian family spoken in Russia). A special feature of Abaza is that it has a single masdar marker (the suffix -ra); however, masdars themselves fall into several types based on person marking. Masdars can inherit the argument marking of the verb (polypersonal agreement with the A and DO of transitive verbs, as well as with the S and IO of intransitive verbs), show possessive agreement with the argument of the masdar, take the definiteness marker a-, or remain unmarked in the prefixal part. In this sense, the Abaza system of masdars is both rich and poor.

Each type of masdar marking is in a sense a separate complementation strategy. The four strategies cannot be freely combined with any matrix verb; they are chosen according to the semantics of the matrix verb (especially reality- and modality-related properties) and the syntactic properties of the construction. Although Abaza has no canonical control structures, some features of masdar constructions are reminiscent of control/restructuring phenomena.

The existence of several masdar types is compatible with the fact that nominalizations, including masdars in Caucasian languages, occupy an intermediate place in the system: on the one hand, they denote a situation and inherit many verbal properties; on the other hand, they acquire some nominal properties. However, very often, as in English or Arabic, it is the syntactic construction with a nominalization that shows similarities with verbal or nominal constructions. In Abaza, this intermediate nature of nominalization is manifested in morphology.

The data considered in the talk were collected during fieldwork organized by HSE University in 2024.

21 January

Elena Shvedova (HSE University)

Lability drift in Neo-Aramaic languages

Abstract

In this talk, I examine labile verbs in Neo-Aramaic languages (< Semitic), focusing on diachrony and semantics. Labile verbs, which can be used both transitively and intransitively without morphological change, are widespread in Modern Aramaic languages, in contrast to earlier Aramaic varieties, where anticausative or causative marking was more prevalent. The verbal system of Christian Urmi (< North-Eastern Neo-Aramaic) illustrates this expansion of lability: of the 1,811 verbs I analyzed from Khan’s (2016) dictionary, at least 172 are labile.

Neo-Aramaic languages can be divided into two main genealogical groups: Eastern and Western Aramaic, which separated during the first millennium BC. In my study I use data from both branches. I categorize Neo-Aramaic labile verbs into three groups based on their historical development: (1) verbs such as ‘freeze’, ‘fill’, and ‘begin’, which retain lability from earlier stages of Aramaic; (2) verbs such as ‘open’, ‘break’, and ‘close’, which transitioned from anticausative marking in Middle Aramaic to lability in Modern Aramaic, reflecting parallel development in Eastern and Western varieties; and (3) verbs unique to Modern Western Aramaic (MWA), including ‘boil’, ‘dry’, and ‘wake up’. In other Middle and Modern Aramaic languages the meanings from the third group are expressed by causatively marked pairs, so the lability of these verbs in MWA represents a morphological innovation.

I will also propose some explanations for the lability drift in Neo-Aramaic languages of different branches, such as the phonetic loss of the anticausative marker, the expansion of verbs with four root consonants that cannot be causativized, and possible areal factors. As the study is still a work in progress, I would also like to discuss future plans, including the study of corpus data from historical texts and modern corpora to trace the development of labile verbs in more detail.

Seminar schedule 2024

17 December

Rita Popova (Saarland University)

Where have all the humans gone? Gender assignment of human nouns in Bantu

Abstract

The Bantu languages (Atlantic-Congo), a group of 400–500 varieties, are spoken in the southern part of the African continent, from Nigeria and Cameroon in the west to the Kenyan coast in the east and South Africa in the south. These languages are known for their grammatical gender (or noun class) systems, in which nouns are categorized into as many as 19 classes that govern agreement on verbs, nominal modifiers, and other targets (Maho 1999). Unlike the gender systems of Indo-European languages, Bantu noun classes are not based on the sex distinction. Instead, the primary semantic contrast in Bantu gender systems lies between humans and non-humans. In a typical Bantu gender system, most nouns referring to humans are assigned to a single ‘human’ gender value (traditionally labelled Gender 1/2 in Bantuist notation). In contrast, non-human nouns are distributed across several other gender values, often according to highly opaque principles (Corbett 1991, Katamba 2003). Occasionally, nouns denoting humans with unusual characteristics are found in gender values other than 1/2 (Van de Velde 2019). However, the gender assignment of human nouns has not been systematically investigated, and most of the widely accepted generalizations derive from observations of a few well-studied Bantu languages. In this talk, I will demonstrate that the gender assignment of human nouns is a parameter of intra-Bantu variation. My study is based on more than 30 Bantu lexicons available in the RefLex database (Segerer & Flavier 2011–2023). I will show that while some Bantu languages assign most human nouns to Gender 1/2, others have a significant number of human nouns in gender values other than 1/2. In fact, some languages seem to assign most human nouns outside 1/2, ‘scattering’ them across other gender values.
Languages of my sample that exhibit this latter pattern come from the North-Western Bantu region, a zone traditionally recognized as the most diverse within the otherwise relatively homogeneous Bantu-speaking world (Nurse & Philippson 2003, p. 165). I will argue that systems in which human nouns are dispersed over different gender values challenge the traditional typological account of nominal classification. According to this view, human nouns, being the semantic core of any nominal classification system, are expected to consistently follow transparent semantic rules of gender assignment (Corbett 1991).

10 December

Martin Haspelmath (Max Planck Institute for Evolutionary Anthropology)

Language parameters and construction parameters in the CrossGram database collection

Abstract

Replicability of research results minimally relies on data accessibility, but the data should ideally be FAIR: Findable, Accessible, Interoperable, and Reusable. For technical interoperability, the CLDF standard (Forkel et al. 2018) could be used by typologists, though uptake seems to have been slow so far. In this presentation, I discuss the design of the CrossGram database collection, which is tailored to typological datasets (it has been public since the summer of 2024: https://crossgram.clld.org/). Here I describe how CrossGram enhances findability and reusability, and I highlight the two different data types that it supports (language parameters and construction parameters). CrossGram makes typological data more findable in that it “brings to light” what is often “hidden away” in supplementary spreadsheet files (or even tables in PDF files, though this is becoming rare). Research papers typically limit themselves to summary tables or graphs and a few small maps, but ideally we want to access all typological datasets with the amenities known from CLLD applications such as WALS Online (Dryer & Haspelmath 2013, wals.info) or Grambank (Skirgård et al. 2023, grambank.clld.org). These provide not only easy exporting in CLDF format, but also easy searching and sorting in data tables, as well as map visualization and links to references and Glottolog language information. In addition, CrossGram provides glossed example sentences in tabular form, similar to the thousands of examples in the APiCS database (Michaelis et al. 2013). These are a particularly striking case of increased transparency, because example sentences are not uncommonly hidden in PDF supplements (for example, Bugaeva 2022 has a supplement of 80 pages of annotated examples). Interlinear glossed text has a range of applications even independently of the typological claims that the examples illustrate, so this is another obvious improvement in reusability.
CrossGram supports two types of typological data: Language parameters that classify entire languages (i.e. parameters of the type known from the maps of WALS and Grambank), and construction parameters that classify constructions. There are many grammatical meanings or functions that can be rendered by multiple constructions in a given language, and if we only consider language parameters, the language must be classified as “mixed” (or a minor construction must be ignored). For example, Kashmiri has both correlative relative clauses and postnominal relative clauses, so both of these strategies could be included in the database and their properties recorded. Stereotypically, typology consists in classifying languages into types, but in reality, languages often have multiple types coexisting with each other, so the addition of constructions and construction types as a data type makes typological databases more fine-grained. Finally, CrossGram parameters (both language parameters and construction parameters) are not only explained succinctly and clearly, but there is also a sophisticated keyword annotation that allows users to easily find grammatical information on a wide range of topics, and for the future, integration with the envisaged “Grammaticon” reference catalogue is planned (see Haspelmath 2022). This will enhance findability and accessibility even further, and it will facilitate replication and (more generally) cumulative science.

3 December

Daria Ryzhova (HSE University), Polina Padalka (HSE University)

YES and NO answers and their synonyms in Shughni

Abstract

Shughni response particles ůn ‘yes’ and nāy ‘no’ may be used in various contexts, including, besides answers to polarity questions, reactions to requests, suggestions, opinions and other types of speech acts. In addition to their usage in a dialogue setting, these particles can function as discourse markers in narratives. In this talk, we will outline the range of their functions and present their synonyms: other response particles (e.g. en ‘yes’, an ‘yes’, nāyo ‘no’, na-a ‘no’) and multiword expressions (discourse formulae). We will show that synonymous items tend to distribute across different discourse functions. For some items, we will trace their presumable pragmaticalization paths.

26 November

Leah Finkelberg (HSE University)

Reading group: Cormac Anderson et al. (2023) Variation in phoneme inventories: quantifying the problem and improving comparability

Abstract

For over a century, the phoneme has played a central role in linguistic research. In recent years, collections of phoneme inventories, originally designed for cross-linguistic purposes, have increasingly been used in comparative studies involving neighbouring disciplines. Despite the extended application of this type of data, there has been no research into its comparability or tests of its reliability. In this study, we carry out a systematic comparison of nine popular phoneme inventory collections. We render them comparable by linking them to standardised formats for the handling of cross-linguistic datasets, develop new measures to test both size and similarity, and release the organised data in supplementary material. We find considerable differences in inventories supposedly representing the same language variety, both in terms of size and transcriptional choices. While some of these differences appear to be predictable, reflecting design decisions in the different collections, much of the observed variation is unsystematic. These results should sound a note of caution for comparative studies based on phoneme inventories, which we suggest need to take the question of comparability more seriously. We make a number of proposals for improving the comparability of phoneme inventories.

19 November

Nina Dobrushina (Laboratoire Dynamique du Langage, CNRS, Lyon), Chiara Naccarato (HSE University) and Samira Verhees (independent researcher)

Discourse in contact: an areal study of wish formulas in Daghestan

Abstract

Research in the domain of language contact and areal studies has so far focused on the diffusion of lexical, phonological and grammatical elements and, to a lesser extent, on lexical semantics (Koptjevskaja-Tamm et al. 2022). Much less is known about the spread of discourse forms, although the mechanisms through which discourse units spread are presumably different from those characterizing the lexicon, phonology and grammar. In this talk, we will look at the diffusion of three discourse formulas across the East Caucasus (46 languages from four families), namely commemorative formulas, farewell wishes and morning greetings, using elicitation, grammars and dictionaries as sources. We look at instances of pattern and matter copying across the languages of the East Caucasus and analyze their areal distribution. The case studies show that the formulas are diverse (there is great variation across the area) and that both matter and pattern copying are abundant. Some formulas cover very large areas and cross genealogical boundaries. In all cases, the same spread zones were attested, influenced by the major lingua francas Avar (in Central Daghestan) and Azerbaijani (in South Daghestan). The distribution of discourse formulas thus shows a very strong areal signal; we cannot point to any phonological or grammatical phenomena that are spread across the East Caucasus to the same extent. However, there is also evidence of inheritance, especially for languages spoken outside their genealogical area.

12 November

Alina Russkikh (HSE University)

Adnominal Possessive Constructions in Christian Urmi (Neo-Aramaic) from a Typological Perspective

Abstract

The Northeastern Neo-Aramaic (NENA) varieties are a group of dialects or closely related languages of the Semitic branch of the Afro-Asiatic language family. This study aims to establish the inventory of adnominal possessive constructions in a particular variety of NENA, Christian Urmi, and to describe the formal and syntactic properties of each construction from a typological perspective. Data were collected in the village of Urmiya, Krasnodar Region (Russia) and in Verin Dvin, Ararat Province (Armenia) during field trips in 2021, 2022 and 2024. The presentation will also focus on the locus of marking in adnominal possessive constructions and its evolution. Based on the applied syntactic tests, I will propose a new interpretation of the basic construction with the particle ət attached to the head, analyzing its type of marking as detached rather than head marking.

5 November

Konstantin Filatov (HSE University)

Reading group: Alexandre François (2014) Trees, waves and linkages: Models of language diversification

Abstract

Contrary to widespread belief, there is no reason to think that language diversification typically follows a tree-like pattern, consisting of a nested series of neat splits. Except for the odd case of language isolation or swift migration and dispersal, the normal situation is for language change to involve multiple events of diffusion across mutually intelligible idiolects in a network, typically distributed into conflicting isoglosses. Insofar as these events of language-internal diffusion are later reflected in descendant languages, the sort of language family they define, a “linkage” (Ross 1988), is one in which genealogical relations cannot be represented by a tree, but only by a diagram in which subgroups intersect. Non-cladistic models are needed to represent language genealogy, in ways that take into account the common case of linkages and intersecting subgroups. This paper will focus on an approach that combines the precision of the Comparative Method with the realism of the Wave Model. This method, labeled Historical Glottometry, identifies genealogical subgroups in a linkage situation, and assesses their relative strengths based on the distribution of innovations among modern languages. Provided it is applied with the rigour inherent to the Comparative Method, Historical Glottometry should help unravel the genealogical structures of the world’s language families, by acknowledging the role played by linguistic convergence and diffusion in the historical processes of language diversification.

29 October

Polina Nasledskova (HSE University)

Typology of ways of expressing ordinal meaning: work in progress

Abstract

Languages of the world vary with respect to the way they form ordinal numerals: some languages form them using a specialized ordinal marker, some languages form ordinals with a marker that is not specialized, and some languages lack ordinal numerals altogether. In this talk, I am going to present a classification of ordinal markers and other ways of expressing ordinal meaning based on a sample of 100 languages. As this is a work in progress, I am also going to discuss some challenges I am facing while working on this topic.

22 October

Alexander Rostovtsev-Popiel (Mainz University)

Suppletion and Selectional Restrictions in the Kartvelian Verb

Abstract

This talk is about suppletion and selectional restrictions in the Kartvelian verb. This phenomenon, albeit well known and widely acknowledged, has never been the subject of a dedicated study. Suppletion in Kartvelian has a number of facets distributed among diverse domains of linguistic structure, viz. morphology, morphosyntax, semantics, and social deixis, as well as intersections thereof. This talk thus aims to provide a concise overview of the patterns found in Kartvelian and to categorize them in a scalar format.

15 October

Akhmed Dugrichilov, Taisia Trenikhina, Maksim Melenchenko (HSE University)

The typological database of vigesimal numeral systems

Abstract

In our talk, we present the typological database of vigesimal (base-‘20’) numeral systems in languages of the world, which is currently in development. First, we discuss some theoretical problems of numeral systems, the solutions implemented in existing typological databases (WALS and Grambank) and their shortcomings. Then we describe the details of our approach and the preliminary results of the project. We have created a sample of 256 languages that are claimed to have vigesimal systems by Grambank or WALS and have annotated 73 so far, focusing on two linguistic areas with a high concentration of base-‘20’ systems: Mesoamerica and Papunesia. We show that the distribution of types of numeral systems differs significantly from that presented by the Grambank data. This is caused not only by annotation mistakes in Grambank but also by the stricter methodology applied in our study.

8 October

Igor’ Marchenko (University of Groningen) and Roman Ron’ko (HSE University; Vinogradov Russian Language Institute, RAS)

Database of the Dialectological Atlas of the Russian language and the classification of Russian dialects

Abstract

In this talk, we will provide an overview of the key functions of the Database of the Dialectological Atlas of the Russian Language. We will also present two case studies that use this resource. The first assesses the stability of dialects using the database alongside dialect corpora, focusing on the dialect vocabulary of a set of villages in the Zapadnodvinsky district of the Tver region. The second is devoted to the classification of Russian dialects using dialectometric methods, specifically multidimensional scaling (MDS). This study draws on the data of the Dialectological Atlas of the Russian Language in its entirety, offering four classifications based on individual linguistic levels (morphology, phonetics, syntax, and lexis) as well as a comprehensive classification that takes into account all linguistic features reflected in the atlas. The primary focus of the study will be on identifying the differences between eastern and western Russian dialects on the basis of the database materials and on offering a historical interpretation of this dialect division.

1 October

Yury Lander (HSE University)

Differential argument (un)marking: A new survey of alignment in West Circassian

Abstract

West Circassian, also erroneously known as Adyghe, is usually characterized in literature as showing ergative alignment in morphology and possibly even in syntax (Gishev 1985; Kumakhov et al. 1996; Kumakhov & Vamling 2006; Kumakhov & Vamling 2009; Lander 2010; Letuchiy 2012; Ershova 2019 inter alia). In this talk I will show that if we consider differential argument marking (see, e.g., Arkadiev & Testelets 2019) and certain syntactic properties of nominals (such as those discussed in Arkadiev et al. 2009; Lander et al. 2021), the actual situation turns out to be much more complex. No novel data will be provided for scholars of Circassian languages, but I am going to discuss various kinds of pressure in the West Circassian alignment system and the related issues (including the distinction between the “canonical” differential object marking and incorporating processes and the alignment preferences displayed by different kinds of nominals).

24 September

Irina Politova (HSE University)

Reading group: Peter W. Smith et al. (2019) Case and number suppletion in pronouns

Abstract

Suppletion for case and number in pronominal paradigms shows robust patterns across a large, cross-linguistic survey. These patterns are largely, but not entirely, parallel to patterns described in Bobaljik (2012) for suppletion for adjectival degree. Like adjectival degree suppletion along the dimension positive < comparative < superlative, if some element undergoes suppletion for a category X, that element will also undergo suppletion for any category more marked than X on independently established markedness hierarchies for case and number. We argue that the structural account of adjectival suppletive patterns in Bobaljik (2012) extends to pronominal suppletion, on the assumption that case (Caha 2009) and number (Harbour 2011) hierarchies are structurally encoded. In the course of the investigation, we provide evidence against the common view that suppletion obeys a condition of structural (Bobaljik 2012) and/or linear (Embick 2010) adjacency (cf. Merchant 2015; Moskal and Smith 2016), and argue that the full range of facts requires instead a domain-based approach to locality (cf. Moskal 2015b). In the realm of number, suppletion of pronouns behaves as expected, but a handful of examples for suppletion in nouns show a pattern that is initially unexpected, but which is, however, consistent with the overall view if the Number head is also internally structurally complex. Moreover, variation in suppletive patterns for number converges with independent evidence for variation in the internal complexity and markedness of number across languages.

17 September

George Moroz (HSE University), Olga Gich (FEFU), Anna Grishanova (HSE University), Natalia Koshelyuk (HSE University), Chiara Naccarato (HSE University), Anna Panova (HSE University), Anastasia Yakovleva (HSE University), Svetlana Zemicheva (HSE University)

The DiaL2 project: pipeline, results, news and future work

Abstract

There are 24 dialectal and 8 bilingual corpora of Russian at the Linguistic Convergence Laboratory (see the resources page), and more are coming. The DiaL2 project was launched two years ago with the aim of studying the linguistic variation found in these corpora. We applied the UDPipe morphological and syntactic parser, manually annotated a set of linguistic features (sometimes re-listening to the recordings to check the transcriptions), and implemented statistical models for each feature that predict the probability of divergence from Standard Russian. During the talk we will discuss our results for several features:

non-standard marking in numeral constructions (dva dom [two.M house.SG] ‘two houses’);
preposition drop (rodilas’ [v] tridcat’ devjatom godu ‘(she) was born (in) nineteen thirty-nine’);
non-standard marking in negative existential constructions (ranše sadiki ne byli ‘there were no kindergartens before.’).

As possible predictors in the models, we used sociolinguistic features (gender, year of birth, years of education), measures of collocationality, and some relevant linguistic features. In the course of the work, we discovered many typos and inconsistent or wrong transcriptions, and corrected many of them. We therefore started a parallel project dedicated to the automatic correction of the Lab’s corpora, which will also be discussed during the talk.

25 June

Peter Arkadiev (Freiburg Institute for Advanced Studies)

Towards a typology of passive lability with special reference to Abaza

Abstract

Uncoded passive alternations, also known as passive lability, are only rarely mentioned and discussed in the typological and theoretical literature on passive and voice, despite their being prominent in some language families (e.g. Mande) and linguistic areas (e.g. Western Africa). In this talk I start by discussing the peculiar objective resultative construction in Abaza, a polysynthetic Northwest Caucasian language, and investigate the degree of its similarity to the cross-linguistic prototype of the passive. Given that the Abaza resultative is morphologically unmarked, I argue that it can be considered an instance of passive lability. Further, I propose a preliminary typology of uncoded passives on the basis of a small convenience sample of examples I could gather from the literature. I’ll try to show that this phenomenon is somewhat more widespread than is usually believed and that its cross-linguistic variation largely fits within the typology of “canonical” morphologically marked passives and complements it.

18 June

Ilia Afanasev (HSE University/MTS AI)

A new method for genetic language distance measurement between closely related lects

Abstract

Measuring the distance between language varieties (lects) generally relies on extensive linguistic research, including the collection of wordlists and information on the evolution of the phonetic system (Campbell 2013). Sometimes, however, gathering this kind of data seems impossible due to a lack of material, when all that researchers have is a small sample of surviving texts. This is most often the case with small historical territorial varieties. Such a situation rules out any reliable automatic classification, yet still allows a preliminary one. The talk proposes a new method for measuring the distance between small, closely related historical lects, based on a combination of frequency-based methods and string similarity measures, and introduces a corpus-based string similarity measure intended to imitate more advanced phonetics-based scores. The method is evaluated on modern and historical Slavic lects, including the Slovak, Slovenian and Croatian standards, the Belogornoje, Megra and Zialionka dialects, and Novgorod, Smolensk and Polack legal texts of the 12th–14th centuries. The key technique is cross-evaluation against more traditional dialectometric methods, where possible. A Python implementation of the methods is available as a package.

11 June

Alena Muravyova (HSE University)

Syllable structure in Andic languages: data-driven approach

Abstract

In this talk I will present the results of a data-driven study of syllable structure in the Andic languages. Although syllable structure is described in every grammar of the Andic languages, in my work I try to arrive at the same results using a database of dictionaries (Moroz et al. 2023). To these data, I applied the LexStat method (List 2012) to identify cognates automatically, and then used a Levenshtein distance search function to compare the syllable structures of cognates. A pairwise comparison of cognates from different languages revealed a general tendency of the Andic languages to favor the CV syllable. Moreover, I arrived at specific conclusions about the beginning of the word (strict openness of the first syllable in Ahvakh, high tolerance of closed syllables in Botlikh and Andi) and about the end of the word (mainly open syllables in Ahvakh, closed syllables in Botlikh, Chamalal and Bagvalal). In addition, I separately examined the processes of reduction of the consonants r, l, m, n, j, w, b in the Andic languages. These results are of interest because they testify to historical processes in the Andic languages.

4 June

Dmitry Ganenkov (Leibniz-Zentrum Allgemeine Sprachwissenschaft)

Causative in Dargwa infinitival constructions

Abstract

In this talk, I report on a work-in-progress concerning the behavior of the morphological causative inside infinitival (obligatory control) constructions in Dargwa, as shown in (1).

(1)  nab          [ ħe-zi       ʡinc-bi            d-iʡ-aq-es ]                 dig-ul-ra
       I(dat)      2sg-loc   apple-pl(abs)   n.pl-steal:pf-caus-inf  want-dur-1

‘I want you to steal apples.’

Judging by its form, example (1) might be expected to mean ‘I want to make you steal apples.’ However, the causative morphology in the embedded clause is not reflected in the semantics: the sentence in (1) cannot be understood as expressing a wish to cause the stealing. Instead, as the translation of (1) shows, the embedded clause is non-controlled, with the locative “causee” understood as the agent of the embedded event. I present an overview of the phenomenon across Dargwa and concentrate on the details of the construction in Standard/Aqusha Dargwa.

21 May

Maria Kyuseva (University of Surrey), Daria Ryzhova, Ekaterina Rakhilina, Tatiana Reznikova (HSE University)

Parts of the body: New insights into cross-linguistic variation

Abstract

We discuss cross-linguistic variation in the use of body part terms within the frame-based approach to lexical typology (Rakhilina & Reznikova 2016). We contribute to the extensive body of research on this topic with a detailed analysis of one type of context in which body part terms appear, namely non-semiotic bodily movements, such as ‘to cross the hands behind the back’ or ‘to cover the face with the hands’. We find that different languages use different body part terms to describe the same bodily movement. The choice of term depends on a range of conditions, including the inventory of available terms, the choice of linguistic construction, and the degree of conventionalisation. We take this as evidence for an additional aspect of the meaning of a body part term, one largely ignored in previous typological literature: the set of semantic restrictions on the constructions in which the term can appear. We argue that this aspect must be addressed to achieve an exhaustive cross-linguistic description of this semantic domain.

14 May

Maria Ermolova (HSE University)

On the genesis of the anti-resultative meaning of the pluperfect in the history of Russian

Abstract

As is well known, Russian constructions with the participle было (пошел было, но вернулся ‘set off, but came back’) go back to the Old Russian pluperfect with anti-resultative meaning. Expressing anti-resultative meaning is one of the main secondary functions of the pluperfect in the world’s languages. Drawing on typological data, V. A. Plungian and D. V. Sitchinava link the emergence of the anti-resultative to various pluperfect meanings from the semantic zone of the discontinuous past. However, data from the Russian chronicles, as well as from modern dialects, show that in the history of Russian the anti-resultative meaning could develop not from the discontinuous past meaning but from the resultative meaning.

23 April

Anna Golovina and Ksenia Dunaeva (HSE University)

Using transducers to create morphological parsers and other NLP tools for Nakh-Daghestanian languages

Abstract

Our talk is dedicated to creating morphological parsers for low-resource Nakh-Daghestanian languages. Morphological parsers can be built either on a set of grammatical rules of the language or on probabilistic models such as neural networks. The latter are not suitable for languages with only small collections of annotated texts. Rule-based parsers are built on finite-state transducers. A finite-state transducer can be defined as a finite-state automaton with two tapes, an input tape and an output tape. Whereas an ordinary finite-state automaton can only determine whether a given string belongs to the regular language it describes, a transducer maps between two sets of symbols: input symbols and output symbols. In a parser, the transducer maps a surface word form onto a string containing its morphological analysis. Building a two-level rule-based parser requires combining at least two finite-state transducers: one that stores the lexicon and models morphotactics, and another that implements morphophonological rules. In recent years, transducer-based morphological parsers have been implemented for a wide range of East Caucasian languages, including Tsez (Wilson & Howell 2022), Andi Proper (Buntiakova 2023), Zilo Andi (Moroz 2022), Bagvalal (Ignatiev 2022), and others. Our talk will focus on building parsers for Avar and Bezhta Proper. We will discuss in detail the tools that can be used to create a morphological transducer, the difficulties one may encounter when computationally modeling the morphology and morphophonology of Nakh-Daghestanian languages, the projects already under way at the Higher School of Economics, and future prospects for rule-based morphological parsers.
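
To make the notion of a transducer concrete, here is a toy sketch in plain Python: each arc reads one surface character and writes a (possibly empty) piece of the analysis, and final states may append a closing tag. It covers only the lexicon/morphotactics component, not the morphophonological rule transducer it would be combined with. The stem tum, the suffix -al and the tags +N, +SG, +PL form a hypothetical toy lexicon, not Avar or Bezhta data; real parsers are usually compiled with dedicated toolkits such as HFST or foma rather than written by hand.

```python
# Hypothetical toy lexicon: stem "tum" (noun), plural suffix "-al".
# Arcs: (state, input character) -> list of (next state, output string).
TRANSITIONS: dict[tuple[int, str], list[tuple[int, str]]] = {
    (0, "t"): [(1, "t")],
    (1, "u"): [(2, "u")],
    (2, "m"): [(3, "m")],       # end of stem
    (3, "a"): [(4, "")],        # start of plural suffix, nothing emitted yet
    (4, "l"): [(5, "+N+PL")],   # suffix complete: emit the analysis tags
}
START = 0
# Final states and the output they append when the input ends there.
FINALS: dict[int, str] = {3: "+N+SG", 5: ""}


def analyze(word: str) -> list[str]:
    """Return every analysis the transducer assigns to a surface form (empty list = rejected)."""
    configs = [(START, "")]  # (current state, output written so far)
    for ch in word:
        configs = [
            (nxt, out + emitted)
            for state, out in configs
            for nxt, emitted in TRANSITIONS.get((state, ch), [])
        ]
        if not configs:
            return []
    return [out + FINALS[state] for state, out in configs if state in FINALS]


for form in ["tum", "tumal", "tuma"]:
    print(form, analyze(form))
```

Because arcs are stored as lists, the same code can handle nondeterminism (several arcs for one input character), which is how an ambiguous word form would receive more than one analysis.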

16 April

Konstantin Filatov (HSE University)

Reading group: Rik van Gijn & Max Wahlström (2023) Linguistic areas

Abstract

Linguistic area research has received ample attention in the last century. Nevertheless, methodology remains somewhat underdeveloped, and there seem to be few, if any, generalizations about the relation between the processes underlying area formation and their outcomes. The main challenge is that, in most cases, the past is not directly accessible and therefore has to be reconstructed. Linguistic area research, therefore, stands to gain immensely from a firm embedding into a framework that includes both other strands of contact linguistics and extra-linguistic disciplines to complete the picture.

9 April

Rita Popova (HSE University) and Michael Daniel (Collegium de Lyon / Laboratoire Dynamique du Langage)

Number-conditioned stem alternation in adjectives of size (East Caucasian supplement)

Abstract

In our previous talk at the Lab, we showed that, across the world, property words denoting size may use different stems depending on the number of the entities they are predicated of; and that, in a cross-linguistic perspective, they do so visibly more often than other property words (Popova and Daniel in prep.; see also Nurmio 2017). This pattern of stem alternation for ‘small’ has also been observed in East Caucasian languages (Yakovlev 1960 on Chechen, Kibrik & Kodzasov 1988 on Lak, Azaev 2000 on Botlikh, Kibrik 1999 on Tsakhur, and arguably Nichols 2011 on Ingush).

Notably, the phenomenon has been reported in areally separated (south, central, northwest) languages belonging to different branches (Lezgic, Lak, Andic and Nakh), but the lexical items involved do not always seem to be cognate. It thus seems to be another case of a cross-linguistically rare phenomenon that emerged through parallel independent development (cf. Daniel & Maisak 2014 on the verificative, Nasledskova & Netkachev, under review, on ordinal numerals, and Daniel 2017 on “person by other means”). Such phenomena are notoriously challenging for an evolutionary interpretation.

It is not obvious that we are dealing with suppletion in number here. In the languages under study, adjectives do not necessarily show number agreement, so the two forms may not belong to one inflectional paradigm. Instead, we will call this pattern number-driven dislexification (cf. François 2022), in the sense that two meanings, ‘small (of one)’ and ‘small (of many)’, commonly expressed by the same lexical item, are split between two different lexical items (cf. a similar approach to verbal number in Durie 1986, Mithun 1988, François 2019). Strikingly, while the phenomenon is attested in individual languages dispersed across different branches, and thus could be suspected of being inherited, the cognacy of the alternating stems is not obvious. In some of these languages, ‘small (of many)’ is recruited from ‘fine-grained (of e.g. sand)’ (a component structure adjective in terms of Maiden 2014 and Nurmio 2017, who discuss the same path of emergence of number suppletion). We hypothesized that number dislexification emerged through gradual lexical extension of ‘fine-grained’ from more to less mass-like nouns. To test this hypothesis, we ran an online elicitation test, collecting data from all but two languages of the family, in most cases from several respondents. We investigated lexical preferences for expressing the meaning ‘small (of many)’, expecting to find cases intermediate between languages that apply ‘fine-grained’ only to masses and languages that apply it to all plural nouns. We used similar number contexts for ‘big’ as fillers.

Our expectation was partially borne out. We also obtained some less expected results. We noticed that number dislexification is more widespread than reported in the literature in terms of the languages involved. We noticed that, in some languages, the dislexified adjective for ‘small (of many)’ and the adjective for ‘fine-grained’ may be unrelated, at least synchronically. We noticed that, notwithstanding the plural reference of the noun, ‘small (of one)’ may be triggered by the singular form in NPs modified by a numeral, a switch more expected under a grammatical (suppletion) than under a lexical (dislexification) view of the phenomenon. Finally, we discovered the same phenomenon, albeit more rarely, in adjectives for ‘big’, our intended fillers.

We expected that a more nuanced model distinguishing between (a) the absence of any lexical extension of ‘fine-grained’, (b) such an extension as a preference, and (c) strict dislexification, complemented by a historical analysis of the cognacy of the items, could provide a more plausible explanation of the distribution of number dislexification across the family in genealogical or contact terms. So far, our results are not conclusive in this respect.

2 April

Matthew Sung and Jelena Prokić (Leiden University)

Recent Developments in Quantitative Approaches to Linguistic Micro-Variation

Abstract

Dialectometry is the quantitative branch of dialectology that uses computational methods to calculate linguistic distances and to generate visualizations that allow us to explore relationships between dialects. Although dialectometry is a growing field with an increasing number of new approaches, some areas of dialectal variation remain largely unexplored. First, dialectometry is popular in Europe but much less so in other parts of the world, so it is unclear whether findings on dialectal variation in Europe (e.g. the existence of a dialect continuum, the specific dynamics of dialect spread) also hold elsewhere. Second, most work on phonetic variation is based on segments, while most of the world’s languages are tonal (Yip 2002), and it is unclear how dialects vary at the tonal level. Finally, the outcome of a dialectometric analysis is usually a classification of dialects, but the features that drive this classification, i.e. dialect features exclusive to certain groups, are usually not explored. In this talk, we address these issues on the basis of our latest work in Leiden.

26 March

Irene Gorbunova (Russian State University for the Humanities)

Nominal Causal Constructions in Khwarshi Proper

Abstract

In this talk I will address the various ways of marking nominal cause in Khwarshi Proper (a dialect of Khwarshi < Tsezic < East Caucasian). The data were collected in Daghestan during several field trips in 2022-2023; the research was based on (but not limited to) the NoCaCoDa project questionnaire. Grammatical descriptions of Khwarshi mention a causal case, which seems to be unique to Khwarshi among the (West) Tsezic languages. Even more peculiar is the fact that the causal case in Khwarshi, although attested, is not the default means of marking nominal cause: instead, a dedicated postposition or spatial case forms are used. Furthermore, both the dedicated postposition and the spatial case most commonly used for cause marking show unexpected semantic shifts with certain predicates.

19 March

Nasib Amirkhan-ogly Iskandarov (Junior Researcher, Research Centre for Medical Genetics, ФГБНУ “МГНЦ”)

Origins of the gene pool of the peoples of the Eastern Caucasus: the contribution of the Bronze Age autochthonous population and of migrations from Western Asia, based on Y-chromosome data

Abstract

The gene pool of the Eastern Caucasus, including the peoples of Azerbaijan and Daghestan, has been systematically characterized using a single panel of 83 Y-chromosome SNP markers, in the context of the populations of the surrounding regions. An analysis of genetic distances between 18 populations (N=2216) of the Nakh-Daghestanian, Altaic and Indo-European language families reveals three components of the gene pool, the “Steppe”, “Iranian” and “Daghestanian” components, which differ in the weight of their contribution and in the period of the gene pool’s formation to which they belong. The “Steppe” component is pronounced only among the Karanogais and reflects the chronologically latest wave of migration, that of Turkic-speaking nomads of the Eurasian steppe in the Middle Ages. The “Iranian” component is pronounced in the gene pool of the Azerbaijanis of Azerbaijan and Daghestan, the Tabasarans of Daghestan, and all Iranian-speaking peoples of the Caucasus. The “Daghestanian” component predominates in all populations speaking Daghestanian languages (except the Tabasarans), as well as among the Turkic-speaking Kumyks. Each component is associated with a specific set of Y haplogroups: the “Steppe” set comprises C-M217, N-LLY22g, R1b-M73 and R1a-M198; the “Iranian” set, J2-M172(×M67,M12) and R1b-M269; the “Daghestanian” set, J1-Y3495. It is hypothesized that haplogroup J1-Y3495 arose 6.5±0.6 kya in an autochthonous ancestral population of central Daghestan. Around 6 kya it split into two main lineages: J1-ZS3114 (with its highest frequency among the peoples of the Dargwa, Lak and Lezgic branches) and J1-CTS1460 (with its highest frequency among the peoples of the Avar-Andi-Tsez branch), which branched further around 4-5 kya. The phylogeography of J1-Y3495, considered together with archaeological and ancient DNA data, points to population growth in the territory of Daghestan from the Bronze Age onward, followed by dispersal and further microevolution of the subdivided population.

12 March

Maksim Melenchenko (HSE University)

Reading group: Martin Haspelmath (2023) Inflection and derivation as traditional comparative concepts

Abstract

This article revisits the distinction between inflectional and derivational patterns in general grammar and discusses the possibility that this well-known distinction is not rooted in the reality of languages, but in the Western tradition of describing languages, through dictionaries (for words, including derived lexemes) and through grammar books (where we often find tables of exemplary paradigms). This tradition has led to rather different terminological treatments of the two kinds of patterns, but from the perspective of a constructional view of morphology, there is no need to incorporate such differences into formal grammatical descriptions. For practical purposes, we need clear and simple definitions of entrenched terms of general linguistics, so the article proposes semantically based (retro-) definitions of inflection, derivation and lexeme that cover the bulk of the existing usage. Finally, I briefly explain why we need sharp definitions of comparative concepts, and why prototype-based and fuzzy definitions of traditional terms are not helpful.

5 March

John Mansfield (University of Zurich)

When social contact promotes diversification

Abstract

In much linguistic literature, small, socially isolated speech communities are seen as the main locus of diversification and grammatical complexity (e.g. Trudgill 2011). Similarly, linguistic differentiation is traditionally viewed as resulting from social separation of groups (Paul 1888), while intensive social contact between groups can lead to structural convergence of their languages (e.g. Gumperz & Wilson 1971; Ross 1996). However, sociolinguistic literature shows that social groups in regular contact use language as a way of developing and maintaining distinct group identities (Eckert 2008), and in regions with many small ethnic groups this can drive diversification (François 2011; Evans 2019; Epps 2020), a kind of ‘sympatric speciation’ in linguistic evolution. In this presentation I consider evidence for contact-driven diversification, paying particular attention to which dimensions of language may be used to index group identity. I present a cross-linguistic database on dialect differentiation, which analyses grammatical variation and dialect differences in 42 languages, drawing on data from reference grammars. The main finding is that grammatical ordering very rarely differentiates dialects in close contact, but the form of grammatical markers (affixes, clitics and function words) frequently does.

27 February

Silvia Luraghi (University of Pavia) and Chiara Zanchi (University of Pavia)

Introducing PaVeDa – The Pavia Verb Database

Abstract

PaVeDa – The Pavia Verb Database is the focus of the project “Verbs’ constructional patterns across languages: a multi-dimensional investigation”, a joint enterprise of two teams of researchers from the Universities of Pavia and Naples “Federico II.” PaVeDa is an open-source relational database for investigating verb argument structure across languages (Zanchi et al. 2022), intended to expand and enhance the Valency Patterns Leipzig (ValPaL) database (Hartmann et al. 2013). ValPaL was developed within the Leipzig Valency Classes Project, which carried out a large-scale cross-linguistic comparison of valency classes and relied on a group of contributors who provided a consistent set of cross-linguistic data. The online ValPaL database contains data from 36 languages, based on a questionnaire covering a selected sample of 80 verb meanings. Apart from valency frames, contributors provided information about possible alternations, both uncoded and coded.

In spite of the research carried out within the ValPaL project, no systematic comparative study of diachronic developments across languages is available. The PaVeDa project intends to expand and enhance the ValPaL database with more languages and further features, and is configured to display valency patterns contrastively in several languages at once. Within this project, the Pavia team cooperates with a number of international partners who provide datasets for the new languages added to the database. So far, datasets from several ancient languages (Old Latin, Ancient Greek, Gothic, Old English, Classical Armenian, Old High German) and one modern language (Modern Greek) have been uploaded to the database, alongside the modern languages stored in the ValPaL database. As for additional features, an intermediate level of annotation has been added to the original ValPaL annotation: the alternation class, which categorizes language-specific alternations into four cross-linguistic types (valency rearranging, valency augmenting, valency decreasing, argument identifying) and serves as the initial comparative tool. While the ValPaL database does not allow contrastive visualization of constructions across the languages it contains, the developers of PaVeDa designed a special layer of annotation that generalizes over language-specific patterns and makes them visually comparable. Work on ancient languages also led to a redesign of the methodology, as ancient languages can only be studied on the basis of corpus data rather than native speakers’ knowledge. This has resulted in a usage-based methodology that we have started applying to modern languages too, linking the data on constructional patterns to existing digitized corpora.
In the near future, we aim to further develop both typological and diachronic comparison by adding more languages, both ancient and modern, from language families already represented in the ValPaL database (Indo-European and Afro-Asiatic), as well as from families that are not yet represented (Uralic and Turkic).

20 February

Nina Zdorova (HSE University), Olga Parshina (HSE University), Bela Ogly (HSE University), Irina Bagirokova (HSE, IL RAS), Ekaterina Krasikova (HSE University), Shamset Unarokova (Adyghe State University), Aida Bguasheva (Adyghe State University), Maria Rodina (HSE University), Susanna Makerova (Adyghe State University), Olga Dragoy (HSE, IL RAS)

Language processing while reading in Adyghe: evidence from eye-tracking studies

Abstract

A large body of psycholinguistic research is devoted to eye movements during reading, which reflect online language processing. Yet little is known about language processing in polysynthetic languages. The talk will cover the latest eye-tracking studies of sentence reading in Adyghe, conducted by researchers from the Center for Language and Brain, HSE University, Moscow, together with colleagues from the Laboratory of Experimental Linguistics of Adyghe State University, Maykop. These experimental studies address fundamental questions about language processing, such as: (1) What features of language processing are observed while reading in a polysynthetic language (Adyghe) compared with reading in other languages? (2) How does language processing during reading in Adyghe change depending on morphosyntactic features? In addition, some ongoing research projects on text reading in Adyghe will be presented.

13 February

Svetlana Zemicheva (HSE University)

Reading group: Philipp Stöckle (2023) “Dialect areas and contact dialectology” in Language contact: Bridging the gap between individual interactions and areal patterns

Abstract

Spatial variation of language has been researched qualitatively and quantitatively for at least 150 years by different sub-disciplines of linguistics, each defining dialects and dialect areas differently. Linguists agree, however, that the concept of dialect is vague and that the extent of a dialect is fuzzy. With contact being a crucial driver of linguistic change at sub-language levels, we attempt to sketch the perspective that contact dialectology and related sub-disciplines can offer on this fuzziness in the spatial variation of dialects and dialect areas. We thus address contact processes and patterns characterizing individuals, groups, communities, areas and beyond, at time scales ranging from everyday contact, through generations, to time depths sufficient for dialects to diverge and disappear.

6 February

Maria Khachaturyan (University of Helsinki), Maria Konoshenko (University of Helsinki), George Moroz (HSE University) and Valentin Vydrin (INALCO)

Valency patterns in Mande: contact vs inheritance

Abstract

In our talk, we address valency patterns in seven Mande languages belonging to different genealogical subgroups. Our study is based on the BivalTyp questionnaire, which covers 130 two-place predicates (Say 2018). We explore areal and genealogical factors in valency expression. While belonging to two distinct genealogical groupings, two languages of the set, Mano (Southern Mande) and Kpelle (Southwestern Mande), are in intense contact with one another (Khachaturyan & Konoshenko 2021). We investigate to what extent the synchronic patterning of valency expression in the data can be accounted for by contact and/or inheritance. We found that, in terms of the lexical equivalents for a given predicate, the languages are distributed strictly along genealogical lines, and Mano clusters with the other Southern Mande languages. Yet the type of construction chosen for a particular predicate, as well as, for intransitive constructions, the postposition introducing the second argument, are subject to areal influence: here Mano clusters with its Southwestern neighbors, Kpelle and Kono, and not with its closest genealogical Southern Mande relatives, Guro and Dan-Gweetaa. In addition, the structure of complex verbs in Mano more closely resembles that of Kpelle and Kono than that of Guro and Dan-Gweetaa. Thus, although Mano verbal forms are virtually unaffected by contact, its patterns of valency expression and its verbal lexical patterns are strongly influenced by neighboring languages. While this is by no means the only study showing the predominance of pattern borrowing in multilingual settings (Epps 2008; François 2011, inter alia), it shows argument coding to be a useful parameter for the comparative study of both pattern and matter.

30 January

Polina Padalka (HSE University)

Reading group: Blum, F., Barrientos, C., Ingunza, A. et al. Grammars Across Time Analyzed (GATA): a dataset of 52 languages. Sci Data 10, 835 (2023). https://doi.org/10.1038/s41597-023-02659-1

Abstract

Grammars Across Time Analyzed (GATA) is a resource capturing two snapshots of the grammatical structure of a diverse range of languages separated in time, aimed at furthering research on historical linguistics, language evolution, and cultural change. GATA comprises grammatical information on 52 diverse languages across all continents, featuring morphological, syntactic, and phonological information based on published grammars of the same language at two different time points. Here we introduce the coding scheme and design features of GATA, and we describe some salient patterns related to language change and the coverage of grammatical descriptions over time.

23 January

Natalia Stoynova (University of Hamburg)

Morphosyntactic variation in Evenki dialects: A corpus-based study of argument encoding

Abstract

The paper deals with variation in argument encoding in Evenki dialects. Evenki (Tungusic) is spoken over a very large area throughout Siberia and shows great dialectal diversity. I consider 10 verbs with variable valency patterns across 15 Evenki dialects. The main data were obtained from two corpora of Evenki; supplementary data come from available descriptions. A cluster analysis of the variation in argument encoding yields the following results. Clusters of dialects based on valency patterns do not match the existing classification of Evenki dialects. At the same time, they correlate much better with the dialects’ areal distribution and with the presence or absence of contact with other languages. This supports the claim that valency patterns are diachronically unstable and easily borrowed.
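
The abstract does not specify the clustering procedure; purely as an illustration, a cluster analysis of this kind could start from the share of verbs on which two dialects differ in argument encoding and then merge dialects step by step by average linkage. The dialect labels D1 to D4, the pattern codes, the treatment of unattested verbs and the choice of average linkage are all assumptions for this sketch, not the study’s data or method.

```python
from itertools import combinations
from typing import Optional

# Hypothetical coding: one argument-encoding pattern per verb; None = verb not attested.
DATA: dict[str, list[Optional[str]]] = {
    "D1": ["ACC", "DAT", "ACC", "LOC", "ACC"],
    "D2": ["ACC", "DAT", "ACC", "LOC", "DAT"],
    "D3": ["DAT", "LOC", "ACC", "ACC", "DAT"],
    "D4": ["DAT", "LOC", None, "ACC", "DAT"],
}


def mismatch(a: list[Optional[str]], b: list[Optional[str]]) -> float:
    """Share of verbs, among those attested in both dialects, encoded differently."""
    shared = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if not shared:
        raise ValueError("no verbs attested in both dialects")
    return sum(a[i] != b[i] for i in shared) / len(shared)


def average_linkage(names, dist):
    """Agglomerative clustering; returns the merges as (cluster1, cluster2, distance)."""
    def d(c1, c2):
        return sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, d(c1, c2)))
    return merges


names = list(DATA)
dist = {frozenset((x, y)): mismatch(DATA[x], DATA[y]) for x, y in combinations(names, 2)}
for c1, c2, height in average_linkage(names, dist):
    print(c1, "+", c2, round(height, 3))
```

In this toy data, D3 and D4 merge first because they agree on every verb attested in both; an unattested verb simply drops out of the comparison instead of counting as a mismatch.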

16 January

Konstantin Filatov, George Moroz, Chiara Naccarato, Elena Shvedova (HSE University)

TALD Update 2024

Abstract

The seminar will be devoted to an informal discussion of the current version of the TALD (Typological Atlas of the Languages of Daghestan) website, which you can find here: https://lingconlab.github.io/TALD/ We invite you to take a look at the website before the seminar, so we can discuss together the most urgent questions to be solved before the second official release, which is (hopefully) going to take place soon.

Seminar schedule 2023

19 December

Benedikt Szmrecsanyi (KU Leuven)

Variation-Based Distance and Similarity Modeling: Varieties of English and beyond

Abstract

Inspired by work in comparative sociolinguistics and quantitative dialectometry, I will sketch a corpus-based method (Variation-Based Distance & Similarity Modeling — VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, I present a case study that investigates three syntactic alternations in some nine international varieties of English.

12 December

Natalia Logvinova (HSE, ILI RAS)

Concord in Russian close appositional constructions: a quantitative study

Abstract

In this talk I will discuss case concord in Russian close appositional constructions, which manifests itself as optional case concord of the proper name (v rek-e Don-e / v rek-e Don ‘in the river Don’). The study provides an in-depth corpus analysis of more than 15,000 examples, using a logistic regression model to predict the presence or absence of concord. The results indicate that concord is most likely to occur in constructions with structurally simple and frequent proper names that exhibit adjectival properties and match the common noun in grammatical gender. Proper names with the Goal semantic role show concord with a higher probability than proper names with other roles. It is proposed that all relevant factors relate to frequency or convenience. A diachronic investigation shows that concord has become a much less preferred option over time. I argue that concord is of low functional significance and suggest that this may explain its gradual loss.

Polina Nasledskova (HSE, IL RAS)

Topological relations in Kina Rutul

Abstract

The Kina variety of Rutul (< Lezgic < East Caucasian) has several means of describing spatial relations. In this study, I analyze how topological relations are described in Kina Rutul by means of spatial cases, spatial adverbs and postpositions, and spatial verbal prefixes. The data were collected in the field in 2019 using the questionnaire “Topological relations picture series” by Bowerman & Pederson (1992). This talk presents my first attempt at analyzing the collected data, and the work is still in progress. My main objective is to determine in which contexts the spatial meanings of case, adverb/postposition and verbal prefix differ, and which aspects of topological relations each of these elements relates to.

5 December

Sofia Oskolskaya (ILS RAS, HSE), Anna Smetina (IL RAS), Natalya Stoynova (University of Hamburg)

Analysis of the Gorin Nanai texts from A. P. Putintseva’s text collection (1935-1936)

Abstract

Gorin is the northernmost Nanai dialect, spoken along the Gorin river, north of Komsomolsk-on-Amur. The ancestors of Gorin speakers spoke a Northern Tungusic language and shifted to Nanai about 150 years ago. Gorin Nanai is highly endangered and very poorly documented. A. P. Putintseva collected 18 notebooks of Nanai texts during her work in 1935-1936, and more than half of the texts were recorded from Gorin Nanai speakers. Her manuscripts contain many of her own corrections. In our talk, we will focus on an analysis of these corrections, as we believe that some of them may reveal underdescribed dialectal features of Gorin Nanai.

Andrey Chirkin (HSE University)

Reading group: Kilu von Prince (2019) Counterfactuality and past

Abstract

Many languages have past-and-counterfactuality markers such as English simple past. There have been various attempts to find a common definition for both uses, but I will argue in this paper that they all have problems with (a) ruling out unacceptable interpretations, or (b) accounting for the contrary-to-fact implicature of counterfactual conditionals, or (c) predicting the observed cross-linguistic variation, or a combination thereof. By combining insights from two basic lines of reasoning, I will propose a simple and transparent approach that solves all the observed problems and offers a new understanding of the concept of counterfactuality.

28 November

Vladimir Plungian (MSU, IL RAS, RLI, HSE)

Quechua “restrictive” marker =lla: semantic and morphosyntactic properties

Abstract

“Restrictive” markers (like Latin solum, English only etc.) represent an important type of units involved in the organization of discourse: being almost pervasive in the world’s languages, they often have unobvious patterns of polysemy, as well as non-trivial morphosyntactic properties.

Quechua is no exception. The restrictive marker =lla is attested in all Quechuan idioms (in slightly different phonetic shapes). My talk addresses the variety of its uses, mostly in Ecuadorian Quechua (Kichwa), and focuses on two main features of interest. First, =lla exhibits a surprisingly wide polysemy in the non-nominal domain, ranging from diminutive to focus-contrastive uses. Second, it displays unique morphosyntactic behavior suggesting an intermediate status between a mesoclitic and an affix. I will present the main facts and discuss possible ways of accounting for these intriguing properties.

Arkadiev, Peter M. 2010. Notes on the Lithuanian restrictive. Baltic Linguistics 1, 9–49.

Grzech, Karolina. 2016. Discourse Enclitics in Tena Kichwa: A Corpus-Based Account of Information Structure and Epistemic Meaning. London: SOAS.

Myler, Neil. 2009. Linearization and post-syntactic operations in the Quechua DP. Cambridge Occasional Papers in Linguistics 5, 46–66.

Ricca, Davide. 2017. Meaning both ‘also’ and ‘only’? The intriguing polysemy of Old Italian pur(e). In: Anna-Maria De Cesare & Cecilia Andorno (eds.), Focus on Additivity: Adverbial modifiers in Romance, Germanic and Slavic languages, 45–76. Amsterdam: John Benjamins.

Stoynova, Natalia. 2021. A nonstandard type of affix reordering: The restrictive kə̄n in Ulcha. Studies in Language 46 (1), 1–39.

Tellings, Jos. 2014. Only and focus in Imbabura Quichua. Annual Meeting of the Berkeley Linguistics Society 40, 523–544.

Пекелис, Ольга. 2021. Один в значении ‘только’: синтаксис и семантика в синхронии и диахронии [Odin ‘one’ in the meaning ‘only’: Syntax and semantics in synchrony and diachrony]. Jezikoslovni Zapiski 27 (2), 143–155.

Ivan Osorgin (HSE University), Konstantin Filatov (HSE, ILS RAS)

Parabible: a researcher’s tool for small-scale parallel Bible studies

Abstract

This talk summarizes recent progress in the Parabible project. The machine-readable Massively Parallel Bible Corpus of Mayer & Cysouw (2014) is specifically designed for large-scale quantitative analysis of Scripture, especially for the purposes of grammatical typology. However, using this database as is can be quite inconvenient for small-scale qualitative research. Our main aim in creating the Parabible tool was to make the database accessible to researchers without advanced technical skills. We will present the current state of the project and discuss future paths of development.

21 November

Asya Alekseeva (HSE University)

Inclusive/exclusive distinction in personal pronouns in East Caucasian languages

Abstract

In this talk I will present the results of my TALD project on the inclusive/exclusive distinction in personal pronouns. I will show which idioms have such a distinction and which do not. The morphological relations between the pronouns will also be taken into account: are the 1PL forms related to the 1SG forms or, perhaps, to the 2SG or 2PL forms? In addition, I will say a few words about the diachrony of personal pronoun systems in East Caucasian languages.

Anastasiya Ivanova (HSE University)

Question marking strategies in East Caucasian languages

Abstract

In East Caucasian languages, various strategies are used for marking questions. During the talk, we will discuss the data on question marking collected for TALD. Both polar and content questions, as well as interrogative and indirect questions, will be taken into account. Additionally, we will briefly address meditative questions (a distinct semantic type of non-canonical questions, often posed in the absence of an addressee and within the speaker’s inner speech) and the issues encountered during data collection.

Natalia Koshelyuk (HSE University)

LingvoDoc as a system for documenting and analyzing languages

Abstract

In this talk I will present the LingvoDoc platform, a multifunctional linguistic system designed for compiling, analyzing and storing dictionaries and corpora of various languages and dialects. It was developed in 2012 under the guidance of Yu. V. Normanskaya, together with programmers from ISP RAS, as an electronic library of endangered languages. Over time, special tools were added to LingvoDoc that make it possible to conduct phonological, morphological, lexical and other types of analysis of linguistic data. In the talk, I will outline the platform’s features, the options and tools available in the system, and other ways in which it can be useful to researchers.

14 November

Nina Sumbatova (HSE, IL RAS), Svetlana Toldova (HSE University)

Accessibility and morphological complexity: locative forms in Dargwa

Abstract

Many works discuss a possible connection between certain sociolinguistic and even geographical characteristics of languages and the complexity of their phonological and/or morphological systems. This talk presents a case study testing a possible connection of this type.

The data representing morphological complexity are the systems of locative forms of nouns in 33 lects of the Dargwa language group (Nakh-Dagestanian). As simple correlates of complexity, we used the number of locative series, the number of nominal forms in each series, and the total number of locative forms.

The geographical characteristic tested is the accessibility of the villages where these lects are spoken. We used several estimates of inaccessibility, such as height above sea level, distance from the administrative center, and a composite parameter that takes several characteristics into account.

The results are moderately positive: the correlation in question exists, but its interpretation is not obvious and calls for serious discussion.

7 November

Timur Maisak (IL RAS, HSE)

‘What’s your name?’ in Daghestan and beyond

Abstract

Inspired by the paper “‘What’s your name?’ in Tungusic and beyond” by Andreas Hölzl (2022, DOI: 10.5281/zenodo.7053365), I look at the translations of three sentences related to personal names: ‘What is your name?’, ‘My name is [X]’ and ‘The boy (or: girl) was given the name [X]’. My focus will be on the morphosyntactic structure of these sentences in the languages of Daghestan, in particular on the parameters of variation. The approach is close to that of TALD (http://lingconlab.ru/dagatlas/), but differs from it in that my sample is smaller and the stimulus sentences were elicited from native speakers rather than extracted from secondary sources.

Yury Koryakov (IL RAS, HSE)

Reading group: Paul Heggarty et al. (2023) Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages

Abstract

The origins of the Indo-European language family are hotly disputed. Bayesian phylogenetic analyses of core vocabulary have produced conflicting results, with some supporting a farming expansion out of Anatolia ~9000 years before present (yr B.P.), while others support a spread with horse-based pastoralism out of the Pontic-Caspian Steppe ~6000 yr B.P. Here we present an extensive database of Indo-European core vocabulary that eliminates past inconsistencies in cognate coding. Ancestry-enabled phylogenetic analysis of this dataset indicates that few ancient languages are direct ancestors of modern clades and produces a root age of ~8120 yr B.P. for the family. Although this date is not consistent with the Steppe hypothesis, it does not rule out an initial homeland south of the Caucasus, with a subsequent branch northward onto the steppe and then across Europe. We reconcile this hybrid hypothesis with recently published ancient DNA evidence from the steppe and the northern Fertile Crescent.

31 October

Zaira Khalilova (IL RAS, HSE)

The desiderative in Bezhta

Abstract

In this talk, I will analyze the desiderative in Bezhta. The Nakh-Daghestanian languages express desideration using several strategies, such as verbs and nouns. Special desiderative morphology, which is rare in the area, is present in two Nakh-Daghestanian languages, Hunzib and Bezhta. Additionally, Bezhta exhibits an unusual causative reading in the desiderative construction with different subjects. This study is the first report on the Bezhta desiderative. It reveals the morphosyntactic and semantic properties of the desiderative construction formed with the dedicated suffix.

Elena Shvedova (HSE University)

The rise and fall of the Aramaic anticausative marker

Abstract

Valence orientation is generally believed to be a diachronically stable parameter (Nichols et al. 2004; Comrie 2006), but the history of the Aramaic languages challenges this view. While causative marking has been stable and productive in these idioms for several millennia, anticausative coding has undergone dramatic changes. It was very common in Old and Middle Aramaic, but has been completely lost in most Modern Aramaic varieties, due to phonetic assimilation or to other as-yet unidentified reasons. In my study, after defining the class of verbal concepts that were typically coded with the anticausative marker in Old and Middle Aramaic, I explore how these meanings are expressed in Neo-Aramaic. The data show that the large class of Neo-Aramaic labile verbs can be divided into two groups: those that have been labile since the first millennium BCE and those whose lability is a novel Neo-Aramaic trait that has replaced anticausative marking.

Comrie, Bernard. 2006. Transitivity pairs, markedness, and diachronic stability. Linguistics 44(2). 303–318.

Nichols, Johanna, David A. Peterson & Jonathan Barnes. 2004. Transitivizing and detransitivizing languages. Linguistic Typology 8(2). 149–211.

24 October

Asya Alekseeva (HSE University), Nikita Beklemishev (Universität Tübingen / Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS)), Michael Daniel (Collegium de Lyon / Laboratoire Dynamique du Langage), Nina Dobrushina (CNRS), Anastasiya Ivanova (HSE University), Konstantin Filatov (HSE / ILS RAS), Timur Maisak (HSE / IL RAS), Maksim Melenchenko (HSE University), Ivan Netkachev (HSE University), George Moroz (HSE University), Ilya Sadakov (HSE University)

Atlas of Rutul dialects

Abstract

The purpose of this talk is to present the Atlas of Rutul dialects, a database with visualized grammatical features for 12 Rutul idioms. Traditionally, five dialects of Rutul (Lezgic < Nakh-Dagestanian) are distinguished: Mukhad, Shinaz, Myukhrek, Ikhrek and Borch-Khnov, as well as a so-called “mixed” dialect spoken in a number of villages (Ibragimov 2004). During our trip to South Dagestan in July 2022, we visited 12 Rutul villages. Each member of our team administered their questionnaire to at least two speakers in each village. The questionnaires covered different domains of Rutul phonology, morphology (nominal, verbal, pronominal, etc.), vocabulary, and two discourse formulas. In this talk we will present some preliminary results of our work and discuss some difficulties and problems that arose during data processing.

Polina Bychkova (University of Ljubljana), Daria Ryzhova, Polina Padalka, Aleksandra Martynenkova (HSE University)

Multilingual Pragmaticon: towards the typology of pragmaticalization

Abstract

In this talk, we will present a database of discourse formulae: idiomatic multiword constructions serving as positive or negative answers to the previous utterance (cf. Be my guest or By no means). At the moment, the database contains more than 2000 items from 10 languages. We describe their pragmatic functions and reconstruct the constructions they emerged from, classifying their source semantics. In this way, we attempt to reveal recurrent paths of pragmaticalization within the domain of ‘yes’ and ‘no’ answers.

17 October

Anastasia Yakovleva, Natalia Koshelyuk, George Moroz (HSE University)

Preposition drop in bilingual and standard speakers of Russian: A corpus-based study

Abstract

In this talk, we present our corpus-based study of preposition drop in the speech of Mari–Russian and Besermyan–Russian bilinguals compared with the speech of monolinguals. Based on data from ConLab’s collection of spoken corpora (cf. http://lingconlab.ru/resources.html), we demonstrate that the preposition ‘v’ is omitted more often in the speech of bilinguals than in monolingual speech, and we propose some possible explanations for the variation across different bilingual speakers. We will also highlight some methodological problems of p-drop studies and discuss ways of solving them.

Aleksandra Martynenkova (HSE University)

Reading group: Judith Aissen (2023). Documenting topic and focus

Abstract

In the paper, four information structure relations are discussed: two types of topic (non-contrastive and contrastive) and two types of focus (information focus and contrastive focus). All four depend crucially on discourse context. Although topic and focus are sometimes viewed as complementary relations, they belong to distinct dimensions of information structure, with one (focus) having to do with the locus of new information in an utterance, and the other (topic) with the entity that the utterance is about. For each relation, the article addresses the following questions:

What is the nature of the relation?

How is it marked cross-linguistically?

How can it be elicited and documented?

The utility of various techniques for documenting these relations is discussed, including the study of naturally occurring speech, the use of constructed contexts, and the role of elicitation.

10 October

Maria Starodubtseva (HSE University)

Distinction between transitive and intransitive imperatives in Daghestanian languages

Abstract

The formation of imperatives and its connection to transitivity are relatively well studied for the Nakh-Daghestanian family at the level of individual languages, but no areal overview has been presented so far. The aim of this research is to describe the formal distinction between transitive and intransitive imperatives in the Nakh-Daghestanian languages. A significant number of languages of the family show this distinction in morphology. Transitivity may also bear on the marking of the number of addressees. The chapter contains two interactive maps based on data collected from grammars. The research has been carried out within the framework of the TALD project.

Vasiliy Zerzele (HSE University)

Plural marking of imperatives and prohibitives in the languages of Daghestan

Abstract

My study within the TALD (Typological Atlas of the Languages of Daghestan) project is concerned with plural marking on imperatives and prohibitives in the languages of Daghestan. I will discuss the types of plural marking (found in ~55% of Daghestanian languages) and the problems with classifying these markers. Then I will discuss the same classification issues with prohibitives, as well as look at examples of languages with a transitive split. Finally, I will conclude by discussing the geographical distribution of these features within Daghestan.

Chiara Naccarato (HSE University), George Moroz (HSE University)

Non-standard numeral constructions in L2 Russian: A corpus-based study

Abstract

In this talk we present our corpus-based study of numeral constructions in the Russian speech of bilinguals from different regions of Russia. Data from ConLab’s collection of spoken corpora of L2 Russian (cf. http://lingconlab.ru/resources.html) show that non-standard encoding of numeral constructions is not infrequent, e.g., četyre brat’ja instead of Standard Russian četyre brata. Variation is attested in all corpora, but to different extents (it is higher in Daghestan than in corpora from other regions), and it is generally much lower than in other varieties of bilinguals’ Russian (cf. Stoynova 2021 on Nanai and Ulcha Russian). The attested variation can only partly be explained as the result of pattern borrowing from the speakers’ L1s, and correlations with the type of numeral involved (higher variation with paucals and collectives) point to incomplete or non-standard L2 learning as a more viable hypothesis. For most of the corpora, variation seems to characterize the speech of a few outlier speakers, so we cannot extrapolate our conclusions to the whole population.

3 October

Sergey V. Knyazev (RLI), George Moroz (HSE University), Svetlana Dyachenko (RLI)

The PRuD corpus (Prosody of Russian Dialects)

Abstract

The talk will present the corpus “Prosody of Russian Dialects” that we are currently building. We will describe its structure, as well as the questions that, in our view, the data it contains may help to answer:

What is the range of variation among Russian dialect prosodic systems?

How may they differ from the prosodic system of the standard language?

Can problems of the intonational phonology of the standard language be solved on the basis of dialect data?

Svetlana Zemicheva (HSE University)

Reading group: Kolomatsky, D. I. The distribution of Russian passive forms: A corpus study. Candidate of Philological Sciences dissertation, specialty 10.02.21 “Applied and Mathematical Linguistics”. Moscow, 2009.

Abstract

The aim of the study is to establish what determines the choice of means of expressing the passive voice in Russian. The work draws on extensive corpus material: 3,154 sentences with reflexive verbs and 2,807 sentences with passive participles. The chi-squared test is used to determine the probability that the distribution is (non-)random. The study examines such potential distribution parameters as the semantic class of the verb, the aspect and actionality of the verb form, and the properties of the arguments of the passive construction. A distinctive feature of the work is that it analyzes Russian data in the light of general theoretical approaches; in particular, it uses the typological definition of the passive proposed by M. Haspelmath and the theory of semantic transitivity of P. Hopper and S. Thompson.

26 September

Maksim Melenchenko (HSE University)

Evidential Perfects in Shughni (and other Pamir languages)

Abstract

The Perfect, one of the three main tense-aspect verb forms in Shughni (< East Iranian), is mainly used as an evidential. While the Preterite denotes witnessed events, the Perfect is used for non-witnessed events (for example, events reported by other people or inferred). It is also used in experiential (such as ‘I have never…’) and counterfactual contexts. For a closed set of verbs that denote states, the Perfect form serves as a progressive present tense (e.g. ‘I am sitting’). In narratives, the Perfect is generally used for introductory or backgrounded events. The Perfect form of the Shughni verb ‘to be’ вуδҷ can refer to present-tense situations that are witnessed by the speaker. Similar forms in other languages have been dubbed “mirative”; I will attempt to argue against such a label for Shughni. In the talk, I will discuss the theoretical implications of these topics and draw areal and typological parallels.

Polina Padalka (HSE University)

Reading group: Bohnemeyer J. Elicitation and Documentation of Tense and Aspect

Abstract

All languages seem to have provisions for the representation of time in their lexicons and discourse structures. But evidence has been mounting in recent years that the ways in which the representation of time is inscribed into the grammars of different languages vary substantially. This article reviews this evidence and introduces some empirical and analytical tools that facilitate the study of tense-mood-aspect systems in the field. The article begins by laying out some basics of temporal semantics (following Klein 1994 and Bohnemeyer 2014) and surveying tools and methods for the study of temporal semantics in the field. It then zeroes in on the topic of tenselessness. After reviewing the empirical case for tenselessness in one language, Yucatec Maya (Bohnemeyer 2002, 2009), four recent field-based studies of future-time reference (FTR) in (superficially or profoundly) tenseless languages are compared: Yucatec, Kalaallisut (Bittner 2005), St’át’imcets (Matthewson 2006), and Paraguayan Guaraní (Tonhauser 2011). Tenseless languages often grammaticalize a distinction between reference to factual and non-factual situations, treating FTR as non-factual. Past counterfactuals can serve as a diagnostic context, as they permit non-future tenses, but not irrealis markers.

19 September

Svetlana Zemicheva, George Moroz (HSE University)

Modeling the use of non-standard participles in dialect varieties of Russian

Abstract

In this talk we explore dialect ši-participles. Our sample includes 2,027 utterances with standard and non-standard verbal forms extracted from 12 dialect corpora of the Linguistic Convergence Laboratory. We identify the functions of non-standard participles and compare them to standard forms (for example, -ši forms in the subject resultative construction vs. verbs in the past tense; -ši forms in adverbial function vs. standard converbs). We then apply a logistic regression model to answer the following research questions: What is the probability that a speaker in each corpus uses dialect participles? Does education affect how standard a speaker is with respect to this feature? Is there a priming effect in the use of non-standard forms?

27 June

Nina Dobrushina

Variation and change in Rutul optatives: results of Rutul dialectological survey

Abstract

At the beginning of this talk, I will introduce the project of the dialectological survey of Rutul (Lezgic), its scope and its aims. I will then present the results of the project in one functional domain, the optative suffixes. The great diversity of suffixes found across twelve villages of the Rutul district will be analyzed in terms of inter-village, inter-speaker and intra-speaker variation. The patterns of geographic and social distribution of optative suffixes will be discussed in order to raise the question of the diachronic dynamics that underlie the observed variation. I will compare the results of the project with the existing literature on Rutul dialects and other Lezgic languages, and propose a diachronic scenario for the development and divergence of Rutul optative suffixes.

Vera Tsukanova (University of Marburg)

Areal factors in nominal plural formation in Semitic languages

Abstract

The Semitic languages have different strategies for forming nominal plurals. Some languages use only suffixes, while others use suffixation or transfixation (“broken plural”), depending on the lexical item. The first group also includes languages where several nouns can undergo apophonic alternations in the plural, and the second group includes some languages where suffixes and transfixes can apply to the same lexical item at the same time. For a long time, the presence of the “broken plural” was considered an important isogloss separating South Semitic from the rest of the Semitic languages. Recently, however, this classification has been revised, and the “broken plural” has been reinterpreted as an areal phenomenon. Starting from this, I will address the role of areal factors in shaping the traditional phylogenetic configurations of the Semitic languages.

20 June

Alina Russkikh (HSE University), Aigul Zakirova (independent researcher)

Peripheral functions of additive and intensifying clitics: a typological perspective

Abstract

Additive focus particles, such as English also or German auch, have been extensively discussed in the literature (König 1991, Krifka 1998, Forker 2016). Their semantic contribution to the meaning of the sentence is the presupposition that there is at least one true alternative to the focussed constituent. Another, less common type of focus particle is intensifying particles, such as Avar =go (Forker 2015). The core function of such particles is contrast or emphasis marking. In some languages, besides their core functions, additive and intensifying particles also occur in other, peripheral functions. Interestingly, in some of these functions both types of particles are used. The purpose of this talk is to establish a list of “shared” functions where both types of particles can be used, and to compare the distribution of additive and intensifying particles across these shared functions. Our sample therefore includes only languages that have both additive and intensifying particles. In addition, we limit ourselves to clitics. These conditions are met by languages spoken in the Eastern and Southern Caucasus and languages of the Volga-Kama area.

13 June

Maria Ermolova (HSE University)

On the grammaticalization of the past passive participles in Middle Russian

Abstract

I will analyze the use of past passive participles (PPP) without an auxiliary in preterit meaning in 17th-century Russian, based on data from private letters and Kuranty, the first Russian handwritten newspaper. The talk proposes a grammatical interpretation of this use. With the evolution of the old temporal system, the PPP tended to grammaticalize into a special form that has preterit meaning and does not require indication of the subject of the action. Different types of constructions with the PPP in this function (the PPP agrees with the subject / the PPP has the neuter form, with the subject in Nom. or Acc.) reflect different stages of the grammaticalization process, which was never fully realized in the history of Russian but reached its final phase in Polish.

Elena Shvedova (HSE University)

Reading group: Jacques, Guillaume. 2023. Folia Linguistica. https://doi.org/10.1515/flin-2023-2013

Abstract

This paper is the first survey of verbal affixes encoding the period of the day (‘at night’, ‘in the morning’, etc.) or the season of the year (‘in winter’, etc.) when the main action takes place. It introduces the term ‘periodic tense’ to refer to this comparative concept, explores the attested paradigms and their interactions with other verbal categories (including the more usual deictic tense), and investigates their diachronic origins. It shows that periodic tense markers are not restricted to incorporated nouns of time period, but constitute a highly grammaticalized verbal category in some languages, which redundantly co-occurs with adverbs or nouns of time.

6 June

Gromov V. A., Borodin N. S., Dang Q. N., Ivanov A. A., Ivanov D. K., Kogan A. S., Xu J.

Spot the Bot: Large-Scale Structure of Natural Languages

Abstract

In the presentation, we attempt to examine a natural language (as a whole) as a natural phenomenon. We ascertain that it constitutes a self-organised critical complex system. The presentation explores the coarse-grained structure of language: sets of word, bigram and trigram embeddings. We ascertain that these sets are fractal, estimate their internal dimensions, and outline ‘holes’ in the language (using topological data analysis). We analyse the semantic trajectories of literary masterpieces as those of chaotic dynamical systems. The results obtained are used to distinguish texts written by humans from those generated by bots.

Timofei Dedov (HSE University)

Reading group: Nataliia Hübler, Simon J. Greenhill (2023) Modelling admixture across language levels to evaluate deep history claims

Abstract

The so-called ‘Altaic’ languages have been the subject of debate for over 200 years. An array of different data sets has been used to investigate the genealogical relationships between them, but the controversy persists. A new type of data with high potential for such cases in historical linguistics is structural features, which are sometimes declared prone to borrowing and discarded from the outset, and at other times considered to carry an especially precise historical signal reaching further back in time than other types of linguistic data. We investigate the performance of typological features across different domains of language by using an admixture model from genetics. As implemented in the software STRUCTURE, this model allows us to account for both a genealogical and an areal signal in the data. Our analysis shows that morphological features have the strongest genealogical signal and syntactic features diffuse most easily. When using only morphological structural data, the model is able to correctly identify three language families: Turkic, Mongolic, and Tungusic, whereas Japonic and Koreanic languages are assigned the same ancestry.

30 May

Zaira Khalilova (IL RAS, HSE)

Adaptation of verbal borrowings in Nakh-Daghestanian languages

Abstract

The Nakh-Daghestanian languages have many borrowings from different languages. The majority of borrowings are nouns and adjectives, whereas verbs are borrowed the least. When verbs are borrowed, they are adapted through different integration strategies: the light verb strategy, indirect insertion, and direct insertion (Wohlgemuth 2009).

In this talk, I will present the verbal adaptation strategies found in the languages of Daghestan. For instance, languages in contact with Avar integrate verbal borrowings using the light verb strategy and indirect insertion. Languages in contact with Turkic languages use the light verb strategy and direct insertion. Generally, a recipient language has one strategy for integrating verbs from various donor languages, e.g., Bezhta uses the light verb strategy to integrate Avar, Georgian, and Russian verbs. Other languages apply one integration strategy per donor language, e.g., Khwarshi uses the light verb strategy for Russian verbs and indirect insertion for Avar verbs. Few languages combine two strategies for one donor language, e.g., Lezgian uses the light verb strategy and direct insertion for adapting Russian verbs.

Asya Alekseeva (HSE University)

Reading group: Petrini, S., Casas-i-Muñoz, A., Cluet-i-Martinell, J., Wang, M., Bentz, C., & Ferrer-i-Cancho, R. (2023). Direct and indirect evidence of compression of word lengths. Zipf’s law of abbreviation revisited. arXiv preprint arXiv:2303.10128.

Abstract

Zipf’s law of abbreviation, the tendency of more frequent words to be shorter, is one of the most solid candidates for a linguistic universal, in the sense that it has the potential for being exceptionless or with a number of exceptions that is vanishingly small compared to the number of languages on Earth. Since Zipf’s pioneering research, this law has been viewed as a manifestation of a universal principle of communication, i.e. the minimization of word lengths, to reduce the effort of communication. Here we revisit the concordance of written language with the law of abbreviation. Crucially, we provide wider evidence that the law holds also in speech (when word length is measured in time), in particular in 46 languages from 14 linguistic families. Agreement with the law of abbreviation provides indirect evidence of compression of languages via the theoretical argument that the law of abbreviation is a prediction of optimal coding. Motivated by the need for direct evidence of compression, we derive a simple formula for a random baseline indicating that word lengths are systematically below chance, across linguistic families and writing systems, and independently of the unit of measurement (length in characters or duration in time). Our work paves the way to measure and compare the degree of optimality of word lengths in languages.

23 May

Vladimir Plungian (MSU, IL RAS, RLI, HSE)

“Temporal mobility” outside Armenian: the case of Dardic and Iranian

Abstract

In a number of my previous publications (Plungian 2018 being the most recent), I suggested the label “temporal mobility” for a specific type of verbal system where the opposition of (present and past) tense may not extend to the whole set of aspectual forms. This means that some aspectual series behave as “temporally mobile” (= they allow both past and present reference), while other series remain “temporally stable” (= they can have only one type of temporal reference). My main evidence came from Modern Eastern Armenian, where, additionally, a neat morphosyntactic correlation is observed: temporally mobile indicative forms (such as the perfect, prospective, and durative) tend to be periphrastic, while temporally stable forms (such as the aorist) tend to be synthetic. The question of how widespread this type of verbal system may be was not, however, addressed.

In fact, particularly striking instances of temporal mobility are found in several Dardic languages (such as Gawri, Khowar, and some others), as well as in neighboring Iranian languages, where temporal (and even aspectual) marking can be meaningfully absent in some of the finite forms, thus creating a complex three-level opposition between “temporally mobile”, “aspectually mobile” and “rigid” forms. Systems of this kind suggest that a “mobility-based” typology could have a wider application.

Baart, Joan L. G. 1999. A sketch of Kalam Kohistani grammar (Studies in Languages of Northern Pakistan 5). Islamabad: SIL.

Liljegren, Henrik. 2016. A grammar of Palula (Studies in Diversity Linguistics 8). Berlin: Language Science Press.

Plungian, Vladimir. 2018. Notes on Eastern Armenian verbal paradigms: “temporal mobility” and perfective stems. In: Daniël Van Olmen, Tanja Mortelmans & Frank Brisard (eds.), Aspects of linguistic variation: Studies in honor of Johan van der Auwera, 233–245. Berlin: De Gruyter Mouton.

Rönnqvist, Hanna. 2013. Tense and aspect systems in Dardic languages: A comparative study. Stockholms universitet.

Schmidt, Ruth Laila & Razwal Kohistani. 2008. A grammar of the Shina language of Indus Kohistan. Wiesbaden: Harrassowitz.

Грюнберг, А. Л. 1987. Очерк грамматики афганского языка (пашто) [A sketch of the grammar of the Afghan language (Pashto)]. Leningrad: Nauka.

Эдельман, Д. И. 1983. Дардские и нуристанские языки [The Dardic and Nuristani languages]. Moscow: Nauka.

16 May

Rita Popova (HSE University), Michael Daniel (Collegium de Lyon / Laboratoire Dynamique du Langage)

‘Small’ is big. Number suppletion in size property words

Abstract

Adjectival number suppletion, a phenomenon considered unexpected on theoretical grounds (Hippisley et al., 2004), is attested in size-denoting adjectives in three different IE branches, namely in the Mainland Scandinavian, Brittonic and Megleno-Romanian languages. The Swedish suppletive adjective ‘small’ is by far the most widely cited example:

en liten flicka
a small girl
‘a small girl’

två små flickor
two small.pl girl.pl
‘two small girls’ (Nurmio, 2017)

This case, and the other two cases of suppletion attested in Indo-European, have been the object of detailed etymological investigation over the last decade (see Börjars & Vincent, 2011 for Mainland Scandinavian; Jørgensen, 2012 and Nurmio, 2017 for Brittonic; Maiden, 2014 for Megleno-Romanian). In terms of theory, Maiden (2014) attempts to explain why it is specifically size adjectives that deviate from the regular expression of number, by examining the historical stages that led to the eventual development of suppletive paradigms for ‘small’ and ‘big’ in Scandinavian and Megleno-Romanian. It is argued that all known instances of number suppletion in adjectives arose under similar circumstances and should be attributed to a single linguistic force that disrupts the usual pattern of inflection precisely in size properties. Nurmio (2017) goes a step further by bringing to light evidence from three non-IE languages and suggests “a strong tendency for adjectives denoting size, ‘small’ and ‘big’, to develop this type of suppletion” (Nurmio 2017: 26). In this study, we hope to continue this typological line of research and to contribute to the topic by presenting a more diverse collection of number-conditioned irregularities in the expression of size properties across the world.

First, we put the “strong tendency” suggested by Nurmio (2017) to more rigorous testing. By consulting several lexical databases and typological surveys, we obtained data on number encoding for a fixed subset of eight prototypical property meanings (Dixon 1982), namely ‘small’, ‘big’, ‘long’, ‘short’, ‘young’, ‘old’, ‘good’, ‘black’, in a convenience sample of some 100 languages where at least one of these items shows number-related irregularity. Our sources include typological studies of suppletion in general (Hippisley et al., 2004; Veselinova, 2006; Vafaeian, 2013) and three lexical databases: the Global Lexicostatistical Database (Starostin, 2011), the Intercontinental Dictionary Series (Key & Comrie, 2023), and the Austronesian Basic Vocabulary Database (Greenhill et al., 2008). Even though the absolute numbers remain low, this broader cross-linguistic evidence supports the idea that size properties are more prone to number-conditioned stem alternations than other properties.

Next, we discuss various theoretical interpretations and implications of this generalization. We revisit the analysis of the sources and grammaticalization paths of ‘small’-suppletion in Maiden (2014) in the light of these new data. In a few cases we are able to trace the histories behind the suppletive pairs, similarly to the studies of Indo-European suppletion; in most cases, however, we know little about the etymologies of the lexical items involved. From a synchronic functional perspective, we try to relate ‘small’-suppletion to research on a different but related type of number suppletion, namely verbal number (see especially Durie, 1986; Mithun, 1988; François, 2019), a link hinted at by Nurmio (2017).

References:

Börjars, K., & Vincent, N. (2011). The pre-conditions for suppletion. In A. Galani, G. Hicks, & G. Tsoulas (Eds.), Linguistik Aktuell/Linguistics Today (Vol. 178, pp. 239–266). John Benjamins Publishing Company. https://doi.org/10.1075/la.178.13bor

Dixon, R. M. W. (1982). Where have all the adjectives gone? And other essays in semantics and syntax. De Gruyter Mouton. https://doi.org/10.1515/9783110822939

Durie, M. (1986). The grammaticization of number as a verbal category. In V. Nikiforidou, M. VanClay, M. Niepokuj, & D. Feder (Eds.), Proceedings of the Twelfth Annual Meeting of the Berkeley Linguistics Society, 15–17 February 1986 (pp. 355–370). Berkeley Linguistics Society.

François, A. (2019). Verbal number in Lo-Toga and Hiw: The emergence of a lexical paradigm. Transactions of the Philological Society, 117(3), 338–371.

Greenhill, S. J., Blust, R., & Gray, R. D. (2008). The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4, EBO.S893. https://doi.org/10.4137/EBO.S893

Hippisley, A., Chumakina, M., Corbett, G., & Brown, D. (2004). Suppletion: Frequency, categories and distribution of stems. Studies in Language, 28(2). https://doi.org/10.1075/sl.28.2.05hip

Jørgensen, A. R. (2012). The Plural of Middle and Early Modern Breton bihan. https://www.academia.edu/5789327/The_Plural_of_Middle_and_Early_Modern_Breton_bihan

Key, M. R., & Comrie, B. (Eds.). (2023). IDS. Max Planck Institute for Evolutionary Anthropology. https://ids.clld.org/

Maiden, M. (2014). Two suppletive adjectives in Megleno-Romanian. Revue Romane, 49(1), 32–52. https://doi.org/10.1075/RRO.49.1.02MAI

Mithun, M. (1988). Lexical categories and the evolution of number marking. Theoretical Morphology: Approaches in Modern Linguistics, 211–234.

Nurmio, S. (2017). The development and typology of number suppletion in adjectives. Diachronica, 34, 127–174. https://doi.org/10.1075/DIA.34.2.01NUR

Starostin, G. (2011, 2019). The Global Lexicostatistical Database. Moscow: Higher School of Economics, & Santa Fe: Santa Fe Institute. Available online at http://starling.rinet.ru/new100/ (accessed 25 April 2023).

Vafaeian, G. (2013). Typology of nominal and adjectival suppletion. STUF - Language Typology and Universals, 66(2). https://doi.org/10.1524/stuf.2013.0007

Veselinova, L. N. (2006). Suppletion in verb paradigms: Bits and pieces of the puzzle (Typological Studies in Language 67). John Benjamins Publishing Company. https://benjamins.com/catalog/tsl.67

2 May

George Moroz, Asya Alekseeva, Timofei Dedov, Artem Orekhov, Kirill Sidorov, Angelina Stepanova (HSE University)

What do we know about linguistic journals? Applying NLP methods to a dataset of abstracts from linguistic journals (work in progress)

Abstract

A new project in the Lab is dedicated to the analysis of abstracts from linguistic journals. Within this project, we compiled and annotated a list of linguistic journals, gathered the abstracts and metadata for all scientific papers from those journals, and stored them in a database. The main goal of the project is to create a kind of map of “who is who in linguistics”. During the talk, we will share our preliminary results and discuss the conceptual problems of this task, possible applications of the gathered data, and our future plans.

25 April

Sergey Say (University of Potsdam), Natalia Logvinova (National Research University Higher School of Economics & RAS Institute for Linguistic Studies), Elizaveta Zabelina (RAS Institute for Linguistic Studies)

Introducing NoCaCoDa, a Typological Database of Nominal Causal Constructions

Abstract

In this talk, we will present NoCaCoDa, a typological database of nominal causal constructions. By nominal causal constructions we refer to causal constructions in which the causing event is syntactically represented by a noun phrase, as in He is shivering from [the cold] or She was late for work because of [her husband]. NoCaCoDa is based on a questionnaire containing 54 stimulus sentences that are annotated for five semantic parameters: direct vs. indirect causes, objective vs. subjective causes, internal vs. external causes, etc. Currently, NoCaCoDa displays first-hand data from 35 languages. In the talk, we will outline the structure of the database and summarize our preliminary findings. In particular, we take the patterns of syncretism displayed by nominal causal markers as a proxy for the cognitive construal of causal meaning.

Daria Ryzhova (HSE University), Yury Makarov (HSE, Institute of Linguistics, RAS), Ekaterina Rakhilina (HSE, Vinogradov Russian Language Institute, RAS)

An annotation tool for typological research on verbal colexifications

Abstract

In this talk, we will report on our progress in researching verbal colexification in Botlikh and Shughni. We will present the annotation system that we developed and the database that we designed specifically for annotation purposes. Our markup makes it possible to avoid general labels such as metaphor and metonymy and to extract more specific polysemy patterns, going deeper into the nature of verbal polysemy.

18 April

Timur Maisak (HSE University)

Imperative interjections ‘here, take it!’ in Daghestan and beyond

Abstract

Many languages of the world possess a dedicated imperative interjection used by the speaker to ask the addressee to take something (from the speaker’s hands), e.g. ma in the example from Agul below. The Russian na, as in na, beri ‘here, take it!’, is another instance of the same imperative interjection type. In the talk, I will present the results of my study of ‘take!’-interjections in the languages of the Caucasus: the study is part of the Typological Atlas of the Languages of Daghestan (http://lingconlab.ru/dagatlas/), but the sample was slightly expanded. I will discuss the phonological form of ‘take!’-interjections, their morphosyntactic behavior, and also their “extended” functions.

Agul (< Lezgic < Nakh-Daghestanian)

ma čʷa-s xibu guni p.u-na

take.it you.pl-dat three bread say.pfv-cvb

{I have three loaves of bread.} ‘Take these three loaves,’ he said.

Ilya Sadakov (HSE University)

Reading Group: Dunn, M., & Bellamy, K. (2023). Evolution and Spread of Politeness Systems in Indo‐European

Abstract

In this paper, we investigate the phenomenon of pronominal politeness in the Indo-European languages and demonstrate that the processes of change of pronominal systems related to politeness follow two evolutionary regimes, one inside the ‘Standard Average European’ (SAE) linguistic area and another outside of it. Historical processes of language change differ at different levels of linguistic structure. In general, we presume that lower-level, unconscious aspects of language change slowly over phylogenetic time, giving rise to patterns of relationship that can often be described as a family tree. Aspects of language that are consciously manipulated by speakers are expected to vary at a faster rate and to diffuse within areas of contact. Politeness is a social phenomenon, so we expect these systems to be highly susceptible to areal norms of interaction. We show that the similarities of SAE politeness systems can be accounted for with a model of convergence due to parallel evolution in a shared (social-demographic) environment, rather than by genealogical relatedness or borrowing. By quantifying and testing factors determining rates of structural change, we offer a novel and realistic approach that can explain similarities between distantly related languages sharing the same environment.

4 April

Michael Daniel (Collegium de Lyon / Laboratoire Dynamique du Langage)

Directionality is a key to understanding the part-of-speech typology in East Caucasian languages (and beyond)

Abstract

From the 1990s to the present, functional typology has seen extensive debate about (how to deal with) the non-universality of part-of-speech inventories across languages (Croft 1990, Hengeveld 1992, and a survey in Bisang 2010). In these discussions, building a part-of-speech typology is viewed as investigating the ways in which languages group words into classes so that the resulting classification is relevant to their morphosyntax. As the morphosyntax of languages varies greatly, we can expect, and indeed observe, differences in their inventories of parts of speech. The discussions have focused primarily on languages whose inventories are reduced compared to expectations (underspecified parts of speech in van Lier & Rijkhoff 2013), such as languages lacking adjectives as a separate part of speech (Dixon 1977) or lacking major part-of-speech distinctions altogether; see the reassessment of the assumed “mergers” in Arkadiev et al. (2009) for Adyghe, in Dixon and Aikhenvald (2006) for adjectives cross-linguistically, and Haspelmath (2012) for a general methodological stance. But there is more to such variation than possible mergers. In this talk, I suggest a tentative reinterpretation of the morphosyntactic status of the category of directionality in East Caucasian. In the existing descriptive grammars, markers of directionality are usually viewed as cross-categorial markers that can be added to different classes of words. I will draft a comprehensive inventory of the notional classes of stems involved in the derivation of directional forms, show how this grouping is also relevant to syntax, and ultimately argue that all of them can be seen as belonging to a single, separate part of speech, which I propose to call spatials (or spatial words).

Bisang, Walter. 2010. Word Classes. In: Song, Jae Jung (ed.) The Oxford handbook of linguistic typology. Oxford: Oxford University Press.

Croft, William. 1990. Typology and Universals. Cambridge: Cambridge University Press.

Dixon, R. M. W. 1977. Where have all the adjectives gone? Studies in Language. Vol. 1. No. 1. 19–80.

Dixon, R. M. W., and Alexandra Y. Aikhenvald. 2006. Adjective classes: a cross-linguistic typology. Oxford: Oxford University Press.

Haspelmath, Martin. 2012. How to compare major word-classes across the world’s languages. UCLA Working Papers in Linguistics, Volume 17, Article 16: 109-130.

Hengeveld, Kees. 1992. Parts of Speech. In: Non-verbal Predication: Theory, Typology, Diachrony. Berlin: Mouton de Gruyter.

Rijkhoff, Jan, and Eva van Lier (eds.) 2013. Flexible word classes: Typological studies of underspecified parts of speech. Oxford: Oxford University Press.

Vogel, Petra M., and Bernard Comrie (eds.) 2011. Approaches to the Typology of Word Classes. Berlin: Mouton de Gruyter.

Kirill Chuprinko (HSE University)

Andic dictionaries’ examples database: state of affairs and possible applications

Abstract

In this talk I present the current status of the Andic dictionaries’ examples database. This database is a collection of linguistic examples, with Russian translations, taken from the entries of the dictionaries included in the Andic dictionary database (https://github.com/phon-dicts-project/comparative_andic_dictionary_database). We envision it as a supplement to other corpus materials. I am going to discuss some difficulties that arose during its creation, its current structure and content, as well as prospects for improvement. Finally, to illustrate a possible application, I will show how the database can be used to investigate variation in Andic languages.

28 March

Ekaterina Voloshina (AIRI), Anastasia Cheveleva (AIRI, HSE), Viktoria Knyazkova (HSE University), Vitaly Protasov (AIRI), Tatiana Shavrina (AIRI), Oleg Serikov (AIRI)

Climbing the WALS: Typological Case Studies in LLMs Probing Using Syntactic Probing Framework

Abstract

Theoretical discourse on language predominantly revolves around the hierarchical arrangement of language, which encompasses phonetics, morphology, syntax, and semantics. Although knowledge of language structure is integral to evaluating the accuracy and comprehension of language models, interpretation methods place insufficient emphasis on syntax. In this study, we introduce a framework for conducting syntactic probing across various languages and large language models. Drawing on data from the World Atlas of Language Structures (WALS) and Universal Dependencies (UD), we compile a linguistically diverse set of languages and investigate the multilingual language model mBERT for its capacity to encode syntactic information related to word order and relative clauses. To do so, we use a novel syntactic querying interface that makes it possible to gather data from the UD collection. Our findings support the notion that mBERT exhibits a gap in recognizing syntax and a tendency to favour Standard Average European (SAE) languages when intersecting with existing diverse datasets.

Yury Koryakov (IL RAS, HSE)

Language diversity in Daghestan and beyond. How many languages are spoken there?

Abstract

It has long been known that the Caucasus is a mountain of languages, and Daghestan is its peak. But how many languages are actually spoken in Daghestan, and how can we count them? In the talk, I will present the project to compile a list of the languages of Russia and the principles that underlie it, as well as the application of these criteria to the languages of Daghestan and the difficulties that arise in this case. The main algorithm that we use to decide whether a given lect is a separate language relies on the following criteria: tradition, mutual intelligibility, lexical distance, written tradition, and linguistic and/or ethnic identity. However, in Daghestan, and only there among the regions of Russia, additional difficulties arose due to the position of some local scholars.

21 March

Yury Lander, Ksenia Romanova (HSE University)

Comparatives in the North Caucasus may not be so comparative

Abstract

In this talk, we discuss the specifics of certain patterns that are usually described as constructions expressing the comparison of inequality. We provide evidence that these constructions may function in a completely different way from their Standard Average European translations. Most data are taken from West Caucasian languages, but similar conclusions may apply to East Caucasian languages as well as to some Turkic languages of the area.

Svetlana Kuznetsova (HSE University)

Reading group: A sampling technique for worldwide comparisons of language contact scenarios by Francesca Di Garbo and Ricardo Napoleão de Souza

Abstract

Existing sampling methods in language typology strive to control for areal biases in typological datasets as a means to avoid contact effects in the distribution of linguistic structure. However, none of these methods provide ways to directly compare contact scenarios from a typological perspective. This paper addresses this gap by introducing a sampling procedure for worldwide comparisons of language contact scenarios. The sampling unit consists of sets of three languages. The Focus Language is the language whose structures we examine in search of contact effects; the Neighbor Language is genealogically unrelated to the Focus Language, and counts as the potential source of contact influence on the Focus Language; the Benchmark Language is a relative of the Focus Language that is in contact neither with the Focus nor with the Neighbor language, and is used for disentangling contact effects from genealogical inheritance in the Focus Language. Through this design, we compiled a sample of 49 three-language sets (147 languages in total), which we present here. By switching the focus of typological sampling from individual languages to contact relations between languages, our method has the potential of uncovering patterns in the diffusion of language structures, and how they vary and change.
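The three-language sampling unit described in this abstract comes with explicit constraints, so whether a candidate set qualifies can be checked mechanically. The sketch below is a minimal illustration, not the authors' procedure: the family mapping, the contact pairs and the language labels are all hypothetical placeholders, and "relative" is simplified (an assumption) to "member of the same family". The disjointness check reflects only that 49 × 3 = 147, which suggests, but does not state, that no language is reused across sets.

```python
from dataclasses import dataclass

# Hypothetical toy setting: language -> family, and undirected contact pairs.
# The labels are placeholders, not real languages or real contact claims.
FAMILY = {"x1": "X", "x2": "X", "x3": "X", "y1": "Y"}
CONTACTS = {frozenset({"x1", "y1"}), frozenset({"x3", "y1"})}


@dataclass(frozen=True)
class Triple:
    focus: str      # language examined for contact effects
    neighbor: str   # unrelated potential source of contact influence
    benchmark: str  # relative of the focus, in contact with neither


def in_contact(a, b, contacts):
    """Contact is treated as symmetric, so pairs are stored as frozensets."""
    return frozenset({a, b}) in contacts


def is_valid_triple(t, family, contacts):
    """Check the Focus/Neighbor/Benchmark constraints.

    Simplification (assumed): "related" means "same family label".
    """
    return (
        family[t.focus] != family[t.neighbor]
        and in_contact(t.focus, t.neighbor, contacts)
        and t.benchmark != t.focus
        and family[t.benchmark] == family[t.focus]
        and not in_contact(t.benchmark, t.focus, contacts)
        and not in_contact(t.benchmark, t.neighbor, contacts)
    )


def languages_disjoint(triples):
    """True if no language appears in more than one triple."""
    langs = [lang for t in triples for lang in (t.focus, t.neighbor, t.benchmark)]
    return len(langs) == len(set(langs))


print(is_valid_triple(Triple("x1", "y1", "x2"), FAMILY, CONTACTS))  # True
print(is_valid_triple(Triple("x1", "y1", "x3"), FAMILY, CONTACTS))  # False: x3 is in contact with y1
print(is_valid_triple(Triple("x1", "x2", "x3"), FAMILY, CONTACTS))  # False: focus and neighbor related
```

Storing contacts as unordered pairs keeps the "neither with the Focus nor with the Neighbor" condition symmetric; a real implementation would need a finer notion of genealogical relatedness than a single family label.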

14 March

George Moroz, Chiara Naccarato, Anastasia Yakovleva, Svetlana Zemicheva (HSE University)

Non-standard features in spoken corpora of dialectal and regional varieties of Russian

Abstract

In this talk, we will present the preliminary results of ongoing research on variation in spoken corpora of dialectal and regional varieties of Russian. The collection of spoken corpora developed by members of the Linguistic Convergence Laboratory is constantly growing and currently includes 21 corpora of the Russian of bilingual speakers and of dialectal varieties of Russian. Based on data from these corpora, we investigate phenomena of variation at different linguistic levels. Some of the topics currently being investigated include preposition drop, case marking in numeral constructions, and the functions of participles and converbs. After manual annotation of corpus data, we apply statistical methods to test linguistic and sociolinguistic hypotheses about the motivations for certain patterns of variation.

7 March

Alina Russkikh (HSE University)

Constructions with collective numerals in typological perspective

Abstract

This study is a typological investigation of constructions with collective numerals based on data collected from 105 languages. By constructions with collective numerals, I mean cases where a quantifying group indicates the selection of N items out of a set of N items in total, cf. English both, French tous les trois ‘all three’ or Chuvash ik aʨ-i=de [two child-P_3=ADD] ‘both children’. The study aims to identify possible strategies for forming such constructions in the languages of the world, and their distribution. In a number of languages, constructions with collective numerals are formed using morphological markers or syntactic models that occur in other grammatical functions as well. In such cases, I take into consideration these adjacent functions and their possible semantic relations to the collective meaning in constructions with numerals.

Aigul Zakirova

Number marking on verbs in the East Caucasian languages

Abstract

In this talk I describe different patterns of number marking on verbs, using mainly grammatical descriptions. Possible ways to analyze number marking on verbs are discussed: 1) as agreement, i.e. an operation by which morphological features are copied from one word form onto another, 2) as a special category of verbal number. I also discuss the factors that condition this marking. As it turns out, the Nakh and the Avar-Ando-Tsez branches feature number marking strategies that are limited to a lexically defined set of verbs. Number markers in this case are located in the verb stem. On the other hand, a different situation is also common in many branches of the family: number marking is conditioned by the TAM form of the verb, i.e. by grammatical factors. Finally, the diachrony of number marking is considered. I propose that verb forms marked for number often go back to copular constructions in which the predicate position is occupied by a participial form. As these constructions grammaticalize into verbal forms, the original nominal number marking on the participle develops into verbal number marking.

21 February

Alexey Koshevoy (LPC, Aix-Marseille Université and Institut Jean Nicod, ENS-PSL), Anastasia Panova (Stockholm University), Ilya Makarchuk (HSE University)

Building a Universal Dependencies Treebank for a Polysynthetic Language: the Case of Abaza

Abstract

In this talk we discuss the challenges that we faced while constructing a Universal Dependencies treebank for Abaza, a polysynthetic Northwest Caucasian language. We propose an alternative to the morpheme-level annotation of polysynthetic languages introduced in Park et al. (2021). Our approach aims to reduce the number of morphological features while still providing all the information necessary for a comprehensive representation of the syntactic relations. In addition, we propose adding one language-specific relation needed for annotating repetitions in spoken texts and present several solutions aimed at increasing the cross-linguistic comparability of our data.

Alina Russkikh (HSE University), Maksim Melenchenko (HSE University)

Reading group: C. T. Schütze (2011) Linguistic evidence and grammatical theory

Abstract

This article surveys the major kinds of empirical evidence used by linguists, with a particular focus on the relevance of the evidence to the goals of generative grammar. After a background section overviewing the objectives and assumptions of that framework, three broad kinds of data are considered in the three subsequent sections: corpus data, judgment data, and (other) experimental data. The perspective adopted is that all three have their place in the linguist’s toolbox: they have relative advantages and disadvantages that often complement one another, so converging evidence of more than one kind can reasonably be sought in many instances. Points are illustrated mainly with examples from syntax, but often can be easily translated to other levels (e.g., phonology, morphology, semantics, and pragmatics).

14 February

Philip Shushurin (ILS RAS / Ben-Gurion University of the Negev, Beer Sheva)

Nouns, adjectives and other lexical categories

Abstract

Nouns and adjectives are often contrasted with other lexical categories as categories that can bear case marking. Detailed typological studies reveal many additional similarities between nouns and adjectives, such as the inability to take direct objects (cf. performance *(of) the song, full *(of) problems). I propose a new analysis of the traditional syntactic categories N and Adj, suggesting that the principal properties of nouns and adjectives are largely determined by the presence of (valued or unvalued) inherent gender. Furthermore, I show how this system can be extended to other lexical categories such as prepositions (and linkers) as well as verbs.

7 February

Anastasiya Ivanova (HSE University)

Reading group: Hübler N. 2022 Phylogenetic signal and rate of evolutionary change in language structures. R. Soc. Open Sci. 9: 211252.

Abstract

Within linguistics, there is an ongoing debate about whether some language structures remain stable over time, which structures these are and whether they can be used to uncover the relationships between languages. However, there is no consensus on the definition of the term ‘stability’. I define ‘stability’ as a high phylogenetic signal and a low rate of change. I use the D metric to measure the phylogenetic signal and a Hidden Markov Model to calculate the evolutionary rate for 171 structural features coded for 12 Japonic, 2 Koreanic, 14 Mongolic, 11 Tungusic and 21 Turkic languages. To investigate more deeply the differences in the evolutionary dynamics of structural features across areas of grammar, I divide the features into 4 language domains, 13 functional categories and 9 parts of speech. My results suggest that there is a correlation between the phylogenetic signal and the evolutionary rate and that, overall, two-thirds of the features have a high phylogenetic signal and over half of the features evolve at a slow rate. Specifically, argument marking (flagging and indexing), derivation and valency appear to be the most stable functional categories, pronouns and nouns the most stable parts of speech, and the phonological and morphological levels the most stable language domains.

Johanna Nichols (Berkeley, HSE)

Work in progress on using typological distributions to identify language family homelands

Abstract

This is a preliminary report applying principles worked out in identifying homelands and centers of dispersal for Uralic and some of its branches (Grünthal et al. 2022) to reconstructing a center and trajectories for the Nakh-Daghestanian dispersal. Selection of typological variables, choice of methods of comparison, and types of coding can all selectively enhance the visibility of starting points or endpoints of dispersal trajectories. For Uralic we can reconstruct a broadly easterly homeland and an east-to-west distribution of the earliest branch ancestors, followed by northward spreads. Applying similar reasoning to Nakh-Daghestanian, we also find a directionality in early branch ancestor locations and in diffusions of typological features, and there is geographical and archaeological evidence not available for Uralic. Compared to Uralic, a Nakh-Daghestanian origin appears to be less precise in time but more precise in space.

Grünthal, Riho, Volker Heyd, Sampsa Holopainen, Juha Janhunen, Olesya Khanina, Matti Miestamo, Johanna Nichols, Janne Saarikivi, and Kaius Sinnemäki. 2022. Drastic demographic events triggered the Uralic spread. Diachronica 39:4. 490–524. https://doi.org/10.1075/dia.20038.gru (open access)

31 January

Konstantin Filatov (HSE University), Vladimir Plungian (MSU, IL RAS, RLI, HSE)

New Testament as a parallel corpus, and parallel corpus as a typological data base: a different look

Abstract

The use of parallel texts for typological research is a long-standing practice, significantly enhanced by the advent of corpus technologies (informative surveys of the problem can be found in Cysouw & Wälchli 2007, Aijmer 2008, Frajzyngier & Mettouchi 2015, Doval & Sánchez Nieto 2019, among others). For a wide-scale cross-linguistic study, an especially efficient instrument is what is usually called a “massively parallel corpus”, which includes sample texts from a very large number of languages (up to several hundred; cf. Östling 2016, Нестеренко 2019). For that purpose, the best choice would be texts that are not only the most widely translated but also cover less-studied languages. Virtually the only candidate that satisfies both conditions is the New Testament: at present, at least fragments of this text have been translated into more than 3,200 languages (including extinct, poorly documented or unwritten ones). Its importance for cross-linguistic studies is well known (the first attempts to use New Testament translations go back as early as the 18th century, as in Gottfried Hensel’s “Synopsis universae philologiae”, which contains the earliest known language maps). In present-day typology, the story probably begins with Haspelmath 1997 and continues with a number of fairly diverse approaches, for example Barentsen 2008, Wälchli & Cysouw 2012, or Dahl 2014. However, all existing attempts to use the New Testament follow roughly the same pattern: they (i) take one lexical or grammatical phenomenon of presumably universal extent (such as verbs of motion or perception, spatial adpositions, markers of current relevance, etc.), then (ii) try to identify cross-linguistically reliable contexts for it and (iii) analyze the observed variation. Here, the focus is on one particular piece of data assumed to be identifiable in many of the doculects that constitute the parallel corpus. What we would like to propose is somewhat different.
Our focus is not on a single category, but on the whole set of cross-linguistically relevant values belonging to the Universal Grammatical Inventory. Accordingly, the parallel corpus is viewed not as a mere collection of repeated occurrences, but as a complete database of grammatically relevant contexts in which a corresponding grammatical value is most expected. The results of our preliminary research suggest that it is possible to isolate typical contexts for the main tense and aspect values (such as future and prospective, progressive, iterative and habitual, resultative and framepast), for the main argument roles, for number and determination values, etc. Not all grammatical categories can be localized in this way, but only those with a clear semantic prototype, sometimes called “inherent” (cf. Frajzyngier & Mettouchi 2015). Nevertheless, the cases in point are numerous, and we will demonstrate some examples of how the relevant grammatical contexts can be profitably studied using continuous annotation of the New Testament parallel corpus.

Bibliography

Aijmer, K. 2008. Parallel and comparable corpora // A. Lüdeling & M. Kytö (eds.). Corpus Linguistics: An International Handbook. Berlin: De Gruyter Mouton, vol. 1, 275–291.

Barentsen, A. 2008. О конструкциях при глаголах восприятия в различных европейских языках (на основе переводов Нового завета) [On constructions with perception verbs in various European languages (based on translations of the New Testament)] // E. de Haard, W. Honselaar & J. Stelleman (eds.). Literature and Beyond: Festschrift for Willem G. Weststeijn on the Occasion of his 65th Birthday. Amsterdam: Pegasus, vol. 1, 103–134.

Cysouw, M. & Wälchli, B. (eds.). 2007. Parallel Texts: Using Translational Equivalents in Linguistic Typology // Theme issue in Sprachtypologie & Universalienforschung, 60.2.

Dahl, Ö. 2014. The perfect map: Investigating the cross-linguistic distribution of TAME categories in a parallel corpus // B. Szmrecsanyi & B. Wälchli (eds.). Aggregating dialectology and typology: linguistic variation in text and speech, within and across languages. Berlin: De Gruyter, 268–289.

Doval, I. & Sánchez Nieto, M. T. (eds.). 2019. Parallel corpora for contrastive and translation studies: New resources and applications. Amsterdam: John Benjamins.

Frajzyngier, Z. & Mettouchi, A. 2015. Functional domains and cross-linguistic comparability // A. Mettouchi, M. Vanhove & D. Caubet (eds.). Corpus-based Studies of Lesser-described Languages: The CorpAfroAs corpus of spoken AfroAsiatic languages. Amsterdam: John Benjamins, 257–279.

Нестеренко, Л. В. 2019. Мультиязычные параллельные корпуса: новый источник данных для типологических исследований, перспективы использования и проблемы [Multilingual parallel corpora: A new source of data for typological research, prospects for use and problems] // Вопросы языкознания [Voprosy Jazykoznanija] (2), 111–125.

Östling, R. 2016. Studying colexification through massively parallel corpora // P. Juvonen & M. Koptjevskaja-Tamm (eds.). The lexical typology of semantic shifts. Berlin: De Gruyter, 157–176.

Wälchli, B. & Cysouw, M. 2012. Lexical typology through similarity semantics: Toward a semantic map of motion verbs // Linguistics 50.3, 671–710.

Konstantin Filatov (HSE University)

Reading group: Dahl, Ö. & B. Wälchli. 2016. Perfects and iamitives: two gram types in one grammatical space. Letras de Hoje 51(3). 325. https://doi.org/10.15448/1984-7726.2016.3.25454.

Abstract

This paper investigates the grammatical space of the two gram types – perfects and iamitives. Iamitives (from Latin iam ‘already’) overlap in their use with perfects but differ in that they can combine with stative predicates to express a state that holds at reference time. Iamitives differ from ‘already’ in having a higher frequency and showing a strong tendency to be grammaticalized with natural development predicates. We argue that iamitives can grammaticalize from expressions for ‘already’. In this study, we extract perfect grams and iamitive grams iteratively starting with two groups of seed grams from a parallel text corpus (the New Testament) in 1107 languages. We then construct a grammatical space of the union of 370 extracted grams by means of Multidimensional Scaling. This grammatical space of perfects and iamitives turns out to be a continuum without sharp boundaries anywhere.

24 January

Polina Nasledskova (HSE University)

Ordinal numerals in sign languages

Abstract

This is a continuation of my typological research on the formation of ordinal numerals. Sign languages can provide valuable insights into typological generalizations. In this talk, I am going to compare data from sign languages with my earlier observations based on the languages of the WALS-100 sample. Although ordinal numerals in sign languages are in many respects similar to those of the WALS-100 languages, some crucial differences between the two samples are also attested.

Daria Ryzhova (HSE University)

Reading Group: Margetts, A., Haude, K., Himmelmann, N. P., Jung, D., Riesberg, S., Schnell, S., Seifart, F., Sheppard, H., & Wegener, C. (2022). Cross-linguistic patterns in the lexicalisation of bring and take. Studies in Language. International Journal sponsored by the Foundation “Foundations of Language”, 46(4), 934–993.

Abstract

This study investigates the linguistic expression of bring and take events and more generally of the semantic domain of directed caused accompanied motion (‘directed CAM’) across a sample of eight languages of the Pacific and the Americas. Unlike English, the majority of languages in our sample do not lexicalise directed CAM events by simple verbs, but rather encode the defining meaning components – caused motion, accompaniment, and directedness – in morphosyntactically complex constructions. The study shows a high degree of crosslinguistic diversity, even among closely related languages. Meaning components are contributed to directed CAM expressions by a mix of lexical semantics, morphosyntax, and pragmatic means. The study proposes a text-based, semantic typology of directed CAM events by drawing on corpus data from endangered languages.

17 January

Elena Shvedova (HSE University)

An outline for studying the system of verbal patterns in Urmi Neo-Aramaic

Abstract

There is a general idea of what Semitic verbal “patterns” (also called templates, binyanim, or, in the Russian tradition, porody) are, not only among Semitists but also among linguists in general. The canonical verbal root in Semitic languages is expected to be purely consonantal, and inflected verb stems are created through a restricted number of derivational templates. These templates encode, in a semi-systematic manner, such dimensions of verb meaning as agency and voice. It is well described that verbal patterns can vary between languages; what is usually mentioned is variation in the number of patterns and in their individual forms and functions. However, template systems are assumed to have more or less the same morphosyntactic status in all Semitic languages. The main task of my future research is to determine the status of verbal patterns in the system of Christian Urmi Neo-Aramaic (< Northeastern Neo-Aramaic < Semitic), a language with a significantly reduced, “non-classical” Semitic verbal system. On the one hand, the three Urmi verbal patterns merely mark the inflection-class membership of the verb. On the other hand, for some verbs a change of pattern can be described as a morphological mechanism of valency change. I will discuss the problems of establishing the consonantal root in Christian Urmi, the regularity of the patterns’ meanings, and possible directions for my future work.

Reading Group: Wichers Schreur, J., Allassonnière-Tang, M., Bellamy, K., & Rochant, N. (2022). Predicting grammatical gender in Nakh languages: Three methods compared. Linguistic Typology at the Crossroads, 2(2), 93–126.

Kirill Chuprinko (HSE University)

Reading Group: Predicting grammatical gender in Nakh languages: Three methods compared, by Wichers Schreur, Allassonnière-Tang, Bellamy and Rochant

Abstract

The Nakh languages Chechen and Tsova-Tush each have a five-valued gender system: masculine, feminine, and three “neuter” genders named for their singular agreement forms: B, D and J. Gender assignment is generally analysed as being dependent on both form and semantics (e.g. Corbett 1991), with semantics typically prevailing over form (e.g. Bellamy & Wichers Schreur 2022, Allassonnière-Tang et al. 2021). Most previous studies have considered only binary or tripartite gender systems possessing one masculine, one feminine, and one neuter value. The five-valued system of Nakh thus represents a more complex and insightful case study for analysing gender assignment. In this paper we build on the existing qualitative linguistic analyses of gender assignment in Tsova-Tush (Wichers Schreur 2021) and apply three machine-learning methods to investigate the weight of form and semantics in predicting grammatical gender in Chechen and Tsova-Tush. Our main aim is thus to show how three different computational classifier methods perform on a novel set of non-Indo-European data. The results show that while both form and semantics are helpful for predicting grammatical gender in Nakh, semantics is dominant, which supports findings from existing literature (Allassonnière-Tang et al. 2021) and confirms the utility of these computational methods. However, the results also show that the coded semantic information could be made more fine-grained to improve the accuracy of the predictions (see also Plaster et al. 2013). In addition, we discuss the implications of the output for our understanding of language-internal and family-internal processes of language change, including how loanwords are integrated from Russian, a three-gender language.

Seminar schedule 2022

20 December

Svetlana Zemicheva (HSE University)

Tomsk dialect corpus as a comprehensively annotated resource

Abstract

The Tomsk dialect corpus (https://losl.tsu.ru/losl_search) is the largest Russian dialect corpus with different types of annotation. It is based on recordings of Russian dialect speech made during dialectological expeditions along the Middle Ob River (Tomsk and Kemerovo regions, West Siberia) from 1946 to the present (more than 400 settlements of the region have been surveyed). In this talk I will present the results of a three-year project devoted to creating the corpus. I will characterize its materials, touching on the issues of balance and representativeness, and describe the corpus design in detail. As a comprehensively annotated resource, the corpus includes three modules: 1) textual: annotation and search by a) extralinguistic parameters (year and place of recording; informant’s sex, age and educational level) and b) text parameters (topic and genre); 2) grammatical: annotation and search by morphological parameters; 3) lexicographical: definitions of dialect lexemes. I will also present several case studies to demonstrate how this new electronic resource can be used in research practice.

13 December

Maria Brykina (The University of Hamburg), Josefina Budzisch (The University of Hamburg), Sergei V. Kovylin (Laboratory “Linguistic Platforms”, Ivannikov Institute for System Programming of the RAS, Moscow; Tomsk State Pedagogical University)

A speaker-oriented study of dialectal features in Selkup

Abstract

Selkup is a Samoyedic (< Uralic) language known for its numerous dialects and subdialects and the lack of sharp boundaries between them (cf. Kazakevich 2022, Klump & Budzisch, forthc.). Our study aims to give a new perspective on Selkup dialectal distribution. We use phonetic, morphological and lexical features to compare the speech of several dozen speakers on the basis of quantitative data acquired from several corpora. In our talk we show how such data help to improve our knowledge of individual features and present some preliminary results of speaker clustering, comparing them with existing dialectal classifications of Selkup. We will also discuss the methodological problems we faced when dealing with missing values, heterogeneous features and a small amount of data for some speakers.

Konstantin Filatov (HSE University)

Towards a working definition of “verbal grammatical system”

Abstract

This talk is part of my project on the diachrony of verbal grammatical systems in Andic languages (< Avaro-Ando-Tsezic < Nakh-Daghestanian). Before analyzing diachronic scenarios in the domain of verbal grammar, it would be wise to delimit the scope of the relevant phenomena, i.e. to define what verbal systems are. However, in the existing literature on verbal categories (including Dahl’s well-known 1985 monograph, which has the word “system” in its title), little attention is paid to this core concept, and the term is usually just taken for granted. Combining the approaches advanced by V. A. Plungian (1998, 2011) and V. S. Khrakovsky (1996), I believe that a verbal grammatical system can be profitably described by considering at least five types of relations between meanings and forms: (i) clustering of meanings, focusing on the internal structure of a polysemous marker; (ii) cumulation of meanings, focusing on cumulative vs. separate expression within grammatical markers; (iii) categorization patterns, focusing on how grammatical meanings are allocated to the same or different categories; (iv) degree of morphologization, focusing on the morphological/periphrastic continuum; (v) morphotactic interactions, focusing on how one meaning can affect the expression of other grammatical meanings within the word form. I will discuss each of these features in detail and exemplify them with facts from Andic languages.

6 December

Masha Kyuseva (Surrey Morphology Group)

Semantic factors in case loss: the Serbian-Bulgarian dialectal continuum

Abstract

Over time, there has been a dramatic loss of rich case systems across the languages of Europe. The analysis of historical texts has revealed the general picture of how this process occurred, yet the details of how it was implemented largely elude us. In particular, what happens to case meanings when the morphological form falls out of use? Are they all expressed by an alternative form? Do they merge with the meanings of another case? Or is the unity of meanings that was supported by a common inflectional form completely dismantled? To answer these questions, we propose to look at data where this change is still taking place, namely within the South Slavonic dialect continuum formed by Serbian and Bulgarian. We focus on the decline of one case, the instrumental, in its non-prepositionally governed uses. Our analysis shows that the meanings of the instrumental are not covered by a single alternative means of expression but are split over a number of different prepositional constructions. The choice of prepositions is not random and is largely determined by the functions of the original case form. This suggests that this case has no unified meaning (contra Jakobson 1936) and behaves more like a contingent cluster of functions.

Reading Group: Poplack, S., & Levey, S. (2010). Contact-induced grammatical change: A cautionary tale. Language and space: An international handbook of linguistic variation, 1, 391–419.

29 November

Alina Russkikh (HSE University)

Functions of additive particle =lo in Zilo Andi (in the wake of the fieldtrip in September 2022)

Abstract

In this talk, I am going to discuss the results of my recent fieldtrip to the village of Zilo (Daghestan), which aimed at researching the functions of the additive particle =lo in Zilo Andi (< Andic < Nakh-Daghestanian). Existing descriptions of other Upper Andi varieties (Verhees 2019, Maisak 2021), predominantly those of the villages of Andi, Gagatli and Rikvani, show that besides the typologically common functions of additive particles (such as additive, scalar additive, concessive, coordination, topic marking, and as part of indefinite pronouns), the particle =lo can also be used in typologically less well-described contexts, such as constructions for collective numerals, converbal clauses, and as part of the subordinating marker =lodːu and the comitative marker -loj. The goal of this talk is to describe the contexts in which =lo is attested in Zilo Andi and to understand the semantic contribution of =lo in its different functions. Special attention will be paid to the uses of =lo with universal quantifiers and different series of indefinite pronouns. I will also consider the semantics of =lo in combination with other components, especially with the emphatic (also called antiadditive) particle =gu.

References

Maisak, T. 2021. Endoclitics in Andi. Folia Linguistica 55(1), 1–34.
Verhees, S. 2019. General converbs in Andi. Studies in Language. International Journal sponsored by the Foundation “Foundations of Language” 43(1), p. 195.

Svetlana Zemicheva (HSE University)

Reading Group: Larsson, Egbert & Biber (2022). On the status of statistical reporting versus linguistic description in corpus linguistics: a ten-year perspective

Abstract

This study investigates (i) whether there has been a shift towards increased statistical focus in corpus linguistic research articles, and, if so, (ii) whether this has had any repercussions for the attention paid to linguistic description. We investigate this through an analysis of the relative focus on statistical reporting versus linguistic description in the way the results are reported and discussed in research articles published in four major corpus linguistics journals in 2009 and 2019. The results display a marked change: in 2009, a clear majority of the articles exhibit a preference for linguistic description over statistical reporting; in 2019, the exact opposite is true. The number of different statistical techniques employed has also gone up. Whilst the increased statistical focus may reflect increased methodological sophistication, our results show that it has come at a cost: a diminished focus on linguistic description, evident, for example, through fewer text excerpts and linguistic examples, which appears to be symptomatic of increasing distance from the language that is the object of study. We discuss these shifts and suggest some ways of employing sophisticated statistical techniques without sacrificing a focus on language.

22 November

Chiara Naccarato, George Moroz, Konstantin Filatov, Asya Alekseeva, Anastasiya Ivanova, Maria Godunova, Maksim Melenchenko, Timofey Mukhin, Ilya Sadakov, Elena Shvedova (HSE University)

TALD (Typological Atlas of the Languages of Daghestan): Update

Abstract

In this talk we will report on some of the recent updates to the Typological Atlas of the Languages of Daghestan. We will discuss some of the chapters recently uploaded to the website, as well as new topics that are currently being developed. In the final part of the talk we will address some practical questions that are still being worked out, and the future steps we intend to take.

15 November

Jesse Wichers Schreur (University of Groningen)

Contact-induced change in Tsova-Tush: a small typology of clause combining

Abstract

Many ‘small’ languages of the Caucasus, especially those in Daghestan, are known to have remained relatively stable over the last centuries in terms of their numbers of speakers (Daniel, Chechuro, et al. 2021, p. 523). Other ‘small’ languages, especially those spoken in Georgia and Azerbaijan, have been characterised by heavy language contact or language shift (or sometimes both), for example Khinalug (Rind-Pawlowski, p.c.), Kryz (Authier 2010) and Udi (Gippert 2008). The Nakh language Tsova-Tush has been spoken in Georgian-dominated territory since time immemorial and shows heavy lexical borrowing (Desheriev 1953; Wichers Schreur 2021). An often noted but not thoroughly investigated aspect of Georgian linguistic influence is the restructuring of the Tsova-Tush system of clause combining, especially subordination. In this talk, the most common types of Tsova-Tush subordinate clauses will be presented and compared with Georgian on the one hand and with other Nakh languages on the other. This will give us the opportunity to hypothesise about the origin of the various subordination strategies, especially taking into account historical Tsova-Tush data. Furthermore, work in progress will be presented on Tsova-Tush ‘cosubordination’ and a possible system of switch-reference marking.

References

Authier, G. (2010). “Azeri morphology in Kryz (East Caucasian)”. In: Turkic Languages 14, pp. 14–42.
Daniel, M., I. Chechuro, et al. (2021). “Lingua francas as lexical donors: evidence from Daghestan”. In: Language 97(3), pp. 520–560.
Desheriev, Y. D. [Дешериев] (1953). Bacbijskij jazyk. Fonetika, morfologija, sintaksis, leksika [The Batsbi language. Phonetics, morphology, syntax, lexicon]. Moscow, Leningrad: Akademija Nauk SSSR, Institut Jazykoznanija.
Gippert, J. (2008). “Endangered Caucasian languages in Georgia. Linguistic parameters of language endangerment”. In: Lessons from documented endangered languages. Ed. by K. D. Harrison, D. S. Rood, and A. Dwyer. Amsterdam: John Benjamins, pp. 159–194.
Wichers Schreur, J. (2021). “Nominal borrowings in Tsova-Tush (Nakh-Daghestanian, Georgia) and their gender assignment”. In: Language contact in the territory of the former Soviet Union. Ed. by D. Forker and L. A. Grenoble. Amsterdam: John Benjamins, pp. 15–33.

Konstantin Filatov (HSE University)

Reading group: Kalinina, E., & Sumbatova, N. (2007). Clause structure and verbal forms in Nakh-Daghestanian languages. Finiteness: Theoretical and empirical foundations, 183-249.

Abstract

This chapter addresses finiteness in the languages of the Nakh-Daghestanian (East Caucasian) group. We argue that none of the existing approaches to finiteness yields satisfactory results when applied to the Daghestanian data. We claim that the important oppositions in the verbal system of the Nakh-Daghestanian languages are based on the illocutionary force and information structure of the sentences in which the verbal forms occur, rather than on the dependent/independent distinction or the presence/absence of inflectional categories. Hence, the data of the Nakh-Daghestanian languages shed new light on the definition of finiteness in terms of verb properties.

8 November

Yury Lander (HSE University)

Describing narrow focus marking: a typological framework

Abstract

In this talk I discuss the diversity of constructions expressing narrow focus (called “argument focus” by Lambrecht (1994), but here also including the focalization of some adjuncts) and suggest a framework for describing the typology of (primarily monoclausal) narrow focus constructions. This framework, originally based on a scheme similar to Nichols’s (1986; 1992) “locus of marking” typology as modified in Lander & Nichols (2020), can also be considered a development of the typological schemes proposed by Creissels (1978), Aannestad (2021) and possibly others. At the time of writing this abstract, however, I also hope to devote part of the talk to the problems with this approach and to discussing whether it may become something more than a tool for description.

References

Aannestad, A. A. (2021). A Typology of Morphological Argument Focus Marking. MA thesis. The University of North Dakota.
Creissels, D. (1978). Réflexions au sujet de l’article de Maurice Coyaud: “Emphase, nominalisations relatives” [Reflections on Maurice Coyaud’s article “Emphasis, relative nominalizations”]. La linguistique, 14(Fasc. 2), 117–141.
Lambrecht, K. (1994). Information structure and sentence form: Topic, focus, and the mental representations of discourse referents. Cambridge University Press.
Lander, Yu., & Nichols, J. (2020). Head/dependent marking. In M. Aronoff (ed.), Oxford Research Encyclopedia of Linguistics.
Nichols, J. (1986). Head-marking and dependent-marking grammar. Language, 56–119.
Nichols, J. (1992). Linguistic diversity in space and time. University of Chicago Press.

Anastasia Yakovleva (HSE University)

Reading Group: Martti Leiwo (2020) L2 Greek in Roman Egypt: Intense language contact in Roman military forts

Abstract

This paper will focus on analysing user-related variation in Greek in Egypt as seen through potsherd letters (ostraka) of the residents of Roman forts, praesidia, in the Eastern Desert of Egypt. The letters can be dated to the first and second centuries CE. I suggest that the linguistic situation in the forts can be seen as evidence of extensive language contact that was connected with the considerable economic activity of the Roman Empire. All military forts had several L2 Greek speakers of various ethnicity. In what follows I will suggest that Roman soldiers and their civil partners had created a system that can be described as a feature pool of Greek variables. I suggest that the data from Egypt show that L2 speakers of Greek had an effect on Greek at all grammatical levels, strengthening existing and ongoing endogenous changes by creating substantial contact-induced variation in phonology as well as in morphosyntax and even phraseology. The intense language contact suggests, in my opinion, that language dynamics of this period follow the resilience theory, where various different phases of the adaptive cycle can be simultaneous, as almost all possible varieties of Greek, from historical High Attic to Multiethnic Greek, are in use.

25 October

Sara Zadykian (independent researcher), Polina Artemeva (independent researcher)

The Botlikh fieldtrip of August 2022: the materials collected and the conclusions made about the semantics of -ɬːu and -ɬi spatial markers

Abstract

In this talk we plan to give a brief overview of the collected materials, which include recordings of texts, an experiment and questionnaires; we also plan to present the conclusions about the semantics of the -ɬːu and -ɬi spatial markers drawn from the collected data.

Daria Ryzhova (HSE University)

Approaching (verbal) colexifications in Andic dictionaries

Abstract

Theoretical linguistics, especially cognitive semantics, has developed many theories about the types of semantic relations between different meanings of one and the same word. Until recently, these approaches were based on rather limited data, mostly from the so-called SAE languages. With the emergence of a large number of digitized dictionaries and wordlists (and of the CLICS database that aggregates them), huge amounts of data on colexifications in various languages have become available. One of the topical tasks for lexical semantics now is to check whether all these data fit into the existing classifications of semantic shifts. In this talk, I will discuss my first attempt to classify the colexifications found in Andic dictionaries according to the semantic relationship within each pair of colexified meanings.

18 October

Konstantin Zaitsev (HSE University), Anzhelika Minchenko (HSE University)

Automatic Detection of Borrowings in Low-Resource Languages of the Caucasus: Andic Branch

Abstract

We will present a logistic regression model that automatically detects borrowings in Andic languages. We will describe how we improved the model’s quality using feature analysis and a language model approach. Finally, we will discuss the results of our study and future research.

George Moroz

Morphological transducers: from 0 to 40 000 forms

Abstract

In this talk I will cover some aspects of morphological transducers (created with lexd and twol tools) and their application to East Caucasian languages. Nick Howell (one of the developers of lexd) and I have collaborated on this project since 2020 and several transducers were created under our supervision. In this talk I will try to cover the pipeline and possible results of this work using examples from Rutul, Agul, Chamalal, Botlikh, and several dialects of Andi.

Tatiana Kazakova (HSE University)

Morphological transducer for Even

Abstract

In this talk I will discuss the creation of a morphological transducer for Bystraja Even and some problems that emerged in the process.

Daniil Ignatiev (HSE University)

Exploring the limits of HFST tools

Abstract

HFST is a mature framework for natural language processing that has been around for many years. In this report, we describe how various HFST tools can be applied to Nakh-Dagestanian languages, and discuss the benefits and limitations imposed by this approach. The claims are illustrated by examples from Bagvalal.

11 October

Ilya Makarchuk

Towards a typology of “small” eventualities: on discontinuatives and verbal diminutives

Abstract

Languages of the world sometimes have verbal derivations for eventualities that are incomplete, deficient or in some other way lesser than the norm. Such derivations come in very different forms. This talk is about a subset of such derivations: what I call discontinuatives and verbal diminutives, exemplified by (1) and (2) respectively. At first glance they seem very similar, but, as I show in the talk, they behave differently. I will look at the behaviour of these derivations with different aspectual classes, show where their behaviours differ, and propose an analysis of their semantics.

(1) vašʲa uj-a suxala-kala-r-ě
Vasya field-DA plow-KALA-PFV-3SG
‘Vasya plowed (the same) field with interruptions.’ (Chuvash; Tatevosov 2006)

(2) Zan kontan sant-sante.
Jean like.SF sing-sing.LF
‘Jean likes humming (different melodies).’ (Mauritian Creole; Henri, Winterstein 2014)

Polina Nasledskova

Reading group: Rice (2006) Ethical issues in linguistic fieldwork

Abstract

Ethical issues in linguistic fieldwork have received surprisingly little direct attention in recent years. This article reviews ethical models for fieldwork and outlines the responsibilities of linguists involved in fieldwork on endangered languages to individuals, communities, and knowledge systems, focusing on fieldwork in a North American context.

4 October

Yuri Koryakov

Jalgan-Mitagi Tat: sociolinguistics and affiliation

Abstract

In my talk, I would like to describe the sociolinguistic situation in a small Tat community in the south of Dagestan, which we visited this summer. Jalgan-Mitagi Tat turned out to be a living language that is still spoken by some children. Nevertheless, language shift is in progress, and efforts can now be made to stabilize the situation. I will also touch upon the place of JM Tat within the Tatic group, which includes closely related varieties originally spoken in North-Eastern Azerbaijan and Southern Dagestan. Finally, I will discuss an attempt to use a writing system for JM Tat, against the background of other projects for the alphabetization of Tat.

Timur Maisak

Reading group: Cysouw and Forker (2009)

Abstract

I suggest reading a paper by Cysouw and Forker (2009) on Tsezic languages: the authors look at the encoding of certain (nonspatial) functions by spatial cases in the modern languages and (a) try to reconstruct the Proto-Tsezic encodings of these functions, and (b) examine whether it is possible to draw a genealogical tree based on the distribution of nonspatial uses of spatial cases. Additionally, I recommend another paper by Forker (2010), which you can look into if you find the data in the 2009 paper insufficient; the 2010 paper has more examples of the nonspatial usage of spatial cases. Our main paper for discussion will be Cysouw and Forker (2009).

Cysouw, Michael and Diana Forker. “Reconstruction of morphosyntactic function: Nonspatial usage of spatial case marking in Tsezic.” Language, vol. 85, no. 3, 2009, pp. 588–617. https://doi.org/10.1353/lan.0.0147
Forker, Diana. “Nonlocal uses of local cases in the Tsezic languages.” Linguistics, vol. 48, no. 5, 2010, pp. 1083–1109. https://doi.org/10.1515/ling.2010.035

20 September

Alina Russkikh

Typologically oriented questionnaire for describing additive functions

Abstract

This talk presents a typologically oriented questionnaire for describing additive functions. The typological study of additives (Forker 2016) shows that there is a list of common additive functions that are semantically connected to each other and can be represented on a semantic map. This questionnaire takes into account existing studies on additive particles, my own field experience of researching the functions of additives in Turkic languages, and methodological aspects of elicitation. This particular questionnaire tests 14 additive functions, among them: additive, scalar additive, standard concessive constructions, concessive conditionals, coordination, converbal clauses, collective numerals, universal quantifiers, indefinite pronouns, mirative, contrastive topic, and conjunctional adverbs. In addition to a detailed discussion of additive functions and the reasons for choosing them for the questionnaire, I will consider ways of distinguishing between semantically close functions and possible problems that may arise when eliciting stimuli with additive particles.

Ivan Netkachev (HSE University)

Multifunctional additive particles in Rutul dialects: a microtypology

Abstract

In this talk, I discuss the functions of additive particles in 12 dialects of the Rutul language (< Lezgic < East Caucasian). I show that, although these dialects are generally mutually intelligible, there is significant variation in the functions that the additive particles may perform. I discuss (i) their ability to conjoin NPs («A and B»), (ii) their semantics (whether they can have scalar additive semantics or not), (iii) their ability to co-occur with other coordinating particles, (iv) their occurrence in various series of indefinite pronouns (specific vs. non-specific, free-choice) and (v) their occurrence in concessive and concessive-conditional clauses. I then build a microtypology based on these parameters and sketch out the emerging theoretical generalisations.

13 September

Timofei Dedov (HSE University), Samira Verhees (independent researcher)

A database for Arabic, Persian, and Turkic loanwords in Dagestanian languages

Abstract

In this talk we will present the DAG<APT database. The database currently contains lexemes from Dagestanian languages that have been established as borrowings from Arabic by Zabitov (2001). We intend to digitize more etymological sources in a similar manner. The goal is to create a comprehensive database of borrowings from major contact influences like Arabic, Persian, and Turkic languages into Dagestanian languages. In our talk we explain the design of the database and we discuss our plans and ideas for the future.

Ilya Sadakov

Update: Tsnal Lezgian Spoken Corpus

Abstract

This short talk will be an update on my work with the Tsnal Lezgian Spoken Corpus. After a brief introduction, I would like to discuss some features of the variety that I have noticed so far. Some of them might be distinctive within the Jark’i dialect of Lezgian, to which the Tsnal variety presumably belongs (Mejlanova 1964).

Reading Group

G. Moroz

Barth, D., et al. (2021). Language vs individuals in cross-linguistic corpus typology. In G. Haig, S. Schnell & F. Seifart (eds), Doing corpus-based typology with spoken language data: State of the art, pp. 179–232. Honolulu: University of Hawai’i Press.

28 June

Polina Nasledskova, Tatiana Philippova

Postpositions in East Caucasian? An areal-typological study of the development of a category

Abstract

In our brief talk we will report on our progress in the areal-typological study of postpositions in East Caucasian languages. We will show you the three chapters and several maps that we have contributed to the Typological Atlas of the languages of Daghestan, highlighting the key results obtained. After that, we will present our theoretical ideas on how to analyze the emerging category of postpositions in East Caucasian.

Samira Verhees

Language vitality and attitudes in Botlikh (Dagestan)

Abstract

Botlikh, a minor unwritten language of Dagestan, is classified by UNESCO as “definitely endangered”, which means the language is no longer passed on to children. Like many Dagestanian languages, Botlikh is under pressure from Russian as the language of socio-economic mobility. Additionally, since the 1930s the Botlikhs have been subsumed under the Avars as part of the linguistic and ethnic planning policies of the Soviet Union, and their language is still not officially recognized. As a result, there are no resources for the language besides two academic dictionaries that are not for sale to the public. Despite all these factors, which we might expect to have a negative impact on language vitality, the language seemed rather alive to me during my trips to Botlikh. I observed children of different ages speaking Botlikh at home as well as with their peers. In the village of Miarso I even collected language data at the local school, where all of the children were proficient in the language. So I decided to conduct a survey among speakers of Botlikh to learn more about their language habits and how they view their own language: with whom do they speak it, do they find it important to pass it on to the next generation, and how do they see the future of the language? In the talk I will discuss the results of my survey and the method I used to collect data from Dagestan remotely.

14 June

Nikita Beklemishev (HSE University)

History of the spread of /f/ in southern Daghestan

Abstract

Among Nakh-Daghestanian languages, /f/ is found in the inventories of most Lezgic languages and Khinalugh. In early comparative works (e.g. Gigineyshvili 1977) the presence of /f/ was considered a Lezgic innovation, but it turns out that areal factors may also have played a role. I suggest that the sound entered some languages through lexical borrowing and was inherited in others. As a consequence, /f/ shows different degrees of phonological “entrenchment” and different patterns of distribution across the lexicon. Notably, all f-languages are located in southern Daghestan and have been under strong influence from Azerbaijan. My goal is to examine the phonemic status of /f/ in the languages of southern Daghestan, to survey how and when it might have appeared or been introduced, and to discuss how a single process could have applied to different languages in different ways. I will discuss several methods to evaluate the phonological entrenchment of /f/ and ways to determine the probable donor of /f/ as a borrowed phoneme, as well as the complex areal-genetic generalizations that the study delivers.

7 June

Maksim Melenchenko

Dialectal variability and diachrony of numeral systems in East Caucasian languages

Abstract

According to the traditional overviews, languages of the East Caucasian family have decimal, vigesimal, or mixed numeral systems. Using data from grammars and dictionaries, I have tried to explore the variability of these numeral systems and their morphological features in detail, focusing on isoglosses between languages and their dialects. In the talk, I will present the results of this research and discuss several cases of dialectal variability of numeral systems and their possible implications for the diachrony of numeral systems in the family. These results call into question the consensus view that vigesimality is “native” to East Caucasian languages and that it existed in Proto-East Caucasian.

31 May

Rita Popova, Michael Daniel (HSE University)

Size matters? Testing size effects in gender assignment in four East Caucasian languages

Abstract

In this study we test for referent size effects in the nominal classifications of four East Caucasian languages. Kibrik (1977) suggested that in Archi (Lezgic), the assignment of nouns to Gender 3 and Gender 4 shows tendencies based on referent size. The idea has been echoed in Corbett (1991) and Fedden & Corbett (2018), who suggest, more specifically, that in Archi big entities are assigned to Gender 3. For Lak, in his discussion of the reconstruction of the «original» system of class assignment, Zhirkov (1955) proposes that historically Gender 3 was assigned to all animals, natural phenomena, and round-shaped and large objects. To our knowledge, for other Lezgic languages that are genealogically related to Archi and show a similar four-way classification (i.e. have two inanimate genders in addition to feminine and masculine), no such effects have been reported, whether because they are weaker or absent altogether. The aim of this talk is to statistically test the hypothesis of referent size effects for Lak, Archi, Rutul and Tsakhur. We want to see whether Archi is indeed so different in this respect from Rutul and Tsakhur, its sister languages, and whether it is similar to Lak, its neighbour. We will first classify and refine the original hypotheses. We suggest that three different types of effects can in principle be expected: absolute size effects observed in the lexicon at large (following Corbett and Zhirkov), categorial size effects observed within specific conceptual categories (following Kibrik) and, finally, referent size effects leading to flexible gender assignment (again following Kibrik; see also Di Garbo 2013, Di Garbo 2014, Di Garbo & Agbetsoamedo 2018), the latter functionally akin to diminutives and augmentatives (Grandi 2015). We then review cross-linguistic evidence for each of the three types. Next, we will discuss methods to detect such effects in a statistically meaningful way. 
Unlike shape or conceptual categories, size is not based on a (nearly) categorical judgment, such as ‘is.human’ or ‘is.round’, but is a relative and scalar category based on judgments like ‘is.bigger’ or ‘is.smaller’. It is not immediately clear how to manually annotate referent size or how to establish thresholds for judging entities absolutely big or small. This may be why size effects are rarely mentioned in overviews of East Caucasian nominal classifications (Xajdakov 1980, Ivanova 2019). We decided to run several experiments on size judgments with Russian-speaking participants, in the hope that size judgments would have at least some cross-linguistic validity. After running two different experiments (and also using data from McRae et al. (2005) and Binder et al. (2016)), we compiled a small database of concepts that makes it possible to check for different types of size effects, not only in the four languages analysed here but, in principle, in any language. To test for correlations between small size and Gender 4 and between large size and Gender 3, we mapped the concepts of the database onto the nominal vocabularies of the four languages. We do not observe absolute size effects in gender assignment. We do observe categorial size effects in some, but not all, of the tested conceptual categories. Referent size as reflected in flexible gender assignment has not been tested experimentally; in Archi, it seems to be more lexically limited than in the systems discussed by Di Garbo for African languages, and requires further investigation.
References
Binder, J. R., Conant, L. L., Humphries, C. J., Fernandino, L., Simons, S. B., Aguilar, M., & Desai, R. H. (2016). Toward a brain-based componential semantic representation. Cognitive Neuropsychology, 33(3–4), 130–174.
Corbett, G. (1991). Gender. Cambridge: Cambridge University Press.
Di Garbo, F. (2013). Evaluative morphology and noun classification: A cross-linguistic study of Africa. SKASE Journal of Theoretical Linguistics, 10(1), 114–136.
Di Garbo, F. (2014). Gender and its interaction with number and evaluative morphology: An intra- and intergenealogical typological survey of Africa (Doctoral dissertation). Department of Linguistics, Stockholm University.
Di Garbo, F., & Agbetsoamedo, Y. (2018). Non-canonical gender in African languages: A typological survey of interactions between gender and number, and between gender and evaluative morphology. In S. Fedden, J. Audring, & G. G. Corbett (Eds.), Non-canonical gender systems. Oxford University Press.
Fedden, S., & Corbett, G. G. (2018). Extreme classification. Cognitive Linguistics, 29(4).
Grandi, N. (2015). Edinburgh Handbook of Evaluative Morphology. Edinburgh University Press.
Ivanova, V. (2019). Korreljacija mezhdu imennym klassom i semantikoj i fonetikoj suschestvitel’nogo v nakhsko-dagestanskikh jazykakh [Correlation between noun class and the semantics and phonetics of the noun in the Nakh-Daghestanian languages].
Kibrik, A., Olovjannikova, I., & Samedov, D. (1977). Opyt strukturnogo opisanija archinskogo jazyka [Structural description of Archi] (Vol. 1). Moscow: Izd-vo Moskovskogo universiteta.
McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547–559.
Xajdakov, S. M. (1980). Principy imennoj klassifikacii v dagestanskih yazykah [Principles of nominal classification in the Daghestanian languages]. Moscow: Nauka.
Zhirkov, L. (1955). Lakskij jazyk [The Lak language]. Fonetika i morfologija [Phonetics and morphology]. Moscow: Izd-vo AN SSSR.

24 May

Gasangusen Sulaibanov (École Pratique des Hautes Études - PSL, Paris, France)

Complex verbs with ideophones in the Dargwa dialect of the village of Tsugni

Abstract

In the Tsugni dialect of Dargwa, the analysis has identified more than a hundred lexical units that can be considered ideophones. These units are used with auxiliary verbs and form a subclass of coverbs. The set of verbs used in coverb constructions with ideophones is limited to about ten, and only some of these verbs are highly productive. The talk will present a classification of the ideophones as well as the specifics of their interaction with the various auxiliary verbs.

17 May

Timur Maisak

Morphological marking of “meditative” questions in Nakh-Daghestanian languages

Abstract

In the talk I will present the results of a pilot study of “meditative” questions, a special semantic type of non-canonical questions, which normally do not require an answer and can even be asked in the absence of an addressee (cf. ‘I wonder’-questions in English or ‘интересно’-questions in Russian). In a number of Nakh-Daghestanian languages, questions of this type have dedicated morphological marking (suffixes or enclitics), although there seems to be no systematic study of their marking types. I will look at the marking of meditative questions in comparison with the marking of ordinary (polar and content) and indirect questions in several languages of the family. I will also briefly discuss the typical contexts where meditative questions are found in texts.

26 April

Svetlana Amosova (Jewish Museum and Tolerance Center; Institute of Slavic Studies, RAS), Mikhail Vasiliev (Sefer Center; Institute of Linguistics, RAS)

The Jews of Dagestan: history, present-day situation of the ethnic group, and monuments of material heritage

Abstract

The first part of the talk deals with an ethnic group of Dagestan that has been, and still is, called by various names: Mountain Jews, Tats, Jews of Dagestan, Juhuri, and Kavkazi. We will explain what these terms mean and where they come from, review different views on the ethnogenesis of the group, and discuss its area of settlement, the dialects of its language, and how its relations with other ethnic groups of the North Caucasus developed at different times. In addition, drawing on materials from expeditions of the last several years, we will show the features of the group's present-day identity and how it changed over the 20th century. In the second part, we will look at monuments of the material heritage of the Mountain Jews in Southern Dagestan, represented mainly by surviving synagogue buildings and by Mountain Jewish cemeteries of the 17th–20th centuries. We will show how, in the absence of other written evidence, funerary epigraphy becomes one of the most important sources of information on the settlement geography and local history of small Jewish communities that lived in remote areas of Southern Dagestan and ceased to exist at the beginning of the 20th century. In conclusion, using the example of the Sefer Center expeditions conducted in 2018–2020, we will briefly describe the specifics and established practice of research on the traditional and contemporary culture and heritage of the Mountain Jews, both in the regions where they traditionally lived and in the diaspora.

19 April

Matthew Carter (University of California, San Diego)

Polyfunctional Argument Markers in Ket: Implicative Structure within the Word

Abstract

Ket is the last of the indigenous Yeniseian languages of central Siberia. Ket indexes both subjects and direct objects on the verb, but the way in which this is done varies significantly from one lexeme to another, forming a fairly complex system of inflectional classes (Nefedov & Vajda 2015). There is substantial reuse of material across different classes, such that the same marker may be the sole marker of the subject (1), a co-exponent of the subject with another marker (2), or an object marker (3), depending on the verb. Furthermore, several argument markers represent a fusion or reanalysis of historically distinct markers, and alternatively or simultaneously encode completely orthogonal functions (marking tense, cf. 4 and 5), or serve no obvious function. This situation, wherein the same marker systematically encodes different functions across different lexemes, is known as polyfunctionality (Stump 2015). It represents a type of complexity of exponence (Anderson 2015), a phenomenon wherein there is a non-isomorphic or otherwise opaque relationship between units of meaning (e.g. tense, person) and the formal units used to encode them (e.g. affixes, stem alternations). Polyfunctionality of the type seen in Ket would seem to present a communicative challenge in decoding: if the same marker can encode many different functions across different lexemes within the same subsystem of the morphology, on an arbitrary basis, how does a Ket listener understand which of the possible functions is intended in a given instance? This problem is made more acute by the fact that the language is pro-drop and exclusively head-marking with regard to core syntactic arguments (Kotorova & Nefedov 2016). The potential communicative challenges presented by complex form~meaning mappings have been a major focus of much recent work on morphological complexity (Ackerman et al. 2009, Ackerman & Malouf 2015, Sims & Parker 2016). 
Such work has largely focused on the question of how speakers of morphologically complex languages predict forms which they have never directly encountered (the so-called Paradigm Cell Filling Problem). If a language can encode the same information in many different ways (via affix allomorphy, stem changes etc.), how does a speaker know how to encode the information in any given form, given that they have never encountered that form before? As a solution, such work invokes the property of inflectional paradigms known as implicative structure (Wurzel 1984). The morphology of a language exhibits implicative structure if known forms of a lexeme provide clues to unknown forms, such that all cells in a paradigm can be predicted from some subset of these cells. By hypothesis, form~meaning mappings in a language may be complex, provided that the necessary form can be predicted in any given instance (the Low Conditional Entropy Conjecture). However, this work has largely not addressed the role that implicative structure may play in decoding, as opposed to encoding, complex form~function mappings, nor the role of implicative structure in syntagmatic, as opposed to paradigmatic, structure. Using data drawn from both published sources and original fieldwork, this paper demonstrates that in Ket, although individual argument markers are often highly polyfunctional, they are organized into networks of implicative relations which greatly reduce uncertainty with regard to their function in any particular instance. In other words, the range of possible functions for a particular argument marker can be greatly reduced by observing which other argument markers are present or absent in the same wordform, and which features those encode (cf. 6 and the dependency graph in 7). In this way, uncertainty with regard to the function of the wordform can be kept low, even without reference to the syntactic context or knowledge of paradigmatically related forms. 
As a case study, Ket is suggestive of a sort of “Low Conditional Entropy Conjecture in decoding”, wherein individual markers may be highly polyfunctional, provided that their functions can be determined in any given instance. The role of syntagmatic implicative structure in achieving this in Ket underscores the point made by Sims and Parker (2016) that the amount of work done by implicative structure is a point of cross-linguistic variation. It also makes predictions for other head-marking languages with very high complexity of exponence.

12 April

Evgeniya Korovina (Institute of Linguistics, RAS)

Borrowings and contacts in basic vocabulary and classification

Abstract

Although, by definition, basic vocabulary consists of the words that are least often borrowed, borrowings in this part of the lexicon occur regularly. They range from highly visible loanwords from languages of other families, such as Spanish borrowings in the indigenous languages of Latin America, to hard-to-detect loanwords and structural parallelism (homoplasy) between languages within the same subgroup. Cases of the second kind are especially typical of so-called dialect chains, where it is sometimes difficult to draw a line between idioms, and of situations in which languages are phonetically very conservative. Drawing primarily on examples from the history of the languages of Central America and Polynesia, I will consider ways to identify such situations formally, using mathematical methods.

29 March

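Ramazan Abdulmazhidov (Institute of History, Archaeology and Ethnography, Dagestan Scientific Center, RAS), Shakhban Khapizov (Institute of History, Archaeology and Ethnography, Dagestan Scientific Center, RAS)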

A thousand-year history of writing in the languages of the peoples of Dagestan: a view through the prism of centuries

Abstract

Dagestan is a region of remarkable ethnic and cultural diversity, and the only one in the North Caucasus with a centuries-long history of writing. As early as the Soviet period, the existence of a unique writing system of Caucasian Albania, genetically related to the Armenian and Georgian scripts, was established. This state, as is well known, extended over most of present-day Dagestan. Real progress in the study of Albanian writing came with the discovery in the 1990s, in a monastery on the Sinai Peninsula, of two palimpsests, presumably of the 7th century, written in the “Aghwan” language. Only after they had been studied was it possible to establish definitively the place of this language among the East Caucasian languages. The second, chronologically, attempt to write down speech in an East Caucasian language (in this case Avar) is connected with the spread of Orthodox Christianity and Georgian writing in Dagestan. Missionary activity here was accompanied by the training of church ministers from among the local population and by the composition of texts in Georgian. Professional study of the Georgian-script epigraphy of Dagestan began in the first half of the 20th century. The next stage in the development of writing in Dagestan was connected with its Islamization and the subsequent expansion of Muslim culture. Arabic script began to be used to write texts in East Caucasian languages as early as the medieval period, although this practice was hardly systematic. Of the recorded and still extant monuments, the earliest is a 14th-century Avar inscription on a stone set into the wall of the mosque in the village of Koroda, Gunib District, Republic of Dagestan. The talk will cover all these stages and processes in the development of writing in Dagestan in detail.

22 March

Timofei Dedov (HSE University)

Days after tomorrow in the languages of Daghestan

Abstract

In the languages of Daghestan, days after tomorrow can be encoded in different ways. Three types of strategies can be distinguished: 1) semi-compositional terms with similar suffixes; 2) transparently compositional constructions (as in most languages); and 3) non-derived terms, which seem to be rare cross-linguistically. Some East Caucasian languages also differ from most other languages in the number of unique terms used to refer to the days after tomorrow (with the third strategy, the number of unique terms for consecutive days after tomorrow can be as high as six). In my talk I will discuss all three strategies in more detail and present their geographical distribution, which was investigated for the “Typological Atlas of the languages of Daghestan”.

Katerina Dagkou (University of Groningen)

Systems of grammatical cases in the languages of Daghestan

Abstract

The languages of Daghestan vary in terms of the system of core (grammatical) cases they feature. A first distinction is between ergative and accusative languages. In ergative languages the typical system of core cases includes absolutive, ergative, genitive, and dative; in accusative languages it includes nominative, genitive, accusative, and dative. Some languages also have other grammatical cases besides the basic ones, e.g., affective, comitative, instrumental, comparative, and ablative. More exotic cases such as contentive and benefactive are reported for a couple of languages. In this talk, I discuss the classification and distribution of systems of grammatical cases in the languages of Daghestan, which is the result of my research within the TALD (Typological Atlas of the Languages of Daghestan) project. Apart from classifying the languages according to the type and number of cases they have, I will also present the morphology of the grammatical cases per language group, their syntactic functions, and instances of case syncretism.

15 March

Katherine Hodgson (University of Cambridge)

Zok, the Armenian dialect of Agulis

Abstract

Zok (otherwise known as the Agulis dialect of Armenian) is a form of Armenian that is so divergent that it has been described by some as a separate language. It was spoken in and around the town of Agulis in the southern part of Nakhijevan. The first written record of Zok is from 1711, but there were Armenians living in this area from at least the 5th century AD. The Armenian presence in Agulis itself ended with the massacre of 1919, but dialect-speaking populations remained in some of the surrounding villages, notably Tsghna, Tanakert, Ramis, and Paraka, until the 1970s and 80s. Closely related dialects are still spoken today in a few villages just across the border in the area of Meghri in Armenia. However, with the exception of Karchevan (population 292), these villages are now virtually abandoned, and the language is not being passed on to the younger generation. It is the subject of a documentation project funded by the Endangered Languages Documentation Programme. Speakers from the villages Tsghna, Tanakert, Ramis, and Paraka in Nakhijevan, and Karchevan and Kuris in the area of Meghri have produced 27 hours of video, of which 4 hours have so far been annotated using ELAN and FLEx. Zok is not intelligible to speakers of other forms of Armenian, and various claims have been made about the origin of the speakers. However, a closer linguistic examination reveals that many of its distinctive features are shared wholly or partly with neighbouring Armenian dialects, especially those of Karabagh and northern Iran, implying that the Zoks are a local Armenian population with a long-term, stable presence in the area. The existence of geographically-correlated dialect variation between the villages where the language is spoken (the closer together they are, the more features they have in common) also suggests a stable pattern of settlement. 
Apart from phonological features (vowel shift, vowel harmony), the most striking distinctive characteristic is the development of a verb system that is unique within Armenian. This involves the loss of all monolectic verb forms in the indicative mood, and their replacement with participles + auxiliary, a tendency that exists in Eastern Armenian in general, but has nowhere else reached this extent. This is accompanied by the shift of tense marking from the auxiliary, which has become essentially a person marker, to a particle added to the ‘present’ (unmarked) form, something which is also found in Khoy/Urmia dialect. The past subjunctive is also formed in this way. Both these processes, as well as the mobility of the auxiliary/person marker, which attaches to the element with the main sentential stress, are characteristic of languages of the Iran-Araxes area.

1 March

George Starostin (Centre for Comparative Studies and Phylogenetics of the Institute for Oriental and Classical Studies, HSE / External Fellow, Santa Fe Institute)

The role of proto-wordlists in modern historical-comparative studies: from phonetic and semantic to “onomasiological” reconstruction

Abstract

Although lexicostatistical methods of estimating linguistic distance between related or potentially related language units have become an essential staple of modern-day phylogenetic linguistics, their reliability often depends more on the accurate collection and curation of data than on the specific mathematical / computational methods applied to those data. In my talk, I shall try to delineate the theoretical and pragmatic importance of a relatively new methodology, dubbed “onomasiological reconstruction”, which aims to bring a new level of accuracy to the projection of lexical items onto proto-levels of varying time depth. This methodology, which requires paying equal attention to the phonological, semantic, and distributional features of compared items, can then be combined with lexicostatistical methods and applied with equal efficiency to varying datasets and linguistic taxa of widely varying time depth. In addition to having already yielded productive results across genetic lineages ranging from Indo-European to North Caucasian to African language families, onomasiological reconstruction seems to hold plenty of potential for successfully differentiating between “patently false” and genuinely promising hypotheses of distant linguistic relationship.

22 February

Matthias Urban (University of Tübingen)

Typological patterns and the language dynamics of the ancient Central Andes and South America

Abstract

In this presentation, I will sketch different aspects of the language dynamics of the ancient Central Andes of Peru and Bolivia (one of the few “cradles of civilization” of humanity) and of South America more generally. I will highlight in particular the role of linguistic interaction and contact, and of the resulting typological distributions, in understanding these dynamics. I will start out from the present-day linguistic landscape of the Central Andes, which is strongly dominated by the Quechuan and Aymaran families, whose common contact-induced typological profile has long shaped ideas of what Andean languages are like. I will then broaden the scope and explore how new analyses of the available materials for the now extinct languages of the Central Andes bring to light a now submerged interaction sphere in Northern Peru, and how this north-south structure is congruent with archaeological and molecular anthropological evidence, allowing for new ways of interdisciplinary dialogue beyond language expansions. Finally, I will broaden the scope again and show how recent work in the areal typology of broader parts of the Andes and South America articulates with these new findings. This work suggests a finely spatially structured gradient of typological variation in the Andes, into which the new evidence from the Central Andes fits seamlessly. The proper interpretation of this gradient is not yet clear, though one possibility is that it reflects an ancient layer of affinities between the languages of the region.

1 February

Pavel Astafiev, Nikita Beklemishev, Nina Dobrushina, Alina Russkikh (in alphabetical order)

Looking for areal patterns in the domain of discourse formulae: The case of blessings and curses in Daghestan

Abstract

It is a well-known fact that certain discourse markers, such as interjections, formulae of greeting or leave-taking, vocatives or politeness markers, are often borrowed (see Andersen 2014 for some references). The claim is primarily derived from data on material borrowings, such as English OK or Russian davaj, but there is also some, albeit scarce, evidence of pattern borrowing in this domain. Studies mention the similarity of greetings in some areas (Matisoff 2011 for South-East Asia, Lüpke & Watson 2020 for West Africa) or of good-night expressions in languages in contact (‘May God wake us up’ in Ewe and Likpe; Ameka 2006). There are few systematic studies of the spread of discourse patterns across particular areas, such as word iteration in the Mediterranean (Stolz 2004) or morning greetings in Daghestan (Naccarato & Verhees 2021). Areal comparison of some types of formulae can also be found in anthropology; for example, some formulae are included in the world-wide database of folklore and mythological motifs (Berezkin & Duvakin, http://www.ruthenia.ru/folklore/berezkin/). Many questions remain unanswered: are discourse formulae more diffusible than grammar? Do the areal distributions of formulae correspond to the areal distributions of linguistic features at other levels? How strong is the genealogical signal in the distribution of discourse formulae? What is their role in the transfer of various grammatical phenomena? In this talk, we will approach these issues from the perspective of wish-expressions, i.e. blessings and curses, in Daghestan. We will present the database of wish-expressions in nine languages of Daghestan. One of the problems with the cross-linguistic comparison of blessings and curses is that it is not fully clear what the grounds for such comparison are, i.e. which wishes of one language should be mapped onto which wishes of other languages. 
We will discuss the problem of cross-linguistic comparison of wishes and our first attempts to process the sets of formulae in these nine languages in order to detect the areal signal.
References
Ameka, Felix K. 2006. Grammars in contact in the Volta Basin (West Africa): On contact-induced grammatical change in Likpe. In Alexandra Y. Aikhenvald & R. M. W. Dixon (eds.), Grammars in contact: A crosslinguistic typology, 114–142. Oxford: Oxford University Press.
Andersen, G. 2014. Pragmatic borrowing. Journal of Pragmatics 67, 17–33.
Lüpke, Friederike & Rachel Watson. 2020. Language contact in West Africa. In Evangelia Adamou & Yaron Matras (eds.), The Routledge handbook of language contact. Routledge.
Matisoff, James A. 2011. Areal semantics: Is there such a thing? In A. Saxena (ed.), Himalayan languages: Past and present (Vol. 149). Walter de Gruyter.
Naccarato, Chiara & Samira Verhees. 2021. Dobroe utro, prosnulis’? Utrennie privetstvija v jazykah Dagestana [Good morning, have you woken up? Morning greetings in the languages of Dagestan]. In Nina R. Sumbatova, Timur A. Majsak & Testelec (eds.), Durh#asi hazna. Sbornik statej k 60-letiju R. O. Mutalova [Collection of papers for the 60th birthday of R. O. Mutalov]. Moskva: Buki Vedi.

25 January

Chiara Naccarato, Ezequiel Koile, Michael Daniel, Nina Dobrushina, Samira Verhees (Linguistic Convergence Laboratory), Aleksey Vinyar, Alexandra Nogina, Daria Ignatenko, Tatiana Kazakova, Alexey Baklanov, Ksenia Lapshina (Arctic Lab)

Detecting regional areal patterning across multiple linguistic features. A discussion

Abstract

Systematic study of areal convergence relies on comparable data on the typological profiles of the languages of the area under consideration. But when analysing areal patterning within a given area, an expert in that area runs the risk of skewing the results by (unconsciously) selecting features that she knows well or that are more readily available to her, which may shape the area in a particular way; choosing a different set of features for data collection may result in a different partitioning of the same area. In this seminar, we will discuss this problem in connection with ongoing projects on the areal typology of the languages of Russia. In the first part, we will briefly recap the Typological Atlas of the Languages of Daghestan, a project that systematizes data on the typological diversity of the Daghestanian languages, their relatives and their neighbors, and survey its general approach to data collection, with examples of features. In the second part, our colleagues from the Arctic Lab will present similar data from their research on the linguistic diversity of Northeast Siberia (from the Samoyedic branch of Uralic in the west, through Turkic, Mongolian and Tungusic, to Yukaghir, Nivkh, Chukotko-Kamchatkan and Aleut-Yupik-Inuit in the east). Finally, in the third part, we will discuss methodological issues and current approaches to defining linguistic areas from a global perspective. The seminar will have a slightly unusual format: it will probably be more interactive than usual, is primarily intended for sharing experience and ideas, and includes contributions from different research groups. We also invite comments from other participants who are embarking on comparable enterprises and encountering similar challenges.

18 January

Maria Khachaturyan (University of Helsinki)

Language contact between Mano and Kpelle: a holistic research program

Abstract

This talk presents an ongoing project on multilingualism and language contact between Mano and Kpelle, two Mande languages spoken in south-eastern Guinea. In the first part of the talk, I give an overview of the project and its strands: 1) an investigation of the sociolinguistic situation in the region, with a particular focus on strategies of language choice, studied through ethnographic observation and a sociolinguistic questionnaire (Khachaturyan and Konoshenko 2021); 2) a comparative study of the grammars of the two languages and their close relatives, identifying convergent and divergent features, including those potentially related to pattern borrowing (Konoshenko 2015: 176‑177) and matter borrowing (Khachaturyan 2019); 3) a study of translation, especially religious translation, translation artefacts and translation practice, as a locus of contact and a source of convergence (Khachaturyan 2020, Khachaturyan and Konoshenko in prep.). In the second part of the talk, I present an experimental study of the acquisition of a particular morphosyntactic parameter of Mano, namely reflexive marking, and of the impact of speakers’ exposure to Kpelle on the acquisition process. I present the experimental design, preliminary results and the theoretical questions the study aims to address.

11 January

Aigul Zakirova

Adjectival agreement in the East Caucasian languages: an overview

Abstract

Few sources deal with the origin of number agreement in the languages of the world. Apart from theoretical work (Lehmann 1982), only a few case studies have been published, among them Frajzyngier (1997), Cruz (2015) and Di Garbo (2020). The East Caucasian (EC) languages are numerous, and adjectival number agreement in EC appears to be morphologically and diachronically heterogeneous, which suggests that it is an innovation. This makes the EC languages well suited for investigating the origin of number agreement, yet no study of this kind has been undertaken so far. The goal of this study is to survey number agreement patterns in EC, to assess the weight of genealogical and areal factors in their distribution, and then to describe the paths by which adjectival plural agreement may have originated in the EC languages. I use the methodology adopted in the Typological Atlas of the Languages of Dagestan: so far, I have searched grammars of the EC languages and their neighboring languages (65 idioms overall) to establish how adjectival plural agreement is expressed in each of them. I divided the languages into three types: obligatory, optional and absent plural agreement. For the optional type, I established the factors that influence the presence of number agreement, and I plotted these types on maps.

Seminar schedule 2021

21 December

Polina Nasledskova, Tatiana Philippova

Postpositions in East-Caucasian languages: a description and comparative perspective

Abstract

Our study summarizes and analyzes the information about postpositions provided in the various grammatical descriptions of East-Caucasian languages. In our talk, we are going to briefly report on our findings and discuss the prospects. First, we are going to propose an overview chapter on postpositions and several features for maps in the Typological atlas of the languages of Daghestan (TALD) that we are currently working on. Second, we are going to present our conception of a paper about East-Caucasian postpositions from a typological perspective. In particular, we are going to show that despite the fact that a number of properties of East-Caucasian postpositions differ significantly from the typical properties of adpositions in general and Indo-European prepositions in particular, the difference is not due to their special status, but due to the fact that they usually do not serve the functions of primary adpositions in other languages. Rather, East-Caucasian postpositions are more similar to the secondary adpositions in other languages, while typical primary adpositions more often correspond to the East-Caucasian spatial suffixes rather than to postpositions. Finally, we are going to suggest that the notions of localization and directionality, widely used for the description of spatial forms in East-Caucasian languages, can help better describe the meanings and functions of primary adpositions in other languages (e.g. Russian), if applied as comparative concepts.

14 December

Ezequiel Koile, George Moroz

Detecting linguistic variation with geographic sampling

Abstract

Geolectal variation is often present where one language is spoken across a vast geographic area, and it can be found in phonological, morphosyntactic and lexical features. For practical reasons, it is not always possible to collect fieldwork data from every single location to capture the full pattern of variation, so we must select a set of locations to survey that reflects the underlying distribution of linguistic features. We propose and test a method for sampling the locations where a language is spoken, which finds the optimal places to include in a sample so that the resulting distribution of typological features is representative of the whole area. To this end, we apply clustering algorithms, such as k-means and hierarchical clustering, to the geographic distribution of locations and define our sample on the basis of the resulting clusters. We test the method against simulated data with different distributions of linguistic features on various spatial configurations, and against real data from Circassian dialects (Northwest Caucasian). The results show that the method is more efficient than random sampling, both for detecting variation and for estimating its magnitude, which makes it useful to fieldworkers designing their research.
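The clustering step can be illustrated with a minimal sketch. It assumes that each candidate site is a planar (x, y) coordinate pair (for example, projected longitude and latitude), that plain k-means is used, and that the site surveyed for each cluster is the real location nearest the cluster centroid. The function names and toy sites are hypothetical, and the speakers' actual procedure (which also uses hierarchical clustering) may differ.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points (seeded for reproducibility).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


def sample_sites(sites, k, seed=0):
    """Hypothetical helper: pick one site per cluster, the one nearest its centroid.

    `sites` is a list of (name, (x, y)) pairs.
    """
    coords = [xy for _, xy in sites]
    centroids, labels = kmeans(coords, k, seed=seed)
    chosen = []
    for c, centre in enumerate(centroids):
        members = [(name, xy) for (name, xy), lab in zip(sites, labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda s: math.dist(s[1], centre))[0])
    return chosen


# Toy example: two well-separated groups of villages.
sites = [("A", (0, 0)), ("B", (0, 1)), ("C", (1, 0)),
         ("D", (10, 10)), ("E", (10, 11)), ("F", (11, 10))]
print(sorted(sample_sites(sites, 2)))  # ['A', 'D']
```

Choosing an existing site nearest each centroid, rather than the centroid itself, keeps every sampled point a real location that fieldworkers can actually visit.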

7 December

Anastasiya Bonch-Osmolovskaya, Elena Klyachko, Sergey Kosyak, Lyuba Nesterenko, George Moroz, Oleg Serikov, Svetlana Toldova

Field NLP and where to find it in the School of linguistics

Abstract

In our talk we will present a new research area, which we call Field NLP. It combines several domains: the application of natural language processing methods to low-resourced languages; the creation of tools for field linguists; obtaining data on low-resourced languages from non-field sources (digitisation, social media parsing, etc.); digital preservation of data on low-resourced languages; and popularisation of such data among speakers. Some of these domains are more developed and have more prominent results than others. We will highlight existing lacunae, give an overview of known tools for Field NLP and present our own research in this field, covering automatic transliteration, segmentation, speech recognition, morphological glossing and other tasks. We believe that the only way to advance Field NLP is to work as a community. Hence, we aim to find common ground with all scholars engaged in the study and documentation of minor languages. With this in mind, we will specifically address the problem of digital preservation of linguistic data.

30 November

Dmitry Nikolaev (University of Stuttgart)

Studying language contact using neighbour graphs: From consonant-inventory prediction to analysis of segment borrowability

Abstract

The aim of this talk is to demonstrate the advantages of using geographical nearest-neighbour graphs for large-scale study of language contact. After discussing the motivation for using nearest-neighbour graphs in typological linguistics and briefly surveying the ways of constructing them, I will present two case studies. In the first one, I will show that a nearest-neighbour graph gives a flexible and efficient way of showing the importance of language contact for modelling the composition of segmental inventories of Eurasian languages; I will argue that at a certain time depth language contact becomes a better predictor of consonant-inventory structures than phylogenetics. In the second case study, I will show how SegBo, a recently presented dataset of borrowed phonemes, can be used to construct a world-wide graph of language contact and then will use this graph to model comparative borrowability of different phonemes.
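One common way to build such a graph, assumed here, is to link each language to its k nearest neighbours by great-circle distance; the talk surveys several construction methods, and this need not be the one used. The sketch assumes (lat, lon) coordinates in degrees; the language names, the choice of k and the helper `neighbour_share` (a simple neighbourhood statistic for a binary feature, such as the presence of a given segment) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def knn_graph(coords, k):
    """Link each language to its k geographically nearest languages.

    `coords` maps a language name to (lat, lon); the result maps each
    language to a list of neighbour names, nearest first.
    """
    graph = {}
    for lang, here in coords.items():
        others = sorted((other for other in coords if other != lang),
                        key=lambda other: haversine_km(here, coords[other]))
        graph[lang] = others[:k]
    return graph


def neighbour_share(graph, has_feature, lang):
    """Fraction of a language's neighbours that have a binary feature."""
    neighbours = graph[lang]
    return sum(bool(has_feature[n]) for n in neighbours) / len(neighbours)


# Toy example with four hypothetical languages along the equator.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 10.0), "D": (0.0, 12.0)}
print(knn_graph(coords, 1))  # {'A': ['B'], 'B': ['A'], 'C': ['D'], 'D': ['C']}
```

Because each language gets exactly k neighbours regardless of local density, a k-nearest-neighbour graph stays connected-looking in sparsely documented regions where a fixed distance threshold would leave languages isolated.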

23 November

Valentin Gusev (Institute of Linguistics, Russian Academy of Sciences)

Towards a reconstruction of ancient contacts between the languages of Northern Siberia

Abstract

The languages of Northern Siberia share a number of interesting areal features, some of them typologically non-trivial. The talk will examine some of these features and show how the languages can be grouped into clusters on their basis (several such groupings are possible, depending on which features are considered), and which of them show unexpected geographic parallels. At least some of these parallels are very likely evidence of ancient contacts.

16 November

Anastasia Panova

Chaplinsky and other Yupik languages of Chukotka: sociolinguistic situation and a case study in grammar

Abstract

In this talk I will present the results of my field trip to Chukotka in October 2021. First, I will show a small corpus of narratives and songs in Yupik languages that I collected during my fieldwork. Second, I will present sociolinguistic data: following Dobrushina (2013), I used the method of retrospective family interviews and gathered first-hand data on the language repertoires of Yupik people (namely, their knowledge of other Yupik languages, Chukchi, Russian and English), the history of Yupik-Chukchi relations, and the history of relations between speakers of Chaplinsky (Central Siberian) Yupik in Novoje Chaplino and on St. Lawrence Island (USA) (cf. Morgounova 2007). Third, I will describe constructions with wordforms containing the suffixes -st (CAUS), -sq ‘ask’, -nəχsiʁ ‘expect’ and -niq ‘say’ in Chaplinsky Yupik. As has been noted for the reportative suffix -niq ‘say’ (Vakhtin 2007: 109–115), wordforms with ‑niq can be analyzed as consisting of two predicates: the matrix predicate ‘say’ and the dependent predicate. I develop this analysis and argue that constructions with all four suffixes are instances of morphologically bound complementation (Maisak 2016, Panova 2020).

References

Dobrushina, N. 2013. How to study multilingualism of the past: Investigating traditional contact situations in Daghestan. Journal of Sociolinguistics 17(3). 376–393.
Maisak, T. A. 2016. Morphological fusion without syntactic fusion: The case of the “verificative” in Agul. Linguistics 54(4). 815–870.
Morgounova, D. 2007. Language, identities and ideologies of the past and present Chukotka. Études/Inuit/Studies 31(1-2). 183–200.
Panova, A. B. 2020. Morfologicheski svyazannaya komplementatsiya v abazinskom yazyke. Voprosy Jazykoznanija 4. 87–114.
Vakhtin, N. B. 2007. Morfologiya glagol’nogo slovoizmeneniya v yupikskikh (eskimosskikh) yazykakh. St. Petersburg: Nestor.

9 November

George Moroz

Comparing cross-language phonological profiles

Abstract

This talk considers different strategies for comparing the phonological profiles of languages. Such comparison can be useful for studying related lects (dialectology), unrelated lects (phonological typology), different diachronic stages of the same lect (historical linguistics), models of language acquisition and loss, some NLP tasks, etc. I discuss two strategies for comparing phonological profiles: the complexity-based approach and the distance-based approach. In the first approach, researchers propose different ways of calculating phonological complexity (Nichols 2009; Maddieson 2009; Coupé et al. 2009), which can then be used in cross-language comparison (for criticism of this approach, see Simpson 1999; Deutscher 2009; Ohala 2009). In the second approach, scholars apply different measures of the distance between languages based on phonology (Heeringa 2004; Eden 2018; Anderson et al. 2021). Two methods are used in the distance-measurement literature:

* parametric approach: different feature sets (segment inventory, feature inventory, typological phonological features such as stress and syllable structure) are used to calculate distances;
* cross-entropy approach: entropy is computed over samples of language data (a corpus or a dictionary).

References

Anderson, C., Tresoldi, T., Greenhill, S. J., Forkel, R., Gray, R. D., and List, J.-M. (2021). Measuring variation in phoneme inventories (preprint v1). Research Square.
Coupé, C., Marsico, E., and Pellegrino, F. (2009). Structural complexity of phonological systems. In Approaches to phonological complexity, pages 141–170. De Gruyter Mouton.
Deutscher, G. (2009). “Overall complexity”: a wild goose chase? In Language complexity as an evolving variable, pages 243–252. Oxford University Press.
Eden, S. E. (2018). Measuring phonological distance between languages. PhD thesis, University College London.
Heeringa, W. J. (2004). Measuring dialect pronunciation differences using Levenshtein distance. PhD thesis, University Library Groningen.
Maddieson, I. (2009). Calculating phonological complexity. In Approaches to phonological complexity, pages 83–110. De Gruyter Mouton.
Nichols, J. (2009). Linguistic complexity: a comprehensive definition and survey. In Language complexity as an evolving variable, pages 110–125. Oxford University Press.
Ohala, J. J. (2009). Languages’ sound inventories: the devil in the details. In Approaches to phonological complexity, pages 47–58. De Gruyter Mouton.
Simpson, A. P. (1999). Fundamental problems in comparative phonetics and phonology: does UPSID help to solve them? In Proceedings of the 14th International Congress of Phonetic Sciences, volume 1, pages 349–352.
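To make the two families of distance measures concrete, here is a minimal sketch. It assumes Jaccard distance over segment inventories as one possible parametric measure, and a unigram segment model with add-one smoothing for the cross-entropy approach, with each segment represented as one string item. The function names and toy data are hypothetical; the talk does not specify these measures, and the cited works use their own, often more elaborate, ones.

```python
import math
from collections import Counter


def jaccard_distance(inventory_a, inventory_b):
    """Parametric-style distance: share of segments not common to both inventories."""
    a, b = set(inventory_a), set(inventory_b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def cross_entropy(sample_p, sample_q):
    """Bits per segment needed to encode sample_q with a unigram model of sample_p.

    Add-one (Laplace) smoothing over the joint segment alphabet keeps
    segments unseen in sample_p from getting zero probability.
    """
    alphabet = set(sample_p) | set(sample_q)
    counts = Counter(sample_p)
    total = len(sample_p) + len(alphabet)
    return -sum(math.log2((counts[s] + 1) / total) for s in sample_q) / len(sample_q)


print(jaccard_distance({"p", "t", "k"}, {"p", "t", "q"}))  # 0.5
print(cross_entropy(list("aabb"), list("ab")))  # 1.0
```

Note that the cross-entropy measure is asymmetric (encoding language B with a model of A differs from the reverse), so a symmetric distance is usually obtained by averaging both directions.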

26 October

Susanne Michaelis (Max Planck Institute for Evolutionary Anthropology)

Avoiding bias in comparative creole studies: Stratification by lexifier and substrate

Abstract

One major research question in creole studies has been whether the social and diachronic circumstances of creolization are unique and, if so, whether this uniqueness also leads to unique structural changes, reflected in a unique structural profile. Some creolists have claimed that the answer to both questions is indeed yes, e.g. Bickerton (1981), McWhorter (2001) and, more recently, Peter Bakker and Ayméric Daval-Markussen. But these authors have generally overlooked the fact that cross-creole generalizations require representative sampling, especially in quantitative work. Sampling for genealogical and areal control has been much discussed in worldwide typology, but not yet in comparative creolistics. In all available comparative creole studies, European-based Atlantic creoles are strongly overrepresented, so that typical features of these languages, e.g. serial verbs, double-object constructions or obligatory overt pronominal subjects, are taken as “pan-creole” features. But many of these Atlantic creoles share the same genealogical/areal profile, i.e. European (lexifier) + Macro-Sudan (substrate). I therefore propose a new sampling method that controls for the genealogical/areal relatedness of both the substrate and the lexifier, which I call “bi-clan” control (where “clan” is a cover term for linguistic families and convergence areas).

19 October

Natalya Stoynova

Assessing inter-speaker variation in contact-influenced Russian

Abstract

In this talk, I will deal with the Russian speech of older speakers of Nanai and Ulcha (Southern Tungusic, Amur region). There is considerable inter-speaker variation: some bilingual Nanais and Ulchas speak a “near-pidgin” Russian variety, while the speech of others does not differ greatly from the monolingual benchmark. The data come from the Corpus of contact-influenced Russian of Northern Siberia and the Russian Far East (http://web-corpora.net/ruscontact/corpus.html), a small spoken corpus with manual annotation of contact-induced grammatical features (non-standard agreement, non-standard argument encoding, etc.). Based on this annotation, I will try to assess the inter-speaker variation attested in the corpus. On the one hand, I will show which contact-induced features appear to be more stable, i.e. equally represented in texts produced by different speakers, and which contribute most to inter-speaker variation, as well as which features behave similarly, i.e. are equally frequent or infrequent in texts produced by the same speakers. On the other hand, I will discuss how speakers group together according to the contact-induced features typical of them, whether these clusters of speakers correlate with any sociolinguistic parameters, and whether they match the researcher’s intuition or look surprising. An additional motivation for this study is methodological: I will test how precisely the existing corpus annotation captures the degree of deviation from the monolingual benchmark and the inter-speaker variation.

5 October

Daniil Ignatiev, Nick Howell, George Moroz

Computational processing of Bagvalal morphology: problems and future tasks

Abstract

Bagvalal is a minority language of the Nakh-Daghestanian language family. Like many indigenous languages, Bagvalal lacks tools for computational processing of language data. While field researchers have accumulated a relatively large amount of linguistic data in documentation projects, it is still insufficient for statistical approaches to text processing to be applied. The talk discusses a rule-based technology for text processing that was successfully used to design a prototype morphological glosser for the Kwanada dialect of Bagvalal. Lack or insufficiency of certain types of lexical and grammatical data, to be discussed in the talk, complicates further tuning of the instrument as well as its application to other Bagvalal dialects. However, further work on the analyzer could facilitate fieldwork and make it possible to design a machine translation system for Bagvalal.

28 September

Maksim Melenchenko, Aigul Zakirova

Several aspects of numeral morphology in the languages of Dagestan

Abstract

In this talk we will present new maps for the Typological Atlas of the Languages of Dagestan, covering several topics in the numeral morphology of the East Caucasian languages. We examine the numeral markers appearing in different series (cardinals, ordinals, distributives, etc.) and their diachronic sources. We also address differences in the structure of complex numerals, e.g. the inventories of linking suffixes and the repetition of cardinal markers inside complex numerals. Finally, we will discuss several instances of borrowing in numeral systems, both lexical and morphological.

21 September

Pierpaolo di Carlo, Jeff Good (University at Buffalo)

Exploring socio-spatial networks and individual-based variation in the study of small-scale multilingualism

Abstract

This talk presents the initial results of research by a number of members of the KPAAM-CAM multidisciplinary team (including linguists, sociolinguists, anthropologists, and geographers) aiming to explore multiple methods and datasets in the study of small-scale multilingualism. The testbed is Lower Fungom, a rural area of western Cameroon where small-scale multilingualism has been widely documented. In the talk, we will present (i) epistemological issues posed by contexts of small-scale multilingualism and the methodological responses we have put in place to address them, mainly concerning the need to explore individual-based variation; (ii) initial findings from the study of individual-based wordlists by applying tools originally designed for cognate detection for historical linguistic purposes to questions of synchronic variation, and (iii) the correlations that such lexicostatistical data have with geographic distance vs. travel difficulty between locales associated with distinct languages.

14 September

Ezequiel Koile, Ilya Chechuro, George Moroz, Michael Daniel

Geography and language divergence: the case of Andic languages

Abstract

We study the correlation between phylogenetic and geographic distances for the languages of the Andic branch of the East Caucasian (Nakh-Daghestanian) family. For several alternative phylogenies, we find that geographic distances correlate with linguistic divergence. Notably, qualitative classifications fit the geography better than cognacy-based phylogenies do. We suggest that this better fit may be due to an implicit geographic bias in qualitative classifications, and conclude that approaches to classification not based on cognacy run the risk of implicitly including geography and geography-related factors among the bases of genealogical classification.
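The abstract does not say how the correlation between the two kinds of distance was measured. A standard choice for comparing two distance matrices is the Mantel test, sketched here under the assumption that phylogenetic and geographic distances are given as symmetric matrices over the same languages in the same order. The function names and the toy matrix are hypothetical, and this is not presented as the authors' method.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def upper_triangle(matrix, order):
    """Entries above the diagonal, with rows/columns relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Mantel test: correlation of two distance matrices and a one-sided permutation p-value."""
    n = len(dist_a)
    identity = list(range(n))
    xs = upper_triangle(dist_a, identity)
    observed = pearson(xs, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute language labels of the second matrix
        if pearson(xs, upper_triangle(dist_b, perm)) >= observed:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return observed, (hits + 1) / (permutations + 1)


# Toy example: four hypothetical languages on a line; a matrix compared with itself.
positions = [0, 1, 3, 6]
geo = [[abs(a - b) for b in positions] for a in positions]
r, p = mantel(geo, geo)
print(round(r, 6))  # 1.0
```

Permuting whole rows and columns together, rather than individual cells, preserves the internal structure of each matrix, which is why the Mantel test is preferred over a plain correlation of the non-independent pairwise distances.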

7 September

George Moroz, Timofey Mukhin, Chiara Naccarato, Samira Verhees

Update: Typological Atlas of the Languages of Daghestan

Abstract

In this talk we introduce recent updates to the Typological Atlas of the Languages of Daghestan, which include new topics and new visualizations. We would also like to use this opportunity to discuss how to turn the atlas into a resource whose chapters and data are easy to use, cite and find on the one hand, and easy to edit and update on the other. During the talk we will also present a new phonological database of East Caucasian languages and the patterns it reveals. We will discuss the distribution of the following phonological features: inventory size, gemination, labialisation, laterals, nasal vowels and long vowels, and briefly discuss the correlation between elevation and inventory size (apologies to those of you who have already seen this at the SLE conference).

15 June

Polina Nasledskova, Tatiana Philippova

Postpositions in Nakh-Daghestanian

Abstract

In this brief talk we will report on our ongoing project devoted to a general description of postpositional systems in the Nakh-Daghestanian languages. In particular, we shall talk about their case government properties and the ability to function as adverbs. At the end we will present ideas concerning our prospective contribution to the Typological Atlas of the languages of Daghestan.

George Moroz

Comparative Andic dictionary database: history of creation

Abstract

Over the last two years, we have worked together with Arseniy Averin, Anastasia Davidenko, Ilya Sadakov, Zlata Shkutko, Grigory Kuznetsov, Anna Tsysova and Wanshu Zhang on the digitisation of Andic dictionaries. While compiling the database, we also worked on several subprojects on comparative phonology, colexification and the morphology of plural noun forms. In the talk I would like to present the database and briefly discuss some preliminary results of this research.

8 June

Anastasia Panova

Towards a typology of continuative expressions

Abstract

This study investigates how continuative semantics is encoded cross-linguistically. It is based on two independent language samples: a sample with global coverage and an intragenealogical sample of four Northwest Caucasian (Abkhaz-Adyge) languages. The cross-linguistic sample is genealogically and geographically balanced and includes 120 languages. Means that convey continuative semantics (continuative expressions) are analyzed according to the following parameters: morphosyntactic type (affix, auxiliary, adverbial phrase), degree of grammaticalization, tense-aspect-actionality restrictions on the predicate, non-continuative uses of the continuative expressions, and semantic effects in combination with negation. The data come mainly from secondary sources (grammatical descriptions and dictionaries) and parallel texts. The second part of the study focuses on the intragenealogical typology of continuative expressions in the following Northwest Caucasian languages: Abaza, Abkhaz, Kabardian and West Circassian (Adyghe). The main sources for the Northwest Caucasian data are elicitation and parallel texts. Based on the results of the macro-typological and intragenealogical studies and their comparison, I suggest that two typological clusters, or profiles, of continuative expressions can be distinguished, predicative and adverbial, and that continuative expressions belonging to different classes show different degrees of diachronic stability.

1 June

Nikita Muravyev, Daria Zhornik

Solving the puzzle of the Ob-Ugric passive

Abstract

In this talk, we look at the active/passive voice alternation in two Ob-Ugric languages of Western Siberia, Northern Khanty and Northern Mansi. This alternation has been described in the literature as primarily motivated by information structure: a sentence is in the active voice whenever the Agent is the primary topic of the sentence; otherwise, the passive is used (Kulonen 1989, Nikolaeva 2001). However, recent text and elicitation data suggest that a purely information-structural approach has a number of shortcomings. First, the passive can be used if the Agent is topical yet low in animacy and/or definiteness. Second, focused Agents are allowed in special kinds of active sentences, e.g. in interrogative contexts. Moreover, passivization is possible with a great variety of intransitive verbs with no Agent role whatsoever, including state verbs and verbs denoting spontaneous change of state. Intransitive verbs can also be passivized in adversative contexts, in which some discourse participant external to the event is affected in some way. These facts pose a problem both for the information-structural approach mentioned above and for existing typological accounts of the active/passive alternation. We will discuss these facts in detail, compare the situation in Khanty and Mansi, and present a model that helps at least partially solve the Ob-Ugric puzzle.

References

Kulonen, U. M. 1989. The passive in Ob-Ugrian. Helsinki: Finno-Ugrian Society.
Nikolaeva, I. 2001. Secondary topic as a relation in information structure. Linguistics 39(1). 1–50.

25 May

Ilya Chechuro, Michael Daniel

Looking for areal convergence in nominal gender assignment in East Caucasian

Abstract

In this talk, we investigate whether data on nominal gender assignment in East Caucasian (more specifically, Lezgic) languages show any evidence of areal convergence. To do so, we consider those Lezgic languages and their immediate neighbours that have four-gender systems, including Budukh, Kryz, Rutul, Tsakhur and Archi, and compare them to Lak, Archi’s immediate neighbour, and Khinalug, the immediate neighbour of Kryz and Budukh. In all these languages, Gender 3 and Gender 4 are semantically heterogeneous, so shared assignment may be due to (a) common inheritance, (b) areal convergence or (c) pure chance. A quantitative analysis of gender assignment across the lexicon documented in Kibrik and Kodzasov (1990) suggests that Archi is more similar to its neighbour Lak than to any of its Lezgic cousins. No such result has been obtained by comparing Khinalug with its Lezgic neighbours Budukh and Kryz. We will discuss the various methodological refinements we attempted in order to disentangle the genealogical and areal signals and to separate both from the effect of crude semantics. These attempts were deliberately based on data external to East Caucasian (World Loanwords Database; WordNet) but so far have not been successful, so we will ask for your ideas on how to improve our methodology.

18 May

Jérémy Pasquereau (University of Poitiers)

On tense, aspect, and evidentiality in Karata (East Caucasian, Karata village variety)

Abstract

Like other East Caucasian languages, Karata has elaborate verbal paradigms, in particular because of the high number of analytic constructions it uses. On the basis of a corpus of more than 40 texts and Dahl’s (1985) TAM questionnaire, and building on previous work (Magomedbekova 1971, 1998, Magomedova & Xalidova 2001, Xalidova 2019), I present ongoing work that aims to describe the morphosyntax and the meanings of verbal forms in this language.

11 May

Samira Verhees

Karabagly - an Armenian village in Dagestan

Abstract

In this talk I will report on my two-day visit to Karabagly, a village in northern Dagestan (Tarumovsky district) that was originally mono-ethnic Armenian and presently still has a majority Armenian population. I will discuss some preliminary observations on the preservation of Armenian language and culture in the village, and the relationship of the Armenians with other local people as well as their historical homeland Armenia.

Nina Dobrushina, Michael Daniel, Kirill Koncha, Maksim Melenchenko

Tsudakhar - Lak contact: evidence from sociolinguistic field study in April 2021

Abstract

In this talk, we will briefly present the results of the sociolinguistic field study carried out in five adjacent Lak and Tsudakhar villages. We will focus on the Tsudakhar - Lak bilingualism and ethnic contacts, their main site being the Tsudakhar Monday market. Our attempt to observe communication at the Tsudakhar market will be discussed, with a brief reference to other markets of highland Daghestan. We will also mention Tsudakhar - Avar contact in the village of Karekadani.

20 April

Susanne Michaelis (Max Planck Institute for Evolutionary Anthropology)

Grammatical co-expression patterns in creoles and their parent languages: comitative and related functions

Abstract

In this talk, I will report on an ongoing project on grammatical coexpression patterns (or polysemy patterns) in creole languages and their parent languages, as illustrated in examples (1)–(4). The Seychelles Creole polysemous marker (av)ek ‘with, and, by’ (< French avec ‘with’) is used to express four different grammatical functions: comitative (1), instrumental (2), passive agent (3) and noun phrase conjunction (4).

(1) comitative
Mon ’n travay avek Sye Raim.
1SG PRF work COM Mr Rahim
‘I have worked with Mr Rahim.’ (Bollée & Rosalie 1994:14f.)

(2) instrumental
Nou fer servolan nou file ek difil.
1PL make kite 1PL let.glide with thread
‘We made a kite and let it glide with a thread.’ (Michaelis 1994:66)

(3) passive agent
Mon’n ganny morde ek lisjen
1SG.PRF PASS bite PASS.AGENT dog
‘I have been bitten by a dog.’ (Michaelis & Rosalie 2000:82)

(4) noun phrase conjunction
Mari ek Pyer
‘Mary and Peter’

When comparing the specific coexpression pattern of Seychelles Creole (av)ek with the patterns in its parent languages, it becomes clear that in French, the lexifier language, the marker avec ‘with’ covers only a subset of the meanings of the Seychelles Creole marker (av)ek, namely comitative and instrumental. By contrast, the passive agent is expressed in French by par ‘by’, and noun phrase conjunction by the coordination marker et ‘and’. However, Makhuwa and other neighboring Bantu languages of East Africa (the most important cluster of substrate languages for Seychelles Creole) show the same coexpression pattern as Seychelles Creole: the marker ni (van der Wal 2009:113) covers all four grammatical meanings found in Seychelles Creole, namely comitative, instrumental, passive agent and noun phrase conjunction. The hypothesis of the paper goes beyond Seychelles Creole: it potentially extends to all creole languages.
I suggest that grammatical coexpression patterns in creoles are not randomly distributed, but they systematically reflect the grammatical coexpression patterns of their substrate languages, and much less so those of their lexifier languages. Here I investigate 10 creole languages from around the world (genealogically maximally distinct) and their parent languages for the grammatical markers expressing comitative, instrumental, and noun phrase conjunction (and related meanings). Recent literature (e.g. Baptista 2020) suggests that “convergence” of functions (and possibly forms) of the parent languages is a major driving force for shaping creole grammars. Indeed, at first glance the coexpression pattern of a grammatical marker ‘with’ in a creole language seems to mirror overlapping, convergent grammatical meanings between its lexifier and its substrate language(s). But a closer look at the grammatical coexpression patterns of similar ‘with’-markers in genealogically different creoles and their parent languages reveals that it is the coexpression patterns of the substrates that tend to be imposed on the nascent creoles, irrespectively of the degree of convergence of the lexifier patterns with those of the substrates and/or the creole. Thus, comitative, instrumental, passive agent and np-conjunction are shared by Makhuwa and Seychelles Creole, whereas French only converges in comitative and instrumental with both Makhuwa and Seychelles Creole. References Baptista, Marlyse. 2020. Competition, selection, and the role of congruence in creole genesis and development. Language 96:1, 160-99. Bollée, Annegret and Rosalie, Marcel. 1994. Parol ek memwar. Récits de vie des Seychelles. Hamburg: Buske. Michaelis, Susanne. 1994. Komplexe Syntax im Seychellen-Kreol: Verknüpfung von Sachverhaltsdarstellungen zwischen Mündlichkeit und Schriftlichkeit. Tübingen: Narr. Michaelis, Susanne and Rosalie, Marcel. 2000. 
Polysémie et cartes sémantiques: Le relateur (av)ek en créole seychellois. Études Créoles 23. 79-100. van der Wal, Jenneke. 2009. Word order and information structure in Makhuwa-Enahara. Utrecht: Netherlands Graduate School of Linguistics.

13 April

Aigul Zakirova

From noun plural to plural agreement: evidence from Andi dialects (and beyond)

Abstract

Noun plural markers sometimes grammaticalize into markers of plural agreement on various targets: e.g. Turkic -lar (Erdal 2004: 231 for Old Turkic; Matasović 2018 for Karaim); similar developments can be postulated for Adyghe -xe (Lander et al. forthc.) and Nivkh -ɣun (Gruzdeva forthc.). To my knowledge, the grammaticalization of noun plural marking into plural agreement marking on other types of targets has not been dealt with in the typological literature. One way to fill this gap would be to describe scenarios of such evolution in particular languages and language groups. Andi (Avar-Andic < Avar-Ando-Tsez < East Caucasian) presents an interesting case of grammaticalization of a plural marker -(V)l into a number agreement marker. I will address the question of how this mechanism of number agreement might have evolved. -(V)l is most probably the reflex of *li, one of the reconstructed Proto-Andic plural markers (Alexeyev 1988: 92–93). Whereas related Andic languages have an extensive list of plural markers that have hardly anything in common, in most Andi dialects -(V)l was generalized as a nominal plural marker. The next step was the extension of -(V)l onto other word forms, i.e. targets of agreement, both inside the NP and onto verbal forms and adverbs. The behavior of -(V)l on different types of targets will be considered in order to arrive at a plausible scenario. In a more descriptive vein, I will compare -(V)l agreement with the more “canonical” gender agreement, also present in Andi. Finally, I will briefly consider examples of similar developments in related East Caucasian languages.

6 April

Thomas Wier (Free University of Tbilisi)

Squaring the circle in the Caucasus: Perspectives on Sprachbünde and Language Contact

Abstract

Linguists have long noted both the exceptional internal diversity of the Caucasus and the fact that many of the features of its languages are not found in immediately adjacent regions of Eurasia. Over the last two centuries, the question has thus arisen more than once: to what extent do these unusual features arise from language contact, and to what extent can they be explained by other (phylogenetic, typological, or indeed statistically random) factors? In this lecture I will review three different sets of answers that have been proposed: Klimov (1965, 1973), Tuite (1998), and Chirikba (2008). After reviewing these arguments, I will suggest that while autochthonous Caucasian languages do share a quantitatively large number of phonological and morphosyntactic traits, qualitative similarities are more probative in answering the question of whether the region constitutes a true Sprachbund, and that a better approach might be to distinguish micro- and macro-Sprachbünde.

30 March

Oleg Belyaev

Contact influences on Ossetic: A general overview

Abstract

In many ways, Ossetic has a unique status among the languages of the Caucasus. Belonging to the Iranian branch of the Indo-European language family, Ossetic is the last living representative of the Sarmatian varieties once widely spoken in the northern Black Sea region. Having long developed in isolation from other Iranian languages, Ossetic has, on the one hand, preserved a number of archaic features and, on the other hand, developed unique innovations, some of which may be explained by language contact. Although the Ossetic lexicon is mainly of Iranian origin, it contains a comparatively large share of loanwords from neighbouring languages, many of them in the basic lexicon. In phonology, a key contact-induced feature is the presence of ejective consonants, mainly in Caucasian loanwords. Some grammatical features of Ossetic (word order, case system, structure of complex clauses) may also be contact-induced. Ossetic data are therefore valuable both for the typology of language contact and for the study of early contacts between Ossetians/Alans and other ethnolinguistic groups. In the talk, I will provide a general overview and discussion of lexical and grammatical features of Ossetic that may be contact-induced, together with a preliminary analysis of which contact situations could have led to these results.

23 March

Anastasia Panova, Michael Daniel

Linguistic complexity across East Caucasian: from the eye of the beholder to corpus based measures

Abstract

Measuring complexity in typology is considered relevant for the sociolinguistic take on language diversity, connecting the complexity of language structures to such diverse but correlated factors as language size, relative isolation, L2 acquisition, and the multilingualism of its speakers. Yet on the empirical side, measuring complexity is difficult not only because the measures are sometimes calibrated in what may seem an arbitrary way, but also, and no less importantly, because they depend on the analysis adopted in a grammar. For example, Kibrik (1977) counts over a million synthetic verbal forms in Archi, but excluding the verificative and especially the quotative ‘series’ from inflectional morphology dramatically reduces this abundance. Similarly, measuring phonetic complexity based on the size of inventories may yield different results depending on the approach: the status of [x] in Archi (only in Russian loans) is very different from its status in Rutul (native lexicon), and rare allophones (Archi [ɮ]) and variants (Mehweb [ɣ]) sometimes do and sometimes do not make their way into descriptive inventories. Including or excluding them could in theory influence the outcome of a quantitative comparison, and the size of this impact is not obvious. One way to avoid this problem would be (i) to use shallow counts that minimize the analytical impact of language descriptions and (ii) to make counts in corpora rather than deriving them from descriptive grammars. In this talk, after a brief survey of existing corpus-based approaches to measuring language complexity, we discuss several experiments we carried out to measure morphological and phonetic complexity across unannotated corpora of the languages of Daghestan. We (dual, exclusive) very much look forward to feedback and suggestions on how to develop this approach further.

16 March

Alexandra Vydrina

Multilingualism as a genre-structuring strategy: the case of Kakabe traditional narratives

Abstract

Various West African language communities show a specific type of code-switching that is limited to the genre of traditional narratives: songs that appear in such narratives regularly include passages in a language different from the principal language of the narration. This type of conventionalized multilingualism recurs across the languages of West Africa. So far, however, it has never been the object of systematic investigation. In my presentation, I will analyze this type of multilingual practice based on data from 70 Kakabe traditional narratives, investigating the specific mechanism of switching from one language to the other and its relation to the wider context of the type of multilingualism found in this speech community.

9 March

Manuel Padilla-Moyano (University of the Basque Country & Linguistic Convergence Laboratory, HSE)

Revisiting motion events in Basque

Abstract

Asymmetries in spatial relations have been described cross-linguistically [Stefanowitsch & Rohde 2004; Luraghi, Nikitina & Zanchi 2017; Kopecka & Vuillermet 2021]. Basque has a set of spatial cases in which the ablative encodes Source and Path and the allative encodes Goal; in addition, there are directional and terminative case markers. In some dialects this general picture becomes more complicated, and historical records add further complexity, such as an ancient dedicated perlative marker [Lafon 1948]. As Basque spatial cases can mark animacy, asymmetries in the encoding of motion events must also take this parameter into account [Creissels & Mounole 2011; Krajewska 2021]. I will present an incipient study of the Source-Goal asymmetry, which will form part of comprehensive research on the evolution of the Basque case system. Building on Zaika’s study [2016], I will analyze the behavior of verbs of motion, putting and posture, as well as the case markers and non-grammaticalized postpositions that occur with them. This work will take into account dialectal variation, diachronic factors, and the role of language contact. To this end, I will use existing corpora and other materials, and collect new data through fieldwork with speakers of several dialects.

References

Creissels, Denis & Mounole, Céline (2011). Animacy and spatial cases: Typological tendencies, and the case of Basque. In Seppo Kittilä, Katja Västi & Jussi Ylikoski (Eds.), Case, Animacy and Semantic Roles (Typological Studies in Language 99), pp. 157–182. Amsterdam/Philadelphia: John Benjamins.
Kopecka, Anetta & Vuillermet, Marine (2021). Source-Goal (a)symmetries across languages. Studies in Language 45(1).
Krajewska, Dorota (2021). The marking of spatial relations on animate nouns in Basque: a diachronic quantitative corpus study [submitted to Journal of Historical Linguistics].
Lafon, René (1948). Sur les suffixes casuels -ti et -tik. Eusko Jakintza 2, 141–150.
Luraghi, Silvia; Nikitina, Tatiana & Zanchi, Chiara (Eds.) (2017). Space in Diachrony. Amsterdam/Philadelphia: John Benjamins.
Stefanowitsch, Anatol & Rohde, Ada (2004). The goal bias in the encoding of motion events.
Zaika, Natalia (2016). Вариативность падежных форм при глаголах движения в баскском языке в диахроническом и диалектном аспектах [Variation of case forms with verbs of motion in Basque: diachronic and dialectal aspects]. Acta Linguistica Petropolitana 12(1), 428–441.

4 March

Natalia Kuznetsova

Rare features in phonological typology

Abstract

The talk will touch upon theoretical aspects of existing and emerging accounts of rare features in phonological typology in general and in word-prosodic typology in particular. Rarities can be ignored by linguistic theory, reanalysed as regular, or incorporated by changing the theory. Phonological rara and rarissima used to be mostly ignored or reanalysed, but the trend seems to be changing, with ever more data coming in from lesser-studied languages, on the one hand, and a growing interest within linguistic typology in the geographic and evolutionary aspects of the cross-linguistic distribution of linguistic features, on the other.

Giuliano Castagna

Modern South Arabian: archaism, innovation and contact in the Arabian peninsula

Abstract

It has long been known that the Modern South Arabian subgroup of the Semitic language family, made up of six endangered languages spoken in Oman and Yemen, exhibits a set of characteristics regarded by Semitic scholars as archaic, such as large sound systems including lateral fricatives and affricates as well as glottalised stops and affricates, and productive subjunctive and conditional moods. Other characteristics may be reminiscent of classical Semitic languages, such as the reverse gender agreement between numerals and nouns and the presence of second and third person feminine and dual pronouns. However, certain other features of these languages have not been analysed in detail in the mainstream Semitic literature, and some have not been discussed at all: for example, the presence of a first person dual pronoun, and the apparently non-Semitic character of a sizeable part of the Modern South Arabian lexicon. Moreover, the unexplained relationship between the Modern South Arabian languages and a large number of undeciphered epigraphs found mostly in caves and on rocks and boulders calls for further study. These epigraphs employ a modified version of the South Semitic script and are found not only in the present-day range of Modern South Arabian but also further north-east, in Oman proper. This presentation aims to provide a general introduction to the Modern South Arabian languages, to highlight the above-mentioned issues, and to advance some working hypotheses.

2 March

Tim Zingler

Form and function in morphological typology

Abstract

The goal of linguistic typology is to understand the interactions of form and function in the languages of the world. Typically, investigations conducted in this research paradigm take a certain functional domain (e.g. ‘causative/applicative’) as the starting point and then analyze the formal means by which it is expressed. In this talk, though, I will argue that typology can also benefit from the opposite approach, that is, focusing on a specific type of linguistic form (e.g. infixation) and analyzing which functions it encodes. The major advantage of the latter strategy is that linguistic forms are ultimately less variegated than linguistic functions, which facilitates comparison. This strategy will then allow typologists to develop a more nuanced theory of morphology and to account for areal patterns that manifest themselves in the distribution of linguistic forms. To support these claims, I will draw on novel research on the suffixing preference.

Riccardo Giomi

A Functional Discourse Grammar typology of reflexives, with some notes on reciprocals

Abstract

This chapter presents the first-ever Functional Discourse Grammar typology of reflexives and opens the way to a comparable typology of reciprocals. The main finding of the chapter is that the striking morphosyntactic diversity of reflexive markers can be reduced to only three basic classes, which differ in the structure of the predication frame on which the construction is built. In Type I reflexives, the lexical predicate takes two coindexed arguments; Type II reflexives are based on a one-place frame in which the predicate bears a reflexive (or reflexive/reciprocal) operator; finally, Type III reflexives are characterized by the presence of a configurational predicate which takes both an external and an internal argument. All further differences are explained with reference to different ways of aligning the underlying pragmatic and semantic structures of each construction type, more specifically the number and information-structural status of referents at the Interpersonal Level and the number and structural position of verb arguments at the Representational Level. A further advantage of the proposed typology is that it accounts for possible differences in the lexical distribution of reflexive markers on the basis of the notion of partially instantiated predication frames, i.e. partially lexicalized constructional templates of the Representational Level.

Julie Marsault

The prefixal template of Umóⁿhoⁿ: case study of the “dative” prefix

Abstract

Umóⁿhoⁿ (Siouan), a highly endangered Native American language spoken in the United States, has a very complex verbal morphology, in particular a series of arbitrarily ordered derivational and inflectional prefixes. After a brief introduction to the language, I will present the prefixal template of the verb and then focus on the prefix gí-, usually called “dative”. The case study of gí- covers several key issues of Umóⁿhoⁿ morphology: (1) a change of slot of person marking triggered by the presence of other prefixes; (2) multiple exponence of the dative and of person marking; (3) semantic demotivation and lexicalization of the prefixes. Building on these observations, I will show that the dative prefix exhibits both inflectional and derivational characteristics.

16 February

Timur Maisak

Towards the Nakh-Daghestanian Lexicon of Grammaticalization

Abstract

What will be discussed in the talk is not an accomplished or even an ongoing project, but rather a general idea of creating a lexicon of grammaticalization for Nakh-Daghestanian languages. I will start with an overview of existing lexicons of grammaticalization (of which there are very few) and how they can serve as a source of inspiration for the Nakh-Daghestanian Lexicon. I will then present sample entries of the future Lexicon and mention the choices and problems one faces when creating such a lexicon. Comments and suggestions from the audience will be most welcome.

9 February

Tatiana Philippova

Adpositions and case: Categorial issues

Abstract

This talk will address the categorial status of case markers and adpositions from a cross-linguistic perspective. I will present some major research questions arising in this respect, including the following:

• How do we approach the case/adposition delineation problem in languages with case suffixes and postpositions?
• Is it feasible to posit a cross-linguistically uniform category of postpositions? If so, is it fundamentally distinct from that of prepositions?
• Can we meaningfully compare language-specific categories of case and adpositions across languages?

Having introduced these questions, I will give an overview of the state of the art in research on this and related topics. Please note that this will be an overview talk rather than one showcasing the results of my own research, and I expect that there will be plenty of room for discussion!

2 February

Aleksandra Trepalenko, Timur Maisak

Towards the corpus of Bagwalal dialects

Abstract

Bagwalal is a small and underdescribed language of the Avar-Andic branch of the Nakh-Daghestanian family. After a general introduction to the language and the history of its study (Timur Maisak), we will present the ongoing project on glossing Bagwalal dialectal texts (Aleksandra Trepalenko). The texts, first published in Gudava’s (1971) grammar in Georgian, represent all six villages where Bagwalal is spoken. We will present the results of our analysis of the texts (glossing, translation), outline the main dialectal differences, and describe some interesting features of Bagwalal and the problems we faced during our work.

26 January

Sofia Oskolskaya

On typology of caritive constructions

Abstract

In the talk I will present the project “Grammatical periphery in the languages of the world: a typological study of caritives”. The caritive (aka abessive) expresses the non-involvement of a participant in a situation, with the non-involvement predication semantically modifying the situation or a participant of a different situation, as in English Mary came without John / money. The project aims to study the means of expressing caritive meanings in the languages of the world. We developed a questionnaire and collected data from a representative sample of 100 languages. I am going to discuss the methodology of the project: the definition of the caritive, the questionnaire, and the data collection procedure. The project is still in progress, but I will present some preliminary results.

19 January

Chingduang Yurayong (Mahidol University)

Postposed -to in North Russian dialects through the lens of Finnic languages in contact

Abstract

The use of demonstrative-derived morphemes in the head-following position is characteristic of North Russian dialects (-to and its variants -ta, -tu, -ti, -te, …) and eastern Finnic languages (-se [singular] and -ne [plural]), such as Olonets Karelian, Lude, and Veps. In terms of function, some previous studies regard these grammatical elements as definite articles, while other recent studies identify additional functions related to information structure and discourse. Given that an equivalent construction is not observed in Belarusian and Ukrainian, and that -to in other Russian dialects mostly remains invariable, several studies propose that declinable -to in North Russian could have resulted from language contact with Uralic-speaking populations, particularly Finnic speakers, who adopted Russian as their second language. Of the many aspects of this research question, this presentation will focus on the contexts of use of -to in North Russian and -se in Finnic from the perspectives of referentiality, information structure, and evaluation. Potential development paths will also be discussed with attention to the Russian-Finnic contact scenario over the past millennium.

12 January

Timofey Mukhin, Chiara Naccarato, Samira Verhees

Dagatlas Update

Abstract

We will give a short update about the project Typological Atlas of Daghestan covering the progress we have made so far. We will discuss our plans to publish the resource and a (partially) new approach to data visualization.

Polina Nasledskova

Borrowed postpositions in East Caucasian

Abstract

Grammar descriptions of East Caucasian languages include information about borrowed postpositions. I attempt to summarize the data on borrowed postpositions, both between branches of the East Caucasian family and into East Caucasian from languages of other families. I suggest contact origins for some postpositions whose diachrony is unclear from the sources. I will also provide an overview of the typology of borrowed postpositions. My ultimate aim is to correlate the borrowing of postpositions in East Caucasian with other contact-induced changes in the languages of the family. This presentation is a preview of a study, not a final analysis of the data.

Seminar schedule 2020

22 December

Sergey Say

Using BivalTyp (www.bivaltyp.info) for measuring (dis)similarities between valency class systems

Abstract

The goal of my presentation is two-fold. In the first part, I am going to introduce BivalTyp (www.bivaltyp.info), a typological database of bivalent verbs and their encoding frames. This database contains information on the ways in which 130 bivalent contextualized predicates (such as ‘be afraid’, ‘listen’, ‘touch’) are assigned to valency classes in 85 languages (mostly spoken in Eurasia). This part of the presentation will be user-oriented, i.e., I will focus on the ways data are processed, stored and visualized in the database. In the second part, I will briefly discuss how BivalTyp enriches our knowledge of the ways in which arguments select encoding devices (such as cases, adpositions and verb indices) in individual languages. In particular, I will argue that the very partition of verbs into valency classes can be used as a justified tertium comparationis in cross-linguistic studies of argument encoding. I will also introduce some distance metrics that can be applied to the data from BivalTyp and will discuss genealogically and areally determined similarities between valency class systems in the languages of the sample.

15 December

Ksenia Shagal (University of Helsinki)

Multifunctional non-finites in Northern Eurasia

Abstract

In this talk, I am going to discuss patterns of multifunctionality that are characteristic of non-finite forms in 50 languages of Northern Eurasia. Specifically, non-finite forms are investigated in terms of the inventory of functions each of them can perform when heading a subordinate clause: reference function (complement clauses), adnominal modification (relative clauses), and adverbial modification (adverbial clauses). The primary questions I will address are the following: (a) What patterns of multifunctionality in non-finites are most common and how are they distributed geographically across Northern Eurasia? (b) Do patterns of multifunctionality differ depending on how prominent non-finite subordination is in a language? (c) Are there any recurrent patterns involving specific constructions, and if yes, can we propose an explanation for their occurrence?

8 December

Ilya Chechuro

Good Practices for Linguistic Data

Abstract

This talk is devoted to the practices that make linguistic data findable, accessible, interoperable, and reusable (FAIR). First, I will introduce some general guidelines for data structures, file formats, and data description. Then I will touch upon issues related to orthographic systems and discuss the problem of orthographic ambiguity. The first part will conclude with a discussion of Cross-Linguistic Data Formats (CLDF) and meta-databases such as Glottolog, CLLD and Concepticon. Together these tools form a framework that attempts to facilitate data standardization and sustainable storage. The second part of the talk will deal with data sharing. I will propose several tools for increasing the reproducibility of programming code. I will also discuss version control with Git and licensing of academic work. Finally, I will briefly introduce tools that are useful when submitting a paper: Open Science Framework (OSF.io) and Zenodo.

1 December

Ekaterina Rakhilina, Tatiana Reznikova, Daria Ryzhova

Lexical systems with systematic gaps: verbs of falling

Abstract

The paper presents the results of a project on the cross-linguistic analysis of FALLING verbs in more than 40 languages. The main possible oppositions and patterns of colexification in lexical systems are described in the framework of the Moscow Lexical Typology Group (Rakhilina, Reznikova 2016). Although this semantic field appears to be rich in most languages, our research did detect language systems without dedicated verbs of falling. We argue that these cases are neither accidental nor culture-specific, but can be seen as following from some fundamental semantic principles.

17 November

Ekaterina Kapustina

How Daghestani translocal communities function under conditions of internal migration within Russia

Abstract

The talk analyzes the present-day organization and functioning of Daghestani rural communities whose members take part in internal migration within Russia (migration to the cities of Western Siberia is taken as an example). The concepts of transnationalism and translocality were chosen as a theoretical lens, since they make it possible to view migrants and their social world without separating them from their sending community, the jamaat. An orientation towards preserving the priority of the rural locality when moving beyond the village and the Republic of Daghestan, together with the maintenance of translocal ties, gives rise to a new social organism, the multilocal community; the talk will address the specific ways in which communities of this kind function. The study is based on the author’s field material collected in the cities of the Khanty-Mansi Autonomous Okrug and in the Republic of Daghestan in 2011–2019.

10 November

Damian Blasi

An ancient history bottleneck for linguistic diversity and its consequences for linguistic typology

Abstract

In this presentation I will discuss the relation between linguistic diversity and basic units of human organization in pre-agricultural, nomadic and forager societies. On the basis of those patterns I will discuss existing hypotheses on the previous stages of linguistic diversity (from the early Holocene until today), and I will provide evidence for a relatively brief period of massive linguistic diversity from 4 to 1 kybp (thousand years before present). I will conclude by spelling out the practical consequences of this finding for typological and historical linguistic generalizations.

3 November

Anna Azanova

Clitics li and chi in Rogovatoye and Spiridonova Buda dialects: functions and positional properties

Abstract

In the dialects that I will be talking about, the repertoire of function words such as conjunctions and particles differs considerably from that of standard Russian. For instance, they have a clitic chi, which is considered to have the same functions as Russian li. My first research question is therefore how the functions are distributed between these quasi-synonymous clitics in the dialects that have both of them. Secondly, I want to talk about their positional properties depending on their function. There have been many studies of Slavic clitics in the standard languages, but none (as far as I know) has considered dialect data, and that is what I have tried to do.

27 October

Nina Dobrushina

Optatives in Nakh-Daghestanian and beyond

Abstract

Inflectional optatives, i.e. dedicated forms expressing the speaker’s wish, are typical across the Caucasus. In this talk, I give an overview of optatives in Nakh-Daghestanian languages and discuss their possible diachronic sources and grammaticalization paths. I also argue that contact is one reason for the areal spread of optatives and suggest their prominent role in everyday discourse as a possible reason for this spread.

20 October

Chiara Naccarato

The standard of comparison in the languages of Daghestan

Abstract

In this talk I will present the results of my research on the standard of comparison in the languages of Daghestan, which was started as part of the DagAtlas project (the “Typological Atlas of the Languages of Daghestan”). In the languages of Daghestan, the standard of comparison is usually expressed by a spatial form, i.e. an inflected form of a nominal normally expressing a spatial relation. In this study, I classify the languages of Daghestan according to the type of spatial form used to mark the standard of comparison. Following the methodological approach of the DagAtlas project, I collected the data from the available literature and built maps to visualize the results. The results are discussed both in terms of frequency and distribution within the linguistic area under investigation and in comparison with broader typological investigations of comparative constructions (Stassen 1985, 2013), which include almost no data from Daghestan. The latter comparison does not reveal any surprising findings: the Daghestanian data fit the cross-linguistic picture quite well (with a general preference for elative markers). Within Daghestan, the overall picture is somewhat fuzzy, and the distribution of values on the maps does not reveal any noteworthy areal or genealogical clustering. An exception is the Andic languages, which form a cluster based on the localization marker employed (forms in -č’- indicating contact with some entity).

13 October

Natalya Stoynova

Inter-speaker variation in code-switching in the situation of language shift. The case of Nanai and Ulch

Abstract

In this talk I will present some quantitative data on different structural types of code-switching attested in oral texts in Nanai and Ulch (Southern Tungusic). These texts represent a specific mode of code-switching between Nanai/Ulch and Russian observed in a situation of language shift. Speakers were asked by the linguist to tell something in their native language, which was an unusual and artificial mode of communication, since both languages are endangered and the dominant language of the speech community is Russian. All the texts contain many Russian fragments of different sizes and morphosyntactic types. I will focus on inter-speaker variation. There is a general assumption that inter-speaker variation increases in a situation of language shift. My preliminary observation is that this applies in particular to the structural types of code-switches attested in the texts under discussion. First, I will test this observation. Second, I will show that the intensity and preferred structural types of code-switching correlate with a speaker’s general narrative habits and skills.

6 October

Ilya Chechuro, Michael Daniel, Ezequiel Koile, George Moroz

Correlations between linguistic distances and geography in Daghestan

Abstract

We continue our project of looking for correlations between linguistic distances and geography in Daghestan, an area of high language density and mountainous terrain. We are trying to detect the impact of landscape on linguistic divergence by comparing correlations of linguistic distances with Great Circle (“crow flight”) distances vs. distances calculated taking the terrain into account. This time we have expanded our dataset to include Tsezic. We are looking for ways to solve two problems: the geographic data are much richer in datapoints than the documented village lects, and slightly different data (such as Swadesh lists vs. Jena lists) have to be combined into a single count. We will tell you about our progress over the last few months in terms of data cleaning, playing with models and kicking each other. Still very much work in progress.
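For readers less familiar with the baseline measure, the Great Circle (“crow flight”) distance between two villages can be computed from their coordinates with the haversine formula. The sketch below is illustrative only: the function name, the spherical Earth with a mean radius of 6371 km, and the example coordinates are assumptions, and the terrain-aware distances mentioned above (for instance, least-cost paths over an elevation model) are not shown.

```python
import math

# Assumption: a spherical Earth with mean radius 6371 km (adequate for village-level distances).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine ("crow flight") distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Hypothetical example: one degree of longitude along the equator.
    print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```

Terrain-aware distances will generally be longer than this baseline, which is why comparing the two correlations can reveal an effect of landscape.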

29 September

George Moroz

Phonetic fieldwork and experiments with the phonfieldwork package for R: rOpenSci review

Abstract

There are many different tasks that typically have to be solved in phonetic research. They include creating slides containing the stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in ‘Praat’ TextGrids (one of the sound annotation standards provided by the ‘Praat’ software, see Boersma & Weenink 2018), creating an html table with annotations and spectrograms, and converting between multiple formats (‘Praat’ TextGrid, ‘EXMARaLDA’, ‘ELAN’, .srt subtitles, and .txt from Audacity). All of these tasks can be solved by combining different tools (relabeling is straightforward, Praat contains scripts for concatenating files, etc.). The R package phonfieldwork provides functionality that makes these tasks easy to solve without additional tools, and I will also compare it with other packages (rPraat, textgRid). During the talk, I will show how the package works and what it can do, explain some changes that were proposed by the rOpenSci reviewers, and welcome your ideas for improvement. The tutorial is available online.

22 September

Ilya Sadakov

A corpus of Tsnal Lezgian

Abstract

Lezgian is a language of the Lezgic branch of the Nakh-Daghestanian language family. Lezgian dialects are subdivided into the Küre dialect group, the Axceh dialect group and the Quba dialect group. The variety under investigation is spoken in the village of Tsnal in the Khivsky district of the Republic of Dagestan and belongs to the Jark’i dialect of the Küre dialect group (Mejlanova 1964). In this talk I am going to present the Tsnal Spoken Corpus I am working on and to discuss some of my early findings on the Tsnal variety of Lezgian.

15 September

Timofey Mukhin

From spatial deixis to anaphora: data from Lezgic and Tsezic

Abstract

Crosslinguistically, demonstratives, in addition to their primary deictic function, often acquire anaphoric uses. East Caucasian languages have rich inventories of demonstrative pronouns involving at least three different stems, but they do not have dedicated 3rd person pronouns; instead, they use demonstratives. The main goal of my study is to examine which demonstratives are recruited in anaphoric function by obtaining corpus counts, separately for adnominal and independent uses. The study is based on narrative corpora of several languages from the Lezgic and Tsezic branches. I conclude that even closely related languages may show divergent behavior.

8 September

Alexandra Vydrina

Reflexive and the generic use of the second-person pronoun in Kakabe

Abstract

Kakabe (Mande) has a reflexive pronoun with an unusual restriction on its antecedent. It cannot take referential nouns and pronouns as antecedents; in such contexts regular personal pronouns are used instead. It does appear, however, with generic and quantified subjects, as well as in infinitival clauses. It also appears in correlative clauses with a relativized subject. I provide an account of this unusual distribution, situate the Kakabe data in the broader typological context, and discuss a possible diachronic path by which this unusual reflexive pronoun developed from the second-person pronoun.

30 June

Nikita Muravyev

Verbal agreement and voice in the Uralic languages of Western Siberia

Abstract

Uralic (Ob-Ugric, Samoyedic) languages spoken in Western Siberia mostly exhibit a pragmatically driven verbal agreement system whereby argument indexing on the verb depends on the topicality of the core arguments. As shown by Nikolaeva (2001) and Dalrymple & Nikolaeva (2011) for Obdorsk Khanty (Northern) and Tundra Nenets, these languages tend to use a Subject agreement paradigm for the Topical A > Focal O setting and a special Subject-Object paradigm for Topical A > Topical O, and sometimes also for Focal O > Topical O. Additionally, some languages use inflectional Passive (Inverse) forms for Focal O > Topical A. However, a closer look shows that, at least in some languages of this area, the use of similar agreement and voice forms can depend not only on information structure but on a number of other factors, such as referentiality, animacy, number and assertiveness, partly resembling hierarchical indexation systems found in the Americas, South Asia and Australia (see e.g. Zúñiga 2006). In the first part of the talk I will present my own field data from Kazym Khanty (Northern), which has a more intricate verbal agreement system based on topicality and definiteness than the Obdorsk dialect. In the second part I will discuss the initial stage of comparative areal research carried out by our project team in an attempt to shed some light on this phenomenon and its underpinnings across Khanty dialects, as well as in Mansi and Tundra Nenets, based on the available text data. Despite the very limited size and time depth of existing corpora and text collections, and the still rather small amount of text material annotated and discussed by our team, the data show several tendencies that allow us to assess the overall situation in the region and to speculate about the possible diachronic evolution of agreement and voice systems in the languages under investigation.

References:
Dalrymple, M. & Nikolaeva, I. (2011). Objects and information structure (No. 131). Cambridge: Cambridge University Press.
Nikolaeva, I. (2001). Secondary topic as a relation in information structure. Linguistics 39(1): 1–50.
Zúñiga, F. (2006). Deixis and alignment: Inverse systems in indigenous languages of the Americas (Vol. 70). Amsterdam: John Benjamins.

23 June

Ezequiel Koile, Michael Daniel, George Moroz, Ilya Chechuro

Quantitative Linguistic Geography of Daghestan

Abstract

We study how geographical factors shape the distribution of the languages spoken in Daghestan. We develop an interdisciplinary approach involving linguistic data, methods based on geographic information systems, and statistics. Using wordlists with the best available granularity and geolocation data from the Atlas of Multilingualism in Dagestan, we build a geospatial inference model to explain the linguistic diversity of the area. Our project has two stages: (i) a synchronic mapping of the correlation between geographic and linguistic distances, and (ii) a diachronic reconstruction of the speakers’ dynamics, driven by phylogeny and contact events. In this talk, which is very much a work in progress and an opportunity to discuss the aim and the data, we will cover only the first stage, focusing on the area of North Daghestan where Andic languages are spoken.
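A correlation between geographic and linguistic distances, as in the first stage described above, is often assessed with a Mantel test, because the pairwise values in a distance matrix are not independent observations. The abstract does not say which test or model the authors use, so the following is only a generic sketch under that assumption. It computes the Pearson correlation between the upper triangles of two square distance matrices and estimates a one-sided p-value by jointly permuting the rows and columns of one matrix. The function names and toy matrices are hypothetical.

```python
import itertools
import math
import random


def upper_triangle(matrix):
    """Cells above the diagonal, i.e. one value per pair of locations."""
    n = len(matrix)
    return [matrix[i][j] for i, j in itertools.combinations(range(n), 2)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel(geo, ling, permutations=999, seed=0):
    """Return (r, p): observed correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(geo)
    geo_pairs = upper_triangle(geo)
    r_obs = pearson(geo_pairs, upper_triangle(ling))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns together so the result is still a valid distance matrix.
        permuted = [[ling[order[i]][order[j]] for j in range(n)] for i in range(n)]
        # Small tolerance so that floating-point ties with r_obs count as hits.
        if pearson(geo_pairs, upper_triangle(permuted)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy data: four villages on a line; linguistic distance is twice the geographic one.
    positions = [0.0, 1.0, 2.5, 4.0]
    geo = [[abs(a - b) for b in positions] for a in positions]
    ling = [[2 * d for d in row] for row in geo]
    r, p = mantel(geo, ling)
    print(round(r, 3))  # 1.0
```

With only four locations there are just 24 distinct permutations, so the p-value in this toy run is coarse; real village-level matrices are much larger.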

16 June

Konstantin Filatov

Anchiq Karata indicative morphology. Allomorphy, inflectional classes and possible diachronic puzzles

Abstract

In this talk I am going to present some results of my fieldwork in 2019: part of a description of the verbal morphology of Anchiq Karata (Andic, Avar-Andic-Tsezic, Nakh-Daghestanian), namely the system of indicative verb forms. In the first part of my talk I am going to discuss Time-Aspect markers in three subparts of the paradigm, namely the Perfective, Imperfective and Infinitive subsystems. The second part is dedicated to the procedure of establishing and explicating inflection classes. I am going to account for three major verbal inflectional classes (Conjugations) and a few smaller inflectional subclasses of a morphophonological nature. Some morphological irregularities in verbal inflection are also going to be surveyed. The third part addresses several diachronic questions that Anchiq Karata data on indicative morphology could elucidate in the context of the divergence of the Central Andic languages.

9 June

Timofey Arkhangelskiy (University of Hamburg)

Borrowings, frequency and lexical change

Abstract

In this talk, I am going to explore the relations between borrowability, frequency of use and the dynamics of lexical change, based mostly on corpus data. The talk will have two parts. In the first part, I am going to look at frequency distributions of borrowings in the vocabularies of several languages. As I will show, the simple observation that more frequent words are less likely to be (recent) borrowings captures one facet of a much less trivial correlation. Specifically, the probability of a given word being a borrowing increases proportionally to the logarithm of its frequency rank in a sufficiently large corpus. The second part deals with the question of verbal borrowability. It has long been conjectured that verbs are more difficult to borrow than nouns. I am going to demonstrate that a recent empirical proof of that hypothesis by Tadmor et al. (2010) contains a logical fallacy, because it implicitly equates borrowability with diachronic instability. I am going to provide an independent argument in favor of the hypothesis that verbs are diachronically more stable than nouns. Nevertheless, it is not clear a priori whether this is a consequence of their lower borrowability or an independent phenomenon. In the latter case, greater stability alone could cause the discrepancy between the observed proportions of borrowings among verbs and nouns on which Tadmor et al. based their argument.

2 June

Oleg Belyaev, Michael Daniel

Alternative recipient marking in Ossetic: Once-in-a-lifetime clearest case of contact-induced change

Abstract

Ossetic regularly allows marking recipients in ditransitive constructions using either dative or allative case, a kind of variation that closely corresponds to the distribution of dative vs. lative recipients in East Caucasian languages. We show that the semantic motivation for the choice of marking can be described in terms of transfer of ownership vs. spatial transfer; key evidence is provided by the distribution of the two strategies with instances of the verb ‘give’ containing and not containing spatial prefixes. As the phenomenon is not attested elsewhere in Iranian and seems to be extremely rare cross-linguistically, it is more than likely that this feature of Ossetic developed as a result of language contact with Nakh.

26 May

Aigul Zakirova

The emphatic particle =gu in Andi dialects

Abstract

Andi =gu is an emphatic/intensifying enclitic. Besides contexts where it indicates some kind of contrast or emphasis, =gu is found in combination with many types of hosts where its contribution is less clear. With some hosts =gu is obligatory (e.g. cardinal numbers), with others it is optional. Similar enclitics have been observed in other Avaro-Andic and Tsezic languages, cf. Forker 2015 for Avar and Kibrik et al. 2001: 713 for Bagvalal. In this study I use natural texts to answer the following question: what is the functional range of Andi =gu? The contexts identified in Andi will then be compared to those discussed in Forker 2015 for Avar.

19 May

Aleksandrs Berdicevskis (The Swedish Language Bank, University of Gothenburg)

Native speakers simplify their language when writing to non-natives on an internet forum

Abstract

It is often claimed that a large proportion of non-native speakers in a population facilitates morphological simplification (Trudgill 2011). There is evidence in favour of this claim, but much is still unclear about the actual mechanism of simplification. Atkinson, Smith & Kirby (2018), relying on evidence from artificial language-learning experiments, hypothesize that an important role is played by the interaction between speakers, primarily accommodation by more proficient speakers to less proficient ones. It is reasonable to expect that the most prominent case of accommodation would be foreigner-directed language (that is, accommodation by L1 to L2 speakers). I test this hypothesis using the resource described in Berdicevskis (2018), a collection of large corpora of both L1 and L2 natural written production in four languages (English, French, Italian and Spanish), downloaded from the WordReference forums. If the hypothesis about the simplicity of foreigner-directed language is correct, we can expect L1 speakers to use simpler language when responding to messages posted by L2 speakers. I show that in most cases this is true. In the talk, I discuss to what extent these results support the accommodation hypothesis and, more broadly, general theories about the adaptation of languages to socio-cultural environments.

References:
Atkinson, M., Smith, K. & Kirby, S. (2018). Adult learning and language simplification. Cognitive Science 42: 2818–2854. doi:10.1111/cogs.12686
Berdicevskis, A. (2018). Do non-native speakers create a pressure towards simplification? Corpus evidence. In Cuskley, C. et al. (Eds.): The Evolution of Language: Proceedings of the 12th International Conference, 41–43. doi:10.12775/3991-1.007
Trudgill, P. (2011). Sociolinguistic typology: social determinants of linguistic complexity. Oxford: Oxford University Press.

12 May

Diana Forker

Elevation as a grammatical and semantic category of demonstratives

Abstract

In this talk, I study semantic and pragmatic properties of elevational demonstratives by means of a typological investigation of 50 languages with elevational demonstratives from all across the globe. The four basic verticality values expressed by elevational demonstratives are up, down, level, and across. They can be ordered along the elevational hierarchy (up > down > level/across), which reflects cross-linguistic tendencies in the expression of these values by demonstratives and is grounded in our cognitive representation of the vertical axis and the special position of the ‘vertical positive region’. Elevational values are frequently co-expressed with distance-based meanings of demonstratives, and it is almost always distal demonstratives that express elevation, whereas medial or proximal demonstratives can lack elevational distinctions. This means that elevational demonstratives largely refer to areas outside the peripersonal sphere, in a similar way to simple distal demonstratives. In the proximal domain, fine-grained semantic distinctions such as those encoded by elevational demonstratives are superfluous, since this domain is accessible to the interlocutors, who in the default case of a normal conversation are located in close proximity to each other. I then discuss metaphorical extensions of elevational demonstratives to non-spatial uses such as temporal and social deixis. In a few languages, ‘up’ elevational demonstratives express future time, whereas ‘down’ demonstratives encode past time. This finding is particularly interesting in view of the widely debated use of the Mandarin Chinese spatial terms ‘up’ for past events and ‘down’ for future events, which show the opposite metaphorical extension.
I finally examine areal tendencies and potential correlations between elevational demonstratives and the geographical location of speech communities in mountainous areas such as the Himalayas, the Papuan Highlands and the Caucasus. I conclude that the data from elevational demonstratives do not support the Topographic Correspondence Hypothesis because languages spoken in similar topographic environments do not tend to have similar systems of elevational demonstratives if they belong to different language families.

28 April

Gilles Authier

Verbal morphological complexity in Lezgic languages

Abstract

The Lezgic branch of East Caucasian, which comprises about twelve distinct languages, is typologically very diverse, in particular in its verbal morphology. Lezgic verbal systems grammaticalise variable sets of categories and show different types and levels of complexity. Based on an overview of the attested morphological realisations of these verbal categories in all Lezgic languages, our presentation will endeavour to link probable cases of increased and decreased complexity in verbal systems (judging from what we can hypothesize about the proto-Lezgic verbal system) with two main sociolinguistic considerations: the size of linguistic communities in diachrony and the influence of contact with non-Lezgic languages.

21 April

Daniel Wilson

Initial report on the documentation of Sagada Tsez; Language technology for endangered languages

Abstract

I am a research fellow at the University of the Free State in Bloemfontein, South Africa, and am now working with the Department of Caucasian Languages at the Institute of Linguistics, Russian Academy of Sciences. I also work for XRI, a research institute designed to bridge the gap between academia and humanitarian development initiatives. This talk has two parts. First, I will present an update on the status of my research on the Sagada dialect of Tsez, which I began last summer. The Tsezic languages are a sub-branch of the Nakh-Daghestanian (or East Caucasian) language family. They are divided into two groups: the East Tsezic group (Bezhta and Hunzib) and the West Tsezic group (Tsez, Hinuq, Khwarshi, and Inkhowari). There is consensus in prior research on Tsez that it should be divided into two main dialects, Tsez and Sagada, with further dialectal variation within Tsez (Imnaishvili 1963, Radjabov 1999, Abdulaev 2011, Comrie 2007, Polinsky 2015, etc.). The main division between Tsez and Sagada has been made based primarily on the variation noted by Imnaishvili in the middle of the 20th century. The data collected by Imnaishvili have provided most of the present knowledge about Sagada. It has even been noted that Sagada may rightfully be considered a distinct language (Maria Polinsky, Bernard Comrie p.c.). In this talk I will include sociolinguistic details from native speakers of Sagada, specifically how they view their language and its mutual intelligibility with the larger dialects of Tsez. I will also draw attention to phonological, morphological, and lexical similarities and differences between Sagada and Tsez. Second, I will give a brief presentation on the latest developments in language technology for low- and zero-resource languages.
This is based on my work at XRI and my attendance at the recent conference “Language Technology for All” at UNESCO headquarters in Paris. These latest developments are having a large impact on efforts around the world to document and revitalize endangered languages and could be useful for languages in the Caucasus.

14 April

Alexandra Vydrina

Sentence Focus in Kakabe

Abstract

This talk draws attention to the diversity of pragmatic functions of Sentence Focus (SF) utterances in natural speech, using the example of Kakabe, a Western Mande language. It is often ignored in the literature that SF can play multiple roles in discourse. Presentational ‘out-of-the-blue’ utterances answering the questions ‘What happened?’ or ‘What’s new?’ are often considered their main or even their only type of use. Yet the analysis of natural texts shows that SF utterances are at least as frequently used with the so-called explicative function (Sasse 1987; 1996; Matras and Sasse 1995) and the even lesser-known inferential function, studied by Declerck (1992), Delahunty (1995; 2001) and Bearth (1992; 1997; 1999b). In particular, I will highlight the intersubjective aspect of speech production that is crucial for understanding how inferential SF utterances are used. I will show on Kakabe data that, when natural speech is considered, SF utterances turn out to be associated not only with introducing all-new events but also with a rich array of discourse strategies, such as explicative, elaborative and disruptive functions. Accordingly, the discourse properties of the referents inside SF are subject to variation, and, crucially, they affect the implementation of focus marking.

7 April

Tatiana Philippova, Anastasia Panova

Preposition drop and language contact: The case of Daghestanian Russian

Abstract

This paper studies the phenomenon of preposition drop, i.e. cases where a preposition is expected but does not appear, in particular in locative, directional and temporal adverbial phrases. We review and classify the existing analyses of the phenomenon that have been proposed for different languages, predominantly non-standard and contact varieties. Next, we proceed to our quantitative study of preposition drop in Russian spoken in Daghestan, based on data from the sociolinguistic interviews of the DagRus corpus. Using statistical methods, we show how preposition drop depends on various linguistic and sociolinguistic factors. The speaker’s level of Russian, preposition type and phonetic context turn out to be good predictors of preposition drop. We propose a functional explanation for the observed pattern.

31 March

Anastasia Yakovleva

Greek diglossia: a case study of spatial marking in Katharevousa

Abstract

According to Ferguson (1959), in a diglossic situation two distinct varieties of a language (a ‘high’ one, learned through formal education, and a ‘low’ colloquial one) are spoken in the same community. He claims that the High variety always exists in a stable codified form, whereas the Low variety shows wide variation in grammar and vocabulary. Although this is the case for some diglossic societies (such as Tamil), for others the situation is the opposite. My corpus analysis of Katharevousa (the official language of Greece until 1976) demonstrates the instability of this register in the domain of spatial relations.

24 March

Samira Verhees

What is a quotative evidential, and does it exist?

Abstract

Reported speech markers constitute a substantial part of evidentiality’s semantic domain. At the same time, the internal division of this subdomain into specific values remains disputed. Aikhenvald (2004) proposed an important distinction between reportative and quotative markers: reportatives refer to information based on hearsay, while quotatives refer to information based on the verbal report of a particular source. Several authors, among them Boye (2010) and Holvoet (2018), have since argued that quotatives (in contrast to reportatives) are not proper evidentials, because they designate a proposition rather than specify the speaker’s information source. In this talk I will discuss the typology of reported speech evidentials and compare the properties of “quotative” markers from a variety of languages to determine whether they can be viewed as evidentials.

17 March

Polina Nasledskova

Denominal postpositions in East Caucasian languages

Abstract

This is a study of the grammaticalization sources of postpositions across East Caucasian languages. The focus is on postpositions grammaticalized from nouns denoting body parts. While such nouns cross-linguistically often grammaticalize into spatial markers, in particular adpositions, this path does not seem to be typical of (all) East Caucasian languages. Postpositions from body parts are not equally spread across the family: some languages have many, some few, and some none at all. The goal of this study is to provide an account of their distribution in genealogical and areal terms.

Timofey Mukhin

Anaphora and spatial deixis in East Caucasian: an overview of the data

Abstract

Demonstratives, in addition to their main deictic uses, cross-linguistically often acquire a number of other functions, including anaphora. The majority of East Caucasian languages have rich inventories of demonstrative pronouns that employ three or more different stems covering various dimensions of deixis (distance, altitude and others). Most languages of Daghestan do not have a special 3rd person nominal pronoun and use attributive demonstratives instead. The main goal of my study is to examine what governs the choice of the stem used anaphorically (for example, only the proximal one) or, if all stems of an inventory are used in this function, how they are distributed in this function by frequency. My study is based on grammars and samples of narrative texts. In this presentation, only data from Lezgic languages will be discussed.

10 March

Peter Arkadiev

Non-canonical inverse in Circassian and Abaza: borrowing of morphological complexity

Abstract

In this paper I discuss a typologically peculiar inverse-like construction found in the polysynthetic ergative Circassian languages of the Northwest Caucasian family and argue that this construction has been borrowed into Abaza, which belongs to a different branch of the same family. These languages possess a cislocative verbal prefix which, in addition to marking the spatial meaning of speaker-orientation, systematically occurs in polyvalent verbs when the object outranks the subject on the person hierarchy. The inverse-like use of the cislocative in Circassian differs from the “canonical” direct-inverse system in two respects: first, it is fully redundant, since the person-role linking is achieved by means of the person markers themselves; second, it does not occur in the basic transitive construction, featuring instead in configurations involving an indirect object in both ditransitive and bivalent intransitive verbs. I argue that the similar use of the cislocative prefix observed in Abaza is the result of pattern-borrowing from Kabardian, with which Abaza has been in intense contact, and that this borrowing has increased both the paradigmatic and the syntagmatic complexity of Abaza verbal morphology.

3 March

Evgeniya Budennaya

Non-pro-drop in the Baltic Area: for and against contact-induced origin

Abstract

Five geographically close languages to the east of the Baltic Sea, namely Russian (East Slavic), Latvian (Baltic), Ingrian, Votic and Ingrian Finnish (Finnic), use a similar pattern for marking subject reference. In this pattern, both personal pronouns and subject agreement on the verb are employed (in ⅔ to ¾ of all occurrences). This happens with all persons:

(1) Russian
Ja id-u domoj
I.NOM go.PRS-1SG home
‘I go home’

(2) Latvian
par k-o t-u domā-ø?
about what-ACC 2SG-NOM think.PRS-2SG
‘What are you thinking about?’

(3) Ingrian
hǟ kūl-i-ø
3SG.NOM die-PST-3SG
‘She is dead’

However, this double-marking pattern is extremely uncommon worldwide: most languages are either pro-drop with verbal inflection (61%, WALS) or non-pro-drop without any additional verbal inflection (Siewierska 2004). Given the geographical proximity of the languages under discussion and the typological rarity of the pattern itself, it can be treated as an areal feature which could not have arisen independently (Kibrik 2013). The talk will trace this feature diachronically and discuss the results for Russian, Latvian and minor Finnic. Special attention will be given to the question of whether we are dealing with a similar contact-induced change in Latvian and minor Finnic or with two different processes that eventually converged on an apparently similar pattern.

25 February

Chiara Naccarato, Samira Verhees, Timofey Mukhin, Rita Popova, Lev Kazakevich, Konstantin Filatov

Typological atlas of Daghestan: state of affairs and future plans

Abstract

The typological atlas of Daghestan will be a WALS-style resource containing information about linguistic features of the languages of Daghestan. Data for this resource are retrieved from grammars and organized into databases, which are then used to generate maps. The final product will be a tool for visualizing information about linguistic structures characteristic of Daghestan, and also a useful resource for bibliographical research on parameters of interest. In this talk we will discuss the state of affairs of the project and our future plans. We will briefly present preliminary results for the features currently being developed and discuss some technical issues concerning the design of introductory texts and the generation of maps.

18 February

Maria Morozova, Maria Ovsjannikova, Alexander Rusakov

A dialectometric study of Albanian varieties: linguistic complexity and language contact history

Abstract

The goal of our study is to examine the Albanian dialect continuum using the quantitative methods of dialectometry and to interpret the results in terms of the history of the Albanian dialect landscape, in particular its contact history. Our data come from the Dialectological Atlas of the Albanian Language, which maps phonological, morphological and lexical features of 131 Albanian varieties of the main dialectal area. Using distance calculation, MDS analysis and hierarchical clustering, we estimate and visualize the closeness of these varieties and analyse it against their geographical distribution and the traditional classification of Albanian dialects. The main focus of our talk will be on the notion of linguistic complexity as applied to the Albanian dialect continuum. We identify 27 phonological and morphological parameters as binary complexity/simplicity features, examine their realization in the varieties under study and assess the relation between intensity of contact and linguistic complexity. Then we will briefly discuss the distribution of the varieties in terms of 212 lexical features to compare their grammatical and lexical closeness.
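The abstract does not specify the distance measure, so the following is only a sketch of one simple, commonly used option: the relative (normalized Hamming) distance between varieties coded on binary features, together with a per-variety complexity score given by the share of features that take their “complex” value. The variety names, feature values and function names are invented for illustration; a distance table of this kind is the sort of input that MDS and hierarchical clustering would then operate on (not shown).

```python
from itertools import combinations

# Hypothetical toy data: 1 = "complex" value, 0 = "simple" value of a binary feature.
VARIETIES = {
    "variety_A": (1, 1, 0, 1, 0, 1),
    "variety_B": (1, 0, 0, 1, 0, 1),
    "variety_C": (0, 0, 1, 0, 1, 0),
}


def relative_distance(a, b):
    """Share of features on which two varieties differ (normalized Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complexity_score(features):
    """Share of features realised with their 'complex' value."""
    return sum(features) / len(features)


def distance_table(varieties):
    """Pairwise relative distances, keyed by (name, name) in alphabetical order."""
    return {
        (p, q): relative_distance(varieties[p], varieties[q])
        for p, q in combinations(sorted(varieties), 2)
    }


if __name__ == "__main__":
    for (p, q), d in distance_table(VARIETIES).items():
        print(f"{p} vs {q}: {d:.3f}")
    for name, feats in VARIETIES.items():
        print(f"{name} complexity: {complexity_score(feats):.3f}")
```

Real atlas data usually contain gaps; a fuller version would skip features that are missing for either variety and divide by the number of features actually compared.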

11 February

Natalya Stoynova, Irina Khomchenkova

Contact-influenced Russian of Northern Siberia and the Russian Far East

Abstract

We will present a new small corpus of contact-influenced Russian speech, namely the Corpus of Russian spoken in Northern Siberia and the Russian Far East, and several case studies on contact phenomena in grammar based on its data. The corpus consists of short oral texts, mostly narratives, collected as a “by-product” of language documentation projects (some of them are Russian versions of texts in the corresponding indigenous language). The current size of the corpus is ca. 34 hours (78452 tokens). The majority of texts come from two regions: the Taimyr Peninsula (the subcorpus of Samoyedic Russian) and Khabarovsk Krai (the subcorpus of Tungusic Russian). The most important feature of the corpus is the manual annotation of grammatical and lexical contact-induced features. It will be discussed in detail with a focus on problematic cases. To illustrate the range of problems that can be investigated on the data of this corpus, we will also present several case studies. First, we will try to trace a post-pidgin continuum, attested among the Nganasans, on the data of non-standard gender agreement patterns (cf. бабка помер ‘old woman die.pst.masc’). Second, we will consider the problem of identifying morpho-syntactic calques on the example of non-standard numeral constructions attested in Tungusic Russian (пять дом ‘five house.nom.sg’, двое сыновья ‘two son.nom.pl’). Third, we will illustrate the problem of differentiating between contact-induced and dialectal features on the example of pluperfect be-constructions (умер было ‘(he) died be.pst’) and non-standard coordination patterns with the particle da (офицер=да, майор=да ‘officers=ptcl majors=ptcl’), attested in the speech of the Nanais. Finally, we will discuss some general quantitative data on contact-induced grammatical features attested in the corpus, namely the overall frequency distribution of different types of these features and individual profiles compiled for several speakers.

4 February

Ezequiel Koile, Konstantin Filatov, Michael Daniel

Bayesian phylogenetic analysis and wordlist handling

Abstract

In this talk, we present an introduction to modern Bayesian phylogenetic analysis in historical linguistics. The algorithms will be discussed with a special focus on their conceptual motivations, as well as their scope and limitations. Wordlist building and handling will be approached from a practical perspective, including recommendations and examples of implementation. Cross-linguistic online resources and editing tools for this task will be presented. At the end of the seminar, in a separate bonus track, we will introduce you to the ongoing collection of 100 Swadesh lists in Daghestan. Our approach emphasizes the importance of strict provenance of each list (the village of collection) and of the data collection protocol (contextualization of the lexical items). If time permits, we will show the preliminary results of the project: a tree of the Andic branch based on our data.

28 January

Ilya Yakubovich (Russian Academy of Sciences / University of Marburg)

Correlates of Language Shift in Population Groups vs. Epigraphic Cultures

Abstract

Scholars focusing on sociolinguistic situations in ancient societies have no direct access to information about the boundaries of language communities at given points in time, but have to study them through the prism of the available written sources. Since the notion of language shift is equally applicable to population groups and to epigraphic cultures, and since it can be accompanied by contact-induced changes in both cases, it is appropriate to ask how the difference between the two types of communities correlates with different manifestations of language contact. Both corpora of spoken languages and archaic written texts commonly show the results of lexical transfer (borrowings) as well as structural interference. They seem, however, to exhibit an inverse chronological correlation between the two types of contact-induced change and the moment of language shift. When language contact is observed among population groups, lexical expansion frequently predates the shift to a new native language, while structural changes continue to reflect substrate influence after the shift has taken place. In contrast, in archaic epigraphic cultures (e.g. hieroglyphic or cuneiform), late or peripheral texts in a particular language frequently reveal deviations in grammatical structure, while written language shift in a scribal community may be accompanied by the retention of graphic loanwords (heterograms), which represent the language of tradition. In my talk I intend to address the reasons for this asymmetry and to discuss the peculiarities of some contact situations that do not fall under the proposed generalization.

21 January

Nina Dobrushina, George Moroz

The speakers of minority languages are more multilingual

Abstract

Population size is often discussed as a factor that might have influenced patterns of language and cultural evolution (Bowern 2010; Donohue & Nichols 2011; Nettle 2012; Bromham et al. 2015; Greenhill et al. 2018; Koplenig 2019; see Greenhill 2014 for an overview). In this paper, we advance the hypothesis that the larger the population of speakers of a language, the fewer L2s these speakers master. The correlation between the size of a language population and the level of multilingualism of its speakers is tested statistically on a large body of empirical data from Dagestan. Thanks to the digitization of the 1926 Census, we have at our disposal reliable information about the population of Dagestanian villages before urbanization and before the population growth caused by the First Demographic Transition. These data allow us to make credible estimates of the number of speakers of all languages and dialects of Dagestan before the changes of the 20th century. The second type of data comes from a field study of Dagestanian multilingualism. The language repertoires of 4,032 people were collected and coded using the method of retrospective family interviews during field research in 2011-2019. The data on multilingual repertoires cover only a small part of the villages for which we have population sizes, namely 54 villages speaking 29 different lects. We match the population sizes of these 29 lects with the number of L2s spoken by their speakers in the 54 villages, and run a Poisson mixed-effects regression model that predicts the average number of second languages spoken by speakers from L1 communities of different sizes. The study confirms the hypothesis that the size of a language population is negatively correlated with the multilingualism of the language community.

14 January

Aigul Zakirova

The particle =OK in the Volga-Kama languages: contexts of use and frequencies of lexicalization

Abstract

In this talk I will focus on the emphatic identity particle in the Volga-Kama languages (Chuvash =aχ/=eχ, Tatar and Bashkir =uk/=ük, Meadow Mari =ak, Hill Mari =ok and Udmurt =ik). The particle was borrowed from the Turkic (Bulgar) lects into the Finno-Ugric lects of the Sprachbund. Its contexts of use overlap to a fair extent across the languages but do not fully coincide. One of its core functions is to mark that an argument of a proposition is identical to an argument of a different proposition (e.g. that house=OK ‘the same house’, referring to a previously mentioned house). In this talk I will take a closer look at this and other contexts of use of =OK in the languages of the area. In addition, =OK tends to attach to, and become lexicalized with, certain items, mostly adverbial expressions. I will present data on the frequencies of such collocations with =OK in the Volga-Kama Sprachbund.

Seminar schedule 2019

24 December

Konstantin Kazenin

Ethnicity, speaking indigenous languages and fertility in the North Caucasus

Abstract

Regions of the North Caucasus have experienced considerable social changes over the past 4-5 decades, including intensive urbanization, a loosening of traditional family norms, a weakening of gender asymmetries and the declining authority of older generations in communities and families. These processes started at different times and proceeded at different speeds in the republics of the North Caucasus, but in all the republics they were accompanied by a significant decrease in fertility, known in population studies as the First Demographic Transition (FDT). That decrease was not at all unexpected, as ‘detraditionalization’ changes of the kind recently observed in the North Caucasus and the First Demographic Transition took place nearly simultaneously in many parts of the world. However, a number of cultural characteristics of the North Caucasus allow us to address some issues concerning the First Demographic Transition that are difficult to consider elsewhere. First, at the start of the social and demographic changes, women of different ethnicities differed in their fertility levels. As urbanization made interethnic contacts more intensive, the question arises whether these differences were eliminated in the course of the fertility decline. Second, in the urban population of the North Caucasus, including younger generations, the proportion of people speaking indigenous languages is not negligible. It is therefore possible to test the hypothesis that in families where an indigenous language is spoken in everyday life, fertility will not decrease as radically as in families where Russian is the only language spoken by the parents. If this expectation is borne out, we have an interesting case where linguistic and reproductive behavior are related. The talk will start by introducing the concept of the First Demographic Transition. We will then present the key demographic changes observed in the North Caucasus between the 1970s and the 2010s.
After that, using data from the Republic of Daghestan (Russian Census 2010 and a sample survey of 2018), we demonstrate that ethnic differences in fertility were preserved to different extents in different educational groups of the urban population. Also, using data from the Republic of Ingushetia (a sample survey of 2019), we show that parents’ speaking an indigenous language does have a significant positive relationship with fertility.

17 December

Paul Phelan

The periphrastic causative in West Circassian

Abstract

West Circassian, along with the other languages of the Northwest Caucasian family, is a highly polysynthetic language with complex verbal morphology. One marker in particular, the causative marker ʁe-, is highly productive. The morphological causative is commonly used to derive transitive predicates from nontransitive verbs and nominals. It is also clearly an important instrument for expressing the semantics of causation and for the associated valency-increasing operation. In some cases ʁe- has even fossilized on predicates, so that the predicate has no meaning without the causative prefix. It is therefore rather remarkable that a periphrastic causative strategy has arisen in West Circassian. This construction is based on the matrix verb ṣ̂ə- ‘to do, make’, which combines with a lexical verb bearing purposive suffixation. The behavior of the personal indexes on the matrix verb shows that this structure is noncompositional and has grammaticalized with causative semantics identical to those of the morphological causative.

10 December

Anastasia Panova, Tatiana Philippova

A detailed corpus study of preposition drop in DagRus: preliminary results

Abstract

In this talk we will discuss the phenomenon of preposition drop (omission) that is observed in the speech of L2 speakers of Russian with a Nakh-Daghestanian or Turkic (Kumyk, Azerbaijani) first language. We shall first review insights on the issue from the existing literature on the topic (Daniel & Dobrushina 2009, Daniel, Dobrushina & Knyazev 2010, Daniel & Dobrushina 2013 for Russian spoken in Daghestan; Stoynova and Shluinsky 2010 for Russian spoken by the Enets people; Khomchenkova, Pleshak and Stoynova 2017 for Russian of Northern Siberia and Russian Far East; Shagal 2016 for Russian spoken by the Erzya people), then present the data on preposition drop in the Russian speech of Kumyk and Azerbaijani native speakers, and suggest working hypotheses about the theoretical interpretation of the phenomenon that would guide our further work.

3 December

Polina Nasledskova

Causative alternations database

Abstract

There are several kinds of correspondence between causative and non-causative verbs: one can be derived from the other, they can be suppletive, etc. These correspondences differ not only from language to language but also from one causative pair to another within a single language. The database was created to make it possible to count these alternations precisely across many languages. I am going to talk about the data, the structure of the database and its challenges.

Ezequiel Koile

The Database of Cross-Linguistic Colexifications CLICS³: Data-driven semantic research from a cross-linguistic perspective

Abstract

The term colexification (François 2008) refers to cases where the same word expresses two or more comparable concepts, covering instances of polysemy, vagueness, and homonymy. The comparative study of colexifications across languages allows for the construction of semantic maps, a useful tool for lexical typology and beyond, with applications ranging from semantic change to patterns of conceptualization and linguistic paleontology. In this talk, we describe the recently released third version of the Database of Cross-Linguistic Colexifications, CLICS³ (https://clics.clld.org/; Rzymski, Tresoldi, et al. 2019), a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns, containing data for 2811 concepts across 2955 languages.

26 November

Nina Dobrushina

Challenges of variation

Abstract

Variation is inherent to all languages. It seems, however, that the degree of variation may differ from language to language. It is sometimes claimed that languages with writing systems show more variation than unwritten languages. It has also been argued that small languages have less variation than large languages with many L2 speakers. However, none of these conjectures seems ever to have been tested empirically. In fact, to date we have no methods that would allow us to measure and compare the amount of variation across languages. In this talk I want to raise this problem rather than suggest a solution.

George Moroz, Samira Verhees

Catching variation during fieldwork on Nakh-Daghestanian languages

Abstract

During fieldwork, researchers have to deal with all kinds of variation in the answers given by speakers: free, idiolectal or sociolinguistic variation. In the present investigation we studied the degree of variation among 44 speakers of Zilo Andi for 16 different morpho(no)logical features known to be variable in this dialect. Additionally, we conducted a survey among researchers of Nakh-Daghestanian languages, asking them about their fieldwork habits, including how many speakers they usually consult. We used these data to estimate the probability that an average researcher of Nakh-Daghestanian languages catches the observed variation during fieldwork.

19 November

Olesya Khanina, Andrey Shluinsky, Yuri Koryakov

Enets in space and time: a study in linguistic geography and history

Abstract

This paper summarises a joint study by Yuri Koryakov, Andrey Shluinsky, and myself (Khanina et al. 2018a; Khanina et al. 2018b). Through a series of linguistic maps based on published ethnographic data and our fieldwork accounts, we reconstruct the territories in which Forest Enets and Tundra Enets (Samoyedic, Uralic; Central Siberia) have been spoken from the 17th century to the present. We analyze in detail the migrations of the two ethnic groups and the changing language contact scenarios. One of the most intriguing findings of this study is an explanation of the Forest Enets - Tundra Enets puzzle. There is no consensus on whether they are separate languages or dialects of the same language. Ethnographically, the two linguistic communities are clearly distinct, have different self-designations, and do not consider themselves as belonging to the same ethnic group. As our field experience has shown, the degree of modern mutual comprehension is not a neutral question: it depends on the stance a speaker takes at the moment of conversation, whether stressing the difference between the two ethnic groups or aiming to reach his/her communicative goal. Whereas the phonologies of Forest Enets and Tundra Enets suggest a split at least several hundred years ago, and lexicostatistical calculations go even further by dating the split to ca. one thousand years ago, the match between the two Enets grammars is so striking that it contradicts this scenario. Here linguistic geography steps in, documenting migrations that led to a secondary convergence of the once more distinct Enets lects in the 19th century, which was later, at the beginning of the 20th century, followed by a secondary divergence. We support this historical hypothesis with a catalogue of all features that separate the two Enets varieties and with linguistic maps reconstructing changes in the territories of the two ethnic groups over the last 300 years.
References
Khanina, Olesya, Yuri Koryakov & Andrey Shluinsky. 2018a. Enets in space and time: a case study in linguistic geography. Finnisch-Ugrische Mitteilungen 42, 109-135.
Khanina, Olesya, Andrey Shluinsky & Yuri Koryakov. 2018b. Forest Enets and Tundra Enets: how similar/different are they and why? Paper presented at the 7th International Conference on Samoyedology, October 2018 (Tartu, Estonia).

12 November

Alexandra Vydrina

Fouta-Djallon multilingualism

Abstract

Western Africa occupies a central place in research on multilingualism owing to studies of the sociolinguistic situation in Cameroon (di Carlo 2018) and the Casamance area in Senegal (Lüpke and Storch 2013; Lüpke 2016). The study focuses on yet another case of multilingualism in Western Africa by discussing the multilingualism patterns in the area of the Fouta-Djallon plateau in Guinea. The situation will be analyzed from the perspective of the communities speaking Kakabe, a minor language spoken in about fifty villages. The languages involved are Kakabe, Maninka, Pular and, to a lesser extent, Sussu, with Pular belonging to the Atlantic family and the other three languages to the Mande family. In my talk, I will analyze the attested multilingualism patterns in different types of language practices. The study is based on a multimedia oral corpus representative of a variety of genres, containing data that I have been collecting in the region since 2009.

29 October

Natalya Serdobolskaya

Morphosyntax of complement clauses in East Caucasian languages: long-distance agreement

Abstract

The East Caucasian languages (Nakh-Daghestanian) show a number of puzzling structures that are challenging from a theoretical point of view: non-finite clauses where all the arguments are encoded in the same way as in independent sentences, backward control, long-distance reflexive pronouns and long-distance agreement in complement clauses. This talk focuses on long-distance agreement in East Caucasian languages. First, I discuss the phenomenon in Qunqi Dargwa. Infinitives and converbs are the only complementation strategies that allow long-distance agreement. In Qunqi, there is a fuzzy boundary between infinitives and indirect mood forms. Converbs are used both with control verbs and with emotive and perception complement-taking verbs. The long-distance agreement pattern is only observed with control verbs. I show that these structures exhibit properties of clause union. I then consider data from 19 East Caucasian languages (mostly based on Kibrik 2005 “Materials to the typology of ergativity”) and discuss the long-distance agreement patterns in those languages. In most of these languages this phenomenon is limited to control constructions, while Tsez and Tsakhur deviate from this generalization.

22 October

Albert Davletshin

Subgroups, linkages and beyond: Working on shared innovations in Eastern Polynesian languages

Abstract

Polynesia covers a vast territory of the planet. It includes a large number of speech communities that descend from a common ancestor; some of them are isolated by thousands of miles of open ocean. It is no wonder that Polynesia has always been a favorite region for both linguists and anthropologists working on phylogenetics. The standard account of Eastern Polynesian subgrouping is that the language of Easter Island (Rapanui) forms a branch of its own, coordinate with Central Eastern Polynesian (CE); CE in turn branches into Tahitic and Marquesic (Roger 1985). It has become increasingly evident that Tahitic and Marquesic are not valid subgroups (Vladimir Belikov 2009, Mary Walworth 2014). In my talk, I am going to show that Rapanui, Mangarevan, North Marquesan and South Marquesan constitute a subgroup within the Eastern Polynesian languages. Interestingly, this proposal implies that some phonological and lexical innovations spread across the Pacific. The main objective of the talk is to discuss these innovations and their implications for the theory of language.

15 October

Alexander Shiryaev, Michael Daniel, George Moroz

Glottalized /lˀ/ in Rikvani Andi

Abstract

The opposition between the geminate and singleton ejective lateral stop /L’/, reconstructed for Proto-Andic, has been lost in various Andic languages due to the phonetic evolution of the singleton into a different sound type. The speakers of Rikvani Andi, a dialect of Andi (Andic, East Caucasian) spoken by 800 people in the village of Rikvani in Dagestan, developed a glottalized lateral consonant as its reflex. Glottalized sonorants are typologically rare. While they have been sparsely attested in various areas where glottalic initiation also occurs with stops (ejectives), Rikvani Andi is the only Caucasian variety in which such a sound has so far been reported. The present study is an acoustic analysis of the sound as documented in field data collected by the HSE team in the village. Word-list elicitation data from six speakers, as well as spontaneous texts from two speakers, are used. This study is the first to consider glottalized sonorants in connected free speech. We test several generalizations and observations previously made with respect to glottalized sonorants (Um 2001, Maddieson and Larson 2002, Maddieson et al. 2009). We confirm the presence of strong variation in the realization of the glottalized /lˀ/ across speakers in terms of the presence of creaky voice, the focus of glottalization, intensity and formant structure. We do not confirm Um’s suggestion that the peak of glottal constriction (pre- vs. post-glottalized realization) is affected by the position of the sound in the syllable/word. We conclude that the distinctive features are the presence of creaky voice and intensity, but these features become less pronounced in free speech. Identifying the acoustic cue that distinguishes the glottalized sonorant requires further acoustic research. Variation across speakers and the fading of the cues in connected speech may suggest that a merger of /lˀ/ and /l/ is under way. Various visualization techniques highlighting the nature of /lˀ/ will be discussed.

1 October

Chiara Naccarato, Samira Verhees

Fieldtrip to Botlikh (Daghestan)

Abstract

In August 2019 we visited the village of Botlikh to study the Botlikh language. Our aim was to collect data for a small investigation of the agreement patterns of ordinal numerals. In addition, we translated several texts recorded by Togo Gudava in the 1950s-1960s, met a number of potential language consultants and learned some new things about the sociolinguistic situation in Botlikh. We will talk about the trip and our future plans for working on this language.

The Third School on Statistical Methods for Linguistics and Psychology (University of Potsdam, Germany)

24 September

Ezequiel Koile

Phylogeography of the Bantu Expansion

Abstract

The Bantu expansion is among the most important and least understood human migrations. Bantu-speaking populations (240 million people, 500 languages, spanning 9 million km² [1]) are the result of a huge migration originating in a homeland near the border of Nigeria and Cameroon between 4,000 and 5,000 BP [2,3,4,5,6,7,8]. Although the homeland and the time depth are well established, the migration route is still unclear. Recent phylogenetic studies [1,9,6,7,8] support the late-split hypothesis [10,11,12,13,14], which claims that the common ancestor of the East-Bantu and West-Bantu languages crossed the African rainforest and split only after this crossing. The crossing is thought to have been made through the Sangha River Interval (SRI), a north-south savanna opening in the rainforest. However, in dated phylogenies [7], the dates do not match consistently: speakers would have had to cross this corridor around 4,000 BP, whereas it was completely open only by 2,500 BP. We propose two alternative hypotheses to compete with the traditional SRI late split: first, a coastal savanna corridor [15]; second, an earlier path through the rainforest. We compare the hypotheses with a Bayesian phylogeographic approach based on linguistic trees. We use lexical and geographical data for 400+ Bantu and Bantoid languages, inferring the linguistic and geographic history in parallel by implementing the break-away model [16] in BEAST2 [17]. We conclude that the passage through the rainforest took place around 4,000 BP.

Chiara Naccarato, Natalya Stoynova, Anastasia Panova

Contact-influenced word order in genitive noun phrases: A corpus-based investigation of Russian spoken in Daghestan

Abstract

The paper deals with non-standard word order in the variety of Russian spoken by bilinguals from Daghestan. Specifically, we focus on the occurrence of prepositive genitive modifiers in bilinguals’ speech. Whereas in monolinguals’ Russian the neutral and most frequent word order in noun phrases with a genitive modifier is the order N+GEN, in Daghestanian Russian the opposite order GEN+N often occurs. This phenomenon was mentioned as one of the striking morphosyntactic features of Daghestanian Russian, and its frequent occurrence can be partly explained in terms of syntactic calquing from speakers’ L1s, all featuring an unmarked GEN+N order in noun phrases. However, the picture is far less trivial than it could look at first sight. On the one hand, the word-order pattern GEN+N does not seem to affect equally all types of genitive noun phrases in Daghestanian Russian. On the other hand, similar examples of non-standard word order are sometimes found in monolinguals’ speech too. In the course of the paper, we present the results of our corpus-based investigation of genitive noun phrases in Daghestanian Russian as compared to monolinguals’ spoken Russian, including dialectal varieties. Prepositive genitives appear to be favored by several lexico-semantic and processing features of both the head and the genitive dependent. The strongest factor is kinship semantics: noun phrases that express a kinship relation tend to be prepositive. In monolinguals’ spoken Russian, although prepositive genitives are very infrequent, they sometimes show similar lexico-semantic and processing features. Therefore, we are not dealing with a simple calquing process. Rather, L1 influence is manifested in the strengthening of some tendencies existing in monolinguals’ Russian too.

17 September

Michael Daniel, Ilya Chechuro, Samira Verhees, Nina Dobrushina

Lingua francas as lexical donors: quantitative field study

Abstract

The paper investigates the role that the rate of bilingualism plays in lexical borrowing. Our data come from Daghestan, an area of high language density. Based on loanword counts, we isolate two zones of lexical influence: the south, heavily influenced by Azerbaijani, and the north, dominated by Avar. The salience of Avar and Azerbaijani as donor languages is likely to reflect the historical role of these languages as lingua francas in their respective geographical zones. The study supports the idea of Brown (1996, 2011) that contact influence from a lingua franca is higher than from a language used only to communicate with its L1 speakers. In line with the widespread argument that the amount of contact-induced change from a language is proportional to the intensity of bilingualism (Thomason & Kaufman 1988), Brown stipulates that the importance of lingua francas as lexical donors must be linked to the high rate of bilingualism in these languages. Bilingualism in Azerbaijani and Avar was indeed high, as evidence from field research on the traditional language repertoires of Daghestanian highlanders shows. On the other hand, knowledge of two other locally important languages, Chechen and Georgian, which was only slightly lower at some locations, did not lead to the same level of lexical transfer; in fact, the number of Georgian and Chechen borrowings seems disproportionately low. High bilingualism rates are thus not sufficient for a language to become a major lexical donor. At the level of methodology, the paper explores the prospects of using short wordlists as ‘contact probes’, tools for measuring lexical contact. We follow the approach of Haspelmath & Tadmor (2009) and Bowern et al. (2011) in applying a fixed list of concepts to quantify lexical contact between languages.
Based on field elicitations conducted in a number of villages in the Republic of Daghestan, a list of 160 concepts is shown to be efficient enough to differentiate the degrees of lexical impact of locally important L2s on minority languages. The method not only ensures comparability across contact situations but also provides a level of resolution that is sensitive to differences between villages speaking the same language. By fine-tuning the wordlist to a different linguistic setting, the methodology suggested here may be extended to other geographical areas of intense language contact and become a tool for reconstructing multilingual patterns of the past.

11 June

Chiara Naccarato, Natalya Stoynova, Anastasia Panova

Contact-influenced word order in genitive noun phrases: A corpus-based investigation of Russian spoken in Daghestan

Abstract

In a recent paper (Naccarato, Panova & Stoynova Forth.), we have examined cases of non-standard word order in the variety of Russian spoken by bilinguals from Daghestan. Specifically, we have restricted our analysis to the noun phrase, and have looked at the occurrence of prepositive genitive modifiers in bilinguals’ speech. As we have shown, whereas in Standard Russian the neutral and most frequent word order in noun phrases with a genitive modifier is the order N+GEN (muž sestry), in Daghestanian Russian the opposite order GEN+N (sestry muž) often occurs. This phenomenon has been partly explained in terms of syntactic calquing from speakers’ L1s, all featuring a neutral GEN+N order in noun phrases. However, such inversion in word order does not seem to equally affect all types of genitive noun phrases in Daghestanian Russian, but appears to correlate significantly with noun phrases featuring kinship semantics. Moreover, similar examples of non-standard word order are sometimes found in monolinguals’ speech too, which makes the picture far less trivial than it could look at first sight. In this talk, we present the latest results of our corpus-based investigation of genitive noun phrases in Daghestanian Russian as compared to monolinguals’ spoken varieties of Russian, with the aim of explaining the factors boosting non-standard word-order realizations.

4 June

Samira Verhees, Chiara Naccarato

Animacy agreement in Botlikh: ordinal numerals

Abstract

Botlikh (Avar-Andic, East Caucasian) features a two-fold animacy agreement system including, on the one hand, a set of noun class (i.e. gender) markers representative of many EC languages and, on the other hand, an additional set of dedicated animacy markers which are unique to Botlikh. The dedicated animacy markers can appear on various targets (i.e. negative copulas, interrogative particles, question word formants, attributive clitics, present/future participles, ordinal numerals), and agreement is controlled by either the nominal head or the absolutive argument of the verb. By focusing on ordinal numerals, which appear to mark animacy most consistently, we set the following goals: a) to better understand the agreement patterns of these forms; b) to clarify which referents qualify as animates and which do not. For these purposes, we have created the first draft of a survey which we will discuss during the talk.

28 May

Michael Daniel, Anna Aksenova

Fieldwork in the Kubachi area: processed results of an experiment on the comprehension of local varieties

Abstract

Nina Dobrushina, Yuri Koryakov, Daria Staferova, Alexander Belokon

Dagestan census

Abstract

23 April

Nina Dobrushina, Dasha Perova

Preliminary results of a study of one variable phenomenon in the Russian speech of rural Daghestan and in the Russian speech of native speakers of Russian (based on experimental data)

Abstract

Timur Maisak

Numeral classifiers in Udi as a contact-induced development

Abstract

The talk addresses the presence in Udi (Lezgic branch of the Nakh-Daghestanian family) of the numeral classifier даьнаь ‘piece’ and of тан ‘person’, a word close to a classifier. Both words are borrowings. Moreover, classifiers apparently do not occur in the other Nakh-Daghestanian languages, so their presence in Udi can be regarded as a clear case of contact-induced development. The talk will examine the properties of the Udi classifiers in more detail, taking into account the recent work by Stilo (2018) on two-classifier systems as an areal phenomenon within the “Araxes-Iran Linguistic Area”. For an extended English abstract, see the attached file.

9 April

Anastasia Panova, Elena Sokur

Rutul dictionary

Abstract

Maria Aristova

A database of structural borrowings from Azerbaijani into the Lezgic languages

Abstract

Polina Nasledskova, Michael Daniel

Vowel quantity as a distinguishing feature of spatial forms in Kina Rutul: an experimental study

Abstract

2 April

Johanna Nichols (Berkeley, HSE)

IngRel, DagRel, and others: Relativization and the accessibility hierarchy in ergative languages, with implications for corpus databases

Abstract

26 March

Katarzyna Wojtylak (University of Regensburg, Germany / James Cook University, Australia)

Language change in Northwest Amazonia: grammatical categories

Abstract

A fundamental question in studying the interconnectedness of the world’s languages is which grammatical categories are most and least likely to be borrowed in different contact situations (see e.g. Muysken 2008; Mithun 2014, and references therein). The Lowland Amazon is an area of extreme linguistic diversity, where geographic proximity and language contact have resulted in an unprecedented diffusion of patterns and forms (Aikhenvald 2002). The area between the Caquetá and Putumayo River Basins (hereafter ‘C-P’), spanning southern parts of Colombia and northern Peru, is loosely defined in the literature as the ‘People of the Centre cultural complex’ (Echeverri 1997). It consists of eight dispersed ethnolinguistic groups that belong to three distinct language families (Witotoan, Boran, and Arawak), plus one isolate, Andoque. Traditionally, the groups lived next to each other and displayed relative cultural homogeneity, with trade in goods, common ritual activities, intermarriage, and multilingualism (Eriksen 2011; Seifart 2015). The C-P languages also share a remarkable number of linguistic traits, including nominal classification. In this talk, we will see how the categories of evidentiality and epistemic modality are expressed across the C-P languages, and what this tells us about language contact and change in Northwest Amazonia.

References

Aikhenvald, Alexandra Y. 2002. Language contact in Amazonia. Oxford: Oxford University Press.

Echeverri, Juan Alvaro. 1997. “The people of the center of the world. A study in culture, history and orality in the Colombian Amazon.” PhD dissertation, New School for Social Research.

Eriksen, Love. 2011. Nature and Culture in Prehistoric Amazonia: Using G.I.S. to reconstruct ancient ethnogenetic processes from archaeology, linguistics, geography, and ethnohistory. Lund: Lund University.

Mithun, Marianne. 2014. “Language change: the dynamicity of linguistic systems.” In How Languages Work: An Introduction to Language and Linguistics, edited by Carol Genetti, 264-294. Cambridge: Cambridge University Press.

Muysken, Pieter, ed. 2008. From Linguistic Areas to Areal Linguistics. Amsterdam: John Benjamins.

Seifart, Frank. 2015. “Tracing social history from synchronic linguistic and ethnographic data: The prehistory of Resígaro contact with Bora.” Mundo Amazónico 6(1): 97-110.

19 March

Damian Blasi

When genetic and linguistic transmission do not coincide: emerging patterns from multiple studies

Abstract

26 February

Anastasia Panova

The scope of refactive markers in Abaza

Abstract

Most descriptions of Abaza mention two affixes that express refactive meaning (‘again’, ‘once more’, etc.): the suffix -χ and the prefix ata-, which almost always appears only in combination with -χ. I argue that the main difference between these two refactive markers is that the marker -χ “sees” the internal structure of an event and can have scope over any part of it (just the resultant state, or just the process, with or without arguments), while the marker ata-+-χ is “blind” to the internal structure of the situation and can only “copy” the whole event with its arguments.

12 February

Elena Sokur, Johanna Nichols

Noun vs. verb inflectional synthesis: A complexity trade-off?

Abstract

Samira Verhees, Ilya Chechuro

A database for loanwords in Daghestan

Abstract

In this talk we introduce our first pilot database for the DagLoans project. The database contains translations of 160 concepts collected in the field in Daghestan (and Northern Azerbaijan). At present, it includes a total of 24,785 entries from 23 different languages. The database can be used to find the translation of a concept in one or more languages. Its most important feature is “Set”: all entries are grouped in sets with other similar words, which allows us to plot the spread of lexical items on the map. The database can be used for quantitative research on lexical convergence as well as for creating geographical maps showing the areas and the intensity of foreign influence.

5 February

George Moroz

Bayes factor: the Bayesian way without diving into the Bayesian maze

Abstract

The most common statistical task is hypothesis testing. When a pair of competing models is fully defined, their definition immediately yields the probability of the observed data under each model, i.e. a measure of how well each model accounts for the data. The ratio of these two probabilities is often called the likelihood ratio or the Bayes factor. During the talk I will show how to define different models and compare them with the Bayes factor.
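As an illustration of the idea, assume two fully specified models of a binary variable, for instance the share of tokens that show some variant: M1 says this probability is exactly 0.7, M2 says it is exactly 0.5. The probability of the observed counts under each model follows directly from the binomial distribution, and the ratio of the two is the Bayes factor (for such point hypotheses it coincides with the likelihood ratio). The models, the numbers, and the function names below are hypothetical; this is a minimal sketch, not the procedure presented in the talk.

```python
from math import comb

def binomial_likelihood(k: int, n: int, p: float) -> float:
    """Probability of observing k 'successes' in n trials if the success rate is exactly p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bayes_factor(k: int, n: int, p1: float, p2: float) -> float:
    """Bayes factor of M1 (rate p1) over M2 (rate p2).

    With two fully specified (point) models this equals the likelihood ratio."""
    return binomial_likelihood(k, n, p1) / binomial_likelihood(k, n, p2)

# Hypothetical data: 7 of 10 tokens show the variant of interest.
print(round(bayes_factor(7, 10, 0.7, 0.5), 2))  # → 2.28
```

A value above 1 favours M1 and a value below 1 favours M2. With models that are not fully specified (e.g. "the rate is somewhere between 0.5 and 1"), the likelihood has to be averaged over each model's prior, which is where more of the Bayesian machinery comes in.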

Konstantin Filatov

Typological atlas of the languages of Dagestan: problems and prospects

Abstract

29 January

Chiara Naccarato

The u+gen construction in Modern Standard Russian

Abstract

In Modern Standard Russian (MSR), the prefix/preposition pair u-/u is peculiar with respect to other similar pairs, due to the mismatch in meaning between the two. While the prefix u- has an ablative meaning, as shown when it is prefixed to motion verbs, the prepositional phrase u+gen occurs in locative constructions and related constructions, such as predicative possession, which is expressed via the cross-linguistically common Locative Schema. Etymological considerations show that the meaning preserved by the prefix is older. According to the literature, the only contexts in which the u+gen construction preserves the ablative meaning are with verbs of requesting, removing, and buying. Notably, however, in other Slavic languages putative ablative contexts are limited to verbs of requesting. Data from MSR, Old Church Slavonic, Polish and Czech lead to the conclusion that the extension of the u+gen construction to verbs of removing in MSR is based on its use for encoding predicative possession. Its extension to verbs of buying is better explained through the locative meaning of the construction. As a result of these different developments, the u+gen construction has become part of the argument structure of a group of verbs including verbs of asking and requesting, verbs of removing and verbs of buying, which share the feature of taking human non-recipient third arguments.

22 January

Ivan Kapitonov

Kunbarlang

Abstract

Kunbarlang is a critically endangered polysynthetic language spoken in central Arnhem Land, Northern Territory, by approximately 40 people. It belongs to the non-Pama-Nyungan Gunwinyguan family. This talk reports on the first comprehensive description of Kunbarlang (although it builds on and extends important unpublished work by Carolyn Coleman and Joy Kinslow Harris). Kunbarlang has very rich verbal morphology that includes complex agreement paradigms, a composite TMA system that differs from those of other Gunwinyguan languages, an array of argument derivation tools, and coverb constructions. The nominal domain, on the contrary, has little morphology and relies heavily on syntactic constructions; for instance, case marking of nouns is analytical. The talk will give a general overview of the grammar and then focus on a few selected topics across different areas.

15 January

Ekaterina Schnittke

Sequence of tenses in Russian? Tense choice in complement clauses in Standard and Learner Russian

Abstract

It is generally believed that Russian has no sequence of tenses (SoT) in complement clauses, and the choice of absolute over relative tense is considered a typical error in the interlanguage of learners of Russian as a foreign language whose native language has SoT, e.g. English. However, not all uses of absolute tense in Learner Russian can be classified as errors, since Standard Russian shows a great deal of variation in tense assignment in complement clauses. One of the factors said to govern tense choice is the semantics of the matrix verb (Barentsen 1996; Гиро-Вебер 1975; Schlenker 2003, inter alia). Specifically, speech and mental verbs are said to strictly require relative tense, whereas sensory, emotion, and existential matrix verbs allow both absolute and relative tense patterns. Despite this acknowledged variation, the precise distribution of tenses in complement clauses has been understudied. This paper is a systematic corpus-based study of variation in tense choice across the semantic classes of matrix verbs in two language varieties: (i) Standard Russian as represented in the Russian National Corpus and (ii) the Learner Russian of anglophone speakers as represented in the Russian Learner Corpus. I examine clausal complexes in which the matrix verb is in the past tense and the verb of the complement clause denotes a simultaneous action. The analysis identified a likelihood hierarchy of verbal semantic classes, ranging from the least likely to tolerate past tense in the complement clause to the most likely: speech < mental < sensory ≈ emotion

Seminar schedule 2018

18 December

Natalya Stoynova

The internalization of inflection? The restrictive kə̄n in Ulch

Abstract

11 December

Vadim Dyachkov

Databases

Abstract

4 December

Timur Maisak, Michael Daniel, Yury Lander

Corpus research on relativization targets in several languages of the Caucasus

Abstract

In this talk, we will discuss modifying participial constructions, which are the predominant type of relative clause in East Caucasian languages. One of the key properties of participles in East Caucasian languages is their lack of syntactic orientation. There are few or no syntactic restrictions on what can be relativized: the gap in the relative clause can correspond to a core argument, a peripheral participant or even a participant that is not part of the verb’s argument structure. The languages also share some common patterns of constructionalization of specific relative constructions (such as name constructions). On the other hand, there is variation across languages, e.g. a more or less strongly articulated preference for S relativization, or a more or less widespread use of resumptives, as well as language-particular features, e.g. a very high ratio of addressee relativization (in name constructions) in Agul. After a general overview of the problems related to the study of relativization targets, we concentrate on language-particular case studies and discuss the counts of relativization targets in the corpora of two East Caucasian languages (Agul and Archi). As a comparative background, West Circassian corpus data will be presented. In this language relativization is syntactically oriented, the strategy cannot be classified as participial, and special relativizers may be interpreted as obligatory resumptive pronouns. Finally, we discuss the comparison of corpus counts of the relativized syntactic role in the three languages, and the problems connected with such a comparison.

13 November

Nina Dobrushina

Conditions and questions: several cases of combined marking in Nakh-Dagestanian languages

Abstract

In this paper, I consider several suffixes in Lezgic languages (possibly, but not definitely, related) that cover a rather wide range of contexts. Some contexts of their use may be characterized as denoting unrealized states of affairs (such as conditional clauses, polar questions and, to a certain extent, indefinite pronouns). Others fall short of this definition, including indirect questions and other subordinate clauses with WH-words. The set of contexts covered by the markers in question is the same in at least three Lezgic languages (Lezgian, Aghul, Tabassaran), and also in Azerbaijani, which raises the question of a possible contact origin of this pattern. Some other Lezgic languages employ several different markers in these contexts. The Kina dialect of Rutul presents an especially interesting case, combining in one morpheme (-jden) meanings which are unlikely to be associated. In this talk, I will present the case of Kina Rutul in detail, discuss possible interpretations and origins of the marker -jden, and compare Kina Rutul with other Lezgic languages.

6 November

Michael Daniel

Evaluating DP as a measure of corpus heterogeneity. The Even dialect comparison project at a crossroads

Abstract

In this small discussion, we will present the path taken so far in developing methods of inter-dialectal comparison and the point at which we are currently standing (or stuck), and we will gratefully take advice on how to proceed. We will first recall the starting point of the project, i.e. a method of isolating inter-dialectal divergence that takes inter-speaker variation into account. Then we will briefly review the steps taken so far (log-likelihood, the Wilcoxon-Mann-Whitney test, Gries’ DP). We will then focus on our most recent result, evaluating the observed DP value against a simulation and permutation test for the distribution of DP in a random sample, and discuss whether we can use it for our purposes.
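Gries’ DP (deviation of proportions) compares how the occurrences of a word are spread over the parts of a corpus (for instance, speakers or recordings) with how large those parts are: 0 means the word is spread exactly in proportion to part size, and values closer to 1 mean it is concentrated in a few parts. The Python sketch below computes DP as half the sum of absolute differences between observed and expected proportions, and then estimates how unusual an observed value is by simulating a null distribution in which the same number of occurrences is scattered over the parts in proportion to their size. This null model, the counts, and the function names are illustrative assumptions, not the project’s actual procedure.

```python
import random
from collections import Counter

def gries_dp(freqs, sizes):
    """Gries' DP: half the summed absolute difference between each part's share
    of the word's occurrences and that part's share of the corpus."""
    total_f, total_s = sum(freqs), sum(sizes)
    return 0.5 * sum(abs(f / total_f - s / total_s) for f, s in zip(freqs, sizes))

def simulated_dp(total_freq, sizes, rng):
    """DP for one random sample: occurrences scattered over parts in proportion to part size."""
    draws = rng.choices(range(len(sizes)), weights=sizes, k=total_freq)
    counts = Counter(draws)
    return gries_dp([counts[i] for i in range(len(sizes))], sizes)

def dp_p_value(freqs, sizes, n_sim=10_000, seed=1):
    """Observed DP and a Monte Carlo p-value: the share of simulated DPs
    at least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = gries_dp(freqs, sizes)
    total = sum(freqs)
    hits = sum(simulated_dp(total, sizes, rng) >= observed for _ in range(n_sim))
    return observed, (hits + 1) / (n_sim + 1)

# Hypothetical counts of one word in three speakers' recordings of different length.
obs, p = dp_p_value([12, 3, 0], [4000, 3500, 2500])
print(f"DP = {obs:.3f}, p ≈ {p:.4f}")
```

A small p-value says only that the word is more unevenly distributed than chance scattering would predict; whether that unevenness reflects dialect or individual speakers is exactly the question the inter-speaker design of the project has to address.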

30 October

Aigul Zakirova

The emphatic identity particle =OK in the Volga-Kama Sprachbund

Abstract

The particle =OK, originally Turkic, is attested in all the core members of the Volga-Kama Sprachbund: Chuvash =aχ/=eχ, Tatar and Bashkir =uk/=ük, Meadow Mari =ak, Hill Mari =ok and Udmurt =ik. Meadow Mari, Hill Mari and Udmurt have arguably borrowed the particle from Turkic (Bulgar). =OK is used in contexts many of which may be characterized as emphatic identity contexts: the argument marked by =OK is the same as an argument of a different proposition (≈ Russian že: Masha rabotajet v pole, Masha že sidit s det’mi ‘Mary works in the field and it is also she who sits with the children’). However, in different languages =OK exhibits different morphosyntactic restrictions on the constituent to which it may attach: in Tatar, for example, it attaches to demonstratives (šul uk keše ‘the same person’) but not to proper names (*Märijäm ük ‘it is also Mary who…’). In Chuvash, Meadow and Hill Mari and Udmurt the particle can attach to proper names. =OK can also attach to the verb, with different interpretations and, again, with different morphosyntactic restrictions. Some of the languages have similar constructions and lexicalizations with =OK (e.g. a reduplication construction of the type V-converb=OK V). I would like to discuss whether – and how – we can approach these similar patterns in terms of contact. From the literature we know which were the strongest bonds in the area (Chuvash-Mari, Tatar-Meadow Mari, Tatar-Bashkir). The question is whether the degree of similarity in the morphosyntactic, constructional and lexicalization patterns of =OK between languages corresponds to areal affinity, and how to demonstrate it.

23 October

George Moroz

Relative clauses in Andi

Abstract

16 October

Tanya Filippova

Phrasal comparatives: constraints on the comparee

Abstract

The talk will deal with a certain type of comparative construction, which I will call phrasal comparatives. Examples of such constructions in Russian are Ваня выше Пети ‘Vanya is taller than Petya’, Ваня прыгает выше Пети ‘Vanya jumps higher than Petya’, Ваня ценит Петю больше Коли ‘Vanya values Petya more than Kolya’, etc., that is, constructions in which the standard of comparison is always expressed by a noun phrase in the genitive case.

I will consider the constraints on the comparee in constructions with transitive verbs, such as Ваня ценит Петю больше Коли. Such sentences potentially have two readings: 1) the standard of comparison is contrasted with the subject (comparee = subject); 2) the standard of comparison is contrasted with the object (comparee = object). We will look at how word order and the marking of a noun phrase with the particles только ‘only’ and даже ‘even’ affect the interpretation of such sentences.

We will also consider phrasal comparatives in some other languages, and I will propose a preliminary general analysis of the constraints on the comparee within the framework of information structure theory.

Finally, I will present the design of a large questionnaire for native speakers of Russian, intended to establish under which conditions one of the interpretations of a phrasal comparative is merely preferred and under which it is the only one possible.

9 October

Alexey Koshevoy

Validity of the data collected indirectly: belated proof of concept

Abstract

Within the framework of the Multidagestan project, a vast amount of sociolinguistic data about traditional small-scale multilingualism was collected in Daghestan. The aim of the project is to trace changes in multilingual patterns over the 20th century. However, 71 percent of the data were collected indirectly, by asking people about their relatives. We will discuss the statistical methods that we used to check the robustness of the indirectly collected sociolinguistic data.

Seminar schedule 2017

21 November

Timur Maisak, Anastasia Panova

Corpus of Russian spoken in Daghestan

Abstract

The corpus of regional varieties of Russian spoken in Daghestan is based on transcribed sociolinguistic interviews in Russian with speakers of various Daghestanian languages who live in rural areas. Technically, the corpus is built using the platform and annotation principles developed for the dialectal corpus of the Ustja River Basin. The aims of the project include maintaining the corpus and adding new texts, as well as using it for a systematic study of the morphosyntactic characteristics of Daghestanian Russian. In the talk, we plan to discuss the current state of the corpus, the possibilities of corpus-based research, the problems we have encountered and the prospects of the project.

14 November

Olga Lyashevskaya, Ilya Chechuro

Prosodic Analysis of Non-standard Russian Spontaneous Speech

Abstract

The study deals with intonation patterns in three non-standard varieties of Russian. In the first part of the talk we discuss methodological issues, such as the analysis of pitch range, data filtering and data representation. In the second part, we consider a number of case studies, namely Daghestanian Russian, Southern Russian and Jewish Russian. The general goal of the project is to compile an annotated corpus of non-standard Russian with pitch annotation and to explore the intonation patterns of non-standard varieties of Russian based on corpus data.

7 November

Aleksei Fedorenko, Yury Lander, George Moroz

Circassian Isoglosses

Abstract

West Circassian and Kabardian form a dialect continuum spread across Krasnodar Krai and the republics of Adygea, Karachai-Circassia and Kabardino-Balkaria. In the presentation, we are going to talk about phonetic and sociolinguistic features of different Circassian idioms, present a project for an atlas of Circassian isoglosses and show the first maps to be included in the atlas. In addition, we will describe the process of creating a phonetic and grammatical questionnaire and the difficulties associated with it.

26 October

Vasilisa Andriyanets, Brigitte Pakendorf

Dialectal Variation in Even based on Corpora of Field Recordings

Abstract

The talk presents the “Dialectal variation in Even” project. The project works with two dialects of Even: the westernmost one, spoken in the village of Sebjan-Küöl in Yakutia, and the easternmost one, spoken in Kamchatka. The first dialect has been in contact with Yakut for a long time, while the second has possibly been in contact with Koryak and Itelmen. The aim of the project is to discover the differences between the two dialects and whether they stem from independent innovations or from contact. For this, we use corpora of field recordings collected by Brigitte Pakendorf between 2007 and 2012. In this talk we will describe the data, discuss what differences can be found in them and which statistical methods can be used for this, and present some differences in morphology and syntax that have been found so far.

19 October

Alexandra Kozhukhar, Olga Lyashevskaya

Universal Dependencies for Mehweb Dargwa

Abstract

Universal Dependencies (UD) is a project dealing with consistent cross-linguistic morphological and syntactic annotation. UD is currently in version 2 and covers 52 languages, with 10 more languages yet to be included. With its own annotation principles and abstract inventory of parts of speech, morphosyntactic features and dependency relations, UD aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. While UD covers 11 language families, it does not include any languages of the Caucasus (including the East Caucasian family). In our talk we will describe how Mehweb Dargwa (East Caucasian) fits the UD scheme.

10 October

Anna Volkova, Michael Voronov

Spoken Meadow Mari corpus: data, design, and aims

Abstract

The talk presents the Spoken Meadow Mari corpus project. Meadow Mari is a Uralic language spoken in the Volga region by some 375 thousand people. The core of the corpus consists of recordings made in 2000-2001 by a group of researchers from Lomonosov Moscow State University. In our talk we will discuss the data we have, possible applications of the project and the target audiences of the corpus, as well as its structure. Making the corpus data presentable involves transcribing, glossing and annotating the data, as well as aligning audio and text, which should facilitate data analysis.

3 October

Sven Grawunder, Michael Daniel, Vasilisa Zhigulskaya, George Moroz

Daghestanian Stops

Abstract

Rich consonantal inventories are a salient feature of the languages of the Caucasus as a whole and of the languages of Daghestan in particular. Their composition is subject to some variation from language to language, but is similar overall. All languages have ejectives, most languages have labialization, and geminates are not uncommon. On the other hand, the acoustic properties of phonologically identical elements may be substantially different. To document these differences, over the last few years we have been making systematic field recordings of data from different languages. In this talk we will introduce the aims and methods of the project and present preliminary results of our analysis of the acoustic properties of ejectives as compared to the corresponding voiceless stops. We will evaluate the impact of such parameters as closure duration and voice onset time. We will use data from three different languages: Rutul (Kina dialect), Andi (Zilo dialect) and Mehweb. In the long run, we hope that our project will be able to address the following theoretical question: is the observed intragenetic variation more sensitive to areal or to systemic factors?

26 September

Nina Dobrushina, Alexandra Kozhukhar

Daghestanian Multilingualism

Abstract

The project “Atlas of Multilingualism of Daghestan” is based on sociolinguistic interviews recorded in Daghestan by the project team over the course of seven years. The aim of the project is to determine the level of bilingualism in Daghestanian mountain villages and to describe the sociolinguistic patterns of linguistic convergence of local languages. In addition, the project allows us to establish the type of linguistic contact characteristic of neighboring villages, which languages not belonging to a particular area were spoken by its inhabitants, how the command of Russian changed, what role the geographic distance between languages played and how the command of certain languages was distributed among the inhabitants of a village. This talk will focus on two topics. First, we will show the results of a study of how the command of languages was distributed among men and women and how these dynamics have evolved from the beginning of the 20th century until now. Second, we will discuss some problems and shortcomings of the method used and suggest some verification methods.

19 September

Michael Daniel, Samira Verhees, Ilya Chechuro

Daghestanian Loans

Abstract

The DagLoans project aims to investigate lexical convergence between East Caucasian languages and their neighbours in quantitative terms. We focus on horizontal interaction, looking at borrowings between languages that are in direct contact and setting aside the influence of dominant cultures and distant languages, e.g. Arabic, Persian or Russian. The project consists of two parts, one dealing with lexical matter copy, the other with lexical pattern copy. Today we are only discussing the data of the former. We deal with lexical matter borrowing in an attempt to compare and quantify horizontal borrowing between languages at different locations. Instead of comparing standard languages, we aim to compare local varieties and, ideally, village lects. Based on the Leipzig-Jakarta list and the issues of “Отраслевая лексика” (“Sectoral vocabulary”), we attempt to compose a list of lexical items with a high borrowability rate. The list should be concise enough to be elicited from several speakers during a one-day visit to a village, but, on the other hand, long enough to discriminate local varieties and, ideally, village lects. At present, we work on data from the Rutul, Tsunta and Botlikh districts. In the talk on 19 September we plan to discuss the list, its composition and the elicitation techniques we use, in order to decide whether it is an adequate tool for studying the rate of lexical contact at the local level and how it reflects local geography and data on bilingualism. The talk is based on data from 6 villages (4 languages) of the Rutul region that are located in the same valley: Khlut (Lezgian), Kiche (Rutul), Rutul (Rutul), Kina (Rutul), Helmets (Tsakhur), and Kusur (Avar).