The Typological Atlas of the Languages of Daghestan (TALD) is a tool for the visualization of information about linguistic structures typical of Daghestan. The scope of the project currently covers all East Caucasian languages and several other languages spoken in Daghestan, Chechnya, Ingushetia and adjacent territories.
The Atlas consists of:
Daghestan is the most linguistically diverse part of the Caucasus, with at least 40 different languages (and many more highly divergent idioms) spoken on a territory of 50,300 km² that consists mostly of mountainous terrain. The majority of the languages spoken there belong to the East Caucasian (or Nakh-Daghestanian) language family: one of the three language families indigenous to the Caucasus. For the most part, the languages of the East Caucasian family are spoken in the eastern Caucasus area (with the exception of some relatively recent diasporic communities). They have no proven genealogical relationship to any other languages or language families.
Other languages spoken in Daghestan include three Turkic languages: Nogai, Kumyk (Kipchak) and Azerbaijani (Oghuz); and three Indo-European languages: Russian (Slavic, the major language of administration, education, and urban areas), Armenian (Armenic), and Tat (Iranian). Arabic is the language of religion, as most people in Daghestan are Sunni Muslims. The official languages of Daghestan (in alphabetical order) are Agul, Avar, Azerbaijani, Chechen, Dargwa, Kumyk, Lezgian, Lak, Nogai, Russian, Rutul, Tabasaran, Tat, Tsakhur.
Historically there was no single lingua franca for the whole area. As a result, Daghestanians were known for having a command of multiple locally important languages, which they picked up in the course of seasonal labor migration, trading at cardinal markets, and other types of contact. Currently these patterns are disappearing fast due to the expansion of Russian.
One of the aims of TALD is to chart the genealogical and geographical distribution of linguistic features and to facilitate multi-faceted analyses of language contact in Daghestan by comparing the presence of shared features with known patterns of bilingualism and lexical convergence.
The list of languages included in our sample is available in the section Languages.
The Atlas currently offers three different types of map visualizations:
Each of these visualizations has its benefits and drawbacks, so we allow the user to toggle between the different options.
Below are some examples from the chapter on Morning greetings, which describes the two main ways to greet someone in the morning in the languages of Daghestan: wishing them a good morning or asking them whether they woke up.
For map visualizations we use the Lingtypology package (Moroz 2017) for R.
This is the most basic visualization, which shows one dot on the map for each language in the sample. The inside of each dot is colored by language. Languages from the same group have similar colors (e.g., all Lezgic languages have some shade of green). Hover over a dot to see the name of the language, and click to view a popup with a link to the language’s page in the Glottolog database and the name of the village. The color of the outer dots indicates the value of a linguistic feature. By unticking the box “show languages” you can remove the inner dots and visualize the distribution of different values in the area without the distraction of genealogical information.
This visualization represents each language as a cluster of dots, which correspond to villages where a certain language is spoken (this visualization makes use of the East Caucasian villages dataset).
A benefit of this type of visualization is that it shows the size and boundaries of speech communities (as opposed to maps based on abstract general datapoints). Its main drawback is that it involves a lot of generalization. We do not have information on each village variety of the languages in our sample, so we extrapolate the information we have on a language or dialect to all the villages where they are spoken. In doing so, we risk overgeneralizing information and erasing possible dialectal differences.
Note, however, that extrapolation is performed in a bottom-up fashion: if we have data for a specific village variety that differs from other varieties of the same dialect group for a specific feature, we do not extrapolate data to that village. Let us consider the feature “Number of noun classes” in Andi varieties. In this case we have general (dialect-level) information for Lower Andi (where three noun classes are found) and Upper Andi (in which we can distinguish five noun classes). This distinction is kept on the map, i.e. Lower and Upper Andi villages are filled with different colors. In addition, we know that Rikvani Andi differs from the other Upper Andi varieties in that it features six noun classes. In this case we do not extrapolate the general information on Upper Andi to the village of Rikvani, because we have more precise information for that village. On the map, Rikvani is therefore filled with a color different from that of both Lower and Upper Andi.
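The bottom-up logic described above can be illustrated with a minimal Python sketch. This is not the Atlas's actual implementation (the maps are built in R with Lingtypology). The function name, the data layout, and the placeholder village names other than Rikvani are assumptions made for illustration. The noun class counts are the ones given in the Andi example.

```python
from typing import Dict, List, Optional, Tuple

# Each village lists the levels it belongs to, from most specific to most general.
# "Upper Andi village A" and "Lower Andi village A" are hypothetical placeholders.
VILLAGE_LINEAGE: Dict[str, List[str]] = {
    "Rikvani": ["Rikvani", "Upper Andi", "Andi"],
    "Upper Andi village A": ["Upper Andi village A", "Upper Andi", "Andi"],
    "Lower Andi village A": ["Lower Andi village A", "Lower Andi", "Andi"],
}

# Known values of the feature "Number of noun classes", keyed by the level
# at which the information is available (values taken from the text).
KNOWN_VALUES: Dict[str, int] = {
    "Lower Andi": 3,
    "Upper Andi": 5,
    "Rikvani": 6,
}


def resolve_value(
    village: str,
    lineage: Dict[str, List[str]],
    known: Dict[str, int],
) -> Optional[Tuple[int, str]]:
    """Return (value, source level) using the most specific level with data.

    Walks from the village itself up to the language and stops at the first
    level for which a value is known. Returns None if no level has data.
    """
    for level in lineage[village]:
        if level in known:
            return known[level], level
    return None


for name in VILLAGE_LINEAGE:
    print(name, resolve_value(name, VILLAGE_LINEAGE, KNOWN_VALUES))
```

Because the lookup starts at the village level, the value for Rikvani comes from the village itself, while the other villages inherit the value of their dialect group.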
The data granularity visualization shows the level of accuracy for each point in a dataset, e.g., “village dialect” indicates that we had information about the feature for a specific village variety, while “language” means that we only had information for the language in general. Sometimes information is available for a certain dialect group within a specific language. For such cases, we use the labels “dialect_toplevel” (if information is available for a macro-group like Southern Avar), “dialect_nt1”, where “nt1” stands for “non-toplevel 1” (if information is available for a lower-level dialect group like Zaqatala < Southern Avar), “dialect_nt2”, where “nt2” stands for “non-toplevel 2” (if information is available for a dialect group of an even lower level like Balakən < Zaqatala < Southern Avar), and so on.
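The naming scheme for dialect-level labels can be sketched as follows. This is only an illustration of the convention described above, not code from the Atlas. The function name and the representation of a dialect group as a path from the top-level group downwards are assumptions.

```python
from typing import List


def dialect_label(path: List[str]) -> str:
    """Return the granularity label for a dialect group.

    `path` lists the dialect groups from the top-level group down to the
    group for which information is available, e.g.
    ["Southern Avar", "Zaqatala"] for Zaqatala < Southern Avar.
    """
    if not path:
        raise ValueError("path must contain at least the top-level group")
    depth = len(path) - 1  # number of levels below the top-level group
    if depth == 0:
        return "dialect_toplevel"
    return f"dialect_nt{depth}"


print(dialect_label(["Southern Avar"]))
print(dialect_label(["Southern Avar", "Zaqatala"]))
print(dialect_label(["Southern Avar", "Zaqatala", "Balakən"]))
```

The labels “language” and “village dialect” sit at either end of this scale and are not produced by this helper.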
This allows the user to see what kind of data underlies the default visualization.
Our goal for the Atlas is to continue adding new data to existing datasets and thus gradually improve its coverage and accuracy.
The chapters and datasets in the Atlas are created by researchers specializing in the languages of Daghestan as well as by students of linguistics with no prior knowledge of the area and the languages spoken there.
If you would like to contribute a chapter and/or data to the Atlas because you are studying a certain topic in the languages of Daghestan, or if you are a student looking for an internship, do not hesitate to contact us! You can find our contact info under Team.
To get a better idea of our methodology and what you will have to do if you decide to become a contributor, see our Contributor Manual.
The data can be accessed through the Atlas interface or downloaded directly from our GitHub page. For reasons of space, the Atlas interface shows filtered versions of the original databases, which include only the main information displayed on maps. Both the filtered and the full versions of the databases are available for download. The full versions contain more detailed information for each observation in the database (e.g., specific morphemes or wordforms, examples of their occurrence in texts with glosses and translations). They can be downloaded by clicking on the download button or from our GitHub page.
Daniel, M., K. Filatov, T. Maisak, G. Moroz, T. Mukhin, C. Naccarato, and S. Verhees (2022). Typological Atlas of the Languages of Daghestan (TALD), v. 1.0.0. Moscow: Linguistic Convergence Laboratory, NRU HSE. DOI: 10.5281/zenodo.6807070. http://lingconlab.ru/dagatlas.
@book{tald2022,
title = {Typological Atlas of the Languages of Daghestan (TALD), v. 1.0.0},
author = {Michael Daniel and Konstantin Filatov and Timur Maisak and George Moroz and Timofey Mukhin and Chiara Naccarato and Samira Verhees},
year = {2022},
publisher = {Linguistic Convergence Laboratory, NRU HSE},
address = {Moscow},
url = {http://lingconlab.ru/dagatlas},
doi = {10.5281/zenodo.6807070},
}