PyCaucTile: Tile Grid Maps for East Caucasian Languages
1 Introduction
PyCaucTile is a package that generates tile grid maps for illustrating features of East Caucasian languages. The plots are created using the plotnine library, which provides a ggplot2-like interface in Python.
A tile grid map is a popular type of simplified cartographic visualization. Regions on such maps are usually represented by squares (tiles) of the same or proportional size on a conventional coordinate grid that preserves the approximate location of objects. So, each tile on a PyCaucTile map indicates a language in its approximate position relative to neighboring languages. The linguistic features are encoded by the color of the tile.
This software was created as a part of the project of the Linguistic Convergence laboratory. There is also an R package that shares the same functionality (see RCaucTile) by George Moroz.
2 Installation
The package is available from the PyPI repository, so you can install it using the pip command:
!pip install pycauctile
To use PyCaucTile, you can either import the whole package or load the functions and data directly:
import pycauctile
from pycauctile import ec_tile_map, ec_languages
3 How to use PyCaucTile
One of the main utilities of the package is a comprehensive template of East Caucasian languages, complete with color coding that reflects established genealogical classifications. This color scheme is adopted directly from the Typological Atlas of the Languages of Daghestan.
To display this template, simply call the ec_tile_map() function without any arguments:
ec_tile_map()
As you can see, all languages are color-coded according to their language branch: Nakh languages are brown, Andic languages are blue, the Lezgic branch is green, and so on. This template sets the default distribution of languages.
At the core of the package is the built-in dataset ec_languages, which contains information about 56 languages from TALD. Most variables are self-descriptive, except for x and y, which define the location of each language on a grid that was constructed for this package based on the approximate geographical distribution of the languages. The dataset can also be downloaded from the GitHub repository.
ec_languages.head()
| | language | branch | family | glottocode | language_color | branch_color | x | y | abbreviation | morning_greetings | consonant_inventory_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agul | Lezgic | East Caucasian | aghu1253 | #00cc66 | #ffd000 | 8 | 4 | NaN | Good morning | 44.0 |
| 1 | Amuzgi-Shiri | Dargwa | East Caucasian | sout3261 | #f9ec49 | #ffb268 | 9 | 5 | NaN | NaN | NaN |
| 2 | Archi | Lezgic | East Caucasian | arch1244 | #88ff26 | #ffd000 | 6 | 5 | NaN | Did you wake up? | 69.0 |
| 3 | Avar | Avar-Andic | East Caucasian | avar1256 | #009999 | #009999 | 6 | 7 | NaN | Did you wake up? | 45.0 |
| 4 | Azerbaijani | Oghuz | Turkic | nort2697 | #cccccc | #666666 | 8 | 1 | NaN | Good morning | 25.0 |
ec_languages.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 language 56 non-null object
1 branch 56 non-null object
2 family 56 non-null object
3 glottocode 56 non-null object
4 language_color 56 non-null object
5 branch_color 56 non-null object
6 x 56 non-null int64
7 y 56 non-null int64
8 abbreviation 8 non-null object
9 morning_greetings 42 non-null object
10 consonant_inventory_size 47 non-null float64
dtypes: float64(1), int64(2), object(8)
memory usage: 4.9+ KB
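The x and y columns can be read as cell positions on a grid. The following is only a minimal sketch of that idea, not how PyCaucTile draws its maps: it places the five languages printed by head() above into a text grid. The render_grid helper and the three-letter labels are illustrative, and drawing larger y values higher up is an assumption.

```python
# (x, y) positions copied from the five rows printed by ec_languages.head().
tiles = {
    "Agul": (8, 4),
    "Amuzgi-Shiri": (9, 5),
    "Archi": (6, 5),
    "Avar": (6, 7),
    "Azerbaijani": (8, 1),
}


def render_grid(tiles):
    """Return a text grid with one three-letter cell per language."""
    xs = [x for x, _ in tiles.values()]
    ys = [y for _, y in tiles.values()]
    by_position = {position: name for name, position in tiles.items()}
    rows = []
    # Assumption: larger y values are drawn higher up, as on a usual plot.
    for y in range(max(ys), min(ys) - 1, -1):
        cells = []
        for x in range(min(xs), max(xs) + 1):
            name = by_position.get((x, y))
            cells.append(f"[{name[:3]}]" if name else " " * 5)
        rows.append("".join(cells).rstrip())
    return "\n".join(rows)


grid_text = render_grid(tiles)
print(grid_text)
```

Each language occupies exactly one cell, so neighboring languages stay next to each other even though real distances are not preserved.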
The columns also include two features from the Typological Atlas of the Languages of Daghestan:
- morning_greetings contains values from the “Morning Greetings” chapter (Naccarato, Verhees 2021) from the Typological Atlas of the Languages of Daghestan. The languages of Daghestan can be classified into three groups according to whether their morning greetings include questions about the night’s rest (value Did you wake up?), are based on a combination of concepts like “morning” and “good” (value Good morning), or use both strategies (value Both).
- consonant_inventory_size contains consonant inventory sizes based on the “Phonology” chapter (Moroz 2021) from the Typological Atlas of the Languages of Daghestan.
To map your own data, prepare a table with the columns language and feature and pass it to the ec_tile_map() function. We create a simple table on the fly here for demonstration purposes; in practice it is more convenient to use pandas functions like read_csv(), read_excel(), or similar data import methods.
import pandas as pd

df = pd.DataFrame({
'language': ("Avar", "Chechen", "Mehweb"),
'feature': ("value a", "value b", "value b")
})
ec_tile_map(df)
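For data kept in a file, the same table could be read with read_csv(). The sketch below uses an in-memory string as a stand-in for a CSV file so that it runs on its own; the column check is an illustrative safeguard, not something PyCaucTile is known to require in this form.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; in practice pass a file path to read_csv().
csv_text = """language,feature
Avar,value a
Chechen,value b
Mehweb,value b
"""

df = pd.read_csv(io.StringIO(csv_text))

# The text above names these two columns as the expected input.
missing = {"language", "feature"} - set(df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

print(df)
# With PyCaucTile installed, the table can then be passed on:
# ec_tile_map(df)
```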
The languages for which no data in the feature column is available are displayed in light grey color. There is a possibility to hide all unused languages with the …
# to be updated
In practical research scenarios, feature columns often have descriptive names rather than the generic “feature”. The feature_column parameter of the ec_tile_map() function accepts any name for the data column:
ec_tile_map(ec_languages,
feature_column = "morning_greetings")
ec_tile_map(ec_languages,
feature_column = "consonant_inventory_size")
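Before mapping a column, it can be useful to see which languages have no value for it and how the values are distributed. The following sketch uses a hand-copied excerpt of the five rows printed by ec_languages.head() so that it runs without the package; with PyCaucTile installed, the same calls should work on ec_languages itself.

```python
import pandas as pd

# Three columns of the five rows printed by ec_languages.head().
sample = pd.DataFrame({
    "language": ["Agul", "Amuzgi-Shiri", "Archi", "Avar", "Azerbaijani"],
    "morning_greetings": ["Good morning", None, "Did you wake up?",
                          "Did you wake up?", "Good morning"],
    "consonant_inventory_size": [44.0, None, 69.0, 45.0, 25.0],
})

feature_column = "morning_greetings"

# Languages without a value for the chosen feature.
no_data = sample.loc[sample[feature_column].isna(), "language"].tolist()
# Number of languages per feature value (missing values are not counted).
counts = sample[feature_column].value_counts()

print(no_data)
print(counts)
```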
To add the title to the plot, use the title argument:
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)")
To change the title position (the default is left), use the title_position argument with the value right or center:
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)",
title_position = "center")
ec_tile_map(title = "This is a Tile map of East Caucasian languages",
title_position = "right")
For numerical features or categorical variables with concise values, annotating the feature values directly on the map makes it easier to read. The annotate_feature parameter enables this:
ec_tile_map(ec_languages,
feature_column = "consonant_inventory_size",
title = "Consonant inventory size (Moroz 2021)",
annotate_feature = True)
With long values of categorical features, however, the annotations might look messy:
ec_tile_map(ec_languages,
title = "Morning greetings (Naccarato, Verhees 2021)",
feature_column = "morning_greetings",
annotate_feature = True,
title_position = "center")
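One possible workaround, offered only as a sketch, is to map the long values to short labels in a new column and annotate that column instead. The labels in short_labels and the column name greeting_short are hypothetical; the sample table is copied from the rows printed by ec_languages.head().

```python
import pandas as pd

# Feature values copied from the rows printed by ec_languages.head().
sample = pd.DataFrame({
    "language": ["Agul", "Archi", "Avar", "Azerbaijani"],
    "morning_greetings": ["Good morning", "Did you wake up?",
                          "Did you wake up?", "Good morning"],
})

# Hypothetical short labels; choose whatever abbreviations suit your data.
short_labels = {
    "Did you wake up?": "wake",
    "Good morning": "good",
    "Both": "both",
}

# greeting_short is a new, illustrative column name.
sample["greeting_short"] = sample["morning_greetings"].map(short_labels)
print(sample)

# With PyCaucTile installed, the shorter column can be mapped instead:
# ec_tile_map(sample, feature_column="greeting_short", annotate_feature=True)
```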
4 Changing the default colors
The default color schemes may not always align with your needs or publication requirements. plotnine offers extensive flexibility here through its ggplot2-style color scales.
For numerical data, one can use the scale_fill_distiller() function with one of the palettes (Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd).
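# color scales and theme used in this and the following examples
from plotnine import (scale_fill_brewer, scale_fill_distiller,
                      scale_fill_gradient, scale_fill_manual, theme)

ec_tile_map(ec_languages,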
feature_column = "consonant_inventory_size",
title = "Consonant inventory size (Moroz 2021)",
annotate_feature = True) \
+ scale_fill_distiller(palette = "Greens")
There is a direction argument that controls the order of the colors in the palette, so it can be reversed by setting it to -1:
ec_tile_map(ec_languages,
feature_column = "consonant_inventory_size",
title = "Consonant inventory size (Moroz 2021)",
annotate_feature = True) \
+ scale_fill_distiller(palette = "Greens", direction=-1)
To define your own palette for a numeric variable, you can use the scale_fill_gradient() function:
ec_tile_map(ec_languages,
feature_column = "consonant_inventory_size",
title = "Consonant inventory size (Moroz 2021)",
annotate_feature = True) \
+ scale_fill_gradient(low = "navy", high = "tomato")
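To make the idea of a two-color gradient concrete, the sketch below blends the low and high colors linearly in RGB according to where a value falls between the minimum and the maximum. This is only an illustration of the principle: plotnine's own interpolation may differ in detail. The helper names are illustrative, and the sizes are taken from the rows printed by ec_languages.head().

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to a tuple of three integers."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def interpolate(low, high, value, vmin, vmax):
    """Blend two hex colors by the position of value in [vmin, vmax]."""
    t = (value - vmin) / (vmax - vmin)
    channels = [
        round(a + (b - a) * t)
        for a, b in zip(hex_to_rgb(low), hex_to_rgb(high))
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)


NAVY, TOMATO = "#000080", "#ff6347"  # CSS values of "navy" and "tomato"

# Consonant inventory sizes from the rows printed by ec_languages.head().
sizes = {"Azerbaijani": 25, "Agul": 44, "Avar": 45, "Archi": 69}
vmin, vmax = min(sizes.values()), max(sizes.values())

colors = {name: interpolate(NAVY, TOMATO, size, vmin, vmax)
          for name, size in sizes.items()}
for name, color in colors.items():
    print(name, color)
```

The smallest inventory receives the low color, the largest the high color, and everything else falls in between.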
When the color scheme is clear and the annotate_feature argument displays the exact feature values on the map, it makes sense to remove the legend:
ec_tile_map(ec_languages,
feature_column = "consonant_inventory_size",
title = "Consonant inventory size (Moroz 2021)",
annotate_feature = True) \
+ scale_fill_gradient(low = "navy", high = "tomato") \
+ theme(legend_position = "none")
For categorical features, plotnine provides the scale_fill_brewer() function, which can be used with one of the qualitative ColorBrewer palettes also available in ggplot2 (Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3).
ec_tile_map(ec_languages,
feature_column="morning_greetings",
title="Morning greetings (Naccarato, Verhees 2021)",
title_position = "center") \
+ scale_fill_brewer(type="qual", palette="Pastel1", na_value=None)
The scale_fill_manual() function can be used to define your own palette for a categorical feature.
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)") \
+ scale_fill_manual(values = ("#D81E05", "#0070A1", "#00923F"), na_value=None)
5 Changing the values’ order
In Python, categorical variables by default follow the order in which unique values first appear in the dataset. To define a custom ordering that better reflects the feature, you can use the pd.Categorical data type. The following approach preserves the original values while instructing pandas to treat them as ordered:
df = pd.DataFrame({
'language': ['Avar', 'Chechen', 'Lak'],
'feature': ['value a', 'value b', 'value b']
})
df['feature'] = pd.Categorical(
df['feature'],
categories=['value b', 'value a'],
ordered=True
)
ec_tile_map(df)
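To check that the declared order has taken effect before plotting, the categories can be inspected directly. This sketch repeats the table from above so that it runs on its own and relies only on pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "language": ["Avar", "Chechen", "Lak"],
    "feature": ["value a", "value b", "value b"],
})
df["feature"] = pd.Categorical(
    df["feature"],
    categories=["value b", "value a"],
    ordered=True,
)

# The declared order is stored on the column itself.
category_order = list(df["feature"].cat.categories)

# Sorting follows the declared order rather than the alphabet.
sorted_languages = df.sort_values("feature", kind="stable")["language"].tolist()

print(category_order)
print(sorted_languages)
```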
6 Changing the language template
The East Caucasian language family exhibits significant dialectal differentiation. A unified genealogical classification of all idioms spoken in Daghestan does not exist. The default language inventory in PyCaucTile is based on the genealogical classification from the Typological Atlas of the Languages of Daghestan (see the languages page). Therefore, it is highly probable that some researchers may wish to modify the default inventory by removing or altering the names of existing units.
To remove specific languages from the template, list the desired languages in the hide_languages argument.
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)",
hide_languages = ["Gigatli", "Shari", "Chechen"])
In order to change the names of existing languages in the template, you need to provide the rename_languages argument with an object that maps old language names to their corresponding new names. This can be represented as either:
- A dictionary, where the keys are the old language names and the values are the corresponding new language names.
new_language_names = {
"Upper Andi": "Andi",
"Northern Akhvakh": "Akhvakh"}
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)",
hide_languages = ["Lower Andi", "Southern Akhvakh"],
rename_languages = new_language_names)
- A data frame with two columns: language (the old language names) and new_language_name (the corresponding new language names).
new_language_names = pd.DataFrame({
'language': ["Upper Andi", "Northern Akhvakh"],
'new_language_name': ["Andi", "Akhvakh"]})
ec_tile_map(ec_languages,
feature_column = "morning_greetings",
title = "Morning greetings (Naccarato, Verhees 2021)",
hide_languages = ["Lower Andi", "Southern Akhvakh"],
rename_languages = new_language_names)
As shown in the example above, we merged:
- Upper and Lower Andi into the joint “Andi” variable;
- Southern and Northern Akhvakh into the joint “Akhvakh” variable.
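The two representations carry the same information, so one can be derived from the other. The following sketch, which uses only pandas, converts the data frame from the example above into a dictionary and back; the variable names as_dict and as_frame are illustrative.

```python
import pandas as pd

new_language_names = pd.DataFrame({
    "language": ["Upper Andi", "Northern Akhvakh"],
    "new_language_name": ["Andi", "Akhvakh"],
})

# Data frame -> dictionary (old name -> new name).
as_dict = dict(zip(new_language_names["language"],
                   new_language_names["new_language_name"]))

# Dictionary -> data frame with the two column names described above.
as_frame = pd.DataFrame({
    "language": list(as_dict.keys()),
    "new_language_name": list(as_dict.values()),
})

print(as_dict)
```

Keeping the mapping in a CSV file and loading it as a data frame may be more convenient when many languages are renamed, while a dictionary is shorter for a handful of names.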