Feature datasets for the Atlas are based on descriptive literature.
To create a dataset you will need the following things:
Start by creating a folder for your feature in the archive, and upload your dataset there, even if it is still a work in progress: this makes it easier to discuss any problems or questions you might have.
You can see an example of a completed feature dataset here.
For a quick reference on how to collect data, see the Step-by-step instructions. The instructions below go into more detail.
Some of the columns are required (without them, there may be problems with rendering maps and chapters) and some are not.
Note: if you use an apostrophe (for example, in the name of a village such as Tad-Magitl', or in transcription), use U+0027 ('). If you are working in Excel, make sure that the apostrophes remain correct when you upload the table to the drive.
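A quick way to check the apostrophes before uploading is sketched below. This is only an illustration: the list of look-alike characters is an assumption about what spreadsheet software tends to substitute, and the function names are hypothetical, not part of any Atlas tooling.

```python
# Characters that are easily substituted for U+0027 (assumed list; extend as needed).
LOOKALIKES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02BC": "'",  # modifier letter apostrophe
}


def find_bad_apostrophes(text):
    """Return (position, character) pairs for every look-alike apostrophe."""
    return [(i, ch) for i, ch in enumerate(text) if ch in LOOKALIKES]


def normalize_apostrophes(text):
    """Replace every look-alike apostrophe with U+0027."""
    return text.translate(str.maketrans(LOOKALIKES))


# A village name where Excel has silently inserted U+2019.
cell = "Tad-Magitl\u2019"
print(find_bad_apostrophes(cell))
print(normalize_apostrophes(cell))
```

Running `find_bad_apostrophes` over every cell first lets you review the hits before replacing anything.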
lang - language name
type - specifies whether the idiom is a village variety, a dialect spoken in multiple villages, or a standard language; please use our standard names and type classifications
In our literature database you can find bibliographical information about the source, as well as information on which idioms are represented in the sources (on the sheet source type and idiom, columns idiom and type). Alternatively, you can search for the name and type of a certain idiom in the East Caucasian villages dataset.
One value = one row in the dataset. If you have multiple values for one idiom, create a separate row for each value, each with a unique id. The general rule of thumb for the genlang_point column is 1 “yes” per language (the Dargwa varieties listed in Language sample each count as a separate language); for the map column it is 1 “yes” per idiom.
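The rule of thumb can be checked mechanically before you upload. The sketch below is a minimal illustration, not an Atlas tool: it assumes the table has been read into a list of dictionaries (for example with `csv.DictReader`), that the idiom name is stored in a column called `idiom`, and that the flags are written as "yes" and "no". The rows themselves are made-up placeholders.

```python
from collections import Counter


def count_yes(rows, flag_column, group_column):
    """Count rows marked "yes" in flag_column for each value of group_column."""
    counts = Counter()
    for row in rows:
        is_yes = row.get(flag_column, "").strip().lower() == "yes"
        counts[row[group_column]] += int(is_yes)
    return counts


def groups_to_review(rows, flag_column, group_column):
    """Return the groups whose number of "yes" rows is not exactly 1."""
    counts = count_yes(rows, flag_column, group_column)
    return sorted(group for group, n in counts.items() if n != 1)


def duplicate_ids(rows):
    """Return every id that is used on more than one row."""
    ids = Counter(row["id"] for row in rows)
    return sorted(i for i, n in ids.items() if n > 1)


# Placeholder rows: VillageA1 has two values, so it takes two rows.
rows = [
    {"id": "1", "lang": "LanguageA", "idiom": "VillageA1", "value1": "x",
     "genlang_point": "yes", "map": "yes"},
    {"id": "2", "lang": "LanguageA", "idiom": "VillageA1", "value1": "y",
     "genlang_point": "no", "map": "no"},
    {"id": "3", "lang": "LanguageA", "idiom": "VillageA2", "value1": "x",
     "genlang_point": "no", "map": "yes"},
    {"id": "4", "lang": "LanguageB", "idiom": "VillageB1", "value1": "y",
     "genlang_point": "no", "map": "no"},
]

print(groups_to_review(rows, "genlang_point", "lang"))
print(groups_to_review(rows, "map", "idiom"))
print(duplicate_ids(rows))
```

Because this is a rule of thumb rather than a hard constraint, treat the reported groups as rows to look at again, not necessarily as errors.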
Maps are generated based on the value1 column. If you want to show the distribution of multiple parameters, please name further columns value2, value3, etc., and add the matching name columns value1_name, value2_name, etc.
contributor - your name in English, so we know how to properly credit your work
example_comment - any kind of comment you would like to add regarding the example
date - the date on which you submitted your table; edits of the table after its first publication on the website will receive a new date stamp accordingly
When you add a reference to your dataset, check if it is already listed in the literature database. If yes, copy the bibtexkey of the reference from the database to the source column in your table.
A bibtexkey is a unique identifier for a source which allows us to easily cite sources across the Atlas.
(You can find our library of descriptive sources here.)
In case you used multiple sources for one row / observation, separate the keys with a semicolon and space (; ), and do the same with the page numbers in the adjacent column.
If you refer to multiple page ranges from one source, separate them with a comma.
If the entire source was relevant, for example because it was a paper devoted to your topic, or because you read the whole grammar and the feature is not mentioned anywhere, indicate NA in the page column.
source | page |
---|---|
khalilova2009; khalilova2011 | 221, 234–239; NA |
In the example above, pages 221 and 234–239 from khalilova2009 have relevant information about the feature, while the paper khalilova2011 was consulted/is relevant in its entirety.
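To make the pairing of keys and pages explicit, here is a minimal sketch of how the two cells could be read back together. The function name is hypothetical and the code only illustrates the separators described above (semicolon between sources, comma between page ranges, NA for a whole source); it is not how the Atlas itself processes the tables.

```python
def pair_sources_with_pages(source, page):
    """Map each bibtexkey to its list of page ranges (None when the page is NA)."""
    keys = [key.strip() for key in source.split(";")]
    page_fields = [field.strip() for field in page.split(";")]
    if len(keys) != len(page_fields):
        raise ValueError(
            f"{len(keys)} source(s) but {len(page_fields)} page field(s)"
        )
    paired = {}
    for key, field in zip(keys, page_fields):
        if field == "NA":
            paired[key] = None  # the entire source is relevant
        else:
            paired[key] = [page_range.strip() for page_range in field.split(",")]
    return paired


print(pair_sources_with_pages("khalilova2009; khalilova2011", "221, 234–239; NA"))
```

The length check catches the most common slip, a row where a source was added without a matching entry in the page column.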
If the source of the information is a personal conversation, give the name of the person followed by the note p.c., for example: Santa Claus, p.c.
If you use a source which is not in our database yet, you will have to add it by submitting a form with the necessary information.
Russian resources are listed in Cyrillic script with a translation in English of the title and booktitle fields.
The bibtexkey for a source is constructed as follows: surname (Latin script, lower-case letters) + year, for example khalilova2009.
If there are two authors, use both surnames: Chumakina & Corbett (2008) becomes chumakinacorbett2008. For sources with more than two authors, use only the surname of the first author followed by etal, e.g. alekseevetal2008.
In case a different resource by the same author and from the same year is already present in the literature database, add a keyword following the year, e.g. bokarev1949 / bokarev1949avar.
For unpublished sources, use the surname followed by the word draft and the year in which the manuscript was produced or in which it was expected to be published, for example creisselsdraft2020.
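The rules for building a bibtexkey can be summarised in a short sketch. The function and its parameters are hypothetical, and it assumes the surnames are already transliterated into Latin script and contain no spaces; you still need to check the literature database by hand to see whether a keyword is required.

```python
def make_bibtexkey(surnames, year, keyword="", draft=False):
    """Build a bibtexkey from author surnames (Latin script) and a year."""
    if not surnames:
        raise ValueError("at least one surname is required")
    names = [surname.lower() for surname in surnames]
    if len(names) > 2:
        stem = names[0] + "etal"  # more than two authors: first surname + etal
    else:
        stem = "".join(names)     # one or two authors: all surnames
    if draft:
        stem += "draft"           # unpublished source
    return f"{stem}{year}{keyword}"


print(make_bibtexkey(["Khalilova"], 2009))
print(make_bibtexkey(["Chumakina", "Corbett"], 2008))
# "Author2" and "Author3" are placeholders for the remaining co-authors.
print(make_bibtexkey(["Alekseev", "Author2", "Author3"], 2008))
print(make_bibtexkey(["Bokarev"], 1949, keyword="avar"))
print(make_bibtexkey(["Creissels"], 2020, draft=True))
```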
The ‘author’ field should be filled in as follows: last name, first name, and (if present) the first letter of the patronymic or second name, for example: Абдуллаев, Сайгид Н. If the source has more than one author, separate the authors with and (in English): Абдуллаева, Айшат З. and Гаджиахмедов, Нурмагомед Э. and Кадыраджиев, Калсын С. If the source does not give the full names, try to find them on the Internet rather than using initials.
For city names in the ‘address’ field use English names, e.g. Moscow, not Москва.
Don’t forget to upload a copy of the source to the appropriate folder in the library using the bibtexkey as filename.
Some general principles for transcription:
For any general questions about data collection or the library, you can always contact Chiara.
If you have a more specific question about your feature, please also contact Chiara, and she will connect you with an expert consultant for your feature.