How can I use the database?
Our online database allows you to look at the phonotactic data in two ways. The first is to use the interactive map, which allows you to search for particular features or combinations of features, and view their distribution. The second way to examine the data is to look at the statistics display, which gives quantified information about the distribution of the features selected. To use either option, first launch the database from the menu on the left.
The database will appear as a world map with a tabs window overlaid. To begin, go to the Features tab in the tabs window (this is the default opening tab).
Click the green plus sign to add a feature. You can then select a feature from the drop-down menu.
Some features are binary while others are ordinal. For binary features, such as ‘Ø onsets allowed?’, you will not be asked for further specifications. For ordinal features, such as ‘Maximal onset’, a range bar will appear beneath the feature, showing the range of values from minimum to maximum.
If used as it is, this filter will list languages with a maximal onset of between 0 and 6 consonants, which is every language in the database. To refine the range of values displayed, click the green plus button beside the values already given; this adds another range row.
You can add more rows in the same way and customise the values in each row until the ranges match the ones you want.
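The effect of range rows can be sketched in a few lines of Python. This is only an illustration of the idea, not the database's own code: the language names and values are placeholders, the function name `group_by_ranges` is hypothetical, and we assume that each language is assigned to the first row whose range contains its value.

```python
# Placeholder records: language name -> maximal onset (number of consonants).
# These are NOT values taken from the database.
languages = {
    "Language A": 0,
    "Language B": 1,
    "Language C": 2,
    "Language D": 3,
    "Language E": 6,
}


def group_by_ranges(data, ranges):
    """Group language names by the first inclusive (low, high) row they match.

    Assumption: a language is placed in the first matching row only.
    """
    groups = {row: [] for row in ranges}
    for name, value in sorted(data.items()):
        for low, high in ranges:
            if low <= value <= high:
                groups[(low, high)].append(name)
                break  # stop at the first matching row
    return groups


# The default single row (0-6) matches every language.
default_view = group_by_ranges(languages, [(0, 6)])

# Two customised rows split the same languages into two groups.
refined_view = group_by_ranges(languages, [(0, 1), (2, 6)])
```

With the default row, all five placeholder languages fall into one group; with the two customised rows they are split into a 0 to 1 group and a 2 to 6 group, which is the kind of distinction the map then shows with different colours.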
Once you have selected the values you want, click the blue refresh button under the Features tab. The matching languages will then appear as numbers on the map behind the tabs window.
To minimise the tabs window for a better view of the map, double-click the bar at the top of the window. To maximise it again, simply do the same. Keep in mind that you can drag the window to a different area of the map at any time.
The coloured numbers represent the number of languages with the feature in that area. As you zoom in (by clicking on the map or using the scale on the left of the map), these numbers will disappear in favour of dots representing individual languages.
To turn off the number representation (so that the languages will be rendered as dots no matter how far out you zoom), go to the Options tab in the tab window. There you can also adjust the size of the dots.
The different values of a feature are represented by dots of different colours. The Legend tab shows which colour corresponds to which feature value.
There you can change the colours (by clicking on the dot), change the size of the dots for each feature, and reorder the features so that one appears ‘above’ the others on the map.
The Legend tab also gives the total number of languages with the feature and the percentage of the database they represent, as well as access to a full list of the languages in the database with that particular feature (click the blue arrow button). To apply any of the changes to the map, click the refresh button.
By hovering over one of the dots (for example, Wutung in New Guinea), you can see which language it represents and the value that language has for the feature you have selected.
If you click on the dot, a window will appear with every feature value in our database for that language.
The database allows you to view more than one feature at a time on the map. To do this, click the plus button under the features tab again. Remember to click the refresh button to update the map display.
After rendering the features, you can then select which of the values you want to display by selecting or deselecting the value combinations in the legend tab. Not all logical combinations of the features are automatically displayed. We have hidden certain types of sets by default because we think they are likely to be of less interest to users, and logically possible sets that have zero members are not displayed at all. To see all sets that could be displayed, select the show hidden features checkbox.
To remove features entirely, click the red X button next to each one. Again, keep in mind that the changes will not be applied to the map until you have clicked the refresh button.
So far we have only looked at how the data can be displayed on the map. You can also view the data through the Stats tab in the tabs window. If the feature you are viewing has ordinal values, the Stats tab will show both the number and the percentage of languages for each value, as well as a histogram of those values.
If the feature you are viewing is binary, there will be no histogram. When viewing more than one feature, the Stats tab gives the statistical values for each combination of values, as in the Legend tab. It is then simple to compare the attested values with the values that might be expected. For instance, given the combination ‘CVC language?’ and ‘Codas require onsets?’, we would predict 5.5% of languages to have positive values for both features, but in fact 7.1% meet this criterion, indicating some skewing. In this case, at least part of the explanation is not hard to find: languages without codas cannot satisfy the feature ‘Codas require onsets?’.
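One common way to arrive at an expected value of this kind is to assume that the two features are independent, so that the expected share of languages with both is the product of the two individual shares. The sketch below illustrates that calculation. The counts are invented for the example and are not taken from the database, the function names are hypothetical, and the independence assumption is ours; the database may compute its expectations differently.

```python
def expected_joint_percent(count_a, count_b, total):
    """Expected % of languages with both features, assuming independence."""
    return 100.0 * (count_a / total) * (count_b / total)


def observed_percent(count_both, total):
    """Attested % of languages that have both features."""
    return 100.0 * count_both / total


# Invented counts, for illustration only.
total_languages = 200   # languages in the sample
with_feature_a = 50     # positive for the first feature
with_feature_b = 40     # positive for the second feature
with_both = 16          # positive for both features

expected = expected_joint_percent(with_feature_a, with_feature_b, total_languages)
observed = observed_percent(with_both, total_languages)

# A ratio above 1 means the combination is more common than independence predicts.
skew_ratio = observed / expected

print(f"expected {expected:.1f}%, observed {observed:.1f}%, ratio {skew_ratio:.2f}")
```

A ratio well above or below 1 suggests that the two features are not independent, which is the kind of skewing described above.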
There are four main kinds of filter, which differ in the type of data they contain:
- Phonotactics filters indicate information about the structure of the syllable in a language. The values for these filters can be binary (such as 'Coda = nasal'), or refer to ranges of data (such as 'Maximal coda'). The minimum and maximum values for ranged data filters can be set using the drop-down menus that appear. Sub-ranges can be added by clicking the > button. They can be removed again by clicking their x buttons.
- Segment filters allow you to select information about a language's segment inventory, such as the number of plosives used, or whether velar nasals are present.
- Prosody filters allow you to select suprasegmental information such as how many tonal contrasts are present phonologically.
- Non-structure filters allow you to add restrictions that are typically neither binary nor ranged data, but rather select from a set such as 'Language family', 'Country', or 'Macro-area'. The value of these features can be selected from a drop-down menu that appears when a filter of this type is selected.
When the filters are set, click the circular arrow button to render the data. The Stats tab will then be displayed, showing the sets formed by the intersections of the filters. The No column shows the number of languages in each set, and the [%] column shows the percentage of the displayed languages that each set makes up.
Not all logically possible intersections are shown. We have hidden certain types of sets by default because we think they are likely to be of less interest to users, and logically possible sets that have zero members are not displayed at all. To see all sets that could be displayed, select the Show all checkbox. This will also reveal the % column, which shows percentages calculated over all the sets that could be displayed. To toggle the inclusion of a set, select or deselect the checkbox for the set in the Colour column.
The colours of the intersection sets are automatically generated from the colours assigned to the filters they contain. The automatically assigned colours can be changed by clicking on the colour legend for each set and selecting a new colour from the palette.
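The idea of intersection sets can be sketched in Python as follows. This is an illustration under stated assumptions rather than the database's implementation: the records are placeholders, the function name `intersection_sets` is hypothetical, and percentages are computed over all languages in the sample. The sketch only drops empty sets; it does not model which non-empty sets are hidden by default, since that rule is not described here.

```python
from collections import Counter
from itertools import product

# Placeholder records: language name -> values for two binary filters.
# These are NOT values taken from the database.
records = {
    "Language A": {"CVC language?": True, "Codas require onsets?": True},
    "Language B": {"CVC language?": True, "Codas require onsets?": False},
    "Language C": {"CVC language?": False, "Codas require onsets?": False},
    "Language D": {"CVC language?": False, "Codas require onsets?": False},
}


def intersection_sets(data, filters):
    """Return (combination, count, percent) for every non-empty intersection."""
    counts = Counter(tuple(values[f] for f in filters) for values in data.values())
    total = len(data)
    rows = []
    # Walk through every logically possible combination of filter values.
    for combo in product([True, False], repeat=len(filters)):
        n = counts.get(combo, 0)
        if n == 0:
            continue  # sets with zero members are not displayed
        rows.append((combo, n, 100.0 * n / total))
    return rows


rows = intersection_sets(records, ["CVC language?", "Codas require onsets?"])
for combo, n, percent in rows:
    print(combo, n, f"{percent:.1f}%")
```

In this placeholder sample, the combination of a negative value for the first filter and a positive value for the second has no members, so it does not appear among the returned rows.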
Click View to see a table of the languages in each set. To sort the rows in the table by the values in any column, click the title of the respective column. To remove a column from view, click the reduce button (-).
The filters and options in the Stats tab can be changed at any time and rendered by clicking the circular arrow button in either the Filters or the Stats tab.
To render any changes made in the Stats tab on the map, click the circular arrow button.
To save your settings or to share them with someone else, click on the Get URL button in the Stats tab. Paste this URL in the address bar of your browser to load the phonotactics database with your custom settings.
Some examples of interesting features that we've found include:
- Turkish shows consistent word-initial epenthesis (/kral/ surfaces as [kəral]). Other sequences of consonants, in different syllables, do not require epenthesis. Turkish also has contrastive long vowels and diphthongs that pattern the same as monophthongs for phonotactic purposes.
- Anejom and Jebero both have glottal stops, but only allow them syllable-finally.
- Chiricahua privileges a glottal stop as the first consonant in a CC coda.
- Although languages with syllabic consonants tend to use syllabic nasals, Washo and Shilluk employ both nasals and liquids as syllabic consonants.