Rapid Word Collection

Ron Moe, a colleague of mine at SIL International, has developed a set of semantic domains and accompanying questions that enable the rapid elicitation of words. He has promoted it for years as the Dictionary Development Process (DDP). Although teams that have tried it have often struggled to manage the resulting mass of data, it is unarguably the best and most efficient method of collecting words in minority languages that have no history of literature.

The semantic domain list is a bit like the Dewey decimal system in that it has a wide range of categories to represent all of the topics that might be expressed. While Dewey focuses on literature, the semantic domain list covers all the topics and categories of words one might express in a language or culture. The list of over 1,800 domains contains subjects such as food preparation, house construction, beauty, and weather. Each subject has accompanying questions used to collect words, such as "What words do you use to describe rainy or stormy weather?" Because the questions are not written in the language being collected, someone translates each question aloud into that language and then writes down the words as people think of them. Since our brains store similar concepts together, it is far more efficient to collect words by semantic domain than by trying to think of all the words in a language that start with the letter A.

Today, despite its potential, the rapid word collection method is woefully underused in minority language development efforts. Language development projects typically build a lexical database of only a few thousand words, even after many years of engagement. By contrast, experience with the rapid word collection method has shown that it is possible to collect over 15,000 words in just two weeks at an early point in a project, providing a rich lexical resource for both language development and Bible translation. I formed the Rapid Word Collection Research Group to find out why the method has not been adopted wholesale, to address any flaws in it, and to repackage and rebrand it in a way that is designed to succeed and to attract outside funders. Think of it as an Extreme Makeover for DDP.

For a full week, our research group worked together to study reports from past word collection workshops, noting what worked and what didn't. We went over the existing DDP documentation that explains how to run the workshop, and we distilled its message into a clearly defined set of instructions based on what has proven to be best practice. We even spent ten minutes arguing over whether or not the workshop leader should staple certain word collection pages together! Our goal was to ensure not only that the words would be collected as efficiently as possible, but also that they would be entered into the computer as they were collected. Far too many workshops have ended with reams of paper holding thousands of collected words that are now gathering dust in some corner. The revised plan specifies a total of 30 local participants, including two people dedicated to glossing the words and two others dedicated to data entry. We believe that following the new parameters closely will yield a minimum of 15,000 words: glossed, semantically tagged, and accessible on the Internet for further research. Where these requirements called for extra time or resources, we included them in the project plan and budget.

It soon became evident that we needed to extend the two weeks to a month by adding a week of advance preparation beforehand and a week afterward to clean up the data. We also identified specific areas where we can further test and confirm this best-practice method while documenting it on video. Funding is already available to run the first couple of workshops, and we will actively pursue major funding from outside agencies interested in vernacular education, language documentation, or language development.

All the revised and expanded materials for Rapid Word Collection will soon be available on a multilingual website, including a downloadable form that will help any language project plan and budget for hosting such a workshop in its location. As funding becomes available, we hope to provide the means for word collection to any language group that would benefit from it. Stay tuned for the website announcement, which we expect sometime in the last quarter of 2011!