The problem of documentation is to find strong evidence for the origin of a word. Our major source for such documentation is the OED. However, the evidence of the OED has to be used cautiously because we know that its earliest date of attestation is frequently not the earliest documentable use of a word. The sources drawn upon by the OED are not evenly distributed across the centuries. The OED is biased in favour of literature and particularly of canonically enshrined authors. Moreover, inescapably the OED's readers were inconsistent in the thoroughness with which they gathered citations.

The improved availability of scholarly sources (editions, bibliographies, indexes, concordances, and the like) since the work on the OED was done enables us to see how much was missed by the compilers of that great dictionary and how cautious we must be in drawing conclusions from it (Schäfer 1980). We are now aware that the OED's datings are often too late by several decades or even by more than a century. Thus, the adjectival abominate is first documented in the OED from 1850; but it was used at least as early as 1594 (Bailey 1978: 1). As electronic texts become more available, it will be feasible to estimate more accurately how cautious we need to be in using the OED's evidence, and it will become easier to correct that evidence.

Several estimates of the rate of growth of the English vocabulary have been based on The Shorter Oxford English Dictionary, 1968 edition. There are, however, two problems with using that work as a basis for study. First, the principles on which it was abridged from the OED parent work are not clear; and second, the text of the parent work itself is seriously flawed, in the ways suggested above.

In particular, excerpting of eighteenth-century books for the OED was to have been done in America, but citation slips for that century did not reach Murray, and so, despite efforts to cover the period, it is seriously under-represented in the OED. Comments upon the growth of the English vocabulary based (as they generally are) on OED evidence, often through the medium of the Shorter OED, show a significant decline in the production of new words in the eighteenth century (Finkenstaedt & Wolff 1973: 29; Neuhaus 1971: 31). The temptation is to explain that decline as a consequence of the conservative temperament of the Age of Reason, a neat instance of the effect of world view on language. In fact, what the 'decline' almost certainly shows is lack of evidence due to uneven gatherings of citations. It is a fact, not about the language of the mid-eighteenth century, but about the vicissitudes of lexicography in the late nineteenth.

The neat and impressive-looking line graphs that have been drawn to show the peaking of word-making in the vigorous, language-intoxicated high Renaissance, its deep valley of decline in the eighteenth century, and its subsequent rise to a new, if lesser, high in the mid-nineteenth century show nothing about the language. What they show is the extent and assiduousness with which the OED volunteers read and excerpted books. Shakespeare was over-read; the eighteenth century under-read: that is what the graphs show. We have no reliable data on which to base generalisations about the growth of the English vocabulary. To get such data we need, not a computerisation of the faulty OED sampling, but a wholly new approach.

The problem of continuity is a more difficult and generally an unsolvable one. After a word is coined in English, we usually assume that all later instances of the word derive from the initial coinage. But clearly there is no reason why that should be the case for many words. A word may be independently reborrowed or reformed many times.

For example, cosmos 'the world' was used by Orm in the spelling cosmos about 1200 and identified as of Greek origin in the Middle English Dictionary (Kurath & Kuhn 1954- ). The first citation of the word in the OED is from 1650: As the greater World is called Cosmus from the beauty thereof, with the reference to 'beauty' echoing the Greek sense 'world, order, beauty' despite the Latinate form of the ending. The next citation is from an 1848 translation from German of Humboldt's Cosmos. Thereafter, the OED has citations illustrating several closely related senses from 1858, 1865, 1869, 1872, 1874, 1882, and 1885. This evidence suggests that cosmos has been borrowed into English at least three times, twice (1200 and 1650) from Greek or Latin, and once (1848) from German.

The lack of evidence for continued use of cosmos between 1200 and 1650 and between 1650 and 1848 suggests that the two earlier borrowings were abortive; present-day use of cosmos begins with its 1848 borrowing from German. The OED's 1865 citation, however, has the spelling Kosmos and refers to the Pythagorean concept of numerical order; it is at least influenced by Greek directly and may be another independent borrowing. It appears that the word in contemporary use is not descended from an early Middle English borrowing from Greek, but from a late Modern borrowing from German reinforced by Greek.
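
The reasoning from gaps in attestation can be made explicit in a small sketch. The citation dates are those given above for cosmos; the function name `split_by_gaps` and the threshold of 100 years are assumptions introduced purely for illustration, not an established criterion. A long silence in the record is only suggestive, since (as argued above) gaps may reflect uneven reading rather than disuse.

```python
def split_by_gaps(dates, max_gap):
    """Group attestation years into runs of plausibly continuous use.

    A new run starts whenever the interval since the previous
    attestation exceeds max_gap years (a hypothetical threshold).
    """
    runs = []
    for year in sorted(dates):
        if runs and year - runs[-1][-1] <= max_gap:
            runs[-1].append(year)   # close enough: same run
        else:
            runs.append([year])     # long silence: possible new borrowing
    return runs


# Citation dates for cosmos as reported in the text (Orm c. 1200; OED 1650 onward).
COSMOS_DATES = [1200, 1650, 1848, 1858, 1865, 1869, 1872, 1874, 1882, 1885]

# Assumed threshold: a gap of more than a century counts as a break.
MAX_GAP = 100

runs = split_by_gaps(COSMOS_DATES, MAX_GAP)
for run in runs:
    print(run[0], "to", run[-1], "with", len(run), "citation(s)")
```

With these assumptions the dates fall into three runs, beginning in 1200, 1650, and 1848, which matches the three borrowings proposed above. The sketch cannot detect a reborrowing that falls inside a run, such as the possible independent Greek borrowing of 1865; that judgement depends on spelling and sense, not on dates alone.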

