A paper written for and read by me at the conference called Language Study and Thesaurus in the World organized by Japan's National Language Institute, Tokyo, August 1997. I had the honour of chairing its Plenary Session. The paper was later on reproduced in a condensed form in Lexicography in Asia edited by Mr Tom MacArthur, author of Lexicon of Contemporary English (Longman 1981). It gives an idea of the thought gone into the making of Samantar Kosh and presently the bilingual ones. And future…—Arvind Kumar







By Arvind Kumar


The Rich Sanskrit Tradition

The tradition of glossaries, thesauruses and dictionaries in India goes back to the Vedic age, estimated to date from 3000 BC or earlier. The distinction of being the world's first known thesaurus may go to Nighantu. It was compiled by the sage Kashyap, and was a glossary of Vedic words, arranged subject-wise. The sage Yask wrote a treatise on it, called Nirukt. Every Vedic scholar was made to memorise and master Nirukt, because a proper and precise understanding of words and their context was considered of the utmost importance in carrying the Vedas, literally knowledge, from person to person and generation to generation.


Over the centuries, the tradition led to the compilation of many famous Sanskrit dictionaries. The Shabdakalpadrum[i], a Sanskrit dictionary, lists 29 such earlier works.[ii] Most of these too were arranged subject-wise and were thesauruses in a very broad sense.

Amar Kosh is at the apex of all the Sanskrit thesauruses. Its author Amar Singh gave his work the title of Namalinganushashan, i.e., the Discipline of Names and Genders. It was also called Trikand--after its three cantos. However, popularly it is known only as Amar Kosh,[iii] to commemorate the great achievement of its author, just as the English thesaurus is better known as Roget's Thesaurus, in all its editions and variations.

The exact time of the appearance of Amar Kosh is not known. It may have been written any time between the sixth and the tenth century AD. This vagueness is because the ancient Indians never cared to keep an exact record of dates. Like Roget's Thesaurus, Amar Kosh was an instant hit. Ever since, it has been the subject of many treatises. Its Hindi commentator Pandit Haragovinda Sastri lists 41 of them.[iv] Its fame crossed the trans-Himalayan borders of India and spread far and wide. It is said that one Pandit Gunaraj translated it into the Chinese language some time in the 6th century.[v] Amir Khusro’s bilingual Hindi-Urdu Khalikbari was directly inspired by it.

It has also been translated into many European languages. Through one such translation Peter Mark Roget became acquainted with Amar Kosh when his own book was in press. Till then he was under the impression that his was the first work of its kind in the world. He gave it a cursory glance, and hastened to acknowledge its existence. In a footnote to the Introduction of his first edition, he writes:

'The following are the only publications that have come to my knowledge in which any attempt has been made to construct a systematic arrangement of ideas with a view to their expression. The earliest of these, supposed to be at least nine hundred years old, is Amera Cosha, or Vocabulary of the Sanskrit Language, by Amera Sinha, of which an English translation, by late Henry T. Colebrooke, was printed at Serampore, in the year 1808.'[vi]


Roget goes on to comment:

'The classification of words is there, as might be expected, exceedingly imperfect and confused, especially in all that relates to abstract ideas or mental operations. This will be apparent from the very title of the first section, which comprehends Heaven, Gods, Demons, Fire, Air, Velocity, Eternity, Much; while Sin, Virtue, Happiness, Destiny, Cause, Nature, Intellect, Reasoning, Knowledge, Senses, Tastes, Odours, Colours, are all included and jumbled together in the fourth section (of the first canto).'


Roget also expresses some satisfaction with Amar Kosh. He goes on to say: 'A more logical order, however, pervades the sections (in the second canto) relating to natural objects, such as Seas, Earth, Towns, Plants, and Animals, which form separate classes; exhibiting a remarkable effort at so remote a period of Indian literature.'[vii]

It is a remark coming from a dispassionate scientist-philosopher who liked everything to be properly categorised and ordered. However, the point Roget missed was that if a thesaurus is to lead its users from one context to another by association of ideas, then its categorisation, being the work of a particular society, may differ from society to society and from time to time.

Thesaurus: a Mirror of Society

It is a cliché, but bears repetition. Like any piece of art, a lexicographic work, too, be it a dictionary, glossary, vocabulary or a true thesaurus in the modern sense, has to be addressed to its own society at any given point of time. It has to take care of the understanding levels and mental perceptions of its target audience.

The very scope and design of a lexicographic work, and its success, depend on a clear understanding of the target audience on the part of its makers. On this depend the criteria for the basic format, framework or structure of a work. For example, a dictionary made for poets has to be arranged by the last letters of words, to be of help in rhyming. Thus, a large number of earlier dictionaries in many oriental languages follow this pattern. On the other hand, dictionaries in the modern age of the printed book are made for writers of prose. We find them following the alphabetical order so familiar to all of us.
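The difference between the two arrangements can be sketched in a few lines of code. This is only an illustration, assuming English words and letter-by-letter comparison; the function name rhyme_order is hypothetical, and the older oriental dictionaries would of course have ordered words by the conventions of their own scripts.

```python
def rhyme_order(words):
    """Sort words by their endings: compare the last letter first,
    then the one before it, and so on (i.e. sort on the reversed word)."""
    return sorted(words, key=lambda w: w[::-1])


# Illustrative English words, not taken from any actual rhyming dictionary.
words = ["night", "stone", "light", "bone", "sight", "tone"]

print(sorted(words))       # the familiar alphabetical order, for writers of prose
print(rhyme_order(words))  # words that rhyme now sit next to each other
```

In the second arrangement bone, tone and stone fall together, as do light, night and sight, which is what a versifier looking for a rhyme needs.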

Similarly, it is the users who dictate the pattern in which various word-groups in a thesaurus are to be placed in proximity to others. If a thesaurus has to lead its users from one word-group to another, it will have to follow the cultural and mental perceptions of its society to be of any practical use to them.

Amar Singh lived in an ancient oriental society. This society had a social pattern which was unique to India. Outsiders know a lot about it but find it difficult to follow its 'mental operations'. This society was compartmentalised in a rigid system of four well-defined Varnas or castes or classes of people, namely, the Brahmins, the Kshatriyas, the Vaishyas and the Shudras. In this society, everybody had a given vocation and social status. These were decided by the Varna in which a person was born and bred.

Above all, in this society, everybody's desires and motivations, thoughts and actions, were guided by a plethora of religions most of which taught that attainment of moksha or nirvana was the only goal worthy of a human being. All human activity, including pursuit of wealth, was a means to that end. Human destiny was guided by the supreme beings who inhabited the heavens.

This society guided not only Amar Singh's personal world-view but that of his audience too.

In the light of the above, let us look into Amar Kosh and its structure and see how it was very much relevant to its contemporary audience, though Roget, belonging to a different society and period, found it rather inadequate. This will also help us in getting a better idea of the problems of a thesaurus's structure in general.

The Amar Kosh Structure

The Sanskrit language has a vast repertory of words, but Amar Kosh lists only about 8,000, most of them nouns, or names, as many Sanskrit grammarians referred to them. These are versified in 1,502 shlokas, organised in three cantos and divided into 25 headings. One subject leads to the next associated with it or to its opposite. The poetic form at times forces the author of Amar Kosh to deviate from the strict path of orderliness. At times, we find him taking short detours from the main line. The following table will give you a broad idea of how Amar Kosh is organised.

Table 1. Cantos and headings of Amar Kosh

Canto 1. Heavens          Canto 2. Earth                        Canto 3. General Words
1. Heaven                 1. Earth                              1. Adjectives
2. Sky                    2. Towns                              2. Words with narrow meanings
3. Directions             3. Mountains                          3. Words with many meanings
4. Time                   4. Plants                             4. Non-changing words -- avyayas
5. Intellect              5. Animals                            5. Words as per gender
6. Words                  6. Man
7. Dramatic Arts          7. Brahmins or priests
8. The Nether World       8. Kshatriyas or warriors
9. Hell                   9. Vaishyas or traders and farmers
10. Water                 10. Shudras or menials

It is obvious why Amar Singh starts his work with the canto called The Heavens Group, and why the first heading in this canto too is Heaven. Heaven and gods occupied the top place in that society and guided not only human destiny but also all its activity.

Let us have a closer look at the contents of some of the word-groups under the first heading Heaven.

1. Heaven            

Gods as such       

Adityas--the main gods, sons of Aditi

Types of gods-- enumerates 11 types of gods

Asuras or demons-- they too were considered gods in the beginning, but like Satan fell from their position later

Buddha-- because he places Buddha before Brahma and others, Amar Singh is thought to have been a Buddhist by religion



Kamadeva or cupid                                              

Lakshmi--the consort of Vishnu                        

Belongings of Vishnu--like his conch, chakra, gada, sword, jewel, the sign on Vishnu's body, his horses, his charioteer, his minister, and finally the eagle whom he rides in the skies


In a similar manner, Amar Singh goes on to give names and synonyms of various gods and the things associated with each of them.

Sun, Fire and Air too are gods in Indian mythology. It was appropriate for these to get their place here. Each one is followed by various aspects of it. For example, Fire is followed by flame, spark, heat, ash... and Air by various types of storms, various types of air that reside in the body, and Pran the breath. Since in the Indian mind air was associated with speed, speed finds its place here, and so do continuity and quickness.

The fact remains that many of the Hindu gods represent human sensibilities and various aspects of nature on which the humans subsisted. Intellect was considered a heavenly attribute, and mind and the senses were supposed to be its aspects. Word or language, dramatics, etc., again relate to intellectual activity. Dramatics (including performing arts) especially was part of religious activity and a gift of the gods.

Roget found the placement of these under the heading Heaven 'jumbled together'. However, it was quite logical to Amar Singh's time and society.

We saw earlier that the first five headings of the Second Canto generally known as The Earth Group got some approval from Roget. These are the ones that deal with 'natural objects', namely Earth, Towns, Mountains, Plants, Animals. Later on, in our discussion, we will see that even these groups have not been treated by the thesaurus makers of old or present in a very scientific or methodical manner.

However, the last five headings in the Second Canto are of greater interest to us. It is in them that we discover how society dictates the classification and placement of word-groups in a thesaurus. Let us take a look:

The first one of these, i.e., No. 6 is devoted to Human Beings. In this Amar Singh tells us about things he thought pertained to humanity as such, e.g., man, male, female, types of women... young girl... young woman... relatives like son, brother, sister... weakling, strongman, fat man, disease...  I will not go into more detail of these and the other word-groups included in this heading because of the shortage of space and time.

From the heading 7 Brahmins onwards, Amar Singh enters the realm of social organisation. As pointed out earlier, Indian society in the Brahminical ages was organised on the basis of four well-known Varnas. One has to keep in mind that, to an Indian living in that age, this system was the only valid point of reference for any social activity. All artefacts and products were in one way or another identified with these Varnas. It may be beneficial if we go into some details of any one of these. Well..., let us go to the Kshatriyas then. The following table represents only a few of the earlier word-groups under this heading.


8. Kshatriyas. They were the rulers and warriors. Naturally, this heading contains words pertaining to them, their activities like ruling and war, mental attributes like bravery and cowardice, and objects and things like horses and arms associated with them


Types of kings


Purohit or the king's priest

Judges and justice



Headmen-- like the village headman, the head of the mint, the keeper of the harem, the eunuch

Subordinates, servants

Other kings-- Enemy king, friend king, non-aligned king, king who defends and takes care of the fort when a king is out on a conquest


Request-- requests can be made only to a friend...




It may be interesting to note that death is included under this heading. Obviously so. Kshatriyas fought wars, war meant fight, fight meant death. War also meant prisoners of war, and the heading concludes with them.

I will now give just a very fleeting glimpse of the Third Canto. It is titled Words in General. Here the subject matter and the author's approach are very different from what we have seen so far. It is divided into five headings. (See Table 1.)

The first of these contains adjectives. The word-groups are put together either by association or juxtaposition. To give just a few examples: tolerant, angry, very angry, awake, swaying with sleep, one who sleeps, asleep, facing the other way, facing down, one who worships gods, one who worships everything... We see the same approach through to the fourth heading.

The fifth and the last heading in this canto is called Words as per genders. In this the word-groups are organised by the last letters, much like any other dictionary.

To summarise: the structure of Amar Kosh is partly based on classification and partly on social or linguistic associations. The social and linguistic association is mostly reflected in word-groups connected with human activity. Its third canto is partly glossary and partly thesaurus.

The Structure of Thesauruses in General

Various terms like hierarchical, classificational, domain-specific and associative have been used to describe the manner of collection of words in groups according to their subject or topic and the placement of these groups in the context of other groups. One may call this the structure, framework, organisation or arrangement of a thesaurus.

Talking of Roget's arrangement of word-groups, Ms. Susan M. Lloyd, who prepared Longman's 1982 edition, tells us in her Preface to that edition:

'Roget arranged his... material into a comprehensive framework with a clearly visible structure, in which each topic, or concept, had its own logical place. In this, he was following in the steps of the seventeenth century philosophers such as Leibnitz, who had attempted the classification of concepts as a preliminary to inventing a Universal Language...  '[viii]

However, this approach does not take one very far in a thesaurus. Roget had to deviate from it more often than not. Under the subheading 'The originality of the Thesaurus', of the above Preface, Susan M. Lloyd has this to say, in Roget's defence:

'While Roget approved of Wilkins's aims, and expressed his wish that his own classification might be instrumental in preparing the way for further investigations into a Universal Language, his primary intention in compiling the Thesaurus was more practical: to offer the reader a choice of expressions from which he or she could choose the most suitable or the most effective in a given context. His task, then, was twofold. First, like the philosophers, he had to create a hierarchy of concepts which would provide the framework for his book; then he had to discover and classify the language which could express these concepts. While the philosophers sought to simplify, in order to discover what they hoped were the limited number of concepts basic to any language, Roget had to recognise and come to grips with the protean ambiguity of the language itself, with all its interrelationships and its infinite capacity for expressing shades of meaning.'[ix]

One need not elaborate further, since Ms. Lloyd has very succinctly put her finger on the crux of the problem of creating and following a structure in a Thesaurus.

Categorisation of 'natural objects' like species and placing them in a scientific manner might seem a very easy task. It does not turn out to be so in a thesaurus. Let us take a look.

How do we place various animals and living beings with reference to others? One may choose to place lion either alphabetically within a broad group called animals or under a true hierarchical and zoological categorisation. Nobody has done this.

Susan Lloyd's edition lists lion under heading 365 Animals/cats. Roget's International Thesaurus (Fourth Edition) (revised by Robert L. Chapman), Harper & Row, 1977, puts lion under heading 414 Animal/27 (wild cats) and lists it also under subheading 58 Mammals. Another edition, Roget's International Thesaurus, Third Edition, Collins, London and Glasgow, reprinted 1974, lists lion under heading 413 Animals/4 (wild animals). However, and quite mysteriously, the same edition puts lioness much later under heading 420 Femininity/10 (female animals).

All these editions are published under the selling brand name of Roget, and broadly follow the original structure devised by Roget. None of these placements is very scientific or methodical, if one may say so. All of them derive inspiration from a societal context.

Mr. Tom MacArthur breaks new ground in his Lexicon of Contemporary English (Longman 1981) by devising a totally different structure. In this work, A50 Animals/Mammals is an independent heading having various subheadings. Lion is included (with illustration) under subhead A53: the cat and similar animals. He gives words for a female and a young lion at the same place. However, this work is not a thesaurus in the sense that it does not offer synonyms. Its purpose is to place words of a similar nature for a user to understand and appreciate subtle distinctions.

A lion may remind a poet of the deer which the lion hunts, as it certainly would a Sanskrit poet, in whose imagery the two are linked through the sport of hunting. Thus, in Amar Kosh, we find lion as the first subcategory in the Second Canto's heading 5 Lions etc. Amar Singh starts with the lion, considered to be the king of animals, lists many other wild animals just after it, and concludes the subsection with cat, iguana and porcupine. The wild animals lead him to the deer which the lion hunts. From the deer, he goes on to enumerate legendary beings who in his times were counted among animals.

What is remarkable is that Amar Singh does not list all the animals here. One finds cattle like cow and sheep under the Vaishyas, because animal husbandry was their profession. Elephants, horses, etc. are under the Kshatriyas. It was they who used them in warfare. Similarly, in Susan Lloyd's Longman edition the main entry for horses is under heading 273 Carrier.

This brings me to another and perhaps more important aspect of the subject: how to arrange, in a thesaurus, various headings with reference to others. Let us for the time being remain with the lion and the deer and consider where the word-groups connected with hunting may be placed. Under sport? To the kings in historical times, it was a sport. To many it is a sport even now. Under jungle? Why not? Jungle may remind one of hunting. Under bravery? Under adventure? Under professions? To a hunter who lives on his earnings from it, it is a profession. In such a scheme, it might be placed near butcher. Under violence? To a Vaishnavite Hindu or a Jain, who believes in non-violence, it represents nothing but abhorrent violence.

Susan M. Lloyd's edition puts hunting under heading 619 Pursuit. In Roget's International Thesaurus (Third Edition) Collins 1963, the headword is Pursuit, but the heading number is 653. In Roget's International Thesaurus (Fourth Edition) revised by Robert L. Chapman, Harper and Row, 1977 the heading number is 655.

The concept Pursuit forms a minor part of this heading. It gets two paragraphs, pursuit and pursuer, as nouns, later followed by one paragraph as verbs, one as adjectives and another as adverbs. The rest of the word-groups under Pursuit are: hunting, fishing, hunter, fisher, quarry, (to) hunt, fish, and the interjections hunting cries.

At times one wonders why the topic of pursuit did not remind Roget of a policeman in pursuit of a thief. Or why did he not create an independent heading hunting, which could have found its due place elsewhere? The idea of pursuit could remain where it is, with more word-groups like following, coming after. But following reminded Roget of followers and courtiers, concepts which could be placed elsewhere.

As far as the placement of the heading Pursuit is concerned, in all these Roget editions it comes under the general category Volition. Susan Lloyd's edition puts it under 2 Prospective volition/conceptional with the following plan:

617 Intention                           618 Nondesign

619 Pursuit                              620 Avoidance

                                                621 Relinquishment

In Roget's International Thesaurus (Fourth Edition) by Chapman (Harper & Row) it is under Volition/Purpose with this plan:

653 Intention

654 Plan

655 Pursuit

656 Business, Occupation

Roget's International Thesaurus (Third Edition) (Collins) follows the above plan with no divergence.


Tom MacArthur, in his Lexicon, again breaks new ground. He puts hunting in the broad category M Movement, Location, Travel, and Transport/Moving, Coming, and Going, where under M34 verbs and nouns are found following, chasing, hunting. This is preceded by M33 hurrying and rushing, and followed by M35 escaping, etc.

In the same work hunting also features in the broad category N General and Abstract Terms/Showing, Hiding, Finding, Saving, and similar words where under N359 verbs: seeking and searching, one finds hunt as one of the keywords. In the same subcategory hunt features as a keyword in N361 finding, discovering, etc. However, fishing has been delinked from hunting and features under K Entertainment, Sports, and Games/K 190 Outdoor Games.

One can understand this rather easily. Hunting is no longer a sport in western society, while fishing continues to be. Still one feels some sort of a link could be provided between hunting and fishing.

All this goes to show that any placement of words or word-groups in a thesaurus is at best arbitrary. Everyone tries to be as logical as possible in a field which can only be partly logical.

The Samantar Kosh

When we started work on our Samantar Kosh, we thought our job would be rather easy. Did we not have the excellent model of Roget to follow? As a first step, we assigned numbers to all the concepts as per our model. We thought all we had to do was to add Hindi words to them. Very soon we discovered that it would not work. Indian sensibility did not lead the reader as per Roget.

To give an example, Roget saw hereafter and doomsday in the context of Future. An Indian would be more comfortable if hereafter led him to life after death, rebirth, incarnation, this incarnation, past incarnation, moksha. To his way of looking at things, Doomsday has more to do with Pralaya or the end of the world juxtaposed to creation. A God-fearing Muslim or Christian would think of doomsday linked to heavenly justice and retribution.

When Roget failed us, we thought of pursuing Amar Kosh. However, we found that Amar Singh was too much out of tune with the expansion of knowledge and language. Also, Indian society has changed radically since Amar Singh's day. No longer is an Indian reminded of war or arms in the context of a Kshatriya. Nor would one think of lion in the context of a Kshatriya or of cow in that of a Vaishya. The Shudras are no longer menials or servants.

The sombre realisation was that we had no model to follow. We decided to develop our own system as we progressed with the work. The most important question was: What order, sequence, pattern to give to our word-groups so that a reader could make the best use of it? Do we divide our headings in broad classes as Amar Singh and Roget had done?

We know Amar Singh divided the whole language into three broad Cantos: 1. Heaven. 2. Earth. 3. Words in General. Roget compartmentalised the whole language into eight classes: 1. Abstract Relations. 2. Space. 3. Physics. 4. Matter. 5. Sensation. 6. Intellect. 7. Volition. 8. Affections.

While organising our data of some 1,60,850 expressions arranged in 1,100 headings and 23,759 subheadings, we forgot all about Amar Singh and also decided to do away with the Rogetian classification. We kept ourselves to the basic line that our word-groups should be collected under specific headings. Some of these headings would be clubbed together because of similarity or dissimilarity, but it would not be necessary for our headings to follow any strict hierarchical order. The only guiding principle in their mutual placement should be that one heading should lead to the next one by association or juxtaposition. We would jump to an unrelated subject only when it was unavoidable.


If you look at the list of 1,100 headings in our Kosh, given at the beginning of the book, you will see that there is just no attempt to have any sort of classification. Only the names of some headings have been printed in bold letters. This does not indicate any logical change from one class to another. It only draws a user's attention to the nature of subjects which one may find in its vicinity. We start with:

The Universe


Stellar Body

Movement of Stellar Bodies

Rotation of Earth


Solar System (all the non-earth planets)

Sun and Moon



Plains and Deserts

Jungles and Gardens

Garden and Urban Trees

Garden Flowers

Pits and Caves

Mountains and Valleys

Indian Mountains (list and synonyms of important mountains like the Himalayas--we give only 30 out of many others)

Ponds and Lakes

Water Supply (types of wells for drinking and irrigation, water carriers, water taps)


Indian Rivers (Ganges has 37 synonyms here, Yamuna 20)

River: From its source to the end (source of a river, waterfall, flow of water, confluence, delta, and submergence in the sea) along with verbs and adjectives.


Draining Out of Flood Waters (includes canals, drains, sewers, etc.)

Seas and Bays

As in the popular mind a body of water is associated with its banks and landmass, we go to:

Land (surface of earth) (including coast, ground recovered from water, marsh, etc.)

Islands and Continents

Asian Continent and Countries of South Asia
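The principle that one heading simply leads to the next by association, with no hierarchy of classes above it, can be sketched as a flat, ordered sequence. The following is a minimal illustration only, not the actual design of our data: the class Heading, the function neighbours and the numbers assigned to the headings are all hypothetical, while the four titles are taken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    number: int          # position in the flat sequence (hypothetical numbering)
    title: str
    subheadings: dict = field(default_factory=dict)  # subheading -> list of expressions


def neighbours(headings, number, span=1):
    """Return the titles of the headings placed just before and just after
    the given one: the subjects a reader finds in its vicinity."""
    position = {h.number: i for i, h in enumerate(headings)}
    i = position[number]
    before = headings[max(0, i - span):i]
    after = headings[i + 1:i + 1 + span]
    return [h.title for h in before + after]


titles = [
    "Draining Out of Flood Waters",
    "Seas and Bays",
    "Land (surface of earth)",
    "Islands and Continents",
]
headings = [Heading(n, t) for n, t in enumerate(titles, start=1)]

print(neighbours(headings, 2))
```

Looking up Seas and Bays returns Draining Out of Flood Waters and Land (surface of earth). Nothing in the structure records a class or a parent; the only information carried is which heading stands next to which.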


Now, let me tell you briefly how we have treated the living beings. After various headings devoted to matter and energy, Samantar Kosh goes to animate matter. We start with 111 Vegetation, its aspects like 112 seed, root, 114 branches, 115 leaves... go to 121 living beings... 122 Worms and Insects, 123 Reptiles, 124 Water Animals, 125 Fish, 126 Birds, 127 Animals. While Fish and Birds are listed alphabetically, the Animals section starts with describing the types of animals, wild animals, pets... to dairy cattle... animals used for riding and as carriers, to deer, lion,... cats, rats... dogs... monkeys and concludes with primates to lead up to the next heading 128 Man.

It will be very much in place if, before ending this section of our discussion, I mention another major difference between the English language and many eastern languages. In English, most things have only one word. Lion is lion, at the most also Leo. Mango is mango. In Hindi, they have many synonyms. In our main data bank of 5,40,000 records lion has 129 synonyms, cheetah 29, deer 55, elephant 165, wheat 30, mango 46, grape 34... Among gods, we collected 2,317 names for Shiva! It may not be necessary to include all the synonyms in a thesaurus designed for day-to-day use. Yet, very many of them have to be included and special methods have to be devised to contain them. Let us look at the problem in some detail.

The fact that many concepts do not have a large number of synonyms allows the makers of English thesauruses to include alphabetical lists for groups of things (to change the subject from animals) like minerals, ores, elementary metals, alloys. In these, iron features under elementary metals (here the thesaurus maker gives two more word-forming elements: ferro- or ferri-, and sider(o)-) and also, as iron pyrites, under minerals, while steel is listed under alloys. One may comment here that, in the reader's mind, iron is linked to steel. The flow of thought in this list fails to take a user from iron directly to steel. He has to remember that steel is an alloy of iron with carbon and various other metals like nickel, chromium, manganese. Only then will he be able to locate the word steel in the next list, Alloys.

This device of providing simple lists just does not work in Hindi. To give an example, the first edition of our work contains 20 synonyms for iron (out of 57 from our data). Similarly, the device of lists does not allow the inclusion of other related things like raw iron, cast iron, iron dust--for all of which the Hindi language has many words. Besides these, our work under heading 93 Metals goes on to steel, alloy steel, stainless steel, steels which were used in India earlier, e.g., armour steel.
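The contrast between the two devices can be put in the form of a small sketch. It is an illustration under stated assumptions, not a description of any actual thesaurus: the names alphabetical_lists, associations, list_containing and reachable are hypothetical, and copper, zinc, brass and bronze are filler entries added only to make the lists look like lists. The iron and steel word-groups follow the ones mentioned above.

```python
# Device 1: separate alphabetical lists, as in the English thesauruses.
alphabetical_lists = {
    "elementary metals": ["copper", "iron", "zinc"],   # copper, zinc: filler
    "alloys": ["brass", "bronze", "steel"],            # brass, bronze: filler
}

# Device 2: word-groups that lead on to one another by association.
associations = {
    "iron": ["raw iron", "cast iron", "iron dust", "steel"],
    "steel": ["alloy steel", "stainless steel", "armour steel"],
}


def list_containing(word):
    """Name of the alphabetical list in which a word appears, if any."""
    for name, members in alphabetical_lists.items():
        if word in members:
            return name
    return None


def reachable(start):
    """Every word-group a reader is led to, in order, starting from one word."""
    seen, queue = [], [start]
    while queue:
        word = queue.pop(0)
        for nxt in associations.get(word, []):
            if nxt != start and nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(list_containing("iron"), "/", list_containing("steel"))
print(reachable("iron"))
```

With the lists, iron and steel sit in two different places and nothing connects them. With the associations, a reader starting from iron arrives at steel and its varieties without having to recall any metallurgy.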

Thesauruses and Computers

We now live in the age of computers. They are heralding a revolution which will change the way the world has been collecting and using information till now. The capacity of society in general and individuals in particular to handle vast amounts of information is on the verge of a great explosion, much bigger than the one caused by the printing press and with much greater impact. Every day computers are getting more and more involved with thesauruses. There are three major aspects of their role:

           Online thesauruses available in various DTP or Desk Top Publishing programs. Most of them act as dictionaries of synonyms. Some of them also provide antonyms. I don't think that any one of them leads to other concepts which may be similar or associated with an idea.

           Stand-alone thesauruses which a user might consult. Generally they come on CDs with encyclopaedias and dictionaries.

           Specific programs to make thesauruses.

For the present discussion our main interest lies in the programs to make thesauruses. (I will discuss the possibility of using them to make bi-lingual and multi-lingual thesauruses, with a view to laying the foundation for a huge thesaurus-like database of words from important world languages.)

To give a focus to the subject, it would be better if I talk from our own experience in making Samantar Kosh, and the use of computers in finalising it.

When we first thought of making our thesaurus, in 1973, computers were a mysterious thing, bulky and rather primitive, and much beyond our meagre personal means. Hence, the idea never occurred to us that we would ever be computerising it. Accordingly, we designed special-sized cards for ourselves. By 1990, we had a room full of them kept in wooden trays, one or more of which was earmarked for a broad category. These trays, almost forty in number and containing more than 60,000 cards, were spread out fan-wise on our work table and around it in specially designed racks. Day by day, we were finding it impossible to keep track of the cards and the subjects covered in the trays. Computerising our data looked the only way out. Fortunately for us, by this time, computers had become more user-friendly, smaller and cheaper.

It was our son Dr. Sumeet Kumar's fanatic idea that we must computerise our data, and it was his job to find out all about it and make the necessary arrangements. There were certain questions to be answered. The most important was: Are there any computer programs available to make thesauruses? He learnt about some programs which could be used to make thesauruses. They had very curious names, such as Assasin, Astute, Avocon, and one with a very simple name Thesaurus Development. Their description in the books he read made them sound too tedious and time consuming. We needed something simpler.

Sumeet's thinking was that the most suitable tool for us would be a database management program, which would hold huge amounts of data and be easily manoeuvrable. After considering all aspects, Sumeet chose the DOS version of FoxPro.

Now came the question of getting an application written for us. Computer programmers, like the hocus-pocus men, make their work look very mysterious and intricate and ask for the sky as their fees. Sumeet decided to learn a bit of programming himself. Now, we had a budding and enthusiastic programmer in the family. He wrote an application which would fulfil the immediate need of data entry. Later on, he kept on adding features to it to satisfy our growing demands. Ultimately it led him to write an application to convert data into preformatted text for making the pages and index of the book. The result has been a full-fledged program with which we are fully satisfied, to the extent that now we can use it to make bi-lingual and multi-lingual thesauruses.

On to Bi-lingual Thesauruses

The makers of bi-lingual dictionaries would love to give a one-to-one correspondence for words in two languages. Persons engaged in writing various computer programs for translation from one language to another would be willing to give the earth for such a dictionary.

All of us know it is very rare to find two words in two languages that carry the same meaning and weight. Even if two or more such words are generally used in the same context in the two languages, they usually carry different weight and associations. To quote a simple example: for the word success in English, we use the words saphalata and kamyabi in Hindi. All the three words have a different cultural and semantic background and context. Success carries with it the sense of having reached somewhere. Saphalata is a word emanating from an agricultural background. It literally means fruitfulness, having come to fruition. Kamyabi has an Indo-Persian origin meaning achievement of an objective.

If I try to find the English equivalents of a very common Hindi word like shobha, I am always at a loss. I find many English words listed as its rough equivalents in many Hindi-English and Sanskrit-English dictionaries: splendour, brilliance, lustre, beauty, grace, loveliness, elegance... None of these satisfies me. Shobha carries with it just a fraction of all these put together, and also something beyond them. It has something to do with showiness which is not cheap but good and becoming, which none of these words conveys.

I have always felt that bi-lingual thesauruses would be a much better way to help persons from two different languages to understand and select the right option. A bi-lingual thesaurus would offer a whole range of words for any word-group instead of repeating many similar options for a single word within the group. Such a bilingual Hindi-English thesaurus is a crying need in India. It is said that India is one of those few countries which have a very high density of English-speaking technical personnel. This army of young Indians is more at home with English than Hindi. They are computer-literate too. It is they who are occupying the driver's seat in every profession, in offices and factories. Generally, they are at a loss for the right or nearest Hindi word for an idea which may come to them in English. To fulfil their needs, we have already started work on a bi-lingual Hindi-English thesaurus. Our son Dr. Sumeet Kumar has devised a computer program for this. Our daughter Meeta Lal is working on laying the base for it. (See Postscript)

The thesaurus will have two independent indexes, one for each language. With the help of this thesaurus a user will be able to enter the different worlds of Hindi and English from any language of his choice. One concept will lead the user to another, as any thesaurus does. We believe this may be of more help than a simple bi-lingual dictionary.
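The idea of one body of word-groups reached through two independent indexes can be sketched in a few lines of Python. This is only an illustration, not the program Sumeet wrote: the sample word-groups, the field names "en" and "hi", and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: each word-group holds related words in both languages.
WORD_GROUPS = [
    {"en": ["success", "achievement"], "hi": ["saphalata", "kamyabi"]},
    {"en": ["splendour", "grace", "elegance"], "hi": ["shobha"]},
]


def build_index(groups, lang):
    """Map every word of one language to the numbers of the groups it appears in."""
    index = defaultdict(list)
    for number, group in enumerate(groups):
        for word in group[lang]:
            index[word].append(number)
    return dict(index)


# Two independent indexes, one for each language.
EN_INDEX = build_index(WORD_GROUPS, "en")
HI_INDEX = build_index(WORD_GROUPS, "hi")


def look_up(word):
    """Return the full word-groups (both languages) reachable from a word in either index."""
    numbers = EN_INDEX.get(word, []) + HI_INDEX.get(word, [])
    return [WORD_GROUPS[n] for n in numbers]
```

A user who starts from shobha and a user who starts from grace both arrive at the same word-group, and from there see the whole range of options in the other language.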

To our way of looking at things, the matter does not end here. Actually, it opens a whole new world of possibilities. Why do we not start creating a database which may, ultimately, become a huge repository or depository of words from all the major languages of India, and the world? This repository may begin with any two languages, and ultimately encompass all the languages of the world. In today's brave new world of computers, with dramatic achievements in augmenting their capacities, with the introduction of 32-bit computing and the emergence of 64-bit computing on the immediate horizon, and with the huge capacity of new generation discs, this may not be such a wild dream.

In India, we already have computer technology which seamlessly transliterates one phonetic script to another, from Hindi to Bengali, to Tamil, to Sinhalese, to Tibetan, to Thai. It is also possible to get special programs written which would allow all the scripts of the world, be they phonetic, pictographic, or otherwise, to be integrated, in the sense that it may become possible to enter many scripts in the different fields of a database. I have heard, I hope rightly, that the best brains of the world are already on the job.
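As a rough illustration of script-to-script transliteration, here is a minimal Python sketch. It is not the Indian technology referred to above. It assumes text stored in Unicode and relies on the fact that the Unicode blocks for Devanagari, Bengali and Tamil share a parallel layout, so many letters sit at the same offset in each block. The table and function names are hypothetical, the method does not extend to scripts such as Sinhalese, Tibetan or Thai, and some letters have no counterpart in the target script, so the result is only a first approximation.

```python
# Start of each script's block in Unicode. These three blocks share a
# parallel layout, so many letters sit at the same offset in each block.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "tamil": 0x0B80,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift each character from the source block to the same offset in the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        code = ord(ch)
        if src <= code < src + BLOCK_SIZE:
            out.append(chr(code - src + dst))
        else:
            out.append(ch)  # spaces, digits and punctuation pass through unchanged
    return "".join(out)
```

For example, transliterate("कमल", "devanagari", "bengali") gives "কমল". A real system would need per-script rules for the letters that do not line up.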

Ultimately this may lead to the fulfilment of the wishes of earlier philosophers who dreamt of evolving a world language, but in a way unthought of by them. I do not foresee that the world's languages will become one. But they can be united in a database. From our database of the world languages, we should be able to get bi-lingual, tri-lingual or multi-lingual thesauruses of any number of languages of our choice. It will just be a matter of choosing the target languages for such a thesaurus and manipulating our databank to get an output in the shape of a thesaurus, either in a printed book format or as an online computer program. Since we shall be working in a database format, it will allow for various structures or orders to be followed for different base languages as per the sensibilities of their audiences. As far as an online computer program is concerned, the question of a structure is almost immaterial. Any structure or arrangement would do, because the search engines in a computer need only hot links, not a structure, for a user to reach the target word-group.
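Choosing target languages and drawing a thesaurus out of a common databank might look something like the following Python sketch. Everything in it is hypothetical: the records, the language codes "hi", "en" and "fr", and the function names are assumptions made for illustration, and a real word bank would be far richer.

```python
# Hypothetical records: one row per concept, one field per language.
WORD_BANK = [
    {"hi": ["saphalata", "kamyabi"], "en": ["success"], "fr": ["succès", "réussite"]},
    {"hi": ["shobha"], "en": ["splendour", "grace"]},
]


def make_thesaurus(bank, languages):
    """Keep only the chosen languages, dropping concepts that lack any of them."""
    entries = []
    for concept in bank:
        if all(lang in concept for lang in languages):
            entries.append({lang: concept[lang] for lang in languages})
    return entries


def format_entry(entry, base):
    """Render one entry as a printed line, with the base language first."""
    others = [lang for lang in entry if lang != base]
    parts = [", ".join(entry[base])]
    parts += [f"{lang}: " + ", ".join(entry[lang]) for lang in others]
    return " | ".join(parts)
```

Calling make_thesaurus(WORD_BANK, ["hi", "en"]) yields a bi-lingual selection, while adding "fr" to the list yields a tri-lingual one from the same data. The base language passed to format_entry decides only how the output is presented, not what is stored.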

Amar Singh and Roget accomplished their legendary work on individual stamina. Similarly, our Samantar Kosh is the result of a whole family's involvement. But any progress in the field of bi-lingual thesauruses on an international scale is much beyond the capacity of an individual.

It may not be a bad idea to start with small and compact groups of people. They could look at the possibilities of a thesaurus of any two or three languages of their choice. For example, there could be a group to link Hindi with the Japanese language. It is high time Asian languages started coming closer. The growing importance of Japan in Asian and world affairs demands this.

I do not know if any work has been done in the field of bi-lingual thesauruses in European languages. There could be a group to work on a bilingual English-French, English-German or English-Spanish thesaurus.

Ways could be found to correlate the data created by such groups to get a world-wide word bank. And ultimately create a huge World Bank of Words! Why not?



Ten years after the reading of this paper, our combined efforts resulted in the publication of The Penguin English-Hindi/Hindi-English Thesaurus and Dictionary from Penguin India in 2007. Last year, our startup company Arvind Linguistics Pvt Ltd purchased the total stock of the celebrated work from Penguin, and now markets it through its own channels. Its address is: Arvind Linguistics Pvt Ltd, E-28 (First Floor), Kalindi Colony, New Delhi 110065.


Our lexicographical endeavours continue unabated. We were working on a digital internet version of our data called Arvind Lexicon. This is now available on our site. The online lexicon at present comes in three editions: 1. Library Edition. 2. Professional Edition. 3. Free Edition. This can be accessed by anyone, after registering as a user. Below is a screenshot from its professional version:




Arvind Kumar, 22 January 2013

