Research Strategies (abridged) - 2

No further updates


Note that chapter arrangements in the 2011 4th print edition differ from those of this online version.



[The print edition of this textbook now divides keyword and controlled vocabulary searching into two chapters. Selected material from those two chapters is brought together in the information below.]

It is time to begin traveling into the realm of electronic searching, a land much more complex and (frankly) exciting than you ever imagined when you first Googled something and actually got what you were looking for. So pack your bags. On the plus side, by the end of our exploration you're sure to be a better and wiser human being.

What is a Database?

Whether you are familiar with computers or not, you've encountered databases. Databases are everywhere, and I guarantee you've already searched one or more, though not necessarily with all the skill you needed. When was the last time you used a phone book, a dictionary, a library catalog? All of these are databases. Here's a definition:

A database is any collection of data that can be retrieved using organized search procedures.

Many people today are hotshots on computers. They can make the keys sing, the mouse roar and the CPU hum. But few of us understand database searching well enough to do it effectively, let alone efficiently. You may know how to use a computer, but disaster will befall you if you don't understand how to search databases. Worry not, however. You're about to discover a few things.

Keyword Searching

The keyword has become the main tool of research in today's electronic database environment, and many people assume that keywords are their friends. That's not exactly the case. Have you ever had "friends" you trusted only to a certain point, because you had to tell yourself to watch your back when you were around them? You knew they appeared kind and could be helpful, but they could just as easily betray you or do you harm. That's the keyword: useful, handy, but potentially a backstabber. Keywords represent the Wild West of database searching: bold and exciting, but risky as can be.

Database Basics for Keyword Searching

We start with the principle that every database is made up of words. Computers, though inherently unintelligent when it comes to real thinking, are experts at recognizing words. To understand how keyword searching works, you need to know that most databases use descriptive records to identify and show the features of the data they are dealing with. For example, every time a new book is added to a library, it is cataloged by creating a catalog record, which might look something like this:

Title: Google and the myth of universal knowledge: a view from Europe
Author: Jeanneney, Jean Noël, 1942-
Publisher: Chicago: University of Chicago Press, c2007.
Description: xvi, 92 p. ; 23 cm.
ISBN: 9780226395777
Series: Digital formations ; v. 6
Subjects: Google (Firm)
       Library materials--Digitization.
       Electronic information resources--Europe.
       Information organization.
       Digital libraries.
       Web search engines--Europe.
       Internet industry--Europe.
       Internet--Social aspects.
Call No.: ZA4234.G64 J4313 2007

This is, in essence, a description of the main details about this book that the database needs to have available in order for you to identify and find the book. When you search the database by keyword, you will be looking for significant words in this record. Now, imagine that there are thousands of records, and you're interested in finding a list of books about the social ramifications of the Internet. You should be able to think of important words (= keywords) and input them into a search box. The database search program will then look for those words in each of the records in the database and will display on your screen any records that have the words you've asked for.

Insight #1: With keyword searching, what you type is what you get. The computer cannot interpret your request or give you the next best solution (though some may search on a few synonyms). All it can do is identify the words you ask for and give you the relevant data. Garbage in, garbage out.
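To make "what you type is what you get" concrete, here is a minimal sketch in Python of a keyword search over descriptive records. The two records and the function names are made up for illustration; a real catalog does far more, but the matching principle is the same.

```python
import re

# Two made-up catalog records, modeled loosely on the record shown above.
RECORDS = [
    {"title": "Google and the myth of universal knowledge",
     "subjects": "Internet--Social aspects. Digital libraries."},
    {"title": "Automobiles of the world",
     "subjects": "Automobiles--History."},
]

def words_in(record):
    """Return the set of lowercase words found anywhere in a record."""
    text = " ".join(record.values()).lower()
    return set(re.findall(r"[a-z0-9]+", text))

def keyword_search(records, keyword):
    """Return records containing the keyword as a whole word, exactly as typed."""
    keyword = keyword.lower()
    return [r for r in records if keyword in words_in(r)]

print(len(keyword_search(RECORDS, "internet")))  # 1
print(len(keyword_search(RECORDS, "cars")))      # 0: no record uses that word
```

Notice that the search for "cars" comes back empty even though one record is plainly about automobiles. The program matched words, not meanings.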

With many keyword searches (but not Internet search engines like Google or Bing), you can type part of a word, then add an asterisk (*) or sometimes a question mark (?) or a dollar sign ($), and the computer will look for every word that begins with the letters you typed. E.g., interact* will ask the computer to search for interact, interacting, interaction, even interactivity.  This is called "truncation."

You can also sometimes do forward truncation, in which the asterisk goes at the beginning (rare), or middle truncation (wildcards), in which truncation is done within a word (e.g., Wom*n).

Even given the variations allowed through truncation, keyword searching demands a whole lot of precision. The search function in the computer database will find only the exact thing you tell it to find. If you mistakenly type intract* instead of interact*, the search function will give you data containing words like intractable, thus spoiling your whole day and making you grouchy in social environments. So do what your mother told you and learn to spell. Get it right the first time.
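Here is a small Python sketch of how truncation might be implemented, assuming the asterisk stands for any run of letters (possibly none). Real databases differ on the symbol and on how many characters it covers, so treat the function names and behavior as illustrative only.

```python
import re

def truncation_to_regex(term):
    """Turn a search term with * into a regex that matches whole words.

    A * stands for any run of letters (possibly none), so it also covers
    the middle-of-word case such as wom*n.
    """
    parts = [re.escape(p) for p in term.lower().split("*")]
    return re.compile(r"\b" + r"[a-z]*".join(parts) + r"\b")

def matching_words(term, text):
    """Return the distinct words in text that the truncated term matches, sorted."""
    return sorted(set(truncation_to_regex(term).findall(text.lower())))

text = ("Interaction and interactivity matter; intractable problems "
        "interact too. Women and woman.")
print(matching_words("interact*", text))  # ['interact', 'interaction', 'interactivity']
print(matching_words("intract*", text))   # ['intractable']
print(matching_words("wom*n", text))      # ['woman', 'women']
```

The second search shows the spelling problem in action: one dropped letter, and the program dutifully returns a completely different word.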

Boolean Searching

Many years before computers, a man named George Boole invented a mathematical system that enabled people to visualize the combination of various classes of things. The computer folks have taken his system into the world of database searching in order to formulate searches where two or more terms are used. Let's look at some of the basic commands used in Boolean searching:

The OR Command

Suppose that I'm looking in a database for information about cars. I realize that a keyword search will pull out all information that has the word "cars" in it, but some people use the term "automobiles." How can I tell the database search program to look for both words at the same time and give me data whether that data uses the word "cars" or the word "automobiles"? In a situation in which I am searching for synonyms (different words that mean the same thing), I use the OR command. Let's visualize it this way:

That is, "Database Search Program, please give me everything on cars or automobiles, I don't care which." So you will get all the data with "cars" in it, plus all the data with "automobiles" in it. A result doesn't need to contain both words. Either word will do.

In a keyword search in a computer catalog or some other database, your search may look like this:

Cars or Automobiles

Another situation calling for an OR search might be that in which two concepts are closely related, and you suspect that finding data on either of them will further your overall goal. For example, in doing a search for psychoanalysis you might also want to search for the father of psychoanalysis, Sigmund Freud. If you leave off "Sigmund" (because he is usually referred to just as Freud), you can formulate a search like this:

psychoanalysis or Freud

With an OR search, you typically get a lot of hits, that is, pieces of data brought down to you out of the database.

Insight #2: An OR search is usually for synonyms or for keywords that are already closely related. You use it to anticipate the various ways something might be described or approached so that you don't have to do multiple individual word searches.

The only alternative to doing an OR search is to do separate searches on each of your search terms, then try to compile the various sets of results. OR lets you avoid the pain of such an experience.
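In set terms, an OR search is a union: run each single-word search and pool the results. The sketch below assumes a made-up index that maps each keyword to the record numbers containing it; the numbers mean nothing beyond the illustration.

```python
# Made-up index: each keyword maps to the set of record numbers containing it.
INDEX = {
    "cars": {1, 2, 5},
    "automobiles": {2, 3},
}

def search_or(index, *terms):
    """Return record numbers that contain at least one of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term.lower(), set())
    return hits

print(sorted(search_or(INDEX, "cars", "automobiles")))  # [1, 2, 3, 5]
```

Record 2 uses both words but shows up only once, which is exactly the compiling work you would otherwise do by hand.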

The AND Command

One of the most profitable uses of keywords is in combining topics to narrow down a search. For example, if you wanted to look at the problem of educating homeless youth, a keyword search could be formulated to produce very precise results.

Let's visualize it with a diagram first. If you're searching for the relationship between homeless youth and education, you don't want every piece of data on homeless youth, nor do you want every piece of data about education. You want the data that comes from having homeless youth and education intersect. Thus:

Your formulated keyword search will look like this:

homeless youth and education

A little tip: Be very careful not to add unnecessary words to AND searches.

Insight #3: In an AND search, always look for the smallest number of terms required to get data that is on target with your search goals. The more unnecessary terms you add, the more you risk screening out good data that does not use those terms.

Insight #4: A keyword AND search is used to search for data that relates two or more topics or concepts together. The data found will show the effect of the relationship between/among these topics.

An AND search is a limiting kind of search. It asks the search program to provide data only when that data contains all the keywords linked by AND. Thus, you should expect that an AND search will give you fewer hits than if you had searched each keyword on its own.

Insight #5: AND searches will narrow or limit your topic. Thus you can expect that you will not get as many hits with an AND search as with an OR search.
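An AND search, by contrast, is an intersection: only records that appear in every term's result set survive. This Python sketch uses a made-up index and, for simplicity, treats each word as its own term rather than handling "homeless youth" as a phrase.

```python
# Made-up index: each keyword maps to the set of record numbers containing it.
INDEX = {
    "homeless": {1, 2, 3, 4},
    "youth": {2, 3, 4, 7},
    "education": {3, 4, 8, 9},
    "statistics": {4, 9},
}

def search_and(index, *terms):
    """Return record numbers that contain every one of the terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(search_and(INDEX, "homeless", "youth", "education")))                # [3, 4]
print(sorted(search_and(INDEX, "homeless", "youth", "education", "statistics")))  # [4]
```

The second search adds one unnecessary term and loses record 3, which may have been a perfectly good source that simply never used the word "statistics."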

Nesting ANDs and ORs

[See the print edition]

The NOT Command

If I'm back to looking for information about cars, but I'm not interested in any car made in Europe, a NOT search is what I need. (Please don't send me cards and letters asking why I have a problem with European cars; it's a long story.) With this search, I want to tell the computer to give me everything about cars but no data about European cars. Here's how to do a NOT search:

(cars or automobiles) not Europe*

What I'm saying is that I want data about either cars or automobiles as long as I don't have to deal with European cars or automobiles.
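The NOT search above can be sketched as a set difference: build the OR set first (that is what the parentheses are for), then remove anything matching Europe*. The records and the helper name `having` are made up for illustration.

```python
# Made-up records: record number -> words found in the record.
RECORDS = {
    1: {"cars", "japan"},
    2: {"automobiles", "europe"},
    3: {"cars", "european", "design"},
    4: {"automobiles", "safety"},
    5: {"bicycles", "europe"},
}

def having(records, term):
    """Return numbers of records matching term; a trailing * means 'starts with'."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        return {n for n, words in records.items()
                if any(w.startswith(stem) for w in words)}
    return {n for n, words in records.items() if term in words}

# (cars or automobiles) not Europe*
wanted = having(RECORDS, "cars") | having(RECORDS, "automobiles")
result = wanted - having(RECORDS, "Europe*")
print(sorted(result))  # [1, 4]
```

Without the truncation, record 3 (which says "european" rather than "europe") would have slipped through.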

Exceptions to the Above

[See the print edition]

Controlled Vocabularies

Metadata - Definition and Use

[See the print edition]

Controlled Vocabularies

A few years ago, as people started sharing things on the WWW (photos, favorite bookmarks, etc.), folksonomies started to develop. A folksonomy is simply a user-created method of labeling items so that you or anyone else in your circle of friends or colleagues can use these labels to find your items. Examples include Flickr for photos and social bookmarking sites for Web site bookmarks. The labels or "tags" used in these sites can be displayed as "clouds" of links to actual data. Users can search across one another's collections, pulling out, for example, favorite pictures from Japan or bookmarks to sites on ballroom dancing.

One big problem, though: The tags are user generated. This means that consistency goes out the window. One user tags his photos of New York with the term "newyork." Another uses "nyc," or "newyorkcity" or "ny" (most tagging systems demand that you tag with only one word or close up spaces). This means that if I search across the collection for pictures of New York, I need to know all the relevant tags that are being used for the subject, or I will most certainly miss a lot of items just because I didn't use all the right tags.

In some ways, it would be great if some dictator webmaster actually issued a set of standardized tags and insisted that everyone use them instead of making up their own. Hmmm. Actually, that's the answer: a set of tags that is uniform, so that all the pictures of Japan have to be tagged "JapanPictures," not "JapaneseVacationPictures" or "PicturesOfJapan." That way, you would get all the pictures of Japan that are in the collection.
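A short Python sketch shows the difference a standardized tag makes. The photo collection and the mapping to a standard tag "NewYork" are hypothetical; the point is only that a search on one user's tag misses everyone else's.

```python
# Hypothetical mapping from tags users actually typed to one standard tag.
STANDARD_TAG = {
    "newyork": "NewYork",
    "nyc": "NewYork",
    "newyorkcity": "NewYork",
    "ny": "NewYork",
}

# Made-up photo collection: photo name -> tag its owner chose.
PHOTOS = {"skyline.jpg": "nyc", "bridge.jpg": "newyork",
          "park.jpg": "ny", "temple.jpg": "japan"}

def find_by_raw_tag(photos, tag):
    """Return, sorted, photos whose tag is exactly what the searcher typed."""
    return sorted(name for name, t in photos.items() if t == tag)

def find_by_standard_tag(photos, standard):
    """Return, sorted, photos whose tag maps to the given standard tag."""
    return sorted(name for name, t in photos.items()
                  if STANDARD_TAG.get(t) == standard)

print(find_by_raw_tag(PHOTOS, "newyork"))       # ['bridge.jpg']
print(find_by_standard_tag(PHOTOS, "NewYork"))  # ['bridge.jpg', 'park.jpg', 'skyline.jpg']
```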

Too bad I can't claim credit for the idea. The Library of Congress in 1898 was faced with the prospect of creating a new method of cataloging its large collection. As part of the process, the librarians determined that the only way to be sure all books on a particular topic could be identified was to standardize a system of subject headings (tags?) that would be used in the descriptive record (metadata) related to each book. They created records containing these subject headings along with other metadata (authors, titles, etc.).

Library of Congress Subject Headings

Consider the following descriptive record (metadata) for a book:

LDR: 00793nam 2200265Ia 45x0
005: 20030610113341.0
008: 030610s2003 nyua 001 0 eng d
020: $a 0595271960
035: $a ocm52399619
090: $a LB2375 $b .B33 2003
100: 1 $a Badke, William B., $d 1949-
245: 10 $a Beyond the answer sheet: $b academic success for international students / $c William B. Badke.
260: $a New York: $b iUniverse, $c c2003.
300: $a v, 152 p. : $b ill. ; $c 24 cm.
500: $a Includes index.
650: 0 $a Academic achievement.
650: 0 $a Foreign study.
650: 0 $a College student orientation.
650: 0 $a Student adjustment.
650: 0 $a College students.
650: 0 $a Students, Foreign $x Education (Higher)

If you look at the 650 fields just above, you will see terminology like:

Academic achievement.
Foreign study.
College student orientation.
Student adjustment.
College students.
Students, Foreign $x Education (Higher)

Each of these is a Library of Congress subject heading related to some aspect of the book's subject matter. How did these subject headings originate? Quite simply, the Library of Congress (LC) in Washington, DC predetermined the terms by which most topics in the world of information would be called and then organized these terms in alphabetical lists. Some subject headings were easy: dogs are DOGS, sunflowers are SUNFLOWERS, and so on. Some were more difficult: What do you call senior citizens? LC chose AGED, much to the outrage of senior citizens.
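Because the record labels every line with a field number, a program can pull out the subject headings mechanically. The sketch below assumes the simple "tag: content" text layout used in the record shown earlier (not the binary format that library systems actually exchange), and the function name is my own.

```python
# A few lines from the record above, in the same "tag: content" layout.
RECORD_LINES = [
    "500: $a Includes index.",
    "650: 0 $a Academic achievement.",
    "650: 0 $a Foreign study.",
    "650: 0 $a Students, Foreign $x Education (Higher)",
]

def subject_headings(lines):
    """Return the text following $a on every line whose field tag is 650."""
    headings = []
    for line in lines:
        tag, _, rest = line.partition(":")
        if tag.strip() == "650" and "$a" in rest:
            headings.append(rest.split("$a", 1)[1].strip())
    return headings

for heading in subject_headings(RECORD_LINES):
    print(heading)
```

The 500 line is skipped because it is a note, not a subject. Only the 650 fields carry the controlled terms.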

Television faith healers are HEALERS IN MASS MEDIA. Why? Because LC said so. That's the point with controlled vocabularies. These vocabularies are created by people "out there" who then control them and refuse to allow you to change them.

The fact is that you can't have it both ways. You can either choose your own search terminology (as in keywords or tagging), in which case you can't be sure you'll find everything, or you can use standardized terminology not chosen by you. If you use standardized terminology, you have less flexibility, but you are more likely to find most of what the database has to offer about the topic you are seeking. Thus:

Insight #1: With controlled vocabularies, you have to use the subject terms provided by the system. You might not like the terms chosen, but they are what you've got. No variations are allowed; you have to use the subject headings in the forms provided to you.

How does a controlled vocabulary work? Armed with a set of predetermined subject headings, catalogers (creators of metadata records) decide which heading (or headings) to assign to a particular chunk of data. In the case of LC, every time they get a book to catalog, they write a description of the book (i.e., the catalog record), which then becomes metadata, and to that metadata is added one or more controlled subject headings.

So a book entitled Them TV Preachers may have the subject heading HEALERS IN MASS MEDIA assigned to it. A book called Active Seniors in Today's World may be labeled with the subject heading AGED. Note something very important here. The book Them TV Preachers did not have any of the actual words of the subject heading in its title. The title told you the book was about TV preachers. It said nothing about healers or about mass media. The same was true for the second title, Active Seniors in Today's World: the term "AGED" is not to be seen anywhere in it. Why, then, were they given the subject headings they received? Because some intelligent librarian sat down with these books, determined what they were about, and then assigned the closest subject headings from the already existing controlled vocabulary list. Thus:

Insight #2: The actual wording in a title of whatever you are searching for is not important for controlled vocabularies. Subject headings are assigned on the basis of somebody's judgment as to what the item is actually about. The title words can be the same as words in the subject heading or radically different.

Consider the advantages: I have four books with the following titles:

Terminal Choices
Choosing Life or Death
The Practice of Death
The Right to Die

All of them are about mercy killing or euthanasia. You might not have guessed that fact by looking at the titles, but the intelligent LC librarian has looked over these books, determined that they are all about the same topic, and assigned the same subject heading to all of them: EUTHANASIA.

Controlled vocabularies are a good solution to the problem of retrieval. How can we ask the right question so that the database will deliver to us the information we need? If we wanted a list of books about euthanasia, it would be nice to have a search tool that would enable us simply to type a predetermined word or phrase into the computer and get back a list of all the euthanasia books regardless of the wordings of the actual book titles. This is what a controlled vocabulary is designed for. Most of the books on euthanasia in a library will be retrieved just by typing in the subject term EUTHANASIA.

Insight #3: Use a controlled vocabulary as a search tool when you want a collection of data on the same subject regardless of what the data actually says about itself.
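Here is a rough Python comparison of the two approaches, using the four titles listed above. The fifth record and its heading TRAVEL are made up to show what a title keyword search can drag in by accident.

```python
# The four titles listed above, each carrying the subject heading EUTHANASIA.
# The fifth record is a made-up book on another topic, added for contrast.
BOOKS = [
    {"title": "Terminal Choices", "subjects": ["EUTHANASIA"]},
    {"title": "Choosing Life or Death", "subjects": ["EUTHANASIA"]},
    {"title": "The Practice of Death", "subjects": ["EUTHANASIA"]},
    {"title": "The Right to Die", "subjects": ["EUTHANASIA"]},
    {"title": "Death Valley Travel Guide", "subjects": ["TRAVEL"]},
]

def title_keyword_search(books, word):
    """Return titles that contain the word (case-insensitive whole word)."""
    word = word.lower()
    return [b["title"] for b in books if word in b["title"].lower().split()]

def subject_search(books, heading):
    """Return titles of books assigned exactly this subject heading."""
    return [b["title"] for b in books if heading in b["subjects"]]

print(title_keyword_search(BOOKS, "euthanasia"))  # []
print(title_keyword_search(BOOKS, "death"))       # two right books, one wrong one
print(subject_search(BOOKS, "EUTHANASIA"))        # all four, and nothing else
```

The keyword "euthanasia" finds nothing because no title uses it, and "death" both misses two books and picks up a travel guide. The subject heading gets all four.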

But let's be clear about one thing: controlled vocabularies are "controlled" in the sense that someone other than you has determined what they will be. You as the user can't mess with them by changing their words or rearranging their structure the way you can with keywords. You use controlled vocabularies; you don't create them and you can't fool with their form.

Insight #4: Messing with controlled vocabulary wording or form is strictly forbidden. Subject headings are created by someone other than you, and they can't, in most cases, be manipulated or turned into keywords.

Let's see how the LC Subject Headings controlled vocabulary system works in practice. The Library of Congress provides subject headings for its own books, but it has also conveniently issued its list of approved headings so that all of us can use their system. Most libraries in North America have chosen to do just that, so that your library's subject headings are likely derived from the Library of Congress.

Your library may have a print edition of Library of Congress Subject Headings as a set of large red volumes. Below is a mock-up of what you might see on a typical page from the guide. On the right are the subject headings or alternative headings. On the left is a description of what you are seeing on the right. If you have trouble distinguishing left from right, look for italics (left) or non-italics (right):


Working the Angles: Identifying Controlled Vocabularies

Controlled vocabularies can involve more than subject headings. Names can be tricky in databases: Am I "Badke, William" or "Badke, Bill"? Thus many databases also have controlled vocabularies of names, by which they standardize the form of an author's name so you can find everything by that author in the database. Titles of books or articles have their own controlled vocabulary built in. A title takes a unique form in its choice and order of words so that titles tend to be standardized automatically.

But subject headings, whether they are in a library catalog or some other database, are a challenge to identify.

Library Catalogs

Library catalogs often do not have guides to subject headings embedded in them, which is why you have to use the Library of Congress Subject Headings volumes or the online version, described above. But there is another way to identify subject headings. As a purist, I hesitate to tell you this, because it's not foolproof, and you may end up missing headings you could have used. But here it is: Starting with a keyword search in a library catalog may be the best way to find relevant subject headings.

Here's how it works. First do a keyword search in the library catalog, using words you think might appear in titles of relevant books on the topic you are dealing with. Find a book that is right on topic and click on the title of the citation to it to open up the full catalog record. For example, you might be writing on Pacific Island societies, so you use these terms in a keyword search. You discover in your result list the ideal book and open up the full record:

The growth and collapse of Pacific island societies: archaeological and demographic perspectives/
Author: Kirch, Patrick Vinton; Rallu, Jean-Louis
Publisher: Honolulu: University of Hawaii Press, ©2007.
ISBN: 9780824831349
Subjects: Ethnology--Oceania.

Look at the line that says "Subjects." There are a couple of official, authorized forms of the Library of Congress subject heading for this concept: Ethnology--Oceania and Ethnology--Hawaii. They may not be the subject headings you would have thought of, but that doesn't matter. What you have with these subject headings are tools to find all the other books on the topic, usually just by clicking on the hyperlinked subject headings in the catalog record. So this method is relatively simple: Use a keyword search to find one book on your topic. Open the citation to get the full catalog record. Identify the LC subject heading(s) used for this book and click on its (their) link(s) to search for other books like the first one.
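The three steps of this method can be sketched in Python as follows. Only the first title and its heading come from the record above; the other two records, and the function names, are invented so there is something to find.

```python
# Made-up mini catalog. Only the first title and its heading come from the
# text above; the other two records are invented for illustration.
CATALOG = [
    {"title": "The growth and collapse of Pacific island societies",
     "subjects": ["Ethnology--Oceania"]},
    {"title": "Peoples of the southern seas",
     "subjects": ["Ethnology--Oceania"]},
    {"title": "Pacific salmon fishing",
     "subjects": ["Salmon fishing"]},
]

def keyword_search(catalog, *words):
    """Return records whose title contains every word (case-insensitive)."""
    return [r for r in catalog
            if all(w.lower() in r["title"].lower() for w in words)]

def subject_search(catalog, heading):
    """Return records assigned exactly this subject heading."""
    return [r for r in catalog if heading in r["subjects"]]

# Step 1: a keyword search finds one on-target book.
first = keyword_search(CATALOG, "Pacific", "island", "societies")[0]
# Step 2: read its subject headings. Step 3: follow each heading to other books.
related = {r["title"] for h in first["subjects"]
           for r in subject_search(CATALOG, h)}
print(sorted(related))
```

The second book has none of the keywords in its title, yet the shared subject heading brings it in. The salmon book matches "Pacific" but is correctly left out.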

Other Databases

There are many databases out there that use controlled vocabularies. Here are some clues to finding their subject heading systems:

  • A term commonly used is "Thesaurus," that is, a guide to subject headings that not only identifies authorized subjects but also can lead you to broader, narrower or related terms that are also authorized. Sometimes there is a "scope note," that is, a definition of what is covered by a certain subject heading.

  • At other times in a database you may see a link to "Subjects" or "Descriptors," which mean the same thing.

  • In connection with the options above, or sometimes separate from them, you may find a "browse" function. "Browse" generally involves working with controlled vocabulary terms (subjects, authors, titles, etc.) from alphabetized lists of headings.
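If it helps to picture what a thesaurus holds, here is one way a single entry might be modeled in Python. Every term below is hypothetical, and the "use_for" list of unauthorized variants is my own assumption about how such a tool points you to the authorized heading; each real database defines its own terms and structure.

```python
# Hypothetical thesaurus entries; real databases define their own terms.
THESAURUS = {
    "Automobiles": {
        "scope_note": "Works on passenger motor vehicles.",
        "use_for": ["Cars", "Motorcars"],   # assumed list of unauthorized variants
        "broader": ["Motor vehicles"],
        "narrower": ["Sports cars"],
        "related": ["Automobile industry"],
    },
}

def authorized_term(thesaurus, term):
    """Return the authorized heading for term, or None if the thesaurus has none."""
    wanted = term.lower()
    for heading, entry in thesaurus.items():
        variants = [heading] + entry["use_for"]
        if wanted in (v.lower() for v in variants):
            return heading
    return None

print(authorized_term(THESAURUS, "cars"))    # Automobiles
print(authorized_term(THESAURUS, "trucks"))  # None
print(THESAURUS["Automobiles"]["narrower"])  # ['Sports cars']
```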

Getting Fancy: Combining Controlled Vocabulary and Keyword Searching

[See print edition]

Keeping on Track with Controlled Vocabularies

[See print edition]

  [NOTE: The print edition covers hierarchies in this chapter. Part of that material is reproduced in the online version, chapter 3. For practice exercises with a key, and for an assignment related to the material in this chapter, see the print edition.]



Last revised: July 13, 2012