The correct capitalisation program and service from WRC Solutions

This application converts text looking like this:

-NUMBER-
100002
-INPUT_DATE-
8301
-SUBJECT-
CARDIOVASCULAR
-TITLE-
EFFECTS OF CALCIUM ENTRY ANTAGONISTS IN HYPERTENSION
-AUTHOR-
KREBS H, GRAEFE K H, ZIEGLER R
-REFERENCE-
CLIN EXP HYPERTENS, 1982, 4:271-284
-DOCUMENT_TYPE-
REVIEW
-KEYWORDS-
CALCIUM ANTAGONISTS, HYPERTENSION, MODE OF ACTION, VERAPAMIL, NIFEDIPINE
-END-

into text like this:

-NUMBER-
100002
-INPUT_DATE-
8301
-SUBJECT-
Cardiovascular
-TITLE-
Effects of calcium entry antagonists in hypertension
-AUTHOR-
Krebs H, Graefe K H, Ziegler R
-REFERENCE-
CLIN EXP HYPERTENS, 1982, 4:271-284
-DOCUMENT_TYPE-
Review
-KEYWORDS-
Calcium antagonists, hypertension, mode of action, Verapamil, Nifedipine
-END-

If you wonder why anyone has text in all upper case, then you haven't worked in libraries. When libraries first computerised (around 1983 for this example), it was often not practical to use lower case letters.

If you wonder why a program is needed - think of 20,000 records like the one above. Think also that many of those records will have 100 or 200 words of abstract included as well.

Why bother?

Readability is significantly improved by having words in the correct case.

Comprehension is improved when proper nouns and trade names are correctly capitalised.

Professionalism is enhanced when printed documents have correctly capitalised text and trade names are correctly identified.

Why not use Word?

Of course you can select the text and press Shift+F3 to change its case, but that will still leave an awful lot of manual work.

You could use the spelling checker but that will still be very labour intensive.

In either case, when you have finished one file, almost nothing is of value to carry forward to the next one.

How does it work?

The application was designed to deal with arbitrary Personal Librarian source files and to give them the correct capitalisation of terms. The examples above show a typical Personal Librarian layout. The application can of course be extended to cover other file types and layouts. [Personal Librarian was a major Windows-based full-text retrieval application of the late 1980s. It is no longer commercially available: the company was taken over by America Online, and its text retrieval engine is what is used on AOL.]

Correct capitalisation means that the first term in a Personal Librarian field will have an initial capital, as will any term following a full stop, question mark or exclamation mark. A field is marked by -FIELDNAME-. Other terms (proper nouns, abbreviations, trade names) will be capitalised only if they are mentioned in the capitalisation dictionary or if they are in any of the specialised fields such as Author or Reference.

In the example above, Verapamil and Nifedipine, two terms in the KEYWORDS field, are names of proprietary medicines and were in the dictionary.

An example of such a dictionary:

ACTH
Africans
ANBPS
April
Arzneim
Arzneim-Forsch
Boston
BP
British
CAS
Nifedipine
P-Hydroxytriamterene
Verapamil
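
The rules described above can be illustrated with a short Python sketch. This is not the WRC Solutions application itself, only a minimal illustration under stated assumptions: the names capitalise_field, DICTIONARY, LOOKUP and TERM are hypothetical, a term is assumed to be a run of letters and digits optionally joined by hyphens, and the special handling of fields such as Author is left out.

```python
import re

# Hypothetical dictionary: each term is stored in the case it should appear in.
DICTIONARY = ["ACTH", "BP", "Boston", "Nifedipine", "Verapamil"]
LOOKUP = {term.upper(): term for term in DICTIONARY}

# Assumption: a term is letters/digits, optionally joined by hyphens.
TERM = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def capitalise_field(text):
    """Sentence-case the text of one field, restoring dictionary terms."""
    out = []
    pos = 0
    start_of_sentence = True  # the first term of a field gets a capital
    for match in TERM.finditer(text):
        gap = text[pos:match.start()]
        # A full stop, question mark or exclamation mark starts a new sentence.
        if any(ch in ".?!" for ch in gap):
            start_of_sentence = True
        out.append(gap)
        word = match.group()
        if word.upper() in LOOKUP:
            word = LOOKUP[word.upper()]   # dictionary spelling wins
        elif start_of_sentence:
            word = word.capitalize()
        else:
            word = word.lower()
        out.append(word)
        start_of_sentence = False
        pos = match.end()
    out.append(text[pos:])
    return "".join(out)

print(capitalise_field(
    "CALCIUM ANTAGONISTS, HYPERTENSION, MODE OF ACTION, VERAPAMIL, NIFEDIPINE"))
```

Applied to the KEYWORDS line of the sample record, the sketch prints the line as it appears in the converted example. A simple rule like this will also capitalise after a full stop inside an abbreviation, which is one reason a real tool needs more care than this.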

To assist in preparing the capitalisation dictionary two programs are provided - one to extract a list of new unique terms from a file and another to merge two such lists.
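
What those two helper programs do can be sketched roughly in Python. The function names extract_new_terms and merge_term_lists are hypothetical, as is the definition of a term, and the real programs may well behave differently (for example in how they treat field markers or duplicate spellings).

```python
import re

# Assumption: a term is letters/digits, optionally joined by hyphens.
TERM = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def extract_new_terms(text, known_terms):
    """Return the sorted unique terms in text that are not in known_terms."""
    known = {term.upper() for term in known_terms}
    found = {match.group().upper() for match in TERM.finditer(text)}
    return sorted(found - known)

def merge_term_lists(first, second):
    """Merge two term lists, keeping the first spelling seen for each term."""
    merged = {}
    for term in list(first) + list(second):
        merged.setdefault(term.upper(), term)
    # Sort without regard to case, as in the dictionary example.
    return sorted(merged.values(), key=str.upper)

candidates = extract_new_terms("VERAPAMIL AND NIFEDIPINE IN BOSTON",
                               ["Verapamil", "Boston"])
print(candidates)
print(merge_term_lists(["ACTH", "Boston"], ["April", "BOSTON", "BP"]))
```

The extracted list is only a set of candidates. Someone still has to decide which of them deserve a capital and type them in the right case before merging them into the dictionary.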

With the three applications and their many command line switches, some batch files, and some diligence in properly formatting the terms you do want capitalised, a large source file can quickly be made much more usable by giving most terms the correct case.

Make use of this expertise yourself by acquiring the application or ask us to do it for you as a service.

Last updated 10/01/2007 Copyright © 2001-2007 WRC Solutions