This is a Web version of an article published in Armenia in the volume "Computers in Armenian Philology", Yerevan, 1993. All transliterations were removed for the sake of simplicity. (The publication was made possible through the help of the Dutch Organization of the Sciences, NWO.)

CD-ROM TECHNOLOGY
Possible Applications in Armenian Philology
by R. H. Lola Koundakjian

Written for the Association Internationale des Etudes Armeniennes, The Netherlands, and the Linguistics Institute, Academy of Sciences, Armenia

"The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it."
Vannevar Bush, "As We May Think", The Atlantic Monthly, July 1945

Introduction

Our era has been accused of "Information Overload" and of "Drowning in Paper". In a world dedicated to reducing the threat to the environment and to keeping up with knowledge, a relatively new technology may be leading the way toward that goal. As we will attempt to demonstrate, CD-ROMs are ideal tools for Armenological research, especially philology, because the latter involves mostly text.

CD-ROM has been called the New Papyrus. (1) The reason may be that, although CDs must be accessed through an electronic device and can interact with other, previously introduced consumer products, the CD-ROM is an active medium rather than a passive one such as television. CD-ROMs invite the human mind to search by association, to follow trails of thought and clues, to select numerically or alphabetically; in other words, to work.

Several technological developments converged in the advent of CD-ROMs: the mass-market entertainment industry, the proliferation and standardization of Personal Computers (PCs) in the business and home environments, and the creation of the "information" and "paperless" society myths. (2) CD-ROM derives its low price and manufacturing availability from the consumer audio marketplace.
CD Audio is "the most successful consumer electronic product in history", and it has created a stable worldwide standard and manufacturing base. (3) Audio discs are manufactured in factories that cost $10 million to build and over $1 million a year to maintain. CD-ROMs are produced in these ready-made plants, with the addition of computers. Since CD-ROMs had the good fortune to be introduced when PCs were already competitively priced, an affordable entry-level system could serve as support for high-end applications.

CD-ROMs: A Brief History

The first audio CDs were introduced around 1981 and the first CD-ROMs in 1984; the first CD-ROM titles appeared in 1985. PC CD-ROMs were standardized in 1987 by the International Organization for Standardization (ISO), with the High Sierra or ISO 9660 format. Data in this format is delivered on 9-track tape, from which the production house presses the CD-ROM. (4)

When CD-ROM technology was introduced, its first proponents were publishers, who were awed by the immense storage capacity and the PC compatibility. Originally seen as an alternative publishing medium, it now has other uses as well. (Figure 1) More than 2,000 products and applications are available on CD-ROM, and the number is doubling each year. (5)

The first general-purpose software for this industry was introduced in 1986. Although generally disappointing at first, these software packages have improved thanks to ideas contributed by clients. Performance was slow in the very beginning, until the medium was optimized to make the best use of optical discs. The early stage can also be characterized as the "text only" era. After 1988, 386-based premastering systems were used to index and structure CD-ROMs with larger storage capacities incorporating text, image and sound.

The advances in the CD-ROM industry can be summed up as follows:
- In 1989, the creation of a single CD-ROM application through a service bureau would have cost between $50,000 and $100,000.
Today any institution interested in electronic publishing may set up an in-house CD-ROM publishing system for much less, including both software and premastering hardware, with the capacity to produce 50 or more CD-ROMs per year.
- The cost of mastering and replicating the first 100 discs has fallen from $5,000 to $1,500, with incremental discs currently costing less than $1 each.
- The capacity has increased from 550 MB to 660 MB per disc.
- CD-ROM drive costs have fallen from $1,500 to about $600.
- The CIA and the U.S. State Department, two of the world's largest consumers of CD-ROM products, have imposed the creation of a standard under which CD-ROMs will contain only data. The disk drive software itself would contain the operating system. (6)

What are CD-ROMs and what is their Storage Capacity?

CD-ROM means Compact Disc Read-Only Memory. It is a storage medium for text, digital data, images and stereo sound, in a 4.72" (120 mm) disc format. Perhaps one of the more exciting breakthroughs in information storage, CD-ROMs were created by Philips and Sony in 1983 as an extension of the CD-Audio, which they had pioneered in the late 1970s as a medium for digitally recorded sound.

CD-ROMs store data as a series of 0s and 1s (binary). A 1 is represented by the transition from a pit to a land or vice versa; the length of a pit or land depends on the number of 0s between transitions. The pits are burned onto the master by a laser beam. On the finished disc the recorded surface is physically sealed off from contact with the disc drive: it is sandwiched between a grooved, transparent surface and a protective undercoating, so that the disc can be handled, even roughly, without damage. A scanning laser beam, reflected or not by the silver pits, "reads" the CD-ROM without contact or wear; no loss or degradation is involved.
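The pit-and-land scheme described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and the "P"/"L" notation are invented here, and a real disc adds a modulation code and error correction that are not modeled.

```python
# Simplified model of pit/land recording: a 1 is a transition between
# pit ("P") and land ("L"); a 0 means the surface stays as it was.
# The names and the "P"/"L" notation are illustrative assumptions.

def bits_to_surface(bits, start="L"):
    """Turn a list of 0/1 bits into a string of pit/land cells."""
    level = start
    surface = []
    for bit in bits:
        if bit == 1:
            # A 1 flips the surface from land to pit or from pit to land.
            level = "P" if level == "L" else "L"
        surface.append(level)
    return "".join(surface)


def surface_to_bits(surface, start="L"):
    """Recover the bits by looking for transitions, as the reader does."""
    bits = []
    previous = start
    for level in surface:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


bits = [1, 0, 0, 1, 0, 0, 0, 1]
surface = bits_to_surface(bits)
print(surface)                   # PPPLLLLP
print(surface_to_bits(surface))  # [1, 0, 0, 1, 0, 0, 0, 1]
```

In the printed surface, the run of three P cells is one pit whose length reflects the two 0s that follow the first 1, which is the sense in which the length of a pit or land depends on the number of 0s.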
A single CD-ROM may contain the equivalent of:
- 1,800 double-density floppy disks (5 1/4"),
- 74 minutes of audio,
- 12,000 images at 300 dots per inch (dpi),
- 50,000 computer pages of data at 132 columns/page, or
- 250 Computer Output Microfilm (COM) fiches.

Technical industrial data, which notoriously changes on a constant basis, can be updated at minimal cost ($1.00/disc). Since upgrades are done electronically, they take less time than in print publishing, (7) allowing clients and technicians to receive 650 Megabytes (MB) of digital data (text, graphics), the equivalent of 20,000 pages of up-to-date information, which is instantly available through their PC.

Creating CD-ROMs: The Software

Two types of applications are needed in the creation of CD-ROMs: (a) software to build and prepare an application and to input the data on disc; (b) software for users to retrieve the information from the disc through their PC. Early CD-ROM publishers had to create their own software, but today many optical publishing specialists exist. (8)

Hardware Requirements for In-House Electronic Publishing

The premastering system is the hardware used for "authoring". A typical system will include, at a minimum, a 386-based PC for faster indexing, tape drives, and magnetic disks with gigabytes of space (no exaggeration). The system may also include simulation software that emulates the speed of the CD, to test the actual performance of the program.

In 1987, 98% of CD-ROM readers were installed on IBM PC compatibles; the figure quickly dropped to 95% in 1988, when Apple and DEC introduced their own drives. The Apple Macintosh is clearly the better choice for multimedia applications. The future combined efforts of Apple and IBM, and their use of a new non-Intel chip, may narrow the gap even more. (9)

The basic difference between an audio CD player and a CD-ROM player is that the latter has an error-correction chip instead of a digital-to-analog conversion chip.
The mechanism supports random access, in contrast to the linear nature of CD Audio. CD-ROMs have to conform to the Philips-Sony standard and are therefore interchangeable.

CD-ROM players were originally top-loading. Then came the front-loading type, which could be stacked between the monitor and the PC unit, followed by the drive slot in the PC itself, with the CD-ROM protected in a caddy instead of the sliding-door format used in CD Audio. Current varieties include external, internal and portable models. Performance is improving as well; it is measured in milliseconds (ms). Recently multi-disc readers, called juke-box readers, have been introduced, paralleling developments in CD-Audio. CD-ROM drives cost more than CD-Audio players, mainly because of their lower sales volume, although in the four years since the drives were introduced their prices have fallen to 50% of the original level.

Summary of Costs for CD-ROM Publishing

The CD-ROM is the least expensive way to disseminate large volumes of information to personal computer users. Its long-term benefits include:
- lower information distribution costs
- shorter information retrieval time
- less duplication of effort
- an improved learning rate
- sharing of knowledge
- increased productivity, by using information where, when and how it is needed.

The approximate costs of CD-ROM publishing, as of 1991, are:
- 386-based system: $12,000
- CD-ROM drive with interface: $600
- Premastering system (600 MB - 1.2 GB and 9-track tape): $25,000-30,000
- Authoring system (off the shelf): $795-30,000
- Licensing fee (depends on number of CDs to be pressed)
- Retrieval software (perpetual): $200
- Media transfer and 1st master: $1,500-3,000
- Replication per disc: $1.00

This list does not include the additional workstations needed to input the data, the hiring of personnel, office space and supplies.
However, the typical costs of such a project can be estimated at well under $500,000 per year.

Disadvantages of CD-ROMs

- CD-ROMs are read-only and therefore cannot be updated by the end user.
- The mastering step is costly and takes three days. Initial keying can take months or years, depending on the project.
- Retrieval from CD-ROMs is slower than from magnetic media. The industry is working to improve this. In any case, it is faster than the traditional methods of research. (10)
- A CD-ROM reader must be added to PCs at additional cost.
- In-house software development is costly in an environment that is constantly evolving.
- Customized software may be limited to a single application. (11)

There are service bureaus that offer everything from keyboard entry to managing disc mastering, replication and packaging, but they leave the publisher no control over the product and are very expensive. (12) The right choice depends on the number of CD-ROMs to be published and on the update frequency.

- Most CD-ROM software still handles only black and white graphics. Color is key for multimedia (i.e. a mixture of text, structured data, graphics, sound and video), which is a new environment.

Advantages of CD-ROMs

CD-ROMs have many advantages over traditional publishing.
- First and foremost is interactivity: a work can include a built-in electronic notebook with word-processing features, including WordPerfect conversion.
- It is paperless; in fact it bypasses the need to print on paper altogether. (13)
- It is much easier to update.
- It is cheaper to duplicate: the real expense is the master and the first 100 copies. Entire library indexes now fit on one disc; the British Library's index, for example, is on three discs.
- Unlike fiche, CD-ROM data is electronically compatible with, and searchable by, a PC. There are fewer than 1 million microfilm readers compared to more than 30 million personal computers.
This is one reason that optical discs are replacing microfiche in many applications.
- CD-ROM search is faster than searching paper, computer output microfiche, microfilm or on-line media, and its cross-referencing gives the researcher better access. One of the largest U.S. banks is using CD-ROMs to store its daily transactions, reducing storage space and costs while improving customer service. Searches take minutes instead of days. Telephone directories and library title catalogues are a few of the other environments where CD-ROMs have been used successfully. (Figure 2)

Life Expectancy of CD-ROMs

CD-ROMs are currently estimated to have a life expectancy of between 10 and 100 years; the longer estimates apply especially to some models made of very hard tempered glass layered with gold instead of the standard aluminum. (14)

CD Publishing

There are several steps involved in CD publishing once the content has been decided. (Figure 3)

Data preparation is the term used to describe the transfer of the information into machine-readable form. Text is typically converted to ASCII, either by manual typing or by Optical Character Recognition (OCR), which is not 100% accurate and therefore requires quality control. Video, audio and image data are digitized. The next step involves file structuring, the order and layout of the data, and the disc geography needed to minimize seek time. Then comes the premastering stage, which includes the query methodology, indexing, and the organization of the display, print or other export capabilities. (15)

Most CD-ROMs contain structured data, including information from computer databases, bibliographies, directories, catalogs and numerical data. Such data may contain text fields, and its indexing includes each and every field. The biggest task is to organize the information as thoroughly as possible and in such a way as to minimize access time. Queries may contain Boolean operators ("and", "or", "not") and "word" or "phrase" searches.
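A minimal sketch of such an index, in Python, may make the idea concrete. It assumes a tiny hypothetical catalogue held in memory; the function names, the record layout and the sample records are illustrations only and do not describe any actual retrieval package.

```python
from collections import defaultdict

# Hypothetical sample records; a real disc would hold many thousands.
records = {
    1: {"author": "Homer", "title": "The Iliad"},
    2: {"author": "Homer", "title": "The Odyssey"},
    3: {"author": "Virgil", "title": "The Aeneid"},
}


def build_index(records):
    """Index every word of every field: (field, word) -> set of record ids."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            for word in value.lower().split():
                index[(field, word)].add(record_id)
    return index


def lookup(index, field, word):
    """Return the ids of the records whose field contains the word."""
    return set(index.get((field, word.lower()), set()))


index = build_index(records)
all_ids = set(records)

homer = lookup(index, "author", "homer")
iliad = lookup(index, "title", "iliad")
virgil = lookup(index, "author", "virgil")

print(homer & iliad)    # "and": {1}
print(iliad | virgil)   # "or":  {1, 3}
print(all_ids - homer)  # "not": {3}
```

Because the index is built once, at premastering time, each query reduces to a few set operations rather than a scan of the whole disc.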
A full-text search may emulate a field search, but it will take much longer than a search on an indexed field. Hypertext searches also allow "related" material to be found, which enhances browsing. The testing stage simulates retrieval effectiveness, speed and proper layout. The data is then transferred to 9-track tape. The two following stages take place at the factory: mastering, i.e. creating the glass master, and disc making.

Some CDs already available

There have been ongoing projects at major U.S. universities for the digital conversion of ancient texts. Among them are:
- The IRIS (Institute for Research in Information and Scholarship), an ongoing project since 1983, with CD-ROMs available since 1985.
- Brown and Harvard Universities.

Other examples of areas where CD-ROMs have been used, ideas which could encompass Armenian some day:
- Language software, such as Lingua Rom, which is similar to language cassettes: it teaches a language, incorporating sound as well as text and images. Languages of the World: "the most complete multilingual dictionary ever with seven million words, eighteen dictionaries, twelve languages" on one CD-ROM. (Harrap Publishing Group, U.K.)
- Oxford English Dictionary (OED-CD), currently at $895.00
- Oxford Textbook of Medicine, Electronic Edition
- Random House bilingual dictionaries
- Merriam Webster's 1989 Ninth New Collegiate Dictionary
- Legal Codes of different countries
- New Grolier Electronic Encyclopedia (9 million words)
- British Library: General Catalogue of Printed Books to 1975 (by Saztec Europe, Ltd., in 3 CD-ROMs). (16)
- Aeschylus, Aristophanes, Aristotle, St. Augustine, Confucius, Epictetus, Euripides, Hippocrates, Homer, Omar Khayyam, Lucretius, Plato, Religious Documents - Bible (King James), Egyptian Book of the Dead, Bhagavad Gita, Buddha's Life and Teachings, Koran and Book of Mormon - Shakespeare, Sophocles, Virgil, plus many more on one CD. (World Library, Inc.
)

Possible CD-ROMs in Armenian and/or Armenian Studies

For commercial/education environments:
- Multimedia Hanragitaran, incorporating text, sound and images, cross-referenced for browsing in history and the sciences
- Polyglot dictionaries, with pronunciation of Armenian words and synonym and antonym browsing capability
- Language-ROM in Armenian, with Eastern and Western pronunciation
- Music CD-ROMs: operas with score and libretto
- History of the Diaspora, regularly updated
- Dialects
- Customs of the provinces
- Pictures of the Past: Armenia in the 1st century of Photography, an anthropological study using historic pictures

For scholars:
- Matenadaran catalog
- Index of Armenian historic archeological digs: details of the research done in each of the digs, pictures of the area, maps and plans; all epigraphic text, whether deciphered or not
- A multimedia database containing details about all Armenian dialects, including sound; historic description of the provinces, costumes, customs, etc.
- Armenian art works in museums around the world
- Catalogue of Armenian Inscriptions, with detailed lists containing cross-references of site, subject matter and name of signee(s), with a map of the area and pictures of the inscriptions
- Index of Armenian Manuscripts in various institutions and private collections, transferred from the print media and regularly updated
- Complete text of Primary Sources/Manuscripts in Armenian and modern-language translation(s), possibly to be scrolled in a double-window set-up; scanned versions of the original texts; close-ups of illuminations; cross-references by date, collection and subject matter
- All history texts found in different languages and/or versions, for comparative research, e.g. Agathangelos, History of Armenia.
(17)
- Transfer of Armenian architecture pictures from microfiche, with a database containing relevant descriptions
- Transfer of the Armenian Database of Leiden (18)
- Linguistic databases
- Armenian manuscript colophons, with fields cross-referenced for historic information

Electronic reprints of:
- Historical and Armenological periodicals, e.g. Biuzandion, Bazmavep
- Dictionaries: Nor Bargirq Haygazean Lezvi, Haykakan Anounneri Pararan
- Historical volumes
- Mayr Tsousakner

Out-of-print books and fragile and/or large-format manuscripts are perfect candidates for CD-ROM or optical publishing, because they should not be handled or are out of reach. Projects involving artworks formerly photographed for print publishing, or works that are already two-dimensional, can easily be transferred to digital format with a scanner.

Note: A multi-volume CD-ROM entitled "Abstract of Dissertations" contains many dissertations about Armenia and the Armenians which have been deposited in major universities since the mid-19th century.

On the horizon

The future of ASCII

Extensive work is being done to replace ASCII (a 7-bit code) with a 16-bit code, Unicode, which will include all known alphabets and symbols of the world, in the hope of creating standards in international computing. The first 128 characters will be the same as in ASCII. The project has been under way since 1987, carried out by researchers at the Xerox Palo Alto Research Center and at Apple. Unicode will include all "dead" languages and language fossils, which will be ideal for linguists. Some alphabets and symbols are not yet ready, e.g. Egyptian hieroglyphics, but will eventually be added. Code positions 0 to 8,191 in Unicode represent alphabets; 8,192 to 12,287 represent symbols, e.g. punctuation and mathematics; 12,288 to 16,383 are for the Chinese/Korean/Japanese auxiliary alphabets; 16,384 to 59,391 have been reserved for the unified Han characters for the same three languages.
The rest, 59,392 to 65,535, have been reserved for users to implement and for compatibility areas for developers. The Armenian alphabet is represented in Unicode, including all the punctuation unique to Armenian. It was decided that Unicode would not include ligatures in any language, as they would take up more space. Therefore ;u and ou are not represented.

Electronic Scanning

Electronic scanning is most suitable for in-house archiving, i.e. replacing paper with electronically scanned images and integrating databases with the images. The environment can be single-user or a LAN (Local Area Network), using optical disk drives and/or juke-boxes. The work involves scanning archival papers, then creating fields in a database to index data about each scanned image (e.g. names, dates). The information is stored on WORM (Write Once - Read Many) optical disks for immediate access. WORM disks are 5.25" in diameter, with 920 MB of optical storage capacity, storing up to 20,000 pages. There are also 12", 6.55 GB disks with a capacity of 120,000 pages. The data can be loaded from a host computer, 9-track tape, QIC PC cartridge, DOS file or IBM 3480 cartridge. (More on WORMs later.)

On-line Connection to Libraries

Nationwide data networks are being created which will allow any PC user to connect to libraries such as the U.S. Library of Congress (an estimated 80 million items). Such privileges were once available only in some of the larger universities involved in a consortium, and sometimes only to research librarians. Today, an estimated 150 universities are connected to 40 sources of information, ranging from Shakespeare to corporate documents. In the future, on-line connection agencies hope to include electronic newspapers and electronic public libraries. Progress depends on technological improvements in network speed, computing power and software. (19)

Erasable Optical Drives

An erasable optical drive can store between 600 MB and 1 Gigabyte (GB).
(20) These are unlike CD-ROMs and WORMs because they allow end users to read and write data, as on a magnetic drive, with the advantage over the latter of being removable (for storage) and much larger in capacity. Their data storage is stable for an estimated maximum of 10 years. In large capacities, erasable optical storage drives are cheaper than hard disks, but they are also slower. They are, however, faster than CD-ROMs. (Figure 4)

WORMs

Write Once - Read Many disks are perhaps the ultimate archiving media. WORM disks use a magneto-optical technology to burn data into the surface of the disc, and the data cannot be altered. The drive does not use a magnetic head like those in hard disk and floppy drives, so less wear and tear is involved. An alternative form being researched is WORM in a tape format called DOT (digital optical tape), with 650 MB capacity per cartridge.

Industry Standards

The standards enjoyed in the CD-ROM environment are now available for optical drives, which means that any cartridge written in one drive can be read by another. This is unlike the older WORM technology. An industry committee is currently working on a multiplatform standard, which could jeopardize the ISO standard for optical cartridges. (21)

Photo CDs

Philips N.V. and Eastman Kodak will be offering a CD player by next year which will allow photographs and slide files to be processed for display on a television screen via a compact disc.

Picture Transmissions

Some of the most innovative computer technologies come to us from the press. One such organization is the Associated Press (AP), which in 1970 was the first in the news media to use the video display terminal for editing and transmission of pictures; this was a forerunner of today's optical publishing media, incorporating pictures and text.
(22) AP's latest development is the Leaf Picture system, an electronic darkroom which interfaces with today's popular equipment and with desktop publishing, an area that newspapers and magazines are pursuing and investing in. The technology is based on both PCs and Macintoshes; it can receive photo transmissions, manipulate them, export them to a desktop publishing program, and scan pictures. It has a compression program, which allows quick viewing of stored pictures. AP also has a laptop transmission model, a scanner for transparencies, and a video interface. (23)

Conclusion

In the hope of approaching a Less Paper Society (a Paperless Society having so far proven to be Utopian), CD-ROMs and other electronic media have been compared for their best and worst applications (Figure 5). CD-ROMs are best for large data storage, for topics that require no daily update, and for a wide range of subject matter that demands extensive research. Institutions are finding that CD-ROMs save them time in revision and production and, of course, save warehouse storage costs. The CD-ROM works well in LAN environments, as it was conceived for networking.

CD-ROMs are ideal for the Armenology milieu in all its fields of specialization: because of the extensive raw research material that has yet to be analyzed, for the possibilities they offer in cross-discipline research, for the information that scholars could use were it available, and for the ease they allow in the dissemination of such information.

Endnotes:
1 Microsoft Press, 1986
2 Roszak
3 Dataware Technologies
4 op. cit.
5 Optical Publishing Industry Assessment, Volume II
6 This would allow, for example, a Macintosh user to run a PC CD-ROM by emulating a PC on a Mac.
7 Optical publishing is a term used for CD-ROMs; electronic publishing is used for the distribution of information in digital form over telecommunications or broadcast systems, while print publishing includes books, magazines and reports on paper or microfiche.
8 While the cost of internal operations may not always be justified, external service bureaus are expensive and the publisher may lose control over the work.
9 Time, July 1991
10 The spiral track that winds from the inside of the disc to the outside is accessed by the drive's head, as in CD-Audio; the speed of rotation varies to ensure that data moves past the beam at a constant velocity.
11 Licensing may be an alternative. There are more than twenty commercial CD-ROM software products available today.
12 A new type of company has emerged, mixing the service bureau environment with the training of company staff, thus reducing the cost of creating future CD-ROMs. The major disadvantage is the cost of the internal software, hardware and support staff.
13 Paper, storage and postage cost savings are tremendous.
14 Century Disc
15 Help screens and foreign-language versions can be part of the added features.
16 At a cost of less than the printed catalogue. The British Library has over 8.5 million volumes, of which only 53% are in English; the conversion of the printed catalogues of titles took a team of 127 people over 4 years; the catalogue was manually keyed in. "Each element of the catalogued record (author, title, date and place of publication) has been discretely coded with MARC compatible field and keyed twice to assure accuracy. Greek, Cyrillic and Hebrew characters have been captured and can be displayed and printed." according to their catalogue.
17 Indexed fields may include century (approx. date) of colophon(s), manuscript number, description, artist, city of creation. Possible cross-references may include information about the history of the time, king and catholicos, etc.
18 The data is already in ASCII format.
19 New York Times, July 1991
20 Gigabyte = 1,000 MB
21 MacUser, November 1990
22 AP began Laserphoto transmissions, for laser-scanned pictures, starting from 1974.
It inaugurated electronic darkrooms for digitized pictures in 1979, transmissions direct to newspapers in 1980, a graphics retrieval service that uses digital transmission in 1987, and high-speed digital delivery of graphics and photos in 1988.
23 The system has a speed of 5 megabytes per second.

Bibliography:

Books
Lambert, Steve and Suzanne Ropiequet, eds. CD-ROM: The New Papyrus. Microsoft Press, 1986.
Roszak, Theodore. The Cult of Information: The Folklore of Computers and the True Art of Thinking. Pantheon Books, New York, 1986.
Saffady, Bill. Optical Storage Technology: A State of the Art Review. Meckler Corp. (Annual publication)

Articles and Industrial Literature
"Advance in CD's Starts a New Battle", New York Times, June 19, 1991, p. D1
"AVERY Library: Celebrating the Glorious Past - Shaping the High-Tech Future", Columbia, Winter 1990
CD-ROM Librarian, Volume 6, Number 4, IV. 1991
"CD-ROM Power: Knowledge at Hand", PC Computing, February 1990
"CD-ROM for the Common Man", New York Times, November 28, 1989
"CD: The Next Generation" by Ken C. Pohlmann, Stereo Review, July 1991
Dataware Technologies, "Corporate Guide to Optical Publishing"
"Erasable Optical Drives", MacUser, November 1990
"Gateway to Volumes", New York Times, October 31, 1989
"Love at First Byte", Time, July 15, 1991
"Making CD-ROM Usable for Unix", UnixWorld, July 1991, pages 123-
"Most Valuable Players", MacUser, March 1990
"Paperless Storage", UnixWorld, July 1991, pages 97-
"Time to Make Way for CD-ROMs?", New York Times, June 18, 1991, p. C11