ALL DIGITAL                 

DIGITAL TEXT & DOCUMENTS

A word processor (more formally known as document preparation system) is a computer application used for the production (including composition, editing, formatting, and possibly printing) of any sort of printable material. Word processor may also refer to an obsolete type of stand-alone office machine, popular in the 1970s and 80s, combining the keyboard text-entry and printing functions of an electric typewriter with a dedicated computer for the editing of text. Although features and design varied between manufacturers and models, with new features added as technology advanced, word processors for several years usually featured a monochrome display and the ability to save documents on memory cards or diskettes. Later models introduced innovations such as spell-checking programs, increased formatting options, and dot-matrix printing. As the more versatile combination of a personal computer and separate printer became commonplace, the word processor quickly disappeared.

Word processors are descended from early text formatting tools (sometimes called text justification tools, from their only real capability). Word processing was one of the earliest applications for the personal computer in office productivity.

Although early word processors used tag-based markup for document formatting, most modern word processors take advantage of a graphical user interface. Most are powerful systems consisting of one or more programs that can produce any arbitrary combination of images, graphics and text, the latter handled with type-setting capability.
Microsoft Word is the most widely used computer word processing system; Microsoft estimates over five hundred million people use the Office suite, which includes Word. There are also many other commercial word processing applications, such as WordPerfect, which dominated the market from the mid-1980s to the early 1990s, particularly for machines running Microsoft's MS-DOS operating system.
 
Preservation of word processing and text documents depends on:
1. Preservation vs. access formats
2. Criteria for sustainability
3. Word processing formats
4. PDF
5. RTF
6. XML


Long-term storage of paper manuscripts and documents is a major problem for digital repositories. These need to be converted into an archival format for preservation. One has to decide which file formats are suitable for long-term storage of word-processed text documents, and how to convert documents into a suitable archival format.
The majority of text documents created today are stored in file formats generated by word processors. Most of the text we’re interested in archiving is in one of the various Microsoft Word formats. The smallest and simplest formats are WordStar files, plain text (as produced by Notepad) and RTF.
Since word processing formats are not suitable for preservation, many archives seem to have chosen PDF, but this has serious problems. XML is a better answer, but it’s not a complete answer. XML is not a file format, but a meta-format, a framework for creating file formats. We have to choose a suitable XML file format for storing documents.
There are various methods available for converting word processing documents into a suitable XML format. There is a lot of published research on digital preservation, but not much of it deals in any detail with the preservation of text.

First we need to make an important distinction between preservation formats and access or viewing formats.

1. Preservation vs. access formats

A preservation format is one suitable for storing a document in an electronic archive for a long period. An access format is one suitable for viewing a document or doing something with it.

The vast majority of all text documents created today are created in Microsoft Word using its native .doc format (in one of its many variations, depending on the version of Word being used). It would be great if we could just deposit Microsoft Word documents into repositories and be done with it, but unfortunately that won’t do, for a few good reasons:

Word format is proprietary. It is owned by Microsoft Corporation, and even the recent XML-based Microsoft Word formats suffer from this. Proprietary formats are a problem because the owner could change the format at any time, possibly forcing repositories to convert all their documents, or could change the licensing at any time, perhaps insisting that documents may only be opened using its software, or that users pay a fee for reading or editing existing documents.
Except for the recent XML-based versions, Word is a binary format. There is no obvious way to extract the content from a Word document, and if the document is corrupted even a little, the content can be lost. Even the most recent version, Microsoft’s Open XML format, is a compressed Zip archive of XML files, and compressed files are particularly prone to major loss if corrupted.
Word is not just one format but many.
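The point about compressed files being fragile can be illustrated with a small experiment. The following is only a sketch, assuming Python's standard zlib module as a stand-in for the compression used inside Zip archives; the sample text and the helper names (flip_byte, count_differences, try_decompress) are invented for this illustration. It damages a single byte in an uncompressed text and in a compressed copy of the same text, then compares what survives.

```python
import zlib


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    damaged = bytearray(data)
    damaged[index] ^= 0x01
    return bytes(damaged)


def count_differences(a: str, b: str) -> int:
    """Count the character positions at which two strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


def try_decompress(data: bytes):
    """Return the decompressed bytes, or None if the stream is unreadable."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# Hypothetical sample content, repeated to give the compressor something to do.
text = ("The division of a document into sections must be preserved. " * 40).encode("ascii")

# Damage one byte of the plain text: only that one character is affected.
damaged_plain = flip_byte(text, len(text) // 2)
plain_damage = count_differences(text.decode("ascii"), damaged_plain.decode("ascii"))

# Damage one byte of the compressed copy and try to read it back.
compressed = zlib.compress(text)
recovered = try_decompress(flip_byte(compressed, len(compressed) // 2))

print("characters damaged in plain text:", plain_damage)
print("compressed copy readable after damage:", recovered == text)
```

In the plain text the damage stays local to one character, whereas the compressed copy can no longer be trusted as a whole, because the decompressor either rejects the stream or fails its integrity check.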
Even the new XML-based format has some technical problems. Microsoft has released its latest XML-based file format, known as Open XML, publicly, along with assurances that it is and will always be free.
Open Document Format (ODF) grew out of OpenOffice.org’s earlier OpenOffice.org XML format. It is now an OASIS and ISO standard and a European Commission recommendation. It is supported by the open source word processors KOffice and AbiWord, with more to come. An ODF file is a Zip archive containing several XML files, plus images and other objects. Zip archiving and compression tools are freely available on all major platforms, so there should never be a problem getting at the content of an ODF document. Using a Zip archive does mean that the files are prone to catastrophic loss of content with even minor data corruption, in the same way as the Microsoft Word formats discussed above.
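Because an ODF file is an ordinary Zip archive of XML files, its text can be reached with general-purpose tools. The sketch below assumes Python's standard zipfile and xml.etree.ElementTree modules. To stay self-contained it builds a tiny stand-in package in memory rather than opening a real file; the sample content and the function names are invented for this illustration, while content.xml and the two namespace URIs follow the ODF specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical, heavily simplified content.xml for demonstration only.
SAMPLE_CONTENT = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:h>Introduction</text:h>"
    "<text:p>First paragraph.</text:p>"
    "<text:p>Second paragraph.</text:p>"
    "</office:text></office:body></office:document-content>"
)


def build_sample_package() -> bytes:
    """Create a minimal ODF-like Zip package in memory (not a complete ODF file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", SAMPLE_CONTENT)
    return buffer.getvalue()


def extract_paragraphs(package: bytes) -> list:
    """Return the text of every heading and paragraph, in document order."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}h", f"{{{TEXT_NS}}}p"}
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]


for line in extract_paragraphs(build_sample_package()):
    print(line)
```

For a real document, the bytes passed to extract_paragraphs would be read from the .odt file itself.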

Word processing formats are at heart about describing the appearance of the document, not its structure. For serious processing it’s the structure we want. In 20, 50 or 100 years, most readers will probably not care about the size of the paper, the margins, the fonts used and so on. Even today, if we’re going to serve up a document as a web page, those details are irrelevant. Sometimes these details can even be a disadvantage, for example if the document insists on fonts that are unavailable on your computer. On the other hand, the division of the document into sections will always be relevant, useful and important, and must be preserved.
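To make the difference between appearance and structure concrete, here is a minimal sketch that assumes a document stored with purely structural markup. The element names (document, section, title) are hypothetical and not taken from any particular standard. It recovers the outline of sections without knowing anything about fonts, margins or page sizes.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural markup: sections and titles only, no formatting.
SAMPLE = (
    "<document>"
    "<section><title>Formats</title>"
    "<section><title>PDF</title></section>"
    "<section><title>RTF</title></section>"
    "</section>"
    "<section><title>Conclusion</title></section>"
    "</document>"
)


def outline(element, depth=0):
    """Return (depth, title) pairs for every nested section, in document order."""
    entries = []
    for section in element.findall("section"):
        title = section.findtext("title", default="(untitled)")
        entries.append((depth, title))
        entries.extend(outline(section, depth + 1))
    return entries


for depth, title in outline(ET.fromstring(SAMPLE)):
    print("  " * depth + title)
```

The same outline could be used to build a table of contents or navigation for any future viewing format.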
There are several other word processing formats, but none of them has much market share, nor do any of them have any particularly conspicuous advantages. Probably the best strategy with these is to convert them into Word or Open Document Format, then treat them in the same way as the majority of documents. OpenOffice.org will open many file formats, so it can be used as a generic first stage in any process of converting documents into useful formats. Use OpenOffice.org in server mode to open all documents and save them in Open Document Format, then process them into something better.
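A conversion step of this kind might be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it assumes an executable named soffice on the search path that accepts the --headless, --convert-to and --outdir options, as current LibreOffice and recent OpenOffice builds do (older OpenOffice.org releases used different option spellings). The function names and file names are invented for this illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="odt", executable="soffice"):
    """Build the command line for a headless conversion (assumed option names)."""
    return [
        executable,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target_format="odt"):
    """Run the conversion and return the path where the result is expected."""
    subprocess.run(build_convert_command(source, out_dir, target_format), check=True)
    return Path(out_dir) / (Path(source).stem + "." + target_format)


# Show the command for a hypothetical input file without running it.
print(" ".join(build_convert_command("report.doc", "archive")))
```

Calling convert on each file in a directory would give the generic first stage described above, with the resulting ODF files then passed on to further processing.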
Many repositories seem to have adopted PDF as their main format for text documents, both for storage and for access. PDF has some good points:

It is easy to create, either using Adobe Acrobat software or using the PDF Export feature available in both Microsoft Word and OpenOffice.org Writer.
It can be viewed on all platforms using the free Adobe Acrobat Reader software (with some caveats, see below).
It is extremely effective at preserving the formatting of a document. For some applications (for example in legal contexts) this may be of vital importance.

However, there are some serious problems with using PDF as a storage format:

The format is owned by Adobe. While it is currently open, the company could decide to keep future versions secret.
There are some compatibility problems between different versions.
Documents may rely on system fonts. There is an option in PDF to embed all fonts in the document, but not all software uses this, and some PDF viewing software either cannot locate the correct fonts or doesn’t know how to substitute suitable alternatives. Failing to embed all fonts can result in a serious degradation of the on-screen appearance of a document, or in a complete failure to display the content.

PDF includes extra features like encryption, compression, digital rights management and embedding of objects from other software packages. These all present difficulties, particularly the last.

PDF is an excellent access format for printing to paper. Any good preservation system should be able to generate PDF renditions of documents for this purpose. PDF is not so good for viewing on screen, as it ties document content to a fixed page size. This means that for large page sizes or small screens (e.g. on handheld devices like PDAs or mobile phones) text will either be too small to read or the user will have to scroll back and forth along the lines, which is highly inconvenient. Looking ahead, who knows what viewing formats we will use. We need to be able to reformat content to fit the viewing device.
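Reformatting content to fit the viewing device is straightforward when the stored document is a sequence of paragraphs rather than fixed pages. The following is a minimal sketch using Python's standard textwrap module; the sample paragraphs, the function name and the two widths are chosen only for illustration.

```python
import textwrap


def render_for_width(paragraphs, width):
    """Reflow each paragraph to the given line width, separated by blank lines."""
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


# Hypothetical stored content: paragraphs with no page or line layout attached.
paragraphs = [
    "A preservation format is one suitable for storing a document in an "
    "electronic archive for a long period.",
    "An access format is one suitable for viewing a document or doing "
    "something with it.",
]

print(render_for_width(paragraphs, 72))   # a wide desktop window
print()
print(render_for_width(paragraphs, 30))   # a narrow handheld screen
```

The content is identical in both renderings; only the line breaks change, which is exactly what a fixed page size prevents.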


RTF (Rich Text Format)

RTF stands for Rich Text Format. It is a Microsoft specification[17], but they have published it, so one could argue that it is an open standard. It is certainly widely interoperable, with most word processors capable of reading and writing RTF. There are problems with using RTF as a preservation format:

It is still defined by a corporation, with all the risks that entails.
There seem to be parts of the specification that are not in the publicly available specification document, and which have changed over the years.
The specification is not complete and precise, leaving many little quirks.

The National Library of Australia has chosen RTF as its main preservation format[5]. I think a well-chosen XML file format has significant advantages over RTF, but it might well be worth retaining RTF as an access format, since it has good interoperability.
 

 XML

XML is widely accepted as a desirable format for document preservation. See for example the assessment of XML on the US Library of Congress digital formats web site and the related conference paper by Arms & Fleischauer. The reasons are simple: XML is a free, open standard. XML uses standard character encodings, including full support for Unicode. This makes it capable of describing almost anything in any language.
XML is based on plain text. This gives it the best possible chance of being readable far into the future. Even if XML and XSLT are no longer available, the raw document content and markup will still be human-readable. (This will be true even if the meaning of the markup has been lost, although formats designed with preservation in mind should make the meaning more or less apparent from the carefully chosen element and attribute names).
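The claim that the content remains recoverable without any XML tooling can be shown with a small sketch that treats the file as plain text and simply discards anything that looks like a tag. The sample document and its element names are hypothetical, and the approach is deliberately crude: it does not decode entities or handle comments and CDATA sections.

```python
import re

# Hypothetical archived document with descriptive element names.
SAMPLE = """<document>
  <section>
    <title>Preservation formats</title>
    <para>A preservation format is suitable for long storage.</para>
  </section>
</document>"""


def strip_markup(xml_text):
    """Remove anything between angle brackets and collapse the whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", xml_text)
    return " ".join(without_tags.split())


print(strip_markup(SAMPLE))
```

A reader with nothing more than a text editor could perform the same recovery by eye, which is the practical meaning of human-readable here.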

TEI (Text Encoding Initiative)

TEI stands for the Text Encoding Initiative. Its guidelines are aimed mostly at the preservation of literary and linguistic texts (so a very different slant to DocBook). Like DocBook, TEI is huge. Furthermore, it’s not exactly a format, but a set of guidelines for building more specialized formats. One such is TEI-Lite, which has proved very popular, and is used by several serious repositories.

TEI may be better-matched than DocBook to some scholarly work, particularly in the humanities. It does have some serious shortcomings however:

Authors create documents in a word processor (either Word or Writer), using a generic template. They must use styles, and only the special styles in the template, not the standard built-in styles. The key to effective web publishing like this is to have a fast feedback loop. Instead of authors sending their work to a web publisher and getting the result back weeks later, they save their document and click "Refresh" in their browser to see the results. If they have done something wrong, they see it straight away.

 

 


 

   romadigital-lab.in            Last Updated on 09th-May-08   

Copyright@2008 by romadigital-lab.in  All rights reserved




This website is maintained by Roma Digital Lab. * All trademarks/registered trademarks are properties of respective owners.