Microsoft Office Word 97-2003 Binary File Format

This document summarizes the Microsoft Office Word 97-2003 Binary File Format (.doc), the default format for documents in Microsoft Word from 1997 through 2003. A .doc file stores its content in a compound file (CFB) structure that begins with a CFB header. Mandatory streams include the WordDocument stream, which begins with a File Information Block, and a table stream; optional streams carry metadata.


Sustainability of Digital Formats: Planning for Library of Congress Collections

Microsoft Office Word 97-2003 Binary File Format (.doc)


Table of Contents
Identification and description
Local use
Sustainability factors
Quality and functionality factors
File type signifiers
Notes
Format specifications
Useful references

Format Description Properties

ID: fdd000509
Short name: MS-DOC
Content categories: text, office/business
Format category: file-format
Other facets: unitary, binary, structured, symbolic
Last significant FDD update: 2019-12-18
Draft status: Full

Identification and description

Full name: Microsoft Office Word 97-2003 Binary File Format (.doc).

Description: The Microsoft Word Binary File format, with the .doc extension and referred to here as DOC, was the default format for documents in Microsoft Word from Word 97 (released in 1997) through Microsoft Office 2003. Although it cannot support all Word functionality introduced since Word 2007, the DOC format has remained available in Word as an alternative to the DOCX/OOXML format (standardized as ISO/IEC 29500) for saving documents. As of late 2020, Microsoft's documentation page "File formats that are supported in Word" lists it as "Word 97-2003 Document." [Note: In other contexts, the same format has been called "Word 97-2004 Document" or "Word 97-2007 Document."]
According to the Wikipedia entry for Microsoft Word, the .doc extension has been used for four distinct file formats:
(a) Word for DOS;
(b) Word for Windows 1 and 2, and Word 3 and 4 for Mac OS;
(c) Word 6 and Word 95 for Windows, and Word 6 for Mac OS;
(d) Word 97 and later for Windows, and Word 98 and later for Mac OS.
This format description covers the last of these formats. For convenience, the term "DOC" is used here to refer specifically to this variant of the Microsoft Word files with .doc as extension.
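The four variants can be told apart, at least roughly, from the first bytes of a file. The sketch below assumes the file is already known to carry the .doc extension. It relies on the statement in this description that .doc files have been CFB files since Word 6.0, so a file without the CFB signature D0CF11E0A1B11AE1 should be variant (a) or (b). Separating (c) from (d) needs the start of the WordDocument stream: to our reading of [MS-DOC], the Fib begins with wIdent, which must be 0xA5EC, followed by nFib, which should be 0x00C1 for the Word 97 format. Treating any smaller nFib as Word 6/95 is an assumption of this sketch, extracting the WordDocument stream from the compound file is not shown, and `classify_doc` and its return labels are hypothetical names.

```python
import struct
from typing import Optional

# CFB header signature, as stored at offset 0 of the file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Values taken from our reading of [MS-DOC] (FibBase); verify against the spec.
WORD_IDENT = 0xA5EC   # FibBase.wIdent: identifies the file as a Word document
NFIB_WORD97 = 0x00C1  # FibBase.nFib for the Word 97 format


def classify_doc(file_head: bytes, word_document_head: Optional[bytes] = None) -> str:
    """Rough triage of a .doc file into variants (a)-(d).

    file_head: the first bytes of the file (at least 8).
    word_document_head: the first bytes of the "WordDocument" stream, if the
    caller has already extracted it from the compound file (not shown here).
    """
    if file_head[:8] != CFB_SIGNATURE:
        # Not a compound file, so (per this description) not variant (c) or (d).
        return "a-or-b"
    if word_document_head is None or len(word_document_head) < 4:
        # A compound file, but without the Fib we cannot separate (c) from (d).
        return "c-or-d"
    # wIdent and nFib are the first two little-endian 16-bit fields of the Fib.
    w_ident, n_fib = struct.unpack_from("<HH", word_document_head, 0)
    if w_ident != WORD_IDENT:
        return "not-word"
    # Assumption: any nFib below the Word 97 value means Word 6 or Word 95.
    return "d" if n_fib >= NFIB_WORD97 else "c"
```

For example, a compound file whose WordDocument stream starts with the bytes EC A5 C1 00 is classified as "d". Files written by later Word versions may keep nFib at 0x00C1 and record a newer version number elsewhere in the Fib (nFibNew), so this check still places them in variant (d).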

Although the DOC format is proprietary, it has been covered by Microsoft's Open Specification Promise since 2007. The specification released in 2007 is available as Microsoft Office Word 97-2007 Binary File Format Specification [*.doc]. The structure of the DOC format has been documented, and kept up to date, in [MS-DOC].

Since the release of Word 6.0 in 1993, a Word document with the .doc extension has been structured as an OLE (Object Linking and Embedding) Compound File Binary (CFB) file, as specified in [MS-CFB].
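A compound file opens with a fixed-layout header, so a reader can confirm that a .doc file is a CFB file before touching any Word-specific structure. The following minimal sketch parses that header. The signature, the 16 zero bytes, and the minor and major version fields are the ones this description lists for DOC files; the byte-order mark (0xFFFE at offset 28), the sector shift (offset 30), and the first directory sector (offset 48) come from our reading of [MS-CFB] and should be checked against it. `CfbHeader` and `parse_cfb_header` are hypothetical names.

```python
import struct
from dataclasses import dataclass

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


@dataclass
class CfbHeader:
    minor_version: int
    major_version: int
    sector_size: int
    first_directory_sector: int


def parse_cfb_header(header: bytes) -> CfbHeader:
    """Parse and sanity-check the 512-byte CFB header (little-endian fields)."""
    if len(header) < 512:
        raise ValueError("CFB header is 512 bytes; got %d" % len(header))
    if header[0:8] != CFB_SIGNATURE:
        raise ValueError("missing CFB signature")
    if header[8:24] != bytes(16):
        raise ValueError("header CLSID should be 16 zero bytes")
    # Offsets 24..31: minor version, major version, byte order, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", header, 24)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark 0x%04X" % byte_order)
    # Version 3 uses 512-byte sectors (shift 9); version 4 uses 4096 (shift 12).
    expected_shift = {3: 9, 4: 12}.get(major)
    if expected_shift is None:
        raise ValueError("unsupported CFB major version %d" % major)
    if sector_shift != expected_shift:
        raise ValueError("sector shift %d does not match version %d" % (sector_shift, major))
    # The minor version is expected to be 0x003E but is only reported here.
    (first_dir_sector,) = struct.unpack_from("<I", header, 48)
    return CfbHeader(minor, major, 1 << sector_shift, first_dir_sector)
```

Because the fields are little-endian, the bytes 3E 00 03 00 at offset 24 decode as minor version 0x003E and major version 3. Passing this check only shows that the file is a compound file; locating the WordDocument and table streams additionally requires following the FAT and directory sectors, which this sketch omits.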
In 1997, the detailed structure of the CFB file used for Word documents was modified. The CFB format provides a file-system-like structure within a file for storing arbitrary, application-specific streams of data. It consists of storages, streams, and substreams. A DOC file begins with a CFB header and must include a CFB root directory (identified by the name "Root Entry" in UTF-16). The root directory has an entry for each stream or storage object at the top level of the compound file hierarchy. Each object entry has a name (also encoded in UTF-16, although most of the document content is usually stored in 1-byte characters) and points to the location in the file of the named object.

Mandatory streams in a DOC file include a stream named "WordDocument" (also referred to as the "main stream") and a "table" stream named either "1Table" or "0Table". The content of the WordDocument stream follows the CFB header and begins with a File Information Block (Fib), which contains information about the document, including a code identifying the file as a Word document, and specifies the file pointers to the various portions that make up the document.

Streams that are not required by the specification, but are typically present in files written by Microsoft Word, include a SummaryInformation stream (with basic file-level metadata) and a DocumentSummaryInformation stream.

A Word file in the DOC format begins as follows, with all values given as they occur in the physical file (for example, when viewed with a hex dump utility):

CFB header (usually 512 bytes):

Header signature for the CFB format, with the 8-byte hex value D0CF11E0A1B11AE1. Gary Kessler notes that the beginning of this string looks like "DOCFILE".
16 bytes of zeroes (the header CLSID)
2-byte hex value 3E00, indicating CFB minor version 3E (0x003E, stored little-endian)
2-byte hex value 0300, indicating CFB major version 3, or 0400, indicating CFB major version 4. [Note: All DOC files created by compilers of this resource (in various versions of Word since 2003) and examined with
