Fig. 10.1.17 Word stemming.
To see how textual ETL processes word stems, consider the following raw text:
...she walked her dog to the park....
The resulting database entry would look like the following:
Document name, byte, stem—walk, value—walked
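The entry above can be illustrated with a minimal Python sketch. It is not the textual ETL implementation: the suffix-stripping rule in naive_stem is a deliberately crude stand-in for a real stemmer, the names StemEntry, naive_stem, and stem_entries are hypothetical, the document name example.txt is invented for the example, and "byte" is assumed to mean the byte offset of the word within the document.

```python
import re
from typing import List, NamedTuple


class StemEntry(NamedTuple):
    """One database entry: where a word was found and what it stems to."""
    document: str  # document name
    byte: int      # assumed meaning: byte offset of the word in the document
    stem: str      # the word stem, e.g. "walk"
    value: str     # the word as it appears in the text, e.g. "walked"


# Toy suffix list for illustration only; a production stemmer is far richer.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three leading letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def stem_entries(document: str, text: str) -> List[StemEntry]:
    """Emit an entry for every word whose stem differs from the word itself."""
    entries = []
    for match in re.finditer(r"[A-Za-z]+", text):
        value = match.group()
        stem = naive_stem(value)
        if stem != value.lower():
            # Count bytes, not characters, in the UTF-8 encoding of the prefix.
            offset = len(text[: match.start()].encode("utf-8"))
            entries.append(StemEntry(document, offset, stem, value))
    return entries


raw = "...she walked her dog to the park...."
entries = stem_entries("example.txt", raw)
print(entries)
# [StemEntry(document='example.txt', byte=7, stem='walk', value='walked')]
```

Only "walked" produces an entry here, because it is the only word in the raw text whose stem differs from the word as written.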
Document Metadata
On occasion, it is useful to create an index of the documents that the organization
manages. The index can be built on its own, or it can be built in conjunction with all
the other features available in textual ETL. There are business justifications for
both designs.
A document index typically includes data such as the following:
Date document created
Date document last accessed
Date document last updated
Document created by
Document length
Document title or name
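As a rough illustration, the index fields listed above could be gathered from file system metadata. The Python sketch below is one possible approach under stated assumptions, not how textual ETL builds its index: DocumentIndexEntry and index_document are hypothetical names, the file name and author are invented sample values, the author is supplied by the caller because the file system does not reliably record it, and the creation date is left empty on platforms whose stat results do not expose st_birthtime.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class DocumentIndexEntry:
    """One row of a document index, mirroring the fields listed above."""
    title: str                    # document title or name
    created_by: Optional[str]     # supplied by the caller (assumption)
    created: Optional[datetime]   # None where the platform does not report it
    last_accessed: datetime
    last_updated: datetime
    length_bytes: int             # document length, measured in bytes


def _to_utc(timestamp: float) -> datetime:
    """Convert a POSIX timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def index_document(path: Path, created_by: Optional[str] = None) -> DocumentIndexEntry:
    """Build an index entry from the file's stat() metadata."""
    st = path.stat()
    # st_birthtime exists only on some platforms (for example macOS and BSD).
    birth = getattr(st, "st_birthtime", None)
    return DocumentIndexEntry(
        title=path.name,
        created_by=created_by,
        created=_to_utc(birth) if birth is not None else None,
        last_accessed=_to_utc(st.st_atime),
        last_updated=_to_utc(st.st_mtime),
        length_bytes=st.st_size,
    )


# Usage: index a throwaway sample document.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "contract_001.txt"
    sample.write_text("she walked her dog to the park", encoding="utf-8")
    entry = index_document(sample, created_by="jsmith")

print(entry.title, entry.length_bytes)
```

A standalone index would store only these rows, while a combined design would store them alongside the other output of textual processing, such as the stem entries.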
Chapter 10.1: Nonrepetitive Data