data-architecture-a
Fig. 4.4.12 In order to do analytical processing, text needs to be placed in a database. It simply is no contest. Chapter 4.4: ...
Chapter 4.5 Contextualizing Repetitive Unstructured Data Abstract There are different definitions of big data. The definition us ...
Once the parsing takes place, the output can be placed in any one of many formats. One format the output data can be placed in i ...
Some of the possibilities for that recasting of the output data include placing the output data back into big data. Another poss ...
Chapter 4.6 Textual Disambiguation Abstract There are different definitions of big data. The definition used here is that big da ...
Fig. 4.6.1 Transformation of text into a standard database. Once raw text is transformed, it arrives in the analytic database in ...
Fig. 4.6.2 Tying the text to the database. Input Into Textual Disambiguation The input into textual disambiguation comes from ma ...
Fig. 4.6.3 Raw text, taxonomies and other parameters are input into textual ETL. Mapping In order to execute textual disambiguat ...
Fig. 4.6.4 Mapping. In almost every case, the mapping process is done in an iterative manner. The first mapping of a document is ...
Fig. 4.6.5 Iterative development. Input/Output The input to the process of textual disambiguation is electronic text. There are ...
Fig. 4.6.6 Input and output passing through textual ETL. The output from textual disambiguation is placed into a work table area ...
Fig. 4.6.7 A load utility. Document Fracturing/Named Value Processing There are many features to the actual processing done by t ...
disambiguation. Fig. 4.6.8 The two main processing components of textual ETL. Preprocessing a Document On occasion, it is necess ...
Fig. 4.6.9 Preprocessing text. E-mails—A Special Case E-mails are a special case of nonrepetitive unstructured data. E-mails are ...
Fig. 4.6.10 Filtering e-mails. Spreadsheets Another special case is spreadsheets. Spreadsheets are ubiquitous. Someti ...
Fig. 4.6.11 Reformatting spreadsheet data. Report Decompilation Most textual information is found in the form of a document. And ...
nonlinear format. Fig. 4.6.13 An entirely different form of textual disambiguation. Therefore, an entirely different form of tex ...
Fig. 4.6.14 shows that reports can be sent to spreadsheet report decompilation for reduction to a normalized format. Fig. 4.6.14 ...
Chapter 4.7 Taxonomies Abstract There are different definitions of big data. The definition used here is that big data encompass ...
Fig. 4.7.1 Taxonomies—one of the keys to unlocking unstructured data. Chapter 4.7: Taxonomies ...