10.2. Custom-Made Data Analysis Packages 599
inspect the entries in the datafile if needed. One can do this for a binary file as well
but only with the help of a computer code or a special program. For very large data
files, of the order of gigabytes or more, one should use a binary format. The format of
such a file depends on several factors, such as user accessibility and data handling
routines. There are standard binary formats, such as ZEBRA, for which extensive
read/write routines are available in different languages. Using such a standardized
format is advantageous since it simplifies the task of the developer as well as the
user.
An advantage of the binary format is that reading and writing it is much faster
than with the ASCII text format. The efficiency of reading ASCII data files
decreases with file size, and therefore for moderate-size datasets one should weigh the
pros and cons of each format before designing the package.
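The trade-off between the two formats can be tried out on a small scale. The following is a minimal sketch in Python, not a prescription: the function names and file names are hypothetical, and the binary layout (raw 8-byte floats in the machine's native byte order) is an assumption made for illustration, not a standard format such as ZEBRA.

```python
import os
import tempfile
from array import array


def write_ascii(path, values):
    """Write one value per line as human-readable text."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v!r}\n")  # repr() keeps full double precision


def read_ascii(path):
    """Parse the text file back into a list of floats."""
    with open(path) as f:
        return [float(line) for line in f]


def write_binary(path, values):
    """Write values as raw 8-byte floats (native byte order, not portable)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_binary(path):
    """Read raw 8-byte floats back without any text parsing."""
    data = array("d")
    count = os.path.getsize(path) // data.itemsize
    with open(path, "rb") as f:
        data.fromfile(f, count)
    return data.tolist()


def roundtrip_demo(n=1000):
    """Store the same data in both formats and compare the results."""
    values = [i * 0.1 for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        ascii_path = os.path.join(tmp, "data.txt")
        binary_path = os.path.join(tmp, "data.bin")
        write_ascii(ascii_path, values)
        write_binary(binary_path, values)
        return {
            "ascii_ok": read_ascii(ascii_path) == values,
            "binary_ok": read_binary(binary_path) == values,
            "ascii_bytes": os.path.getsize(ascii_path),
            "binary_bytes": os.path.getsize(binary_path),
        }


if __name__ == "__main__":
    print(roundtrip_demo())
```

The text file can be inspected in any editor, whereas the binary file can only be read by a program that knows its layout. Timing both readers on realistic file sizes is one way to weigh the formats for a given package.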
Coming back to the data analysis package, the developer should first understand
the format of the data file and devise efficient means of accessing the
data. For example, repeatedly rewinding ASCII text files slows the program down,
and therefore one is generally better off reading all the entries once and saving
them in arrays, even if they will only be used in a later part of the code.
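As a sketch of this read-once approach, the Python fragment below loads a two-column text stream into lists in a single pass and then reuses them for several computations without going back to the file. The column layout, the `#` comment convention, and all function names are assumptions made for illustration.

```python
import io


def load_columns(stream):
    """Read a whitespace-separated two-column text stream in one pass."""
    xs, ys = [], []
    for line in stream:
        fields = line.split()
        # Skip blank lines and comment lines starting with '#'.
        if len(fields) < 2 or line.lstrip().startswith("#"):
            continue
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance computed from the in-memory list."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# An in-memory stream stands in for a data file here.
sample = io.StringIO("# x y\n1.0 2.0\n2.0 4.0\n3.0 6.0\n")
xs, ys = load_columns(sample)

# Both quantities reuse the stored arrays; the stream is never rewound.
y_mean = mean(ys)
y_variance = variance(ys)
```

The cost of this choice is memory: for files too large to hold in arrays, reading in blocks would be the analogous strategy.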
10.2.B Data Analysis Routines
We have mentioned a few times before that it is always advantageous to use available
data analysis routines that are based on tested algorithms. In some cases, however,
one has no option but to design and develop the routines oneself.
The prerequisite to writing data analysis routines is the algorithm. The algorithm is
the heart of any data analysis system. If it has not been properly developed, the
code can produce false results. The program in this case might not produce any
errors, and if the values are not far off from the expectations, the user would never
know that the results were actually wrong. Hence the development and implementation
of an algorithm require extreme care and attention to detail.
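One well-known illustration of an algorithm that runs without errors yet returns wrong values is the one-pass variance formula, which subtracts two nearly equal large numbers. The Python sketch below is an example chosen here for illustration and does not come from the text; the function names and the data values are hypothetical.

```python
def naive_variance(values):
    """One-pass formula <x^2> - <x>^2; loses precision for large offsets."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean * mean


def two_pass_variance(values):
    """Subtract the mean first, then average the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Small spread (deviations of -6, -3, 3, 6) on top of a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

wrong = naive_variance(data)      # runs without any error message
right = two_pass_variance(data)   # 22.5, the exact population variance
```

Both functions run silently, and only a comparison against a known answer reveals that the first one is unreliable for such data. Checking a routine against cases with known results is therefore a reasonable part of its development.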
The first two steps of the algorithm development process are to understand the task
and choose a method to handle it. For example, for a certain analysis package
one might be interested in determining the correlation between two datasets belonging to
two variables. Since there are different ways to determine the correlation,
one should determine which method would best suit that particular situation. In
general there are different ways to solve a particular problem and the choice depends
on several factors, including
the application,
the required accuracy of the results,
the available computing power, and
the amount of available processing time.
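To make the correlation example concrete, the sketch below compares two common choices: the Pearson coefficient, which measures linear association, and the Spearman rank coefficient, which measures monotonic association. The text does not say which methods are meant, so this pairing is an assumption, as are the function names and the sample data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotonic but nonlinear relationship between the two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

r_linear = pearson(x, y)   # below 1 because the relation is not linear
r_rank = spearman(x, y)    # 1 because y always increases with x
```

The rank-based method costs an extra sort, which ties the choice back to the factors listed above: the application, the required accuracy, and the available computing resources.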
After deciding on the method to solve the problem, the step-by-step computational
tasks are defined in the so-called flow diagram. A flow diagram is simply a graphical
display of the computations that are to be performed. It helps the code developer
divide the program into smaller segments, which not only makes the program
more efficient but also makes it easier to modify and debug. Even though development