example of how information is irretrievably lost during the compilation
process. A decompiler would have to employ some kind of heuristic to decide
whether to declare a variable for x * 4 or simply duplicate that expression
wherever it is used.
Note that this is a style and readability issue; it doesn’t affect the
meaning of the code. Still, in very large functions that use highly complex
expressions, the choice can have a significant impact on the overall
readability of the generated code.
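One way such a heuristic could look is sketched below. This is only an illustration, not the method of any particular decompiler: the function name choose_temporaries, the two thresholds, and the tmpN naming scheme are all assumptions made for the example. It names an expression only when it is used more than once and contains at least one operator.

```python
from collections import Counter

def choose_temporaries(expr_uses, min_uses=2, min_operators=1):
    """Pick which repeated expressions get a named temporary.

    expr_uses holds one expression string per use site. Both thresholds
    are hypothetical knobs chosen for this sketch.
    """
    counts = Counter(expr_uses)
    temporaries = {}
    for expr, uses in counts.items():
        # Crude complexity measure: count arithmetic/bitwise operator characters.
        operators = sum(expr.count(op) for op in "+-*/&|^")
        if uses >= min_uses and operators >= min_operators:
            temporaries[expr] = f"tmp{len(temporaries)}"
    return temporaries

# "x * 4" is used three times, so it gets a name; the bare variable "y"
# is repeated too, but has no operators, so it is simply left in place.
uses = ["x * 4", "y", "x * 4", "x * 4", "y"]
temps = choose_temporaries(uses)
```

Raising min_uses or min_operators makes the output duplicate more expressions inline; lowering them produces more declared variables. Either way the meaning of the generated code is unchanged.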
Data Type Propagation
Data-flow analysis is also useful for data type propagation. Decompilers
receive type information from a variety of sources and type-analysis
techniques. Propagating that information as far as possible throughout the
program can greatly improve the readability of decompiled output. Let’s take
a powerful technique for extracting type information and demonstrate how it
can benefit from type propagation.
It is a well-known practice to gather data type information from library calls
and system calls [Guilfanov]. The idea is that if you can properly identify
calls to known functions such as system calls or runtime library calls, you can
propagate their data types throughout the program and greatly improve its
readability. First let’s consider the simple case of external calls made to
known system functions such as KERNEL32!CreateFileA. Upon encountering such a
call, a decompiler can benefit from the type information known about the call.
For example, for this particular API it is known that its return value is a
file handle and that the first parameter it receives is a pointer to an ASCII
file name.
This information can be propagated within the current procedure to
improve its readability because you now know that the register or storage
location from which the first parameter is taken contains a pointer to a file
name string. Depending on where this value comes from, you can enhance the
program’s type information. If for instance the value comes from a parameter
passed to the current procedure, you now know the type of this parameter,
and so on.
In a similar way, the value returned from this function can be tracked and
correctly typed throughout this procedure and beyond. If the return value is
used by the caller of the current procedure, you now know that the procedure
also returns a file handle type.
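The propagation within a single procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple-based statement format, the register and slot names (eax, ebx, arg0, retval), and the function propagate_types are invented for the example. The CreateFileA parameter and return types are taken from its documented Win32 prototype.

```python
# Type information known in advance for a system call.
KNOWN_SIGNATURES = {
    "CreateFileA": {
        "params": ["LPCSTR", "DWORD", "DWORD", "LPSECURITY_ATTRIBUTES",
                   "DWORD", "DWORD", "HANDLE"],
        "ret": "HANDLE",
    },
}

def propagate_types(statements):
    """Infer types for storage locations in one procedure.

    Statements use a hypothetical mini-IR:
      ("copy", dest, src)
      ("call", dest, callee, args)
    Integer arguments are constants and are skipped.
    """
    types = {}
    changed = True
    while changed:  # repeat until no new fact is learned
        changed = False
        for stmt in statements:
            pairs = []
            if stmt[0] == "call":
                _, dest, callee, args = stmt
                sig = KNOWN_SIGNATURES.get(callee)
                if sig is not None:
                    pairs = list(zip(args, sig["params"])) + [(dest, sig["ret"])]
            elif stmt[0] == "copy":
                _, dest, src = stmt
                # A copy means both sides share a type, whichever is known.
                if src in types:
                    pairs.append((dest, types[src]))
                if dest in types:
                    pairs.append((src, types[dest]))
            for name, ty in pairs:
                if isinstance(name, str) and name not in types:
                    types[name] = ty
                    changed = True
    return types

procedure = [
    ("copy", "eax", "arg0"),  # eax = first parameter of this procedure
    ("call", "ebx", "CreateFileA", ["eax", 0x80000000, 1, 0, 3, 0x80, 0]),
    ("copy", "retval", "ebx"),  # the procedure returns the handle
]
types = propagate_types(procedure)
```

Here the type of arg0 is recovered by walking backward from the call's first argument, and the type of retval by walking forward from the call's result. A real decompiler would also have to handle conflicting evidence and control flow, which this sketch ignores.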
This process is most effective when it is performed globally, on the entire
program. That’s because the decompiler can recursively propagate type
information throughout the program and thus significantly improve overall
output quality. Consider the call to CreateFileA from above. If you propagate all
type information deduced from this call to both callers and callees of the
current procedure, you wind up with quite a bit of additional type information
throughout the program.
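A global version of the same idea can be sketched as a worklist over value flows between procedures. Everything here other than CreateFileA is hypothetical: the procedures open_log, main, and write_header, the (procedure, slot) node naming, and the function propagate_globally exist only for illustration. The sketch also assumes that values connected by a flow always share one type and does not detect conflicts.

```python
from collections import defaultdict, deque

def propagate_globally(flows, seeds):
    """Spread seed types along value flows until nothing new is learned.

    flows: pairs of (procedure, slot) nodes that hold the same value.
    seeds: node -> type facts known up front (for example, from an API).
    """
    neighbors = defaultdict(set)
    for a, b in flows:
        neighbors[a].add(b)
        neighbors[b].add(a)
    types = dict(seeds)
    worklist = deque(seeds)
    while worklist:
        node = worklist.popleft()
        for other in neighbors[node]:
            if other not in types:
                types[other] = types[node]
                worklist.append(other)
    return types

seeds = {
    ("CreateFileA", "arg0"): "LPCSTR",
    ("CreateFileA", "ret"): "HANDLE",
}
flows = [
    (("open_log", "arg0"), ("CreateFileA", "arg0")),  # parameter passed through
    (("CreateFileA", "ret"), ("open_log", "ret")),    # handle returned to caller
    (("main", "path"), ("open_log", "arg0")),         # caller supplies the name
    (("open_log", "ret"), ("main", "h")),             # caller stores the handle
    (("open_log", "ret"), ("write_header", "arg0")),  # callee receives the handle
]
program_types = propagate_globally(flows, seeds)
```

Starting from just two facts about one API, the sketch assigns types to a local variable in the caller, to the current procedure's parameter and return value, and to a parameter of another callee.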