Excel 2019 Bible

(singke) #1

Chapter 25: Importing and Cleaning Data


2525


Duplicate values are determined by the value displayed in the cell—not necessarily the value stored in the cell. For
example, assume that two cells contain the same date. One of the dates is formatted to display as 5/15/2016, and
the other is formatted to display as May 15, 2016. When removing duplicates, Excel considers these dates to be dif-
ferent. Similarly, values that are formatted differently are considered to be different. For example, $1,209.32 is not
the same as 1209.32. Therefore, you might want to apply formatting to entire columns to ensure that duplicate rows
are not overlooked just because of a formatting difference.


Identifying duplicate rows
If you would like to identify duplicate rows so that you can examine them without auto-
matically deleting them, here’s another method. Unlike the technique described in the pre-
vious section, this method looks at actual values, not formatted values.

Create a formula to the right of your data that concatenates each of the cells to the left.
The formulas that follow assume that the data is in columns A:F.

Enter this formula into cell G2:

=CONCAT(A2:F2)

Add another formula in cell H2. This formula displays the number of times a value in col-
umn G occurs:
=COUNTIF(G:G,G2)

Copy these formulas down the column for each row of your data.

Column H displays the number of occurrences of that row. Unduplicated rows will display
1. Duplicated rows will display a number that corresponds to the number of times that row
appears.

Figure 25.10 shows a simple example. If you don’t care about a particular column, just omit
it from the formula in column G. For example, if you want to find duplicates regardless of
the Status column, change the formula in G2 to the following:
=CONCAT(A2:C2,E2:F2)

Splitting text
When importing data, you might find that multiple values are imported into a single col-
umn. Figure 25.11 shows an example of this type of import problem.

The data in Figure 25.11 is displayed in a fixed-width font (Courier New). With the default font, it was not apparent
that the data lined up nicely in fixed-width columns.

Free download pdf