User Question: How do I remove duplicates from a dataset?
Solution: For many tasks in Excel, you need to remove duplicates from a dataset. Suppose you want to find the unique list of customers in this range (Figure 1, below).
CAUTION: Remove Duplicates is a destructive function. It is best to make a copy of your data before you use the feature!
In Figure 2, below, a copy of the Customer column appears in column F. From the Data tab, choose Remove Duplicates. Click OK and Excel deletes any duplicated values. The remaining dataset is the unique list of customers.
Although the Remove Duplicates button is amazing, it is also potentially destructive. Perhaps you want to identify the duplicates so that you can decide how to combine information from the duplicates.
First, Select the range of values. On the Home tab, choose Conditional Formatting, Highlight Cells Rules, Duplicate Values. Excel will highlight the duplicates in red (Figure 3, below).
Marking the duplicates with conditional formatting is less destructive and gives you time to figure out how to combine duplicates.
NOTE: Excel marks all the duplicate values, which is not extremely useful. It would be preferable for Excel to leave the first occurrence of each duplicated value unhighlighted. That way, the highlighted cells could be deleted after review.
If you select Home, Conditional Formatting, Choose a Formula to Specify Which Cells to Highlight, then you could set up a formula like the one shown in Figure 4 (below) to mark only the true duplicate values.
Using the COUNTIF formula to check for values that occur more than once is better than the built-in conditional formatting for duplicates.
This article first appeared in the November 24, 2014, edition of the CFO LearningPro, Excel Edition, newsletter.
CFO contributor Bill Jelen. a.k.a. Mr. Excel, is an Excel MVP and the author of 35 books about Microsoft Excel.