Formula Friday #5
Formula Fridays are short reads, spotlighting a single formula that you can start using straight away.
Today we are going to discuss how to find and/or remove duplicates from your data set. This is formula Friday so there will be formulas involved. But before we get into that there is a “Remove Duplicates” button in excel.
So if all you need is to remove duplicates from a column of data. Highlight that whole column, then press remove duplicates. A popup window will display how many duplicates were removed. And how many unique values remain. If that’s all you needed, class dismissed.
But if you need a little more info, stick around.
Expanding on the “Remove Duplicates” button option
Above we covered removing duplicates from a single column. However, you may need to remove duplicates from multiple columns. To do this highlight all the columns you want to include in your selection, then press remove duplicates.
A popup window will open. Make sure you tick the box which says “My data has headers”. Adjust the columns to look at, if needed, by checking or unchecking them in the checkbox on the left. Then click okay.
You will get a popup window as before telling you how many duplicates were removed and how many unique values remain.
It’s important to note, that in order to remove duplicates from multiple columns there will need to be a match between all the columns selected. This is a good option to use to clean data which might export with random blank rows.
Here comes the formula bit
We’re going to use COUNT functions to help us identify duplicates. Specifically COUNTIF.
Let’s say you’re checking through your commitment and you want to make sure all drops for your flow lines have been added. You’ll first want to add in a column at the beginning of your data set. And then add in the below formula into cell A2.
In this example our line numbers are in column G. And the data we specifically want to look at is from row 2 to row 397.
Now we can filter to the line number we want to look at and check if the drops have been raised as expected.
The formula we have used in this example shows us the total number of occurrences for the line numbers. In the above example we can see that the line number has 3 total occurrences in the data set.
But what if we only wanted to count the first instance of the data set, or input the sequential count. So that each drop of our flow line is labelled correctly.
To add in the sequential count
You need to make one small adjustment to the above formula
Instead of locking in your full lookup range, as we did with the previous formula. You want the lookup range to expand as it goes down. This way the formula counts each occurrence individually. Rather than counting the total number of occurrences in the whole data set.
This is what it looks like for our flow line.
Please note if you are going to use this method for checking your commitment drops. Make sure the data is set in chronological order first. Otherwise you could be counting drop one as drop three or something like that.
What if you only want to count the first occurrence
To count only the first occurrence of an option you will use the same foundation formula as for sequential count. Then add an IF statement around it. So it would look something like this:
This formula is saying if the count of the line number is greater than 1 (>1) replace the count with a zero. But if the count is not greater than 1 show what the count is – it will be 1.
This formula is saying if the count of the line number if greater than 1 (>1) replace the count with a zero. But if the count is not greater than 1, input 1.
Both of these formulas will need to have the data chronologically ordered from them to show you the true first option.
Formatting duplicate/unique values
Maybe you are setting up your allocations for the week and you want to quickly be able to see if you have any repeat or split drops coming in. In this instance you don’t really need to remove or exclude duplicates, just highlight they are there.
You can use conditional formatting for this.
Highlight the column you want to format. On the Home Ribbon, select “Conditional Formatting”. Then select “Highlight cells rules” and finally select “Duplicate Values”.
But what if you work on a department that has lots of split drops. Like the jeans or t-shirts department. The majority of your intake is likely to be repeats. Which means allocations could run off of automatic replenishment. What would be more beneficial to see in this instance are any unique values. That way you see which allocations need to be set up from scratch.
This time when you open the conditional formatting menu select “New Rule…”
A pop up window will open, in the “Select rule type” section select “Format only unique or duplicate values”. In the “Edit the Rule Description” section you’ll want to change the drop down from duplicate to unique.
And there you have it all your unique (one time occurrence) options will be highlighted.
How did I do?
I’m pretty sure that was under five minutes, but you’ll have to tell me. Duplicates can be so frustrating at times, but once you know how to manipulate the data to your advantage. Duplicates don’t seem so bad after all. Don’t forget if you have any burning Excel questions you can always drop me a line.
Until next time,