Duplicate contacts can turn up in your data for many reasons, such as mistakes by users who don't realise they're creating a contact for someone who is already in CiviCRM, duplicates that aren't caught in the import process, and duplicate records created when people fill in forms about themselves on your site without realising they're already in your list of contacts (maybe with their names spelled differently or with a different email address).
Some examples of scenarios where duplicate contacts can enter your database:
CiviCRM is equipped with several features for dealing with duplicate contacts. Some try to prevent a duplicate contact from being created; others help you search for, identify and "merge" duplicate contacts already in your database.
These features are split between those that act automatically (known as "unsupervised") and those that rely on a user's decision (known as "supervised").
These features include:
Dedupe rules tell these features when CiviCRM should consider two contacts to be duplicates. For example, a rule could state that when two contacts have the same email address and first name, CiviCRM should treat them as duplicates.
Using the default rules to find and merge duplicate contacts
If you do not want to configure the dedupe rules at this stage, you can simply use the default rules to find duplicates in your database.
First, view the dedupe rules. Go to Contacts > Find and Merge Duplicate Contacts in the navigation menu. This displays the following screen:
From the screen, here's an example of a process to dedupe all individuals in your data:
Different rules are configured for each contact type (individuals, organizations, and households). A default supervised rule and a default unsupervised rule are set for each contact type. The default rules are used when CiviCRM invokes automatic checking, in ways we'll explain in detail shortly.
CiviCRM now includes three categories of dedupe rules:
Unsupervised: The 'Unsupervised' rule for each contact type is automatically used when new contacts are created through online registrations including Events, Membership, Contributions and Profile pages. It is also selected by default when you import contacts. Unsupervised rules are generally configured with a narrow definition of what constitutes a duplicate, so as to avoid a false match being merged accidentally.
Supervised: The 'Supervised' rule for each contact type is automatically used to check for possible duplicates when contacts are added or edited via the user interface. The UI will alert the user if a contact they are creating matches another contact according to the rule. The user can then decide whether to edit the existing contact or to continue to create a new contact. Supervised rules should be configured with a broader definition of what constitutes a duplicate, as the user can decide whether to act on the match or not.
General: You can only configure one 'Unsupervised' and one 'Supervised' rule for each contact type, but you can configure any number of additional 'General' rules to provide other criteria to scan for possible duplicates.
To determine whether two contacts are duplicates, CiviCRM checks up to five fields that you can specify. You can also set a length value which determines how many characters in the field should be compared. For example, if you set a length of 2 on the First Name field, a first name of "Mike" would match "Michael" and they would be recognized as duplicates, because the first 2 characters are the same. However, if you set the length to 3 instead, "Mike" would no longer match "Michael" and they would be accepted as different contacts. If the length value is left blank, the comparison is done on the entire field value.
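The effect of the length value can be illustrated with a minimal Python sketch. This is not CiviCRM's own code: the function name `fields_match` is hypothetical, and the exact, case-sensitive comparison used here is a simplifying assumption.

```python
from typing import Optional


def fields_match(a: str, b: str, length: Optional[int] = None) -> bool:
    """Compare two field values, optionally on only their first `length` characters.

    Hypothetical helper for illustration; a blank length (None) compares
    the entire value, as described above.
    """
    if length is not None:
        a, b = a[:length], b[:length]
    return a == b


print(fields_match("Mike", "Michael", length=2))  # True: "Mi" equals "Mi"
print(fields_match("Mike", "Michael", length=3))  # False: "Mik" differs from "Mic"
print(fields_match("Mike", "Michael"))            # False: whole values differ
```

A shorter length makes the rule more permissive, so it suits supervised checks where a person reviews each suggested match.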
Each field is also configured with a numeric weight that determines the relative importance of a match on that field. When a match is discovered on a field, that field's weight is added to the total weight for the rule. After each field is checked, if the total weight is equal to or greater than the numerical threshold set for the rule, the contacts being compared are flagged as suspected duplicates.
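The weight and threshold arithmetic can be sketched as follows. This is an illustration under stated assumptions rather than CiviCRM's implementation: the field names, the weights, the tuple layout of a rule, and the choice to treat an empty value as never matching are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (field name, length or None for the whole value, weight): a hypothetical layout
RuleField = Tuple[str, Optional[int], int]


def total_weight(a: Dict[str, str], b: Dict[str, str], rule_fields: List[RuleField]) -> int:
    """Add up the weights of every rule field on which the two contacts match."""
    total = 0
    for name, length, weight in rule_fields:
        va, vb = a.get(name, ""), b.get(name, "")
        if not va or not vb:
            continue  # assumption: an empty value never counts as a match
        if length is not None:
            va, vb = va[:length], vb[:length]
        if va == vb:
            total += weight
    return total


def is_suspected_duplicate(a: Dict[str, str], b: Dict[str, str],
                           rule_fields: List[RuleField], threshold: int) -> bool:
    """Flag the pair when the summed weight reaches the rule's threshold."""
    return total_weight(a, b, rule_fields) >= threshold


# Example rule: email (weight 10) and first name (weight 5), threshold 15,
# so both fields must match before the pair is flagged.
rule = [("email", None, 10), ("first_name", None, 5)]
alice = {"first_name": "Alice", "email": "alice@example.org"}
alice2 = {"first_name": "Alice", "email": "alice@example.org", "last_name": "Smith"}
alicia = {"first_name": "Alicia", "email": "alice@example.org"}

print(is_suspected_duplicate(alice, alice2, rule, threshold=15))  # True: 10 + 5 >= 15
print(is_suspected_duplicate(alice, alicia, rule, threshold=15))  # False: only 10
```

Lowering the threshold to 10 in this sketch would flag any pair sharing an email address, which shows how the same fields can produce a narrower or a broader rule.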
If you notice duplicate contacts within a set of search results you can quickly merge them directly from the search results instead of using the separate Find and Merge Duplicate Contacts process. This is a great way to clean up your database during your everyday workflow with minimal disruption.
Once a dedupe rule has been used, the option to "Batch Merge Duplicates" will be available beneath the list of duplicates discovered. This feature will merge all contacts matched under the given rule, provided there are no data conflicts. For instance, two individuals named "Michael Blake" may have been matched based on identical first and last name, with neither having an email address on record. If the data held on both contacts is either exactly the same, or one contact contains information the other does not (e.g. a work phone number, where the other has a mobile), the two will be merged. However, if both contacts have different home telephone numbers, the records will be skipped; the two contacts will not be merged.
Once a batch merge has been completed, you will be returned to the original list. If any of the records were skipped due to a data conflict like the example above, the message shown below will be displayed. To view an updated list of duplicate contacts (those that were not merged by the batch merge) you must click Refresh Duplicates. The page does not refresh automatically, because searching for duplicates in a very large database could cause a significant delay. You may then continue to assess and merge the remaining duplicates manually.
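The conflict check described above can be sketched in Python. This is a simplified model, not CiviCRM's merge code: contacts are represented as flat dictionaries, and the function name `try_merge` and the field names are hypothetical.

```python
from typing import Dict, Optional


def try_merge(a: Dict[str, str], b: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the merged contact, or None when the pair must be skipped.

    A conflict means both contacts hold a non-empty value for the same
    field and the two values differ.
    """
    merged = dict(a)
    for field, value in b.items():
        if not value:
            continue  # nothing to contribute for this field
        existing = merged.get(field)
        if not existing:
            merged[field] = value  # only one side has data: keep it
        elif existing != value:
            return None  # data conflict: leave both contacts untouched
    return merged


blake_a = {"first_name": "Michael", "last_name": "Blake", "work_phone": "555-0100"}
blake_b = {"first_name": "Michael", "last_name": "Blake", "mobile_phone": "555-0199"}
print(try_merge(blake_a, blake_b))  # merged: both phone numbers are kept

blake_c = {"first_name": "Michael", "last_name": "Blake", "home_phone": "555-0111"}
blake_d = {"first_name": "Michael", "last_name": "Blake", "home_phone": "555-0122"}
print(try_merge(blake_c, blake_d))  # None: home phone numbers differ, pair is skipped
```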
WARNING: before you begin to consider using batch dedupe, please take note of the following: