Written by Administrator
Wednesday, 08 March 2006
As described on Wikipedia:
In database maintenance, deduplication, which is sometimes loosely referred to as referential integrity and by various other names such as record linkage, is the task of removing duplicate data from a database. For example, similar rows containing "J.Smith" and "John Smith" may well refer to the same individual, in which case the rows may need to be merged. This is often achieved with merge/purge techniques based on the record linkage model of Fellegi and Sunter.
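As a rough illustration of the "J.Smith" versus "John Smith" case, the sketch below flags two name strings as candidates for merging when they share a surname and a first initial. It is only a crude matching key, not the Fellegi and Sunter model, and the function names `name_key` and `may_be_same_person` are hypothetical names chosen for this example.

```python
import re


def name_key(name):
    """Reduce a name to a (first initial, surname) pair for rough comparison."""
    # Split on whitespace and periods so "J.Smith" becomes ["j", "smith"].
    parts = [p for p in re.split(r"[\s.]+", name.strip().lower()) if p]
    if not parts:
        return ("", "")
    # A single token is treated as a surname with no known first initial.
    first_initial = parts[0][0] if len(parts) > 1 else ""
    return (first_initial, parts[-1])


def may_be_same_person(a, b):
    """Return True when two names are plausible duplicates worth reviewing."""
    initial_a, surname_a = name_key(a)
    initial_b, surname_b = name_key(b)
    if not surname_a or surname_a != surname_b:
        return False
    # A missing initial is compatible with any initial.
    return initial_a == initial_b or not initial_a or not initial_b
```

A match here only means the rows deserve a closer look: "Jane Smith" would also pair with "J.Smith", so a real merge decision would normally weigh further fields such as address or email.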
Deduplication is mostly used to compare email records within a single custodian, or across custodians, and remove duplicate records, which can yield substantial cost savings. Not only can email stores (PST files and Lotus Notes databases) be deduplicated internally, but standalone electronic documents (edocs) can be as well. Most often, electronic evidence vendors compute a hash value for each file and then compare that hash value against those of the other files in a database.
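The hash comparison described above can be sketched as follows. This is a minimal example under stated assumptions: it uses SHA-256 from Python's standard `hashlib` module (the hash function vendors actually use may differ), it treats files as duplicates only when their bytes are identical, and the function names `file_hash` and `deduplicate` are hypothetical.

```python
import hashlib


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths):
    """Split paths into unique files and (duplicate, original) pairs."""
    seen = {}  # hash value -> first path seen with that content
    unique, duplicates = [], []
    for path in paths:
        value = file_hash(path)
        if value in seen:
            duplicates.append((path, seen[value]))
        else:
            seen[value] = path
            unique.append(path)
    return unique, duplicates
```

Calling `deduplicate` on the files collected from one custodian removes duplicates within that custodian; passing the combined files of several custodians removes duplicates across them.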
Reducing the responsive data in this way, before any imaging begins, brings down the costs inherent in that activity.
Last Updated ( Wednesday, 08 March 2006 )