Data Scrubbing: A Central Tenet of Settlement Administration
Every business runs on data. Our organization is data-driven, and it shows in every layer of our work.
Whether it is sales numbers, inventory, employee information, or financial records, data is central to every aspect of settlement administration.
Your success depends on the quality of this data, and so does ours. This legal segment depends on the power of data and on a sophisticated, robust set of procedures that allow large volumes of data to be cleansed, parsed,
segmented, catalogued, refined, and reworked. These processes and workflows are collectively called ‘data scrubbing’.
Within this umbrella term sit several specific sub-concepts, including de-duplication, consolidation, validation, enrichment, formatting, extraction, standardization, normalization, and refactoring.
Kyle Heyne, Database Manager
Each process within a data scrub project is explained below.
Extraction
Often the first phase of a project, extraction is the process of pulling specific datasets out of corporate databases using SQL or other database connector technologies. We tailor this step to each case, since some situations call for a more consultative approach to data handling and formatting.
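As a rough sketch of what this step can look like, the snippet below pulls a narrow dataset with plain SQL through Python's built-in sqlite3 module; the database file, table, and column names are purely illustrative, and the actual connector depends on the corporate database involved.

import sqlite3

def extract_class_members(db_path):
    """Pull a specific dataset from a corporate database with plain SQL.

    The file, table, and column names here are illustrative only.
    """
    query = """
        SELECT member_id, full_name, mailing_address, claim_amount
        FROM class_members
        WHERE claim_status = 'filed'
    """
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row      # rows behave like dictionaries
        rows = conn.execute(query).fetchall()
    return [dict(row) for row in rows]

if __name__ == "__main__":
    for record in extract_class_members("settlement.db"):
        print(record)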
Cleansing
Cleansing typically means removing erroneous or invalid characters and correcting position errors and column misalignments. It can extend to replacing or changing values so that they match the scenarios the data is expected to describe.
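A minimal illustration of the idea, assuming delimited text records: the helpers below strip control characters, collapse stray whitespace, and pad or trim each row to a fixed column count to guard against misalignment.

import re

def clean_value(raw):
    """Drop control characters and collapse repeated whitespace."""
    no_control = re.sub(r"[\x00-\x1f\x7f]", "", raw)
    return re.sub(r"\s+", " ", no_control).strip()

def clean_row(row, expected_columns):
    """Clean each field, then pad or truncate the row so every record
    carries the same number of columns."""
    cleaned = [clean_value(field) for field in row]
    cleaned += [""] * (expected_columns - len(cleaned))
    return cleaned[:expected_columns]

print(clean_row(["  Jane\x00 Doe ", "123 Main   St", "stray"], expected_columns=2))
# ['Jane Doe', '123 Main St']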
Consolidation
Consolidation is taking data from different sources and formats and streamlining it into a single master file that is easy to read. This includes reshaping large, often unstructured or improperly formatted sets of information into an organized, legible arrangement of columns and rows.
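The sketch below shows one simple way to do this with Python's csv module; the source file names and master columns are hypothetical stand-ins for whatever the sources in a given matter actually contain.

import csv

# Illustrative source files and master layout.
SOURCES = ["claims_vendor_a.csv", "claims_vendor_b.csv"]
MASTER_COLUMNS = ["member_id", "full_name", "mailing_address", "claim_amount"]

def consolidate(sources, out_path):
    """Read each source, keep only the master columns (blank where a
    source lacks a field), and write one tidy master file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=MASTER_COLUMNS)
        writer.writeheader()
        for path in sources:
            with open(path, newline="") as src:
                for row in csv.DictReader(src):
                    writer.writerow({col: row.get(col, "") for col in MASTER_COLUMNS})

consolidate(SOURCES, "master_file.csv")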
Enrichment
Enrichment is adding data from an external source to enhance the quality of an existing dataset. One example is appending the location, contact information, historical records, and previous addresses for a list of class members.
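As a small, self-contained illustration, the snippet below merges hypothetical class-member records with an external lookup keyed on a member ID; in practice the external source would be a vendor feed or public-records data rather than an in-memory dictionary.

# Base records pulled from the case file (illustrative data).
class_members = [
    {"member_id": "1001", "full_name": "Jane Doe"},
    {"member_id": "1002", "full_name": "John Smith"},
]

# External reference data keyed by member_id.
external_lookup = {
    "1001": {"city": "Portland, OR", "phone": "503-555-0100",
             "previous_address": "45 Oak Ave"},
}

def enrich(members, lookup):
    """Attach location, contact, and address-history fields where the
    external source has a matching record; leave the rest untouched."""
    return [{**member, **lookup.get(member["member_id"], {})} for member in members]

for record in enrich(class_members, external_lookup):
    print(record)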
Validation
Validation is checking the legitimacy of the data. Often we receive a list of Social Security numbers that requires formal validation; we can compare those numbers against the Social Security Administration's records to confirm they are legitimate.
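The comparison against the SSA's records happens through its verification services rather than in code, but a structural pre-check like the one sketched below (format and published numbering rules only) often runs first; it is an illustration, not the full legitimacy check.

import re

# Structural rules only: nine digits, area not 000/666/900-999,
# group not 00, serial not 0000.
SSN_PATTERN = re.compile(r"^(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}$")

def is_plausible_ssn(value):
    """Return True when the value matches the published SSN format."""
    return bool(SSN_PATTERN.match(value.strip()))

for ssn in ["123-45-6789", "000-12-3456", "978-65-4321"]:
    print(ssn, is_plausible_ssn(ssn))
# 123-45-6789 True / 000-12-3456 False / 978-65-4321 False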
Normalization
Normalizing data typically means refactoring and organizing the information into preset, preconfigured buckets or categorical assignments. For example, normalizing vehicle data forces free-form values into a fixed series of data points such as Year, Make, Model, and Trim. These four data points are considered the fully normalized form of a vehicle's data.
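A toy version of that bucketing, assuming the raw value is a free-text string that starts with the model year: production rules would key off reference tables of known makes, models, and trims rather than a single pattern.

import re

def normalize_vehicle(raw):
    """Split a free-text vehicle description into the four preset buckets:
    Year, Make, Model, and Trim (everything after the model)."""
    match = re.match(r"^\s*(\d{4})\s+(\S+)\s+(\S+)\s*(.*)$", raw)
    if not match:
        return {"Year": "", "Make": "", "Model": "", "Trim": ""}
    year, make, model, trim = match.groups()
    return {"Year": year, "Make": make.title(),
            "Model": model.title(), "Trim": trim.strip().upper()}

print(normalize_vehicle("2019 honda accord ex-l"))
# {'Year': '2019', 'Make': 'Honda', 'Model': 'Accord', 'Trim': 'EX-L'}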