Data Quality and Comparison Components

Parse and analyze data and addresses, connect to address verification services, and detect differences and duplicates in datasets. The following are the main components available within the SSIS Productivity Pack, along with their help manuals:

  • Address Parser
    • An SSIS transformation component that can be used to standardize and parse the input address data.
  • Address Verification Connection Manager
    • An SSIS Connection Manager used to establish a connection to an address verification service. This connection manager currently supports EasyPost and SmartyStreets verification services.
  • Address Verification
    • An SSIS transformation component used to verify address data from an input. The component verifies the accuracy of address data and provides corrections or adds missing data for addresses that are not accurate.
  • Data Profiler
    • An SSIS data flow component that can be used to analyze data and to compare rows from upstream data sources. Rows from each input are passed through the component to the corresponding output. Once all rows have been processed, the component writes a single row containing the results of the data analysis to the "DataProfiler Output".
  • Diff Detector
    • Enables the comparison of two sources: a primary and a secondary source. Rows from the inputs are matched using a business key (simple or compound key) and compared to each other to determine if the rows are unchanged, changed, deleted from the primary data source, or added to the secondary data source.
  • Duplicate Detector
    • Compares rows within a data source to identify duplicate rows based on an approximate (fuzzy) or exact match. The component creates two outputs: Unique Rows and Duplicate Rows. The Duplicate Rows output has four additional fields: Richness Score, Richness Rank, Similarity Score, and GroupID.
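
To make the matching ideas behind the Diff Detector and Duplicate Detector more concrete, the following Python sketch shows one way such comparisons could work outside of SSIS. It is an illustration only and not the components' actual implementation: the function names (diff_rows, split_duplicates), the output labels (primary_only, secondary_only), the 0.85 threshold, and the use of Python's difflib for fuzzy matching are all assumptions made for this example. How the two "only" labels correspond to the component's added and deleted outputs is not stated here, and the richness and grouping fields are left out.

```python
from difflib import SequenceMatcher


def diff_rows(primary, secondary, key_fields):
    """Classify rows from two sources by matching them on a business key.

    Hypothetical helper: labels and structure are assumptions for illustration.
    """
    def key_of(row):
        # A compound key is a tuple of several field values.
        return tuple(row[f] for f in key_fields)

    primary_by_key = {key_of(r): r for r in primary}
    secondary_by_key = {key_of(r): r for r in secondary}

    result = {"unchanged": [], "changed": [],
              "primary_only": [], "secondary_only": []}
    for key, row in primary_by_key.items():
        other = secondary_by_key.get(key)
        if other is None:
            result["primary_only"].append(key)    # no match in secondary
        elif other == row:
            result["unchanged"].append(key)       # same key, same values
        else:
            result["changed"].append(key)         # same key, different values
    for key in secondary_by_key:
        if key not in primary_by_key:
            result["secondary_only"].append(key)  # no match in primary
    return result


def split_duplicates(rows, field, threshold=0.85):
    """Split rows into unique and duplicate lists using a fuzzy string match.

    Hypothetical helper: a row counts as a duplicate when its similarity to
    any previously kept row reaches the (assumed) threshold.
    """
    unique, duplicates = [], []
    for row in rows:
        best = 0.0
        for kept in unique:
            score = SequenceMatcher(
                None, row[field].lower(), kept[field].lower()
            ).ratio()
            best = max(best, score)
        if best >= threshold:
            duplicates.append({**row, "similarity": round(best, 3)})
        else:
            unique.append(row)
    return unique, duplicates
```

For example, calling diff_rows(primary, secondary, ["id"]) with two lists of dictionaries groups the key tuples into the four categories, and split_duplicates(rows, "name") returns the rows kept as unique alongside the rows flagged as near matches. Setting the threshold to 1.0 reduces the fuzzy match to a case-insensitive exact match.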

Video Resources

YouTube Video - Getting started with SSIS Productivity Pack - Data Quality and Comparison