This is a descriptive entry that details the behavior of RapidXREF. What it is, how it works. Keep in mind this function is ONLY available in RapidACE v2! RapidXREF (as described before) is a function that provides cross-reference information.
Overview of RapidXREF
This particular piece reads in: multiple source data models, 1 target data model, and a sparsely populated cross-reference spreadsheet (if you have it), and produces or builds a brand new fully-qualified cross-reference that maps every source table to every target table in a matter of seconds.
By sparsely populated, what I mean is: you can feed it a) no xref at all, and it will do a best guess, b) business keys to HUBS only and it will figure out the rest with 89% to 98% accuracy. This makes the job of building these dreaded cross-reference spread-sheets that much easier!
Yes, it reads xlsx formats, and writes xls formats directly.
In essence, here is an image that might help with the visualization of its functionality:
Required Inputs:
- Source Data Model
- Target Data Model
- Job Files / Configuration Files
- ** Optional Input: Sparsely defined Cross Reference
The source and target data model no longer require any sort of primary or foreign key declaration for this process. Nor do they have to be any particular data model type. They can be star schemas, staging areas, 3rd normal form-ish models, cobol copybooks, data vaults, and any other type of model you wish to import.
The cross-reference (if provided) gives the engine a guide to follow for some of the more difficult field lineage pieces that it might not otherwise guess. Remember, the cross reference or cross-walk is *required* by 98% of all data integration projects that exist. The cross-reference generally (usually) serves as documentation for the build out of the ETL code that is to follow.
The purpose of this function:
As stated previously, the purpose of this function is to generate the cross-reference document. At first glance you might think – well that’s easy, just match field to field by name, and you’ve got it.
Unfortunately it isn’t quite that simple in the real world. The new v2 engine takes many things in to account for matching including (but not limited to)
- Field names themselves
- abbreviations (short and long names)
- naming conventions (prefixes and suffixes)
- camel case, underscores, spaces, and dashes
- and a basic ontology of business terminology (including groupings of terms)
It uses an advanced algorithm that provides a lot more intelligence around the way it scores matches, and potential matches – to come up with matched tables (from source to target), along with matched fields. The algorithm goes way beyond just simple field name matching.
The algorithm is the secret sauce here, and the best way to understand it is to watch a demonstration of it, which I will post soon in video format.
What does it take to BUILD the XREF by hand?
Well, the typical process to build a cross-reference by hand for the first pass, can easily take a single person 2 to 3 months for 100 to 250 tables. Then, there’s bound to be errors that need removing, so add another 3 or 4 weeks to that for a “first-cut” release.
With RapidACE V2, it would take maybe 3 days to a week to build the up-front configuration files for import, and setup some basic terminology. Then it would take all of 10 minutes for the tool to generate the first pass output.
In other words, you can easily shave off months from your project just by using this kind of a tool.
Do I need to modify what’s produced?
Yes. RapidACE v2 is over-zealous, and sometimes matches things that “are matches” but are fields that you don’t want matched. For instance, suppose you have 3 fields in every source and every target like: “update user, update timestamp, update code” – and these are system fields. This function would match all 3 of these fields in all the source tables to all 3 of these fields in each of the target tables.
But there’s a configuration file for that – you can enter the “fields to ignore” during matching, and the issue is fixed.
However, what the tool produces is either 80% to 90% of what you want, or it’s 120% of what you want, and you need to augment it either way. Adjust it for simple business rules, delete the cross-mappings that you don’t want, and you can go on your way.
So what happens if we change the Data Models?
EASY! You push it back through! You can actually take the modified or generated cross-reference, and set it as the sparse input (even if it’s already fully qualified), and ask the engine to synchronize to the source model, or synchronize to the target model.
Again, saving you hours and hours of trying to capture the changes by hand, while retaining the modifications you’ve already put in to the cross-reference.
What are the benefits again?
Well, generally in this era of data warehousing and data integration, you probably want or need ETL code generation. Right?
In order to generate ETL code (or SQL code), you must have a source to target mapping done somewhere.
This function of the RapidACE v2 tool does this for you. Produces the source-to-target mapping in Excel! For you to modify by hand, manually manipulate, then push in to your code-generation system downstream.
The benefits include:
- Reduced errors in ETL code generation
- Reduced time-line for implementation of data integration projects
- Increased productivity of your EDW / Integration Team
- Generated Documentation that can be Manually adjusted (as necessary by business)
- Automated adjustment / synchronization of the cross-references when the data models change!!
Again, the purpose of this function is to assist with the ETL code generation, and to provide a huge time savings to your data integration team.
So how do I get the tool? work with the tool?
Simple, you hire me on a contract for a fixed price bid you can get a one-time generation or synchronization or both.
Contact me for more information. (use the contact us form here on the site)
