#datavault #RapidACE – v2 – About RapidXREF

Retweet

This is a descriptive entry that details the behavior of RapidXREF.  What it is, how it works.  Keep in mind this function is ONLY available in RapidACE v2!  RapidXREF (as described before) is a function that provides cross-reference information.

Overview of RapidXREF

This particular piece reads in: multiple source data models, 1 target data model, and a sparsely populated cross-reference spreadsheet (if you have it), and produces or builds a brand new fully-qualified cross-reference that maps every source table to every target table in a matter of seconds.

By sparsely populated, what I mean is: you can feed it a) no xref at all, and it will do a best guess, b) business keys to HUBS only and it will figure out the rest with 89% to 98% accuracy.  This makes the job of building these dreaded cross-reference spread-sheets that much easier!

Yes, it reads xlsx formats, and writes xls formats directly.

In essence, here is an image that might help with the visualization of its functionality:

Required Inputs:

  • Source Data Model
  • Target Data Model
  • Job Files / Configuration Files
  • ** Optional Input: Sparsely defined Cross Reference

The source and target data model no longer require any sort of primary or foreign key declaration for this process.  Nor do they have to be any particular data model type.  They can be star schemas, staging areas, 3rd normal form-ish models, cobol copybooks, data vaults, and any other type of model you wish to import.

The cross-reference (if provided) gives the engine a guide to follow for some of the more difficult field lineage pieces that it might not otherwise guess. Remember, the cross reference or cross-walk is *required* by 98% of all data integration projects that exist.  The cross-reference generally (usually) serves as documentation for the build out of the ETL code that is to follow.

The purpose of this function:

As stated previously, the purpose of this function is to generate the cross-reference document.  At first glance you might think – well that’s easy, just match field to field by name, and you’ve got it.

Unfortunately it isn’t quite that simple in the real world.  The new v2 engine takes many things in to account for matching including (but not limited to)

  • Field names themselves
  • abbreviations (short and long names)
  • naming conventions (prefixes and suffixes)
  • camel case, underscores, spaces, and dashes
  • and a basic ontology of business terminology (including groupings of terms)

It uses an advanced algorithm that provides a lot more intelligence around the way it scores matches, and potential matches – to come up with matched tables (from source to target), along with matched fields.  The algorithm goes way beyond just simple field name matching.

The algorithm is the secret sauce here, and the best way to understand it is to watch a demonstration of it, which I will post soon in video format.

What does it take to BUILD the XREF by hand?

Well, the typical process to build a cross-reference by hand for the first pass, can easily take a single person 2 to 3 months for 100 to 250 tables.  Then, there’s bound to be errors that need removing, so add another 3 or 4 weeks to that for a “first-cut” release.

With RapidACE V2, it would take maybe 3 days to a week to build the up-front configuration files for import, and setup some basic terminology.  Then it would take all of 10 minutes for the tool to generate the first pass output.

In other words, you can easily shave off months from your project just by using this kind of a tool.

Do I need to modify what’s produced?

Yes. RapidACE v2 is over-zealous, and sometimes matches things that “are matches” but are fields that you don’t want matched.  For instance, suppose you have 3 fields in every source and every target like: “update user, update timestamp, update code” – and these are system fields.  This function would match all 3 of these fields in all the source tables to all 3 of these fields in each of the target tables.

But there’s a configuration file for that – you can enter the “fields to ignore” during matching, and the issue is fixed.

However, what the tool produces is either 80% to 90% of what you want, or it’s 120% of what you want, and you need to augment it either way.  Adjust it for simple business rules, delete the cross-mappings that you don’t want, and you can go on your way.

So what happens if we change the Data Models?

EASY!  You push it back through!  You can actually take the modified or generated cross-reference, and set it as the sparse input (even if it’s already fully qualified), and ask the engine to synchronize to the source model, or synchronize to the target model.

Again, saving you hours and hours of trying to capture the changes by hand, while retaining the modifications you’ve already put in to the cross-reference.

What are the benefits again?

Well, generally in this era of data warehousing and data integration, you probably want or need ETL code generation. Right?

In order to generate ETL code (or SQL code), you must have a source to target mapping done somewhere.

This function of the RapidACE v2 tool does this for you.  Produces the source-to-target mapping in Excel!  For you to modify by hand, manually manipulate, then push in to your code-generation system downstream.

The benefits include:

  • Reduced errors in ETL code generation
  • Reduced time-line for implementation of data integration projects
  • Increased productivity of your EDW / Integration Team
  • Generated Documentation that can be Manually adjusted (as necessary by business)
  • Automated adjustment / synchronization of the cross-references when the data models change!!

Again, the purpose of this function is to assist with the ETL code generation, and to provide a huge time savings to your data integration team.

So how do I get the tool? work with the tool?

Simple, you hire me on a contract for a fixed price bid you can get a one-time generation or synchronization or both.

Contact me for more information. (use the contact us form here on the site)

#rapidace v2 now auto-builds cross-reference data

Retweet

Yes, I’ve managed to code a very sophisticated structural matching algorithm that uses ontologies for input, a source and target DDL set of structures, and even (if you already have it) a partial cross-reference document.

RapidACE v2 is very smart in this manner.  I think IBM has a tool they sell for $150k or more that “builds cross-reference documents”.  But RapidACE v2 goes a few steps further.  The matching algorithm is unique, special – and will use a number of different components to determine a matching score!

Yes, it goes way beyond v1 – RapidACE v1 stuck to primary and foreign key declarations, RapidACE v2 actually uses sophisticated text parsing algorithms, along with naming conventions, and defined ontologies as guides.  Of course, for the really tough problems where no possible match can be found – you can also pre-define a sparse XREF (source to target) for that specific match; and voila! RapidACE v2 will take that in to account.

It’s a very cool process, the end result is a matching score – you can change the output threshold up or down (out of 100%) to determine what gets produced to the Excel cross-reference.  That’s right, RapidACE v2 actually READS and WRITES directly to Excel spreadsheets.

The other cool thing is: once you have “generated” your cross-reference, you can edit it in Excel – tighten it up, change it, modify it, then push it back in to RapidACE v2 in order to generate accurate ETL code for SSIS, SQLServer, or Informatica v9.x

See you on the flip side,
Dan Linstedt

Code for v1 is uploaded to SVN

Retweet

Hello everyone, I’ve uploaded  the source code for V1 to: http://sourceforge.net/projects/rapidaceos/  It’s been setup in the SVN BROWSE section rather than the FILES list, this way I can update the code as necessary.  The BSD license is embedded in each JAVA file (except for those that I don’t own).  I’ve added the Wilmascope files to the build because of the fact I added a “freeze graph” command to stop the 3D graph from moving.  Other than that, the Wilmascope code remains unchanged.

Soon, I will have a downloadable copy available for you here on this web site (pre-compiled) with the QuickTable code extensions, but it will be made available for those that register (for free) with this site.

I used Eclipse HELIOS release with all the latest updates.  I’ve also installed Java3D 64 bit, and have a more than capable machine running win 7 64 bit, Java 64 bit, and of course compiling everything on 64 bit.

Cheers,

Dan Linstedt

RapidACE SourceForge Project

Retweet

The RapidACE SourceForge project is at: https://sourceforge.net/projects/rapidaceos/  I’ve not uploaded any files yet as I’m still working on incorporating the Open-Source license agreement to the code base.  Soon, very soon, I will upload Version 1.6 source (with a GUI), and v2 (command Line Only).  A short review of the versions is below:

Version 1.6 – Production Edition

This version is a working edition containing the following features:

  • GUI
  • Configuration Files
  • DDL Parsing
  • Merge & Consolidate
  • 2D and 3D visualization (* note: 3D requires a decent Graphics Processing Chip (GPU))  Also requires VRML processing capabilities
  • Forward Engineering: From 3NF to Stage, To Data Vault, To Star Schema, To Master Data Model, to Exploration Marts
  • Generation of Basic Cross-Reference (Source To Target Mapping)
  • Conversion of DDL from one database type to another.
  • IMPORT of Cobol Copybooks

Outstanding / known issues:

  • Problems with the DDL parser “hanging” with quotes, and complex MySQL DDL, especially comment blocks.  If you want the DDL to parse properly, try to clean it up first.  Remove all comment blocks, and remove all “quoted” field names, and the like.  For SQLServer DDL – remove “brackets and spaces” from within field and table names.  Remove all Stored Procedure and Function “code” from the DDL prior to importing.  And most importantly, BREAK ANY CYCLICAL FOREIGN KEY REFERENCES…  If Users table references Employees and Employees References Users table, then this is a cyclical foreign key.  Remove one of the FK declarations.  NOTE: change your DDL text file to ASCII.  This parser does NOT understand Unicode Text Files.
  • Code Generation Templates – missing, and/or too difficult to complete.  The cross-reference internal representation is a bit messed up.  This is why V1 code generation never really “got off the ground”
  • Forward Engineering Control – there are a lot of things the “3NF to Data Vault” forward generation engine does wrong.  Use the output only as a guide for future generation.  NOTE: if YOU clean up and merge your source DDL First, and provide basic Foreign Keys (across disparate data models) before reading them in to RapidACE v1, then you can end up with a fairly decent result.  Otherwise, forward engineering can sometimes provide you with “garbage”.
  • NOTE: the Merging and Consolidation engine are based on TABLE and FIELD NAMING CONVENTIONS, and a bit of matching logic – statistically driven.  SO, if your data models are disparate, and the naming conventions don’t match, then the models will NOT be integrated without you providing a foreign key designation (in the DDL only) to tie the models together.
  • There are about 125 known bugs and issues with this current version, too many to mention here.
  • There are some code pieces that I was playing with around RDBMS (SQLJDBC) access, but those pieces were never fully implemented.  V1 uses 100% in-RAM processing.  Therefore, it is recommended that you “RUN” RapidACE v1 with as much RAM as you can give it (heap & stack settings).  I’ve been successful running a single DDL construct with over 1000 source tables through the entire process in a 32 bit Java environment.  That means it’s worked for over 6000 tables (all in RAM).

Bottom Line?  V1.6 was good, but the Core needed to be re-written from the ground up.

JAR dependencies:

  • FreeMarker
  • FMPP (freeMarker Pre Processor)
  • resources.jar
  • rt.jar
  • jsse.jar
  • jce.jar
  • charsets.jar
  • dnsns.jar
  • dns_sd.jar
  • j3dcore.jar
  • j3dutils.jar
  • localdata.jar
  • sunjce_provider.jar
  • vecmath.jar
  • Jama-1.0.2.jar
  • concurrent-1.3.4.jar
  • jung-3d-2.0-beta1.jar
  • jung-algorithms-2.0-beta1.jar
  • jung-io-2.0-beta1.jar
  • jung-jai-2.0.beta1.jar
  • jung-graph-impl-2.0-beta1.jar
  • jung-visualtization-2.0-beta1.jar
  • resolver.jar
  • fmpp.jar
  • freemarker.jar
  • bsh.jar
  • oro.jar
  • j3d-org-java3d-all.jar
  • j3d-vrml97.jar
  • collections-generic-4.01.jar
  • commons-collections-3.2.jar
  • commons-configuration-1.4.jar
  • xml-apis.jar
  • colt-1.2.0jar
  • sqljdbc.jar
  • glutil.jar
  • commons-codec-1.3.jar
  • commons-httpclient-3.1.jar
  • commons-logging-1.1.1.jar

Some V1 Screen Shots:


V2 – Alpha Edition

This version is a COMMAND LINE driven engine, it is a complete rewrite of the entire core engine.  It started with the parser, and the code-generation system.  NO CODE FROM V1 is USED IN V2!!  Other than the “libraries” that are pulled in.

  • Command Line Driven
  • XML Configuration Files (literally over 150+ options to change / manage / override)
  • Informatica v9 ETL – XML code Generation, including workflows, sessions, worklets, mappings, etc..  Source to Stage, and Stage to Data Vault – these are FreeMarker Templates
  • Test Data Generation – both Informatica Mappings (that generate Test Data using the Java Object), and Raw Test Data Files themselves.  Again, these are FreeMarker Templates
  • DDL Parser – faster, 100% re-written, leaner, meaner, and no longer “hangs” on dirty DDL.
  • Forward Engineering: Currently, 3NF to Stage Model is the only available component.  I am working on 3NF to Data Vault engineering as we speak.

Note: there is a GUI, but it’s not been fully implemented nor built.   I would NOT suggest turning it on.  I will turn it on when I’m ready to do so.

** NEW FEATURES TO COME IN V2:  Model merging and consolidation WILL eventually incorporate Taxonomies.  Hopefully when I get to that point, I will integrate Ontology engines and RDF/OWL import / export.  I will also be using Taxonomies/Ontologies as a guide to the model merging and consolidation efforts.  This will be a crowning jewel of the feature sets when I get there.   – SUPPORT ME and WHO KNOWS…  If you support this project, I might just get more features done.

DO NOT mix V1 with V2 (settings, files, etc…) the ONLY things you can mix are the DDL files.  The Two engines are completely different builds and 100% different source code.

By the way.  If you are NOT familiar with FreeMarker Scripting Language, I suggest you study it.  It’s EXTREMELY POWERFUL, and yes, the templates I’m producing (for the most part) will be made available to the public domain.

If you’d like to hire my services, I will gladly create custom templates for you – these can either be contributed to the open-source project, or owned by you as Intellectual Property.  Contact me today for more information.

RapidACE base Generic Model

Retweet

RapidACE software uses MVC (loosely).   In that, I have a generic “data holder” for all data elements.  This data containter is called ITEM.  It is not inherited, nor abstracted.  All properties are assigned dynamically, and therefore the code abstraction “of what you can do with the data” is endless, without modification to the core set.

Furthermore, any “abstraction” happens in the controller layers – again, without Inheritance of the physical object.  This makes RapidACE core engine infinitely scalable, and easy to alter/update.  The core element or data holder is exposed here.

It’s called: “ITEM”.

[java]

/**
*
*/
package model.generic;

import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import freemarker.template.SimpleNumber;
import freemarker.template.SimpleCollection;
import freemarker.template.SimpleHash;

/**
* @author Dan Linstedt
* (C) Dan Linstedt, 2011-2012, All Rights Reserved
* Written for RapidACE
*
* Purpose: to hold ALL data elements, and to be a recursive container
* for additional data elements.  This way, an Item can represent a FIELD
* or a CONSTRAINT, or a TABLE with fields, or anything else I want it
* to be.  All “attributes” are programmatically added as objects,
* so they are NEVER private, but can be manipulated at run-time.
*
*/
public class Item extends LinkedHashMap<String,Object> {

/**
*
*/
private static final long serialVersionUID = 1597929376529169667L;

private Item parentMap;
private String XMLTag = “”;
private boolean vHasChildren = false;

// hold a COPY of array pointers to children objects
// this allows freemarker access direct through recursive hash
// and allows Java easy “function” reference to children
private LinkedHashMap<String,Item> private_children;

// hold a COPY of array pointers to PROPERTIES types objects
// for the same reasons as CHILDREN are private
//private LinkedHashMap<String,Object> private_properties;

// these are variables that are INTERNAL ONLY
// they are NOT added to the children, therefore are
// hidden from the view of FreeMarker
private LinkedHashMap<String,Item> internals;

/**
*
*/
public Item() {
internalInit();
}

/**
* @param initialCapacity
*/
public Item(int initialCapacity) {
super(initialCapacity);
internalInit();
}

/**
* @param m
*/
@SuppressWarnings({ “unchecked”, “rawtypes” })
public Item(Map m) {
super(m);
internalInit();
}

/**
* @param initialCapacity
* @param loadFactor
*/
public Item(int initialCapacity, float loadFactor) {
super(initialCapacity, loadFactor);
internalInit();
}

/**
* @param initialCapacity
* @param loadFactor
* @param accessOrder
*/
public Item(int initialCapacity, float loadFactor, boolean accessOrder) {
super(initialCapacity, loadFactor, accessOrder);
internalInit();
}

private void internalInit() {
private_children   = new LinkedHashMap<String,Item>();
//        private_properties = new LinkedHashMap<String,Object>();
internals          = new LinkedHashMap<String,Item>();
hasChildren(false);
}

public void hasChildren(boolean setchildren) {
vHasChildren = setchildren;
}

public void XMLType(String xType) {
XMLTag = xType;
}

public String XMLType() {
return(XMLTag);
}

public void name(String name) {
this.put(“name”, name);
}

public String name() {
return(this.getProperty(“name”));
}

public void destroy() {
//        destroyLinkedMap(private_properties);
destroyLinkedMap(private_children);

ArrayList<Object> vals = new ArrayList<Object>();
for(Object child:this.values())
vals.add(child);

for(Object child:vals) {
this.remove(child);
if (child instanceof Item)
((Item)child).destroy();
}
vals.clear();
//        private_properties.clear();
private_children.clear();    // clear the COPY of the objects
internals.clear();            // YOU are responsible for destroying other objects
this.clear();
}

/**
* DEEP COPY OF MYSELF TO ANOTHER OBJECT
* @return
*/
public Item copy() {
Item newCopy = new Item();
for(String prop:this.keySet()) {
Object val = this.get(prop);
if (val instanceof Item)
newCopy.put(prop, ((Item) val).copy());
else
newCopy.put(prop, val);
}

return(newCopy);
}

public void setCDATA(String cdata) {
this.put(“#CDATA”, cdata);
}

public boolean ignoreInLoop(String key) {
return(key.toUpperCase().matches(“XMLATTRS|#CDATA|HASDUPLICATENAMES”));
}

@SuppressWarnings({ “unchecked”, “rawtypes” })
private void destroyLinkedMap(LinkedHashMap myMap) {
ArrayList<String> keyList = new ArrayList<String>();
Set<String> props = myMap.keySet();
for(String key:props)
keyList.add(key);

for(String key:keyList) {
Object val = (Object)myMap.get(key);
myMap.remove(key);
if (val instanceof Item)
((Item)val).destroy();
}
}

public void put(String key,String value) {
//        private_properties.put(key, value);
super.put(key, value);
}

public void put(String key, Integer value) {
SimpleNumber myval = new SimpleNumber(value);
//        private_properties.put(key, myval);
super.put(key, myval);
}

public void put(String key, Float value) {
SimpleNumber myval = new SimpleNumber(value);
//        private_properties.put(key, myval);
super.put(key, myval);
}

public void put(String key, Double value) {
SimpleNumber myval = new SimpleNumber(value);
//        private_properties.put(key, myval);
super.put(key, myval);
}

public void put(String key, Long value) {
SimpleNumber myval = new SimpleNumber(value);
//        private_properties.put(key, myval);
super.put(key, myval);
}

public void put(String key, boolean value) {
Boolean myval = new Boolean(value);
//        private_properties.put(key, myval);
super.put(key, myval);
}

public void put(String key, List<Object> value) {
super.put(key, new SimpleCollection(value));
}

public void put(String key, HashMap<Object,Object> value) {
super.put(key, new SimpleHash(value));
}

public void put(String key, Item child) {
super.put(key, child);
private_children.put(key, child);
child.setParent(this);
hasChildren(true);
}

public void putDontChangeParent(String key, Item child) {
super.put(key, child);
private_children.put(key, child);
hasChildren(true);
}

public void setParent(Item parent) {
parentMap = parent;
}

public Item createChild(String key) {
if (this.childContainsKey(key))
return(this.getChild(key));
Item child = new Item();
this.put(key,child);
child.XMLType(key);
return(child);
}

public Item createChildMap(String childName) {
Item newItem = createChild(childName);
newItem.XMLType(childName);
return(newItem);
}

public String getAttribute(String key) {
Item child = this.getChild(“XMLAttrs”);
if (child==null) return(“”);
return(child.getProperty(key));
}

/* ————————————–
*
* These are routines used for interfacing with this
* hash array
*
*/

public void putProperty(String key, String value) {
this.put(key, value);
}

public String removeProperty(String key) {
return((String)this.remove(key));
}

public String getProperty(String key) {
Object itm = this.get(key);
if (itm instanceof String)
return((String)itm);
if (itm instanceof SimpleNumber)
return(((SimpleNumber)itm).toString());
return(“”);
}

public String getString(String key) {
return(getProperty(key));
}

public Integer getInteger(String key) {
Object itm = this.get(key);
try {
Integer newInt = Integer.parseInt((String)itm.toString());
return(newInt);
} catch (Exception e) {
return(0);
}
}

public Item getParent() {
return(parentMap);
}

public void putChild(String key, Item value) {
this.put(key, value);
}

public boolean childContainsKey(String key) {
return(private_children.containsKey(key));
}

public Item getChild(String key) {
return(private_children.get(key));
}

public Item removeChild(String key) {
Item itm = private_children.remove(key.toUpperCase());
if (private_children.size()==0)
this.hasChildren(false);
this.remove(key.toUpperCase());
return(itm);
}

public LinkedHashMap<String,Item> children() {
return(private_children);
}

public boolean hasProperties() {
if (this.size()>=1)
return(true);
return(false);
}

public boolean hasChildren() {
if (private_children.size()>=1)
return(true);
return(false);
}

public LinkedHashMap<String,Object> properties() {
return(this);
}

/**
* Drill into a path (dot notation) to get a CHILD item
* @param path – String (dot notation path)
* @return Item – child item
*/
public Item getPath(String path) {
if (path.equals(“”)) return(null);
Item match = this;
String[] keys = path.split(“\\.”);
if (match.containsKey(keys[0])) {
match = match.getChild(keys[0]);
int count = 1;
while(count < keys.length) {
String key = keys[count];
if (match.childContainsKey(key))
match = match.getChild(key);
if (match == null) return(null);
count++;
}
}
return(match);
}

public Item getInternal(String key) {
return(internals.get(key));
}

public void putInternal(String key, Item ptr) {
internals.put(key, ptr);
}

}
[/java]

What is RapidACE software (technically)?

Retweet

RapidACE software (that is going open source), is an enabling tool for working with DDL (Data definition language, like CREATE TABLE statements).  RapidACE performs many functions that we used to do by “hand” in the data warehousing world as an IT team.  It has the capacity to reduce 6 months of work (for 5 people) down to 4 weeks (and one or two people).  I’m NOT KIDDING!  It is an enterprise software component that I’ve been working on since 1997.  In this post, I will skip the history lesson, and instead discuss some of the features and libraries behind RapidACE.

Is open-source a part of RapidACE?

Yes.  There are LGPL libraries used within RapidACE – at least they were LGPL before RapidACE incorporated them…  LGPL allowed me to use the software embedded, without changes to the binary object, and resell it as a part of the RapidACE core.  Don’t mis-understand me….  Someone once made the comment: “Who would pay for a bunch of cobbled together LGPL open source programs?”  RapidACE has Intellectual Property that I have written over the 15 years it’s been in existence.  Code that MERGES, and CONSOLIDATES disparate data models, is my code.  Code that forward engineers Data Vaults, Staging Areas, Master Data Models, and Star Schemas are my code.

The libraries I list here serve specific enrichment purposes within RapidACE.

  1. JUNG Graphing library – handles the 2D and “visual” graph representation of the data model.  Why JUNG?
  2. Because JUNG is the basis for WilmaScope – which handles a 3D data model representation…  Have you EVER seen your data model in 3D before?  It’s really cool!
  3. FreeMarker – serves as the template engine, with the FreeMarker templates doing the code generation.
  4. FMPP – FreeMarker Pre Processor (wrapper), serves to allow file handling within the template language of FreeMarker
  5. Apache Libraries (ASF License) that are used by FreeMarker and FMPP (like ORO, Log4J, Resolver, DOM4J, POI, Geronimo-Stax-Api, XML-beans, bsh)

That’s it folks, there aren’t any other open source libraries embedded in RapidACE.

What is the purpose of RapidACE?

RapidACE software is designed to: read in data structures, merge & consolidate the structures, and then forward engineer NEW data models.  As a by-product it produces and manages cross-references (ie: source to target maps), also-as a by-product it produces or generates ETL code.  The graphical visualization components were added to the GUI for manipulation and data modeling purposes.

RapidACE is what you call a 100% architectural / structurally driven tool.  It does not use data nor data profiling at any time to accomplish it’s output.  It is also what you call: a 100% open-architecture tool.  Which means, that it “puts all of it’s generated results” out in to intermediate files for you to look at, manage, change, alter, and feed back in to the process along the way.  So, if you want to use your own data modeling tool, go right ahead.  If you want to “modify the cross-reference” in before code-generation, go ahead.  If you want to modify the data structures before forward engineering, go ahead.

Because this tool is architecturally focused and not data focused it can maintain consistent outcomes over time.  If it were data focused, it would have to change the model based on the data sets.

What are the Functions of RapidACE?

There are many different functions that you can get RapidACE to do.

  • Forward Engineer (auto-generate) a Staging Area Data Model
  • Forward Engineer (auto-generate) a Data Vault Data Model (single EDW model)
  • Forward Engineer (auto-generate) one or more Star Schema data models
  • Forward Engineer (auto-generate) a single master data model (starting point only)
  • Forward Engineer (auto-generate) a single Exploration Data Model (starting point only)
  • Generate ETL code
  • Merge disparate (separate) data models in to one (whether they are 3NF, or DV, or Star Schema – as long as they are all the same CLASS of data model)
  • Consolidate disparate (separate) data models in to one – this function applies a bit of “knowledge” for overlapping tables, super and subset structures, etc…
  • Change the “database & datatypes” of data models
  • Import and normalize Cobol Copybook structures to incorporate with other functions
  • Generate Test Data Sets
  • Generate ETL code that generates Test Data Sets
  • Generate Reports, HTML, XML
  • Validate structures
  • Visualize / graphically represent data models in 2D and 3D

These are the functions of RapidACE.  Granted, these functions are a combination of V1 and V2 – which V2 hopes to aspire to).

What are the INPUTS to RapidACE?

This truly depends on the workflow you are choosing, and where RapidACE fits in that workflow.  However, some of the generic inputs that RapidACE takes are as follows:

  • Source & Target DDL or Cobol Copybooks
  • Cross-Reference .XLS or .XLSX formats (v2 for Excel, v1 is text only)
  • FreeMarker Templates for code generation, test data generation, reports, etc…

Hope you enjoyed this definition of RapidACE.  As always, feel free to ask questions.

Thanks,
Dan L
PS: I offer SERVICES around using this tool to help you get off the ground running quickly, contact me today for more information.

DDL & Tree Sorting

Retweet

DDL is an interesting expressive language.  It has inherent problems when trying to walk the “tree of dependencies”, to discover parents or sort it in a dependency ordering.  Well, for this task I’ve found some really cool code on the web that allows us to express the table objects in a very simple graph, you know – nodes and vertexes – without all the trappings of visualization and complex coding that go with it.

The code that I found is beautiful really…  It is here:

http://stackoverflow.com/questions/2739392/sample-directed-graph-and-topological-sort-code

A topological sort of a directed graph, that actually allows cycles.  What are cycles you say?  Well, a cycle in this graph is when a table refers to another table (with a foreign key), and then, that table refers back to it.  So for instance:

USERS  references EMPLOYEES, and EMPLOYEES references USERS

When you have a cycle like this, it is hard to solve in to a dependency tree.  The question is: WHICH table is the true parent?  Well, in this case, understanding the true parent means adding weight to other tables that DEPEND ON or reference USERS or EMPLOYEES.  None-the-less, it’s hard to find actual code that can “sort” the DDL and ALLOW cycles.  Most code that deals with this mathematical problem, stops when it sees a cycle and doesn’t attempt to sort it any further.

Personally, I think this is a data modeling error, and should never happen – but, in the interest of “dealing with all possibilities”, the RapidACE code set must deal with this one as well.

Why sort tables in dependency order in the first place?

Well, I’ll tell you…  a dependency sorted list of table structures has MANY uses – for instance, when or if you generate test data – it is important to generate the data for the PARENT objects first, so that referential integrity can be managed when generating the child objects.  Another reason is generating new DDL files, you want a create table statement for the PARENT object FIRST, then the child object with it’s foreign keys.  That is, if the create FK statements are embedded inside the CREATE TABLE statements.   However the easiest way to solve this problem is to separate the Foreign Key creation in to their own list of ALTER statements.

What is a “GRAPH” of structural objects?

I’m not talking about “charts and graphs” that you would use in Excel.  I’m talking about a directed-dependency-view, where each TABLE is a NODE, and each FOREIGN KEY is a vertex.  This is a mathematical view of the DDL, and is absolutely necessary in order to do anything of interest with forward engineering.

Why is this important?

Because you can take DDL, or ANY structure (like a Cobol Copybook, or XML, or XSD) and represent it in 3 dimensional space, IF you apply graph theory.  If you can represent “data models” in 3 dimensional space then you can visualize them using “graphical libraries” that show directed graphs.  Furthermore, you can use the “smarts” of these libraries to merge graphs, find shortest paths, and even – assign weights and confidence ratings – so that you can then perform more work in the neural network space.  This is how RapidACE software get’s it’s 3D and 2D graphing abilities of data models…  It’s all mathematically based.