Don’t Copy. Obfuscate.

How Do You Generate Test Data?

If you are an experienced application developer or tester, chances are that at some point in your career, probably fairly early on, you realized that the “best” test data you can find resides in your production database. Probably not long after that, your friends on the InfoSec team made it clear that you were not free to just load that data into your less secure test environments. So then what? The common options at that point were: 1) try to generate the required data; 2) make a copy and try to de-identify or obscure the data in some rudimentary fashion; or 3) get a waiver from the InfoSec team that allows you to just copy the complete production database.

None of those is a particularly good option. In even a moderately complex environment, making a copy has many drawbacks and risks. If you go down the path of trying to create or manufacture the data, you will struggle to replicate the naturally occurring complexity found in production data, and as the size and complexity of the database grow, you must scale the team needed for this effort. If, on the other hand, you try to de-identify a copy of production, your effort is no less daunting. In all but the simplest environments you will face the challenge of preserving referential integrity across tables and databases. If you make changes to de-identify a customer record, how do you ensure that you make the same changes everywhere that customer’s information is found? The usual answer is a set of complex scripts. It gets more complex still if you try to subset the data to avoid needing all of production. Which brings us back to getting that waiver. In the reality of HIPAA, GDPR, etc., this is just not a viable option.
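To make the "same changes everywhere" problem concrete, here is a minimal sketch of one common approach: a deterministic keyed mapping, so a given customer identifier always becomes the same token wherever it appears. The table layouts, column names, and the `pseudonymize` helper are hypothetical, chosen only for illustration, and the secret key is assumed to be stored outside the test environment.

```python
import hashlib
import hmac

# Hypothetical secret; assumed to be managed outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, length: int = 12) -> str:
    """Map a sensitive value to a stable token using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Two hypothetical tables that both reference the same customer.
customers = [{"customer_id": "C-1001", "name": "Jane Doe"}]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1001"},
]

# The same input always yields the same token, so the join key still matches.
masked_customers = [
    {
        "customer_id": pseudonymize(c["customer_id"]),
        "name": "Customer " + pseudonymize(c["name"], 6),
    }
    for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "customer_id": pseudonymize(o["customer_id"])}
    for o in orders
]
```

Because the mapping depends only on the key and the input value, each table can be processed independently and the relationships still line up. Truncating the digest keeps tokens short but raises the chance of collisions on very large data sets, so the length is a trade-off rather than a recommendation.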

Even if you figure this out, it is safe to say that your organization is wasting time and resources that could be better spent in other ways. Without an alternative to the three options above, it's hard to imagine that the work of keeping up with the demand for good test data is not creating bottlenecks, with teams waiting for data to be created, cleansed, or loaded (and reloaded).

There is a better way - it's called Obfuscation!

This is a standard feature of test data management (TDM) tools. An obfuscation tool, normally in conjunction with a powerful subsetting engine, takes an extract of your production data and applies a prescribed set of transformation rules to sensitive data elements, changing or obfuscating the data before it is written to your test database. The result is valid test records that cannot be traced back to, or reverse engineered to identify, the original data. As such, you are able to leverage and replicate the natural complexity of the production data without exposing PHI, PCI, and/or PII. TDM tools used to require seven-figure licenses and multi-year implementation timelines.
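The extract, transform, and write flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular TDM product works: the column names, the `RULES` table, the masking helpers, and the region-based subset filter are all hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical secret; assumed to be managed outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def _digest(value: str) -> str:
    """Keyed hash of a value, returned as a hex string."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_name(value: str) -> str:
    return "Person-" + _digest(value)[:8]


def mask_email(value: str) -> str:
    return _digest(value.lower())[:10] + "@example.com"


def mask_card(value: str) -> str:
    # Replace every digit with derived digits of the same count.
    # The result is not guaranteed to pass a Luhn check.
    count = sum(ch.isdigit() for ch in value)
    return str(int(_digest(value), 16) % 10**count).zfill(count)


# Transformation rules: sensitive column name -> masking function.
RULES: Dict[str, Callable[[str], str]] = {
    "name": mask_name,
    "email": mask_email,
    "card_number": mask_card,
}


def obfuscate(rows: Iterable[dict], rules: Dict[str, Callable[[str], str]]) -> Iterator[dict]:
    """Apply each rule to its column; other columns pass through unchanged."""
    for row in rows:
        yield {
            col: rules[col](val) if col in rules and val is not None else val
            for col, val in row.items()
        }


production_rows = [
    {"id": 1, "region": "EU", "name": "Jane Doe",
     "email": "Jane.Doe@corp.test", "card_number": "4111 1111 1111 1111"},
    {"id": 2, "region": "US", "name": "John Roe",
     "email": "john.roe@corp.test", "card_number": "5500 0000 0000 0004"},
]

# Hypothetical subsetting step: keep only the rows the test needs.
subset = [row for row in production_rows if row["region"] == "EU"]

# Rows are transformed before they would be written to the test database.
test_rows = list(obfuscate(subset, RULES))
```

Keeping the rules in a plain mapping means that covering a new sensitive column is a one-line change, and non-sensitive columns keep their production values and shape.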

 Semele Data’s TDM toolset changes all of that.  Our robust solution enables your team to quickly make copies of your production data and safely obfuscate the data without it ever being exposed.