
How to Manage Business Rule Improvements in Data Vault

5 August 2024


In this article, our Principal Consultant, Bronwen Fairbairn, takes a closer look at Business Rule Evolution, and how business rule logic can change over time within the context of Data Vault 2.0.  

Scenario

Consider the following raw data, held as a satellite in the Raw Vault, containing customer names and dates of birth.

Picture 1

This data shows that our customer, Sarah Jones, was initially recorded on 12-Jul-22 with a birth date of 1-Jan-1970. Then, on 20-May-2023, Sarah’s birth date was updated to 4-Nov-1983, and on 27-May-2023 it was updated again to 27-May-2023.

Business Rule Version 1

You have a very simple Business Rule that calculates the customer’s age from the years alone and generates the following in a Business Vault (BV) satellite.

Picture 2
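To make the rule concrete, here is a minimal Python sketch of what CustomerAge v1 might look like. It assumes (the scenario does not say so) that age is calculated as at each raw record’s load date, and that a72cb0 is Sarah’s hash key, as the later figures suggest. The function name `customer_age_v1` and the `(hash_key, load_date, birth_date)` row layout are hypothetical, and only the first two raw records are used.

```python
from datetime import date


def customer_age_v1(birth_date: date, as_at: date) -> int:
    """CustomerAge v1: compares calendar years only, ignoring month and day."""
    return as_at.year - birth_date.year


# Raw Vault satellite rows as (hash_key, load_date, birth_date).
# Assumption: age is calculated as at each row's load date.
raw_rows = [
    ("a72cb0", date(2022, 7, 12), date(1970, 1, 1)),
    ("a72cb0", date(2023, 5, 20), date(1983, 11, 4)),
]

# Apply the rule to every raw row to produce the BV satellite rows.
bv_rows = [(hk, load_date, customer_age_v1(dob, load_date)) for hk, load_date, dob in raw_rows]

for hk, load_date, age in bv_rows:
    print(hk, load_date.isoformat(), age)
```

Because only the years are compared, this gives 52 for the 12-Jul-22 record and 40 for the 20-May-23 record.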

Business Rule Version 2

This Business Rule is in effect for some time, and everyone is happy. But one day the data quality team notices a problem: the 1-Jan-70 birth date wasn’t entered by users. It was a system default, and therefore unclean data. They try to get this resolved at the source, but can’t, and certainly can’t for historical data. Instead, they choose to update the rule to accommodate this, and a new Business Rule, version 2, is implemented.

The new rule disregards the 1-Jan-70 birth date and returns null. Running this rule from scratch across all the historical data would result in the following:

Picture 3
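A sketch of CustomerAge v2 under the same assumptions as before, treating the system default as exactly 1-Jan-1970 (the constant name `SYSTEM_DEFAULT_BIRTH_DATE` is hypothetical). The important part is that the rule is re-run over every historical raw row, not only the latest one.

```python
from datetime import date
from typing import Optional

# Assumption: the system default to discard is exactly 1 January 1970.
SYSTEM_DEFAULT_BIRTH_DATE = date(1970, 1, 1)


def customer_age_v2(birth_date: Optional[date], as_at: date) -> Optional[int]:
    """CustomerAge v2: year-only age, but the system default birth date returns None (null)."""
    if birth_date is None or birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    return as_at.year - birth_date.year


raw_rows = [
    ("a72cb0", date(2022, 7, 12), date(1970, 1, 1)),
    ("a72cb0", date(2023, 5, 20), date(1983, 11, 4)),
]

# Re-run the rule from scratch across ALL historical raw rows, not just the latest.
bv_rows = [(hk, load_date, customer_age_v2(dob, load_date)) for hk, load_date, dob in raw_rows]

for hk, load_date, age in bv_rows:
    print(hk, load_date.isoformat(), age)
```

The 12-Jul-22 record now yields None (null), while the 20-May-23 record still yields 40.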

Business Rule Version 3

But then, through some analysis in the reporting logic, a slight inaccuracy is identified. The age calculation only looks at the years, so everyone becomes a year older on New Year’s Day, which obviously isn’t correct. So they adjust the business rule further: Business Rule version 3 takes the full birth date into consideration when calculating age.

Picture 4

This is a simple scenario with a very simple business rule. Let’s set aside whether calculating customer age really justifies a business rule, and focus instead on what to report in the Business Vault when the rule changes.
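CustomerAge v3 might look like the following, again as an illustrative sketch under the same assumptions rather than a definitive implementation. It keeps the v2 null handling and subtracts a year when the birthday has not yet been reached in the as-at year.

```python
from datetime import date
from typing import Optional

SYSTEM_DEFAULT_BIRTH_DATE = date(1970, 1, 1)  # assumption carried over from v2


def customer_age_v3(birth_date: Optional[date], as_at: date) -> Optional[int]:
    """CustomerAge v3: keeps the v2 null handling and uses the full birth date."""
    if birth_date is None or birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    # Subtract one year if the birthday has not yet occurred in the as-at year.
    birthday_passed = (as_at.month, as_at.day) >= (birth_date.month, birth_date.day)
    return as_at.year - birth_date.year - (0 if birthday_passed else 1)


print(customer_age_v3(date(1983, 11, 4), date(2023, 5, 20)))  # 39 (v1 and v2 give 40)
print(customer_age_v3(date(1983, 11, 4), date(2023, 11, 4)))  # 40 on the birthday itself
```

One design choice worth noting: comparing `(month, day)` tuples means someone born on 29 February is treated as a year older on 1 March in non-leap years. Your own rule may need a different convention.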

Why Recalculate History

One of the key issues comes from confusion about the purpose of the Business Vault. The Business Vault is an auditable store of what was reported historically, but it is also the home of high-quality, cleansed information: the best-quality source to facilitate self-service.

But what if these two agendas are in conflict? Initially, we developed reports using the data generated by Business Rule 1. If a user now asks to see what the source system looked like historically, should we give them the data generated at the time by Business Rule 1, or the data for that time processed by the highest-quality business rule we currently have in operation (Business Rule 3)? Do we prioritise consistency of reporting or quality of report content?

Ideally, we would be able to do both, but the decision about what to implement first is specific to your organisation. Do users want to review what was reported historically and its net effect? In that case, they need the auditability of historic reporting that the Business Vault can provide. Or do they want to extract historical information and compare it to the present, for example when developing a predictive model? In that case, business rule changes can distort the information and prevent a like-for-like comparison.

Other examples of changing Business Rules

Rule improvements like the one described here are commonplace. As you use information more and more, you are likely to realise that you haven’t accommodated all contingencies, and you may want to improve your data cleansing process. But rule changes can also be triggered by external factors, for example a boundary change when you are classifying addresses into geographical areas.

Consider the handover of Hong Kong to the People’s Republic of China in 1997. This could logically result in a change to the geocoding business rules, classifying locations in Hong Kong as part of China rather than reporting Hong Kong as its own territory.

From a reporting and BV perspective, Hong Kong should be reported as part of China from 1997 onwards, and the BV should also hold an auditable record of all pre-1997 reporting that treated Hong Kong as its own territory. However, data consumers may also want to analyse what the area that is currently China looked like before 1997; in other words, to apply the current Business Rule to old data. This allows like-for-like comparisons: for instance, users could see whether the population of an area changed because people moved rather than because the Business Rule changed.

Business Vault Representations

Let’s have a look at how the Business Vault can represent a rule that changes over time.

Assume that the rule was initially developed before 12-Jul-22 and operated by extracting the current data for all customers from the Raw Vault and loading that data into the Business Vault. The rule was then changed to v2 and v3 after 27-May-23.

Business Vault

The BV Sat would probably look like this:

Picture 5

Using this Business Vault object, we would be able to report on the current ages of the customers and have an auditable history of all reporting created previously. However, it still records that Customer a72cb0 had an age of 52 on 12-Jul-22. It can’t tell us how old our customers were at the end of 2022 according to our current rule, because the improved business rule hasn’t been run on that historical data. We would have problems using the data generated at the end of 2022 to test a predictive model: the data for that time, generated by the CustomerAge v1 business rule, looks different to the data currently being generated by the CustomerAge v3 business rule, and this could throw out the forecasts.

What we have in the BV here isn’t our best information. It isn’t our current data cleansing rule applied across the board. It is simply a record of the data that we used for reporting at the time.

We could add the Business Rule version as a dependent child key, but that would result in a large amount of duplicate data. The old rules could also keep running in parallel for a while, but this could lead to a lot of unnecessary processing and a growing number of deprecated technical assets.

Picture 6

Alternatively, you could use the DV2.0 “create time” fields to supersede the old business rule data with the new rule’s results. You would still have that auditability, but you would have cleansed information at the same time.

Picture 7

This option gives you a fully auditable BV, while also allowing easy access to the best information using the current rule.

Because there are no end dates, you can filter values using the Created Date and query the data returned by the CustomerAge v1 and v2 rules at any time they were operational.
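A minimal sketch of how this Created Date filtering could work, using plain Python in place of SQL. The `BvRow` layout (including a `rule_version` column), the function names, and the v2/v3 create dates of 1-Jun-23 and 15-Jun-23 are all hypothetical; the article only says the rules changed after 27-May-23. For each hash key and load date, the row with the latest create date on or before a chosen date gives what the BV held at that time, and the latest row overall gives the current, best-quality view.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Tuple


class BvRow(NamedTuple):
    hash_key: str
    load_date: date    # effective date of the Raw Vault record
    create_date: date  # when this row was written to the BV satellite
    rule_version: str  # hypothetical column: explains why the row exists
    age: Optional[int]


# Hypothetical satellite contents: the v2/v3 create dates are invented.
SAT: List[BvRow] = [
    BvRow("a72cb0", date(2022, 7, 12), date(2022, 7, 12), "CustomerAge v1", 52),
    BvRow("a72cb0", date(2023, 5, 20), date(2023, 5, 20), "CustomerAge v1", 40),
    BvRow("a72cb0", date(2022, 7, 12), date(2023, 6, 1), "CustomerAge v2", None),
    BvRow("a72cb0", date(2023, 5, 20), date(2023, 6, 15), "CustomerAge v3", 39),
]

Key = Tuple[str, date]  # (hash_key, load_date)


def as_known_at(rows: List[BvRow], as_at: date) -> Dict[Key, BvRow]:
    """Latest row per (hash_key, load_date) among rows created on or before as_at.

    Rows are never end-dated or updated, so this rebuilds what the BV held on as_at."""
    latest: Dict[Key, BvRow] = {}
    for row in sorted(rows, key=lambda r: r.create_date):
        if row.create_date <= as_at:
            latest[(row.hash_key, row.load_date)] = row
    return latest


def current_view(rows: List[BvRow]) -> Dict[Key, BvRow]:
    """Best current information: every key's most recently created row."""
    return as_known_at(rows, date.max)


def age_on(view: Dict[Key, BvRow], hash_key: str, on: date) -> Optional[int]:
    """Point-in-time age from the latest load_date on or before `on`.

    Returns None if there is no record yet or the rule returned null."""
    keys = [k for k in view if k[0] == hash_key and k[1] <= on]
    if not keys:
        return None
    return view[max(keys, key=lambda k: k[1])].age


as_reported = as_known_at(SAT, date(2023, 5, 27))
current = current_view(SAT)
print(age_on(as_reported, "a72cb0", date(2022, 12, 31)))  # 52, as reported by v1
print(age_on(current, "a72cb0", date(2022, 12, 31)))      # None, unknown under current rules
```

With these assumptions, the view as known on 27-May-23 still shows the v1 values of 52 and 40, while the current view shows an unknown age for the 12-Jul-22 record and 39 for the 20-May-23 record.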

You would also be able to see that, using your current processes, customer a72cb0 had an unknown age between 12-Jul-22 and 20-May-23 and see how a customer with that information might behave. You could use that to compare to how customers with no recorded age behaved at other times and make predictions for customers that currently have no recorded age.

Let’s call these different BV approaches:

  • Option 1 – Snapshot Load rule changes over time
  • Option 2 – Rule version as DC key
  • Option 3 – Using CreateDate to update old data as per the new rule

Let's have a look at the actual processing involved for each option.

Option 1 – Snapshot Load rule changes over time

Snapshot Load rule changes over time is the basic choice, and I suspect it is what most Data Vault implementations are doing currently. The process generates a snapshot of the current data using the rule and loads it into the BV sat, much like snapshotting a current-state source system table.

You could easily modify the process to handle only changed records rather than taking a full snapshot. New RV records could be processed through the BV once, as they appear, and a full snapshot taken if, and only if, the rule changes.
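The Option 1 process could be sketched as follows, assuming a BV satellite held as a simple list of `(hash_key, bv_load_date, rule_version, age)` tuples and a hypothetical `option1_load` function. Each run loads only the current state of the Raw Vault: in delta mode it skips RV records that were already processed, and a rule change forces a full snapshot. The 15-Jun-23 run date for the switch to v3 is invented for illustration, and v2 is skipped for brevity.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

SYSTEM_DEFAULT_BIRTH_DATE = date(1970, 1, 1)  # assumption, as in the v2/v3 sketches


def age_v1(birth_date: date, as_at: date) -> Optional[int]:
    return as_at.year - birth_date.year


def age_v3(birth_date: date, as_at: date) -> Optional[int]:
    if birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    passed = (as_at.month, as_at.day) >= (birth_date.month, birth_date.day)
    return as_at.year - birth_date.year - (0 if passed else 1)


Rule = Callable[[date, date], Optional[int]]
BvRow = Tuple[str, date, str, Optional[int]]  # (hash_key, bv_load_date, rule_version, age)


def option1_load(bv_sat: List[BvRow], raw_current: Dict[str, Tuple[date, date]],
                 rule: Rule, rule_version: str, run_date: date,
                 last_run: Optional[date], rule_changed: bool) -> int:
    """Load the current Raw Vault state through the rule in operation today.

    raw_current maps hash_key -> (raw_load_date, birth_date) for each key's latest RV row.
    Returns the number of BV rows inserted. Existing BV rows are never restated."""
    # bv_sat is appended in load order, so later rows win in this dict.
    latest_age = {hk: age for hk, _, _, age in bv_sat}
    inserted = 0
    for hk, (raw_load_date, birth_date) in raw_current.items():
        # Delta mode: skip RV rows already processed, unless the rule changed (full snapshot).
        if not rule_changed and last_run is not None and raw_load_date <= last_run:
            continue
        age = rule(birth_date, raw_load_date)  # assumption: age as at the RV load date
        if hk in latest_age and latest_age[hk] == age:
            continue  # derived value unchanged: nothing to insert
        bv_sat.append((hk, run_date, rule_version, age))
        inserted += 1
    return inserted


bv: List[BvRow] = []
option1_load(bv, {"a72cb0": (date(2022, 7, 12), date(1970, 1, 1))}, age_v1,
             "CustomerAge v1", date(2022, 7, 12), None, False)
option1_load(bv, {"a72cb0": (date(2023, 5, 20), date(1983, 11, 4))}, age_v1,
             "CustomerAge v1", date(2023, 5, 20), date(2022, 7, 12), False)
# Hypothetical switch to v3 on 15-Jun-23 (the article only says "after 27-May-23").
option1_load(bv, {"a72cb0": (date(2023, 5, 20), date(1983, 11, 4))}, age_v3,
             "CustomerAge v3", date(2023, 6, 15), date(2023, 5, 20), True)
for row in bv:
    print(row)
```

The 12-Jul-22 row still says 52 after the switch to v3. That is the limitation of this option: the BV records what was reported at the time, but history is never restated under the new rule.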

Picture 8

Option 2 – Rule version as DC key

Rule version as DC key is a bit more complex. It would require you to add a rule version as a Dependent Child Key to every BV Sat, to prepare for future changes. Once again, the process could be snapshot or delta driven, but you would need to run multiple rules in parallel. If you ever wanted to save computational effort by turning an old rule off, you could; however, you would probably need a way of tracking this, so that you know the data for that Dependent Child Key is no longer up to date.

Picture 9

I personally haven’t seen anyone do this, but it would be a way of running an existing rule and a release-candidate rule in parallel. The raw data could be loaded through the rule using a snapshot or a delta approach, but the overhead of running multiple rule versions in parallel would be considerable.
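For Option 2, a sketch might make the rule version part of the satellite key and run every active version over each raw row. The `RULES` and `RETIRED_ON` registries, the `is_maintained` check, and the retirement date are all hypothetical. They show one possible way of tracking that a retired version’s Dependent Child Key rows are no longer being kept up to date.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

SYSTEM_DEFAULT_BIRTH_DATE = date(1970, 1, 1)  # assumption, as in the v2/v3 sketches


def age_v1(birth_date: date, as_at: date) -> Optional[int]:
    return as_at.year - birth_date.year


def age_v3(birth_date: date, as_at: date) -> Optional[int]:
    if birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    passed = (as_at.month, as_at.day) >= (birth_date.month, birth_date.day)
    return as_at.year - birth_date.year - (0 if passed else 1)


# Hypothetical registries: the versions running in parallel, and when each was switched off.
RULES: Dict[str, Callable[[date, date], Optional[int]]] = {
    "CustomerAge v1": age_v1,
    "CustomerAge v3": age_v3,
}
RETIRED_ON: Dict[str, date] = {}

# Satellite key is (hash_key, load_date, rule_version): rule_version is the Dependent Child Key.
SatKey = Tuple[str, date, str]


def option2_load(bv_sat: Dict[SatKey, Optional[int]],
                 raw_rows: List[Tuple[str, date, date]]) -> None:
    """Run every active rule version over the given raw rows (snapshot or delta)."""
    for hk, load_date, birth_date in raw_rows:
        for version, rule in RULES.items():
            if version in RETIRED_ON:
                continue  # retired: saves processing, but this version's rows go stale
            key = (hk, load_date, version)
            if key not in bv_sat:  # insert-only: existing rows are never overwritten
                bv_sat[key] = rule(birth_date, load_date)


def is_maintained(version: str) -> bool:
    """True if the version exists and is still being run."""
    return version in RULES and version not in RETIRED_ON


bv: Dict[SatKey, Optional[int]] = {}
option2_load(bv, [("a72cb0", date(2022, 7, 12), date(1970, 1, 1))])
# Hypothetical sequence: v1 is retired before the next raw row is processed.
RETIRED_ON["CustomerAge v1"] = date(2023, 6, 15)
option2_load(bv, [("a72cb0", date(2023, 5, 20), date(1983, 11, 4))])
for key, age in bv.items():
    print(key, age)
```

Retiring v1 saves processing, but raw rows processed afterwards get no v1 value, which is why consumers need some way of knowing that a version is no longer maintained.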

The advantage of this approach is that you could report on the current data using an old business rule as well as the historical data using a new business rule. But the technical debt of never being able to retire any rules would become a significant challenge. People may be tempted to update old versions of a rule as opposed to the latest copy. You could generate lots of versions but no clear master for people to trust as the best copy.

Option 3 – Using CreateDate to update old data as per the new rule

Using CreateDate to update old data as per the new rule is the most complex, but also the most powerful, of the three options. It allows you to easily see what was reported, but also what current analysis says about your historical data. It allows you to compare current reporting both to historical but equivalent figures and to historical figures as reported at the point in time.

Picture 10

One advantage of this approach is that the resulting BV object is still quite compact. You can use this data to report from three different perspectives (current reporting, historical figures under the current rule, and historical figures as reported at the time), yet if the rule versions are quite similar (which you may expect them to be in an agile environment), the number of records added to the BV sat is quite small. It is also quite easy to see the impact of the rule changes at a record level. You can see from here that the change to rule CustomerAge v2 only resulted in the 12-Jul-22 record for Customer a72cb0 changing, and the CustomerAge v3 enhancements only resulted in the 27-Mar-23 record for Customer a72cb0 changing.

The rules could still run as a snapshot of the source or as a delta process. But, as with Option 2, any time the rule changes, the entire raw dataset should be loaded through it to see whether any historical data is cleansed differently. In addition, the Satellite load process would need to support late data processing, something many Data Vault solutions aren’t yet capable of.
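Option 3’s load could be sketched like this. Whenever the rule changes, the whole raw history is run through the new rule, and a row is inserted only where the result differs from the latest value held for that hash key and load date. New rows keep the original, earlier load date and carry the run date as their create date, which is why the satellite must accept late-arriving data. The function name and the 1-Jun-23 and 15-Jun-23 run dates are hypothetical; the initial v1 rows are created on their load dates to mimic the rule running as data arrived.

```python
from datetime import date
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

SYSTEM_DEFAULT_BIRTH_DATE = date(1970, 1, 1)  # assumption, as in the v2/v3 sketches


def age_v1(birth_date: date, as_at: date) -> Optional[int]:
    return as_at.year - birth_date.year


def age_v2(birth_date: date, as_at: date) -> Optional[int]:
    if birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    return as_at.year - birth_date.year


def age_v3(birth_date: date, as_at: date) -> Optional[int]:
    if birth_date == SYSTEM_DEFAULT_BIRTH_DATE:
        return None
    passed = (as_at.month, as_at.day) >= (birth_date.month, birth_date.day)
    return as_at.year - birth_date.year - (0 if passed else 1)


class BvRow(NamedTuple):
    hash_key: str
    load_date: date
    create_date: date
    rule_version: str
    age: Optional[int]


def option3_reprocess(bv_sat: List[BvRow], raw_rows: List[Tuple[str, date, date]],
                      rule: Callable[[date, date], Optional[int]], rule_version: str,
                      create_date: date) -> List[BvRow]:
    """Run the rule over the given raw rows; insert only results that differ from
    the latest value held for the same (hash_key, load_date).

    New rows keep the original load_date (so they may arrive "late") and carry
    create_date, so nothing is updated in place. Returns the rows added."""
    latest: Dict[Tuple[str, date], Optional[int]] = {}
    for row in sorted(bv_sat, key=lambda r: r.create_date):
        latest[(row.hash_key, row.load_date)] = row.age
    added: List[BvRow] = []
    for hk, load_date, birth_date in raw_rows:
        age = rule(birth_date, load_date)
        if (hk, load_date) in latest and latest[(hk, load_date)] == age:
            continue  # unchanged by the new rule: nothing to add
        row = BvRow(hk, load_date, create_date, rule_version, age)
        bv_sat.append(row)
        added.append(row)
    return added


RAW = [
    ("a72cb0", date(2022, 7, 12), date(1970, 1, 1)),
    ("a72cb0", date(2023, 5, 20), date(1983, 11, 4)),
]
bv: List[BvRow] = []
for raw_row in RAW:  # v1 processed each raw row as it arrived
    option3_reprocess(bv, [raw_row], age_v1, "CustomerAge v1", raw_row[1])
# Hypothetical deployment dates; each new rule is run over the ENTIRE raw history.
v2_changes = option3_reprocess(bv, RAW, age_v2, "CustomerAge v2", date(2023, 6, 1))
v3_changes = option3_reprocess(bv, RAW, age_v3, "CustomerAge v3", date(2023, 6, 15))
for row in v2_changes + v3_changes:
    print(row.rule_version, row.load_date.isoformat(), row.age)
```

The rows returned by each run double as a record-level impact report. With the two sample rows used here, v2 changes only the 12-Jul-22 value and v3 changes only the 20-May-23 value.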

How are you dealing with rule changes?

I would be very interested in knowing what you are doing and how you are dealing with rule changes.

Specifically:

  • What do you think about the above options?
  • Do you have any additional approaches you would like to share?
  • What are you doing currently and how has it worked for you?
  • What are the BV priorities and limitations for your organisation? For example, priorities might include auditability and easy access to high-quality cleansed information for ad hoc reporting; limitations might include that the BV can’t be too big or too time-consuming.

 

We will be discussing these questions in the Data Vault Innovators Community (DVIC). Make sure you join to see how others are solving these issues and to add your own insights. Log in or join here.

Alternatively, you can email me at: bronwen.fairbairn@ignition-data.com
