EDGEWATER EXPERT’S CORNER: The Pros and Cons of Exposing Data Warehouse Content via Object Models

So you’re the one that’s responsible for your company’s enterprise reporting environment. Over the years, you have succeeded in building out a very stable and yet constantly expanding and diversifying data warehouse, a solid end-user reporting platform, great analytics and flashy corporate dashboards. You’ve done all the “heavy lifting” associated with integrating data from literally dozens of source systems into a single cohesive environment that has become the go-to source for any reporting needs.

Within your EDW, there are mashup entities that exist nowhere else in the corporate domain and now you are informed that some of the warehouse content you have created will be needed as source data for a new customer service site your company is creating.

So what options do you have to accommodate this? The two most common approaches that come to mind are: a) generating extracts to feed to the subscribing application on a scheduled basis; or b) just give the application development team direct access to the EDW tables and views. Both methods have no shortage of pros and cons.

  • Extract Generation – Have the application development team identify the data they want up front and as a post-process to your nightly ETL run cycles, dump the data to the OS and leave consuming it up to the subscribing apps.
Pros Cons
A dedicated extract is a single daily/nightly operation that will not impact other subscribers to the warehouse. You’re uncomfortable publishing secure content to a downstream application environment that may not have the same stringent user-level security measures in place as the EDW has.
Application developers will not be generating ad hoc queries that could negatively impact performance for other subscribing users’ reporting operations and analytics activity. Generating extracts containing large amounts of content may not be the most efficient method for delivering needed information to subscribing applications.
Nightly dumps or extracts will only contain EDW data that was available at the time the extracts were generated and will not contain the near- real-time content that is constantly being fed to the EDW – and that users will likely expect.
  • Direct Access – Give the subscribing application developers access to exposed EDW content directly so they can query tables and views for the content they want as they need it.

 

Pros Cons
It’s up to the application development team to get what they need, how they need it and when they need it. You’re uncomfortable exposing secure content to application developers that may not have the same stringent user-level security measures in place as the EDW has.
More efficient than nightly extracts as the downstream applications will only pull data as needed. Application developers will be generating ad hoc queries that could negatively impact performance for other subscribing users’ reporting operations and analytics activity.
Near-real-time warehouse content will be available for timely consumption by the applications.

 

While both of the above options have merits, they also have a number of inherent limitations – with data security being at the top of the list. Neither of these approaches enforces the database-level security that is already implemented explicitly in the EDW – side-stepping this existing capability will force application developers to either reinvent that wheel or implement some broader, but generally less stringent, application-level security model.

There is another option, though, one we seldom consider as warehouse developers. How about exposing an object model that represents specific EDW content consistently and explicitly to any subscribing applications? You may need to put on your OLTP hat for this one, but hear me out.

The subscribing application development team would be responsible to identify the specific objects (collections) they wish to consume and would access these objects through a secured procedural interface. On the surface, this approach may sound like you and your team will get stuck writing a bunch of very specific custom procedures, but if you take a step back and think it through, the reality is that your team can create an exposed catalog of rather generic procedures, all requiring input parameters, including user tokens – so the EDW security model remains in charge of exactly which data is returned to which users on each retrieval.

The benefits of this approach are numerous, including:

  • Data Security – All requests leverage the existing EDW security model via a user token parameter for every “Get” method.
  • Data Latency – Data being delivered by this interface is as current as it is in the EDW so there are no latency issues as would be expected with extracted data sets.
  • Predefined Get Methods – No ad hoc or application-based SQL being sent to the EDW. Only procedures generated and/or approved by the EDW team will be hitting the database.
  • Content Control – Only the content that is requested is delivered. All Get methods returning non-static data will require input parameter values for any required filtering criteria – all requests can be validated.
  • Data Page Control – Subscribing applications will not only be responsible for identifying what rows they want via input parameters, but also how many rows per page to keep network traffic in check.
  • EDW Transaction Logging – An EDW transaction log can be implemented with autonomous logging that records every incoming request, the accompanying input parameters, the number of rows returned and the duration it took for the transaction to run. This can aid performance tuning for the actual request behaviors from subscribing applications.
  • Object Reuse – Creation of a generic exposed object catalog will allow other applications to leverage the same consistent set of objects providing continuity of data and interface across all subscribing applications.
  • Nested and N Object Retrieval – Creation of single Get methods that can return multiple and/or nested objects in a single database call.
  • Physical Database Objects – All consumable objects are physically instantiated in the database as user-defined types based on native database data types or other user-defined types.
  • Backend Compatibility – Makes no difference what type of shop you are, i.e.; Oracle, Microsoft, IBM, PostgreSQL or some other mainstream RDBMS; conceptually, the approach is the same.
  • Application Compatibility – This approach is compatible with both Java and .NET IDE’s, as well as other application development platforms.
  • Reduced Data Duplication – Because data is directly published to subscribing applications, there is no need for subscribers to store that detail content in their transactional database, just key value references.

There are also a few Cons that also need to be weighed when considering this path:

  • EDW Table Locks – the warehouse ETL needs to be constructed so that tables that are publishing to the object model are not exclusively locked during load operations. This eliminates brown-out situations for subscribing applications.
  • Persistent Surrogate Keys – EDW tables that are publishing data to subscribing applications via the object model will need to have persistent surrogate primary keys so that subscribing applications can locally store key values obtained from the publisher and leverage the same key values in future operations.
  • Application Connection/Session Pooling – Each application connection (session) to the EDW will need to be established based on an EDW user for security to persist to the object model, so no pooling of open connections.
  • Reduced Data Duplication – This is a double-edged sword in this context because subscribing applications will not be storing all EDW content locally. As a result, there may be limitations to the reporting operations of subscribing applications. However, the subscribing applications can also be downstream publishers of data to the same EDW and can report from there. Additionally, at the risk of convoluting this particular point, I would also point out that “set” methods can also be created which would allow the subscribing application(s) to publish relevant content directly back to the EDW, thus eliminating the need for batch loading back to the EDW from subscribing application(s). Probably a topic for another day, but I wanted to put it out there.

 

So, does that sound like something that you may just want to explore? For more information on this or any of our offerings, please do not hesitate to reach out to us at makewaves@edgewater.com. Thanks!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s