Managing Data Integrity

When was the last time you looked at a view of data, a report, or a graph in your CRM and said to yourself, “This doesn’t look right”? You’re not alone. Keeping data up to date is a common issue for many organizations, yet we rely on its accuracy for decision making. One example of a data-driven decision is determining which resource to assign to a project: if the project pipeline is inaccurate, a senior resource might get tied up in a smaller project when their skill set would have been better used on a more important one. Another example is deciding to make an investment based on erroneous forecasts of that investment’s future performance.

When data is out of date and you recognize this, the risk of an inaccurate decision is diminished, because you have the opportunity to contact the data owner(s) for an update. When it goes unnoticed, the risk of bad decisions increases. While there are many reasons data can get out of date, there is often one common root cause: the person responsible for entering the data did so incorrectly or failed to do so at all. Rather than demonizing that person, we can look for ways to make it easier to keep the data up to date.

There are many factors that go into data integrity:

Does the responsible party for the data entry also own the information gathering mechanism?

This issue can manifest when a team is assigned to a record, or when there is a disconnect or lag in the data-gathering process. For example, if a government agency only provides updates periodically but management needs information more frequently, a gap emerges. Possible solutions:

  • One record – one owner. No team ownership of a record.
  • Talk with management about the data they want and its source, especially when that source is outside the direct control of the responsible party. If the data-gathering mechanism is flawed or doesn’t meet management’s needs, have an open dialogue to decide on the best course of action.

Does data have to be kept up-to-date in real time, or can it be updated periodically?

Not all decisions have to be made on the spot; some can be deferred to a weekly or monthly cadence. It is important for an organization to examine the risk associated with each data element. Elements that feed high-risk areas, or decisions that must be made frequently, need frequent updates from their data owners. Those with less risk, or that are used less often, can carry less emphasis on being kept current. Remember, at the end of the day, a person somewhere had to provide that data. No one is perfect, and it is unreasonable to expect perfection on every record, every field, every time. Prioritize!

Can data gathering be automated?

There are many tools available that can be added on to your software to automate data gathering. For example, several companies have created tools that go out to the web and pull in data updates related to a search topic. Consider installing or developing such tools where appropriate. This reduces the need to assign a person in your organization to the task, and it saves time and money.

Consider using a tool’s workflow or a manually created workflow to help remind data owners to make updates.

Many data tools have built-in workflows. These can be used to set tasks or periodically send emails reminding data owners to update a record. For example, you might create a field called “Last update” that is changed each time a person reviews the record and updates its important fields; if that date is more than a week old, an email can be sent to the data owner. Where such workflow features are not available, data owners can use their email application to set a recurring task or calendar item. As a last resort, a sticky note on a physical calendar can do the trick!
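To make the idea concrete, here is a minimal sketch of that kind of staleness check in Python. It assumes a hypothetical list of record dictionaries with a last-update timestamp and an owner email, plus an internal mail relay; the field names and addresses are placeholders, not a specific CRM’s API.

```python
# Minimal sketch: flag records whose "Last update" is more than a week old
# and email the owner a reminder. Record layout, SMTP host, and addresses
# are hypothetical placeholders.
import smtplib
from datetime import datetime, timedelta
from email.message import EmailMessage

STALE_AFTER = timedelta(days=7)

def send_reminder(owner_email: str, record_name: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Reminder: please review '{record_name}'"
    msg["From"] = "crm-reminders@example.com"
    msg["To"] = owner_email
    msg.set_content(f"The record '{record_name}' has not been updated in over a week.")
    with smtplib.SMTP("mail.example.com") as smtp:   # assumed internal mail relay
        smtp.send_message(msg)

def remind_stale_owners(records: list[dict]) -> None:
    now = datetime.now()
    for rec in records:
        if now - rec["last_update"] > STALE_AFTER:
            send_reminder(rec["owner_email"], rec["name"])
```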

Data is the lifeblood of an organization. Keeping it up to date is important for decision making affecting both small and large outcomes. Most data comes from people, so help your people by setting up reasonable, sound business practices and processes around data integrity. This won’t prevent erroneous data entirely, but you’ll find less of it, and it will make your and your data owners’ work lives much easier.

Are you Paralyzed by a Hoard of Big Data?

Lured by the promise of big data benefits, many organizations are leveraging cheap storage to hoard vast amounts of structured and unstructured data. Without a clear framework for big data governance and use, businesses run the risk of becoming paralyzed under an unorganized jumble of data, much of which has become stale and past its expiration date. Stale data is toxic to your business – it could lead you into taking the wrong action based on data that is no longer relevant.

You know there’s valuable stuff in there, but the thought of wading through all THAT to find it stops you dead in your tracks. There goes your goal of business process improvement, which, according to a recent Informatica survey, most businesses cite as their number-one big data initiative goal.

Just as the individual hoarder often requires a professional organizer to help them pare the hoard and institute acquisition and retention rules for preventing hoard-induced paralysis in the future, organizations should seek outside help when they find themselves unable to turn their data hoard into actionable information.

An effective big data strategy needs to include the following components:

  1. An appropriate toolset for analyzing big data and making it actionable by the right people. Avoid building an ivory tower big data bureaucracy, and remember, insight has to turn into action.
  2. A clear and flexible framework, such as social master data management, for integrating big data with enterprise applications, one that can quickly leverage new sources of information about your customers and your market.
  3. Information lifecycle management rules and practices, so that insight and action will be taken based on relevant, as opposed to stale, information.
  4. Consideration of how the enterprise application portfolio might need to be refined to maximize the availability and relevance of big data. In today’s world, that will involve grappling with the flow of information between cloud and internally hosted applications as well.
  5. A comprehensive data security framework that defines who is entitled to use, change, and delete the data, along with encryption requirements and any required upgrades in network security.

Get the picture? Your big data strategy isn’t just a data strategy. It has to be a comprehensive technology-process-people strategy.

All of these elements should, of course, be considered when building your big data business case and estimating return on investment.

The Unknown Cost of “High Quality Outcomes” in Healthcare

“You were recently acknowledged for having high quality outcomes compared to your peers, how much is it costing you to report this information?”

I recently read an article on healthcareitnews.com, “What Makes a High Performing Hospital? Ask Premier”. So many healthcare providers are quick to tout their “quality credentials,” yet very few understand how much it costs their organization in wasted time and money to run around collecting the data behind those claims. The article sparked the following thoughts…

The easiest way to describe it, I’ve found after many attempts to describe it myself, is “the tip of the iceberg”. That is the best analogy to give a group of patient safety and quality executives, staffers, and analysts when describing the effort, patience, time, and money needed to build a “patient safety and quality dashboard” covering all types of quality measures with different forms of drill-down and roll-up.

What most patient safety and quality folks want is a sexy dashboard or scorecard that can help them report and analyze, in a single place and tool, all of their patient safety and quality measures. It has dials and colors and all sorts of bells and whistles: Press Ganey patient satisfaction scores, AHRQ PSIs, Thomson Reuters and Quantros Core Measures, TheraDoc and Midas infection control measures, UHC Academic Medical Center measures…you name it. They want one place to go to see this information aggregated at the enterprise level, with the ability to drill down to the patient detail. They want to see it by location, by physician, by service line, or by procedure/diagnosis. This can be very helpful and extremely valuable to organizations that continue to waste money on quality analysts and abstractors who simply “collect data” instead of “analyze and act” on it. How much time do you think your PS&Q people spend finding data and plugging away at spreadsheets? How much time is left for actual value-added analysis? I would bet very little…

So that’s what they want, but what are they willing to pay for? The answer is very little. Why?

People in patient safety and quality are experts…in patient safety and quality. What they’re not experts in is data integration, enterprise information management, metadata strategy, data quality, ETL, data storage, database design, and so on. Why do I mention all these technical principles? Because they ALL go into a robust, comprehensive, scalable, and extensible data integration strategy…which sits underneath that sexy dashboard you think you want. So it is easy for providers to be attracted to someone offering a “sexy dashboard” who knows diddly squat about the foundation, the part you can’t see under the water, that’s required to build it. Didn’t anyone ever tell you, “If it sounds too good to be true, it is”?

Electronic Medical Records ≠ Accurate Data

As our healthcare systems race to implement Electronic Medical Records or EMRs, the amount of data that will be available and accessible for a single patient is about to explode.  “As genetic and genomic information becomes more readily available, we soon may have up to 1,000 health facts available for each particular patient,” notes Patrick Soon-Shiong, executive director of the UCLA Wireless Health Institute and executive chairman of Abraxis BioScience, Inc., a Los Angeles-based biotech firm dedicated to delivering therapeutics and technologies that treat cancer and other illnesses.  The challenge is clear: how can a healthcare organization manage the accuracy of 1,000 health facts?

As the volume of individual data elements expands to encompass 1,000 health facts per patient, there is an urgent need for electronic tools to manage the quality, timeliness and origination of those data.  One key example is simply making sure that each patient has a unique identifier with which to attach and connect the individual health facts.  This may seem like a mundane detail, but it is absolutely critical to uniquely identify and unambiguously associate each key health fact with the right patient, at the right time.  Whenever patients are admitted to a health system, they are typically assigned a unique medical record number that both clinicians and staff use to identify, track, and cross-reference their records.  Ideally, every patient receives a single, unique identifier.  Reality, however, tells a different story: many patients wind up incorrectly possessing multiple medical record numbers, while others incorrectly share the same identifier.

These errors, known respectively as master person index (MPI) duplicates and overlays, can cause physicians and other caregivers to unknowingly make treatment decisions based on incomplete or inaccurate data, posing a serious risk to patient safety.  Thus, it is no wonder that improving the accuracy of patient identification repeatedly heads The Joint Commission’s national patient safety goals list on an annual basis.
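As an illustration of what a first-pass MPI duplicate screen can look like, here is a minimal sketch in Python. It assumes simplified patient dictionaries with an MRN, name, and date of birth; the fields, similarity threshold, and matching rule are illustrative assumptions, far simpler than a production identity-resolution engine.

```python
# Minimal sketch of an MPI duplicate screen: flag registration records that
# share a date of birth and have very similar names for human review.
# Field names and the similarity threshold are illustrative assumptions.
from difflib import SequenceMatcher
from itertools import combinations

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def potential_duplicates(patients: list[dict], threshold: float = 0.85) -> list[tuple]:
    flagged = []
    for p1, p2 in combinations(patients, 2):
        if p1["mrn"] == p2["mrn"]:
            continue  # same medical record number: not a duplicate pair
        if p1["dob"] == p2["dob"] and name_similarity(p1["name"], p2["name"]) >= threshold:
            flagged.append((p1["mrn"], p2["mrn"]))
    return flagged

# Example: "Jon Smith" and "John Smith" with the same DOB would be flagged
# for a registrar to confirm or reject as an MPI duplicate.
```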

Assembling an accurate, complete, longitudinal view of a patient’s record is comparable to assembling a giant jigsaw puzzle.  Pieces of that puzzle are scattered widely across the individual systems and points of patient contact within a complex web of hospitals, outpatient clinics, and physician offices.  Moreover, accurately linking them to their rightful owner requires the consolidation and correction of the aforementioned MPI errors.  To accomplish this task, every hospital nationwide must either implement an MPI solution directly, hire a third party to clean up “dirty” MPI and related data, or implement some other reliable and verifiable approach.  Otherwise, these fundamental uncertainties will continue to hamper the effective and efficient delivery of the core clinical services of the extended health system.

Unfortunately, for most healthcare systems this isn’t simply a one-time clean-up job; the challenge of maintaining the data integrity of the MPI has just begun.  That’s because neither an identity resolution solution, nor MPI software technology, nor a one-time clean-up will address the root causes of these MPI errors on its own.  In the great majority of cases, more fundamental issues underlie the MPI data problem, such as flawed registration procedures; inadequate or poorly trained staff; naming conventions that vary from one operational setting or culture to another; widespread use of nicknames; and even confusion caused by name changes due to marriages and divorces, or simple misspelling.

To address these challenges, institutions must combine both an MPI technology solution, which includes human intervention, and the reengineering of patient registration processes or other points of contact where patient demographics are captured or updated.  Unless these two elements are in place, providers’ ability to improve patient safety and quality of care will be impaired because the foundation underpinning the MPI will slowly deteriorate.

Another solution is the use of data profiling software tools.  These tools identify common patterns of data errors, including erroneous data entry, to focus and drive needed revisions or other improvements in business processes.  Effective data profiling tools can run automatically, using business rules to surface the exceptions, the inaccurate data that needs to be addressed.  As the number of individual health facts increases for each patient, the need for automating data accuracy will continue to grow, and the extended health system will need to address these issues.

When healthcare providers make critical patient care decisions, they need to have confidence in the accuracy and integrity of the electronic data.  Instead of a physician or nurse having to assemble and scan dozens of electronic patient records in order to catch a medication error or an overlooked allergy, these data profiling tools can scan thousands of records, apply business rules to identify the critical data inaccuracies, including missing or incomplete data elements, and notify the right people to take action to correct them.
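A minimal sketch of that kind of rules-based scan follows, assuming simplified record dictionaries and two illustrative rules; the field names and thresholds are hypothetical, not taken from any particular EMR.

```python
# Minimal sketch of a rules-based profiling pass over patient records:
# each rule returns a problem description or None, and exceptions are
# collected for follow-up. Record fields and rules are illustrative.
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]

def missing_allergy_list(rec: dict) -> Optional[str]:
    return "Allergy list is missing" if not rec.get("allergies") else None

def weight_out_of_range(rec: dict) -> Optional[str]:
    w = rec.get("weight_kg")
    if w is None:
        return "Weight not recorded"
    return "Weight outside plausible range" if not (0.4 <= w <= 400) else None

RULES: list[Rule] = [missing_allergy_list, weight_out_of_range]

def scan(records: list[dict]) -> list[tuple[str, str]]:
    exceptions = []
    for rec in records:
        for rule in RULES:
            problem = rule(rec)
            if problem:
                exceptions.append((rec["mrn"], problem))
    return exceptions  # route these to the right people for correction
```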

In the age of computer-based medical records, electronic data accuracy has become a key element of patient safety, every bit as critical as data completeness.  What better way to manage data accuracy than with smart electronic tools for data profiling?  Who knows?  The life you save or improve may be your own.

Organizational Transparency: Introduce Your IT Team to the Business Users

If you thought siloed data was a problem in healthcare, well, you’re right. There are tremendous opportunities to improve this fundamental problem in ORs, ERs, and units in hospitals of all shapes and sizes. A majority of healthcare CIOs agreed, identifying it as the top tech trend on their radar for 2010. But more and more large healthcare organizations are realizing it’s not just the disparate data scattered across the technical landscape that’s causing headaches; it’s siloed departments as well. “Dr. Smith, meet Ryan, the head of clinical decision support.”

I have personally been a part of those awkward conversations, which, as a consultant, are never fun: the moments when you are engaged with a client and become the person who introduces a clinician (physician, surgeon, or charge nurse) from the business side to their counterpart [within the same organization] on the IT side (Manager of Data Warehouse, Director of Clinical Decision Support). The first thing that enters my mind (and I hope theirs) is, “How have you two not met before today?” Unfortunately, these introductions continue to happen, and with higher frequency than anyone would like to admit.

The role of the healthcare CIO has changed; the qualifications for a successful CIO now demand a strong understanding of the business in which they operate. Ben Williams, CIO of Catholic Healthcare West and its 42-hospital enterprise, said it best: “there is a greater demand on CIOs to be business leaders and innovators and know the business and know the challenges and parameters.” One way for organizations to improve this cross-functional understanding and ensure coordination between business and technical leaders is to embed their integration in the guiding principles of enterprise data governance.  The demand for clinical analysts who can bridge this gap has never been higher; beyond these types of resources, the people on the front lines who have proprietary knowledge of clinical workflows, applications, and technical infrastructure must be challenged to expand their expertise outside of their direct responsibilities, regardless of the side of the organization in which they currently sit [comfortably].

The Many Costs Associated with Lack of Transparency

“So what if Bill from IT doesn’t know David, the Director of the OR! I run a huge organization, not everyone knows everyone else.” Wrong attitude; let me quantify the costs associated with this lack of resource coordination:

  • Lack of both clinical and technical requirements creates project re-work that misses deadlines, lengthens implementations and extends “Go-Lives”.
  • User dissatisfaction with applications, user interfaces, and system capabilities from inconsistent education – pockets of expertise littered amongst a sea of novice users underutilizing the apps
  • Distrust of technical/clinical counterparts and the data/information within the systems
    • Dr: “Why can’t it just work? I want IT to be like the lights; if I turn it on it should work.”
    • IT: “Why can’t the users just learn how to use the system correctly?”
  • Failed projects leave a long-term impact on the users/staff involved
  • Diminished return on investment

How will you ensure your clinical decision support staff understands the clinical requirements for near-real-time reporting of data related to quality, performance, and compliance? How can you ensure your clinical staff are proficient in the use of your most recent system implementation in the OR, floor unit, or ED? What argument must you articulate to the naysayers and critics amongst your anesthesiologists, surgeons, and nurses when they ask, “Why do we have to move from paper to automation?” If the answer is not consistent from both sides of the house, business and IT, the message is lost, and the battle for your end users becomes harder and harder to win as each new initiative is rolled out.

And we know one thing: paper in healthcare is like pleather, shoulder pads, and mullets in fashion…if it makes a comeback, we’re all in serious trouble.

Driving Value from Your Healthcare Analytics Program – Key Program Components

If you are a healthcare provider or payer organization contemplating an initial implementation of a Business Intelligence (BI) Analytics system, there are several areas to keep in mind as you plan your program.  The following key components appear in every successful BI Analytics program.  And the sooner you can bring focus and attention to these critical areas, the sooner you will improve your own chances for success.

Key Program Components

Last time we reviewed the primary, top-level technical building blocks.  However, the technical components are not the starting point for these solutions.  Technical form must follow business function.  The technical components come to life only when the primary mission and drivers of the specific enterprise are well understood.  And these must be further developed into a program for defining, designing, implementing and evangelizing the needs and capabilities of BI and related analytics tuned to the particular needs and readiness of the organization.

Key areas that require careful attention in every implementation include the following:

We have found that healthcare organizations (and solution vendors!) have contrasting opinions on how best to align the operational data store (ODS) and enterprise data warehouse (EDW) portions of their strategy with the needs of their key stakeholders and constituencies.  The “supply-driven” approach encourages a broad-based uptake of virtually all data that originates from one or more authoritative source system, without any real pre-qualification of the usefulness of that information for a particular purpose.  This is the hope-laden “build it and they will come” strategy.  Conversely, the “demand-driven” approach encourages a particular focus on analytic objectives and scope, and uses this focus to concentrate the initial data uptake to satisfy a defined set of analytic subject areas and contexts.  The challenge here is to not so narrowly focus the incoming data stream that it limits related exploratory analysis.

For example, a supply-driven initiative might choose to tap into an existing enterprise application integration (EAI) bus and siphon all published HL7 messages into the EDW or ODS data collection pipe.  The proponents might reason that if these messages are being published on an enterprise bus, they should be generally useful; and if they are reasonably compliant with the HL7 RIM, their integration should be relatively straightforward.  However, their usefulness for a particular analytic purpose would still need to be investigated separately.

Conversely, a demand-driven project might start with a required set of representative analytic question instances or archetypes, and drive the data sourcing effort backward toward the potentially diverging points of origin within the business operations.  For example, a surgical analytics platform to discern patterns between or among surgical cost components, OR schedule adherence, outcomes variability, payer mix, or the impact of specific material choices would depend on specific data elements that might originate from potentially disparate locations and settings.  The need here is to ensure that the data sets required to support the specific identified analyses are covered; but the collection strategy should not be so exclusive that it prevents exploration of unanticipated inquiries or analyses.

I’ll have a future blog topic on a methodology we have used successfully to progressively decompose, elaborate and refine stakeholder analytic needs into the data architecture needed to support them.

In many cases, a key objective for implementing healthcare analytics will be to bring focus to specific areas of enterprise operations: to drive improvements in quality, performance or outcomes; to drive down costs of service delivery; or to increase resource efficiency, productivity or throughput, while maintaining quality, cost and compliance.  A common element in all of these is a focus on process.  You must identify the specific processes (or workflows) that you wish to measure and monitor.  Any given process, however simple or complex, will have a finite number of “pulse points,” any one of which will provide a natural locus for control or analysis to inform decision makers about the state of operations and progress toward measured objectives or targets.  These loci become the raw data collection points, where the primary data elements and observations (and accompanying meta-data) are captured for downstream transformation and consumption.

For example, if a health system is trying to gain insight into opportunities for flexible scheduling of OR suites and surgical teams, the base level data collection must probe into the start and stop times for each segment in the “setup and teardown” of a surgical case, and all the resource types and instances needed to support those processes.  Each individual process segment (i.e. OR ready/busy, patient in/out, anesthesia start/end, surgeon in/out, cut/close, PACU in/out, etc.) has distinct control loci the measurement of which comprises the foundational data on which such analyses must be built.  You won’t gain visibility into optimization opportunities if you don’t measure the primary processes at sufficient granularity to facilitate inquiry and action.
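For illustration, here is a minimal sketch of capturing each process segment as its own timestamped event so durations can be analyzed at that granularity. The segment names echo the examples above, and the schema itself is an assumption rather than a prescribed model.

```python
# Minimal sketch: capture each surgical-case process segment as its own
# timestamped event so downstream analysis can work at segment granularity.
# Segment names mirror the examples above; the schema is an assumption.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SegmentEvent:
    case_id: str
    segment: str            # e.g. "patient_in", "anesthesia", "cut_close", "pacu"
    start: datetime
    end: datetime

    @property
    def duration_minutes(self) -> float:
        return (self.end - self.start).total_seconds() / 60.0

events = [
    SegmentEvent("case-001", "anesthesia", datetime(2011, 3, 7, 7, 5), datetime(2011, 3, 7, 7, 35)),
    SegmentEvent("case-001", "cut_close", datetime(2011, 3, 7, 7, 40), datetime(2011, 3, 7, 9, 10)),
]
for e in events:
    print(e.case_id, e.segment, round(e.duration_minutes, 1), "min")
```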

Each pulse point reveals a critical success component in the overall operation.  Management must decide how each process will be measured, and how the specific data to be captured will enable both visibility and action.  Visibility that the specific critical process elements being performed are within tolerance and on target; or that they are deviating from a standard or plan and require corrective action.  And the information must both enable and facilitate focused action that will bring performance and outcomes back into compliance with the desired or required standards or objectives.

A key aspect of metric design is defining the needed granularity and dimensionality.  The former ensures the proper focus and resolution on the action needed.  The latter facilitates traceability and exploration into the contexts in which performance and quality issues arise.  If any measured areas under-perform, the granularity and dimensionality will provide a focus for appropriate corrective actions.  If they achieve superior performance, they can be studied and characterized for possible designation as best practices.

For example, how does a surgical services line that does 2500 total knees penetrate this monolithic volume and differentiate these cases in a way that enables usable insights and focused action?  The short answer is to characterize each instance to enable flexible-but-usable segmentation (and sub-segmentation); and when a segment of interest is identified (under-performing; over-performing; or some other pattern), the n-tuple of categorical attributes that was used to establish the segment becomes a roadmap defining the context and setting for the action: either corrective action (i.e. for deviation from standard) or reinforcing action (i.e. for characterizing best practices).  So, dimensions of surgical team, facility, care setting, procedure, implant type and model, supplier, starting ordinal position, day of week, and many others can be part of your surgical analytics metrics design.
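To show the mechanics of that segmentation, here is a minimal sketch using pandas. The columns and sample numbers are invented; the point is that the grouping keys form the n-tuple that defines each segment and becomes the roadmap for action.

```python
# Minimal sketch: segment total-knee cases by an n-tuple of categorical
# dimensions and compare a metric (cost per case) across segments.
# Column names and the example data frame are illustrative assumptions.
import pandas as pd

cases = pd.DataFrame({
    "surgeon":       ["A", "A", "B", "B", "B"],
    "implant_model": ["X1", "X1", "X1", "Z9", "Z9"],
    "day_of_week":   ["Mon", "Tue", "Mon", "Fri", "Fri"],
    "cost":          [11200, 11850, 10400, 13900, 14350],
})

dimensions = ["surgeon", "implant_model", "day_of_week"]  # the n-tuple
segments = cases.groupby(dimensions)["cost"].agg(["count", "mean"]).reset_index()

# Segments whose mean cost deviates sharply from the overall mean become
# the focus for corrective or reinforcing action described above.
overall = cases["cost"].mean()
segments["pct_vs_overall"] = 100 * (segments["mean"] - overall) / overall
print(segments.sort_values("pct_vs_overall", ascending=False))
```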

Each metric must ultimately be deconstructed into the specific raw data elements, observations and quantities (and units) that are needed to support the computation of the corresponding metric.  This includes the definition, granularity and dimensionality of each data element; its point of origin in the operation and its position within the process to be measured; the required frequency for its capture and timeliness for its delivery; and the constraints on acceptable values or other quality standards to ensure that the data will reflect accurately the state of the operation or process, and will enable (and ideally facilitate) a focused response once its meaning is understood.

An interesting consideration is how to choose the source for a collected data element when multiple legitimate sources exist (this issue spills over into data governance; see below), and what rules are needed to arbitrate such conflicts.  Arbitration can be based on: whether each source is legitimately designated as authoritative; where each conflicting (or overlapping) data element (and its contents) resides in a life cycle that impacts its usability; what access controls or proprietary rights pertain to the specific instance of data consumption; and the purpose for or context in which the data element is obtained.  Resolving these conflicts is not always as simple as designating a single authoritative source.
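One way to picture such arbitration rules is a small scoring function that ranks candidate sources by authority, permitted purpose, life-cycle stage, and recency. This is only a sketch under assumed field names and weights, not a recommended precedence scheme.

```python
# Minimal sketch: arbitrate among candidate sources for one data element
# by scoring authority, permitted purpose, lifecycle stage, and recency.
# The scoring order and field names are illustrative assumptions.
def arbitrate(candidates: list[dict], purpose: str) -> dict:
    def score(c: dict) -> tuple:
        return (
            1 if c["is_authoritative"] else 0,             # designated source wins first
            1 if purpose in c["allowed_purposes"] else 0,  # access/consent for this use
            {"active": 2, "archived": 1, "draft": 0}[c["lifecycle"]],
            c["as_of"],                                    # most recent value breaks ties
        )
    return max(candidates, key=score)

candidates = [
    {"system": "registration", "is_authoritative": True,  "lifecycle": "active",
     "allowed_purposes": {"analytics", "billing"}, "as_of": "2011-03-01", "value": "O+"},
    {"system": "lab",          "is_authoritative": False, "lifecycle": "active",
     "allowed_purposes": {"analytics"},            "as_of": "2011-03-04", "value": "O+"},
]
print(arbitrate(candidates, "analytics")["system"])  # -> registration
```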

Controlling data quality at its source is essential.  All downstream consumers and transformation operations are critically dependent on the quality of each data element at its point of origin or introduction into the data stream.  Data cleansing becomes much more problematic if it occurs downstream of the authoritative source, during subsequent data transformation or data presentation operations.  Doing so effectively allows data to “originate” at virtually any position in the data stream, making traceability and quality tracking more difficult and increasing the burden of holding the data that originates at these various points to the quality standard.  On the other hand, downstream consumers may have little or no influence or authority to impose data cleansing or capture constraints on those who actually collect the data.

Organizations are often unreceptive to the suggestion that their data may have quality issues.  “The data’s good.  It has to be; we run the business on it!”  Although this might be true, when you remove data from its primary operating context, and attempt to use it for different purposes such as aggregation, segmentation, forecasting and integrated analytics, problems with data quality rise to the surface and become visible.

Elements of data quality include accuracy, integrity, timeliness, timing and dynamics, clear semantics, and rules for capture, transformation, and distribution.  Your strategy must include establishing, and then enforcing, definitions, measures, policies, and procedures to ensure that your data meets the necessary quality standards.

The data architecture must anticipate the structure and relationships of the primary data elements, including the required granularity, dimensionality, and alignment with other identifying or describing elements (e.g. master and reference data); and the nature and positioning of the transformation and consumption patterns within the various user bases.

For example, to analyze the range in variation of maintaining schedule integrity in our surgical services example, for each case we must capture micro-architectural elements such as the scheduled and actual start and end times for each critical participant and resource type (e.g. surgeon, anesthesiologist, patient, technician, facility, room, schedule block, equipment, supplies, medications, prior and following case, etc.), each of which becomes a dimension in the hierarchical analytic contexts that will reveal and help to characterize where under-performance or over-performance are occurring.  The corresponding macro-architectural components will address requirements such as scalability, distinction between retrieval and occurrence latency, data volumes, data lineage, and data delivery.
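As a small illustration of the micro-level capture described here, a sketch that computes the scheduled-versus-actual start variance per resource per case; the field names and sample rows are assumptions.

```python
# Minimal sketch: compute schedule variance (actual vs. scheduled start) per
# resource per case, the micro-level element described above. Field names
# and the sample rows are illustrative assumptions.
from datetime import datetime

def start_variance_minutes(row: dict) -> float:
    return (row["actual_start"] - row["scheduled_start"]).total_seconds() / 60.0

rows = [
    {"case_id": "case-001", "resource": "surgeon",
     "scheduled_start": datetime(2011, 3, 7, 7, 30), "actual_start": datetime(2011, 3, 7, 7, 42)},
    {"case_id": "case-001", "resource": "anesthesiologist",
     "scheduled_start": datetime(2011, 3, 7, 7, 15), "actual_start": datetime(2011, 3, 7, 7, 18)},
]

for r in rows:
    print(r["case_id"], r["resource"], f"{start_variance_minutes(r):+.0f} min")
# Aggregating these deltas by surgeon, room, block, or day of week reveals
# where schedule integrity is breaking down.
```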

By the way: none of this presumes a “daily batch” system.  Your data architecture might need to anticipate and accommodate complex hybrid models for federating and staging incremental data sets to resolve unavoidable differences in arrival dynamics, granularity, dimensionality, key alignment, or perishability.  I’ll have another blog on this topic, separately.

You should definitely anticipate that the incorporation and integration of additional subject areas and data sets will increase the value of the data; in many instances, far beyond that for which it was originally collected.  As the awareness and use of this resource begins to grow, both the value and sensitivity attributed to these data will increase commensurately.  The primary purpose of data governance is to ensure that the highest quality data assets obtained from all relevant sources are available to all consumers who need them, after all the necessary controls have been put in place.

Key components of an effective strategy are the recognition of data as an enterprise asset; the designation of authoritative sources; commitment to data quality standards and processes; and recognition that data proceeds through a life cycle of origination, transformation, and distribution, with varying degrees of ownership, stewardship, and guardianship, on its way to various consumers for various purposes.  Specific characteristics, such as the level of aggregation, the degree of protection required (e.g. PHI), the need for de-identification and re-identification, the designation of “snapshots” and “versions” of data sets, and the constraints imposed by proprietary rights, will all impact the policies and governance structures needed to ensure proper usage of this critical asset.

Are you positioned for success?

Successful implementation of BI analytics requires more than a careful selection of technology platforms, tools, and applications.  The selection of technical components should ideally follow the definition of the organization’s needs for these capabilities.  The program components outlined here are a good start on the journey to embedded analytics, proactively driving the desired improvement throughout your enterprise.

Data Profiling: The BI Grail

In Healthcare analytics, as in analytics for virtually all other businesses, the landscape facing the Operations, Finance, Clinical, and other organizations within the enterprise is almost always populated by a rich variety of systems which are prospective sources for decision support analysis.   I propose that we insert into the discussion some ideas about the inarguable value of, first, data profiling, and second, a proactive data quality effort as part of any such undertaking.

Whether done from the ground up or when the scope of an already successful initial project is envisioned to expand significantly, all data integration/warehousing/business intelligence efforts benefit from the proper application of these disciplines and the actions taken based upon their findings, early, often, and as aggressively as possible.

I like to say sometimes that in data-centric applications, the framework and mechanisms which comprise a solution are actually even more abstract in some respects than traditional OLTP applications because, up to the point at which a dashboard or report is consumed by a user, the entire application virtually IS the data, sans bells, whistles, and widgets which are the more “material” aspects of GUI/OLTP development efforts:

  • Data entry applications, forms, websites, etc. all exist generally outside the reach of the project being undertaken.
  • Many assertions and assumptions are usually made about the quality of that data.
  • Many, if not most, of those turn out not to be true, or at least not entirely accurate, despite the very earnest efforts of all involved.

What this means in terms of risk to the project cannot be overstated.   Because it is largely unknown in most instances, it obviously can neither be qualified nor quantified.   It often turns what seems, on the face of it, to be a relatively simple “build machine X” with gear A, chain B, and axle C project into “build machine X” with gear A (with missing teeth), chain B (not missing any links, but definitely rusty and needing some polishing), and axle C (which turns out not to even exist, though it is much discussed, maligned, or even praised depending upon who is in the room and how big the company is).

Enter The Grail.   If there is a Grail in data integration and business intelligence, it may well be data profiling and quality management, on its own or as a precursor to true Master Data Management (if that hasn’t already become a forbidden term for your organization due to past failed tries at it).

Data Profiling gives us a pre-emptive strike against our preconceived notions about the quality and content of our data.   It not only gives us quantifiable metrics by which to measure and modify our judgment of the task before us, but frequently results in various business units immediately scrambling to improve upon what they honestly did not realize was so flawed.
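For a flavor of what those quantifiable metrics look like, here is a minimal profiling sketch that reports null rates, distinct counts, and sample values per column; the input rows are invented, and real profiling tools go much further (patterns, ranges, cross-field rules).

```python
# Minimal sketch of column-level data profiling: for each column, report the
# null rate, distinct-value count, and a few sample values. The input rows
# are illustrative assumptions.
from collections import defaultdict

def profile(rows: list[dict]) -> dict:
    stats = defaultdict(lambda: {"nulls": 0, "values": set(), "total": 0})
    for row in rows:
        for col, val in row.items():
            s = stats[col]
            s["total"] += 1
            if val in (None, ""):
                s["nulls"] += 1
            else:
                s["values"].add(val)
    return {
        col: {
            "null_pct": round(100 * s["nulls"] / s["total"], 1),
            "distinct": len(s["values"]),
            "samples": sorted(map(str, s["values"]))[:3],
        }
        for col, s in stats.items()
    }

rows = [
    {"mrn": "1001", "gender": "F", "zip": "02110"},
    {"mrn": "1002", "gender": "",  "zip": "2110"},   # missing gender, malformed zip
    {"mrn": "1002", "gender": "F", "zip": "02110"},  # duplicate mrn
]
print(profile(rows))
```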

Data quality efforts, following comprehensive profiling and whatever proactive quality correction is possible, give a project the chance to fix problems without changing source systems per se, and to do so before the business intelligence solution becomes either a burned-out husk on the side of the EPM highway (failed because of poor data) or, at the least, a de facto data profiling tool in its own right, coughing out whatever data doesn’t work instead of serving its intended purpose: to deliver key business performance information based upon a solid data foundation in which all have confidence.

The return on investment for such an effort is measurable, sustainable, and so compelling as an argument that no serious BI undertaking, large or small, should go forward without it.   Whether in Healthcare, Financial Services, Manufacturing, or another vertical,  its value is, I submit, inarguable.

Data Darwinism – Evolving your data environment

In my previous posts, the concept of Data Darwinism was introduced, as well as the types of capabilities that allow a company to set itself apart from its competition.   Data Darwinism is the practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.   If you take an honest and objective look at how and why you are using data, you might find out that you are on the wrong side of the equation.  So the question is “how do I move up the food chain?”

The goal of evolving your data environment is to change from using your data in a reactionary manner and just trying to survive, to proactively using your data as a foundational component to constantly innovate to create a competitive advantage.

The plan is simple on the surface, but not always so easy in execution.   It requires an objective assessment of where you are compared to where you need to be, a plan/blueprint/roadmap to get from here to there, and flexible, iterative execution.

Assess

As mentioned before, taking an objective look at where you are compared to where you need to be is the first critical step.  This is often an interesting conversation among different parts of the organization that have competing interests and objectives. Many organizations can’t get past this first step. People get caught up in politics and self-interest and lose sight of the goal: to move the organization forward into a position of competitive advantage. Other organizations don’t have the in-house expertise or discipline to conduct the assessment. However, until this step is complete, you remain vulnerable to organizations that have moved past it.

Plan

Great, now you’ve done the assessment; you know what your situation is and what your strengths and weaknesses are.  But without a roadmap for how to get to your data utopia, you’re going nowhere.   The roadmap is really a blueprint of inter-related capabilities that need to be implemented incrementally over time to constantly move the organization forward.   Now, I’ve seen this step end very badly for organizations that make some fundamental mistakes.  They try to do too much at once.  They make the roadmap too rigid to adapt to changing business needs.   They take a form-over-substance approach.  All of these can be fatal to an organization.   The key to the roadmap is three-fold:

  • Flexible – This is not a sprint.   Evolving your data environment takes time.   Your business priorities will change, the external environment in which you operate will change, etc.   The roadmap needs to be flexible enough to enable it to adapt to these types of challenges.
  • Aligned – There will be an impulse to move quickly and do everything at once.   That almost never works.   It is important to align the priorities with the overall priorities of the organization.
  • Realistic – Just as you had to take an objective, and possibly painful, look at where you were with respect to your data, you have to take a similar look at what can be done given any number of constraints all organizations face.   Funding, people, discipline, etc. are all factors that need to be considered when developing the roadmap.   In some cases, you might not have the internal skill sets necessary and have to leverage outside talent.   In other cases, you will have to implement new processes, organizational constructs and enabling technologies to enable the movement to a new level.  

Execute Iteratively

The capabilities you need to implement will build upon each other and it will take time for the organization to adapt to the changes.   Taking an iterative approach that focuses on building capabilities based on the organization’s business priorities will greatly increase your chance of success.  It also gives you a chance to evaluate the capabilities to see if they are working as anticipated and generating the expected returns.   Since you are taking an iterative approach, you have the opportunity to make the necessary changes to continue moving forward.

The path to innovation is not always an easy one.   It requires a solid, yet flexible, plan to get there and persistence to overcome the obstacles that you will encounter.   However, in the end, it’s a journey well worth the effort.

Data Darwinism – Capabilities that provide a competitive advantage

In my previous post, I introduced the concept of Data Darwinism, which states that for a company to be the ‘king of the jungle’ (and remain so), they need to have the ability to continually innovate.   Let’s be clear, though.   Innovation must be aligned with the strategic goals and objectives of the company.   The landscape is littered with examples of innovative ideas that didn’t have a market.  

So that begs the question “What are the behaviors and characteristics of companies that are at the top of the food chain?”    The answer to that question can go in many different directions.   With respect to Data Darwinism, the following hierarchy illustrates the categories of capabilities that an organization needs to demonstrate to truly become a dominant force.

Foundational

The impulse will be for an organization to want to immediately jump to implementing capabilities that they think will allow them to be at the top of the pyramid.   And while this is possible to a certain extent, you must put in place certain foundational capabilities to have a sustainable model.     Examples of capabilities at this level include data integration, data standardization, data quality, and basic reporting.

Without clean, integrated, accurate data that is aligned with the intended business goals, the ability to implement the more advanced capabilities is severely limited.    This does not mean that all foundational capabilities must be implemented before moving on to the next level.  Quite the opposite actually.   You must balance the need for the foundational components with the return that the more advanced capabilities will enable.

Transitional

Transitional capabilities are those that allow an organization to move from siloed, isolated, often duplicative efforts to a more ‘centralized’ platform from which to leverage their data.    Capabilities at this level of the hierarchy start to migrate towards an enterprise view of data and include such things as a more complete, integrated data set, increased collaboration, basic analytics, and ‘coordinated governance’.

Again, you don’t need to fully instantiate the capabilities at this level before building capabilities at the next level.   It continues to be a balancing act.

Transformational

Transformational capabilities are those that allow the company to start to truly differentiate itself from its competition.   They don’t fully deliver the innovative capabilities that set a company head and shoulders above others, but rather set the stage for them.   This stage can be challenging for organizations, as it can require a significant change in mind-set compared to the current way the company conducts its operations.   Capabilities at this level of the hierarchy include more advanced analytical capabilities (such as true data mining), targeted access to data by users, and ‘managed governance’.

Innovative

Innovative capabilities are those that truly set a company apart from its competitors.   They allow for innovative product offerings, unique methods of handling the customer experience and new ways in which to conduct business operations.   Amazon is a great example of this.   Their ability to customize the user experience and offer ‘recommendations’ based on a wealth of user buying  trend data has set them apart from most other online retailers.    Capabilities at this level of the hierarchy include predictive analytics, enterprise governance and user self-service access to data.

The bottom line is that moving up the hierarchy requires vision, discipline and a pragmatic approach.   The journey is not always an easy one, but the rewards more than justify the effort.

Check back for the next installment of this series “Data Darwinism – Evolving Your Data Environment.”

Data Darwinism – Are you on the path to extinction?

Most people are familiar with Darwinism.  We’ve all heard the term survival of the fittest.   There is even a humorous take on the subject with the annual Darwin Awards, given to those individuals who have removed themselves from the gene pool through, shall we say, less than intelligent choices.

Businesses go through ups and downs, transformations, up-sizing/down-sizing, centralization/ decentralization, etc.   In other words, they are trying to adapt to the current and future events in order to grow.   Just as in the animal kingdom, some will survive and dominate, some will not fare as well.   In today’s challenging business environment, while many are trying to merely survive, others are prospering, growing and dominating.  

So what makes the difference between being the king of the jungle and being prey?   The ability to make the right decisions in the face of uncertainty.     This is often easier said than done.   However, at the core of making the best decisions is making sure you have the right data.   That brings us back to the topic at hand:  Data Darwinism.   Data Darwinism can be defined as:

“The practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.”

When asked to assess where they are on the Data Darwinism continuum, many companies will say that they are at the top of the food chain, that they are very fast at getting data to make decisions, that they don’t see data as a problem, etc.   However, when truly asked to objectively evaluate their situation, they often come up with a very different, and often frightening, picture. 

  It’s as simple as looking at your behavior when dealing with data:

If you find yourself exhibiting more of the behaviors on the left side of the picture above, you might be a candidate for the next Data Darwin Awards.

Check back for the next installment of this series “Data Darwinism – Capabilities that Provide a Competitive Advantage.”