There is no industry with greater potential to benefit from the disruptive power of data than healthcare. Massive investments in electronic health record applications are producing enormous amounts of deep clinical and outcomes data. The emergence of social media allows insights into the perceptions and behaviors of individuals that were unthinkable even a few years ago. Growing innovation in connected medical devices, both in the home and in traditional care settings, enables new models of data collection and intervention. And the rapid move to value-based reimbursement models is serving as a catalyst for healthcare organizations to use data and analytics to improve efficiency and quality while reducing costs.
The pressing need for healthcare organizations to become data-driven — to use data and analytics effectively to become more efficient and agile — is driving them to behave much more like early adopters of technology when it comes to analytics. This is a refreshing change from the long-standing generalization that healthcare organizations adopt information technologies a decade or more after other industries. There are many examples where this generalization holds, including healthcare providers’ historical reliance on paper medical records, only recently replaced with electronic medical records, and payers’ antiquated claims processing systems and call center applications.
Driving this willingness to become an early adopter of technology for analytics is the knowledge that implementing a traditional enterprise data warehouse (EDW) is a lengthy, expensive journey fraught with logistical and organizational challenges, and that even a successful EDW will not be agile enough to keep up with a rapidly evolving healthcare market, business models and data landscapes. In no way does this observation diminish the value of an EDW as a centralized repository of trustworthy data that can be used broadly across the enterprise by a wide variety of users with differing levels of expertise. The simple fact is that the business of healthcare is evolving so quickly – and the analytic requirements and the need to extract value from data are so urgent – that the years-long process of creating an EDW can no longer be the only way for healthcare organizations to get value from their data.
Understanding the limitations of the traditional EDW is therefore key to understanding why healthcare organizations are considering a more novel path.
The enterprise data warehouse is a well-established analytic construct. Decades of experience with traditional enterprise data warehouses across a variety of industries have led to a consistent roadmap for mitigating risk and achieving value. This roadmap starts with the analytic questions that need to be answered, then pulls the logical thread to identify the required data, the sources of that data, how much historical data is needed, and how the data will be governed so that it carries a common meaning. This logical process then gives way to the design and implementation of the physical data architecture, including a physical database schema designed so that the queries answering those questions perform well enough to provide a good user experience. From this schema we then define the extract-transform-load (ETL) processes, the data quality rules, and an implementation project plan covering requirements definition, implementation, testing, migration to production, and all the other attendant activities required to ensure the enterprise data warehouse can successfully answer the questions it was designed to answer.
If all this sounds like a complex, time-consuming, expensive undertaking, that’s because it absolutely is. And therein lies the rub with the traditional approach to creating an enterprise data warehouse – time, money and the fact that the very approach required to mitigate risk and assure some measure of success is not agile enough to keep up with the needs of the business. While admittedly a simplification of many complex technical topics, the lengthy design and implementation cycle of the traditional EDW can be traced back to the historical performance limitations of available computing platforms. These platforms were expensive, so we went to extraordinary lengths to design for query performance and to constrain how much data was loaded into the EDW, since more data required more performance and therefore more cost. Modern EDW appliances employing a combination of massively parallel processing (MPP) and in-memory techniques have certainly mitigated some of these issues over the past decade; they allow us to run queries more quickly (sometimes dramatically so), but they did not eliminate the need for intense, time-consuming up-front design work. In a nutshell, the EDW approach makes it hard to get data into the EDW, but easy to get it out.
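The schema-on-write discipline described above can be illustrated with a minimal sketch. The table, field names and quality rule here are hypothetical (and SQLite stands in for a real EDW platform); the point is that the target schema is fixed up front, and every row must pass transformation and validation before it ever lands in the warehouse:

```python
import sqlite3  # stand-in for a real EDW platform

# Schema-on-write: the target schema is designed before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE encounters ("
    "  patient_id TEXT NOT NULL,"
    "  admit_date TEXT NOT NULL,"
    "  total_charge REAL NOT NULL)"
)

# Raw source rows as they might arrive from an operational system.
raw_rows = [
    {"patient_id": "P001", "admit_date": "2016-03-01", "total_charge": "1250.00"},
    {"patient_id": "",     "admit_date": "2016-03-02", "total_charge": "980.50"},
]

def etl(row):
    """Transform and validate at load time; bad rows never reach the warehouse."""
    if not row["patient_id"]:
        return None  # quality rule: patient_id is required
    return (row["patient_id"], row["admit_date"], float(row["total_charge"]))

loaded = [r for r in (etl(row) for row in raw_rows) if r is not None]
conn.executemany("INSERT INTO encounters VALUES (?, ?, ?)", loaded)

# Easy to get out: consumers simply query the curated schema.
count = conn.execute("SELECT COUNT(*) FROM encounters").fetchone()[0]
print(count)  # one of the two rows was rejected at load time
```

All of the effort sits on the load side: defining the schema, writing the transformation, encoding the quality rule. The query at the end is trivial precisely because that work was done in advance.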
Next-generation analytic architectures reverse this approach, making it exceedingly easy to get data in – requiring little if any effort when data is loaded – and then applying logic and processing power when the data is consumed. This is inherently more agile, since no design effort is required to make data accessible in the analytic repository. This architectural construct is most typically labeled a “data lake” and is built on Hadoop as the underlying data storage and processing platform. To support analysts who use the data, a rapidly maturing market category of self-service data preparation tools and applications makes it possible for business and clinical analysts to manipulate data when it is used, rather than when it is loaded. These tools bring people who understand what the data means much closer to the actual data, and much sooner in the process. Rather than waiting months, quarters or sometimes years for new data to be available in the warehouse, analysts can now interact with data within hours or days of its availability, and in appropriate situations can load the data into the data lake themselves. It is also important to understand that applying effort to the data when it is consumed does not imply that this approach is shoddy or somehow less disciplined than the work of information technology professionals performing the same tasks when the data is loaded. Self-service data preparation allows the application of the same rigor and capabilities for data profiling, data quality and data transformation – it is just that these capabilities are applied by individuals who know what the data means, against only the data of interest, and in an iterative manner that speeds time to value and insight. They are accomplishing the same work that historically required teams of information technology specialists and a multitude of specialized tools.
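The schema-on-read pattern behind the data lake can be sketched just as briefly. The file layout, column names and quality rule below are hypothetical (a temporary directory stands in for a Hadoop-based lake); what matters is that raw data lands untouched, and typing, transformation and quality rules are applied by the analyst at read time, against only the data of interest:

```python
import csv
import os
import tempfile

# Getting data in is trivial: raw files land in the "lake" as-is,
# with no up-front modeling, schema design or ETL.
lake = tempfile.mkdtemp()
raw = (
    "patient_id,admit_date,total_charge\n"
    "P001,2016-03-01,1250.00\n"
    ",2016-03-02,980.50\n"
)
with open(os.path.join(lake, "encounters.csv"), "w") as f:
    f.write(raw)

def read_encounters(path):
    """Apply typing and quality rules when the data is consumed, not loaded."""
    with open(path) as f:
        for row in csv.DictReader(f):
            if not row["patient_id"]:
                continue  # the analyst's own quality rule, applied at read time
            yield {"patient_id": row["patient_id"],
                   "charge": float(row["total_charge"])}

rows = list(read_encounters(os.path.join(lake, "encounters.csv")))
print(len(rows))  # the same quality rule, now enforced on read
```

Note that the quality rule itself is identical to the load-time version; only the point of application has moved, which is exactly why this approach need not be any less disciplined.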
These emerging practices in next-generation analytic architectures are becoming more mainstream, but they remain the domain of organizations inclined to be early adopters of information technology. What we are seeing in the healthcare market is a willingness of organizations to assume an early-adopter posture out of necessity. The strengths and weaknesses of the traditional approach to the EDW are well understood: implementing an EDW follows a very predictable path, with known people, process and technology barriers, and with the EDW as a well-defined resource at the end. While that predictability is comfortable, healthcare organizations recognize that an analytics program focused on an EDW in this way will not meet their need to become data-driven. It will take too long, cost too much and ultimately not be agile enough to meet the rapidly evolving analytic needs of a healthcare market undergoing radical transformation almost month by month.
What is needed is a more agile approach to analytics and to getting value from data, and this need is driving healthcare organizations to be far more open-minded about newer technologies when it comes to analytics. I caution against applying the “big data” label to these next-generation analytic architectures; rather, I suggest that a next-generation analytic architecture represents the logical convergence of “small data” and “big data” architectures. A next-generation architecture benefits from the low cost and extreme performance and scalability of the storage and compute technologies used to address big data challenges, but applies these capabilities against more traditional types and volumes of data to achieve agility. In practice, it turns out that applying effort to data when it is consumed is also a great approach to big data, since the EDW practice of preprocessing data at load time is impractical with big data volumes and frequently unknown queries and analytic requirements.
This article is the first foray into the fast-moving, exciting topic of analytic architectures and their role in the digital transformation of healthcare. My hope is that it has generated many, many clarifying questions in the minds of readers (and probably some outright disagreement) that we can address in subsequent articles. Next-generation analytic approaches are not a panacea, and they certainly present challenges in the areas of people, process and technology. In future articles I will explore the practical aspects of adopting a next-generation analytic architecture for healthcare organizations. We will look into the implications of democratizing data and analytics through self-service; the role of the enterprise data warehouse in a next-generation architecture; how data governance must evolve, and its role in preventing data chaos; and how we can take lessons from other industries that have undergone digital transformation and apply them to healthcare.