Health Catalyst Launches Open Source Machine Learning

The use of machine learning and predictive analytics to improve health outcomes has so far been limited to highly trained data scientists, mostly in the nation’s top academic medical centers.

No longer. The new open source project is on a mission to make machine learning accessible to the thousands of healthcare professionals who have little or no data science training but who share an interest in using the technology to improve patient care. By making its central repository of proven machine learning algorithms available for free, the project enables a large, diverse group of technical healthcare professionals to quickly use machine learning tools to build accurate models. The site provides one central spot to download algorithms and tools, read documentation, request new features, submit questions, follow the blog, and contribute code.

The project was started by Health Catalyst, a leading data warehousing, analytics and outcomes improvement company that is contributing ongoing support to the open source community. Health Catalyst has used the tools to build predictive models that drive its clients’ outcomes improvement efforts and span the company’s product lines. Models include but are not limited to a predictive model for central line-associated bloodstream infection (CLABSI), readmission models for COPD and other chronic conditions, schedule optimization, and financial predictions such as patient propensity to pay.

“Machine learning and artificial intelligence are going to transform healthcare. We are seeing amazing results and yet we are barely getting started. We are applying it to the reduction of patient harm events, care management, hospital acquired infections, revenue cycle management, patient risk stratification, and more,” said Dale Sanders, Executive Vice President of Health Catalyst. “With machine learning, the data is talking to us, exposing insights that we’ve never seen before with traditional business intelligence and analytics. By open sourcing, we hope to facilitate industrywide collaboration and advance the adoption of machine learning, making it easy for healthcare organizations to learn from and enhance these tools together, without the need for a team of data scientists. All of us have seen what open source software has achieved in other industries and we want to be a part of that in healthcare.”

How it works

The project makes it easy to create predictive and pattern recognition models using a healthcare organization’s own data – and is unlike any other machine learning tool in the industry. The open source repository features packages for two common languages in healthcare data science – R and Python. These packages are designed to streamline healthcare machine learning by simplifying the workflow of creating and deploying models, and by delivering functionality specific to healthcare:

  • Pays attention to longitudinal questions
  • Offers an easy way to do risk-adjusted comparisons
  • Provides easy connections and deployment to databases
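
To make the idea of a risk-adjusted comparison concrete, here is a minimal Python sketch. It is not the packages' own API. It assumes each patient record already carries a model-predicted risk, and the field names `unit`, `predicted_risk`, and `had_event` are hypothetical. It compares units by their observed-to-expected event ratio, one common way to adjust for differences in patient mix.

```python
from collections import defaultdict


def observed_to_expected(records):
    """Return {unit: observed events / expected events} for each unit.

    Each record is a dict with hypothetical keys:
      unit           - the group being compared (e.g. a hospital unit)
      predicted_risk - model-estimated probability of the event (0..1)
      had_event      - True if the event actually occurred
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for rec in records:
        observed[rec["unit"]] += 1 if rec["had_event"] else 0
        expected[rec["unit"]] += rec["predicted_risk"]
    # Units with zero expected events are skipped to avoid division by zero.
    return {u: observed[u] / expected[u] for u in expected if expected[u] > 0}


# Illustrative, made-up records (not real patient data).
records = [
    {"unit": "A", "predicted_risk": 0.5, "had_event": True},
    {"unit": "A", "predicted_risk": 0.5, "had_event": False},
    {"unit": "B", "predicted_risk": 0.25, "had_event": True},
    {"unit": "B", "predicted_risk": 0.25, "had_event": False},
]
ratios = observed_to_expected(records)
print(ratios)
```

A ratio above 1 means a unit saw more events than its patients' predicted risk would suggest, and a ratio below 1 means fewer. Both units here have the same raw event rate, but unit B's lower-risk patients give it the higher ratio.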

Both packages provide an easy way to create models on a health system’s own data. This includes linear and random forest models, ways to handle missing data, guidance on feature selection, proper performance metrics, and easy database connections.
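
As a rough illustration of two items in that list, handling missing data and choosing a performance metric, the following Python sketch uses only the standard library. It is an assumption-laden example rather than the packages' actual implementation: the helper names `impute_mean` and `auroc` are hypothetical, mean imputation stands in for whatever strategy the packages use, and the sample values are made up.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def auroc(labels, scores):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative, made-up values.
ages = [60, None, 80, 70]
filled = impute_mean(ages)          # the missing age becomes the mean, 70.0

labels = [0, 0, 1, 1]               # 1 = event occurred (e.g. readmission)
scores = [0.1, 0.4, 0.35, 0.8]      # model-predicted risk for each patient
auc = auroc(labels, scores)
print(filled, auc)
```

The pairwise AUROC above is quadratic in the number of patients, which is fine for a demonstration. On real data, a library implementation would be the more sensible choice.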

“We believe that machine learning is too helpful and important to be handled solely by full-time data scientists,” said Sanders. “The new tools enable BI developers, data architects, and SQL developers to create appropriate and accurate models with healthcare data, without hiring a data scientist. These tools will democratize machine learning in a realm that needs it most – because everyone benefits when healthcare is made safer, more efficient and effective. And, we are not just being altruistic here. By submitting our tools and algorithms to the open source community, we and our clients will benefit from the collective intelligence that exists beyond our team of data scientists.”

Participation is simple. Interested parties can visit the site, choose either the R or Python language, read the install instructions, and follow the examples – at no cost. There is no similar platform or environment for healthcare professionals who are seeking to expand their skills and the value of machine learning to their organization.