It really all began thanks to Dr. Edison Liu’s sabbatical. At least, this is how Colleen Bushell, senior research scientist at the University of Illinois’ Applied Research Institute (ARI), explains her visual analytics team’s leap into the genomic and medical data domain.
It was 2010 and Liu, president and CEO of The Jackson Laboratory and adjunct professor of molecular and cellular biology at the University of Illinois, was in Urbana-Champaign on sabbatical and meeting with mathematicians, computer scientists, and fellow biologists at the National Center for Supercomputing Applications (NCSA). As the leader of a world-class, cutting-edge genomic research institution, Liu admitted a struggle: he always had trouble communicating the results of genetic analysis of a tumor to physicians. The information was so complex, he didn’t know where to start.
Michael Welge, long-time colleague of Bushell’s at NCSA and then-director of data-intensive technologies and applications research, heard Liu’s challenge and knew it was not unique. Welge, an early member of the Mayo Clinic and Illinois Alliance, was aware that Mayo Clinic researchers and physicians were dealing with the same difficulty—how to draw out and communicate relevant information from a flood of genomic data. The challenge aligned with the Mayo Clinic Center for Individualized Medicine’s Clinomics Program aim: to transform genomic data into applications and information to guide decisions that ultimately improve patient health care. Members of the newly formed Alliance proposed a pilot project, and Mayo Clinic provided seed funding for the work. Like that, the Alliance’s first visual analytics for precision medicine project was born.
The Gamut to Genetic Variances
Visual analytics has been a widely accepted methodology for decades. Bushell describes it simply as the marriage of information design and data science. It is an outgrowth of information visualization and scientific visualization and shares characteristics with both, but it can attack problems whose size, complexity, and need for both human and machine analysis make them otherwise unmanageable.
While Welge brings an analytical and computational point of view to projects, Bushell’s formal training is in graphic design. Prior to her role with ARI, she was a professor of graphic design at the U of I. Her interest was information design, focusing on how to communicate both static and interactive data most effectively. From 1986 to 2004, Bushell was an NCSA research affiliate in data visualization. She designed the interface for Mosaic, the first widely used graphical Web browser; helped turn numeric data from supercomputer simulations into a visual animation of a developing thunderstorm; and co-developed a visual programming interface for NCSA’s data mining software.
“We were working with big data long before it was called ‘big data,’” jokes Bushell. “Until about six to seven years ago, my knowledge of microbiology, genetics, and medicine was zero. Nothing. The first book I bought was ‘Genetics for Dummies,’ just so I could get the terminology correct. But even back in the early NCSA days, I was always working in a space like physics or atmospheric science, where it was always a collaboration and I was simply trying to understand enough to represent it accurately.”
With the formalization of the Mayo Clinic and Illinois Alliance, Bushell, Welge, and other members of their NCSA visual analytics team—mathematicians, information designers, and software developers—came together to focus their expertise on health and medical projects. The ARI team now includes Bushell, Loretta Auvil, Matt Berry, Lisa Gatzke, Peter Groves, Xiaoxia Liao, and Michael Welge.
Clearly Communicating Genetic Complexities
The visual analytics pilot project, an early collaboration between the Mayo Clinic and Illinois Alliance and the Genome Institute of Singapore (Edison Liu, founder), was a clinomics success. What started out as a few rough design concepts for presenting the genetic analysis of a tumor grew into a prototype genetic report that focused on a comprehensive diagnostic panel of 17 hereditary colon cancer genes using next-generation sequencing technologies. Previously, Mayo Clinic ran individual tests for the five genes most frequently mutated in hereditary colorectal cancer cases. Labs around the world could test additional genes that were known or suspected to play a role in the disease, but no other institution could test all 17 comprehensively.
Matt Ferber, co-director of the Clinomics Program at Mayo, enlisted the Illinois team to design a visual report for this genetic colorectal cancer panel, one that would help communicate the results of the panel to physicians. Bushell presented the report at the 2012 Individualizing Medicine Conference, sparking great interest from physicians and investigators, especially colorectal surgeon Heidi Nelson.
Expanded Visualization Projects Meet Extended Random Forest
After seeing the results of the Alliance’s visual analytics collaboration first-hand, Dr. Heidi Nelson, also a vice chair for research in the Mayo Clinic Department of Surgery and an Alliance collaborator in the microbiome research area, knew it was just the beginning. Nelson sponsored three additional Alliance visual analytics seed projects, each with different goals for the team to achieve.
“Sometimes the goal is to help understand the biological complexity of what’s happening, or build a predictive model. Sometimes it is to determine feasible courses of action, given the genetic analysis. Sometimes the goal is to create an interactive visualization tool, too,” says Bushell.
One of Nelson’s seed projects, which Bushell and the visual analytics team at Illinois ARI have completed for Dr. Jordan Miller’s heart disease research lab at Mayo Clinic, analyzed high-dimensional mRNA and miRNA data from myxomatous mitral valve heart disease (or mitral valve prolapse). Essentially the weakening or degeneration of the valve’s connective tissue, the disease is the culprit behind many a valve replacement surgery. However, despite progress in valve repair, lower mortality rates, imaging, and less invasive approaches, far too many patients undergo unnecessary valve replacement procedures. Additionally, performing open-heart surgery, only to find that the mitral valve does not need repair or replacement, is unsafe and costly. Mayo Clinic researchers wanted to search for features that could help identify degenerative heart valves.
Traditional statistical analysis reports the top ten up- and downregulated features. Welge constructed a newer analytical approach for the mitral valve project, aptly named Extended Random Forest (ERF) because it builds on the well-known Random Forest algorithm.
The method constructs hundreds of thousands of decision trees that, together, help determine the top features, and the specific combinations and strengths of those features, that are most meaningful to a disease or condition. The process is extended to include stability analysis, which is important when there are many features but small sample sizes. In this case, the ERF method found a different list of most-important features than traditional analysis had found, or than researchers had predicted. “It’s actually good that we’re not all biologists, because we present the results without any preconceived bias,” says Bushell. “With this project, the Mayo Clinic researchers were very excited, because the results showed genes that mapped to three biological pathways they believe to be relevant to this valve disease.”
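The article does not describe ERF’s internals, so the sketch below is not Welge’s method. It only illustrates the general idea in the paragraph above: rank features with a forest of trees, then check how stable that ranking is when the data are resampled. For brevity it uses one-split trees (stumps) rather than full decision trees, and it runs on synthetic data. All function names, parameters, and values are hypothetical.

```python
import random
from collections import Counter


def best_stump_feature(rows, labels, candidates):
    """Return the candidate feature whose best single threshold misclassifies fewest samples."""
    best_feature, best_errors = None, len(rows) + 1
    for f in candidates:
        for threshold in {row[f] for row in rows}:
            # Errors for the rule "class 1 when value >= threshold";
            # the min() also allows the opposite polarity.
            wrong = sum((row[f] >= threshold) != (y == 1) for row, y in zip(rows, labels))
            errors = min(wrong, len(rows) - wrong)
            if errors < best_errors:
                best_feature, best_errors = f, errors
    return best_feature


def forest_importance(rows, labels, n_trees, n_candidates, rng):
    """Count how often each feature wins the split across bootstrapped stumps."""
    wins = Counter()
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        candidates = rng.sample(range(len(rows[0])), n_candidates)  # random feature subset
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        wins[best_stump_feature(boot_rows, boot_labels, candidates)] += 1
    return wins


def stable_features(rows, labels, top_k=3, n_repeats=10, n_trees=200, seed=0):
    """Fraction of repeated runs in which each feature ranks among the top_k."""
    rng = random.Random(seed)
    n_candidates = max(1, int(len(rows[0]) ** 0.5))
    hits = Counter()
    for _ in range(n_repeats):
        # Refit on a random 80% subsample to see whether the ranking holds up.
        keep = rng.sample(range(len(rows)), (len(rows) * 4) // 5)
        wins = forest_importance([rows[i] for i in keep], [labels[i] for i in keep],
                                 n_trees, n_candidates, rng)
        for feature, _count in wins.most_common(top_k):
            hits[feature] += 1
    return {feature: hits[feature] / n_repeats for feature in hits}


def make_toy_data(n_samples=24, n_features=30, seed=1):
    """Synthetic 'many features, few samples' data; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_samples):
        y = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 5.0 * y  # shifted up in class 1
        row[1] -= 5.0 * y  # shifted down in class 1
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    data_rows, data_labels = make_toy_data()
    scores = stable_features(data_rows, data_labels)
    for feature, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"feature {feature}: in top 3 for {score:.0%} of runs")
```

A feature that ranks near the top in almost every resampled run is a more trustworthy candidate than one that appears only occasionally, which is the concern with small sample sizes that the stability step is meant to address.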
Creating New Tools for Medical Discovery and Decision-Making
Nelson is pleased with the number of projects now underway between Mayo Clinic and the Illinois visual analytics group.
“The work we’re doing together is critically important, because Colleen and her colleagues are able to take large amounts of data and work with it to create two or more levels of visualization. This is more comprehensible to a physician or patient than what we can provide alone,” says Nelson.
Currently, the Illinois team is working with Dr. Nelson to develop a microbiome research and reporting tool. It contains data from several microbiome studies in which microbial DNA has been extracted from stool samples and analyzed to identify bacteria that are relevant to various diseases. The tool uses visualization techniques to characterize the microbiome community and displays important bacteria on an interactive phylogenetic tree. Eventually, doctors could use the software to compare an individual patient’s profile with database results.
Bushell says she is particularly interested in the microbiome project on a personal level. Two of her three children have type 1 diabetes, and recent research shows a connection between the autoimmune disease and the microbiome. She says her team’s focus remains on finding ways to identify relevant features in data and to communicate intricate information in a biological context that helps people make decisions. Bushell sees Nelson as an incredibly visionary part of their visual analytics team, especially in moving these projects forward.
“What she wants to do is not just design a report of what we can find now, but push ahead to begin thinking about what we could potentially report on in five years, when we have more data,” says Bushell. “She’s asking, ‘what are the kinds of things that clinicians want to know about the microbiome?’ She is engaging and funding these projects, knowing that there is not too much clinically actionable information to report on yet. Dr. Nelson wants to get the dialogue going and move people toward a vision. Our design helps push the vision. It encourages them to say ‘we need to do this’ and ‘here’s how we can do it better.’”
“What an academic institution likes to do is find opportunities for their people to solve problems. And in health care, we have those real-world problems. So it’s a perfect marriage of having people with great expertise at Illinois filling the gaps we identify at the Mayo Clinic,” says Nelson.