Craig's Genome Data Visualizer

Summary

This is a project I worked on in November 2016. It visualizes readings from individual genomes. Each reading has a start point, an end point, and a value that indicates the strength of the match between those two points. My task was to structure the data so that a user could run fast queries against it and display the results. A typical query might be "Select all the readings between chromosome 1 offset 1,000,000 and chromosome 2 offset 500,000."
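One way to picture such a query is as a range scan over readings kept sorted by position. The sketch below is a minimal illustration under assumptions, not the project's actual code: the class RangeQuerySketch, the Reading fields, and the decision to key each reading by the chromosome and offset of its start point are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RangeQuerySketch {
    /** Hypothetical reading: keyed here by the chromosome and offset of its start point. */
    static final class Reading {
        final int chromosome;
        final long start;
        final long end;
        final double value;

        Reading(int chromosome, long start, long end, double value) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    /** Orders positions by chromosome first, then by offset within the chromosome. */
    static int comparePositions(int chrA, long offA, int chrB, long offB) {
        int byChromosome = Integer.compare(chrA, chrB);
        return byChromosome != 0 ? byChromosome : Long.compare(offA, offB);
    }

    /**
     * Returns every reading whose start position lies in the inclusive range
     * [from, to]. Assumes the list is already sorted by (chromosome, start).
     */
    static List<Reading> query(List<Reading> sorted,
                               int fromChr, long fromOff,
                               int toChr, long toOff) {
        // Binary search for the first reading at or after the lower bound.
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            Reading r = sorted.get(mid);
            if (comparePositions(r.chromosome, r.start, fromChr, fromOff) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Scan forward until a reading starts past the upper bound.
        List<Reading> hits = new ArrayList<>();
        for (int i = lo; i < sorted.size(); i++) {
            Reading r = sorted.get(i);
            if (comparePositions(r.chromosome, r.start, toChr, toOff) > 0) {
                break;
            }
            hits.add(r);
        }
        return hits;
    }
}
```

With this ordering, the example query above becomes query(readings, 1, 1_000_000L, 2, 500_000L). A sorted in-memory list stands in for the on-disk structure only to keep the sketch short.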

This program was only ever run on a toy dataset with approximately 2.9 million readings. In real life the dataset would be much larger. I was explicitly forbidden from using databases.

Results Text Display

This tab displays the results as plain text. The text may be saved to a file.

Results Graph Display

This tab plots measures of the results as a graph. You can pick which measures of the data to display (start, end, length, value) and how to display each one (on the x-axis, on the y-axis, or via color). The picture may be saved to a file.
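Picking a measure at run time can be modeled by pairing each measure with a small extractor function. The following is a sketch under assumptions rather than the program's own code: the names MeasureSketch, Reading, Measure, and extract are hypothetical, and a reading's length is assumed to be end minus start.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

class MeasureSketch {
    /** Hypothetical reading with the three stored fields. */
    static final class Reading {
        final long start;
        final long end;
        final double value;

        Reading(long start, long end, double value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    /** Each selectable measure carries the lambda that pulls it out of a reading. */
    enum Measure {
        START(r -> r.start),
        END(r -> r.end),
        LENGTH(r -> r.end - r.start), // assumed definition of length
        VALUE(r -> r.value);

        private final ToDoubleFunction<Reading> extractor;

        Measure(ToDoubleFunction<Reading> extractor) {
            this.extractor = extractor;
        }

        double of(Reading r) {
            return extractor.applyAsDouble(r);
        }
    }

    /** Extracts the chosen measure from every reading, ready to map to an axis or color. */
    static double[] extract(List<Reading> readings, Measure measure) {
        return readings.stream().mapToDouble(measure::of).toArray();
    }
}
```

A graph tab could then call extract once per channel, for example extract(results, Measure.START) for the x-axis and extract(results, Measure.VALUE) for color.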

Results Statistics Display

This tab displays statistics for one measure of the results. It shows summary statistics and a histogram for whichever measure you select (start, end, length, value). The histogram and the accompanying summary statistics may be saved to a file.
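As a rough illustration of what such a tab computes, the sketch below derives summary statistics and equal-width histogram bins from one measure's values. It is an assumption-laden example: the class HistogramSketch and its methods are hypothetical, and the original program may have chosen bins and statistics differently.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

class HistogramSketch {
    /** Count, min, max, sum, and average of the values, via the standard library. */
    static DoubleSummaryStatistics summarize(double[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    /**
     * Counts values into binCount equal-width bins spanning [min, max].
     * The maximum value is placed in the last bin.
     */
    static int[] histogram(double[] values, int binCount) {
        int[] bins = new int[binCount];
        if (values.length == 0) {
            return bins; // nothing to count
        }
        DoubleSummaryStatistics stats = summarize(values);
        double min = stats.getMin();
        double width = (stats.getMax() - min) / binCount;
        for (double v : values) {
            // If every value is identical, width is 0 and everything goes in bin 0.
            int index = width == 0 ? 0 : (int) ((v - min) / width);
            if (index >= binCount) {
                index = binCount - 1; // clamp the maximum into the last bin
            }
            bins[index]++;
        }
        return bins;
    }
}
```

The bin counts can be drawn as bars, and the same statistics object supplies the numbers shown next to the histogram.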

Programming Concepts Used

  • I used lambda expressions as selectors to extract the various measures from the readings in the results.
  • I used Java's Swing package to make the GUI.
  • Because I was forbidden from using any kind of database, I wrote a B-Tree to keep the data in sorted order. I somewhat arbitrarily capped each node at 1,024 readings. Each node is saved to disk so that it can be loaded quickly later.
  • I used threads to read the data file and build the tree structure.
  • I used enums, regular expressions, interval trees, interfaces, abstract classes, polymorphism, affine transforms, singletons, factory classes, and functional interfaces.
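The node size limit mentioned in the B-Tree item can be illustrated with the leaf-splitting step on its own. This is a simplified sketch, not the project's implementation: it covers only a single leaf that splits in half when it exceeds the cap, stores bare long keys instead of readings, and omits parent nodes and disk persistence. The names LeafSplitSketch, Leaf, and MAX_KEYS are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LeafSplitSketch {
    /** Cap per node, matching the 1,024 figure from the list above. */
    static final int MAX_KEYS = 1024;

    static final class Leaf {
        /** Keys held in ascending order. */
        final List<Long> keys = new ArrayList<>();

        /**
         * Inserts a key in sorted position. If the leaf then exceeds MAX_KEYS,
         * the upper half moves to a new right sibling, which is returned so a
         * parent node could link it in. Returns null when no split happens.
         */
        Leaf insert(long key) {
            int pos = Collections.binarySearch(keys, key);
            if (pos < 0) {
                pos = -pos - 1; // convert "not found" result to insertion point
            }
            keys.add(pos, key);
            if (keys.size() <= MAX_KEYS) {
                return null;
            }
            int mid = keys.size() / 2;
            Leaf right = new Leaf();
            right.keys.addAll(keys.subList(mid, keys.size()));
            keys.subList(mid, keys.size()).clear();
            return right;
        }
    }
}
```

In a full B-Tree the first key of the returned sibling would be pushed up into the parent, and the split could propagate upward if the parent overflowed in turn.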