Graph Exploration

A few years ago, before the explosion of the Web into popular culture, and especially before Facebook and social networking brought the concept to the surface in a directly personal context, few outside mathematics were interested in graphs.  Today, however, they have become extremely popular for making sense of the vast weave of interconnected, but irregular or incomplete, data that's become so prevalent.  Graph databases such as Neo4j have risen in the space known as NoSQL to replace traditional relational databases for certain types of problems, sometimes with incredible speedups.  However, tools like this require a level of technical arcana that makes them inaccessible or uninteresting to many.  A higher-level class of tool, and the subject of this post, is the graph visualizer.

The trailblazer in this area, still with a somewhat steep learning curve for those not comfortable with text-based work, is the venerable GraphViz.  GraphViz consumes a text file specifying nodes, their contents, and the edges between them, and outputs a beautiful (sometimes) graphic.  It can even do things like render to SVG or image maps, which seem rather quaint at this point.  Still, the most interesting and useful component of GraphViz is its library of layout heuristics, which range from treating edge weights as spring forces in a simple Hooke's-law simulation to an assortment of other techniques that have proven useful.  These live on in newer tools, thanks to GraphViz's open source licensing.
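To make the spring idea concrete, here is a minimal sketch of a Hooke's-law layout in plain Python.  It is not GraphViz's actual code, and real layout engines add refinements such as repulsion between unconnected nodes.  The function name spring_layout and its parameters (rest_length, stiffness, steps) are my own assumptions for illustration.

```python
import math

def spring_layout(positions, edges, rest_length=1.0, stiffness=0.1, steps=200):
    """Relax 2D node positions by treating every edge as a spring.

    positions: dict mapping node name -> (x, y) starting point
    edges: iterable of (node_a, node_b) pairs
    All parameter names and defaults here are illustrative assumptions.
    """
    pos = {node: list(xy) for node, xy in positions.items()}
    for _ in range(steps):
        forces = {node: [0.0, 0.0] for node in pos}
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            # Hooke's law: force is proportional to the stretch beyond rest length.
            magnitude = stiffness * (dist - rest_length)
            fx = magnitude * dx / dist
            fy = magnitude * dy / dist
            # A stretched spring pulls both ends together; a compressed one pushes.
            forces[a][0] += fx
            forces[a][1] += fy
            forces[b][0] -= fx
            forces[b][1] -= fy
        for node, (fx, fy) in forces.items():
            pos[node][0] += fx
            pos[node][1] += fy
    return {node: tuple(xy) for node, xy in pos.items()}

# Two nodes that start too far apart drift toward the rest length.
layout = spring_layout({"a": (0.0, 0.0), "b": (3.0, 0.0)}, [("a", "b")])
print(layout)
```

To honor edge weights, as the heuristic described above does, one option would be to scale stiffness per edge by its weight, so that strongly connected nodes are pulled together harder.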

Franck Cuny's map of the GitHub community, made with Gephi

One of these newer tools, and the one I'm most excited about currently, is Gephi, which aims to be "like Photoshop™ for graphs."  This builds on GraphViz's functionality via a nice graphical interface, which allows for more interactive and real-time layout options.  Graph coloring, layout, and even direct vertex and edge manipulation are possible in an exploratory editor, along with a wide variety of analytical statistics.  One of my favorite example graphs is the one pictured here, by Franck Cuny, of the connections between open-source software projects on GitHub.  He's also made views showing the connections within specific countries and programming languages, which lend themselves more to seeing the true shape of the community.  His blog post goes into greater detail and is well worth the read.

Of course, while Gephi may have depth, it still has a learning curve.  Luckily, it has companions built for broad accessibility, in the form of Flash applets such as GexfExplorer, which can be embedded directly in a web page.  This is certainly a more limited tool, but with some extra backend work by web developers, it could be a great boon to the navigation of complex subjects.  There are also some very interesting commercial offerings, such as the Java applet from TouchGraph, which allows interactive traversal of datasets from companies such as Facebook and Amazon, and the work of the amazing French company Linkfluence, certainly a leading consultant in this emerging field.

Here at Arts & Sciences, we could have uses for this too.  For instance, by setting up nodes to represent different course offerings, with edges weighted by the number of students who took or did well in both courses, we could get a nice visualization of the interconnectedness of disciplines.  For large datasets of natural language, we could use Levenshtein distance, or some variation, to find similar passages in scholarly texts, which could open new avenues of research.  Hopefully we'll be able to explore this union of aesthetics and functionality more in the future.  Personally, I think it's going to be a very interesting field.
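As a rough sketch of those two ideas, the snippet below counts shared students between courses and computes a plain Levenshtein distance.  The student IDs, course names, and function names are all made up for illustration, and weighting by grades rather than simple co-enrollment would need a different scheme.

```python
from collections import Counter
from itertools import combinations

def course_edges(enrollments):
    """Return a Counter mapping (course_a, course_b) -> number of shared students.

    enrollments: dict mapping a student ID to the courses that student took.
    """
    weights = Counter()
    for courses in enrollments.values():
        # Sorting gives every pair one canonical order, so (A, B) == (B, A).
        for a, b in combinations(sorted(set(courses)), 2):
            weights[(a, b)] += 1
    return weights

def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

# Hypothetical data: three students and three invented course codes.
enrollments = {
    "s1": ["CHEM 101", "MATH 201"],
    "s2": ["MATH 201", "CHEM 101", "PHIL 110"],
    "s3": ["PHIL 110"],
}
for (a, b), weight in sorted(course_edges(enrollments).items()):
    print(a, "--", b, "weight", weight)

print(levenshtein("kitten", "sitting"))  # 3
```

The weighted pairs could then be written out in whatever format a visualizer accepts.  For long passages of text, the character-level distance would probably be applied to words or sentences instead, since its cost grows with the product of the two lengths.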

Visual Chemistry

How does DNA work?  There's an A, and a C, and a G, and some other letter I can't readily recall, but did manage to learn to pass a test at some point.  I remember hearing about this double helix thing, the far-out, 'whoa how awesome is the universe!' kind of thing hippies dig, but eventually burn out on.  Apparently it was some transcendental cosmic beauty that made this so, but I would never be able to comprehend it fully due to my insufficient grounding in chemistry, most of which consisted of mnemonic devices for a random assortment of formulas, recounted in a voice meant to trick toy dogs and cheerleaders into thinking chemistry was fun.  Suffice it to say, the animated cartoon hillbilly DNA strand from Jurassic Park a few years prior made a much more lasting impression on my young mind.

Feynman's illustration of water molecules in steam

Recently I was speaking with an old friend of mine who didn't have these hangups about chemistry, who reminded me that at some point I had bemoaned the inability to see the microscopic processes in a satisfactory way.  This was in the stone-age days before the dot-com implosion, when we undergrads were forced to work out problems with pencil and paper, and those little chemistry diagrams just didn't appeal to my aesthetic sense, or something (I preferred square-root signs and marginal doodles).  At some point later on, when video games led me towards rigid-body simulation, I decided I'd give the fabled Feynman Lectures on Physics a perusal.  I certainly can't claim Feynman's verbiage had no impact, but what I remember most strikingly was the diagram of water molecules in steam.  Of course I'd heard the name H2O since I was a kid, but I had never connected the hexagonal shape of a snowflake to the shape of the molecule composing it.

Viewing Glucosyltransferase-SI in PyMOL

These days, we're swimming in an ocean of this information, all accessible at our fingertips.  Thanks to technology and the work of scientists all over the world, we can see the structures of DNA and other molecules.  The Worldwide Protein Data Bank is chock-full of organic molecules contributed by researchers, painstakingly codified and documented.  There's plenty of great software to work with the data, too.  For instance, the viewer pictured here is PyMOL, written in Python, but an apt-cache search chemistry on my Ubuntu system lists 47 other packages as well.  PyMOL is free and open source, designed to make animating and exploring molecules easy.  I'm still no chemist, but this lets me dive in and learn interactively, in a way that textbooks never could.  That's exciting.
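For a taste of what "painstakingly codified" means, here is a small sketch that pulls atom coordinates out of PDB-format text using only the Python standard library.  It relies on the fixed column positions of ATOM and HETATM records in the PDB file format; the function name parse_atoms is my own, and the sample record is hand-made for illustration rather than taken from any real entry.  A viewer like PyMOL does far more than this.

```python
def parse_atoms(pdb_text):
    """Extract atom name, residue, chain, and coordinates from PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only coordinate records are of interest here.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append({
            "name": line[12:16].strip(),     # columns 13-16: atom name
            "residue": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],               # column 22: chain identifier
            "x": float(line[30:38]),         # columns 31-38
            "y": float(line[38:46]),         # columns 39-46
            "z": float(line[46:54]),         # columns 47-54
        })
    return atoms

# A made-up record, assembled with format widths so the columns line up.
sample = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614
)
for atom in parse_atoms(sample):
    print(atom)
```

Because the format is column-based rather than whitespace-delimited, slicing by position is safer than splitting on spaces, since adjacent fields can run together when values are wide.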