The payoff from interactive visualization

Jock Mackinlay

Jock Mackinlay is the director of visual analysis at Tableau Software.

Jock Mackinlay of Tableau Software discusses how more of the workforce has begun to use analytics tools.

Interview conducted by Alan Morrison

PwC: When did you come to Tableau Software?

JM: I came to Tableau in 2004 out of the research world. I spent a long time at Xerox Palo Alto Research Center working with some excellent people, Stuart Card and George Robertson, who both recently retired. We worked in the area of data visualization for a long time. Before that, I was at Stanford University and did a PhD in the same area, data visualization. I received a Technical Achievement Award from the IEEE for that entire body of work in 2009. I’m one of the lucky few who have had the opportunity to take their research out into the world and into a successful company.

PwC: Our readers might appreciate some context on the whole area of interactive visualization. Is the innovation in this case task automation?

JM: There’s a significant limit to how much we can automate. It’s extremely difficult to understand what a person’s task is and what’s going on in their head. When I finished my dissertation, I chose a mixture of automated techniques and giving humans a lot of power to think with data.

And that’s the Tableau philosophy too. We want to provide people with good defaults as best we can, but also make it easy for them to make adjustments as their tasks change. When users are in the middle of looking at some data, they might change their minds about what questions they’re asking. They need to head toward that new question on the fly. No automated system is going to keep up with the stream of human thought.

PwC: Humans often don’t know themselves what question they’re ultimately interested in.

“No amount of pre-computation or work by an IT department is going to be able to anticipate all the possible ways people might want to work with data. So you need to have a flexible, human-centered approach.”

JM: Yes, it’s an iterative exploration process. You cannot know up front what question a person may want to ask today. No amount of pre-computation or work by an IT department is going to be able to anticipate all the possible ways people might want to work with data. So you need to have a flexible, human-centered approach to give people a maximal ability to take advantage of data in their jobs.

PwC: What did your research uncover that helps?

JM: Part of the innovation of the dissertation at Stanford was that the algebra enables a simple drag-and-drop interface that anyone can use. They drag fields and place them in rows and columns or whatnot. Their actions actually specify an algebraic expression that gets compiled into a database query. But they don’t need to know all that. They just need to know that they suddenly get to see their data in a visual form.
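
To make the algebra-to-query idea concrete, here is a minimal sketch; the build_query helper and simplified shelf names are assumptions, not Tableau’s actual compiler. It only illustrates how placing fields on rows and columns can specify an expression that becomes a query.

```python
# Hypothetical sketch: compiling a drag-and-drop "shelf" specification into a
# SQL query. Not Tableau's implementation; it illustrates the idea that placing
# fields on rows and columns specifies an expression that compiles to a query.

def build_query(table, rows, columns, aggregates):
    """Compile a simple shelf specification into a GROUP BY query."""
    dims = rows + columns                                  # dimensions partition the view
    measures = [f"SUM({m}) AS {m}" for m in aggregates]    # measures are aggregated
    select_list = ", ".join(dims + measures)
    group_by = ", ".join(dims)
    return f"SELECT {select_list} FROM {table} GROUP BY {group_by}"

# Dragging Region to rows and Product to columns, with Profit as the measure:
print(build_query("sales", rows=["region"], columns=["product"], aggregates=["profit"]))
# SELECT region, product, SUM(profit) AS profit FROM sales GROUP BY region, product
```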

PwC: One of the issues we run into is that user interfaces are often rather cryptic. Users must be well versed in the tool from the designer’s perspective. What have you done to make the tool less cryptic and what’s happening more explicit, so that users don’t present results that they believe answer their questions when in fact they don’t?

JM: The user experience in Tableau is that you connect to your data and you see the fields on the side. You can drag out the fields and drop them on row, column, color, size, and so forth. And then the tool generates the graphical views, so users can see the data visualization. They’re probably familiar with their data. Most people are if they’re working with data that they care about.

The graphical view by default codifies the best practices for putting data in the view. For example, if the user dragged out a profit measure and a date field, we would automatically generate a line mark and give that user a trend line view, because that’s best practice for profit varying over time.

If instead they dragged out product and profit, we would give them a bar graph view because that’s an appropriate way to show that information. If they selected a geographic field, they would get a map view because that’s an appropriate way to show geography.
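
As a rough illustration of that defaulting behavior, the sketch below picks a mark type from the type of field dragged out. The categories and rules are simplified assumptions for illustration, not Tableau’s actual logic.

```python
# Simplified sketch of choosing a default chart type from the field type.
# The rules and category names are assumptions for illustration only.

def default_view(dimension_type):
    if dimension_type == "date":
        return "line"     # trends over time read best as line marks
    if dimension_type == "geographic":
        return "map"      # geographic fields default to a map view
    if dimension_type == "categorical":
        return "bar"      # discrete categories compare well as bars
    return "scatter"      # fall back to a scatter plot

print(default_view("date"))         # line  (profit over time)
print(default_view("categorical"))  # bar   (profit by product)
print(default_view("geographic"))   # map   (profit by state)
```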

We work hard to make it a rapid exploration process, not only because tables of numbers are difficult for humans to process, but also because a slow user experience interrupts cognition and users can’t answer their questions. Instead, they spend their time trying to make the tool work.

The whole idea is to make the tool an extension of your hand. You don’t think about the hammer. You just think about the job of building a house.

PwC: Are there categories of more structured data that would lend themselves to this sort of approach? Most of this data presumably has been processed to the point where it could be fed into Tableau relatively easily and then worked with once it’s in the visual form.

JM: At a high level, that’s accurate. One of the other key innovations of the Stanford dissertation work by Chris Stolte and Pat Hanrahan was a system that could compile those algebraic expressions into queries on databases. So Tableau is good with any information that you would find in a database, both SQL databases and MDX databases, or, in other words, both relational databases and cube databases.

But there is other data that doesn’t necessarily fall into that form. It’s just data that’s sitting around in text files or spreadsheets and hasn’t quite made it into a database. Tableau can access that data pretty well if it has a basic table structure. A couple of releases ago, we introduced what we call data blending.

“A lot of people have lots of data in lots of databases or tables. They might be text files. They might be Microsoft Access files. They might be in SQL or Hyperion Essbase. But whatever it is, their questions often span across those tables of data.”

A lot of people have lots of data in lots of databases or tables. They might be text files. They might be Microsoft Access files. They might be in SQL or Hyperion Essbase. But whatever it is, their questions often span across those tables of data.

Normally, the way to address that is to create a federated database that joins the tables together, which is a six-month or greater IT effort. It’s difficult to query across multiple data tables from multiple databases. Data blending is a lightweight, drag-and-drop way to bring in data from multiple sources.

Imagine you have a spreadsheet that you’re using to keep track of some information about your products, and you have your company-wide data mart that has a lot of additional information about those products. And you want to combine them. You can direct connect Tableau to the data mart and build a graphical view.

Then you can connect to your spreadsheet, and maybe you build a view about products. Or maybe you have your budget in your spreadsheet and you would like to compare the actuals to that budget. It’s a simple drag-and-drop operation or a simple calculation to do that.
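
As an illustration of the blending idea, the sketch below joins actuals from a data mart with a budget kept in a spreadsheet on a shared product field. The data, column names, and the use of pandas are assumptions for the sketch, not Tableau’s internal mechanism.

```python
# Sketch of data blending: actuals from a data mart combined with a budget kept
# in a spreadsheet, joined on a shared "product" field. Illustrative only.
import pandas as pd

actuals = pd.DataFrame({                 # in practice, the result of a data mart query
    "product": ["A", "B", "C"],
    "actual_sales": [120, 95, 240],
})
budget = pd.DataFrame({                  # in practice, read from the spreadsheet,
    "product": ["A", "B", "C"],          # e.g. with pd.read_excel("budget.xlsx")
    "budget": [100, 110, 200],
})

blended = actuals.merge(budget, on="product", how="left")     # the blend
blended["variance"] = blended["actual_sales"] - blended["budget"]
print(blended)
```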

So, you asked me this big question about structured to unstructured data.

PwC: That’s right.

JM: We have functionality that allows you to generate additional structure for data that you might have brought in. One of the features gives you the ability, in a lightweight way, to combine fields that are related to each other, which we call grouping. At a fundamental level, it’s a way to build up a hierarchical structure out of a flat dimension easily by grouping fields together. We also have some lightweight support for those hierarchies.
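
A minimal sketch of the grouping idea: members of a flat dimension are mapped to coarser groups, which adds a level of hierarchy. The product names and groups below are made up for illustration.

```python
# Illustrative grouping: build a coarser level of hierarchy over a flat
# "product" dimension by mapping detailed members to groups, then aggregate.
product_group = {
    "Laptop": "Technology",
    "Monitor": "Technology",
    "Chair": "Furniture",
    "Desk": "Furniture",
}

rows = [("Laptop", 120), ("Chair", 45), ("Desk", 80), ("Monitor", 60)]  # (product, profit)

totals = {}
for product, profit in rows:
    group = product_group.get(product, "Other")   # ungrouped members fall into "Other"
    totals[group] = totals.get(group, 0) + profit

print(totals)   # {'Technology': 180, 'Furniture': 125}
```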

We’ve also connected Tableau to Hadoop. Do you know about it?

PwC: We wrote about Hadoop in 2010. We did a full issue on it as a matter of fact.1

JM: We’re using a connector to Hadoop that Cloudera built that allows us to write SQL and then access data via the Hadoop architecture.

In particular, whenever we do demos on stage, we like to look for real data sets. We found one from Kiva, the online micro-lending organization. Kiva published a huge XML file describing all of the organization’s lenders and all of the recipients of the loans. Because it’s an XML file, it’s not your normal structured data set. It’s also big, spanning multiple years with lots of detail for each loan.

We processed that XML file in Hadoop and used our connector, which has string functions. We used those string functions to reach inside the XML and pull out the structured data about the lenders, their locations, the loan amounts, and the borrowers, right down to their photographs. And we built a graphical view in Tableau. We sliced and diced it first and then built some graphical views for the demo.
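
The sketch below illustrates the general idea of pulling structured records out of loan XML. The element and attribute names are invented; the real Kiva schema, and the string functions available through the Hadoop connector, would differ.

```python
# Illustrative only: extracting structured records from loan XML with Python's
# standard library. Element and attribute names are invented, not Kiva's schema.
import xml.etree.ElementTree as ET

xml_data = """
<loans>
  <loan amount="500">
    <borrower country="Kenya"/>
    <lender location="Seattle, WA"/>
  </loan>
  <loan amount="250">
    <borrower country="Peru"/>
    <lender location="Austin, TX"/>
  </loan>
</loans>
"""

records = []
for loan in ET.fromstring(xml_data).iter("loan"):
    records.append({
        "amount": float(loan.get("amount")),
        "borrower_country": loan.find("borrower").get("country"),
        "lender_location": loan.find("lender").get("location"),
    })

print(records)
```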

The key problem from a human performance point of view is high latency. It takes a long time for the programs to run and process the data. We’re interested in helping people answer their questions at the speed of their thought. And so latency is a killer.

We used the connection to process the XML file and build a Tableau extract file. That file runs on top of our data engine, which is a high-performance columnar database system. Once we had the data in the Tableau extract format, it was drag and drop at human speed.
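
As a rough analogy for that extract workflow (Tableau’s data engine and extract format are proprietary; Parquet and pandas here are stand-ins), the pattern is to pay the slow processing cost once, write the result to a columnar file, and then query it at interactive speed.

```python
# Rough analogy for the extract workflow: do the expensive processing once,
# store the result in a columnar format, then query it at human speed.
# Parquet via pandas (which needs a Parquet engine such as pyarrow) stands in
# for Tableau's proprietary extract format.
import pandas as pd

loans = pd.DataFrame({                                    # imagine these rows came out of
    "borrower_country": ["Kenya", "Peru", "Kenya"],       # the slow XML-processing step
    "amount": [500.0, 250.0, 1000.0],
})
loans.to_parquet("loans.parquet")               # one-time, high-latency step

fast = pd.read_parquet("loans.parquet")         # subsequent queries are interactive
print(fast.groupby("borrower_country")["amount"].sum())
```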

We’re heading down this vector, but this is where we are right now in terms of being able to process less-structured information into a form that you could then use Tableau on effectively.

PwC: Interesting stuff. What about in-memory databases and how large they’re getting?

JM: Anytime there’s a technology that can process data at fast rates, whether it’s in-memory technology, columnar databases, or what have you, we’re excited. From its inception, Tableau has involved connecting directly to databases and making it easy for anybody to work with them. We’re not just about self-service analytics; we’re also about data storytelling. That can have as much impact on the executive team as being able to answer their own questions directly.

PwC: Is more of the workforce doing the analysis now?

JM: I just spent a week at the Tableau Customer Conference, and the people that I meet are extremely diverse. They’re not just the hardcore analysts who know about SPSS and R. They come from all different sizes of companies and nonprofits and on and on.

And the people at the customer conferences are pretty passionate. I think part of the passion is the realization that you can actually work with data. It doesn’t have to be this horribly arduous process. You can rapidly have a conversation with your data and answer your questions.

Inside Tableau, we use Tableau everywhere—from the receptionist who’s tracking utilization of all the conference rooms to the sales team that’s monitoring their pipeline. My major job at Tableau is on the team that does forward product direction. Part of that work is to make the product easier to use. I love that I have authentic users all over the company and I can ask them, “Would this feature help?”

So yes, I think the focus on the workforce is essential. The trend here is that data is being collected by our computers almost unattended, with no supervision necessary. It’s the process of utilizing that data that is the game changer. And the only way you’re going to do that is to put the data in the hands of the individuals inside your organization.


1 See “Making sense of Big Data,” Technology Forecast 2010, Issue 3, http://www.pwc.com/us/en/technology-forecast/2010/issue3/index.jhtml, for more information.