Category Archives: Context and Domain

[Thesis 1] Interview #1 – Jer Thorp


Jer Thorp defines his works as ‘software-based’ and ‘data-focused.’ He is the co-founder of The Office for Creative Research, along with Ben Rubin and Mark Hansen, and teaches at NYU’s ITP Program.

Some of his works pre-OCR include the algorithm for the 9/11 Memorial and Project Cascade. The latter was developed by Jer as a Data Artist in Residence at the NY Times R&D Group.


I talked to Jer for about 40 minutes. I didn’t conduct a formal interview. Instead, we talked about some of my ideas for this project and things he was working on. I tried to expand this to a conversation about broader themes in the field as well — future directions for data visualization, the role of data art, cultural changes related to data, etc. That is why this transcription doesn’t follow a Q&A format.

On the idea of ‘direct visualization’ as a form of representation closer to the artifact (the thing itself)

That’s a philosophically very deep question. You can get closer to the measurement, but it’s very hard to get close to the thing. That requires analysing not only the act of representation; you also have to consider the act of measurement and the intent of the measurement. It’s all built into it. In that case [Manovich’s film visualizations], I would consider the dataset to be this kind of analytics that’s running on the images. There are a lot of decisions being made there.

On the limitations of algorithmic-based cultural analysis

There’s this myth that the computer and algorithms allow you some type of purity. That is not true at all. These analytics allow us to do things we’ve never been able to do before. But that doesn’t mean we can dispense with ethnography and the old-fashioned ‘talk to people,’ doing some journalistic research and so on. [Cultural analytics] is a tool, and a very powerful one. But it needs to be paired with other pieces.

About 3D (sculptural) visualisations versus 2D, in terms of perception

Scale and perspective make a huge difference for everything. I’m skeptical about those studies though, because if you dig up the papers, half of them were done with a group of about 20 grad students, almost all white. And then we build all of our decisions based on them.

For us [OCR], it’s all about communication, which is a different thing. David Carson has this famous quote: “don’t confuse legibility with communication.” Because most of the time in our projects our fundamental goal is not to allow people to see “this is 6.8 or 6.2,” but instead to give them a way to interpret that, or to have a feeling about it, or to construct a narrative from there. Of course rigour is important, so we’re not gonna show things in a way that is misleading. But I don’t believe in best practices in that sense. What’s best practice for your magazine might not be best practice for another magazine. And what if I’m projecting on a wall on a building, or what if it’s a sculptural form? These are things that we have no rules for, which I think is why I like those things more.

About future directions for the data visualization field

We’ve been working a lot with performance, and trying to think about what it means to perform data. We’re doing a long residency with the Museum of Modern Art. It’s an algorithmic performance: we write the scripts for actors, who perform them in a gallery. It’s very much like traditional theatre in a way, but the content is generated using these data techniques we developed. That’s pretty interesting to me.

I think sculpture, data in a physical form, is still in its infancy. Most of the works we’ve seen in the past 6 years look like something you just pulled out of the screen and plotted on a table. A lot of them look like 3D renders, because of the 3D printers. There’s a ton of possibilities that haven’t been explored. We think about shape a lot in sculpture, but we don’t think a lot about material, its relation to the body or to a room, architecture, design, temperature… There’s a ton of room to do interesting things in that department.

On the reasons for a large number of recent data-related projects

Something about this movement has to do with something that’s happening in culture. It’s about and around this kind of data-based transformation that’s happening in the world. So it’s less about how it’s being done and more about why. It’s about the NSA, radical changes in transparency, wearable sensors, all those things coming together in this really big cultural change.

I’m definitely skeptical about the advertising slogans that have been used to promote this stuff, but I’m optimistic about its potential. Last year was really interesting, because for the first time we started to have real conversations about the exclusionary nature of big data. What does data mean for underprivileged communities? What decisions have been made based on a completely white North-American frame of thought? And how can we do better? That to me is really exciting.

The work we do here and the work that a lot of people are doing is fundamentally about trying to push a cultural change in how we think about data. And that is gonna take a long time, it’s probably gonna require a lot more than a 9-person studio pushing against it, but what we need is a generation of people who understand data and collectively can make decisions.

Most people don’t even have a good understanding of what data is. And it’s fundamentally easy to talk about data as measurements of something. If you want to be more accurate, you can say it is records of measurement of something. And it’s important to include ‘measurement’ in it because it is a human act. So it is a human artefact. We can program machines to do it, but they’re still doing it based on our decisions. Until strong A.I. is developed, there’s no data that is not fundamentally human data.


This interview was conducted in a previous phase of my project, when my main questions were about representation: how to get close to the thing, or the artifact. It had some impact on my later decisions, leading me to turn my focus to the process instead of the result. In other words, if every data visualization process is based on decisions and implies loss, is it possible to make that process transparent? Can data visualization lead to a better understanding of data itself?


[Thesis 1] Design Brief #2

Thesis Question

In spite of current improvements in technology, data visualization continues to use techniques that date to its origins. Digital interfaces might have changed the way we interact with data, making it possible to filter, zoom in, and access its details. However, the basic principles for displaying data are still based on reductions to geometric shapes, like rectangles, circles, and lines.

When working with cultural data, this gap between representation and artifact might be particularly problematic. Does a bar chart of the most used words of a book reveal more than the book itself? Can a pie chart of the colors of a painting tell us more about a painting than the actual image? Or is it possible to balance both representations? If so, can modern technologies create new visualization techniques to bridge this gap between representation and artifact?


In his paper “Visualizing Vertov,” [@manovich_visualizing_2013] Lev Manovich analyses the work of Russian filmmaker Dziga Vertov using methods from his Software Studies Initiative. The research intends to show alternative ways to visualize media collections and focuses on the movies The Eleventh Year (1928) and Man with a Movie Camera (1929). It utilizes digital copies of Vertov’s movies provided by the Austrian Film Museum, in Vienna. They are the media source from which movie frames are taken, visualized and analysed, using custom software. They also provide metadata for the research: frame numbers, shot properties and manual annotations. A secondary source utilised by Manovich is Cinemetrics, a crowdsourced database of shot lengths from more than 10,000 movies.

As for the visual representations, traditional techniques, like bars and scatterplots, are complemented by Manovich’s “direct visualization” approach. In this method, “the data is reorganized into a new visual representation that preserves its original form.” [@manovich_what_2010]

Manovich describes his own method for analysis as analogous to how Google Earth operates: from a “bird’s eye” view of a large collection of movies to a zoom into the details of a single frame. The summary of those steps is as follows:


  • Panorama
    • ‘Bird’s-eye’ view: 20th Century movies compared by ASL (average shot length). Technique: scatterplot.
    • Timeline of mean shot length of all Russian films in the Cinemetrics database. Technique: line chart.
    • Movies from Vertov and Eisenstein compared to other 20th century movies. Technique: scatterplot.


  • Shot length
    • The Eleventh Year and Man with a Movie Camera compared by shot length. Technique: bars, bubbles, and rectangles.
    • Zoom-in of the same visualisations.


  • Shot
    • Each of 654 shots in The Eleventh Year, represented by its second frame. Technique: direct visualization.
    • Shots rearranged based on content (close-ups of faces). Technique: direct visualization.
    • Shots and their length. Technique: direct visualization and bar chart.


  • Frame
    • First and last frames from each shot compared. Technique: direct visualization.
    • Average amount of visual change in each shot. Technique: direct visualization (2nd frame) and bar chart.
    • Average amount of visual change in each shot. Technique: direct visualization (juxtaposition) and bar chart.
    • Frames organised by visual property. Technique: direct visualization.

Also, Manovich uses contextual notes to draw conclusions from the visualisations. His findings are often compared to facts from the history of cinema, Vertov’s early manifestos, and previous studies, confirming or contradicting them.
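
The panorama level above depends on one number per film, the ASL. As a minimal sketch of that measure, the following Python snippet averages a list of shot durations. The shot lengths are made-up values for illustration, not data from Cinemetrics or from “Visualizing Vertov.”

```python
from statistics import mean


def average_shot_length(shot_lengths_s):
    """Return the mean shot duration of a film, in seconds."""
    if not shot_lengths_s:
        raise ValueError("at least one shot is required")
    return mean(shot_lengths_s)


# Hypothetical shot lengths (seconds) for two invented films.
film_a = [2.0, 4.0, 3.0, 7.0]
film_b = [10.0, 14.0]

print(average_shot_length(film_a))  # 4.0
print(average_shot_length(film_b))  # 12.0
```

A single mean hides how shot lengths are distributed, which is one reason the later levels of the analysis look at individual shots and frames.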

Project Concept

This prototype will compare two movies utilising some of the methods from “Visualizing Vertov.” It will combine traditional visualization techniques — charts using 2d primitives — and the “direct visualization” approach.

As for the data, it will use a specific component from films: sound. Because Manovich’s research relies largely on visual artefacts, using sound in this prototype might reveal limitations of his method or point to new ways of applying it.

Besides, the Cinemetrics project focuses exclusively on a single aspect of movies: “In verse studies, scholars count syllables, feet and stresses; in film studies, we time shots.” [@cinemetrics] This approach seems to underestimate other quantifiable data that could be used in movie studies.

In spite of using sound instead of time or visuals, this prototype will keep the analytical aspect of Cinemetrics and “Visualizing Vertov.” Then, it will draw conclusions on this approach compared to the supercut method.

To sum up, these are the questions raised by this prototype:

  • Is it possible to apply the “direct visualization” technique to a non-visual artifact?
  • Which patterns and insights can a sound analysis of a movie reveal as compared to a visual analysis?
  • What are the results of an analytical method as compared to the supercut technique?

Research Methodology


All levels of analysis in “Visualizing Vertov” (movies, shot lengths, shots, and frames) utilise comparisons. This device is largely employed in traditional data visualization, and seems to be even more useful for cultural artefacts. In Vertov’s case, for instance, the shot length measurement would not provide any insight if it were not compared to the Russian avant-garde or to 20th Century movies in general.

Following the same logic, this prototype takes a European movie, Wings of Desire (Der Himmel über Berlin, 1987), by German filmmaker Wim Wenders, and its American remake, City of Angels (1998), directed by Brad Silberling.


The following diagram shows the development steps for this prototype:


The digital copies of the movies utilised in the first step were not high-quality ones. Also, the process by which the data was gathered does not preserve a high-resolution sample. Those are limitations of this prototype, which focused on a rapid technique to extract and compare sound data. They will affect the visualization decisions as well.


The data exported as XML was read and visualised using D3, a JavaScript library for data visualization. D3 provides a fast and reliable way to parse and represent large amounts of data in web documents. Web pages are also able to natively embed media such as sound. Those are the reasons why the final result of this prototype is a web page.

First Iteration

The first iteration of this prototype is a simple area chart depicting the sound variations from the movies. Because of the low quality of the sources, it uses a relative scale for each movie: the highest value in each film is drawn at the top of that film’s scale, and all other values are relative to it.

For this reason, the peaks of the two movies might differ in absolute decibel value. Consequently, absolute volume should not be used to compare them.
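
The relative scale can be sketched in a few lines. This is an illustration in Python rather than the D3 code of the prototype, and it assumes the volume readings are non-negative numbers. The function name and the sample values are hypothetical.

```python
def to_relative_scale(levels):
    """Divide every reading by the film's own peak, so the peak becomes 1.0.

    Assumes non-negative volume readings (an assumption of this sketch).
    """
    peak = max(levels)
    if peak <= 0:
        raise ValueError("the peak must be positive")
    return [value / peak for value in levels]


# Invented readings: the second film is twice as loud in absolute terms,
# yet both produce the same relative curve.
quiet_film = [2.0, 4.0, 8.0]
loud_film = [4.0, 8.0, 16.0]

print(to_relative_scale(quiet_film))  # [0.25, 0.5, 1.0]
print(to_relative_scale(loud_film))   # [0.25, 0.5, 1.0]
```

The two invented films end up with identical curves, which shows why the charts can compare the shape of the sound over time but not its loudness.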


Some visual disparities seem to appear in this first iteration: longer areas with high volume in Wings of Desire versus constant medium-volume ones in City of Angels.

However, the sound volume does not seem to provide many insights. Are these variations due to background music? Which patterns represent dialogues? The representation is so highly encoded that it leaves no way to answer these questions.

Second Iteration

In order to add some more clarity, the second prototype includes visual marks to represent the parts of the movies when dialogues occur. A computational method to differentiate music from speech would be laborious and unreliable. The solution was to use part of the first prototype developed for this project, which parses subtitle files into different formats. The software generated JSON files with timestamps of the subtitles. Like the sound data, they were read and visualised using D3.
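
As a rough sketch of this step, the snippet below extracts start and end times from subtitle text and prints them as JSON. It assumes the SubRip (.srt) timing format, which the post does not specify, and it is not the parser from the first prototype. The function names and the sample text are hypothetical.

```python
import json
import re

# Matches SubRip timing lines such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured time fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def subtitle_intervals(srt_text):
    """Return one {"start", "end"} dict (in seconds) per subtitle."""
    intervals = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        intervals.append(
            {"start": to_seconds(*fields[:4]), "end": to_seconds(*fields[4:])}
        )
    return intervals


# Invented two-subtitle sample.
sample = """1
00:00:01,000 --> 00:00:03,500
First line.

2
00:01:00,250 --> 00:01:02,000
Second line.
"""

print(json.dumps(subtitle_intervals(sample)))
```

Only the timing lines are read. The dialogue text itself is discarded, since the chart only needs to know when speech occurs.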


This new iteration seems to shed some light on how the movies compare. While City of Angels shows a constant use of dialogues, the subtitle marks in Wings of Desire have long blanks in several parts of the movie.
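
The “long blanks” could also be listed programmatically instead of being read off the chart. The following sketch finds stretches of at least a given length with no subtitle on screen. The threshold and the intervals are invented for illustration, and the function is not part of the prototype.

```python
def dialogue_gaps(intervals, min_gap_s):
    """Return (start, end) pairs for every stretch of at least min_gap_s
    seconds that no subtitle interval covers.

    intervals: iterable of (start_s, end_s) tuples, in any order.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(intervals):
        if covered_until is not None and start - covered_until >= min_gap_s:
            gaps.append((covered_until, start))
        # Track the furthest end seen so far, so overlaps are handled.
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


# Invented subtitle intervals, in seconds.
subtitles = [(0, 5), (6, 10), (70, 75)]
print(dialogue_gaps(subtitles, 30))  # [(10, 70)]
```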


The presence of the red marks also helps us understand the sound representation. By comparing the two, it is possible to see that the alternating medium-low volume represents dialogues, while the constant and higher areas might indicate background music.
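
One way to check this visual impression would be to average the relative volume inside and outside the subtitle intervals. The sketch below does that on invented data. It illustrates the comparison only and makes no claim about the actual movies.

```python
from statistics import mean


def mean_volume_by_dialogue(samples, intervals):
    """Return (mean volume during dialogue, mean volume elsewhere).

    samples: list of (time_s, volume) pairs.
    intervals: list of (start_s, end_s) subtitle intervals.
    A side with no samples yields None.
    """
    inside, outside = [], []
    for time_s, volume in samples:
        in_dialogue = any(start <= time_s <= end for start, end in intervals)
        (inside if in_dialogue else outside).append(volume)
    return (
        mean(inside) if inside else None,
        mean(outside) if outside else None,
    )


# Invented samples: quieter while a subtitle is shown, louder afterwards.
samples = [(0, 0.25), (1, 0.5), (2, 1.0), (3, 0.75)]
print(mean_volume_by_dialogue(samples, [(0, 1)]))  # (0.375, 0.875)
```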


Even though this iteration offers more insights about the movies, most of them are not yet verifiable. The tool does not let us access the sound itself.

Third Iteration

The last iteration of this prototype embeds the sound extracted from the movies into the page. It is controlled by a slider that matches the horizontal dimension of the charts.
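
Matching the slider to the horizontal dimension of the charts amounts to a linear mapping from pixels to playback time. The sketch below shows that mapping in Python. The prototype itself is a web page, so this is an illustration, and the chart width and film duration are assumed values.

```python
def slider_to_time(slider_x, chart_width, duration_s):
    """Map a horizontal slider position (pixels) to a playback time (seconds).

    Positions outside the chart are clamped to its edges.
    """
    if chart_width <= 0:
        raise ValueError("chart_width must be positive")
    fraction = min(max(slider_x / chart_width, 0.0), 1.0)
    return fraction * duration_s


# Assumed values: a 960-pixel-wide chart and a two-hour film.
print(slider_to_time(480, 960, 7200))   # 3600.0
print(slider_to_time(2000, 960, 7200))  # 7200.0
```

Because the chart and the slider share the same width, the point under the slider handle is the moment being played.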

This final representation is analogous to the shot length one in “Visualizing Vertov.” It combines a 2D chart with the artefact itself, trying to bridge the gap between the two.


Findings and Next Steps

Though the third iteration of the prototype includes the sound, it does not achieve the same results as the display of frames in “Visualizing Vertov.” The access to the sound is mediated through a GUI element, which creates an extra step. On one hand, the overview of the “media collection” (sound fragments) is only accessible through the chart. And on the other hand, the access to the sound through the player does not provide an overview, since it is not possible to listen to all the sound fragments at the same time. As opposed to what happens in Manovich’s visualizations, those two representations were not combined.

Nevertheless, the sound analysis does seem to provide some useful insights for movie studies. Even though this rough prototype relies on low-res sources and the support of subtitle files, it reveals some interesting patterns. An analysis of other sound components might yield more interesting findings.

Finally, this prototype shows a radically different approach compared to the supercut technique. The analytical method of translating time to space and reducing sound to visuals has a far less entertaining result. In conclusion, the next steps for this project are:

  • Find out the communication purpose of the project: is data visualization an analytical tool? A medium? A language?
  • Continue the explorations of media collections using a “direct visualization” approach.
  • Expand this approach, as much as possible, to media and representations not yet explored by the Software Studies Initiative.


“Cinemetrics – About.” 2014. Accessed October 10. index.php.

Manovich, Lev. 2010. “What Is Visualization?” Paj: The Journal of the Initiative for Digital Humanities, Media, and Culture 2 (1). index.php/paj/article/view/19.

———. 2013. “Visualizing Vertov.” Russian Journal of Communication 5 (1): 44–55. doi:10.1080/19409419.2013.775546.

[Thesis 1] Peer Interview

What is my concept and motivation so far. Text by Evan:

He thinks there is more room to visualize data than the old-school way: charts. So far his investigation has been through a movie editing tool he started to develop, and also a book that mashes up two different movies, made using the same tool.

He expects his coming project to turn into both software for creating experimental artwork and a practical tool for video editors. Therefore, he will start to reach out to some video editors as his target users. Other related users are mashup communities on YouTube. In addition, he might try to collect comments and experiences from viewers and other professionals, like sound engineers, data visualization researchers, film and media researchers, computer vision experts, critics, new media artists, software developers, musicians, etc.

The biggest challenge so far for him is that he feels like he has a too broad and maybe purely theoretical question. He’s concerned with data visualization as a language, and he feels the need to push it to a direction different from the current one.

However, the problem is, that might be his concern only. As Scott said, our projects should be relevant and answer the question: “who cares?” and he cannot answer that yet. That’s certainly his main struggle.

Therefore, he is taking the movies project as a first iteration of his broad idea. It was planned as a sort of ‘direct visualization’ tool, so it fit into his main question.

From that it might be easier to find people who care — editors, movie makers etc.

[Thesis 1] Domain and Communities


That was Evan’s misspelling, but I love it.

Yesterday we held a workshop to map the social aspect of our projects. We split into pairs and helped each other find the areas related to our work.
In the end, we were supposed to find common areas between the two. Because Evan’s project and mine don’t seem related, we tried to link them by categorizing the areas instead.
The spreadsheet below is the result of our investigation:

| Student | Evan | Gabriel |
| --- | --- | --- |
| Subject | Fear. More specifically, the one derived from language barriers. Because the project is based on her own experience as an international student in NYC, she might narrow it down to English language problems. | Dada Visualization. Feel the need for new tools other than charts, to visualize data. So far his investigation has been through a movie editing tool he started to develop last semester. |
| | The first people she might talk to is a group of students from an English class she met during Summer. | He might finish this tool and test it with people involved in producing experimental videos. |
| Target Users | students | video editors |
| Secondary Users | parents | people who post mashups on YouTube |
| | friends | viewers |
| Experts | psychologists | data visualization researchers |
| | sociologists | film and media researchers |
| | linguists | sound engineers |
| | ISS (International Students Services) advisors | computer vision experts |
| | ESL (English as Second Language) teachers | critics |
| | software developers who work with language applications | software developers |
| Analogous | translators | musicians |
| Artists | Movie directors whose work is about language problems | new media artists |


In the end we found out that we share at least two domains: language and video.

[Thesis 1] Been there, done that.

Some precedents from previous MFA DT thesis shows that I find related to my domain:

1. Data Rush
Apon Palanuwech, 2014

1.1. Description
Critical piece about the use of data by tech companies. Users are asked to navigate through Facebook and/or Google pages using their phones. This interaction triggers the gears of the interactive installation. A digital display shows the amount of user data being collected by the companies. After a certain number of bytes, a candy is released from the piece to the user.

1.2. Connection
I like how Apon’s project utilizes means other than charts to tell a data-related story. The analogy with the Gold Rush is historically consistent and effective in terms of delivering his message.
Also, user interaction and system feedback work together to craft an interesting narrative.

2. Qualified Life
Fei Liu, 2014

2.1. Description
Another data-related critical piece. Fei Liu satirizes the ‘quantified self’ trend with a corporate machine aimed at increasing employees’ productivity. Users have to perform a series of physical activities dictated by the machine in an ironically authoritative manner.

2.2. Connection
‘Qualified Life’ succeeds in provoking reflection on the self-tracking trend. The machine’s lines are subtle in revealing its critical tone. In addition to the demanding physical interaction, they make the installation also entertaining.
Like in ‘Data Rush,’ metaphor and interaction create a well-crafted narrative.

3. Skylines
Patricio González Vivo, 2014

3.1. Description
Series of projects exploring how the tools we use shape our vision of the world. Skylines 03 is a set of postcards that show skeleton-like images of NYC. The images were constructed using the depth information from Google Maps Street View.

3.2. Connection
‘Hacking’ the Google data is not only an ingenious trick, but also a very consistent statement of Patricio’s point about how our current tools impose a single view on the world. It also creates an interesting connection between pre-20th-century representational art and new technologies.

[Thesis 1] Background and Domain

  • Your favorite project (it doesn’t have to be your best, but that would be great too).
  • A few slides depicting things you know how to do really, really, well.
  • Technologies you love.
  • People you admire.
  • A piece of content (media, sound, writing, etc.) that relates to the field or domain you want to situate your thesis in.
  • Precedents for the kind of work you want to make.

All slides here.