Data Seeks A Story

Digitization has enabled the accumulation, storage, and manipulation of enormous amounts of data.  The numbers involved are mind-boggling, and we're becoming familiar with ever larger orders of magnitude (remember when a gigabyte was a big deal?).  And we've been hearing such claims long enough now that we hardly notice when someone like Google CEO Eric Schmidt tells us that every two days we create as much information as we did from the beginnings of civilization up until 2003.  And, of course, we are told that the pace will only quicken and we will keep achieving ever larger orders of magnitude in data production.

So the question seems to be: what do we do with all of this data?  A good deal of it is of little or no value, and so filtering through it presents a significant challenge.  Representing data meaningfully can also be a challenge, and here visualization can be quite helpful.  A couple of recent instances of visualized data come to mind.  The first is Google Labs' Books Ngram Viewer.  The Ngram Viewer allows a user to search a database of digitized books published from 1500 to the present for particular words or phrases.  The Viewer then generates a graph plotting the frequency with which the words or phrases have been used during a particular time period.  So, for example, here is a graph tracking the occurrences in English books written between 1700 and 2010 of the names of three philosophers — René Descartes, John Locke, and Thomas Hobbes:

One limitation of the approach comes to mind here: you can't be sure whether a movement in the mentions of "John Locke" owes to greater interest in modern political philosophy or to a certain television character (or, more interestingly, both together).
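For the programmatically inclined, charts like these can be reproduced outside the browser.  Below is a minimal sketch in Python; it assumes the informal JSON endpoint that the Viewer's own page calls (books.google.com/ngrams/json), which is not an officially documented API, along with a corpus identifier that may change over time:

import json
import urllib.parse
import urllib.request

def ngram_frequencies(phrases, year_start=1700, year_end=2010, corpus="en-2019"):
    # Query the informal JSON endpoint behind the Ngram Viewer.
    # The endpoint, parameter names, and corpus identifier are
    # assumptions based on what the Viewer's page sends; Google
    # does not document or guarantee any of this.
    params = urllib.parse.urlencode({
        "content": ",".join(phrases),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": 3,
    })
    url = "https://books.google.com/ngrams/json?" + params
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Each returned series pairs a phrase with its yearly relative frequencies.
    return {series["ngram"]: series["timeseries"] for series in data}

frequencies = ngram_frequencies(["Rene Descartes", "John Locke", "Thomas Hobbes"])
for phrase, series in frequencies.items():
    print(phrase, series[:5])  # first five data points of each series

Plotting the returned series against the years reproduces, roughly, the graph above.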

Here is another graph, this one plotting the use of the words nostalgia and Nostalgia (the search is case-sensitive), just because I'm intrigued by the idea:

Another recent and more elegant instance of visualized data comes from Facebook.  The graphic below was generated by plotting lines representing a sampling of Facebook friendships.  What is most fascinating about this graphic is that no independent lines representing the continents were included; all the shapes emerged from the data:
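As an aside for the technically curious, the general technique is surprisingly simple.  Here is a rough sketch of it in Python with matplotlib, using invented coordinates, since Facebook's friendship data is not public; the real image also drew weighted great-circle arcs between actual city pairs rather than the straight lines used here:

import random
import matplotlib.pyplot as plt

random.seed(42)

# Invented (longitude, latitude) coordinates standing in for cities;
# the actual visualization used real city pairs weighted by the
# number of friendships between them.
cities = [(-74, 41), (-0.1, 51.5), (2.3, 48.9), (139.7, 35.7),
          (151.2, -33.9), (-122.4, 37.8), (77.2, 28.6), (-46.6, -23.5)]

fig, ax = plt.subplots(figsize=(10, 5), facecolor="black")
ax.set_facecolor("black")

# Draw many faint lines between random city pairs; sheer density makes
# the underlying geography emerge, with no map layer drawn at all.
for _ in range(5000):
    (x1, y1), (x2, y2) = random.sample(cities, 2)
    ax.plot([x1 + random.gauss(0, 2), x2 + random.gauss(0, 2)],
            [y1 + random.gauss(0, 1), y2 + random.gauss(0, 1)],
            color="#3b5998", alpha=0.02, linewidth=0.5)

ax.axis("off")
plt.show()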

So we have two instances of data rendered intelligible, or at least, let us say, manageable and usable.  But there is still another question: what does it mean?  How do we interpret the data?  The charts and image above represent a tremendous amount of data, but what do we make of it?  That still requires judgment, context, and a story.  This is more or less the point Katherine Hayles makes in her response to Ed Folsom's "Database as Genre: The Epic Transformation of Archives."  Here are some comments I've taken from Hayles's essay which apply rather well to both cases, especially to Google's Ngram Viewer (keep in mind her responses are to Ed Folsom, who maintains the Walt Whitman Archive online):

What it means that Whitman, say, used a certain word 298 times in Leaves of Grass while using another word only three times requires interpretation—and interpretation, almost inevitably, invokes narrative to achieve dramatic impact and significance …

These structures imply that the primary purpose of narrative is to search for meaning, making narrative an essential technology for human beings, who can arguably be defined as meaning-­seeking animals …

Manovich touches on this contrast when he perceptively observes that for narrative, the syntagmatic order of linear unfolding is actually present on the page, while the paradigmatic possibilities of alternative word choices are only virtually present. For databases, the reverse is true: the paradigmatic possibilities are actually present in the columns and the rows, while the syntagmatic progress of choices concatenated into linear sequences by SQL commands is only virtually present …

No longer singular, narratives remain the necessary others to database’s ontology, the perspectives that invest the formal logic of database operations with human meanings and that gesture toward the unknown hovering beyond the brink of what can be classified and enumerated.
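Manovich's contrast is worth making concrete.  In the toy example below (a hypothetical illustration using Python's built-in sqlite3 module, with invented rows, not anything drawn from the Whitman Archive), the table's rows are the paradigmatic possibilities, actually present, while the linear, story-like sequence exists only when a query produces it:

import sqlite3

# The rows of the table are the paradigmatic possibilities, actually
# present; the syntagmatic sequence is only virtual until a SELECT
# concatenates rows into a linear order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (year INTEGER, happening TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1855, "Leaves of Grass is first published"),
     (1867, "a heavily revised edition appears"),
     (1892, "the 'deathbed edition' is issued")],
)

# The query turns stored possibilities into a story-like progression.
for year, happening in conn.execute(
        "SELECT year, happening FROM events ORDER BY year"):
    print(f"In {year}, {happening}.")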

In other words, data seeks a story because humans seek a story — it’s our primordial way of navigating the increasingly dense forest of data.  It is also worth bearing in mind Jerome McGann’s observations regarding databases (also in response to Folsom):

No database can function without a user interface, and in the case of cultural materials the interface is an especially crucial element of these kinds of digital instruments. Interface embeds, implicitly and explicitly, many kinds of hierarchical and narrativized organizations. Indeed, the database—any database—represents an initial critical analysis of the content materials, and while its structure is not narrativized, it is severely constrained and organized. The free play offered to the user of such environments is at least as much a function of interface design as it is of its data structure—whether that structure be a database structure or, as in the case of The Walt Whitman Archive, a markup structure . . .

Bottom line:  The interface is not neutral, and for that matter neither is the data, because it has already been tagged and marked up in a certain way when the database architecture was designed and the information entered accordingly.

If the databases and interfaces that give us access to the immense amount of information being digitized are going to be useful to us, we need to make sure we understand their embedded limitations, so that these limitations do not become immense blind spots as we try to do what we must always do with information — make a story out of it.  And the making of the story, a basic human drive, requires an awareness of context, judgment and discernment, and a certain wisdom that, as of yet, databases and the clever, even elegant, means of representing the data stored in them are not by themselves going to bring to the task.  It may be worth remembering the old adage: information is not knowledge, and knowledge is not wisdom.

Jacques Lacan, Jansenist?

If I imagine a Venn diagram consisting of one circle representing those interested in Jacques Lacan (a modest circle), and another representing those who read this blog (a rather tiny circle), then the overlapping area probably includes one person … if I count myself.  Nonetheless, I’ll post this anyway.

In conversation with a friend I was made aware of an article that contains this intriguing anecdote (if you’re in that overlapping area in the Venn diagram):

Jan Miel was, he says, the first to propose translating a text of Lacan's into English and as a result had been invited to lunch in his country house in Guitrancourt, not far from Paris.  After the meal during a stroll in the garden Lacan turned to him and said:  ‘You are neither an analyst nor an analysand, so why are you interested in my teaching?’.  Miel found it difficult to answer because, he admits, he really did not know what he found so fascinating in Lacan’s work, so he eventually stammered:  ‘Well, my main interest is in Pascal.’  To which Lacan replied, ‘Ah, I understand’ and led him back to his library where he showed him a quite substantial collection of Jansenist books.  So if reading Lacan leads to Pascal, it appears that reading Pascal may also lead to Lacan.

“Ah, I understand” — loved that, and wondered how many times those same words were uttered in a Lacan seminar!

The article goes on to explore the use Lacan makes of Pascal's Wager and presents some helpful background material on the Wager.  Be warned, though: some math is involved.
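For those who would like the gist before clicking through, the core of that background is a simple expected-value comparison (a standard textbook gloss in my own notation, not the article's).  If $p > 0$ is the probability that God exists, the reward of belief, should God exist, is infinite, and every other payoff $f_i$ is finite, then:

\[
\mathbb{E}[\text{wager for God}] = p \cdot \infty + (1 - p)\,f_1 = \infty,
\qquad
\mathbb{E}[\text{wager against God}] = p\,f_2 + (1 - p)\,f_3 < \infty
\]

So wagering for God dominates for any nonzero $p$, however small; that, in miniature, is the math the warning refers to.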

Agitate for Beauty

One of the convenient consequences of posting one’s thoughts on a blog is that readers (the happy few in my case) will send along links to interesting ideas or stories related to what I’ve written.  Yesterday I wrote about resisting the temptation to communicate thoughtlessly and artlessly via digital media and pushing back against the pressures for more efficient, mechanical, and soulless communication.  In response I received a link to a post titled, “How ‘EOM’ Makes Your Email More Efficient.” (h/t:  DFR)

EOM, for the blissfully uninitiated, is short for “End of Message.”  The idea is pretty simple: turn your email subject line into the actual content of the message and add on “EOM” so that the recipient knows they don’t need to click through to read the body.  This saves you the time of writing a greeting, a body, and a closing.  It also saves the recipient the effort of clicking through to the main text of the email.  But wait, there’s more!  There are actually TEN listed benefits to EOM-ing (might as well coin the verb — texting, emailing, Facebooking, Twittering, friending — in our exciting, transgressive times nouns become verbs!).  Other advantages include: if you do it, others will do it too, and EOM encourages 100% readership!

All very efficient, to be sure.  Reading the cheerfully and engagingly written post, I was almost convinced this was a wonderful, life-changing practice.  Okay, dropping the sarcasm: I get it, seriously.  There are certain exchanges that happen over email that do not need to be packaged in the style and form of a royal proclamation or a papal encyclical.  Fine, fair enough.  And to the author’s credit, one of the advantages listed is that you encourage more face-to-face communication.  If you can’t say it efficiently via email, then maybe you just need to go talk to the person (pause for audible gasp).  Great, that would be wonderful (unless our face-to-face exchanges adopt the syntax and style of our online communication).  The workplace is busy, hectic, stressful; easing the demands of an always-online work life is commendable.

But (you knew it was coming), there is still this lingering fear that the ideals of efficiency and instrumentality, perfectly appropriate at some points and in certain contexts, will spread into realms of human communication where they ought properly to be unwelcome and shunned.  Yet, efficiency and instrumentality are alluring ideals that make few demands and promise great rewards, and so they insidiously infiltrate and colonize.

Sometimes I wonder if we are not operating under the unspoken assumption that perfect communication is something like the telepathic communication depicted in science fiction and fantasy.  That would be efficient indeed.  No words, no sounds, no effort.  No risk, no charm, no beauty.

So my tendency is to resist the push for increasing efficiency and instrumentality in our communication; not because I fail to see the advantages, but precisely because I recognize their appeal.  I tend to think Goethe was right, “We should do our best to encourage the Beautiful, for the Useful encourages itself.”  Agitate for beauty.

I’ll leave off with another poet, W. H. Auden, who also knew a thing or two about language, beauty, and responsibility.

As I listened from a beach-chair in the shade
To all the noises that my garden made,
It seemed to me only proper that words
Should be withheld from vegetables and birds.

A robin with no Christian name ran through
The Robin-Anthem which was all it knew,
And rustling flowers for some third party waited
To say which pairs, if any, should get mated.

Not one of them was capable of lying,
There was not one which knew that it was dying
Or could have with a rhythm or a rhyme
Assumed responsibility for time.

Let them leave language to their lonely betters
Who count some days and long for certain letters;
We, too, make noises when we laugh or weep:
Words are for those with promises to keep.

(“Their Lonely Betters”)

Obama Talks With A Computer

[Correction:  Mr. Scocca informs me via email that the dialog in his piece was an actual transcript of a session with Eliza.  So nothing “mock” or “contrived” about it.  All the more interesting, read on.]

Over at Slate, Tom Scocca has staged a mock dialogue with Eliza to good, even if somewhat contrived, effect.  Eliza, in case you’re wondering, is a computer program designed in the 1960s to simulate human interaction based on certain scripts (such as DOCTOR, which simulates a particular style of psychotherapy).
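For a sense of how little machinery is involved: Eliza works by shallow pattern matching, reflecting the user's own words back with a few pronoun swaps.  Here is a toy sketch of the technique in Python (a bare-bones imitation for illustration; Weizenbaum's original used a richer script format, and the rules below are my own inventions):

import re

# Pronoun swaps applied to the user's captured words.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# Keyword patterns paired with response templates; Eliza-style
# scripts are essentially long lists of rules like these.
RULES = [
    (re.compile(r"\bi am having (.+)", re.I),
     "How long have you been having {0}?"),
    (re.compile(r"\bi am (.+)", re.I),
     "Why do you say you are {0}?"),
    (re.compile(r"\bbecause (.+)", re.I),
     "Is that the real reason?"),
]

def reflect(fragment):
    return " ".join(REFLECTIONS.get(word.lower(), word)
                    for word in fragment.split())

def eliza(utterance):
    utterance = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # the all-purpose fallback

print(eliza("I am having a hard time telling the bots from the humans."))
# -> How long have you been having a hard time telling the bots from the humans?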

About halfway through the dialogue we get these few lines, which more or less encapsulate Scocca’s main concern:

TS: On Twitter, I am having a hard time telling the bots from the humans.

Eliza: How long have you been having a hard time telling the bots from the humans?

TS: Weeks and weeks now.

Eliza: Please go on.

TS: It is not because the bots are fluent or facile.

Eliza: Do any other reasons not come to mind?

TS: It is because so much of the human conversation has come down to bot level.

And a few lines later he adds,

TS: That’s not really someone who likes my blog post. It’s a pharma ad-bot that presumably grabs anything with “Obama” in it.

Eliza: Why do you say your blog post its a pharma adbot that presumably grabs anything with obama in it?

TS: Well, of course, by now I have been trained to put “Obama” into the headlines myself, for SEO purposes.

Eliza: Can you elaborate on that?

TS: My actual transactions with human readers are so machine-mediated, it’s necessary for me to keep machine-logic in mind while I’m writing.

I’m taking these observations as rather useful illustrations of how the language (or logic) of a digital media platform shapes our communication to fit within its own limitations.  Borrowing linguist Roman Jakobson’s maxim regarding languages, I suggested a few posts down that “Languages of digital media platforms differ essentially in what they cannot (or, encourage us not to) convey and not in what they may convey.”  In other words, we shape our communication to fit the constraints of the medium.  The follow-up question then becomes: do we adapt to these limitations and carry them over into other fields of discourse?  Scocca provocatively suggests that if a computer ends up passing the Turing Test, it will not be because of an advance in computer language capability, but because of a retrogression in the way humans use language.

Keep in mind that you don’t have to be a professional writer working for a popular web magazine to experience machine-mediated communication.  In fact, my guess is that a great deal, perhaps the majority, of our interaction with other people is routinely machine-mediated, and in this sense we are already living in a post-human age.

The mock dialog also suggests yet another adaptation of Jakobson’s principle, this time focused on the economic conditions at play within a digital media platform.  Tracking more closely with Jakobson’s original formulation, this adaptation might go something like this: the languages of digital media platforms differ essentially in what their economic environment dictates they must convey.  In Scocca’s case, he has been trained to mention Obama for the purposes of search engine optimization, and this, of course, in order to drive traffic to his blog, because traffic generates advertising revenue.  Not only do the constraints of the platform shape the content of communication, the logic of the wider economic system disciplines the writing as well.

None of this is, strictly speaking, necessary.  It is quite possible to communicate creatively, and even aesthetically, within the constraints of a given digital media platform.  Any medium imposes certain constraints; what we do within those constraints remains the question.  Some media, it is true, impose more stringent constraints on human communication than others; the telegraph, for example, comes to mind.  But the wonder of human creativity is that it finds ways of flourishing within constraints; within limitations we manage to be ingenious, creative, humorous, artistic, and so on.  Artistry, humor, creativity, and all the rest wouldn’t even be possible without certain constraints to work with and against.

Yet aspiring to robust, playful, aesthetic, and meaningful communication is the path of greater resistance.  It is easier to fall into thoughtless and artless patterns of communication that uncritically bow to the constraints of a medium, thus reducing and inhibiting the possibilities of human expression.  Without any studies or statistics to prove the point, it seems that the path of least resistance is our default for digital communication.  A little intentionality and subversiveness, however, may help us flourish as fully human beings in our computer-mediated, post-human times.

Besides, it would be much more interesting if a computer passed the Turing Test without any concessions on our part.

Oh, and sorry for the title, just trying to optimize my search engine results.

Breaking the Spell

In the not too distant past there was a series of Visa Check Card commercials that presented some fantastical and whimsical shopping environment in which transactions were processed efficiently, uninterruptedly, and happily thanks to the quick, simple swipe of the check card.  Inevitably someone would pull out cash or attempt to use a check, and the whole smooth and cheerful operation would grind to a halt and displeasure would darken the faces of all involved.  For example:

Cynic that I tend to be, I read the whole campaign as a rather transparent allegory of our absorption into inhuman patterns of mindless, mechanized, and commodified existence.  But let’s lay aside that gloominess for the moment; it is near Christmas time, after all, and why draw unnecessary attention to the banality of our crass … okay, no, I’m done, really.

But one other, less snarky observation: These commercials did a nice job of illustrating the circuit of mind, body, machine, and world that we are all enmeshed in.  This circuit typically runs so smoothly that we hardly notice it at all. In fact, we often tend to lose sight of how deeply integrated into our experience of reality our tools have become and how these tools mediate reality for us.  The emergence of ubiquitous wireless access to the Internet promises (or threatens, depending on your perspective) to extend and amplify this mediation exponentially.  To put it in a slightly different way, our tools become the interface through which we access reality.  Putting it that way also illustrates how our tools even begin to provide the metaphors by which we interpret reality.

Katherine Hayles draws attention to this circuit when, discussing the significance of embodiment, she writes:

When changes in [embodied] practices take place, they are often linked with new technologies that affect how people use their bodies and experience space and time.  Formed by technology at the same time that it creates technology, embodiment mediates between technology and discourse by creating new experiential frameworks that serve as boundary markers for the creation of corresponding discursive systems.

Translation:  New technologies produce new ways of using and experiencing our bodies in the world.  With our bodies we make technology; this technology then shapes how we understand our bodies; and this interaction generates new ways of talking and thinking about the world.

But as in the commercial, this often unnoticed circuit through which we experience the world is sometimes disrupted by an error in the code or a glitch in the system.  We often experience such disruptions as annoyances of varying degrees.  But because our tools are an often unnoticed link in the circuit encompassing world, body, and mind, disruptions emanating from our tools can also elicit flashes of illumination by breaking up habituated patterns of thought and action.  Hayles again, this time writing about one of the properties of electronic literature:

. . . unpredictable breaks occur that disrupt the smooth functioning of thought, action, and result, making us abruptly aware that our agency is increasingly enmeshed within complex networks extending beyond our ken and operating through codes that are, for the most part, invisible and inaccessible.

Thinking again about Arthur C. Clarke’s Third Law, “any sufficiently advanced technology is indistinguishable from magic,” we might say that disruptions and errors break the spell.  And depending upon your estimation of the enchantment, this may be a very good thing indeed, at least from time to time.