444 machine-generated haiku from lost dog listings

About



Mountain dog,
for one night you can rest
among bush clover
—Basho

'Lost Dogs' is an experiment in computationally generated poetry, created using GPT-4, Datasette, and Python along with HTML, CSS and JavaScript. It uses as its source material hundreds of lost dog postings on Craigslist.

Intention

'Lost Dogs' is an exploration of what kind of collaboration with AI is possible or desirable in its current state. This experiment is not intended as either an endorsement or a criticism of AI. I wanted to mark one phase of the development of Large Language Models (LLMs) and test ideas about their use, as these technologies stand in 2024. Readers can evaluate the quality of results for themselves, comparing the generated poem to the source text that was used to produce it.

The assumption behind this project is that, as musician and artist Holly Herndon claims, "the model is the art" when it comes to generative AI.[1] The poetic content of the text contained on this site, if there is any, originates from two sources: the lost dog listings, and the collective body of previously written haiku from Basho to the present day that GPT has ingested. The queries have been engineered to emphasize the former.

If lost dog postings on Craigslist contain a kind of latent poetic content - the desire for something that has been lost - the test here is to what extent GPT can distill that desire. I am influenced by Anne Carson's notion of lack as intrinsic to a certain kind of poetry - a "radiant absence" between the lover and the beloved, "deferred, defied, obstructed, hungry."[2] Another inspiration: as a writer and a programmer, I am interested in the kinds of formal experiments the Oulipo group devised long before GPT came along.

Does GPT's capacity for automatically remixing language and computationally generating novelty add something to the listings? The results, clichéd or amateurish as they are, can sometimes echo, summarize, or mutate in interesting ways the loss expressed in the original source material. But GPT's limitations in producing quality poetry are also on display here - more about that below.

While the tone of this website's design is deliberately nostalgic and retro, the intention is never to make light of the distress of dog owners who have lost their dogs, or dogs who have been lost. I am a dog owner and a dog lover, and have lost (and thankfully, recovered) a dog myself, which is exactly why I chose this subject matter for the project.

As a writer of short fiction, I found this project appealing because each lost dog posting is a kind of narrative, however minimal and unresolved. A few good examples of this are poems 0146, 0166, 0096, and 0265. The lost dog haiku made by GPT are probably closer to 'microfiction' or 'nanofiction'[3] than to traditional haiku, which start from meditation on a given moment in time and draw inspiration from undomesticated nature.

Then again, haiku do tend to suggest micro-narratives. The Basho poem above is an interesting example: though it mentions a dog, it is a "mountain" dog - perhaps a "lost" one or more likely, an undomesticated one who has never been "found" - who encounters a nature they are estranged from. Why would the mountain dog only rest in bush clover for "one night"?

Is it possible to ask those kinds of subtextual questions of a GPT-generated poem? Part of my intention here was to look at writing produced without intention, and examine the results. Can you do a "close reading" of an author-less GPT poem? Since some critical approaches would find it beside the point what Basho "meant" by "one night" and instead emphasize what the reader can make of such an arrangement of words, maybe this is not a problem.[4] Even so, a GPT poem might fail to reach a threshold of interest for other reasons.

Process

The process of creating 'Lost Dogs' was a combination of human curation and algorithmic operations. I hand-selected each lost dog posting, searching Craigslist sites across the United States. I sometimes edited images or typed in text from the image.

This was a time-consuming process, and it is where the bulk of the work took place. It would have been possible to speed it up dramatically by writing page scrapers to harvest postings automatically. But finding and selecting appropriate listings felt like the most important part of the process. I tried to pick listings with details that went beyond the very basics of location, name and time when the dog was lost.

I did use a page scraper, made using Octoparse, to pull content from the webpages after I had hand-selected them. I also used a Python script (written with the help of GitHub Copilot's AI code helper, which I use on pretty much everything these days) to quickly download the images. I checked the images and sometimes edited them in Photoshop to crop or clean them up, or to blur out human faces for privacy.
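The download script itself is not reproduced here. As a rough sketch of what such a script can look like, the following uses only the Python standard library. The function names, the `images` folder, and the file-naming scheme (four-digit poem numbers counted from 1) are assumptions made for illustration, not details of the actual script.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_for(url, poem_id):
    """Build a local filename such as '0146.jpg' from a poem number and an image URL."""
    # Keep the extension found in the URL path; fall back to .jpg if there is none.
    ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
    return f"{poem_id:04d}{ext}"


def download_images(urls, out_dir="images"):
    """Download each image URL into out_dir and return the saved paths.

    Hypothetical helper: numbering from 1 is an assumption, not the site's scheme.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for poem_id, url in enumerate(urls, start=1):
        path = os.path.join(out_dir, filename_for(url, poem_id))
        urllib.request.urlretrieve(url, path)  # performs the network request
        saved.append(path)
    return saved
```

Calling `download_images` with a list of image URLs copied out of the scraper's export would save one numbered file per listing, ready to be checked by hand.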

The listings were then fed in batches to GPT-4 using Datasette to generate the haiku. (This whole project was, in part, inspired by a demo of Datasette's poetry-generating capabilities by its creator, Simon Willison, at the NICAR data journalism conference.)

I worked in three batches, generating the poems from the listings and adjusting the prompts each time to try to get better results.

The first 50 were generated using a very basic prompt to make a haiku from the listing title and content. For the next 50, I used two different prompts with additional cues for GPT-4 to make use of detail and vary its tone. I then selected between the two for the best results.

For the remaining 344, I ran three different prompts to generate variations, and then picked the one that seemed to fit the idea of the project. No other editing of the poems was done.
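The exact prompts are not given here, so the sketch below only illustrates the shape of the three-variant approach: each listing is paired with several prompt templates, and a person picks one of the results without editing it. The template wording and the function names are invented for illustration, and no model is called.

```python
# Hypothetical prompt templates; the wording is invented for this sketch.
PROMPT_TEMPLATES = [
    "Write a haiku based on this lost dog listing.",
    "Write a haiku based on this lost dog listing. "
    "Use one concrete detail from the listing.",
    "Write a haiku based on this lost dog listing. "
    "Use one concrete detail and avoid a sentimental tone.",
]


def build_prompts(title, content, templates=PROMPT_TEMPLATES):
    """Return one prompt per template for a single listing."""
    listing = f"Title: {title.strip()}\nListing: {content.strip()}"
    return [f"{template}\n\n{listing}" for template in templates]


def choose(candidates, index):
    """Record the human selection: keep one generated poem, unedited."""
    return candidates[index]
```

Each prompt would be sent to the model separately, and `choose` stands in for the step of reading the variations and keeping the one that best fits the project.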

I then exported the listings plus poems to a SQLite database, which drives a Flask web application written in Python and HTML. I styled the website using CSS, and added some interactivity and functionality using JavaScript. Finally, I transformed the site to plain HTML files using Frozen-Flask, and uploaded the files to a server.
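As a minimal sketch of the database step, the following stores each listing with its poem and looks a poem up by its four-digit number (as in "poem 0089"). It uses only Python's built-in `sqlite3` module. The table name, column names, and helper functions are assumptions for illustration, not the project's actual schema, and the Flask layer is left out.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open a SQLite database and create a hypothetical 'poems' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS poems (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               listing TEXT NOT NULL,
               haiku TEXT NOT NULL
           )"""
    )
    return conn


def add_poem(conn, poem_id, title, listing, haiku):
    """Insert one listing together with its selected haiku."""
    conn.execute(
        "INSERT INTO poems (id, title, listing, haiku) VALUES (?, ?, ?, ?)",
        (poem_id, title, listing, haiku),
    )
    conn.commit()


def get_poem(conn, poem_number):
    """Look up a poem by its zero-padded number, e.g. '0089'."""
    row = conn.execute(
        "SELECT id, title, listing, haiku FROM poems WHERE id = ?",
        (int(poem_number),),
    ).fetchone()
    if row is None:
        return None
    return {
        "number": f"{row[0]:04d}",
        "title": row[1],
        "listing": row[2],
        "haiku": row[3],
    }
```

A web route for a single poem page would call something like `get_poem` and pass the resulting dictionary to an HTML template.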

Caveats

The above-mentioned questions around authorship and criticism have taken on an entirely different light with the advent of truly author-less work.

Serious questions of ownership and appropriation are increasingly being raised by writers, visual artists, and other producers of intellectual or creative content when it comes to AI. Since these technologies are recombinant, they rely utterly and completely on all previously ingested data to produce novelty. Humans do as well, but we have the important distinction of being the owners of our own subjectivity and lived experience, around which everything we create is consciously and unconsciously organized.

Artists are increasingly taking positions against AI categorically, seeing in it the latest threat from a rapacious techno-capitalism that treats their work as worthless grist and demonetizes creative labor. Ironically, current information capitalism depends on a kind of information communism - it demands free access to intellectual property in one category ("content"), while jealously guarding its own intellectual property ("the platform").

Since, as explained above, I see the 'model as the art' when one is using generative AI, the "creative" labor in this case was my selection of the lost dog postings, which I manually copied and pasted from Craigslist, and the design of the website. I do not consider myself the author of any of the language in this project, or even consider it "writing."

Reflections

GPT occasionally does an uncanny job of extracting a decent poem from the Craigslist postings. When it produces something that reads like something a human would write, one is tempted to attribute some kind of mind or magic to the process. This is a pretty widespread response to the technology.

Some have speculated that AI's distributed circuits and arcane numerical processes play host to demonic intelligence. Search YouTube and hundreds of theorists will elaborate. In The Catholic Weekly, Father John Corrigan asked in 2023 of the questions we put to AI, "What if the thing answering in some instances is a spirit?"[5]

But if the spirit that answers our petition for a poem about a lost dog responds with the sentimental dreck that we find in poem 0314, I doubt we need to worry that much about its malevolence. More likely, we are simply disappointed at what "collective intelligence" (Holly Herndon prefers this term to "artificial intelligence" and others prefer "co-intelligence") has produced, based on the algorithms we are using to mobilize it.[6]

What the results of this experiment may highlight the most are the inherent limitations of AI, at least with current LLMs like GPT. Since they are blind to all "meaning" and don't have the ability to reflect on what is most important in a body of text, they will miss the kind of singular detail that a human poet would home in on.

An example of this is poem 0089. The Craigslist posting contains a striking detail: Noah, the dog, is an "emotional support dog and is needed to control audio and visual hallucinations." But GPT's poem makes no reference to this.

During the process of data collection, I started to resent the task I had created for myself. I imagined a horrible future where this is what the creative process has been reduced to: manually feeding data into a machine, in order for it to do the invention, in order for it to perform its synthetic art.

That said, the creation of models for working with AI is not necessarily a task of pure drudgery. Even the selection of lost dog postings felt like a certain kind of specialized work, more like that of an archivist than a data entry clerk.

Speculations

An interesting possibility posed by this project is: what if this kind of thing is not the beginning of the poetic capabilities of LLMs, but already their limit? In 2024, it is still impossible to tell for sure, but I have serious doubts that LLMs will ever be able to do much better at performing aesthetic feats with language than what we see in this 'Lost Dogs' experiment.

Promises of exponential growth and improvement are still somewhat dominant in the discourse around AI in late 2024. But there are many signs that the wave of hype has already started to dissipate. In engineering, progress in a certain direction can happen very quickly and then come to a screeching halt when some unforeseen limitation emerges.

While fully exploring the ethics of using AI in the arts is beyond my scope here, I do think it's more complicated than saying AI is just "bad" across the board, although the threat to creative workers is very real. It may be that the very limitations of AI are what keep humans not only "in the loop" but very much required for doing anything valuable with LLMs and other AI technologies.

As critical and skeptical as I am about many of the uses ChatGPT and other AI are planned for, it is too easy to say that ChatGPT is only parasitic, or a perverse facsimile of the creative process. It does not "create," but it does produce novelty. As long as one understands that what it produces is always derivative, and fundamentally lacking in its own subjectivity, it can be used as a tool. Creative people and artists will make use of this tool, and it remains to be seen what shape that will take.

A start toward responsible use of AI, for work one is profiting from or determining ownership of, is being careful about the models one uses. This is currently hard to do, since AI companies may not even know themselves what their models contain: do they include work by uncompensated artists who are being fractionally exploited, along with work that can safely be said to be in the public domain?

Using AI on content that is not only hand-collected but self-created goes another step toward a creative practice that is hard to argue against ethically, even if you don't care for the results. Holly Herndon, referenced above, has created models of sound and data explicitly as raw material for AI algorithms. The results are then sculpted, with the AI acting as a kind of transformer that introduces randomness and distortion.

—Justin Allen, December 2024

Sources:

Basho. "Cats and Dogs: 18 Basho haiku and renku on cats or dogs," basho4humanity.com.

1. Wiener, Anna. "Holly Herndon's Infinite Art," The New Yorker, November 13, 2023.

2. Carson, Anne. "Eros the Bittersweet," (Princeton University Press, 1986), 20.

3. Probably the most famous example of microfiction (or nanofiction or extremely short flash fiction) is the six-word story some attribute to Hemingway (though this is disputed): "For sale: baby shoes, never worn."

4. I'm referring here to the basic line of thought Roland Barthes kicked off with the "Death of the Author" essay (1967). Barthes wanted to unseat the author and what they intended to say from a privileged position in relation to their work, and to liberate the text, opening it to multiple interpretations. However, this isn't to say that Barthes would have unquestioningly welcomed recombinant AI. He clearly admired finely crafted work, like that of Mallarmé, that he thought valued the writing itself over the author.

5. Corrigan, Fr. John. "Does artificial intelligence give insight into demonic activity?" The Catholic Weekly, March 1, 2023. https://catholicweekly.com.au/does-a-i-give-insight-into-demonic-activity

6. Mollick, Ethan. "Co-Intelligence: Living and Working with AI," (Penguin, 2024).