The common saying about tools is that we shape them and then they shape us. Continuing this recursion over time leads to the inevitable conclusion that eventually we become one with our tools. One technology where such a runaway process of merging with the tool has occurred is photography. We recently turned our entire planet into a cosmic camera to photograph a black hole in a neighboring galaxy.
Going beyond the metaphor, a closer look at today’s computational photography and social media reveals that algorithms are taking over the photographic medium and humans are acting as inputs and outputs in a planet-scale photographic apparatus.
Companies like Instagram and Snapchat have built some of the components of this apparatus. Snapchat creates social media software but defines itself as a camera company. Snapchat went public in 2017¹ and described its vision for the future as follows.
“In the way that the flashing cursor became the starting point for most products on desktop computers, we believe that the camera screen will be the starting point for most products on smartphones. This is because images created by smartphone cameras contain more context and richer information than other forms of input like text entered on a keyboard.”
Taking a cue from Snapchat’s mission, this essay is an exploration of the origins of photography, the current state of the medium, a brief survey of the frontier, and finally the historical struggle between the text and the image.
Photography was not born with the invention of the camera. The camera obscura was already known to Paleolithic humans who occasionally saw images of animals projected through tiny holes in their tents. The phenomenon of the pinhole was documented in China as early as 1000 BC and Aristotle and Euclid made comments on it, circa 300 BC. The phenomenon is so common that it occurs quite naturally, such as when a solar eclipse is projected through holes in the canopy of a tree, or an entire building is projected onto a wall through a hole in the roof as in the image below.
Photography begins with the ability to fix the image onto the projected surface permanently. The first attempts were made in the early decades of the 19th century, all relying on the reduction of salts of silver to metallic silver (which is black in color) by exposure to light.
I was initiated in that old art of fixing light onto silver in the mid-90s at age 10 in India. My school had built a new darkroom that year and equipped it with all manner of optical and chemical paraphernalia which fascinated me to no end. I would eagerly look forward to Wednesdays when we would queue up outside the dark room and wait solemnly at the door.
It was unusual for a bunch of 10-year-olds to behave even under the fear of authority, but here, upon entering a portal into a dark magical realm awash in a red glow, you would forget the burdens of school life. A dark room with a red lamp was one of the most awe-inspiring environments I experienced in my youth. It summoned for me the daemons of creativity from the netherworld.
I loved creating photograms in the dark room. Photograms are more like making a drawing in Photoshop than taking pictures with a camera. You take ordinary objects like leaves, keys, combs, anything you want and place them directly on a photosensitive paper under a device called an enlarger. Light from the enlarger exposes the image onto the paper below. The variation of transparency and shadows from the objects creates ethereal almost-3D images.
The paper still looks blank white after removing the objects and must be treated with several chemicals to create the image. I still vividly recall the acrid inorganic odor of those chemicals and how bone-chilling they were to the touch, like cold acetone.
Our teacher would prepare four large and shallow trays with various solutions for processing the image. The first tray contained the developer which is a chemical that turns the silver halide of the paper into a black silver powder wherever it has been exposed to light. I would submerge the paper into the cold aqueous solution and watch the image come to life as if from another dimension. A few seconds too long here and the paper could turn all black.
I would anxiously move the paper to the next tray containing the stop bath. Here the image would stop developing further into black. A quick dip in the third tray would fix the image permanently on paper. Finally, after a wash under cold water, the paper would be left to hang from a wire, like fresh laundry.
The dark room is all but history today, but its legacy still continues in the digital world. Menu items in tools like Photoshop such as exposure, burning and dodging, filters, developing and toning, all originated as physical processes in the dark room.
The entire business of the dark room now exists as algorithms in software. All photographic apparatus, save for the rudimentary pinhole and glass, is reduced to information processing.
The abundance of pixels
The creative process requires trial and error. To be creative you should be allowed to create waste. The writer needs a wastepaper basket. The novel as an art form only took off after writers could afford to waste paper.
In the days of the darkroom, you could not afford to waste film, paper or chemicals. Creative photographers still existed and made incredible images but they were usually journalists or the affluent class who could afford to create waste in the medium.
Despite my waxing poetic about the dark room, it was a rather constraining environment. The high cost of materials was not the only limiting factor. The long time to go through the trial and error loop made experimenting with new ideas rather tedious especially for those photographers who did not develop their own film.
The field of experimental photography was missing its most crucial element - the amateur. In every art form, radical new ideas come from hackers and tinkerers working at the avant-garde. You need a large number of amateurs for a few breakthrough ideas to succeed. In photography, this happened after the mass market availability of affordable digital single-lens reflex cameras.
In the late 1990s, the early digital cameras were promising but were widely dismissed by professionals. Sony had a line under the name Mavica which used a 3½-inch floppy 💾 for storage. I only remember floppy drives for their frustrating unreliability.
I read an interview in National Geographic, which was at the time the gold standard for film photography, in which a staff photographer claimed that digital cameras were interesting but would never replace film for serious journalism. All those dismissals turned out to be the equivalent of Watson’s - “I think there is a world market for maybe five computers” - prediction.
During the early 2000s, Moore’s law held very steadily and with every year, increasingly capable DSLRs came to the market. Professionals were initially worried that discrete pixels could not match the fidelity of film, but camera resolution rapidly reached a point where the difference became insignificant. Everyone and their mother bought digital cameras.
This race for higher megapixels lasted very briefly until around the point that cameras reached 10 to 12 megapixels. Beyond that, the returns were diminishing unless you were a professional or printed images in large format. Serious amateurs started investing in lenses which were expensive since they were not subject to the efficiencies of Moore’s law.
A digital photograph is a piece of data that encodes an image which is a 2-dimensional representation of a 4-dimensional event in space-time. A camera can output an almost unlimited number of images. In practice, most of these images will be redundant, either due to technical errors or because they are not distinct enough amongst the set of other images corresponding to the same event. The goal of the photographer then is to generate non-redundant images, producing information with high entropy.
Rules of the game
Photography is commonly defined as a process of making images on a surface using light. This definition is far too preoccupied with the tools of the trade and does not get at the true nature of the medium. McLuhan’s famous phrase “the medium is the message” would suggest that to understand what photographers are really doing and what photography is about, we need to look outside the camera obscura and see it as a [black box](https://en.wikipedia.org/wiki/Black_box).
Look at what photographers do with the camera. They change settings and turn dials. They move the camera around. They look through the camera at the world but it is not the world that interests them. They are imagining what kind of image the camera will produce in its current configuration. They are searching for the most technically correct and non-redundant image amongst the set of all possibilities.
This kind of searching and goal-seeking turns photography into a game. In his book “Towards a Philosophy of Photography”, published in 1983 roughly a decade before the digital camera, Vilém Flusser drew a remarkably astute comparison between photography and chess. Chess players search for new moves and possibilities within the space of chess games. They are looking for a non-redundant move that the opponent would not have predicted and would give them a positional advantage. Some moves are so non-redundant that they become the decisive moment of victory in a chess game and can be truly called beautiful.
The act of photography is a constant struggle against the possibilities of the camera. Every successful image is the result of a game played against the limitations of the camera such as the possible range of speed, focus, exposure, and noise. This notion of photography as a game gets at the heart of the practice of photography and should be well known to seasoned photographers. If photography is a game, then the photographer is not a creator but a player; homo ludens, not homo faber.
Imaging techniques such as HDR, the Brenizer method, and focus stacking keep expanding the possibilities of the camera. Landscapes that were photographed before HDR was possible can be photographed again to generate new non-redundant images. Portraits with out-of-focus backgrounds have been democratized by the iPhone, but the Brenizer method still yields a qualitatively different result that leaves an unexplored space in portrait photography. Focus stacking is largely an unexplored territory since it requires expensive macro lenses and a software tool such as Photoshop.
The abundance of photographs makes the unexplored space of camera possibilities shrink rapidly. For instance, landscape photography seems to be approaching saturation unless new technology opens new possibilities. As an experiment, next time you travel and photograph a landscape, search for photos of the same location across Google Images, Instagram, Flickr or other photo services. You are likely to realize that the best images are rather alike, and there is a very high probability that someone else has taken a photo exactly like yours with little variation.
It is still possible to find non-redundant images. Newly found historical images are non-redundant since fewer photos were taken in the past. War, conflict, politics, celebrity and astronomy yield non-redundant images due to their social value. The greatest source of non-redundant images is still that subject whose uniqueness we have evolved to recognize and have a large part of our brain dedicated to - the human form.
The ludic fallacy
The amateur photographer, like the apprentice in any art, develops an obsession with his tools. With increasing mastery over the camera and the editing process, a troubling realization dawns upon the photographer - the creation of interesting images is not a technical problem beyond a certain level of proficiency.
Approaching photography as a camera-centric activity reveals a ludic fallacy. Like the protagonist Knecht in Hesse’s novel “The Glass Bead Game”, the photographer realizes the limits of technical mastery. Hesse describes the game as the quest for finding the unity of all art and learning. Incidentally, as Flusser did with photography, Hesse compares the Glass Bead Game to a version of high-dimensional chess whose rules are too complex for a lay audience to understand.
In the novel, Knecht suffers a personal crisis on realizing the futility of mastery. He resigns his title as “Magister Ludi”, and throws himself into the messiness of the world. Beyond a basic level of mastery, this is the attitude needed to get better at photography.
First and foremost, a photographer must have surplus time and money. This is required not just for practicing the art, but for traveling. Finding non-redundant subjects like rare wildlife is a game of patience, and watchful waiting. Approaching non-redundant landscapes involves strenuous hiking, camping, and waking up or staying up beyond the normal hours of the day. The colorful landscape is undoubtedly the most redundant image available today and I find black and white landscapes² with a high dynamic range far more interesting. This explains part of the appeal of Ansel Adams’ Yosemite³ pictures.
The human face is certainly the most non-redundant source of interesting images. Before the invention of photography, it would have been impossible to gaze upon the face of a stranger without a violation of privacy and a confrontation. There is a thrill in looking someone in the eye. It is fairly diminished in the photograph, but not entirely missing. It is not surprising that the portfolios⁴ of the masters⁵ of photography⁶ are nearly entirely filled with portraits⁷.
The fundamental challenge of portraiture is that you must first get access to a person willing to be photographed as a subject, and then see the person as an object without making them uncomfortable. The most non-redundant portraits require access to celebrities and politicians which is perhaps the one thing that separates the masters from the amateurs.
Amateurs often photograph people with stealth. This usually fails because a human face is less interesting when it is not directly looking at you. Amateurs also photograph people that they are already familiar with. This fails because they are unable to see the person objectively and the photo often ends up looking like something out of a family album.
The human body is as interesting a subject as the human face. All good street photography is about observing the human form as people go about their business on the street. The human form is also the subject of fashion photography, pornography, and social media, which together account for the bulk of images created and viewed by humanity.
The most technically non-redundant images create history⁸. They are the source of our deepest fears - wars, forbidden spaces, natural disasters, and our greatest optimism - distant worlds like planets and black holes.
The indecisive moment
“Photography is not like painting. There is a creative fraction of a second when you are taking a picture. Your eye must see a composition or an expression that life itself offers you, and you must know with intuition when to click the camera. That is the moment the photographer is creative. Oop! The Moment! Once you miss it, it is gone forever.” – Henri Cartier-Bresson, 1957
The famous French photographer Henri Cartier-Bresson popularized the idea of the “decisive moment” which is, roughly speaking, an instantaneous realization that something is worth photographing, and then in the same moment recognizing the best way to photograph it. This was the instinct behind all great photography.
This instinct was born out of the scarcity of the medium. In a medium of abundance, all photographic practice has moved towards increasing indecisiveness and wastefulness. This is the way to making great images today.
The first kind of indecisiveness is not knowing the best moment to take a picture. The modern photographer responds by photographing everything in the hope that there could be something good in there. Cameras encourage this indecision by taking photos at increasingly higher frame rates. The iPhone even goes so far as to photograph a few seconds’ worth of frames before and after the shutter is pressed just in case you were not indecisive enough. The Google Pixel not only records a series of frames but even selects the best one for you. This is what a medium of abundance should look like.
Indecisiveness turns out to be the only way to reliably photograph the most decisive of all phenomena - the lightning sprite. The process by which scientists photograph lightning in the upper atmosphere⁹ is something like this.
- Run a high-speed camera continuously.
- If lightning happens, release a switch.
- Save the last 3 seconds of data if the switch is released; otherwise, discard the data.
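The three steps above amount to a ring buffer with a pre-trigger. Here is a minimal Python sketch, assuming frames arrive as an iterable and the switch is modeled as a function. The names `capture_with_pretrigger` and `triggered` are hypothetical, and a real sprite camera would presumably do this in dedicated hardware rather than in a loop like this one.

```python
from collections import deque


def capture_with_pretrigger(frames, triggered, fps, keep_seconds=3):
    """Keep only the most recent `keep_seconds` of frames; save them on trigger.

    `frames` is any iterable of frames, `triggered` is a hypothetical callback
    standing in for the released switch. Returns the saved frames, or None if
    the trigger never fired (everything is discarded).
    """
    # A deque with maxlen silently drops the oldest frame on every append,
    # so memory use stays constant no matter how long the camera runs.
    buffer = deque(maxlen=fps * keep_seconds)
    for frame in frames:
        buffer.append(frame)
        if triggered(frame):
            # The buffer now holds the moments *before* the decision was made.
            return list(buffer)
    return None


# Usage: a 10 fps "camera" producing numbered frames, with lightning at frame 57.
saved = capture_with_pretrigger(range(100), lambda f: f == 57, fps=10)
```

The decision to keep the footage is made after the event, which is exactly the inversion of the decisive moment: the camera records first and the photographer decides later.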
The second kind of indecisiveness is not knowing what kind of picture to take. Modern cameras encourage this by recording the raw voltage data coming off the image sensor and storing it as pure numeric information instead of a fixed image. This is called the RAW format and is widely supported by cameras and even smartphones today.
RAW is not merely another format like JPEG. It is a dataset from which various kinds of images could be made by a computer. A good RAW image nearly always looks overexposed since it should push as many pixels as possible to maximum brightness without entirely clipping to white. Thus the camera is reduced to an information gathering device and the goal of the photographer is not to make the best-looking image at the moment but to collect the maximum amount of light data on the sensor.
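The idea of pushing pixels toward maximum brightness without clipping can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats the sensor data as a flat list of linear integer values at an assumed 12-bit depth, and the function name `exposure_headroom` and the `clip_tolerance` threshold are hypothetical. Real RAW files add complications such as black levels and separate color channels that are ignored here.

```python
import math


def exposure_headroom(samples, bit_depth=12, clip_tolerance=0.001):
    """Report how close linear sensor values are to clipping.

    Assumes `samples` is a non-empty list of integers in [0, 2**bit_depth - 1].
    """
    white = (1 << bit_depth) - 1  # largest value the sensor can record
    clipped = sum(1 for s in samples if s >= white) / len(samples)
    brightest = max(samples)
    # Each stop is a doubling of light, so headroom is a base-2 logarithm.
    headroom = math.log2(white / brightest) if brightest > 0 else float("inf")
    return {
        "clipped_fraction": clipped,   # share of pixels burned to pure white
        "headroom_stops": headroom,    # how much more light the sensor could take
        "ok": clipped <= clip_tolerance,
    }


# Usage: the brightest pixel sits about one stop below the 12-bit ceiling.
report = exposure_headroom([100, 1023, 2047])
```

A frame with about one stop of headroom and no clipped pixels could have been exposed roughly twice as long, collecting more light data for the computer to work with later.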
The modern workflow postpones the crucial decisions in image making beyond the camera. The greatest decisiveness exercised by most photographers is in selecting which images to keep and how to edit and present them. These workflows are easily automated by algorithms. Some of these are available for use at the mere click of a button, and increasingly they are already built into the camera unbeknownst to the photographer who simply admires how well the camera works.
The logical conclusion to this chain of progress is that the photographer is reduced to an agent which transports and points the camera at interesting events. But why stop there? Cameras can be easily mounted to drones and robot arms can already point cameras with greater accuracy and stability than humans. It is easy to imagine a world where photography does not involve the human as the camera operator, a world of self-driven cameras. The self-driven camera is already a reality. Jacqui Kenny, who calls herself the agoraphobic traveler¹⁰, takes photos only using Google Street View.
But why even stop there? A photograph is expected to have some resemblance to reality as the philosophers of photography like Susan Sontag and Roland Barthes have remarked, but this is a mistaken notion. Neither of them was an avid practitioner of the art, and neither could have imagined a future where photography would largely be a matter of computation. All photographs are abstract models of reality. Increasingly hyper-realistic video game worlds are becoming rich sources of new non-redundant imagery.
And it won’t stop there. Modern machine learning tools like GANs can generate realistic landscapes and faces indistinguishable from reality, perhaps even an improvement on reality and at any rate no less interesting.
Machine learning is also taking away the burden of figuring out the decisive moment. With tools like Instagram people become a feedback loop for discovering non-redundant images through the action of millions of thumbs scrolling tirelessly through photos - halting here, liking there - all feeding back to the algorithm. The decisive moment in photography is the thumb scrolling through Instagram and the photography workflow is not really complete until the image is uploaded. We are all part of a giant camera obscura in the cloud. If a photo is uploaded to the internet and no one likes it, does it exist?
Ultimately the one constant in all of photography and art that seems at no risk of being replaced by the computer is the audience. A work of art must finally matter to someone. Until we have AGIs - and we are nowhere near them despite all the AI hype - the human audience matters. They matter because they have short attention spans, they easily get bored, and they die.
The image precedes text in the history of human communication. The ancients created sculptures, images of deities and cave paintings. In the old world largely devoid of information, an image must have been a magical entity, something not found in nature and yet signifying something new about the natural world. It would have been very easy to fall under the spell of an image.
One form of these early images is explored in Herzog’s documentary “Cave of Forgotten Dreams.” In the scene where we are first introduced to the cave paintings of bison and horses, it is possible to imagine walking through the dark and narrow chambers of the cave with the light of a fire.
Imagine you are a young child in a band of hunter-gatherers in the Paleolithic era more than 30,000 years ago. You have come of age and must be initiated to join a hunting party. You walk for hours into the belly of the cave led by an elder showing the way with a burning torch. The flickering light of fire makes the walls dance around you like the frames of a video on loop. The smell of soot fills the frozen air. In the end, you reach a wall with the painting of a majestic herd of Megaloceros who seem to be unaware of your presence from a distance.
Without warning the elder lets out a deafening hunting cry, so loud that you can’t distinctly hear the first few seconds, as if an amplifier was clipped. The scream echoes endlessly in the sanctum of the cave, periodically rising in volume due to the constructive interference of sound waves. An entire hunting party of a dozen men is rendered vividly in polyphonic sound.
The giant deer on the wall has been painted with eight legs, but lo and behold, the dancing fire with its chaotic tongues of flames and shadows animates the herd of deer into a frenzy. Waves of shock pulse through your body as adrenaline pumps into your blood and so begins your initiation into the band of stone age hunters.
During the stone age, this kind of visceral reaction to the image would not have been restricted to cave paintings. It is impossible for the human brain to directly process the vast quantities of information the world throws at it. The brain perceives the world in terms of models which are simplified representations of external phenomena. Our human ancestors must have modeled reality in terms of images. So they named the constellations after familiar animal forms and created gods and demons in their own image.
The spell of the image was first broken when the text was invented. You can scan an image starting from any point and ending at any other, even coming back to the original point. This is a non-linear manner of looking at the world. The image creates a magical consciousness where things exist outside of time and without cause and effect. The text enables linear thinking. The pixels in the 2-dimensional image are laid out in 1-dimensional rows of text. If you can explain how and why an image was created, you diminish its magical influence.
The text enables serialization of events and history is born. The text also enables conceptual thinking. As texts themselves became increasingly abstract, they became increasingly incomprehensible. While the pagans worshipped abstract images, modern religions such as Christianity were faithful to the text. Idolatry gave way to the worship of text, what Flusser calls textolatry. Religions such as Judaism and Islam forbid idolatry entirely. Buddhist shrines also forbid photography.
The spell of the image is well known to the advertising industry and the spell is cast with great potency to shape the purchasing decisions of the consumer. The image influences what we eat, our choice of mates, who we vote for, how we want to be seen, our desires and aspirations. As long as we think and act under the spell of the image we are all idolaters.