Mental Representations




  Draft 1.0


Internal Representation optional for Cognitive Science?

        It is relatively uncontroversial that minds represent the world. That I have the belief that the world is warming means that I represent the world as being a certain way. Accepting this does not commit one to the existence of states internal to the mind which themselves represent.
        One of the distinguishing commitments of contemporary cognitive science, particularly as contrasted with behaviorist psychology, is that there are inner representations which cause and explain behaviour. In cooking a meal, I may consult my memory for the next ingredient. A typical explanation from Cognitive Science would suggest that the recipe is represented somewhere in my brain and that in consulting my memory I am retrieving and examining this internal representation.
        Mental representation thus holds a crucial place in contemporary study of the mind. Thorough understanding of mental representation will make a significant contribution to the theoretical underpinnings of all the cognitive sciences.

Familiar Representations

        Representation is ubiquitous in human society. Natural human languages provide perhaps the richest and most powerful representational system known. Every time we speak we express meanings using physical vehicles – spoken or written words. Pictures provide another very common kind of representation. From photographs to portraits to sketches in the sand, we represent people and objects through visual likenesses.
        We are able to produce representations on the fly, for example when I say, in explaining a solar eclipse, "Let this soup bowl be the earth, and this large plate the sun and this glass be the moon …" The glass represents the moon only while I’m explaining the eclipse.
        There are many more common examples of representations: scale models, musical notation, icons on a computer screen, road signs and scientific formulae, to name just a few.

Internal Representation

        Gilbert Ryle had convinced many that minds could not properly be explained by the use of inner representations. The familiar external representations were the sorts of things that minds used and manipulated. If one were to imagine a representation inside a mind, who, or what, used that representation? Not another, inner, smaller mind (on pain of an infinite regress). So inner representations could be of no help in explaining the mind.
        The theoretical claim behind much of cognitive science, however, is that objects similar to these familiar external representations exist in our brains and minds, and that these inner representational states play a crucial part in explaining our behaviour and how we think.
        Modern serial, digital computers provided the fundamental metaphor for such systems of internal representation. The modern programmable computer, through its intellectual forebear the (purely theoretical) Turing Machine, is provably capable of carrying out any finitely specifiable task (given enough time and memory). Such a powerful, flexible apparatus operates on the basis of internal representations: specifically, binary representations of numbers, letters and truth values.
        Thus computers offer an ‘existence proof’ of representational minds: there is no doubt they exist; there is little doubt that internal representations are essential to their operation, and there is good reason to believe that they can behave ‘intelligently’ and perhaps have minds. In short, computers demonstrate that internal representations may have a critical role in explaining intelligent behaviour.


Vehicles vs Content








This image of the Jacaranda tree near my office is a representation of a tree. The content of the representation is the tree itself (the material object made of wood, sap and flowers). The vehicle which carries this content is the pattern of colour shapes on your screen.



The sentence "The earth is warming due to human activity." represents a certain state of affairs – the warming of the earth due to human activity. The vehicle carrying this content is the token sentence – the pattern of light and dark shapes on your screen, or of ink if this page is printed.
        Vehicles of representation are straightforward material objects: patterns of ink or chalk or light, arrangements of physical matter, and so on. How, exactly, does the sentence quoted above carry information about the world? How, exactly, does the pattern of colour carry information about the tree?
        Once we’ve distinguished the vehicle and the content of the representation, it is clear that understanding mental representation requires understanding representational vehicles, the things in the world referred to by those vehicles (contents), and the relation between the two. The latter two problems go under the name of ‘mental content’ or ‘meaning’ (see elsewhere in this Field Guide). Understanding a relation requires understanding its relata, and so the nature of the vehicles deserves close attention. (Many philosophers recognise that there is more to content than simply reference to physical objects and states of affairs, for example Frege’s Sinn (‘sense’), or the proposition expressed by a representation. But focussing on referential content will illustrate the problems well enough.)

Analog vs digital

        The evident power of digital computers led early cognitive scientists to the view that it was the digital nature of the computer’s internal representations that was significant in their representational and computational prowess.
        Computational digital representations were contrasted with analog representations. For example, letters of the alphabet are digital representations. There are, in English, exactly 26 different types of letter, and every token letter (on this page, for example) is definitely one of the 26. Digital representations are discrete entities: in a given scheme of digital representation there are finitely specifiable distinct types of representation, and for every representational token, it is clear which, if any, type it instantiates. Numerals provide another good example of digital representation. Quantities are represented by the arrangement of a discrete number of possible digits. In the decimal system, there are 10 such digits (‘0’ … ‘9’). In the binary system, there are two (‘0’ and ‘1’).
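        The point about numerals can be made concrete with a minimal sketch, which writes one and the same quantity in decimal and in binary digits. The function name `to_numeral` and the choice of 26 as the quantity are illustrative assumptions, not part of any scheme discussed in the text.

```python
DIGITS = "0123456789"

def to_numeral(quantity: int, base: int) -> str:
    """Write a non-negative integer as a string of digits in a base from 2 to 10."""
    if quantity < 0 or not 2 <= base <= 10:
        raise ValueError("expects quantity >= 0 and 2 <= base <= 10")
    if quantity == 0:
        return "0"
    digits = []
    while quantity > 0:
        # Peel off the least significant digit; each one is a definite type.
        quantity, remainder = divmod(quantity, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

print(to_numeral(26, 10))  # 26
print(to_numeral(26, 2))   # 11010
```

        The content (the quantity) is the same in both cases; only the scheme of discrete vehicles differs.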
        In contrast, analog representations do not admit of definite type-identity. A photograph is a good example of an analog representation: it represents via the arrangement of colour patches, and the colour values are drawn from a continuous spread of possible colours. Similarly the place of a given colour patch is measured on a continuous rather than a discrete scale.
        The power of digital computers to simulate analog systems (e.g. computer weather simulations) raises questions about the importance of the difference between analog and digital representation.

The analog/digital distinction refers to more than just representational vehicles, however. Digital computers use digital processes to manipulate digital representations. The difference in character between digital and analog systems might be best attributed to the form of processing, rather than the vehicles of representation.
        Digital representations, as the name suggests, evoke the image of numerical systems. A representational scheme in which a unique digit may be associated with each representational vehicle is a paradigm case of a digital system of representation.
        The digital/analog distinction looks to be a straightforward and useful way to classify representations. Analog watches represent the passage of time by the smooth flow of the hour, minute and second hands across the dial; digital watches represent the passage of time by changing the numerical readout in synchrony with the flow of time, in one second jumps.
        There are important differences between these two forms of representation. The flow of time has a clear visual analog on the watch: 2 hours is a sweep of 60° by the hour hand, and so on. The flow of time on a digital watch does not have the same visual analog. To read a digital watch you need to understand decimal numerals, whereas to understand the analog watch you need not have a grasp of numerals at all. Innumerate readers of an analog watch may still correlate sunrise and sunset with certain positions, and recognise the relationship between the movement of the watch and the flow of time.
        In short, reading a digital watch requires grasping an abstract representational scheme which makes use of arbitrary vehicles (in this case numerals).
        There are other important features illustrated by this example. The analog watch uses a representational medium which is smooth and continuous – there are no step-functions involved in determining the appropriate vehicle for a given time. Put another way, there are an indefinite number of states of the analog watch which require the real numbers for an accurate assessment. The digital watch has a finite number of possible states (all the possible times accurate to the second), and there is no way to represent times in between the possible represented times. For example, if the digital watch has a seconds digit, but no tenths-of-a-second digit, then it can represent 3:25:43 and 3:25:44, but it cannot represent 3:25:43.5.
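        A minimal sketch of this contrast, assuming a 12-hour dial and time measured in seconds since 12:00. Both function names are illustrative. The hour-hand angle varies continuously with time, while the readout passes through a step function that discards fractions of a second.

```python
def hour_hand_angle(seconds_since_twelve: float) -> float:
    """Analog vehicle: angle of the hour hand in degrees."""
    # The hour hand sweeps 360 degrees in 12 hours, i.e. 30 degrees per hour.
    return (seconds_since_twelve / 3600.0) * 30.0 % 360.0

def digital_readout(seconds_since_twelve: float) -> str:
    """Digital vehicle: an h:mm:ss readout with no tenths-of-a-second digit."""
    whole = int(seconds_since_twelve)  # truncation is the step function
    hours, rest = divmod(whole, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours % 12 or 12}:{minutes:02d}:{seconds:02d}"

print(hour_hand_angle(7200))     # 60.0 (two hours is a sweep of 60 degrees)
print(digital_readout(12343.0))  # 3:25:43
print(digital_readout(12343.5))  # 3:25:43 (the half second cannot be shown)
```

        The two calls to `digital_readout` return the same vehicle for different times, whereas `hour_hand_angle` returns a different angle for each.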
        The analog watch makes use of a representational vehicle which reliably covaries with what it represents – time. Thus as time progresses, the watch hands sweep around the dial a distance directly proportional to the elapsed time. Digital watches are quite different. The digits change with each second, but there is no other simple covariation between vehicle and content. More generally the structure of the analog watch – the internal relations between the hands and the dial – exactly matches the relations which hold between hours, minutes and seconds. There is no such similarity of structure between the numerals on a digital watch and the structure of time.
        The analog/digital distinction thus seems to mark an important representational difference. In an automated system, the task of reading an analog watch places very different demands than does the task of reading a digital watch.
        But there are any number of cases which cause difficulty for this distinction. What of watches where the second hand ticks, in discrete steps, from second to second? This case retains a certain amount of the structural covariation features of the paradigm analog watch, but introduces step-function discontinuities. Is it now digital or analog?

Computational Representation

        The fundamental structure and function of most modern computers is usually attributed to John von Neumann’s work in the 1940s and 1950s. The key idea is that of a stored program computer. That is, the program which directs the computer’s actions is stored explicitly in the computer’s memory. The conceptual underpinning of the stored program computer is the notion of an automated formal system.
        We find in the CPU of a modern computer an electronic circuit set up so that certain patterns of electrical activity which we interpret as instructions (e.g. the instruction to add two numbers together) are so structured that they cause the computer to do just that. The meaning of the instruction matches its causal effect on the CPU.
        Von Neumann architectures take their basic idea from Turing’s universal machine. You have one place to store tokens (in Turing’s example this was a linear tape divided into boxes), where the tokens are typed as above. The trick is to set up the basic workings of the machine so that it can read its instructions off the tape along with the data it is processing.
        Instead of tape, computers make use of electronic memory. In the memory is stored the program (a list of instructions about what to do when) and the data (e.g. two numbers to add, or the works of Shakespeare to search for a keyword, or whatever).
        There has to be a finite number of basic instructions that are ‘wired into’ the computer. That is, the computer is hard-wired to understand this finite set of instructions. Typically, these basic instructions include collecting binary numbers from particular memory addresses, adding two numbers together, and placing numbers in memory addresses. From the basic set of instructions can be built quite complex programming instructions.
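        The stored-program idea can be sketched with a toy machine whose single memory holds both the instructions and the data. This is a minimal illustration under stated assumptions: the four opcodes, the two-cell instruction format and the single accumulator are hypothetical and do not correspond to any real instruction set.

```python
# Hypothetical opcodes for a toy machine; real instruction sets differ.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3

def run(memory: list[int]) -> list[int]:
    """Execute the program held in `memory`, which also holds the data."""
    acc = 0  # accumulator register
    pc = 0   # program counter: address of the next instruction
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == LOAD:     # collect a number from a memory address
            acc = memory[operand]
        elif opcode == ADD:    # add the number at a memory address
            acc += memory[operand]
        elif opcode == STORE:  # place the result in a memory address
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")

# Addresses 0-7 hold the program; addresses 8-10 hold the data.
memory = [
    LOAD, 8,    # acc = memory[8]
    ADD, 9,     # acc = acc + memory[9]
    STORE, 10,  # memory[10] = acc
    HALT, 0,
    2, 3, 0,    # two numbers to add, and a slot for the result
]
print(run(memory)[10])  # 5
```

        Nothing in the memory itself marks a cell as instruction or data; what makes a cell an instruction is that the machine is wired to act on it when the program counter reaches it.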

        We are particularly interested in what kind of representation such computers make use of. Prima facie, there are two different kinds of representation to be found in the computer.
        First, there is the representation of the program. The instruction to add two numbers together is a primitive ‘machine language’ instruction. When represented inside the computer, this instruction will in fact be strings of 1’s and 0’s whose electrical properties have the effect of adding two binary numbers. This sort of machine language instruction illustrates the first sort of representation we encounter. And each one represents a particular program instruction.
        The second sort of representation is the representation of the data. In this case, the data are numbers, and they are represented by binary numerals in the familiar way. But computers can represent all sorts of other data: letters, symbols, pictures, graphs and so on. All of these sorts of data are coded in binary form, according to some coding scheme (e.g. ASCII: A=65, B=66 etc.).
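        As a small illustration of such a coding scheme, the sketch below uses Python's built-in `ord` and `chr` to move between letters, their ASCII code numbers and the corresponding 8-bit binary numerals. The variable names are illustrative.

```python
text = "AB"

# Each letter is assigned a number by the coding scheme (ASCII: A=65, B=66).
codes = [ord(ch) for ch in text]

# Each number is in turn written as a binary numeral, here padded to 8 bits.
bits = [format(code, "08b") for code in codes]

# Decoding reverses both steps.
decoded = "".join(chr(int(b, 2)) for b in bits)

print(codes)    # [65, 66]
print(bits)     # ['01000001', '01000010']
print(decoded)  # AB
```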
        This distinction is important in studying the computer’s apparent skill at ‘understanding’ representations. It is first and foremost the program instructions that the computer ‘understands,’ if it understands at all. The computer can follow the program instructions (e.g. to add two numbers) because it is hard-wired to do so. It is program instructions that drive the intuition that programs can properly be said to read and understand symbol systems. On the other hand it cannot be so readily argued that computers understand the data that they store.


        Just as the idealised neuron is a simple processing device with many inputs and one output, the basic connectionist unit is also a simple processing device with many inputs and one output. In the neuron, the inputs and output are typically measured in terms of the amplitude or frequency of electrical impulses (though this is a simplification). In crude terms, the value of a given neuron's output is a weighted sum of the values of all its inputs. It is these basic features of neurons which are also found in the connectionist 'unit'.
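        The weighted sum just described can be written in a few lines. This is a minimal sketch; the function name `unit_output` and the example values are illustrative. Many connectionist models also pass the sum through a squashing function, which is omitted here to match the crude description above.

```python
def unit_output(inputs: list[float], weights: list[float]) -> float:
    """Output of one unit: the weighted sum of its inputs."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights))

# Three inputs, one output.
print(unit_output([1.0, 0.5, 0.0], [0.5, 1.0, -2.0]))  # 1.0
```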
        The interesting characteristic of these units is that, when combined into large networks, the network as a whole can exhibit cognitively interesting behaviour.
        There are a number of different forms of representation identifiable in connectionist networks. Networks are capable of ‘local’ representation, typically in schemes where each unit has a single semantic interpretation.
        The most obvious form of representation in connectionist networks is that found in the activation of the network’s units.
        The figure illustrates a network discussed by Paul Churchland which is designed to distinguish between mines and rocks on the ocean floor on the basis of sonar echo ‘profiles’ (Churchland, 1989b).


In this network, the input activation vector is a representation of the echo profile, and the activation of the two output nodes represents the presence of a mine or a rock, respectively. That a particular input was caused by a mine, for example, is represented by a certain activation of the output units. The activation vector of the hidden units at first seems not to represent anything much at all, but close inspection shows that the network operates to divide the ‘activation space’ into two sections, each associated with an output node (see Figure 3). That is, the multi-dimensional ‘space’ of hidden activations (where each hidden unit determines one dimension) is partitioned such that mine echoes cause activation in one half of the space, and rock echoes cause activation in the other half. The output units are then sensitive to whether the activation of the hidden units is in one partition or the other.


The information about mines and rocks that this network can be said to represent is held in its connectivity. A network’s connectivity consists in what unit is connected to what other units and with what strength (or ‘weight’). This information allows different networks to process information differently.
        In the mine/rock network, the weight connectivity represents that echo profile A has come from a rock, echo profile B has come from a mine, and so on. Thus the connectivity of the network as a whole (and no particular connection or proper subset of connections) represents all these things. The individual items of information that can be extracted discretely – for example, that echo profile A comes from a rock – cannot be distinguished by any discrete decomposition of the connectivity of the network. Only the whole connectivity – all the information about which unit is connected to which other unit with what weight – represents each ‘item’ of information.
        This can be contrasted with computational and linguistic representational schemes where discrete units of representation correspond to discrete items of information. Thus in a computational representation of the play Hamlet there is a discrete component of the computer’s memory that represents Hamlet’s anguished question ‘To be, or not to be?’ Most of the computer’s memory could be damaged and many other lines from the play lost, but as long as certain pieces were left behind, this line would still be represented. But if damage were done to the connectivity of the mine/rock network, damage would be done to all the information held about echo profiles – nothing would be left unscathed.
        A last point to emphasise about connectionist networks is that they suggest a radically different notion of computation than the familiar serial, von Neumann architecture computers. A key difference is that all the stored knowledge is made use of in both learning and processing, since the knowledge is all stored superpositionally in the weights, and the weights are involved both in the processing of inputs to produce outputs, and in the readjusting of weights in learning.


        The mental imagery debate was largely sparked by experimental work in psychology where results suggested that subjects were rotating and scanning internal pictures at measurable rates.
        The basic idea is a simple one. When we imagine a scene or an object, it seems to us that we are observing a picture of that scene or object. The experimental work shows that more than just seeming to view internal pictures, we can operate on them just as if they were pictures - rotate them at fixed speeds, scan them for information, zoom in and out, and so on.
        The controversy arises in interpreting the significance of these results for understanding cognitive architecture. In particular, do these experiments show that, rather than there being symbols or sentences in the head, thinking is carried out (at least sometimes) in a pictorial or non-symbolic medium?
        The two positions in this debate may be dubbed ‘pictorialism’ and ‘descriptionalism’. According to pictorialism, mental images represent in the manner of pictures, while according to descriptionalism, mental images represent in the manner of sentences.
        A key point in this debate is that you need to distinguish between the pictorial nature of the experience of thinking of the image, and the pictorial nature of the representation which gives rise to the experience. It seems quite possible that a symbolic representation (e.g. a list of sentences) could give rise to an imagistic experience (e.g. a matrix).
        A second key point is that all systems of representation require decoding. Even pictures can be culturally specific (e.g. photographs, mirrors), illustrating that you have to learn to see them.
        These considerations suggest that the psychological evidence is not conclusive on the architectural question. Nonetheless, the discussion raises the interesting question: how would we determine if a given representation (or scheme of representation) is symbolic or pictorial?
        Are symbolic schemes completely intertranslatable with imagistic schemes?

        Block argues that to the extent that the mind does make use of imagery, it is unavailable to cognitive science, and is rather the domain of neuroscience proper. This is because the functional architecture required for operations over pictures would need to be more sophisticated than the architecture which underpins symbolic systems.
        Sterelny argues that the possibility that the imagistic evidence could be explained by a symbolic representational base renders the hypothesis that internal representation is pictorial non-explanatory.

Representational Schemes and Genera

        Representation tokens are usually thought of as members of a general scheme of representation. Thus letters are members of an alphabet; words are members of a lexicon, and decimal numerals are members of the set of 10 digits ‘0’ … ‘9’. There are exceptions. When I use a glass to represent the earth, a plate to represent the sun, and a fork to represent the moon in a description of their relative positions during a solar eclipse, then the representational vehicles (the glass, plate and fork) are not members of any previously defined scheme of representation. They constitute an ad hoc scheme invented for this particular use, and they have no relationship to one another as representational vehicles. [incomplete distinction]

        It is generally assumed in the cognitive sciences that there are three genera of representation, and they are defined in terms of the relation which holds between representational vehicle and content. (John Haugeland says that this is the canonical view in the sense that ‘almost everybody expects almost everybody else’ to believe it.) Thus ‘logical’ representational schemes such as languages and formal symbol systems have compositional semantics – there are clear rules which define the meaning of a complex representation in terms of the meaning of the atomic constituents of that complex representation. ‘Iconic’ schemes such as scale models and pictures are isomorphic to their contents – the representational vehicle shares structure with the content. ‘Distributed’ schemes such as holograms and connectionist weight vectors superpose many contents in one vehicle.


Logics, natural languages, sign systems, computer programming languages.

        It is clear that one distinctive feature of many logical schemes is a fairly complex sort of compositional semantics, like that found in natural languages, rather than the simple compositionality of concatenation found in pictures. For example, a scheme which can distinguish between conjunction and disjunction shows significant complexity in its compositional semantics. The ability to represent negation and complex, abstract combinations of atomic contents is a source of the significant utility of some logical representational schemes. Similarly predication is another compositional form which adds depth and complexity to a representational system.
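        A minimal sketch of compositional semantics for negation, conjunction and disjunction follows. The nested-tuple encoding of formulas, the function name `evaluate` and the atoms ‘raining’ and ‘cold’ are illustrative assumptions. The point is only that the content of a complex vehicle is fixed by general rules applied to the contents of its parts.

```python
# A formula is either an atom (a string) or a tuple:
# ("not", p), ("and", p, q) or ("or", p, q).
def evaluate(formula, valuation: dict[str, bool]) -> bool:
    """Truth value of a formula, computed from the truth values of its atoms."""
    if isinstance(formula, str):
        return valuation[formula]  # atomic content is simply looked up
    operator, *parts = formula
    if operator == "not":
        return not evaluate(parts[0], valuation)
    if operator == "and":
        return evaluate(parts[0], valuation) and evaluate(parts[1], valuation)
    if operator == "or":
        return evaluate(parts[0], valuation) or evaluate(parts[1], valuation)
    raise ValueError(f"unknown operator {operator!r}")

valuation = {"raining": True, "cold": False}
print(evaluate(("and", "raining", "cold"), valuation))           # False
print(evaluate(("or", "raining", "cold"), valuation))            # True
print(evaluate(("and", "raining", ("not", "cold")), valuation))  # True
```

        The same two atoms yield different contents under conjunction and disjunction, which mere concatenation could not distinguish.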
        Necessary for this sophisticated sort of compositionality, it would seem, is the ability to group vehicles according to well-defined types. This is because sophisticated compositionality depends on general rules of composition, and such general rules need a well-defined domain to operate over. And type-identification itself seems to require that the vehicles be discrete (as contrasted with the continuous nature of some iconic vehicles, for example scale models).
        Considering the very loose constraint on the range of possible vehicles for logical schemes, and that they can represent pretty much anything, the only thing that can be said, in general, about the relation between vehicles and contents is that it is arbitrary.<2> Of course, in a logical scheme with compositional semantics, the relation between complex vehicles and their contents won’t be arbitrary because the content of the vehicle will relate in fixed ways to the contents of its component parts.
        So a refinement to the canonical account is the idea that the representing relation of logical schemes of representation be arbitrary for atomic vehicles. Sophisticated compositionality might however be a sufficient condition on a scheme’s being logical.


        Iconic representations represent relations among different properties. So, if the velocity of a car is represented by the height of a rectangle, and the time spent travelling at that velocity by the length of that same rectangle, then the distance travelled by the car in that time will be represented by the area of the rectangle. If the representation of velocity changes, the area automatically changes; it does not need to be recalculated because the relational structures have been preserved in the representation.


Canonically, isomorphism is the defining relation of iconic schemes. The key notion is that iconic representation obtains when there is a reproduction of structure. Thus a bust of Immanuel Kant represents Kant’s head and neck because they share structural features. Mathematically, structure is understood as a set of elements and a set of relations over those elements. So in Figure 4, there is a 1-1 mapping from possible car velocities to possible heights of the rectangle, from possible durations to possible widths, and from possible distances to possible areas.<3> These mappings are trivial because each domain is the real numbers. Critically, there is also a mapping from the relation between time, velocity and distance in the car to the relation between the height, width and area of the rectangle. So a structure in the rectangle is the same as one in the body moving with uniform velocity. Thus the representational relation – the relation between the representational vehicle and the content – is that of identity of structure. The structure of the content is reproduced in the representation.
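        The rectangle example can be sketched in code, assuming uniform velocity and taking kilometres per hour and hours as purely illustrative units. The class name and the numbers are assumptions. The area is never stored or recalculated by hand: it follows from the height and width, just as distance follows from velocity and time.

```python
from dataclasses import dataclass

@dataclass
class Rectangle:
    height: float  # stands for velocity (assumed unit: km/h)
    width: float   # stands for duration (assumed unit: hours)

    @property
    def area(self) -> float:
        # Stands for distance (km): the relation height * width = area
        # mirrors the relation velocity * time = distance.
        return self.height * self.width

trip = Rectangle(height=60.0, width=2.0)
print(trip.area)  # 120.0

trip.height = 90.0  # change the represented velocity
print(trip.area)    # 180.0
```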

Generalise to scale models, covariation of numerical values, isomorphism. Pictures, maps, scale models, graphs, charts and diagrams.

Superpositional Representation

        Connectionist networks have brought superpositional representation to prominence in the cognitive sciences. A representation of contents c1 and c2 is superpositional if the resources used to represent c1 are identical with the resources used to represent c2. That is, one and the same token representation has the role of representing both contents. This is actually a more familiar kind of encoding than is often recognised.
        Consider the example of sound. A piano is played, causing a pattern of air pressure at the point at which a microphone picks it up (which then transduces the air pressure pattern into a pattern of voltage). A singer sings, similarly causing a characteristic pattern of air pressure. If the piano and singer make a noise at the same time, then their characteristic patterns of air pressure are superposed (everywhere, but in particular) at the point where the microphone picks up the signal and transduces it. Thus the electrical encoding of the sounds of the piano and the voice is a superpositional representation. This may then be recoded on a CD and reproduced through an amplifier and speakers.
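        A minimal sketch of this superposition, under heavy idealisation: the piano and the voice are each stood in for by a single pure tone, pressure patterns are assumed to add linearly, and the sample rate and frequencies are arbitrary choices. All names are illustrative.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for illustration

def tone(frequency_hz: float, n_samples: int) -> list[float]:
    """Samples of a pure sine tone, standing in for one sound source."""
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

piano = tone(440.0, 8)
voice = tone(220.0, 8)

# At the microphone the two patterns simply add, sample by sample.
mixed = [p + v for p, v in zip(piano, voice)]

# Every sample of `mixed` carries both sources at once: no sample belongs
# to the piano alone or to the voice alone.
print(mixed)
```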

        Holograms are another well-understood form of superpositional representation: the holographic plate records a large number of incident light arrays (from different angles), and (under the right conditions) is able to reproduce all these light arrays, each radiating in a distinct direction. Hence, as the viewer moves around the hologram, she sees different scenes.
        Many connectionist networks exhibit superpositional encoding. The complex connectivity of the network – the vector composed of the weights of each connection between units (‘synapse’) in the network – constitutes a superpositional memory. Networks learn to process information in various ways, and the ‘knowledge’ they thereby learn is encoded in the network’s weights. Generally speaking, no particular weight or group of weights is responsible for the representing of any particular content.
        An important consequence of this form of data storage is graceful degradation. Damage to part of a network, or part of a hologram, leads to a partial degradation of the whole image. This is in contrast to both pictorial and logical forms of representation, where the loss of part of the representation (damage to a hard disk, or the tearing of a painting) has no effect on the representing of the other parts of the representation.
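        Superpositional storage in weights, and the way damage spreads across everything stored, can be sketched with a simple linear associator. This is a swapped-in textbook technique (Hebbian outer-product storage), not the architecture of any network discussed in the text, and the keys and contents are made-up values chosen so that the arithmetic is exact.

```python
Vector = list[float]
Matrix = list[list[float]]

def outer(content: Vector, key: Vector) -> Matrix:
    """Weights that map `key` to `content` (a Hebbian outer product)."""
    return [[c * k for k in key] for c in content]

def superpose(a: Matrix, b: Matrix) -> Matrix:
    """Add two weight matrices element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def recall(weights: Matrix, key: Vector) -> Vector:
    """Present a key to the weights and read off the associated content."""
    return [sum(w * k for w, k in zip(row, key)) for row in weights]

# Two orthonormal keys and the contents associated with them.
key_a, content_a = [0.5, 0.5, 0.5, 0.5], [1.0, 2.0]
key_b, content_b = [0.5, -0.5, 0.5, -0.5], [3.0, -1.0]

# One weight matrix stores both associations; every weight depends on both.
weights = superpose(outer(content_a, key_a), outer(content_b, key_b))
print(weights)                 # [[2.0, -1.0, 2.0, -1.0], [0.5, 1.5, 0.5, 1.5]]
print(recall(weights, key_a))  # [1.0, 2.0]
print(recall(weights, key_b))  # [3.0, -1.0]

# Damage a single connection: the recall of BOTH items shifts.
damaged = [row[:] for row in weights]
damaged[1][1] = 0.0
print(recall(damaged, key_a))  # [1.0, 1.25]
print(recall(damaged, key_b))  # [3.0, -0.25]
```

        No single weight holds content A or content B, so zeroing one weight perturbs the recall of both rather than deleting either outright.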
        Another important consequence of this form of representation as it is employed in connectionist networks is that it allows generalising inferences to be drawn by the act of representing. In a famous, but perhaps over-simple example, Ramsey, Stich and Garon trained a network to store (superpositionally) a number of propositions, for example ‘dogs have fur’. The network generalised from the ‘knowledge’ that dogs have fur, paws, fleas and legs, and that cats have fur, paws and fleas, to the ‘knowledge’ that cats have legs.
        Thus the connectionist use of superpositional representation has initiated a re-evaluation of representation in general by posing questions such as ‘Are representations distinct from processes of inference?’ and ‘Might different kinds of representation permit different kinds of cognition?’

Explicit and inexplicit

        A good place to start with these worries is Dennett’s ‘Styles of Mental Representation’. Dennett distinguishes between explicitly represented information, which is held in the ordinary kind of explicit representations with which we are familiar – sentences, program instructions, well formed formulae of a formal system and so on – and inexplicitly (Dennett says ‘tacitly’) represented information, which is simply embodied by a system or organism. One kind of inexplicit knowledge is the know-how of the digital computer. It inexplicitly knows how to follow certain rules; viz. machine instructions. Dennett understands one sort of inexplicitly represented information as the information that a system has which allows it to use and manipulate explicit representations, but which is not itself encoded explicitly. For example, the ability to add two binary digits of given length is typically wired into a CPU, and so we could say that it inexplicitly represents how to add.

The more recent literature on this topic (Kirsh 1989, Clark 1993) argues convincingly that there is no clear, absolute distinction between the explicit and inexplicit representation of information. Encryption of English sentences, for example, can yield representations which have all the same features as English sentences, but don’t appear to represent their contents explicitly. With skill, however, some readers might be able to decrypt on sight, without laborious processing. In such a case I think we have to say that the information which was inexplicit to the (untrained) decoder has become explicit to her as she becomes skilled at decryption. In contrast information about the constitution of the USA is explicitly represented somewhere in my University’s library, but right now it is opaque to me. I have to go through quite a lot of processing to get at that information. As far as I’m concerned right now, the information is not explicit to me, and hence inexplicit.
        These sorts of consideration lead Kirsh to argue that explicitness is relative to the system using the information, and isn’t captured by superficial features such as concatenation, spatial isolatibility and so on. Clark extends this analysis and argues that information is properly explicit only if, along with being easily deployable, it can be deployed in many different ways. If information is present in the system and easily available to a single process, it might still not be fully explicit. The content of ‘the grass is green’ is available for all sorts of different uses to us (answering questions about grass, about greenness, about living things, about colour, etc.); however the information about addition embedded in the CPU of a computer, while available for the purpose of adding two binary digits of a fixed length, isn’t available for theorising about the properties of addition.
        So determining whether certain information is represented explicitly or inexplicitly depends on the availability of that information to various processes in the system, and so explicitness is relative to different possible contents, and to the systems which make use of those contents.


<1> This is not all there is to its content, however; the content also includes an aspect of the visual presentation of the tree itself - how it looks from a certain perspective.

<2> Colin McGinn [2] p. 178 makes a similar claim.

<3> I say 'roughly' because there are a number of idealisations going on here. The rectangle may not be of indefinite size because we don't have the resources to draw very big rectangles. Note also that we are assuming that the car in question has uniform velocity throughout the associated time period.
