Attractor Spaces as Modules:
A
Semi-Eliminative Reduction of Symbolic AI to Dynamic Systems Theory
Teed Rockwell
2419A Tenth St
Berkeley, CA 94710
510/ 548-8779 Fax
548-3326
74164.3703@compuserve.com
Attractor Spaces as Modules:
A
Semi-Eliminative Reduction of Symbolic AI to Dynamic Systems Theory
Abstract:
I propose a semi-eliminative reduction of Fodor’s concept of module to the concept of attractor basin, which is used in Cognitive Dynamic Systems Theory (DST). I show how attractor basins perform the same explanatory function as modules in several DST-based research programs. Attractor basins in some organic dynamic systems have even been able to perform cognitive functions which are equivalent to the If/Then/Else conditional in the computer language LISP. I
suggest directions for future research programs which could find similar
equivalencies between organic dynamic systems and other cognitive functions.
Research that went in these directions could help us discover how (and/or if)
it is possible to use Dynamic Systems Theory to more accurately model the
cognitive functions that are now being modeled by subroutines in Symbolic AI
computer models. If such a reduction of subroutines to basins of attraction is
possible, it could free AI from the limitations that prompted Fodor to say that
it was impossible to model certain higher level cognitive functions.
What is this Thing Called
Modularity?
To
some degree, Fodor's claim that Cognitive science divides the mind into modules
tells us more about the minds doing the studying than the mind being studied.
The knowledge game is played by analyzing the object of study into parts, and
then figuring out how those parts are related to each other. This is the method
regardless of whether the object being studied is a mind or a solar system. If
a module is just another name for a part, then to say that the mind consists of
modules is simply to say that it is comprehensible. Fodor comes close to
acknowledging this in the following passage.
The condition for successful
science (in physics, by the way, as well as psychology) is that nature have
joints to carve it at: relatively simple subsystems which can be artificially
isolated and which behave, in isolation, in something like the way that they
behave in situ.
(Fodor 1983 p.128)
If
this were really as unconditionally true as Fodor implies in this sentence,
Fodor's Modularity of Mind would have been a collection of tautologies.
In fact, Fodor goes to great lengths to show that his central claims are not
tautologies, but rather reasonable generalizations from what has been
discovered in the laboratory at this point. Fodor gets more specific in
passages like this one:
One
can conceptualize a module as a special purpose computer with a proprietary
data base, under the conditions that a) the operations that it performs have
access only to the information in its database (together of course with
specifications of currently impinging proximal stimulations) and b) at least
some information that is available to at least some other cognitive processes
is not available to the module. (Fodor 1985 p. 3)
Fodor sums up these two conditions with the phrase informationally encapsulated, and adds that modules are by definition also domain specific, i.e. each module deals only with a certain domain of cognitive problems. He also claims that what we call higher cognitive functions cannot be either informationally encapsulated or domain specific. Instead, these higher processes must be "isotropic" (i.e. every fact we know is in principle relevant to their success) and "Quinean" (they must rely on emergent characteristics of our entire system of knowledge). Roughly speaking,
“isotropic” is the opposite of “informationally
encapsulated” and “Quinean” is the opposite of “domain
specific”. Because modular processes are by definition neither Quinean
nor isotropic, there is "a principled distinction between
cognition and perception" (ibid.).
This is more than a little ironic because
Artificial Intelligence usually studies cognition, not perception. When it does
study perception, it does so by describing it as a form of cognition. But Fodor
claims that “Cognitive Science has gotten the architecture of the mind
exactly backwards” (Fodor 1985 p.497) when it sees perception as a form
of cognition. Thinking beings are by definition capable of responding flexibly
and skillfully to a variety of different situations. Perception, according to Fodor, is by its very nature
reflexive and rigid. It consists of unthinking responses to the immediate environment,
over which our conscious rational minds have essentially no control. These
kinds of processes are much easier to model than “what is most
characteristic and most puzzling about the higher cognitive mind: Its
non-encapsulation, its creativity, its holism, and its passion for the
analogical.”(ibid.) Fodor consequently claims that, given the conceptual
tools of cognitive science, it is not possible to have a science of the higher
"Quinean-Isotropic" cognitive functions, such as thought or belief.
This analysis of the current state of Cognitive
Psychology is also backed up with considerable scientific detail by Uttal 2001.
Uttal specifically says that he believes Fodor’s distinction between
perceptual faculties, which are modular, and higher order faculties, which are
not, is essentially correct. (p.115)
Nevertheless, Uttal does raise some points which could be used to
justify misgivings about any kind
of modular psychology. He points out that the reason that it is easier to
correlate brain function with perception is mainly a function of the nature of
perception, not of the brain itself.
“. . .the dimension of each sensory modality is well defined.
For example, vision is made up of channels or modules sensitive to color,
brightness, edges and so on. . . .Because a thought may be triggered by a
spoken word, by a visual image or even by a very abstract symbol, we can
establish neither its links to the physical representation nor its anatomical
locus.” (p.114)
In
other words, when you can’t precisely define the nature of your stimulus,
it will be difficult to replicate a consistent stimulus-response connection.
The reason that stimulus-response connections can be established between brain
states and perceptions is that everybody knows what a ray of light is, and
there are precise quantitative ways of measuring its characteristics. It is
therefore not surprising that we can produce precise variations in neural
behavior by precisely varying the ray of light. But the existence of these
replicable S-R connections need not imply that these variations are being
produced entirely by an autonomous module. As Uttal points out, the fact that
different parts of the brain influence different mental or behavioral processes
does not require us to accept
“the hypothesized role of these regions as the unique locations of
the mechanisms underlying these processes”. (p. 11) Just because a certain kind of neural activity is
necessary for perception does not mean that it is sufficient. There is also
evidence which gives reason to question the modular hypothesis. Uttal cites research which
“argued strongly for the idea that even such an apparently simple
learning paradigm as classical conditioning is capable of activating widely
distributed regions of the brain.” (p.13). If ‘simple’
stimulus-response connections are not modular, is there any reason to think
anything else is?
The case for perceptual modularity looks even
weaker when we shift from Cognitive Psychology to Artificial Intelligence. If
the brain regions being studied by
Cognitive Psychology were really informationally encapsulated and domain
specific perceptual modules, it ought to be possible to build machines that
duplicated their functions using the modular architecture of computer science.
Unfortunately, although classical Artificial Intelligence has had some success
in duplicating what are often thought of as higher brain functions, its biggest
failures have been its attempts to understand perception, as Hubert Dreyfus has
documented in great detail (Dreyfus 1972/1994). If Fodor and Dreyfus are both
right, this would mean that Cognitive Science is suffering from a serious lack
of consensus in two of its branches. Neuropsychology cannot localize the higher
functions which can be mechanically duplicated by the modular architecture of
Artificial Intelligence. And Artificial Intelligence cannot use modular
architecture to duplicate the perceptual functions which Neuropsychology claims
are localized modules. It seems that even Fodor’s final exclamation of
“modified rapture” was too optimistic.
This paper, however, will be an attempt to offer a
hopeful alternative to this gloomy picture. Fodor admits that this limitation may only be true "of the sorts [of computational models] that cognitive sciences are accustomed to employ" (Fodor 1983 p.128). An examination of the presuppositions of those computational models may reveal this to be a limitation of only one particular kind of cognitive science. Distributed connectionist systems have so far
had the most success with replicating perception in ways that are commonly
thought of as being “non-modular” in some sense. I will argue,
however, that the cognitive abilities of these and other dynamic systems may be
modular in another sense, which need not share the limitations that Fodor
thinks are essential to modular architecture.
Fodor and the Symbolic Systems
Hypothesis
I
believe that the limitations described by Fodor do hold for the paradigm that
gave birth to cognitive science, which is often called the symbolic systems
hypothesis. Its most
fundamental claim is that a mind is a computational device that manipulates
symbols according to logical and syntactical rules. All computers and computer languages operate by means of
symbolic systems, so another way of phrasing this claim is to say that a mind
is a kind of computer. The symbolic systems hypothesis is still basically alive
and well, but now that it is no longer universally accepted it is often given
disparaging nicknames, like Haugeland's "GOFAI" (for good old
fashioned artificial intelligence) or Dennett's "high church
computationalism". Fodor
remains the most articulate preacher of the gospel of high church
computationalism, and when his concept of modularity goes beyond the
tautologous claim that minds are analyzable, it almost always brings in strong
commitments to the claim that a mind is some kind of computer. Fodor’s
modules are really what computer programmers call subroutines, which is why he
defines modules in the quote above as "special purpose computers with
proprietary data bases." GOFAI scientists model cognitive processes by
breaking them down into subroutines by means of block diagrams, then breaking
those subroutines down into simpler subroutines and so on until the entire
cognitive process has been broken down into subroutines that are simple enough
to be written in computer code (see Minsky 1985 for this process in action). Most of what Fodor said
about modules in Fodor 1983 follows from the fact that subroutines are domain specific and
informationally encapsulated ( i.e. they are each designed for specific tasks,
and only communicate with each other through relatively narrow input-output
connections). At the time, Fodor believed (and apparently still believes) that
GOFAI was and is the only game in town for AI. When Fodor says that it is
impossible to model Quinean and isotropic properties, what he really means is
that it is impossible to model them with the conceptual tools of GOFAI, and in
this narrow sense of "impossible" he is probably right.
Dynamic Systems Theory
as an Alternative Paradigm
This
paper will deal with whether there are similar constraints on the new sorts of
models currently available to cognitive science, which were not available when Modularity
of Mind was written. Recent developments in non-linear
dynamics have made it possible to use physics equations to describe systems
which have the kind of flexibility that seems to justify calling them cognitive
systems. This has resulted in a branch of cognitive science called Dynamic
Systems Theory (DST). There is now much controversy over whether it is possible
for the principles of DST to replace or supplement the computer inspired view
of cognition that is often called the symbolic systems hypothesis, or Good Old
Fashioned Artificial Intelligence (GOFAI).
Because
Fodor's modularity theory reveals both the strengths and the weaknesses of the
symbolic systems hypothesis, it provides excellent criteria for the evaluation
of the relative merits of DST and GOFAI. Fodor claims, I believe correctly,
that GOFAI explains cognitive behavior by dividing a system up into
interacting modules. In order to be on equal footing with the symbolic systems
hypothesis, DST must enable us to account for the functions and properties that
Fodor calls modular. And if DST is also able to account for Quinean and
isotropic mental process (or show why the distinction between modular and
Quinean-isotropic processes is spurious), it would be clearly superior to the
symbolic systems hypothesis, for which these processes are, by Fodor's own
admission, a complete mystery.
This
paper will describe a concept used in DST which I think has many significant
isomorphisms with the concept of module. These isomorphisms may enable us to
reduce the concept of module to this concept from DST when we are talking about
organic systems. Computers, of course, have real modules, because we build them
that way. But we may decide that artificial intelligence differs from
organic intelligence because the former approximates this feature of organic
systems with a brittle modular metaphor that is significantly different from
the real thing. A reductive account of the properties that Fodor calls modular
would not enable us to accept everything he (or anyone else) says about
modules. Whenever a new theory replaces an old one, it does so by contradicting
some parts of the old theory and accepting others. If it accepts a substantial
part of the old theory, the new theory is called a reduction. If it
rejects most of the old theory, we say that the new theory eliminates
the old theory. The best
contemporary theories of reduction claim that there is a continuum
connecting these two extremes of elimination and reduction (see Bickle 1998).
We decide where on the continuum a particular theory replacement belongs by
comparing the old and the new theory, and seeing how much and what sort of
isomorphisms exist between the two. I will not try to definitely answer whether
the example I am discussing is an elimination or a reduction, partly because
this is a question that can only be answered by future research, and partly
because I believe that attempting to make this distinction completely sharp is
more misleading than useful. Hopefully, however, my analysis will give some
sense of where on the reduction/elimination continuum we might place the
relationship between DST and the modular structures of the symbolic system
hypotheses.
A Brief Introduction to DST
In a multidisciplinary paper, it is frequently
necessary to include a brief summary of a science which takes a lifetime to
fully understand. Such summaries will sometimes belabor what is obvious, at other times oversimplify ideas that have important complications, and at still other times include parts which many educated and intelligent people will find difficult to understand. The following summary will probably have all of those faults, but it
will, at least, be focused towards those factors which are relevant to the
philosophical concerns of this paper. Its goal will be the understanding of the
essential nature of what the mathematical equations are measuring, rather than the equations themselves.
A
dynamic system is created when conflicting forces of various kinds interact,
then resolve into some kind of partly stable, partly unstable, equilibrium. The
relationships between these forces and substances create a range of possible
states that the system can be in. This set of possibilities is called the state
space of the system. The dimensions of the state space are
the variables of the system. Every newspaper contains graphs which plot the
relationship between two variables, such as inflation and unemployment, or
wages and price increases, or crop yield and rainfall etc. A graph of this sort is a
representation of a set of points in a two-dimensional space. It is also
possible to make a graph which adds a third variable and thus represents a
three dimensional space, using the tricks of perspective drawing. Because our
visual field has only three dimensions, that is the highest number of variables
that we can visualize in a computational space. But the mathematics is the same
regardless of how many variables the space contains. The state space of the
sort of dynamic system studied by cognitive scientists will have many more
dimensions than this, each of which measures variations in a different
biologically and/or cognitively relevant variable: Air pressure, temperature,
concentration of a certain chemical, even (surprise!) a position in physical space.
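(For readers who want a formal gloss, here is the standard textbook notation; this is my summary rather than anything specific to the research discussed below. If we single out $n$ relevant variables, the state of the system at time $t$ is the single point $\mathbf{x}(t) = (x_1(t), x_2(t), \dots, x_n(t))$ in a state space $X \subseteq \mathbb{R}^n$. A newspaper graph of inflation against unemployment is the case $n = 2$; an array of ten neurons, or an organism with dozens of biologically relevant parameters, simply raises $n$.)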
However, although these variables define the range
of possibilities for the system, only a few of these possibilities actually
occur. To study a dynamic system is to look for mathematically describable
patterns in the way the values of the variables change and fluctuate within the
borders of its state space. The patterns that a system tends to settle into are
called attractors, basins of attraction, or invariant sets.
I believe that these invariant sets have the potential to provide a reductive
explanation for what Fodor calls modules: i.e. that science may eventually
decide that modules in dynamic systems really are basins of attraction,
just as light really is electromagnetic radiation.
In Port and Van Gelder 1995, an invariant set is
defined as "a subset of the state space that contains the whole orbit of
each of its points. Often one restricts attention to a given invariant set,
such as an attractor, and considers that to be a dynamical system in its own
right." (p.574) In other words, an invariant set is not just any set of
points within the state space of the system. When several interrelated
variables fluctuate in a predictable and law-like way, the point that describes
the relationship between those variables travels through state space in a path
which is called an orbit. The
set of points which contains that orbit is called an invariant set because the
variations in that part of the system repeat themselves within a permanent set
of boundaries. The second sentence in the above quote from Port and Van Gelder
is encouraging for our project. If an invariant set can be considered as a
dynamic system in its own right, this seems isomorphic with Fodor's claim that
modules are domain specific and informationally encapsulated.
Port
and Van Gelder define "attractor" as "the regions of the state
space of a dynamical system toward which trajectories tend as time passes. As
long as the parameters are unchanged, if the system passes close enough to the
attractor, then it will never leave that region." (p.573). The conditional
clause of the second sentence holds the key to the cognitive abilities of
dynamic systems. For of course the parameters of every dynamic system do
change, and these changes cause the system to tend towards another attractor,
and thus initiate a different complex pattern of behavior in response to that
change.
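The same ideas can be stated compactly in standard dynamical notation (again my gloss, not Port and Van Gelder's wording): write $\phi_t(\mathbf{x})$ for the state the system reaches at time $t$ when started from the state $\mathbf{x}$. Then the orbit of $\mathbf{x}$ is the set $\{\phi_t(\mathbf{x}) : t \geq 0\}$; a subset $S$ of the state space is invariant when it contains the whole orbit of each of its points, i.e. $\phi_t(S) \subseteq S$ for all $t \geq 0$; and an attractor is an invariant set $A$ surrounded by a basin of attraction $B \supseteq A$ such that every trajectory entering $B$ converges toward $A$, so long as the parameters of the system remain fixed.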
The
simplest example of an attractor is an attractor point, such as the lowest
point in the middle of a pendulum swing. The flow of this simple dynamic system
is continually drawn to this central attractor point, and after a time period
determined by a variety of factors (the force of the push, the length of the
string, the friction of the air etc.) eventually settles there. A slightly more
complex system would settle into not just an attractor point but an
attractor basin, i.e. a set
of points that describes a region of that space. The reason that these
attractors are called basins of attraction is because the system
"settles" into one of these patterns as its parameters shift, not
unlike the way a rolling ball will settle into a basin on a shifting irregular
surface. A soap bubble[1] is the result of a single fairly stable attractor
basin, caused by the interaction of the surface tension of the soap molecules
with the pressure of the air on its inside and outside. Because a spherical
shape has the smallest surface area for a given volume, uniform pressure on all
sides makes the bubble spherical. But when the air pressure around the soap
bubble changes, e.g. when the wind blows, the shape of the bubble also changes.
The bubble then becomes a simple easily visible dynamic system of a sort,
marking out a region in space that changes as the tensions that define its
boundaries change. To see how these same principles can eventually reach a
level of complexity that makes them a plausible embodiment of thought and
consciousness, imagine the following developments.
1)
The soap bubble could get caught up in an air current that flows regularly so
that, even though the soap bubble is not staying the same shape, it changes
shape in a repeating pattern. As I mentioned earlier, this pattern is often
called an orbit, because the
trajectory that describes this repeating change forms something like a loop
traveling through the state space of the system. Systems that settle into
orbits are usually more complicated than those which settle only into attractor
basins which are temporally static, particularly when those orbits follow
patterns that are more complicated than mere loops.
2) Instead of having the soap bubble fluctuate in
three dimensional space, imagine that it is fluctuating in a multi-dimensional
computational state space. As I
mentioned earlier, state space is not limited to the three dimensions of
physical space, for it can have a separate dimension for every changeable
parameter in the system. The most popular example in cognitive science of a
system that operates within a multi-dimensional state space is a connectionist
neural network. Connectionist nets consist of arrays of neurons, and each
neuron in a given array has a different input or output voltage. Each of those
voltages is seen as a point along a dimension of a Cartesian coordinate system,
so that an array of ten neurons, for example, would describe a ten-dimensional
space. But in other kinds of dynamic systems analyses, any variable parameter
can be a dimension in a Cartesian computational space. Our friend the soap
bubble can be interpreted as a visual representation of the air pressure coming
from each of the three dimensions in physical space, if all other background conditions
remain stable. And when the various interacting forces and variables in a
dynamic system are designated as dimensions in a multi-dimensional space, it
becomes possible to predict and describe the relationships between different
attractor basins in that system. This is the most relevant disanalogy between a
soap bubble and the more complicated dynamic systems studied by cognitive
scientists. Because:
3)
A soap bubble has really only one stable attractor basin. Although the
attractor space that produces a soap bubble is fairly flexible, the bubble pops
and dissolves if too much pressure is put on it from any one side. But in
certain systems, there are fluctuations of the variables which can cause the
system to settle into a completely different attractor space. These systems
thus consist of several different basins of attraction, which are connected to
each other by means of what are called bifurcations.
This makes it possible for the system to change from one attractor basin
to another by varying one parameter or group of parameters.
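A standard one-variable example, offered here as my own illustration rather than as a model of any system discussed below, shows how varying a single parameter redraws the attractor landscape. Consider the equation

$\frac{dx}{dt} = rx - x^3$

When the parameter $r$ is negative, the system has a single attractor basin centered on the point $x = 0$. As $r$ is raised past zero, that basin destabilizes and the system bifurcates: there are now two attractors, at $x = +\sqrt{r}$ and $x = -\sqrt{r}$, each with its own basin. Nothing in the system "switches" anything; changing one parameter simply reshapes the basins, and this is the sense of bifurcation intended throughout this paper.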
This
propensity to bifurcate between different attractor basins is what
differentiates relatively stable systems (like soap bubbles) from unstable
systems (like living organisms or ecosystems). In this sense, all living
systems are unstable, because they don’t settle into an equilibrium state
that isolates them from their surroundings. Organisms are constantly taking in
food, breathing in air, and excreting waste products back into the environment
they are interacting with. We usually think of unstable processes as formless
and incomprehensible, but this is often not the case. Certain unstable systems
have a tendency to settle into patterns which still fluctuate, but fluctuate
within parameters that are comprehensible enough to produce an illusion of
concreteness. When the various forces that constitute the processes shift in
interactive tension with each other, a basin of attraction destabilizes in a
way that makes the system bifurcate i.e. shift to another basin of attraction.
This kind of system is sometimes called multi-stable, because its changes between various basins of
attraction are predictable and (to some degree) comprehensible.
My
claim is that, in a system that can shift between basins of attraction in a
biologically viable way, the attractor basins can be seen as functionally
isomorphic with the modules that are fundamental to GOFAI cognitive systems.
There is a lot of experimental work that supports this possibility. We will
first consider some work on infant and animal locomotion, in which some
experimenters identify attractor basins with modules, and alter the concept of
module significantly in doing so. However, because their research is measuring
abilities which are not ordinarily thought of as cognitive, this work provides
only an important first step in the direction I am suggesting. I will then show how neurobiologist
Walter Freeman has used the concepts of bifurcation and attractor landscapes to
explain how olfactory processing in the rabbit brain can produce perceptual
categories. Because Fodor has argued that perception is the cognitive ability
that is best duplicated by modular architecture, this shows that DST can
provide an alternative to some of the most important information processing models
that are the basis of GOFAI systems. However, the fact that DST can be used to
model perception does not necessarily show that it will be equally effective in
modeling the “higher level” cognitive processes that are the basis
of rational inference. There is, however, other work on animal motion which
indicates that bifurcation between attractor basins can sometimes significantly
resemble the switching between possible branches of decision trees, which is
the fundamental cognitive process of AI computer languages such as LISP. I therefore propose that one of the
most fruitful directions for future research would be to determine whether
dynamic systems are capable of duplicating all the types of decision-making
performed by computers. The work that has been done so far seems to indicate
that the answer might be “yes”.
Thelen and Smith on Modularity
Without Modules
In Thelen and Smith 1994, the authors argue that
their research on infant locomotor development gives evidence that
“cognitive development is equally modular, heterochronic, context
dependent, and multidimensional.” (p.17) Not surprisingly, this
discussion will focus on their claim that cognitive development is modular.
This claim is proposed as an alternative to the idea that infant locomotion
develops to maturity by the gradual unfolding of what is called a Central
Pattern Generator (or CPG) i.e. a unified program stored in the brain or DNA
that controls the motor processes from a central location in the nervous
system. Although there was some evidence that such a thing existed in cats, the
behaviors that were isolated in cats during the search for a CPG controlled
only a part of what is essential for locomotion. Experimenters were able to
isolate the spinal cord neural firings from both the cat’s brain and from
perceptual influences, and thus produce a “spinal cat” that could
walk on a treadmill if supported. But Thelen and Smith (hereinafter T&S)
argue that this set of behaviors was only a “module” that could not
produce walking behaviors without the help of several other
“modules”. A spinal
cat, for example, cannot stand up without assistance, or reorganize
neuromuscular patterns to deal with terrain more irregular than a treadmill.
When T&S studied the development of walking
behaviors in human infants, they discovered that the separate components
necessary for walking appeared and reappeared at different times in the
infant’s life, and in response to different environmental stimuli. For
example, it is widely acknowledged that newborns have the ability to make
coordinated step-like movements when held erect. This ability disappears at
about two months, and does not reappear until the infant learns to walk many
months later. T&S discovered, however, that even after the ability to make these
step-like movements had supposedly disappeared, these infants would still
occasionally make them under certain conditions, such as when they were lying on their backs, walking on treadmills, or undergoing a change in emotional mood (pp.10-11). This vital fragment of the ability to walk was a very early part of the infant’s repertoire, which was eventually assembled together with other behavioral
components to make walking possible. Walking did not emerge because of the
switching on of a Central Pattern Generator. T&S’s
conclusion was that locomotion in humans and non-human vertebrates had “homogeneity of process from afar, but modularity. . .when viewed
close up.” (p.17)
The work that T&S cite on non-human
vertebrates was done not only with cats, but also with frogs and chickens.
Stehouwer and Farel 1983 describes the discovery that the underlying neural
activity for hind limb stepping was found in bullfrog tadpoles before they had
hind limbs. When the tadpoles had grown vestigial limbs which were not yet
fully capable of walking, it was possible to get them to perform walking
movements by supporting them on dry rough surfaces in much the way that T&S
supported human babies on treadmills. Watson and Bekoff 1990 revealed a
similar modularity in the motor movements of chickens. There is a particular
motion that a prenatal chick uses to break its shell when hatching and which it
never uses again—unless the right context is created, by bending the
chick’s neck forward to simulate the position of a chick embryo which has
grown too big for its shell. In other contexts, the hatched chick uses a
completely different set of movements: stepping with alternate legs, hopping
with both legs together, even swimming when placed in water. T&S claim that
all of this data supports the view that animals “can generate patterned limb activity very
early in life, but walking alone requires more—postural stability, strong
muscles and bones, motivation to move forward, a facilitative state of arousal,
and an appropriate substrate. Only when these components act together does the
cat truly walk.” (p.20)
It may seem at first that T&S are attacking a
straw person with these arguments. Would anyone seriously claim that every
aspect of the ability to walk must be stored in a Central Pattern Generator, or
deny the possibility that a CPG could rely on pre-existing muscle patterns to
do its job? And would anyone deny that the CPG must have parts, and cannot be a single undifferentiated whole? And if it has parts, why shouldn’t those
parts manifest at different times in the history of the organism? However,
T&S are in fact criticizing a specific position commonly held by their
colleagues which has serious problems. Of course everyone acknowledges that the
biological processes in the nervous system cannot be completely responsible for
locomotion. We can’t walk on our nerves. But T&S claim that
traditionally researchers have privileged those aspects of locomotion skills which occur in the nervous system as being “a fundamental neural structure that encodes the pattern of movement” (p. 8) and considered everything
else necessary for locomotion as being somehow less important. They even quote
one researcher who claims that the pattern must be stored in the genes. (p. 6)
They are quite right to consider this distinction as ad hoc and misleading, and to insist that the parts of
the locomotion process that take place outside the brain and/or genes are every
bit as important as the so-called neurological or genetic
“encoder”.
To some degree, the question of whether locomotor
development is controlled by a module in the brain is obviously an empirical
one. But although data and research are clearly necessary for answering this
question, they are not sufficient. There are significant disanalogies between
computers and biological systems which the computer metaphor forces us to
ignore, and which can render the centralized control theory dangerously
unfalsifiable. Mackey 1980 argues that because “the true concept of
programming transcends the centralist-peripheralist arguments. . . the term
‘central program’ is an oxymoron, and the concept unviable in the
real world” (pp. 97 and 100.)
After all, no computer program completely controls anything from a
central point. If it did, it would be, as Mackey points out, more like a tape
or phonograph record. The cognitive power of a program comes from its ability
to respond in different ways to different inputs. The instructions in the
program “detail the operations to be performed on receipt of specific
inputs” (ibid. p. 97), and without those inputs it would not be a program
at all. One could use these facts about computer programs to respond to all of
the objections that T&S raise against the CPG. One need only say that what
happens in the brain and/or genes is not a complete Central Pattern Generator. CPGs should instead be
seen as “generalized motor ‘schemata’, which encode only
general movement plans but not specific kinematic details.” (Thelen and
Smith 1994). The problem with this answer is that it can deal not only with all
of T&S’s objections, but with every possible objection that anyone
could ever make. Whenever an organism makes a motion, there will always be
something happening in the nervous system. This version of the CPG theory
enables you to call that neural activity the CPG, and everything else in the
body or environment mere “kinematic details”. And there would be no
reason you couldn’t do this regardless of the empirical results. Clearly it is not acceptable to use a
scientific theory which predetermines your answer before the data is in.
Why then does the distinction between program and
hardware work so well when we are talking about computers? In a computer, what is going on inside
the CPU is the program, and what is going on outside the CPU is obviously “peripheral” in some significant sense. Why doesn’t this distinction carry over to biology? I
believe that this is only because of the way computers are made and used in our
society, and that there is no comparable set of criteria that would enable us
to make the distinction for biological systems. Computers of a given brand are
all engineered the same, which makes it possible for the programmers to ignore
the hardware and create a control structure that resides in a central location.
Mackey’s description (mentioned above) says that a computer program must “detail the
operations to be performed on receipt of specific inputs”. In order to
specify these operations, however, the program must have a taxonomy of the possible inputs it will receive, so it can specify responses to each of them. With a
computer, we can tell ourselves that if we know the central program we know how
it works. The hardware never changes so it can be safely ignored. T&S point
out, however, that neural activity, unlike computer programs, does not have the
advantage of knowing precisely what kind of input it will receive. No two human infant bodies are alike,
and the bodily structure of the infant changes radically as the infant matures.
Because of these differences, radically different neurological development is
needed in order to produce the same behavior in two different people. Although
there are obviously things going on in the nervous system that are necessary
for developing locomotor skills, studying the nervous system isn’t going
to tell us the essential story if we don’t also know the peripheral
inputs that the nervous systems must interact with.
“There is. . .no essence of
locomotion either in the motor cortex or in the spinal chord. Indeed it would
be equally credible to assign the essence of walking to the treadmill than to a
neural structure, because it is the action of the treadmill that elicits the
most locomotor-like behavior.” (Thelen and Smith 1994 p.17)
We can thus see that although T&S frequently
use the word “module” to refer to the components that make
locomotion possible, their use of this term is very different from
Fodor’s (as they explain in considerable detail on pp.34-37). They strongly emphasize that they
do not mean that there is an organ in the brain that produces or controls each
of these components. Furthermore, T&S’s modules, unlike
Fodor’s, are neither static nor informationally encapsulated. They grow
and change through time, and their borders overlap with each other. Their
interactions with each other are also not hardwired, which Fodor says is an
essential characteristic of modules (Fodor 1983 p.98). And most importantly,
T&S’s modules do not carve a cognitive system at its functional
joints. T&S’s main point is that what is happening in the nervous
system, or in the muscles, or in the bones, is functionally useless until it
sets up an effective equilibrium with various other parts of the body and with
the particular environment the organism is interacting with. That is why no
particular part of the nervous system or genes can be seen as a Central Pattern
Generator. That is also why T&S refer to Fodor’s modules as
“autonomous modules” to distinguish them from theirs.
There is nothing wrong in principle with not
following Fodor’s usage of the word “module”. But I need to
return to something closer to Fodor’s definition of “module”
if I am to make the central philosophical point of this paper. Fodor sees a
module as an organ in the brain that performs a single functional role all by
itself. He would probably describe T&S’s “modules” as
being fragments of modules. Consequently, if we were trying to find something
in a dynamic system which could be reductively equivalent to Fodor’s
autonomous modules, T&S’s modules would not be up for the job. When
we look at chapter 4 of Thelen and Smith 1994, however, we see that they are
providing us with a detailed alternative that might be up for the job. They are
making a claim which I will describe thusly: 1) When an organism interacts with
its environment, the attractor basins of the resulting dynamic system perform
the functions that Fodor attributes to modules. 2) In order to study these
Fodorian modules, we must focus our attention not on physical space, but on
state space.
As I mentioned earlier, the difference between a
reductive identity and an outright elimination is only one of degree. When we
say that the concept of light can be reduced to the concept of electromagnetic
radiation, we are still acknowledging that the resulting new concept of light
is very different from the old one. For the reasons described above, among many
others, neither T&S nor I believe that the concept of attractor space is
exactly identical to Fodor’s concept of module. This is made more obvious
by the fact that T&S are more interested in what DST can do that classical
cognitive models cannot: interpret change as development and growth, rather
than dismissing it as ‘noise’. But a new theory can replace an old
one only if it is capable of explaining both what is inexplicable and what is
explicable to the old theory. T&S want to show that DST can do
things that GOFAI cannot. I want
to show that DST might also be able to replace GOFAI on its home turf if we
identify attractor spaces with Fodorian modules. And chapter four of Thelen and
Smith 1994, especially pp. 80 through 86, takes the first steps towards doing
exactly that.
When T&S
began researching the development of infant motor skills, they assumed
that they could account for them by measuring the neural voltages that were
being sent from the infant’s nervous system to its muscles.
Unfortunately, there was no repeating pattern that could be found. It was not
even possible to find a constant relationship between the voltages sent to the
flexor and extensor muscles. In theory, there had to be a precise alternation
between signals sent to the flexor and extensor muscles in order for the infant
to move its legs. In practice it didn’t always work out that way.
However, T&S were able to account for these variations with greater
accuracy when they saw the infant’s locomotor skills as emerging from the
interaction of several different factors, including the elasticity of the
muscles and tendons, the excess body weight produced by subcutaneous fat, the
length of the bones etc. When all of these factors were combined into what
T&S called a collective variable, it was possible to make sense out of how the infant was learning to
move its legs. The effective movement emerged because this collective variable
gradually settled into “evolving and dissolving attractors” (p.
85). When the skill was fully developed, the attractor was a deep basin in the
state space, i.e. only radical changes in the variables that defined the space
would throw the system out of equilibrium. But the times in which the infant
was still learning to walk would be mathematically described by saying that the
collective variables formed a system with shallow attractors, i.e. a slight
change in any variable could cause the infant to topple over. To say that the
infant was learning to walk was to say that these basins of attraction were
adjusting themselves so that they gradually became deeper and more stable. Any
attempt to describe this process by referring to changes in only one of these
variables, such as the nervous system, would be essentially incomplete. The
only thing that you could identify as being the embodiment of the walking skill
was the system of attractor basins that existed when all of these factors
interacted in a single dynamic system. Consequently, it is these attractor
spaces that must be identified as the “walking module”.
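The difference between a deep and a shallow basin can be made concrete with a toy simulation. The following sketch is my own illustration, not a model taken from Thelen and Smith: a single collective variable relaxes toward an attractor point, and a hypothetical "depth" parameter stands in for how strongly the assembled components pull the system back toward equilibrium.

(defun step-system (x attractor depth dt)
  ;; One Euler step of the overdamped dynamics dx/dt = -depth * (x - attractor),
  ;; i.e. the collective variable slides downhill toward the bottom of its basin.
  (+ x (* dt (- (* depth (- x attractor))))))

(defun settles-p (x0 attractor depth &key (dt 0.01) (steps 500) (tolerance 0.05))
  ;; Start the collective variable at X0, run the dynamics for STEPS steps,
  ;; and report whether it has returned to within TOLERANCE of the attractor.
  (let ((x x0))
    (dotimes (i steps)
      (setf x (step-system x attractor depth dt)))
    (< (abs (- x attractor)) tolerance)))

;; The same perturbation (starting the variable at 1.0, with the attractor at 0.0)
;; is absorbed by a deep basin but not, in the same amount of time, by a shallow one:
;; (settles-p 1.0 0.0 5.0)  => T
;; (settles-p 1.0 0.0 0.2)  => NIL

In these terms, learning to walk is the process by which the depth of the relevant basins increases, so that ordinary perturbations no longer throw the system out of equilibrium.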
T&S claim, I think correctly, that a complete living organism is
best understood by identifying all the significant variables that constitute
its behavior, both inside and outside the head, then measuring the patterns
that emerge as the resulting system fluctuates from one attractor space to
another. Nevertheless, what happens in the head is of course necessary (but not
sufficient) for these behavioral components to interact and create a dynamic
system. And there is good reason to believe that the brain itself is best
understood as a dynamic system. All DST analyses are incomplete, and limiting
the system being studied to parameters of brain states can often be a useful
way of drawing the borders of a Dynamic System. This is what Neurobiologist
Walter Freeman has elected to do, and like T&S, he has found that
identifying mental representations and functions with attractor basins is the
most effective way of understanding perception in the laboratory animals he has
studied. The fact that he came to this conclusion is strong evidence that these
attractor spaces are doing the work that Fodor attributed to perceptual modules.
Freeman and the Attractor
Landscape of the Olfactory Brain.
Unlike T&S, neurobiologist Walter Freeman is
willing to study cognitive functions by focusing entirely on the brain.
Nevertheless, their similarities are more important than their differences, for
Freeman believes that the brain itself is a dynamic system and not a system
made up of mechanical modules. Furthermore, Freeman was able to use dynamic
systems theory to account for a mental process that would be considered
cognitive by even the most orthodox GOFAI devotee. After training rabbits to
recognize different kinds of odors, and measuring the neurological signals on
their olfactory bulbs, he decided that the best way to account for their
discriminative abilities was with the concepts of DST.
To use the language of dynamics.
. . there is a single large attractor for the olfactory system, which has
multiple wings that form an “attractor landscape”. . . .This
attractor landscape contains all the learned states as wings of the olfactory
attractor, each of which is accessed when the appropriate stimulus is
presented. Each attractor has its own
basin of attraction, which was shaped by the class of stimuli the
animals received during training. No matter where in each basin the stimulus
puts the bulb, the bulb goes to the attractor of that basin, accomplishing
generalization to the class. (Freeman 2000 p.80)
Freeman has thus come very close to articulating
the thesis of this paper: that cognition is best explained by identifying
mental functions, not with organs or modules, but with attractor basins. I,
however, agree with Thelen and Smith that neurological activity is not
sufficient to explain cognitive functions, and therefore we need to analyze the
attractor basins created by interacting variables throughout the
brain/body/world nexus. This is not a criticism of Freeman’s scientific work.
Every DST analysis has to focus on some variables and ignore others. Focusing
on the brain is as good a choice as any, as long as one remembers that it is
not the only possible choice. But I am saying that there is no essential
difference between T&S’s use of these principles and Freeman’s.
T&S are, I believe, correct in saying that locomotion cannot be effectively
understood with a modularity theory that assumes that each locomotive module
must be located in a particular spot in the brain. The most effective
alternative is to explain locomotion by identifying what T&S call modules
with attractor basins in state space.
Some might be tempted to ask whether T&S would
need such a complex conceptual apparatus. Is it really possible to think of
locomotion as a cognitive activity in a robust, non-metaphorical sense?
Distinguishing between different perceived items, such as odors, is a paradigm
example of the traditional view of perceptual cognition. But do walking,
running, and jumping really deserve to be members of the same category? If we
are going to answer that question fairly, we must have a definition of
cognition which will not prejudice our judgments in favor of the traditional
linguistic and perceptual idea of cognition. Fortunately Newell and Simon have already formulated a definition, which was deliberately designed not to tip the scales in favor of their own symbolic systems hypothesis.
. . .we measure the intelligence
of a system by its ability to achieve stated ends in the face of variations,
difficulties, and complexities posed by the task environment. (in Haugeland
1997 p.83)
The
popular notion of muscular activity assumes that it is simply an unconscious
mechanical activity which is “switched on” by the brain. One of the
reasons that Descartes believed that mind and body were fundamentally distinct
was that he believed it was impossible for a physical device to make rational
decisions that would vary so as to be equally appropriate in different
contexts. This was understandable,
because the most sophisticated machinery of his time was clockwork. The
humanoid automata that he had seen could do relatively complicated things, but
they were all stored in the machine in advance and would always be the same
regardless of what the outside world did. (see Dreyfus 1972 pp. 235-6) Once you
wound up a clockwork dummy, it would continue to do the same actions every time
you flipped the switch, even if that meant piling into a wall or plunging into
a fountain. It was only after the computer was invented that it was possible
for a machine to have some of the flexibility that we associate with rational
thought. Today, however, we are still in the grips of a Cartesian Materialism,
which assumes that the computer metaphor is applicable only to the brain. It is
often assumed that motor control does not involve decision making, but is
rather a matter of the brain flipping the switches of a variety of preset
muscular clockwork systems.
However, modern biology seriously weakens this
distinction between brain as computer and body as clockwork. We now know that
every step we take requires a constant flow of information between an organism
and its environment, and a variety of adjustments and “decisions”
made in response to that information. Ordinary walking is cognitive by Newell
and Simon’s definition, for it does have to “achieve stated ends in
the face of variations, difficulties and complexities posed by the task
environment”. It is “not controlled by an abstraction, but in a
continual dialogue with the periphery” (Thelen and Smith 1994 p. 9). The
following examples show that, in order to achieve those stated ends, the
walking organism must make “decisions” that can be seen as
functionally equivalent to the conditional branching which GOFAI expresses in
computer languages. And there is good reason to believe that these examples
could be the first of many more.
How Horses (and Other Animals) Move
The
ambulatory system of a horse divides into four distinct attractor spaces,
colloquially referred to as walk, trot, canter and gallop. Each of these
consists of a set of motions governed by complex input from both the
environment and the horse's nervous and muscular systems. Careful laboratory study has made it possible to map the dynamics of each gait (see Kelso 1995 p.70),
and each map reveals a multidimensional state space that contains a great
enough variety of possible states to respond to variations in the terrain, the
horse's heart and breathing rate etc., and yet regular enough to be
recognizable as only one of these four types of locomotion. There are no hybrid
ways for a horse to move that are part trot and part walk; the horse is always
definitely doing one or the other. And if all other factors remain stable, the
primary parameter that determines the horse's utilization of each gait is
usually how fast the horse is moving. From speed A to B the horse walks, from
speed B to C it trots, and so on. There is not an exact speed at which the
transition always occurs. If there were, a horse would wobble erratically
between the two gaits whenever it ran anywhere near those speeds. What usually
happens, however, is that the horse rarely travels at these borderline speeds
(unless it is being used as a laboratory subject). Instead it travels at
certain speeds around the middle of each range for each gait, because those are
the ones that require the minimum oxygen and/or metabolic energy consumption
per unit of distance traveled. This means that a graph correlating the horse's choice of gait with its speed usually consists of bunches of dots, rather than a straight line, because certain speeds are not efficient with any of the four possible gaits.
We can make a computer model of the
horse's ability to adapt its gait to its speed using LISP, which is one of the
most popular GOFAI languages. LISP models cognitive processes by means of
commands that tell the program how to behave when it comes to a branch in the
flow of information, which seems isomorphic to a bifurcation in the flow of a
dynamic system from state space to state space. We'll start by positing four
subroutines we'll call WALK, TROT, CANTER, and GALLOP, as well as a fifth subroutine we'll call "CURRENT-SPEED" which measures how fast the horse is moving.
Because we are only modeling the decision making process, rather than the entire
dynamic system, we will accept those as unexplained primitives. To these five
subroutines, we will add some basic subroutines from LISP:
1) “defun”, which defines a new subroutine
2) “equal”, which compares two numbers and checks whether they are equal
3) “<”, which compares two numbers and checks whether the first is less than the second
4) “if”, the conditional which performs the decision making process (the final argument of an “if” expression serves as its “else” branch)
We
can now describe a possible program that essentially duplicates the decision function
of the horse's dynamic ambulatory system. We will posit convenient speeds for
each gait of 5, 10, 15, and 25. The LISP term “defun” will establish the word "GO" as the name of this
program.
(defun GO (CURRENT-SPEED)
  ;; Dispatch to the gait subroutine whose speed range contains CURRENT-SPEED.
  ;; (In Common Lisp proper one would pick a name other than GO, which is reserved.)
  (if (equal CURRENT-SPEED 0) 0
      (if (< CURRENT-SPEED 5) (WALK)
          (if (< CURRENT-SPEED 10) (TROT)
              (if (< CURRENT-SPEED 15) (CANTER)
                  (if (< CURRENT-SPEED 25) (GALLOP)
                      ;; speed out of range: sample it again and retry
                      (GO (CURRENT-SPEED))))))))
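With the boundary speeds posited above, a call such as (GO 12) fails the first three tests and invokes (CANTER), while (GO 3) invokes (WALK). The nested conditionals thus play the same role in the program that the bifurcations between gait attractors play in the living horse.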
The complete program would have to contain four
definitions that looked something like this: (defun WALK (make the horse
walk)), and so on for the subroutines trot, canter, and gallop. The phrase
“make the horse walk” is of course deliberately empty hand waving,
because the details of the four gait programs are of no significance for the
point I am making. There is however other research which finds similar kinds of
conditional decision-making within the individual gaits used by animals. Taylor
1978, for example, describes research done with several different kinds of
animals, including birds, lions, and kangaroos, showing how changes in gait
require “recruitment” of different muscles and tendons. When any of
these animals is walking, a certain set of muscles and tendons is brought into play,
and it is possible to measure how much energy is being used by each muscle by
measuring glycogen depletion. When an animal increases its speed, however, it
must run another “program” that decreases the reliance on those
muscles, and simultaneously recruits a different set of muscles. Taylor also
discovered that the relationship between speed and glycogen depletion turned
out to be dependent on several other factors as well. Gravitational energy is
stored by means of the stretch and recovery of muscles and tendons in the
faster gaits, making it possible for certain animals to actually use less
muscle energy when traveling at faster speeds. These relationships can only be described accurately by
means of complex conditional relationships very much like computer subroutines.
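To make the analogy concrete, here is a deliberately schematic sketch of recruitment written as the same kind of conditional structure used in the GO program above, this time with LISP's cond operator. It is my own illustration, not Taylor's model, and the muscle-group names are placeholders rather than his actual categories.

(defun recruit (gait)
  ;; Each gait "recruits" a different (purely illustrative) set of muscles and
  ;; tendons; the faster gaits lean more heavily on elastic storage in tendons.
  (cond ((eq gait 'walk)   '(postural-muscles slow-extensors))
        ((eq gait 'trot)   '(slow-extensors fast-extensors))
        ((eq gait 'gallop) '(fast-extensors elastic-tendons))
        (t '())))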
To
some degree, these examples[2] are an extension of Tim Van Gelder's
"computational governor" thought experiment. (Van Gelder 1995) Van Gelder's thought experiment showed
that if a computer were to duplicate the function performed by the device which
controls the speed of a Watt steam engine, it would require fairly
sophisticated computations. Van Gelder claimed that this task was clearly
cognitive, that the Watt governor performed this task without computations, and
that the same kind of physics which underlies DST was the best explanation for
the Watt governor's cognitive abilities. This prompted some to say that the
Watt Governor really was computational after all (Bechtel 1998) and others to
say that the task was too simple to be called cognitive, and therefore the
analogy was spurious. (Eliasmith 1997). Others pointed out that the Watt
Governor was merely a feedback loop, and therefore DST must be (in Van Gelder's
own words summing up this criticism) "Cybernetics returned from the dead."
(Van Gelder 1999) My Horse LISP example is meant to be a partial answer to
these last two criticisms. The paradigm cognitive ability in computer science
is often considered to be decision making, i.e. choosing between alternatives. An If-then-else command is certainly
more of a decision making device than a feedback loop, and this example shows
that the ambulatory system of a horse is a dynamic system that, among other
things, performs the function of an If-then-else command.
Some Possible Futures for DST Modules
If
the horse's ambulatory system is capable of making the kind of cognitive
distinctions that we ordinarily associate with high level computer programs
like LISP, and dynamic systems theory can explain how this is done by equations
that show bifurcations connecting sets of attractors, then perhaps we have
something like a reduction of certain aspects of the LISP computer language to
DST. And if it were possible to duplicate enough other branching functions
performed by LISP with bifurcations in dynamic systems, it would be tempting to
conclude that the symbolic system hypothesis had been reduced to being a subset
of DST, in much the same way that Newtonian physics was reduced to being a
subset of Einsteinian Physics.[3] This opens the possibility of an interesting
treasure hunt --one in which establishing that there was no treasure would be
as important as discovering it. There seem to be four possible extremes in what
research might eventually discover.
1) Dynamic Systems are incapable of
performing many bifurcations that are essential to cognition, and the horse
case described above is an isolated example from which we cannot extrapolate.
If it were possible to prove this mathematically, we might decide that
Cognitive DST was a blind alley.
2) Dynamic Systems are implementations
of classical computations, because basins of attraction (or some other feature
of dynamic systems) are identical with
certain computer subroutines the way electromagnetism is identical with
light.
3)
Dynamic Systems are cognitive, but in a way that has nothing to do with
classical computationalism. This would mean that DST could eventually eliminate
computationalist theories of mind, the way chemistry eliminated the alchemical
essences (although computational theories would remain as useful to engineers as ever).
4) DST reduces computational theories of
mind not with identities but with more ambiguous relationships that make the
reduction more "bumpy" than "smooth". This would force us
to change our ideas both of what a subroutine is, and what a dynamic system is.
I
personally would bet on 4), but any one of these conclusions would be an
important discovery. For example, it would be very convenient if we could
simply find styles of dynamic bifurcation that corresponded to each of the five
LISP primitives described in McCarthy 1960. Then we would have a perfect
reductive identity between LISP and those particular dynamic systems, which
would produce the result described in 2). But the chances of things working out
exactly that neatly are very slim, for a variety of reasons.
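For reference, the five elementary operations of McCarthy 1960 are atom, eq, car, cdr, and cons. In modern LISP notation they behave roughly as follows (eq was originally defined only for atomic symbols):

(atom 'x)        ; => T       -- is the expression an atomic symbol?
(eq 'a 'a)       ; => T       -- are two atomic symbols identical?
(car '(a b c))   ; => A       -- the first element of a list
(cdr '(a b c))   ; => (B C)   -- the list minus its first element
(cons 'a '(b c)) ; => (A B C) -- build a list from a head and a tail

A "perfect" reduction in the sense of 2) would require a characteristic style of bifurcation standing to each of these operations the way light stands to electromagnetism.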
For
one thing, it is far more likely that most cognitively effective DST
bifurcations will require several lines of code, or even whole programs, to be
modeled effectively. Conversely, a computer program simulating a dynamic system
would contain elements that would be unnecessary in the original system. Our
model of the horse ambulatory system, for example, contains several elements
that presuppose a computer's need to search and choose before each action. The
recursive terms in our horse LISP subroutine made it possible to compare the
value of the incoming speed variable to each of the gait subroutines in
sequence until the correct one was found. Dynamic systems do not have any need
for this kind of comparing function. They shift among different sets of
attractors when certain parameters change in value, but in no sense do they
"consider" other alternatives before they shift. They do it right the
first time. A connectionist net,
for example, does need a training period to adjust its weights to produce the
proper output. But unlike a computer program, it does not need to reconsider
all of the wrong choices after it has been trained.[4]
These dissimilarities could be a strength, however, if they helped to account
for many of the differences between real organic systems and their GOFAI
idealizations, such as the former's ability to move fast enough to interact
with the real world.
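To make the contrast concrete, here is one way (again a reconstruction with stand-in numbers, not the original program) in which such a comparing function might be written: the routine walks down a list of (threshold . gait) pairs and "considers" each alternative in sequence, which is precisely the step a dynamic system never takes.

(defparameter *gait-table*
  '((2 . walk) (4 . trot) (7 . canter) (1000 . gallop)))  ; 1000 is an arbitrary catch-all

(defun find-gait (speed table)
  (if (< speed (car (first table)))    ; consider the first alternative
      (cdr (first table))              ; accept it if the speed falls below its threshold
      (find-gait speed (rest table)))) ; otherwise recurse through the remaining alternatives

(find-gait 5 *gait-table*)  ; => CANTER, but only after "considering" and rejecting WALK and TROT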
Secondly,
if we discover that a bifurcating dynamic system can duplicate the branching
functions of computer subroutines connected by a LISP decision tree, the
dynamic system will still remain free of many of the limitations of the modular
architecture of computers. In a sense, the attractor spaces in a dynamic system
are both informationally encapsulated and domain specific to some degree. But
they also possess a flexibility that frees them from the limitations that were
unavoidable for Fodor's modules.
Can There be Distributed Modules?
When
connectionism first appeared on the AI scene, it was seen as radically
non-modular, because everyone was struck by the fact that it used what was
called distributed representation. The usual claim, both defended and attacked,
was that in a connectionist net there was no single place where a particular
bit of information was represented.
I believe that the proper approach to this controversy is to remember
that a connectionist net is one kind of dynamic system, and that this means its
fundamental parts are not modules that exist in physical space, but basins of
attraction that exist in computational space. It may be that connectionist AI
was guilty of a kind of misplaced concreteness when it saw itself as modeling
the behavior of organlike neural structures, rather than the state spaces of dynamic
patterns. I am tempted to think of
the connectionist modules used in contemporary AI as little dynamic systems
imprisoned like birds in cages, so that they can communicate with other modules
only by means of input-and-output devices.
The current engineering perspective
tends to see connectionism as one more trick in an AI toolbox which is still
running on fundamentally GOFAI principles. The two most common approaches for
interfacing connectionism with GOFAI systems are:
1) Creating a virtual connectionist environment on a standard digital computer
system. These virtual
connectionist programs function as modules within a fundamentally
undistributed system. Although there is arguably distributed processing going
on within the virtual module created by these programs, the module communicates
to the rest of the system by means of standard input and output connections. It
thus functions by exchanging information the same way any other modular system
exchanges information. These connectionist programs are really only subroutines
that the digital computer calls up when it needs to activate them in a larger
programming context; a simplified sketch of this arrangement appears below, after the second approach is described. This is why the designers of the Joone neural net framework claim that their programming environment makes it so that “Everyone can write new {connectionist} modules to implement new algorithms or new architectures” (www.jooneworld.com).
2)
There are some computer chips which use genuinely analog connectionist
processing, although until recently very little AI work has been done with such
chips. The reasons for the initial failure of genuinely connectionist modules
are of little philosophical significance.
{The first analog connectionist chips} failed
for two reasons. First, the actual improvement in performance over software
running on a conventional processor was not that great. Secondly, five to 10
years ago you could not implement sufficiently large neural networks in
silicon. (Hamish Grant quoted in P.
Clark 1999)
There is thus no reason to deny that genuinely analog
connectionist chips will eventually be quite common. However, even if genuinely
connectionist processors do replace
virtual ones, this would not change the fundamentally modular nature of the
systems in which they are embedded. Even though there is no question that the
processing taking place within these chips is genuinely distributed, the
distribution stops when you hit the borders of the chip. The fundamental
computational tool inside these modules is state space transformations, just as
in the dynamic systems we discussed earlier. But the state spaces in the
connectionist chips are unnaturally easy to isolate. This makes them useful for
engineering, but very misleading biologically. Of course, real neurons really
do have inputs and outputs with reasonably exact voltages and weight
summations. And by replicating those in silicon, it becomes possible to create
modules that perform state space transformations on specific inputs. But as long as we see this as the only
way of utilizing connectionism, the relationship between connectionist and
other dynamic systems becomes obscured, and connectionism loses almost all of
its original revolutionary force. A connectionist net becomes rather like an AI
"toy world" version of a dynamic system, and is still subject to many
of the objections raised by Dreyfus against GOFAI systems. (See Dreyfus 1994
p.xxxiii-xxxviii)
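The sketch promised above: nothing here is drawn from Joone or from any particular chip; it is only meant to show that however distributed the processing inside run-net may be, the host program reaches it through an ordinary call with an ordinary argument and return value, exactly as it would reach any other subroutine.

(defparameter *weights* '(0.3 -0.2 0.9))   ; fixed after training; internal to the "module"

(defun run-net (inputs)
  ;; the distributed part: a weighted sum of the inputs, squashed to 0 or 1
  (if (> (reduce #'+ (mapcar #'* *weights* inputs)) 0.5) 1 0))

(defun host-program (stimulus)
  ;; the rest of the system sees only an input list going in and a symbol coming out
  (if (= (run-net stimulus) 1) 'category-a 'category-b))

The distribution, in other words, stops at the parentheses of the call.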
The
creation of these connectionist modules makes sense from an engineering
perspective, at least in the short term. It enables us to use GOFAI and
connectionist systems in partnership, which results in the fullest utilization
of all of our engineering resources. But it also closes the door on
further development of the
distributed representations that
make organic systems so much more flexible than GOFAI systems. The fundamental building blocks of
these systems are, like any other GOFAI system, physically distinct modules
that are located on different parts of a circuit board, (or in the case of the
virtual nets, different regions of RAM or hard disk space), not attractor
basins located in different regions of computational space. They are thus
prevented from performing Quinean and Isotropic processes for all the reasons
described in Fodor 1983.
However,
if we could show how a DST system could perform the functions of a GOFAI system
using attractor spaces as something like distributed modules, further progress
might be possible. To most people in the field, the idea of a distributed
module is a contradiction in terms, and as long as this is the case DST will
never be able to establish reductive relationships with the modular concepts of
GOFAI. Some proponents of DST like it that way, for they want to go for broke
for a total elimination of GOFAI by DST. But I don't think this is a very
realistic view of how either reductions or eliminations operate in the history
of science. If there is no relationship at all established between one domain
of discourse and another, there is
no way of establishing that the two discourses are even talking about the same
thing. As Dennett pointed out, no one would accept an elimination of the
concept of Santa Claus which claimed that Santa Claus is a skinny man named
Fred who lives in Florida, plays the violin, never buys gifts for anyone and
hates children (Dennett 1991 p. 85). At the very least we need to show why
people thought that the concepts of the old theory were legitimate. We may
eventually decide that a given reductive relationship is so cock-eyed that it
would be better described as an elimination than an identity. But we have to
begin by positing identities between things in the old and new theories, and in
this case, the concept of a distributed module seems to be a good place to
start.
The Nature of Distribution
Fodor claims most of the time that his
modules are not organs with concrete locations in the brain, but rather
abstract faculties defined by the functions they perform. A module is thus
"individuated by its characteristic operations, it being left open whether
there are distinct areas of the brain that are specific to the function that
the system carries out" (Fodor 1983 p.13). In the breach, however, Fodor
usually speaks as though his modules probably are organs in some sense. This is
most noticeable on p.98 of Fodor 1983, which has the heading "Input
systems are associated with fixed neural architecture". I can see no
difference between an organ and fixed neural architecture. Although Fodor
admits that there might not be distinct areas of the brain for each function,
he apparently does not take this possibility really seriously. The only real
cash value of this assumption for Fodor is to permit him to describe the
function abstractly, and ignore as mere "hardware problems" exactly
how the function is physically embodied.
This
strategy became more obvious when many people began to claim that connectionist systems were not modular
because they used distributed processing. Fodor's response was basically to say
that connectionist systems were distributed only physically, and that
functionally they were still modular. (Fodor and Pylyshyn 1988) However, he has
never really explained how a system could be physically distributed yet
functionally modular. I think, however, that if DST does deliver on its promise
as a cognitive science paradigm, there is a sense in which distributed systems can be described as modular, although with several important qualifiers.
Van
Gelder 1991 claims, I think correctly, that the essence of distribution is
summed up in a concept he calls superposition. For our purposes, I think
Van Gelder's concept of superposition is effectively illustrated by the
following series of examples. Let us consider a set of 26 cards, each of which
has a letter of the alphabet on it. In this case, the representation of the
alphabet is completely modular and undistributed. Each card represents exactly
one letter of the alphabet, without any reliance on the other cards. Now let us
suppose that instead of twenty six cards we have only 10 cards. We lay the
cards out on the floor in a set pattern, and paint each card white on one side
and black on the other. We then represent "A" by turning a certain combination of
black and white surfaces face up (odd cards black, even cards white, for example); another combination is posited as representing "B", and so
on for all twenty six letters. In this case the representation of the alphabet
is superposed on all ten cards, because no one card represents any one
letter. Despite the fact that we have only ten cards, and all of them are used
to represent a single letter, we do not end up with less representational
power, but far, far more. There are, in fact, 2^10 = 1,024 possible combinations of black and white, leaving us 998 (1,024 - 26) other combinations to represent whatever else we like (the Russian and Sanskrit alphabets, perhaps). If we
used ten six-sided cubes with a different color on each side, instead of cards
with only two sides, the number of
possible combinations would be 6^10.
In
the one-card-per-letter system, each card could be seen as a module both
physically and functionally. One physical piece of paper performs exactly one
linguistic function, no more, no less. In the superposed system with the black
and white cards, however, no one card is a letter. Instead, each card functions
like an axis in a Cartesian state space, and each axis has exactly two points
on it (one for the black side of the card and one for the white). When the
cards are replaced with six sided cubes, each cube functions as an axis with
six points on it. And because the cards are performing this Cartesian function,
they make it possible to conceive of each of these letters as a point in the 10
dimensional state space defined by these 10 cards or cubes. Consequently, we
can functionally represent all twenty six letters without needing twenty six
distinct physical cards.
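A minimal sketch, in the LISP idiom used earlier, makes both the arithmetic and the state-space reading concrete: each letter is assigned an integer from 0 to 25 and read off as a ten-place pattern of black (1) and white (0) card faces, so that no single card carries the letter but the whole configuration does.

(defun letter->cards (letter)
  ;; map A..Z to 0..25, then spell that number out as ten 0/1 "card faces"
  (let ((n (- (char-code letter) (char-code #\A))))
    (loop for i from 9 downto 0 collect (if (logbitp i n) 1 0))))

(letter->cards #\C)   ; => (0 0 0 0 0 0 0 0 1 0)
(expt 2 10)           ; => 1024 configurations of ten two-sided cards
(expt 6 10)           ; => 60466176 configurations of ten six-sided cubes

Each result of letter->cards is literally a point in the ten-dimensional space whose axes are the individual cards.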
In their classic paper, Ramsey, Stich, and Garon (1991) claimed that because beliefs are represented distributively in a connectionist system, we should conclude that, strictly speaking, from a scientific standpoint, beliefs are not represented in our brains at all.
However, the above explication of distributed representation shows why this
inference is invalid. The ten cards in the example above really do represent
all twenty six letters of the alphabet, even though no one card represents any
single letter. They are able to do this because there is a point in the state
space of possible card positions that represents each letter. And similarly, the basins of attraction
in a dynamic system are capable in principle of doing every bit as much real
cognitive work as an undistributed “physical” module. They are
every bit as physically real as gravity, capacitance, voltage or any other
theoretical scientific entity that we cannot see, feel, hear, or trip over. The
state space modules described by DST are grist for the scientific mill, capable
of being studied and measured. And more importantly, because they are more like
events than objects, they have a speed and flexibility which is lacking in the
material entities that Fodor called modules, and anatomists call organs.
State Spaces Vs. Organs
The
concept of "organ" is, as Fodor points out, the biological equivalent
of a module, and it will probably continue to be useful. But it is based on a
possibly misleading assumption: that morphology is always an accurate guide to
function, because the body, like a Hi-Fi set, is supposedly divided up into
distinct modules with materially delineable borders. Those who accept this
assumption acknowledge that it may require microscopes, or sophisticated
staining techniques, to find those borders. But the assumption remains that
once one has marked out those borders, one has carved the brain or body at its
fundamental joints, and that the purpose of neuroscience is then to answer
questions like "what is the hippocampus for?". But attractor basins
and orbits are far more volatile entities than modules, and their boundaries
and functions are far more flexible. Strictly speaking, they are much more like
events than objects. They endure longer than most events, but in this respect
they resemble events like tornadoes or waterfalls, which seem object-like
because the flow of their constituting processes coheres in a stable pattern.
We
must not discount the possibility that morphology is not the essential factor
in determining function. The gray matter of the brain could be seen as
primarily a medium through which dynamic ripples eddy and coalesce into state
spaces. The fact that people can often relearn skills lost after brain damage,
even though the "organs" supposedly responsible for those skills have
been damaged or surgically removed, could offer support for that possibility.
Although damage to the central
nervous system often results in permanent disruption. . .In some cases there is
a return to normal or near-normal levels of performance even after extensive
loss of nervous tissue. . .some see the fact of sparing and recovery as a
direct challenge to the principle of localization (Laurence and Stein 1978 pp.
369 & 394)
The
idea that functions are not modular and localized is still highly
controversial, and understandably so. To say that the brain can perform complex
functions without different parts of it doing different things seems to be
either magical or nonsensical. But if mental functions were localized in state
space, but not strongly attached to a material location in the brain, this
would provide an alternative to traditional modularity which is in principle
capable of making sense. In their concluding remarks, Laurence and Stein admit
that modern neuroscience “cannot claim to have solved the riddle of
recovery” (ibid. p. 401) so clearly there is room for the development of
new theories. Such a theory might also be able to account for the fact that
brain scans of different people (or even the same person at different times)
often reveal brain functions taking place at noticeably different locations.
(John O’Keefe, Institute of Cognitive Neuroscience, University College,
London. Personal communication). However, the most dramatic support for
separating function from location is Merzenich’s and Kaas’s research on the primary somatosensory cortex of New World monkeys.
Before Merzenich and Kaas, the model for somatosensory function was the modular view of Penfield and Rasmussen. In 1957, they published a map of the surface of the cortex, which purported to show a one-to-one relationship between parts of the body and the parts of the brain that
controlled them. This view of brain function implied “both that these
neatly ordered representations were established in early life by the anatomical
maturation of the nervous system, and that they were functionally static
thereafter.” (Thelen and Smith 1994 p.136). The work of Merzenich and
Kaas, however, “has forced a drastic reexamination of those
beliefs.” (ibid). Their experiments showed that what part of the brain
controlled what part of the body was shaped by how the monkeys used their
hands, and that the region of the cortex that controlled any given finger could
be shifted from one part of the brain to another by inhibiting the
monkey’s ability to move its hand and fingers. If a digit was amputated,
the region of cortex which formerly controlled the missing digit would be
‘taken over’ by the remaining digits. If two digits were fused
together, and used by the monkey for an extended time as a single digit, the
two controlling regions in the cortex would fuse together. Once the
monkey’s digits were freed and separated, the two controlling regions
separated again. Nor was this experiment an isolated anomaly.
“Subsequently, investigators have found similar reorganization for
somatic senses in subcortical areas and in the visual, auditory, and motor
cortices in monkeys, and in other mammals.” (ibid). This is not the sort
of thing that happens in a modular system, where each part does the job it was
designed to do and nothing else.
On the other hand, we also shouldn't assume that
we must make an either/or choice between a dynamic brain and a brain assembled
from organs. To continue the ripple metaphor (if it is only a metaphor), even a
river's dynamic flow is shaped by the contours of the riverbank. In a similar
way, perhaps the biological structure of the brain would make it more likely
that certain dynamic patterns would stabilize into recurring attractor spaces,
just as certain kinds of eddies and tide pools are more likely to form if the
river banks contain the appropriately shaped inlets. Only future research will
determine the exact relationship between the “organs” in the brain
and dynamic patterns that flow through them. But there seem to be strong
indications that we can’t unquestioningly adopt the “one
organ—one function” relationship presupposed by traditional
modularity theory.
Informational Encapsulation and DST
The
connections between state spaces in embodied dynamic systems are not hard wired
the way they are in AI connectionist systems. The bifurcations between
attractor spaces are not specific connective neurons; they are only abstract
measurements of forces. As the parameters that shape these forces shift, so do
the cognitive characteristics of the attractor. For this reason, it is highly
implausible that attractor basins in dynamic systems are informationally encapsulated
in the way that Fodor claimed modules must be. Many of Fodor's arguments for
informational encapsulation (for example, the fact that modules must have fast
response times) require only that the module be informationally encapsulated at
the time it is performing its function. Fodor is correct when he says that
"in the rush and scramble of panther identification, there are many things
I know about panthers whose bearing on the likely pantherhood of the present
stimulus I do not wish to have to consider" (Fodor 1983 p. 71 italics in original). After the rush and
scramble have subsided, however, there is no reason that the module which
enables us to instantaneously identify panthers shouldn't receive all sorts of
input from other sources, and reconfigure itself so as to make more effective
responses the next time we see a panther. Consider another example: From my
experience as a musician, and from conversations with other musicians, I know
that learning to sight read music requires the development of sets of quick
response connections between the eyes and hands. However, a set of responses
which are very effective for one style of music do not yield the necessary
speed for another style. Even if one can sight read Bach fluently, it is likely
that difficulties will arise the first time one tries to sight read Duke
Ellington until one has learned several new pieces in his style. But because
the sight reading "module" is not permanently informationally encapsulated, it is possible for
it to take in new information the more one studies a new style of music, and
thus learn the reflex-like speed that makes fluent sight reading
possible the next time around.
Furthermore,
it is possible in principle for attractor spaces to receive influxes of new
information with very little change in material structure. Fodor claims that "if you facilitate the flow of
information from A to B by hardwiring a connection between them, then you
provide B with a kind of access to A that it doesn't have to locations C, D, E,
. . ." (Fodor 1983 p.98). But although this is true for hardwired modules,
it is not true for bifurcations in a dynamic system. For them, informational interpenetration is probably the
rule, rather than the exception. We must remember that "invariant
set" is a highly conditional term in DST. "Invariant" really
only means that there is a pattern that stays stable long enough to be noticed
and (partially) measured. Given the number of parameters that must reach some
kind of equilibrium for an invariant set to emerge, it is highly unlikely that
they will always remain stable enough to produce anything that could be called
informational encapsulation. The
slightest flicker in the parameters that hold an invariant set stable could
bring in information from almost anywhere else in the system, which could
change the system (hopefully for the better) when it restabilized.[5]
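How a small change in a single parameter can reshape an attractor is easiest to see in a toy system rather than a biological one. The logistic map, x ← r·x·(1 − x), is not one of the systems discussed in this paper, but it shows the phenomenon in a few lines: at r = 2.8 every trajectory settles onto a single fixed point, while at r = 3.2 the same equation settles onto a two-point oscillation.

(defun settle (r &optional (x 0.5) (steps 1000))
  ;; iterate the logistic map and return the last four states reached
  (let (history)
    (dotimes (i steps)
      (setf x (* r x (- 1 x)))
      (push x history))
    (subseq history 0 4)))

(settle 2.8)  ; => four values all near 0.643 -- one fixed-point attractor
(settle 3.2)  ; => values alternating near 0.513 and 0.799 -- a period-two attractor

Nothing was rewired between the two calls; only a parameter changed, and with it the invariant set.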
Walter
Freeman’s research on the olfactory systems of rabbits gives very strong evidence that perceptual brain processes do exactly that. When Freeman began his research, one of his goals was to find out how rabbits could tell one smell
from another. He assumed that there would be a particular pattern of neural
activity in the rabbit’s brain which represented, or “stood for,” each smell that the rabbit could recognize. Freeman did discover
that when a rabbit was trained to recognize a particular odor, the AM signal
recorded on its olfactory bulb would produce a recognizable pattern when the
rabbit smelled that odor. But when he trained the rabbit to recognize a new
odor, then returned to the first odor, the rabbit’s olfactory bulb did
not emit the same signal as it had before. Instead it emitted a signal which
contained elements of the signals caused by both odors. When the rabbit learned to recognize a
third odor, all three signals interacted and changed each other in a similar
way. In other words, there was no indication of an alphabet or code in the
rabbit’s brain that created a symbolic structure in which each symbol
corresponded to something distinct in the outside world. Instead, every new
olfactory experience had an effect on all of the “symbols”, so that no one of them was
informationally encapsulated with respect to the others. To paraphrase and reply
to Fodor’s objection mentioned above: There is no need to hardwire a
connection between A and B in order to facilitate the flow of information
between them. On the contrary, A and B and the rest of the alphabet don’t
have to be connected, because they are all attractor basins in the same dynamic
system. As Freeman puts it:
“When a new class is learned, the synaptic modifications in the neuropil
tissue jostle the existing basins of the packed landscape, as the connections
within the neuropil form a seamless tissue. This is known as attractor
crowding. No basin is independent of the others.” (Freeman 2000 p.80)
Domain Specificity and DST
In
a dynamic system, domain specificity could also be every bit as flexible. As noted above, the connections between state spaces in embodied dynamic systems are not hard-wired the way modules are in AI connectionist systems, and the cognitive characteristics of an attractor shift along with the parameters that shape it. Kelso (1995) points out that
almost anyone can pick up a pen with their toes and write their name the very
first time they do it. If this ability were stored in a rigidly domain-specific module, it would be hard to see how this would be possible. The actual neurological signals for the commands that make our arms do this would have very different values and relationships from the same commands when sent to our legs. Our legs are longer than our arms, so a signal designed to move the arm three degrees to the left would obviously move the foot much further than it would move the hand. The difference between musculature, toe versus
finger size etc. would make transporting the same signal from leg to arm be as
impractical as trying to repair a car with sewing machine parts, or running a
Mac-compatible program on an IBM PC.
On the other hand, if the "module" that
made this skill possible were an invariant attractor set, one could bend the
vector transformations in the attractor set just enough to do the different but
related task, in much the same way that one can change the shape of a soap
bubble by changing the forces that shape it. The fact that connectionist nets,
which are dynamic systems, are much better at dealing with degrees of
similarity than are digital computers gives strong empirical support for that
possibility.
In
the simplified LISP simulation described above, we posited an arbitrary speed
at which the horse would switch from one gait to the next. But a real horse
would vary the speed at which it shifted gaits depending on changes in several
factors. Each one of these four gaits has a tremendous cognitive flexibility,
because it is governed by a multidimensional state space that contains a great
enough variety of possible states to respond to variations in the terrain, the
horse's heart and breathing rate, etc., and yet is regular enough to be
recognizable as only one of these four types of locomotion.
The
balance between flexibility and stability is achieved because all of the forces
in this system of tensions are interacting in a fluctuating equilibrium; again,
rather like a soap bubble in the wind, but a bubble that is suspended in
several dimensions instead of only three. As the terrain shifts from smooth to
rocky to muddy, or the horse becomes more winded, etc. the system of tensions
that determines the shape of this multi-dimensional soap bubble shifts
accordingly. This enables the gallop or trot to be flexible enough for the
horse to respond to the changes in the terrain and its own physiological state,
yet stable enough to still be a gallop or trot. If all other relevant factors
remain the same (say if the horse is running on a treadmill in a laboratory),
the decision of when to switch to which gait will be made almost entirely by
the speed parameter. But when a horse is out traveling through the real world,
all of these factors interact to maintain the system of tensions which is a
particular gait. The switch from one gait to the next is decided by a
"consensus vote" amongst these various forces, which shifts the
entire system of interacting parameters so as to form another kind of
multi-dimensional bubble. (i.e. produces a bifurcation to another basin of attraction,
such as from walk to trot.) This is domain specificity of a sort, but the
borders of any given attractor space are far more flexible than the borders of
the domains of GOFAI modules. There is no need to hard-wire a connection
between the various “subroutines”, because none of the subroutines
are completely separate from each other to begin with. As Freeman points out,
the natural tendency of the attractor spaces is to crowd into each other and overlap, because
the “parts” exist only as events in the flow of the system. This is
what makes it possible for them to deal with those ambiguities in the real
world which are too difficult for the rigid modular abilities of GOFAI systems.
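One way to picture the difference between a single-threshold switch and this kind of "consensus vote", without pretending to model real equine physiology: let several parameters feed one quantity and let that quantity select the basin, reusing the choose-gait sketch given earlier. The weightings are invented for illustration only.

(defun effective-drive (speed terrain-roughness fatigue)
  ;; an invented consensus quantity: speed pushes toward faster gaits,
  ;; rough terrain and fatigue pull back toward slower ones
  (- speed (* 1.5 terrain-roughness) (* 2.0 fatigue)))

(defun consensus-gait (speed terrain-roughness fatigue)
  (choose-gait (effective-drive speed terrain-roughness fatigue)))

(consensus-gait 5 0 0)    ; => CANTER on smooth ground when fresh
(consensus-gait 5 1 0.5)  ; => TROT when the ground is rough and the horse is tiring

In a genuinely dynamic system the "vote" is not a weighted sum computed in advance but the joint settling of all the interacting forces; the sketch only marks where the single speed threshold gives way to many parameters.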
For these reasons, it is highly
implausible that attractor sets in dynamic systems are informationally
encapsulated and domain specific in the way that Fodor claimed modules must
be. If further research reveals
that attractor sets are as good at emulating modular processes as they appear
to be now, without the inflexibility of hard wired modules, then it might be
possible to bridge the nasty moat that Fodor posits between modules and
Quinean-Isotropic processes. And this might justify a meta-modification of the
closing exclamation in Fodor 1983 from "Modified rapture!" to just
plain garden variety rapture. Or at least hope.
Bibliography
Bechtel,
W. (1998). Representations and cognitive explanations: Assessing the
dynamicist challenge in cognitive science. Cognitive Science, 22, 295-318.
http://www.artsci.wustl.edu/~bill/REPRESENT.html
Bickle,
J. (1998) Psychoneural Reduction: the New Wave. MIT
Press Cambridge Mass.
Churchland, Paul (1989) A Neurocomputational Perspective. MIT Press, Cambridge, Mass.
Clark,
P. (1999) “ Startup implements silicon neural
net in Learning Processor” in EE Times. <http://www.eetimes.com/story/OEG19990914S0033>
Dennett,
D. (1991) Consciousness Explained MIT Press Cambridge, Mass.
Dreyfus,
H. L. (1972/1994) What Computers Still Can't Do MIT Press
Cambridge, Mass.
Eliasmith, C. (1996) "The third contender: A critical examination of the dynamicist theory of cognition." Philosophical Psychology, Vol. 9, No. 4, pp. 441-463. Reprinted in P. Thagard (ed.) (1998) Mind Readings: Introductory Selections in Cognitive Science. MIT Press.
Finger,
S. (ed.) (1978) Recovery from
Brain Damage Plenum Press, New York.
Fodor,
J. (1983) The Modularity of Mind
MIT Press Cambridge, Mass.
Fodor, J. (1985) “Précis of The Modularity of Mind” in Minds, Brains and Computers (2000), edited by Cummins and Cummins. Blackwell Publishers, London. First published in Behavioral and Brain Sciences, 8, 1985.
Fodor
and Pylyshyn (1988) “Connectionism and Cognitive Architecture: A Critical
Analysis” in Haugeland 1997
Freeman,
W. (2000) How Brains Make up their Minds Columbia University Press, New York.
Haugeland, J. (1985) Artificial Intelligence: The Very Idea. MIT Press, Cambridge, Mass.
Haugeland,
J. (Ed.) (1997) Mind Design II MIT
Press Cambridge, Mass
Kelso,
J.A. Scott (1995) Dynamic Patterns
MIT Press Cambridge, Mass.
Laurence
and Stein (1978) “Recovery after Brain Damage and the Concept of
Localization of Function” in Finger (1978)
MacKay,
W.A. (1980) “The Motor Program: Back to the Computer” in Trends
in Neurosciences—April 1980
pp. 97-100.
McCarthy, J.
(1960) Recursive Functions of Symbolic Expressions and Their Computation by
Machine, Part I Communications of the ACM
(Association for Computing
Machinery), April.
Port,
R.F. and Van Gelder, T., eds (1995) Mind as Motion: Explorations in the
Dynamics of Cognition MIT Press, Cambridge, Mass.
Ramsey,
Stich and Garon (1991) “ Connectionism, Eliminativism and the Future of
Folk Psychology” in Stich S., Rumelhart D. & Ramsey W. (eds) Philosophy
and Connectionist Theory. Hillsdale N.J.: Lawrence Erlbaum
Associates. Also reprinted in Haugeland, J. (Ed.) (1997)
Rockwell, W. T. (1995) "Can Reductionism be Eliminated?"
presented at the American Philosophy Association Meeting (Pacific division) in
San Francisco (with commentary by John Bickle). rewritten as "Beyond Eliminative
Materialism". http://www.california.com/~mcmf/beyondem.html
Taylor, R.C. (1978) “Why Change Gaits? Recruitment of Muscles and Muscle
Fibers as a Function of Speed and Gait” in American Zoologist 18, 153-161
Uttal,
W.R. (2001) The New Phrenology MIT Press, Cambridge, Mass.
Van
Gelder, T. (1991) What is the 'D'
in 'PDP'? An Overview of the Concept of Distribution. in Stich S., Rumelhart D. & Ramsey W. (eds) Philosophy
and Connectionist Theory. Hillsdale N.J.: Lawrence Erlbaum
Associates.
__________ (1995) "What Might Cognition Be If Not Computation?" Journal of Philosophy, 92, 345-381. Reprinted as "The Dynamical Alternative" in Johnson, D. & Erneling, C., eds., Reassessing the Cognitive Revolution: Alternative Futures. Oxford: Oxford University Press, 1996.
___________(1999) "The Dynamical
Hypothesis in Cognitive Science" and "Authors Response" Behavior
and Brain Sciences.
[1] Dreyfus 1996 paraphrases Merleau-Ponty's use of the same soap bubble analogy to explain how we acquire what he calls maximum grip
on our world. Maximum grip is repeatedly described using terms that have been
incorporated into DST. "As an
account of skillful action, maximum grip means that we always tend to reduce a
sense of disequilibrium. . .Thus the 'I can' that is central to Merleau-Ponty's
account of embodiment is simply the body's ability to reduce tension"
(Dreyfus 1996, par. 42)
[2] Thanks to Barry Smiler of Bardon Data Systems for
my first advice on LISP programming, and for pointing out that this program did
not need to have ways of dealing with negative numbers because "The horse
isn't going to run backwards".
The final version of this program was written by Jason Jenkins of
Stanford Research International.
[3] More accurately, what we are trying to do is to reduce the similarities between minds and computer languages to such a dynamic system. The differences will remain, and anyone who relies on computers to do things that people cannot do (or vice versa) will rightly exclaim "Vive la différence!"
[4] This is also a helpful argument against Fodor's
claim that a fast system cannot be
Quinean and isotropic. Fodor says "if there is a body of information that
must be deployed in. . .perceptual identifications, then we would prefer not to
recover that information from a large memory, assuming that the speed of access
varies inversely with the amount of information that the memory contains" (Fodor 1983 p.70). This assumption is unavoidable only if we also assume that every
system must follow the GOFAI procedure of considering several possible options
before acting.
[5] Those who admire John Dewey's prophetic abilities
might enjoy this passage from Dewey 1896(!)
The
'stimulus' . . . is one
uninterrupted, continuous redistribution of mass in motion. And there is
nothing in the process, from the standpoint of description, which entitles us
to call this reflex. It is redistribution pure and simple; as much so as the
burning of a log, or the falling of a house or the movement of the wind. In the
physical process, as physical, there is nothing which can be set off as
stimulus, nothing which reacts, nothing which is response. There is just a
change in the system of tensions. (italics mine)
To my knowledge,
Dewey never used the mathematics of dynamic physics to understand the behavior
of living creatures. But it is impressive that over a hundred years ago, he was
able to conceive of psychological
processes as being best understood as stabilized patterns of interlocking
dynamic forces.