Draft NSF Prospectus/KDI Initiative
DRAFT 04/14/97 DRAFT 10
NSF PROSPECTUS
KNOWLEDGE AND DISTRIBUTED INTELLIGENCE (KDI)
Vision
Many of the most pressing problems in science and engineering are more complicated than anything
done before. Their solution depends on understanding complex behaviors across multiple scales of
time or space, or on manipulating vast amounts of data, or on merging data and models. What is
more difficult, their solution depends on building and exploiting a dynamic interplay between
simulations of the behavior of a system and measurements of the way the system is actually
behaving. Most telling, their solution depends on combining the insights of several disciplines.
(Examples of such problems are given below.)
Modern technologies and practices are revolutionizing the way we gather and handle information.
There is great potential for both science and society to build upon the fruits of this revolution, to
magnify our ability to understand and manage ever-larger and more complex systems such as those
in natural, social, material, financial, and manufacturing spheres. The vision of the Knowledge and
Distributed Intelligence (KDI) initiative is to achieve, across the scientific community, the next
generation of human capability to:
- Generate or gather, and represent more complex and cross-disciplinary
scientific data and information, from new sources and at enormously larger
scales;
- Transform this information into knowledge by combining, classifying, and
analyzing it in new ways;
- Collaborate in groups and organizations, sharing this knowledge and
working together interactively across space, time, disciplines, and
scientific cultures to multiply results.
With this approach, KDI aims to foster distributed intelligence as a foundation for advancing all areas
of science and engineering research and education, and to spin off important new results and
technologies with positive benefit to society at large. KDI will support these advances by enabling
more highly interactive, multidisciplinary, and collaborative paths to innovation. To achieve these
aims will require major advances in collaboration processes and infrastructures, computational and
representational capabilities, and our understanding of socio-technical interactions.
Challenges and Strategies
Our greatest challenges are to increase the productivity and impact of science and to cope with
scientific problems of greater complexity. KDI aims to meet these challenges by mobilizing the
distributed knowledge and capability of multiple science communities: surfacing common goals,
increasing interaction, improving cross-disciplinary understanding, and raising the collective power of
tools and methods. The KDI strategy is to support research that:
- Generates greater understanding of phenomena of distributed intelligence and collective behavior
in human, automated, and natural systems;
- Creates the next generation of mathematical, computational, data-oriented, and organizational
methods and infrastructure, which will exploit multidisciplinary distributed intelligence to advance
science and engineering;
- Enhances human ability to create and use knowledge in groups, organizations, and communities
through advances in human infrastructure, technology, and education.
As the primary governmental body that funds research in all of the relevant domains of knowledge,
as well as in education, the NSF is the most appropriate agency to spearhead this initiative. NSF-
funded research has already produced a body of knowledge in networking, computational
approaches to complex problems, collaboration technology, socio-technical systems and impacts,
learning, and education that provides a basis of community capability and infrastructure for
addressing KDI aims.
This initiative is an intellectual focus for collaborative, multidisciplinary thinking on three
complementary aspects of knowledge and distributed intelligence:
- increasing interaction, knowledge/tool integration, collaboration, and understanding within
communities and across disciplines through Knowledge Networking (KN);
- extending the power of tools, models, and simulations to represent and manage complex
systems through New Challenges in Computation (NCC); and
- extending our ability to learn and create through Learning and Intelligent Systems (LIS).
Learning and Intelligent Systems (LIS) seeks to stimulate interdisciplinary research that will
unify experimentally and theoretically derived concepts related to learning and intelligent systems,
and that will promote the use and development of information technologies in learning across a wide
variety of fields. The LIS initiative focuses on fundamental scientific and technological research
undertaken in the rigorous and disciplined manner characteristic of NSF-supported endeavors. The
initiative will ultimately have a major impact on enhancing and supporting human intellectual and
creative potential. Development of new scientific knowledge on learning and intelligent systems and
its creative application to education and to learning technologies are integral parts of the initiative.
Knowledge Networking (KN) focuses on the integration of knowledge from different sources and
domains across space and time. Modern computing and communications systems provide the
infrastructure to send bits anywhere, anytime in mass quantities -- radical connectivity. But
connectivity alone cannot assure (1) useful communication across disciplines, languages, and cultures;
(2) appropriate processing and integration of knowledge from different sources, domains, and non-
text media; (3) efficacious activity and arrangements for teams, organizations, classrooms, or
communities, working together over distance and time. In short, we have connectivity, but not
interactivity and integration. KN research aims to move beyond connectivity to achieve new levels
of interactivity and to deepen our understanding of the ethical, legal, and social implications of new
types of interactivity, so as to increase the semantic bandwidth, knowledge bandwidth, activity
bandwidth, and cultural bandwidth among people, organizations, and communities.
New Challenges in Computation (NCC) focuses on research and tools needed to model, simulate,
analyze, display, or understand complicated phenomena, to control resources and deal with massive
volumes of data in real time, and to predict the behavior of complex systems.
Phenomena, data, and systems of interest so exceed in scope, multiplicity of scale, and
dimensionality what can be handled by present techniques that incremental advances do not suffice.
New computational schemas, such as quantum computing or biomimetic computing, are needed.
Models, tools, and resources must be shared among many researchers in different places.
Moreover, a key need is immediacy -- control of networked resources in real time and tailored to
people's needs and capabilities, "on-the-fly" analysis of data to guide experiments or manage
situations as they happen. These features -- scope, multiple scales, dimensionality, shared use,
and immediacy -- distinguish NCC from the earlier work on which it builds. NCC aims to enable
collective understanding and effective management of complex systems. These aims will require
major advances in hardware and software to handle complexity, representation, and scale, to enable
distributed collaboration, and to facilitate real-time interactions and control.
Examples of Possible KDI Research Areas
- Systems as dissimilar as an economic market, the brain, behavioral norms in a society, large
computer networks, and the scientific enterprise have this in common: information is widely
distributed throughout the system; no identifiable entity within or outside the system coordinates
information or makes decisions; no particular locus of coordination is evident (although many
markets have a "place," many others do not); yet, information is coordinated and focused into
sensible outcomes. Economists, neuroscientists, sociologists, and philosophers of science -- in
frequent collaboration with mathematicians, computer scientists, and others -- now study these
systems, but separately. Collaboration may reveal important similarities in how these disparate
systems produce knowledge from their distributed intelligence, and hence insights into improving
system performance.
- Understanding the behavior of a system that emerges from collective interactions of its relatively
simpler components is a central challenge. Typical problems include determining fluid flows in
reservoirs at the level of the reservoir field from a knowledge of flows at the level of pores in the
ground, predicting macroscopic properties of materials from a knowledge of atomistic properties,
understanding the regulation of interacting metabolic pathways, and studying the interplay of
natural and anthropogenic factors in issues of environment and biodiversity. The simulations
needed to study these systems will use substantial computational resources that are just
becoming available. The results of these simulations will provide an increased understanding of
the development of emergent behaviors that will lead to better models of physical and social
systems.
- Present abilities to manage experiments or numerical simulations in real time are inadequate.
Advances in these capabilities, and the new paradigms of investigation that will follow, will open
enormous opportunities to expand our understanding of complex systems. For instance, "on-
the-fly" analysis of observations of a living cell via light microscopy can direct attention to areas
of interest while the specimen remains viable. Similarly, in a numerical simulation, analysis of the
simulation behavior as the simulation progresses, coupled with the appropriate invasive
software, would permit changing parameters of the simulation during the calculation, thus
providing greater insight in less time. Finally, enabling the dynamic interplay between
observations of the behavior of a complex system and simulations of its behavior could
dramatically improve the understanding and even the control of the system. This has wide and
important consequences, for example in predicting weather, controlling communication and power
distribution systems, and managing medical treatment regimens.
- Managing large datasets has become a critical task in every area of science and engineering. In
astronomy and physics, sifting enormous volumes of data for indications of a few rare events is
a key step in observations of both galaxies and subatomic particles. In geosciences and
biology, effective ways to combine different datasets and manipulate elements of the sets are
critically needed. The next three examples show different aspects of these issues. More
generally, KDI requires advances in the interoperation, use, and construction of large distributed
repositories by participants including people, artificial agents, groups, and organizations.
Interactivity research studies how to build and maintain dynamic, content-rich, multi-media
relationships among participants, instruments, tools, and data. Representation research studies
how knowledge about processes and phenomena can be encoded, and how meanings for
representations are reconstructed in their contexts of use. Cognition research investigates
perception, reasoning, memory, learning, and action by participants, including groups and
organizations. Studies of agents investigate the active and sometimes physically embodied
algorithms, software, communications, and tools that can assist people in collaborating and
networking knowledge. Finally, investigations of corpora study the entire lifecycle (creation,
structuring, storage, maintenance, use and disposal) of general and community-specific
knowledge collections, including ad hoc data collections, complex scientific databases, digital
libraries and even such unconventional entities as digital forms of artifacts in museums. These
five areas of research are critical enablers of KDI: people's ability to collaboratively access,
retrieve and comprehend information from complex databases and distributed sources depends
on how that information is created, structured, stored, presented, reasoned about, and managed.
- Geospatial data and information are vital components of science, constituting the link between
scientific research results and their use by planners and decision-makers. New technologies
(satellite and airborne sensors, automated and remotely-operated sampling stations) produce
data that have generated an explosion of information. At the same time, our understanding of the
biological, chemical, geological, physical, and social processes that operate in and on the
environment has reached the stage where multidisciplinary integration offers enormous promise
for progress. Full and functional interoperability of geographic information systems (GIS) is
necessary for such multidisciplinary research. To achieve this will require basic research in the
representation of the semantic content of GIS data, together with interagency and international
standards activities. Once an interoperable distributed global GIS is available, several scientific
opportunities arise, such as the investigation through data mining of anomalous environmental
phenomena (e.g., the Ozone Hole), and the possibility of incorporating real-time data to
understand and thus predict storms and similar events. Such capabilities could also be used in
developing information systems for disaster relief.
- Taxonomists in biology are building a Web-linked network to share interactive specimen
databases, programs for identification keys, image-analysis software, and links to phylogenetic
trees, accessed through "expert workstations," to provide automated image-based means to
identify specimens. Neural networks employ "learning" rules to speed up identification after initial
specimen presentations, while tomographic scanning and holographic images facilitate analysis
of complex specimens such as bones and shells. The challenge is managing and coordinating
the amount and complexity of information generated by such imaging technologies. However,
the result will be rapid, accurate, reliable taxonomic identification, from microscopic to macroscopic
scales, which will be invaluable to systematists who currently are limited to working with single
taxonomic groups, at constrained scales of measurement, using information from limited
databases. Other beneficiaries include agricultural extension agents encountering a new weed,
customs officials interdicting imports of new biological materials, and geologists searching marine
cores for fossils indicative of petroleum deposits, all of whom need quick and accurate taxonomic
identification to do their jobs.
- Genomics is the analysis of DNA sequences resulting from the sequencing of all of the genetic
material of selected bacteria, fungi, plants, and animals, including humans. These enormous sets of
sequence data contain the history of the evolution of life. The ability to extract meaningful and
useful information from these datasets offers the keys to such wide applications as engineering
micro-organisms to clean up contaminated sites, designing food crops to withstand drought and
pests, and curing dreaded human diseases. Fundamental problems in this area include
comparing sequences for similar genes and gene products from different organisms to study gene
evolution, predicting the three-dimensional structure of proteins from the linear DNA sequences of
the genes encoding those proteins, and ultimately predicting how the proteins might interact with
each other to produce a differentiated cell. The size of datasets of DNA sequences and
molecular structural information has grown explosively in the past few years; the need for new
tools to understand the significance of those data has increased correspondingly. As the cost of
sequencing DNA drops by an order of magnitude during the next five years, the amount of
information to be analyzed will grow even faster than at present, when it doubles in a little over a
year. To manage and make the best use of these resources will require unprecedented levels of
collaboration among scientists in biology, computer science, mathematics, statistics, and
computer and software engineering.
- Collaborations and sharing of resources between people in different places, especially in real
time, are not yet effective enough. Advances in simulating or controlling complex systems
depend on major advances in computing systems, both hardware and software. Conventional
computing systems do not yet deliver the required performance; thus alternative schemas such as
systems constructed at the molecular level using quantum physics or biological components
must be investigated. Similarly, the effective use of advanced computing systems, conventional
or otherwise, will require the scaling of software and tools to levels of complexity and interactivity
that are well beyond the usual evolutionary path of today's systems.
- Both the brain and its component neurons can learn and compute. Exactly how the massive
networks of neurons in a brain perform computational tasks to produce behavior is not known.
However, it is well established that the brain transmits information through distributed, parallel
channels. At a basic level, elucidation of the computing processes used by neurons may
provide insight into alternatives to or alternative designs for computer chips. At a higher level,
understanding the way in which the distributed networks of neurons in the brain process
information may reveal new paradigms, architectures, and approaches for solving complex
computational problems by means other than those currently used by computers and
supercomputers.
- How intelligent systems learn to develop strategies, to select among alternative plans, and to
predict the outcome of complex performances is poorly understood. Engineers, mathematicians,
computer scientists, psychologists, neuroscientists, and educators now have pieces of this
puzzle. Putting the pieces together is the next challenge. Developing models and technologies
that learn to maximize externally specified measures of performance in complex situations --
situations that require design, "foresight," or "planning" to optimize performance -- would
increase understanding of this complex behavior and improve assessment of different learning
methods in human and artificial intelligent systems.
- In both human learning and artificial intelligent systems, the management of information is a major
practical concern. To provide a scientific basis for addressing this concern, a deeper and more
multidisciplinary understanding of selective attention and memory management in learning
systems is essential. The fundamental insights are at hand: Biological systems have extensive
mechanisms for selecting and weighting information; statistical decision theory and filtering have
produced methods to extract system parameters from uncertain and noisy measurements. By
combining these and other insights, comprehensive selection designs and information
management tools could be developed and tested for their engineering and educational value.
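
As an illustration of the filtering methods mentioned in the last example above, the following
minimal sketch estimates a single constant parameter from noisy measurements. It is not part of the
KDI program description. It assumes a scalar quantity, independent Gaussian measurement noise of
known variance, and no process noise, which reduces a Kalman filter to a recursively weighted
average. The function name estimate_constant, its arguments, and the simulated data are hypothetical
and chosen only for this example.

```python
import random


def estimate_constant(measurements, meas_var, prior_mean=0.0, prior_var=1e6):
    """Recursively estimate a constant parameter from noisy measurements.

    Scalar Kalman filter with a static state (no process noise).
    meas_var is the assumed variance of the measurement noise; a large
    prior_var expresses that little is known before the first measurement.
    Returns the final estimate and its variance.
    """
    mean, var = prior_mean, prior_var
    for z in measurements:
        gain = var / (var + meas_var)    # weight given to the new measurement
        mean = mean + gain * (z - mean)  # move the estimate toward the measurement
        var = (1.0 - gain) * var         # uncertainty shrinks with every update
    return mean, var


if __name__ == "__main__":
    # Simulated data: an assumed true value observed through Gaussian noise.
    rng = random.Random(0)
    true_value, noise_std = 3.0, 0.5
    data = [true_value + rng.gauss(0.0, noise_std) for _ in range(500)]
    mean, var = estimate_constant(data, meas_var=noise_std ** 2)
    print(f"estimate = {mean:.3f}, variance of estimate = {var:.6f}")
```

The gain plays the role of the weighting the example describes: each measurement is weighted by how
uncertain the current estimate is relative to the measurement noise, so early measurements move the
estimate strongly and later ones refine it.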