
I put forward in this paper a vision for a new generation of cloud-based public communication service designed to foster reflexive collective intelligence. I begin with a description of the current situation, including the huge power and social shortcomings of platforms like Google, Apple, Facebook, Amazon, Microsoft, Alibaba, Baidu, etc. Contrasting with the practice of these tech giants, I reassert the values that are direly needed at the foundation of any future global public sphere: openness, transparency and commonality. But such ethical and practical guidelines are probably not powerful enough to help us cross a threshold in collective intelligence. Only a disruptive innovation in cognitive computing will do the trick. That’s why I introduce “deep meaning”, a new research program in artificial intelligence, based on the Information Economy MetaLanguage (IEML). I conclude this paper by evoking possible bootstrapping scenarios for the new public platform.

The rise of platforms

At the end of the 20th century, one percent of the human population was connected to the Internet. In 2017, more than half the population is connected. Most users interact on social media, search for information, and buy products and services online. But despite the ongoing success of digital communication, there is a growing dissatisfaction with the big tech companies (the “Silicon Valley”) that dominate the new communication environment.

The big techs are the most valued companies in the world and the massive amount of data that they possess is considered the most precious good of our time. Silicon Valley owns the big computers: the network of physical centers where our personal and business data are stored and processed. Their income comes from their economic exploitation of our data for marketing purposes and from their sales of hardware, software or services. But they also derive considerable power from the knowledge of markets and public opinions that stems from their information control.

The big cloud companies master new computing techniques mimicking neurons when they learn a new behavior. These programs are marketed as deep learning or artificial intelligence even if they have no cognitive autonomy and need some intense training by humans before becoming useful. Despite their well-known limitations, machine learning algorithms have effectively augmented the abilities of digital systems. Deep learning is now used in every economic sector. Chips specialized in deep learning are found in big data centers, smartphones, robots and autonomous vehicles. As Vladimir Putin rightly told young Russians in his speech for the first day of school in fall 2017: “Whoever becomes the leader in this sphere [of artificial intelligence] will become the ruler of the world”.

The tech giants control huge business ecosystems beyond their official legal borders and they can ruin or buy competitors. Unfortunately, the big tech rivalry prevents a real interoperability between cloud services, even if such interoperability would be in the interest of the general public and of many smaller businesses. As if their technical and economic powers were not enough, the big tech companies are now playing on the turf of governments. Facebook warrants our identity and warns our family and friends that we are safe when a terrorist attack or a natural disaster occurs. Mark Zuckerberg states that one of Facebook’s missions is to ensure that the electoral process is fair and open in democratic countries. Google Earth and Google Street View are now used by several municipal bodies and governments as their primary source of information for cadastral plans and other geographical or geospatial services. Twitter became an official global political, diplomatic and news service. Microsoft sells its digital infrastructure to public schools. The kingdom of Denmark opened an official embassy in Silicon Valley. Cryptocurrencies independent from nation states (like Bitcoin) are becoming increasingly popular. Blockchain-based smart contracts (powered by Ethereum) bypass state authentication and traditional paper bureaucracies. Some traditional functions of government are taken over by private technological ventures.

This should not come as a surprise. The practice of writing in ancient palace-temples gave birth to government as a separate entity. Alphabet and paper allowed the emergence of merchant city-states and the expansion of literate empires. The printing press, industrial economy, motorized transportation and electronic media sustained nation-states. The digital revolution will foster new forms of government. Today, we discuss political problems in a global public space taking advantage of the web and social media, and the majority of humans live in interconnected cities and metropoles. Each urban node wants to be an accelerator of collective intelligence, a smart city. We need to think about public services in a new way. Schools, universities, public health institutions, mail services, archives, public libraries and museums should take full advantage of the internet and de-silo their datasets. But we should go further. Are current platforms doing their best to enhance collective intelligence and human development? How about giving back to the general population the data produced in social media and other cloud services, instead of just monetizing it for marketing purposes? How about giving people access to the cognitive powers unleashed by a ubiquitous algorithmic medium?

Information wants to be open, transparent and common

We need a new kind of public sphere: a platform in the cloud where data and metadata would be our common good, dedicated to the recording and collaborative exploitation of memory in the service of our collective intelligence. The core values orienting the construction of this new public sphere should be openness, transparency and commonality.

Firstly, openness has already been tried out in the scientific community, the free software movement, Creative Commons licenses, Wikipedia and many more endeavors. It has been adopted by several big industries and governments. “Open by default” will soon be the new normal. Openness is on the rise because it maximizes the improvement of goods and services, fosters trust and supports collaborative engagement. It can be applied to data formats, operating systems, abstract models, algorithms and even hardware. Openness applies also to taxonomies, ontologies, search architectures, etc. This notion may be generalized to an open creation, description and interpretation of data. A new open public space should encourage all participants to create, comment, categorize, assess and analyze its content.

Secondly, transparency is the very basis of trust and the precondition of authentic dialogue. Data and people (including the administrators of a platform) should be traceable and auditable. Transparency should be reciprocal, without distinction between rulers and ruled. Such transparency will ultimately be the basis of reflexive collective intelligence, allowing teams and communities of any size to observe and compare their cognitive activity.

Commonality means that people will not have to pay to get access to the new public sphere: all will be free and public property. Commonality also means transversality: de-siloing and cross-pollination. Smart communities will interconnect and recombine all kinds of useful information: open archives of libraries and museums, free academic publications, shared learning resources, knowledge management repositories, open-source intelligence datasets, news, public legal databases…

From deep learning to deep meaning

The new public platform will be based on the web and its open standards like HTTP, URL, HTML, etc. Like all current platforms, it will take advantage of distributed computing in the cloud. It will use “deep learning”: an artificial intelligence technology that employs specialized chips and algorithms that roughly mimic the learning process of neurons. Deep learning is used by Google, Facebook, Amazon, Microsoft and by other companies specialized in data analytics. Finally, to be completely up to date, the public platform should enable blockchain-based payments, transactions, contracts and secure records.

If our public platform offers the same technologies as the big tech (cloud, deep learning, blockchain), with the sole difference of openness, transparency and commonality, it may prove insufficient to foster a swift adoption, as is demonstrated by the relative failures of Diaspora (open Facebook) and Mastodon (open Twitter). Such a project may only succeed if it has some technical advantage compared to the existing commercial platforms. Moreover, this technical advantage should have appealing political and philosophical dimensions.

The majority of us do not fancy the dream of autonomous machines, especially considering the current limitations of artificial intelligence. We want instead an artificial intelligence designed for the augmentation of human personal and collective intellect. That’s why, in addition to the current state of the art, the new platform should integrate the brand new deep meaning technology. Deep meaning will expand the actual reach of artificial intelligence, improve the user experience of big data analytics and allow the reflexivity of personal and collective intelligence.

Language as a platform

In a nutshell, deep learning models neurons and deep meaning models language. In order to augment the human intellect, we need both! Deep learning is based on neural networks simulation. It is enough to model roughly animal cognition (every animal species has neurons) but not enough to model human cognition. The difference between animal cognition and human reflexive thought comes from language, which adds a layer of semantic addressing on top of neuronal connectivity. Speech production and understanding is an innate property of individual human brains. But as humanity is a social species, language works only at the social scale. Languages are conventional, shared by members of the same culture and learned by social contact. In human cognition, the categories that organize perception, action, memory and learning are expressed linguistically so they may be reflected upon and shared in conversations. A language works like the semantic addressing system of a social virtual database.

The problem with natural languages (English, French, Arabic, etc.) is that they are irregular and do not lend themselves easily to machine understanding or machine translation. The current trend in natural language processing (an important field of artificial intelligence) is to use statistical algorithms and deep learning methods to understand and produce linguistic data. Instead of using statistics, deep meaning adopts a regular and computable metalanguage to organize linguistic and non-linguistic data. IEML (Information Economy MetaLanguage) has been designed to optimize semantic computing. IEML words are built from six primitive symbols and two operations: addition and multiplication. The semantic relations between words follow the lines of their generative operations. Words (the total number of which does not exceed 10,000) represent the conceptual building blocks of the language. From these elementary concepts, the generative grammar of IEML allows the construction of propositions at three layers of complexity: words into topics, topics into phrases (facts, events) and phrases into super-phrases (theories, narratives). The highest meaning unit, or text, is a unique set of propositions. Deep meaning technology uses IEML as the semantic addressing system of a social database.

From an analytics angle, deep meaning allows the automatic computing of semantic relations between data and semantic visualizations of large datasets. From the point of view of interoperability, it decompartmentalizes tags, folksonomies, taxonomies, ontologies and languages. On the reflexive side, when online communities categorize, assess and exchange semantic data, they generate explorable ecosystems of ideas that represent their collective intelligence. Note that the vision of collective intelligence proposed here is opposed to the “wisdom of the crowd” model, which assumes independent agents and excludes dialogue and reflexivity. Just the opposite: deep meaning was designed from the beginning to foster dialogue and reflexivity.

The main functions of the new public sphere


In the new public sphere, every netizen has the rights of an author, an editor, an artist, a curator, a critic, a messenger, a contractor and a gamer. The next platform weaves five functions together: curation, creation, communication, transaction and immersion.

By curation I mean the collaborative creation, editing, analysis, synthesis, visualization, explanation and publication of datasets. People posting, liking and commenting on content on social media are already doing data curation, even if in a crude way and unknowingly. Active professionals in the fields of heritage preservation (libraries, museums), digital humanities, education, knowledge management, data-driven journalism or open-source intelligence practice data curation in a more systematic and mindful manner. The new platform offers a consistent service of collaborative data curation empowered by a common semantic addressing system.

Augmented by deep meaning, our public sphere includes a semantic metadata editor applicable to any document format. It works as a registration system for the works of the mind. Communication is ensured by a global Twitter-like public posting system. But instead of the current hashtags that are mere sequences of characters, the new semantic tags self-translate in all natural languages and interconnect by conceptual proximity. The blockchain layer allows any transaction to be recorded. The platform remunerates authors and curators in collective intelligence coins, according to the public engagement generated by their work. The new public sphere is grounded in the internet of things, smart cities, ambient intelligence and augmented reality. People control their environment and communicate with sensors, software agents and bots of all kinds in the same immersive semantic space. Virtual worlds simulate the collective intelligence of teams, networks and cities.

Bootstrapping

This platform was designed and prototyped between 2002 and 2017 at the University of Ottawa. A prototype is currently in a pre-alpha version, featuring the curation functionality. An alpha version will be demonstrated in the summer of 2018. How can we bridge the gap from fundamental research to a full-scale industrial platform? Such an endeavor will be much less expensive than the conquest of space and could bring a tremendous augmentation of human collective intelligence. Even if the network effect obviously applies to the new public space, small communities of pioneers will benefit immediately from its early release. On the humanistic side, I have already mentioned museums and libraries, researchers in humanities and social science, collaborative learning networks, data-oriented journalists, knowledge management and business intelligence professionals, etc. On the engineering side, deep meaning opens a new sub-field of artificial intelligence that will enhance current techniques of big data analytics, machine learning, natural language processing, internet of things, augmented reality and other immersive interfaces. Because it is open source by design, the development of the new technology can be crowdsourced and shared easily among many different actors.

Let’s draw a distinction between the new public sphere, including its semantic coordinate system, and the commercial platforms that will give access to it. This distinction being made, we can imagine a consortium of big tech companies, universities and governments supporting the development of the global public service of the future. We may also imagine one of the big techs taking the lead to associate its name to the new platform and developing some hardware specialized in deep meaning. Another scenario is the foundation of a company that will ensure the construction and maintenance of the new platform as a free public service while sustaining itself by offering semantic services: research, consulting, design and training. In any case, a new international school must be established around a virtual dockyard where trainees and trainers build and improve progressively the semantic coordinate system and other basic models of the new platform. Students from various organizations and backgrounds will gain experience in the field of deep meaning and will disseminate the acquired knowledge back into their communities.

Abstract

IEML is an artificial language that allows the automatic computing of (a) the semantic relationships internal to its texts and of (b) the semantic relationships between its texts. Such an innovation could have a positive impact on the development of human collective intelligence. While we are currently limited to logical and statistical analytics, semantic coding could allow large-scale computing on the meaning of data, provided that these data are categorized in IEML. Moreover, “big data” algorithms are currently monopolized by big companies and big governments. But according to the perspective adopted here, the algorithmic tools of the future will put data analytics, machine learning and reflexive collective intelligence in the hands of the majority of Internet users.
I will first describe the main components of an algorithm (code, operators, containers, instructions), then I will show that the growth of the algorithmic medium has been shaped by innovations in coding and container addressing. The current limitations of the web (absence of semantic interoperability and statistical positivism) could be overcome by the invention of a new coding system aimed at making meaning computable. Finally, I will describe the cognitive gains that can be secured from this innovation.

This paper has been published by Spanda Journal special issue on “Creativity & Collective Enlightenment”,  VI, 2, December 2015, p. 59-66

Our communications—transmission and reception of data—are based on an increasingly complex infrastructure for the automatic manipulation of symbols, which I call the algorithmic medium because it automates the transformation of data, and not only their conservation, reproduction and dissemination (as with previous media). Both our data-centric society and the algorithmic medium that provides its tools are still at their tentative beginnings. Although it is still hard to imagine today, a huge space will open up for the transformation and analysis of the deluge of data we produce daily. But our minds are still fascinated by the Internet’s power of dissemination of messages, which has almost reached its maximum.
In the vanguard of the new algorithmic episteme, IEML (or any other system that has the same properties) will democratize the categorization and automatic analysis of the ocean of data. The use of IEML to categorize data will create a techno-social environment that is even more favourable for collaborative learning and the distributed production of knowledge. In so doing, it will contribute to the emergence of the algorithmic medium of the future and reflect collective intelligence in the form of ecosystems of ideas.
This text begins by analyzing the structure and functioning of algorithms and shows that the major stages in the evolution of the new medium correspond to the appearance of new systems for encoding and addressing data: the Internet is a universal addressing system for computers and the Web, a universal addressing system for data. However, the Web, in 2016, has many limitations. Levels of digital literacy are still low. Interoperability and semantic transparency are sorely lacking. The majority of its users see the Web only as a big multimedia library or a means of communication, and pay no attention to its capacities for data transformation and analysis. As for those concerned with the processing of big data, they are hindered by statistical positivism. In providing a universal addressing system for concepts, IEML takes a decisive step toward the algorithmic medium of the future. The ecosystems of ideas based on this metalanguage will give rise to cognitive augmentations that are even more powerful than those we already enjoy.

What is an algorithm?

To help understand the nature of the new medium and its evolution, let us represent as clearly as possible what an algorithm is and how it functions. In simplified explanations of programming, the algorithm is often reduced to a series of instructions or a “recipe.” But no series of instructions can play its role without the three following elements: first, an adequate encoding of the data; second, a well-defined set of reified operators or functions that act as black boxes; third, a system of precisely addressed containers capable of recording initial data, intermediate results and the end result. The rules—or instructions—have no meaning except in relation to the code, the operators and the memory addresses.
I will now detail these aspects of the algorithm and use that analysis to periodize the evolution of the algorithmic medium. We will see that the major stages in the growth of this medium are precisely related to the appearance of new systems of addressing and encoding, both for the containers of data and for the operators. Based on IEML, the coming stage of development of the algorithmic medium will provide simultaneously a new type of encoding (semantic encoding) and a new system of virtual containers (semantic addressing).

Encoding of data

For automatic processing, data must first be encoded appropriately and uniformly. This involves not only binary encoding (zero and one), but more specialized types of encoding such as encoding of numbers (in base two, eight, ten, sixteen, etc.), that of characters used in writing, that of images (pixels), that of sounds (sampling), and so on.
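
The kinds of encoding listed above can be made concrete with a small sketch. The values below (the number 2017, a one-character string, a 2x2 image and an eight-sample sine wave) are hypothetical illustrations chosen for brevity, not data taken from the text.

```python
import math

# The same number written in the bases mentioned above.
number = 2017
number_encodings = {
    "base 2": format(number, "b"),
    "base 8": format(number, "o"),
    "base 10": format(number, "d"),
    "base 16": format(number, "x"),
}

# Characters used in writing: text becomes a sequence of bytes (UTF-8 here).
text_bytes = "é".encode("utf-8")

# Images: a tiny 2x2 grayscale image flattened into pixel intensities (0-255).
image = [[0, 255], [255, 0]]
image_bytes = bytes(value for row in image for value in row)

# Sounds: a continuous signal sampled at regular intervals.
# The rate is an assumption for illustration and far too low for real audio.
sample_rate = 8
samples = [round(math.sin(2 * math.pi * t / sample_rate), 3)
           for t in range(sample_rate)]

print(number_encodings)
print(text_bytes, list(image_bytes), samples)
```

In every case the result is a uniform sequence of symbols that a machine can process, which is the point of the encoding step.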

Operators

We must then imagine a set of tools or specialized micro-machines for carrying out certain tasks on the data. Let us call these specialized tools “operators.” The operators are precisely identified, and they act in a determined or mechanical way, always the same way. There obviously has to be a correspondence or a match between the encoding of the data and the functioning of the operators.
The operators were first identified inside computers: they are the elementary electronic circuits that make up processors. But we can consider any processor of data, however complex it is, as a “black box” serving as a macro-operator. Thus the protocol of the Internet, in addressing the computers in the network, at the same time set up a universal addressing system for operators.

Containers

In addition to a code for the data and a set of operators, we have to imagine a storehouse of data whose basic boxes or “containers” are completely addressed: a logical system of recording with a smooth surface for writing, erasing and reading. It is clear that the encoding of data, the operations applied to them and the mode of recording them—and therefore their addressing—must be harmonized to optimize processing.
The first addressing system of the containers is internal to computers, and it is therefore managed by the various operating systems (for example, UNIX, Windows, Apple OS, etc.). But at the beginning of the 1990s, a universal addressing system for containers was established above that layer of internal addressing: the URLs of the World Wide Web.

Instructions

The fourth and last aspect of an algorithm is an ordered set of rules—or a control mechanism—that organizes the recursive circulation of data between the containers and the operators. The circulation is initiated by a data flow that goes from containers to the appropriate operators and then directs the results of the operations to precisely addressed containers. A set of tests (if . . . , then . . .) determines the choice of containers from which the data to be processed are drawn, the choice of operators and the choice of containers in which the results are recorded. The circulation of data ends when a test has determined that processing is complete. At that point, the result of the processing—a set of encoded data—is located at a precise address in the system of containers.
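
The four aspects described above (encoded data, operators, addressed containers and instructions) can be seen together in a minimal sketch. The factorial task and the names `containers`, `operators` and `run` are assumptions made for illustration; they are not part of the original description.

```python
# Containers: an addressed store for the initial data, the intermediate
# results and the end result. Each key plays the role of an address.
containers = {"n": 5, "acc": 1, "result": None}

# Operators: black boxes that always act in the same determined way
# on the encoded data (here, plain integers).
operators = {
    "mul": lambda a, b: a * b,
    "dec": lambda a: a - 1,
}

def run(containers, operators):
    """Instructions: route data between containers and operators
    until a test determines that processing is complete."""
    while containers["n"] > 1:  # the test (if ..., then ...)
        containers["acc"] = operators["mul"](containers["acc"], containers["n"])
        containers["n"] = operators["dec"](containers["n"])
    # The end result is recorded at a precise address.
    containers["result"] = containers["acc"]
    return containers

run(containers, operators)
print(containers["result"])
```

The rules inside `run` have no meaning on their own: they only make sense relative to the integer encoding, the two operators and the three container addresses.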

The growth of the new medium

To shape the future development of the algorithmic medium, we have to first look at its historical evolution.

Automatic calculation (1940-1970)

From when can we date the advent of the algorithmic medium? We might be tempted to give its date of birth as 1937, since it was in that year that Alan Turing (1912-1954) published his famous article introducing the concept of the universal machine, that is, the formal structure of a computer. The article represents calculable functions as programs of the universal machine, that is, essentially, algorithms. We could also choose 1945, because in June of that year, John von Neumann (1903-1957) published his “First draft of a report on the EDVAC,” in which he presented the basic architecture of computers: 1) a memory containing data and programs (the latter encoding algorithms), 2) an arithmetic and logical calculation unit and 3) a control unit capable of interpreting the instructions of the programs contained in the memory. Since the seminal texts of Alan Turing and John von Neumann represent only theoretical advances, we could date the new era from the construction and actual use of the first computers, in the 1950s. It is clear, however, that (in spite of the prescience of a few visionaries) until the end of the 1970s, it was still hard to talk about an algorithmic medium. One of the main reasons is that the computers at that time were still big, costly, closed machines whose input and output interfaces could only be manipulated by experts. Although already in its infancy, the algorithmic medium was not yet socially prevalent.
It should be noted that between 1950 and 1980 (before Internet connections became the norm), data flows circulated mainly between containers and operators with local addresses enclosed in a single machine.

The Internet and personal computers (1970-1995)

A new trend emerged in the 1970s and became dominant in the 1980s: the interconnection of computers. The Internet protocol (invented in 1969) won out over its competitors in addressing machines in telecommunication networks. This was also the period when computing became personal. The digital was now seen as a vector of transformation and communication of all symbols, not only numbers. The activities of mail, telecommunications, publishing, the press, and radio and television broadcasting began to converge.
At the stage of the Internet and personal computers, data processed by algorithms were always stored in containers with local addresses, but—in addition to those addresses—operators now had universal physical addresses in the global network. Consequently, algorithmic operators could “collaborate,” and the range of types of processing and applications expanded significantly.

The World Wide Web (1995-2020)

It was only with the arrival of the Web, around 1995, however, that the Internet became the medium of most communication—to the point of irreversibly affecting the functioning of the traditional media and most economic, political and cultural institutions.
The revolution of the Web can be explained essentially as the creation of a universal system of physical addresses for containers. This system, of course, is URLs. It should be noted that—like the Internet protocol for operators—this universal system is added to the local addresses of the containers of data, it does not eliminate them. Tim Berners-Lee’s ingenious idea may be described as follows: by inventing a universal addressing system for data, he made possible the shift from a multitude of actual databases (each controlled by one computer) to a single virtual database for all computers. One of the main benefits is the possibility of creating hyperlinks among any of the data of that universal virtual database: “the Web.”
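
The idea of a universal address layered on top of local addresses can be sketched as follows. The two hosts, their paths and their contents are hypothetical, and the in-memory dictionary merely stands in for real machines on a network.

```python
from urllib.parse import urlsplit

# Two hypothetical machines, each with its own local container addresses.
machines = {
    "library.example": {"/catalog/42": "A digitized manuscript"},
    "museum.example": {
        "/objects/7": "A photograph; see http://library.example/catalog/42",
    },
}

def resolve(url):
    """Map a universal address (a URL) to a machine, then to one of
    that machine's local containers."""
    parts = urlsplit(url)
    return machines[parts.hostname][parts.path]

# A hyperlink is a URL stored inside a container, so data held on one
# machine can point to data held on any other.
note = resolve("http://museum.example/objects/7")
linked_url = note.split("see ")[1]
print(resolve(linked_url))
```

From the point of view of `resolve`, the separate stores behave like a single virtual database, while each machine keeps its own local addressing underneath.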
From then on, the effective power and the capacity for collaboration—or inter-operation—between algorithms increased and diversified enormously, since both operators and containers now possessed universal addresses. The basic programmable machine became the network itself, as is shown by the spread of cloud computing.
The decade 2010-2020 is seeing the beginning of the transition to a data-centric society. Indeed, starting with this phase of social utilization of the new medium, the majority of interactions among people take place through the Internet, whether purely for socialization or for information, work, research, learning, consumption, political action, gaming, watches, and so on. At the same time, algorithms increasingly serve as the interface for relationships between people, relationships among data, and relationships between people and data. The increase in conflicts around ownership and free accessibility of data, and around openness and transparency of algorithms, is a clear sign of a transition to a data-centric society. However, in spite of their already decisive role, algorithms are not yet perceived in the collective consciousness as the new medium of human communication and thought. People are still fascinated by the logic of dissemination of previous media.
The next stage in the evolution of the algorithmic medium—the semantic sphere based on IEML—will provide a conceptual addressing system for data. But before we look at the future, we need to think about the limitations of the contemporary Web. Indeed, the Web was invented to help solve problems in interconnecting data that arose around 1990, at a time when one percent of the world’s population (mainly anglophone) was connected. But now in 2014, new problems have arisen involving the difficulties of translating and processing data, as well as the low level of digital literacy. When these problems become too pronounced (probably around 2020, when more than half the world’s population will be connected), we will be obliged to adopt a conceptual addressing system on top of the layer of physical addressing of the WWW.

The limitations of the Web in 2016

The inadequacy of the logic of dissemination

From Gutenberg until the middle of the twentieth century, the main technical effect of the media was the mechanical recording, reproduction and transmission of the symbols of human communication. Examples include printing (newspapers, magazines, books), the recording industry, movies, telephone, radio and television. While there were also technologies for calculation, or automatic transformation of symbols, the automatic calculators available before computers were not very powerful and their usefulness was limited.
The first computers had little impact on social communication because of their cost, the complexity of using them and the small number of owners (essentially big corporations, some scientific laboratories and the government administrations of rich countries). It was only beginning in the 1980s that the development of personal computing provided a growing proportion of the population with powerful tools for producing messages, whether these were texts, tables of numbers, images or music. From then on, the democratization of printers and the development of communication networks among computers, as well as the increased number of radio and television networks, gradually undermined the monopoly on the massive dissemination of messages that had traditionally belonged to publishers, professional journalists and the major television networks. This revolution in dissemination accelerated with the arrival of the World Wide Web in the mid-1990s and blossomed into the new kind of global multimedia public sphere that prevails now at the beginning of the twenty-first century.
In terms of the structure of social communication, the essential characteristic of the new public sphere is that it permits anyone to produce messages, to transmit to a community without borders and to access messages produced and transmitted by others. This freedom of communication is all the more effective since its exercise is practically free and does not require any prior technical knowledge. In spite of the limits I will describe below, we have to welcome the new horizon of communication that is now offered to us: at the rate at which the number of connections is growing, almost all human beings in the next generation will be able to disseminate their messages to the entire planet for free and effortlessly.
It is certain that automatic manipulation—or transformation—of symbols has been practiced since the 1960s and 1970s. I have also already noted that a large proportion of personal computing was used to produce information and not only to disseminate it. Finally, the major corporations of the Web such as Google, Amazon, eBay, Apple, Facebook, Twitter and Netflix daily process huge masses of data in veritable “information factories” that are entirely automated. In spite of that, the majority of people still see and use the Internet as a tool for the dissemination and reception of information, in continuity with the mass media since printing and, later, television. It is a little as if the Web gave every individual the power of a publishing house, a television network and a multimedia postal service in real time, as well as access to an omnipresent global multimedia library. Just as the first printed books—incunabula—closely copied the form of manuscripts, we still use the Internet to achieve or maximize the power of dissemination of previous media. Everyone can transmit universally. Everyone can receive from anywhere.
No doubt we will have to exhaust the technical possibilities of automatic dissemination—the power of the media of the last four centuries—in order to experience and begin to assimilate intellectually and culturally the almost unexploited potential of automatic transformation—the power of the media of centuries to come. That is why I am again speaking of the algorithmic medium: to emphasize digital communication’s capacity for automatic transformation. Of course, the transformation or processing power of the new medium can only be actualized on the basis of the irreversible achievement of the previous medium, the universal dissemination or ubiquity of information. That was nearly fully achieved at the beginning of the twenty-first century, and coming generations will gradually adapt to automatic processing of the massive flow of global data, with all its unpredictable cultural consequences. There are at this time three limits to this process of adaptation: users’ literacy, the absence of semantic interoperability and the statistical positivism that today governs data analysis.

The problem of digital literacy

The first limit of the contemporary algorithmic medium is related to the skills of social groups and individuals: the higher their education level (elementary, secondary, university), the better developed their critical thinking, the greater their mastery of the new tools for manipulation of symbols and the more capable they are of turning the algorithmic medium to their advantage. As access points and mobile devices increase in number, the thorny question of the digital divide is less and less related to the availability of hardware and increasingly concerns problems of print literacy, media literacy and education. Even for those without any particular skills in programming or in using digital tools, the power provided by ordinary reading and writing is greatly increased by the algorithmic medium: we gain access to possibilities for expression, social relationships and information such as we could not even have dreamed of in the nineteenth century. This power will be further increased when, in the schools of the future, traditional literacy, digital literacy and understanding of ecosystems of ideas are integrated. Then, starting at a very young age, children will be introduced to categorization and evaluation of data, collection and analysis of large masses of information and programming of semantic circuits.

The absence of semantic interoperability

The second limit is semantic, since, while technical connection is tending to become universal, the communication of meaning still remains fragmented according to the boundaries of languages, systems of classification, disciplines and other cultural worlds that are more or less unconnected. The “semantic Web” promoted by Tim Berners-Lee since the late 1990s is very useful for translating logical relationships among data. But it has not fulfilled its promise with regard to the interoperability of meaning, in spite of the authority of its promoter and the contributions of many teams of engineers. As I showed in the first volume of The Semantic Sphere, it is impossible to fully process semantic problems while remaining within the narrow limits of logic. Moreover, the essentially statistical methods used by Google and the numerous systems of automatic translation available provide tools to assist with translation, but they have not succeeded any better than the “semantic Web” in opening up a true space of translinguistic communication. Statistics are no more effective than logic in automating the processing of meaning. Here again, we lack a coding of linguistic meaning that would make it truly calculable in all its complexity. It is to meet this need that IEML is automatically translated into natural languages in semantic networks.

Statistical positivism

The general public’s access to the power of dissemination of the Web and the flows of digital data that now result from all human activities confront us with the following problem: how to transform the torrents of data into rivers of knowledge? The solution to this problem will determine the next stage in the evolution of the algorithmic medium. Certain enthusiastic observers of the statistical processing of big data, such as Chris Anderson, the former editor-in-chief of Wired, were quick to declare that scientific theories—in general!—were now obsolete. In this view, we now need only flows of data and powerful statistical algorithms operating in the computing centres of the cloud: theories—and therefore the hypotheses they propose and the reflections from which they emerge—belong to a bygone stage of the scientific method. It appears that numbers speak for themselves. But this obviously involves forgetting that it is necessary, before any calculation, to determine the relevant data, to know exactly what is being counted and to name—that is, to categorize—the emerging patterns. In addition, no statistical correlation directly provides causal relationships. These are necessarily hypotheses to explain the correlations revealed by statistical calculations. Under the guise of revolutionary thought, Chris Anderson and his like are reviving the old positivist, empiricist epistemology that was fashionable in the nineteenth century, according to which only inductive reasoning (that is, reasoning based solely on data) is scientific. This position amounts to repressing or ignoring the theories—and therefore the risky hypotheses based on individual thought—that are necessarily at work in any process of data analysis and that are expressed in decisions of selection, identification and categorization. One cannot undertake statistical processing and interpret its results without any theory. 
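The point that a correlation is not a causal relationship can be illustrated with a few lines of Python. This is only a minimal sketch on simulated numbers: the variable names (temperature, ice_cream, sunburn) and the noise levels are assumptions chosen for illustration, not data from any study. Two series are generated from a common hidden cause; they correlate strongly, and the correlation vanishes once the hypothesis of a common cause is made explicit and subtracted.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical hidden cause (simulated, not measured).
temperature = [rng.gauss(20, 5) for _ in range(2000)]

# Two effects that depend on the hidden cause but not on each other.
ice_cream = [t + rng.gauss(0, 2) for t in temperature]
sunburn = [t + rng.gauss(0, 2) for t in temperature]

# The raw series correlate strongly although neither causes the other.
raw = pearson(ice_cream, sunburn)

# Once the common cause is named and removed, only independent noise remains.
adjusted = pearson(
    [i - t for i, t in zip(ice_cream, temperature)],
    [s - t for s, t in zip(sunburn, temperature)],
)

print(f"raw correlation: {raw:.2f}")
print(f"after removing the common cause: {adjusted:.2f}")
```

The calculation itself says nothing about temperature; it is the analyst's hypothesis that selects this variable and gives the pattern a name.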
Once again, the only choice we have is to leave the theories implicit or to explicate them. Explicating a theory allows us to put it in perspective, compare it with other theories, share it, generalize from it, criticize it and improve it. This is even one of the main components of what is known as critical thinking, which secondary and university education is supposed to develop in students.
Beyond empirical observation, scientific knowledge has always been concerned with the categorization and correct description of phenomenal data, description that is necessarily consistent with more or less formalized theories. By describing functional relationships between variables, theory offers a conceptual grasp of the phenomenal world that makes it possible (at least partially) to predict and control it. The data of today correspond to what the epistemology of past centuries called phenomena. To extend this metaphor, the algorithms for analyzing flows of data of today correspond to the observation tools of traditional science. These algorithms show us patterns, that is, ultimately, images. But the fact that we are capable of using the power of the algorithmic medium to observe data does not mean we should stop here on this promising path. We now need to use the calculating power of the Internet to theorize (categorize, model, explain, share, discuss) our observations, without forgetting to make our theorizing available to the rich collective intelligence.
In their 2013 book on big data, Viktor Mayer-Schönberger and Kenneth Cukier, while emphasizing the distinction between correlation and causality, predicted that we would take more and more interest in correlations and less and less in causality, which put them firmly in the empiricist camp. Their book nevertheless provides an excellent argument against statistical positivism. Indeed, they recount the very beautiful story of Matthew Maury, an American naval officer who in the mid-nineteenth century compiled data from log books in the official archives to establish reliable maps of winds and currents. Those maps were constructed from an accumulation of empirical data. But with all due respect for Cukier and Mayer-Schönberger, I would point out that such an accumulation would never have been useful, or even feasible, without the system of geographic coordinates of meridians and parallels, which is anything but empirical or data-based. Similarly, it is only by adopting a system of semantic coordinates such as IEML that we will be able to organize and share data flows in a useful way.
Today, most of the algorithms that manage routing of recommendations and searching of data are opaque, since they are protected trade secrets of major corporations of the Web. As for the analytic algorithms, they are, for the most part, not only opaque but also beyond the reach of most Internet users for both technical and economic reasons. However, it is impossible to produce reliable knowledge using secret methods. We must obviously consider the contemporary state of the algorithmic medium to be transitory.
What is more, if we want to solve the problem of the extraction of useful information from the deluge of big data, we will not be able to eternally limit ourselves to statistical algorithms working on the type of organization of digital memory that exists in 2016. We will sooner or later, and the sooner the better, have to implement an organization of memory designed from the start for semantic processing. We will only be able to adapt culturally to the exponential growth of data—and therefore transform these data into reflected knowledge—through a qualitative change of the algorithmic medium, including the adoption of a system of semantic coordinates such as IEML.

The semantic sphere and its conceptual addressing (2020…)

It is notoriously difficult to observe or recognize what does not yet exist, and even more, the absence of what does not yet exist. However, what is blocking the development of the algorithmic medium—and with it, the advent of a new civilization—is precisely the absence of a universal, calculable system of semantic metadata. I would like to point out that the IEML metalanguage is the first, and to my knowledge (in 2016) the only, candidate for this new role of a system of semantic coordinates for data.
We already have a universal physical addressing system for data (the Web) and a universal physical addressing system for operators (the Internet). In its full deployment phase, the algorithmic medium will also include a universal semantic code: IEML. This system of metadata—conceived from the outset to optimize the calculability of meaning while multiplying its differentiation infinitely—will open the algorithmic medium to semantic interoperability and lead to new types of symbolic manipulation. Just as the Web made it possible to go from a great many actual databases to one universal virtual database (but based on a physical addressing system), IEML will make it possible to go from a universal physical addressing system to a universal conceptual addressing system. The semantic sphere continues the process of virtualization of containers to its final conclusion, because its semantic circuits—which are generated by an algebra—act as data containers. It will be possible to use the same conceptual addressing system in operations as varied as communication, translation, exploration, searching and three-dimensional display of semantic relationships.
Today’s data correspond to the phenomena of traditional science, and we need calculable, interoperable metadata that correspond to scientific theories and models. IEML is precisely an algorithmic tool for theorization and categorization capable of exploiting the calculating power of the cloud and providing an indispensable complement to the statistical tools for observing patterns. The situation of data analysis before and after IEML can be compared to that of cartography before and after the adoption of a universal system of geometric coordinates. The data that will be categorized in IEML will be able to be processed much more efficiently than today, because the categories and the semantic relationships between categories will then become not only calculable but automatically translatable from one language to another. In addition, IEML will permit comparison of the results of the analysis of the same set of data according to different categorization rules (theories!).
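The idea of comparing analyses of the same data set under different categorization rules can be sketched in a few lines of Python. This is a toy illustration only: the records are invented, and the two rules (by_topic and by_century) are hypothetical stand-ins for explicit theories. They are not IEML categories.

```python
from collections import Counter

# Toy records, invented for illustration only.
records = [
    {"title": "Log book A", "year": 1848, "topic": "winds"},
    {"title": "Log book B", "year": 1852, "topic": "currents"},
    {"title": "Survey C", "year": 1901, "topic": "winds"},
    {"title": "Survey D", "year": 1910, "topic": "trade"},
]


def by_topic(record):
    """First hypothetical rule: categorize a record by its subject."""
    return record["topic"]


def by_century(record):
    """Second hypothetical rule: categorize a record by its century."""
    return f"{(record['year'] - 1) // 100 + 1}th century"


def categorize(data, rule):
    """Count how many records fall into each category under a given rule."""
    return Counter(rule(record) for record in data)


# The same data, seen through two explicit and comparable rules.
topic_view = categorize(records, by_topic)
century_view = categorize(records, by_century)

print(topic_view)
print(century_view)
```

Because each rule is an explicit function, it can be shared, criticized and swapped for another one without touching the data.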

Algo-medium

FIGURE 1 – The four interdependent levels of the algorithmic medium

When this symbolic system for conceptual analysis and synthesis is democratically accessible to everyone, translated automatically into all languages and easily manipulated by means of a simple tablet, then it will be possible to navigate the ocean of data, and the algorithmic medium will be tested directly as a tool for cognitive augmentation—personal and social—and not only for dissemination. Then a positive feedback loop between the collective testing and creation of tools will lead to a take-off of the algorithmic intelligence of the future.
In Figure 1, the increasingly powerful levels of automatic calculation are represented by rectangles. Each level is based on the “lower” levels that precede it in order of historical emergence. Each level is therefore influenced by the lower levels. But, conversely, each new level gives the lower levels an additional socio-technical determination, since it uses them for a new purpose.
The addressing systems, which are represented under the rectangles, can be considered the successive solutions—influenced by different socio-technical contexts—to the perennial problem of increasing the power of automatic calculation. An addressing system thus plays the role of a step on a stairway that lets you go from one level of calculation to a higher level. The last addressing system, that of metadata, is supplied by IEML or any other system of encoding of linguistic meaning that makes that meaning calculable, exactly as the system of pixels made images manipulable by means of algorithms.

The cognitive revolution of semantic encoding

We know that the algorithmic medium is not only a medium of communication or dissemination of information but also, especially, a ubiquitous environment for the automatic transformation of symbols. We also know that a society’s capacities for analysis, synthesis and prediction are based ultimately on the structure of its memory, and in particular its system for encoding and organizing data. As we saw in the previous section, the only thing the algorithmic medium now in construction lacks to become the matrix of a new episteme that is more powerful than today’s, which has not yet broken its ties to the typographical era, is a system of semantic metadata that is equal to the calculating power of algorithms.

Memory, communication and intuition

It is now accepted that computers increase our memory capacities, in which I include not only capacities for recording and recall, but also those for analysis, synthesis and prediction. The algorithmic medium also increases our capacities for communication, in particular in terms of the breadth of the network of contacts and the reception, transmission and volume of flows of messages. Finally, the new medium increases our capacities for intuition, because it increases our sensory-motor interactions (especially gestural, tactile, visual and sound interactions) with large numbers of people, documents and environments, whether they are real, distant, simulated, fictional or mixed. These augmentations of memory, communication and intuition influence each other to produce an overall augmentation of our field of cognitive activity.
Semantic encoding, that is, the system of semantic metadata based on IEML, will greatly increase the field of augmented cognitive activity that I have described. It will produce a second level of cognitive complexity that will enter into dynamic relationship with the one described above to give rise to algorithmic intelligence. As we will see, semantic coding will generate a reflexivity of memory, a new perspectivism of intellectual intuition and an interoperability of communication.

Reflexive memory

The technical process of objectivation and augmentation of human memory began with the invention of writing and continued up to the development of the Web. But in speaking of reflexive memory, I go beyond Google and Wikipedia. In the future, the structure and evolution of our memory and the way we use it will become transparent and open to comparison and critical analysis. Indeed, communities will be able to observe—in the form of ecosystems of ideas—the evolution and current state of their cognitive activities and apply their capacities for analysis, synthesis and prediction to the social management of their knowledge and learning. At the same time, individuals will become capable of managing their personal knowledge and learning in relation to the various communities to which they belong. So much so that this reflexive memory will enable a new dialectic—a virtuous circle—of personal and collective knowledge management. The representation of memory in the form of ecosystems of ideas will allow individuals to make maximum use of the personal growth and cross-pollination brought about by their circulation among communities.

Perspectivist intellectual intuition

Semantic coding will give us a new sensory-motor intuition of the perspectivist nature of the information universe. Here we have to distinguish between the conceptual perspective and the contextual perspective.
The conceptual perspective organizes the relationships among terms, sentences and texts in IEML so that each of these semantic units can be processed as a point of view, or a virtual “centre” of the ecosystems of ideas, organizing the other units around it according to the types of relationships it has with them and their distance from it.
In IEML, the elementary units of meaning are terms, which are organized in the IEML dictionary in paradigms, that is, in systems of semantic relationships among terms. In the IEML dictionary, each term organizes the other terms of the same paradigm around it according to its semantic relationships with them. The different paradigms of the IEML dictionary are in principle independent of each other and none has precedence over the others a priori. Each of them can, in principle, be used to filter or categorize any set of data.
The sentences, texts and hypertexts in IEML represent paths between the terms of various paradigms, and these paths in turn organize the other paths around them according to their relationships and semantic proximity in the ecosystems of ideas. It will be possible to display this cascade of semantic perspectives and points of view using three-dimensional holograms in an immersive interactive mode.
Let us now examine the contextual perspective, which places in symmetry not the concepts within an ecosystem of ideas, but the ecosystems of ideas themselves, that is, the way in which various communities at different times categorize and evaluate data. It will thus be possible to display and explore the same set of data interactively according to the meaning and value it has for a large number of communities.
Reflexive memory, perspectivist intuition, interoperable and transparent communication together produce a cognitive augmentation characteristic of algorithmic intelligence, an augmentation more powerful than that of today.

Interoperable and transparent communication

The interoperability of communication will first concern the semantic compatibility of various theories, disciplines, universes of practices and cultures that will be able to be translated into IEML and will thus become not only comparable but also capable of exchanging concepts and operating rules without loss of their uniqueness. Semantic interoperability will also cover the automatic translation of IEML concepts into natural languages. Thanks to this pivot language, any semantic network in any natural language will be translated automatically into any other natural language. As a result, through the IEML code, people will be able to transmit and receive messages and categorize data in their own languages while communicating with people who use other languages. Here again, we need to think about cultural interoperability (communication in spite of differences in conceptual organization) and linguistic interoperability (communication in spite of differences in language) together; they will reinforce each other as a result of semantic coding.
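The principle of a pivot language can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the concept codes ("C001", "C002") are arbitrary placeholders and not IEML expressions, and the three tiny lexicons are invented. The point is only structural: each language is mapped once to the pivot code, and translation between any two languages passes through that code.

```python
# Hypothetical lexicons: each maps words of one language to pivot codes.
# The codes are placeholders, not real IEML.
LEXICONS = {
    "en": {"dog": "C001", "house": "C002"},
    "fr": {"chien": "C001", "maison": "C002"},
    "es": {"perro": "C001", "casa": "C002"},
}


def to_pivot(word, lang):
    """Encode a word of a natural language as a pivot concept code."""
    return LEXICONS[lang][word]


def from_pivot(code, lang):
    """Decode a pivot concept code into a word of the target language."""
    for word, concept in LEXICONS[lang].items():
        if concept == code:
            return word
    raise KeyError(f"no word for {code!r} in {lang!r}")


def translate(word, source, target):
    """Translate by going through the pivot code, never language to language."""
    return from_pivot(to_pivot(word, source), target)


def direct_dictionaries_needed(n_languages):
    """Number of one-way dictionaries needed without a pivot."""
    return n_languages * (n_languages - 1)


print(translate("dog", "en", "fr"))
print(translate("casa", "es", "en"))
print(len(LEXICONS), "lexicons instead of",
      direct_dictionaries_needed(len(LEXICONS)), "direct dictionaries")
```

With a pivot, adding a new language means writing one new lexicon rather than one dictionary per existing language. Real semantic networks are of course far richer than a word list.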

Emergence

Emergence happens through an interdependent circulation of information between two levels of complexity. A code translates and betrays information in both directions: bottom-up and top-down.

Nature

According to our model, human collective intelligence emerges from natural evolution. The lower level of quantum complexity translates into a higher level of molecular complexity through atomic stabilization and coding. There are no more than 120 atomic elements that explain the complexity of matter by their connections and reactions. The emergence of the next level of complexity – life – comes from the genetic code that is used by organisms as a trans-generational memory. Communication in neuronal networks translates organic life into conscious phenomena, including sense data, pleasure and pain, desire, etc. Thus animal life emerges. Let’s note that organic life is intrinsically ecosystemic and that animals have developed many forms of social or collective intelligence. The human level emerges through the symbolic code: language, music, images, rituals and all the complexity of culture. It is only thanks to symbols that we are able to conceptualize phenomena and think reflexively about what we do and think. Symbolic systems are all conventional but the human species is symbolic by nature, so to speak. Here, collective intelligence reaches a new level of complexity because it is based on collaborative symbol manipulation.

Culture

[WARNING: the next 5 paragraphs can be found in “collective intelligence for educators”; if you have already read them, go to the next slide: “algorithmic medium”] The above slide describes the successive steps in the emergence of symbolic manipulation. As in the previous slide, each new layer of cultural complexity emerges from the creation of a coding system.

During the longest part of human history, knowledge was embedded only in narratives, rituals and material tools. The first revolution in symbolic manipulation is the invention of writing, with symbols endowed with the ability of self-conservation. This leads to a remarkable augmentation of social memory and to the emergence of new forms of knowledge. Ideas were reified on an external surface, which is an important condition for critical thinking. A new kind of systematic knowledge was developed: hermeneutics, astronomy, medicine, architecture (including geometry), etc.

The second revolution optimizes the manipulation of symbols through inventions such as the alphabet (Phoenician, Hebrew, Greek, Roman, Arabic, Cyrillic, Korean, etc.), the Chinese rational ideographies, the Indian positional numeration system with a zero, paper and the early printing techniques of China and Korea. The literate culture based on the alphabet (or rational ideographies) developed critical thinking further and gave birth to philosophy. At this stage, scholars attempted to derive knowledge from observation and from deduction from first principles. There was a deliberate effort to reach universality, particularly in mathematics, physics and cosmology.

The third revolution is the mechanization and the industrialization of the reproduction and diffusion of symbols, through the printing press, disks, movies, radio, TV, etc. This revolution supported the emergence of the modern world, with its nation states, industries and its experimental mathematized natural sciences. It was only in the typographic culture, from the 16th century, that natural sciences took the shape that we currently enjoy: systematic observation or experimentation and theories based on mathematical modeling. From the decomposition of theology and philosophy emerged the contemporary humanities and social sciences. But at this stage human science was still fragmented by disciplines and incompatible theories. Moreover, its theories were rarely mathematized or testable.

We are now at the beginning of a fourth revolution where a ubiquitous and interconnected infosphere is filled with symbols – i.e. data – of all kinds (music, voice, images, texts, programs, etc.) that are being automatically transformed. With the democratization of big data analysis, the next generations will see the advent of a new scientific revolution… but this time it will be in the humanities and social sciences. The new human science will be based on the wealth of data produced by human communities and a growing computing power. This will lead to reflexive collective intelligence, where people will appropriate (big) data analysis and where subjects and objects of knowledge will be the human communities themselves.

Algo-medium

Let’s have a closer look at the algorithmic medium. Four layers have been added since the middle of the 20th century. Again, we observe the progressive invention of new coding systems, mainly aimed at the addressing of processors, data and meta-data.

The first layer is the invention of the automatic digital computer itself. We can describe computation as « processing on data ». It is self-evident that computation cannot be programmed if we don’t have a very precise addressing system for the data and for the specialized operators/processors that will transform the data. At the beginning these addressing systems were purely local and managed by operating systems.

The second layer is the emergence of a universal addressing system for computers, the Internet protocol, that allows for exchange of data and collaborative computing across the telecommunication network.

The third layer is the invention of a universal system for the addressing and displaying of data (URLs, http, html). Thanks to this universal addressing of data, the World Wide Web is a hypertextual global database that we all create and share. It is obvious that the Web has had a deep social, cultural and economic impact in the last twenty years.
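To make the layering of addresses concrete, here is a small Python sketch that takes a URL apart with the standard library. The URL is a placeholder on the reserved example.org domain, not a real page. The host part names a machine on the network (the second layer), while the path and fragment name a piece of data on that machine (the third layer).

```python
from urllib.parse import urlsplit

# Placeholder URL on a reserved example domain.
url = "http://example.org/archive/post.html#section-2"

parts = urlsplit(url)

protocol = parts.scheme    # how the data is exchanged
host = parts.hostname      # which machine on the network is addressed
path = parts.path          # which document on that machine is addressed
fragment = parts.fragment  # which part of the document is addressed

print(protocol, host, path, fragment)
```

Nothing in such an address says what the document means; that is the gap a fourth layer of semantic addressing is meant to fill.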

The construction of the algorithmic medium is ongoing. We are now ready to add a fourth layer of addressing and, this time, it will be a universal addressing system for semantic metadata. Why? First, we are still unable to resolve the problem of semantic interoperability across languages, classifications and ontologies. And secondly, except for some approximate statistical and logical methods, we are still unable to compute semantic relations, including distances and differences. This new symbolic system will be a key element to a future scientific revolution in the humanities and social sciences, leading to a new kind of reflexive collective intelligence for our species. Moreover, it will pave the way for the emergence of a new scientific cosmos – not a physical one but a cosmos of the mind that we will build and explore collaboratively. I want to strongly underline here that the semantic categorization of data will stay in the hands of people. We will be able to categorize the data as we want, from many different points of view. All that is required is that we use the same code. The description itself will be free.

Algo-intel

Let’s now examine the emerging algorithmic intelligence of the future. This new level of symbolic manipulation will be operated and shared in a mixed environment combining virtual worlds and augmented realities. The two lower levels of the above slide represent the current internet: an interaction between the « internet of things » and the « clouds » where all the data converge in a ubiquitous infosphere… The two higher levels, the « semantic sensorium » and the « reflexive collective intelligence » depict the human condition that will unfold in the future.

The things are material, localized realities that have GPS addresses. Here we speak about the smart territories, cities, buildings, machines, robots and all the mobile gadgets (phones, tablets, watches, etc.) that we can wear. Through binary code, the things are in constant interaction with the ubiquitous memory in the clouds. Streams of data and information processing reverberate between the things and the clouds.

When the data are coded by a computable universal semantic addressing system, the data in the clouds will be projected automatically into a new sensorium. In this 3D, immersive and dynamic virtual environment we will be able to explore through our senses the abstract relationships between the people, the places and the meaning of digital information. I’m not speaking here of a representation, reproduction or imitation of the material space, like, for example, in Second Life. We have to imagine something completely different: a semantic sphere where the cognitive processes of human communities will be modeled. This semantic sphere will empower all its users. Search, knowledge exploration, data analysis and synthesis, collaborative learning and collaborative data curation will be multiplied and enhanced by the new interoperable semantic computing.

We will get reflexive collective intelligence thanks to a scientific, computable and transparent modeling of cognition from real data. This modeling will be based on the semantic code, which provides the « coordinate system » of the new cognitive cosmos. Of course, people will not be forced to understand the details of this semantic code. They will interact in the new sensorium through their preferred natural language (the linguistic codes of the above slide) and their favorite multimedia interfaces. The translation between different languages and optional interface metaphors will be automatic. The important point is that people will observe, analyze and map dynamically their own personal and collective cognitive processes. Thanks to this new reflexivity, we will improve our collaborative learning processes and the collaborative monitoring and control of our physical environments. And this will boost human development!

Collective-Intelligence

The above slide represents the workings of a collective intelligence oriented towards human development. In this model, collective intelligence emerges from an interaction between two levels: virtual and actual. The actual is addressed in space and time while the virtual is latent, potential or intangible. The two levels function and communicate through several symbolic codes. In any coding system, there are coding elements (signs), coded references (things) and coders (beings). This is why both actual and virtual levels can be conceptually analysed into three kinds of networks: signs, beings and things.

Actual human development can be analysed into a sphere of messages (signs), a sphere of people (beings) and a sphere of equipment, understood in the broadest possible sense (things). Of course, the three spheres are interdependent.

Virtual human development is analysed into a sphere of knowledge (signs), a sphere of ethics (beings) and a sphere of power (things). Again, the three spheres are interdependent.

Each of the six spheres is further analysed into three subdivisions, corresponding to the sub-rows on the slide. The mark S (sign) points to the abstract factors, the mark B (being) indicates the affective dimensions and the mark T (thing) shows the concrete aspects of each sphere.

All the realities described in the above table are interdependent following the actual/virtual and the sign/being/thing dialectics. Any increase or decrease in one « cell » will have consequences in other cells. This is just an example of the many ways collective intelligence will be represented, monitored and made reflexive in the semantic sensorium…

To dig into the philosophical concept of algorithmic intelligence go there

E-sphere-copie

An IEML paradigm projected onto a sphere.

Communication presented at The Future of Text symposium IV at Google’s headquarters in London (2014).

Symbolic manipulation accounts for the uniqueness of human cognition and consciousness. This symbolic manipulation is now augmented by algorithms. The problem is that we still have not invented a symbolic system that could fully exploit the algorithmic medium in the service of human development and human knowledge.

E-Cultural-revolutions

The slide above describes the successive steps in the augmentation of symbolic manipulation.

The first revolution is the invention of writing, with symbols endowed with the ability of self-conservation. This leads to a remarkable augmentation of social memory and to the emergence of new forms of knowledge.

The second revolution optimizes the manipulation of symbols, as with the invention of the alphabet (Phoenician, Hebrew, Greek, Roman, Arabic, Cyrillic, Korean, etc.), the Chinese rational ideographies, the Indian positional numeral system with a zero, paper, and the early printing techniques of China and Korea.

The third revolution is the mechanization and industrialization of the reproduction and diffusion of symbols, as with the printing press, disks, movies, radio, TV, etc. This revolution supported the emergence of the modern world, with its nation states, its industries and its experimental, mathematized natural sciences.

We are now at the beginning of a fourth revolution, in which a ubiquitous and interconnected infosphere is filled with symbols (i.e. data) of all kinds (music, voice, images, texts, programs, etc.) that are being automatically transformed. With the democratization of big data analysis, the next generations will see the advent of a new scientific revolution… but this time it will be in the humanities and social sciences.

E-Algorithmic-medium

Let’s have a closer look at the algorithmic medium. Four layers have been added since the middle of the 20th century.

– The first layer is the invention of the automatic digital computer itself. We can describe computation as « processing on data ». It is self-evident that computation cannot be programmed if we don’t have a very precise addressing system for the data and for the specialized operators/processors that will transform the data. At the beginning these addressing systems were purely local and managed by operating systems.

– The second layer is the emergence of a universal addressing system for computers, the Internet protocol, that allowed for exchange of data and collaborative computing across the telecommunication network.

– The third layer is the invention of a universal system for addressing and displaying data (HTTP, HTML), hosting a global hypertextual database: the World Wide Web. We all know that the Web has had a deep social, cultural and economic impact in the last fifteen years.

– The construction of this algorithmic medium is ongoing. We are now ready to add a fourth layer of addressing and, this time, we need a universal addressing system for metadata, and in particular for semantic metadata. Why? First, we are still unable to solve the problem of semantic interoperability across languages, classifications and ontologies. Second, except for some approximate statistical and logical methods, we are still unable to compute semantic relations, including distances and differences. This new symbolic system will be a key element of a future scientific revolution in the humanities and social sciences, leading to a new kind of reflexive collective intelligence for our species. There lies the future of text.

E-IEML-math2

My version of a universal semantic addressing system is IEML, an artificial language that I have invented and developed over the last 20 years.

IEML is based on a simple algebra with six primitive variables (E, U, A, S, B, T) and two operations (+, ×). The multiplicative operation builds the semantic links. This operation has three roles: a departure node, an arrival node and a tag for the link. The additive operation gathers several links to build a semantic network, and recursivity builds semantic networks with multiple levels of complexity: it is « fractal ». With this algebra, we can automatically compute an internal network corresponding to any variable and also the relationships between any set of variables.
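To make the algebra more concrete, here is a minimal Python sketch of the two operations. It is only an illustration under stated assumptions, not the actual IEML implementation: the names `Primitive`, `Link`, `Network`, `mul`, `add` and `depth` are hypothetical, and using E (emptiness) as the default tag is a choice made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

PRIMITIVES = frozenset("EUASBT")  # the six primitive variables


@dataclass(frozen=True)
class Primitive:
    symbol: str

    def __post_init__(self) -> None:
        if self.symbol not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {self.symbol!r}")


@dataclass(frozen=True)
class Link:
    """Result of a multiplication: one link with its three roles."""
    departure: "Expr"
    arrival: "Expr"
    tag: "Expr"


@dataclass(frozen=True)
class Network:
    """Result of an addition: a set of gathered expressions."""
    members: FrozenSet["Expr"]


Expr = Union[Primitive, Link, Network]
EMPTY = Primitive("E")  # assumption: an untagged link carries E as its tag


def mul(departure: Expr, arrival: Expr, tag: Expr = EMPTY) -> Link:
    """Multiplication builds a link (departure node, arrival node, tag)."""
    return Link(departure, arrival, tag)


def add(*exprs: Expr) -> Network:
    """Addition gathers expressions into one network, flattening nested sums."""
    members = set()
    for expr in exprs:
        if isinstance(expr, Network):
            members |= expr.members
        else:
            members.add(expr)
    return Network(frozenset(members))


def depth(expr: Expr) -> int:
    """Number of nested multiplication layers in an expression."""
    if isinstance(expr, Primitive):
        return 0
    if isinstance(expr, Link):
        return 1 + max(depth(expr.departure), depth(expr.arrival), depth(expr.tag))
    return max((depth(member) for member in expr.members), default=0)


link = mul(Primitive("U"), Primitive("S"), Primitive("B"))
network = add(link, mul(Primitive("A"), Primitive("T")))
nested = mul(network, Primitive("S"))  # recursion: a whole network used as a node
print(depth(link), depth(network), depth(nested))
```

Because every result is itself an expression, any link or network can serve as a node in a further multiplication, which is the recursivity described above.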

IEML is still at the stage of fundamental research, but we now have an extensive dictionary (a set of paradigms) of three thousand terms and algorithmic grammatical rules that conform to the algebra. The result is a language whose texts self-translate into natural language, manifest as semantic networks and collaboratively compute their relationships and differences. Any library of IEML texts then self-organizes into ecosystems of texts, and data categorized in IEML self-organize according to their semantic relationships and differences.

E-Collective-intel2

Now let’s take an example of an IEML paradigm, the paradigm of “Collective Intelligence in the service of human development”, to grasp the meaning of the primitives and how they are used.

– First, let’s look at the dialectic between virtual (U) and actual (A) human development, represented by the rows.

– Then, the ternary dialectic between sign (S), being (B) and thing (T) is represented by the columns.

– The result is six broad interdependent aspects of collective intelligence corresponding to the intersections of the rows (virtual/actual) and columns (sign/being/thing).

– Each of these six broad aspects of CI is then decomposed into three sub-aspects corresponding to the sign/being/thing dialectic.

The semantic relations (symmetries and inclusions) between the terms of a paradigm are all explicit and therefore computable. All IEML paradigms are designed with the same principles as this one, and you can build phrases by assembling the terms through multiplications and additions.
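The structure of this paradigm can be sketched in a few lines of Python. This is an illustrative assumption, not IEML's own notation: the tuple encoding and the helpers `cells`, `included_in` and `symmetric` are hypothetical, and the six sphere names are those given for the same table earlier in this archive.

```python
from itertools import product

ROWS = {"U": "virtual", "A": "actual"}
COLUMNS = {"S": "sign", "B": "being", "T": "thing"}

# The six broad aspects: one per (row, column) intersection.
SPHERES = {
    ("U", "S"): "knowledge", ("U", "B"): "ethics", ("U", "T"): "power",
    ("A", "S"): "messages", ("A", "B"): "people", ("A", "T"): "equipment",
}


def cells():
    """Each sphere is decomposed again along sign/being/thing: 2 x 3 x 3 cells."""
    return list(product(ROWS, COLUMNS, COLUMNS))


def included_in(cell, sphere):
    """Inclusion: a sub-aspect belongs to the sphere that shares its row and column."""
    return cell[:2] == sphere


def symmetric(cell_a, cell_b):
    """Symmetry: same column and sub-aspect, opposite row (virtual vs. actual)."""
    return cell_a[1:] == cell_b[1:] and cell_a[0] != cell_b[0]


for row, column, sub in cells():
    print(f"{ROWS[row]} / {SPHERES[(row, column)]} / {COLUMNS[sub]}")
```

Because every cell is addressed by the same three coordinates, relations such as inclusion and symmetry reduce to simple comparisons on those coordinates, which is one way to read the claim that they are explicit and therefore computable.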

Fortunately, fundamental research is now finished. I will spend the next months preparing a demo of the automatic computing of semantic relations between data coded in IEML. With tools to come…

E-Future-text2

Human-dev-CI

E = Emptiness, U = Virtual, A = Actual, S = Sign, B = Being, T = Thing


The algorithmic medium

Before the algorithmic medium was the typographical medium (printing press, broadcasting) that industrialized and automated the reproduction of information. In the new algorithmic medium, information is, de facto, ubiquitous and automation now concentrates on the transformation of information.

The algorithmic medium is built from three interdependent components: the Web as a universal database (big data), the Internet as a universal computer (cloud), and the algorithms in the hands of people.

IEML (the Information Economy MetaLanguage) has been designed to exploit the full potential of the new algorithmic medium.

IEML, who and what is it for?

It would have been impossible to design IEML before the automatic-computing era and, a fortiori, to implement and use it. IEML was designed for digital natives, and built to take advantage of the new pervasive social computing supported by big data, the cloud and open algorithms.

IEML is a language

IEML is an artificial language that has the expressive power of any natural language (like English, French, Russian, Arabic, etc.). In other words, you can say in IEML whatever you want and its opposite, with varying degrees of precision.

IEML is an inter-linguistic semantic code

We can describe IEML as a sort of pivot language. Its reading/writing interface appears in the natural language of your choice, and an IEML text self-translates into that language.

IEML is a semantic metadata system

IEML was also designed as a tagging system supporting semantic interoperability. Its main use is data categorization. As a universal system addressing concepts, IEML can complement the universal addressing of data on the Web and of processors on the Internet.

IEML is a programming language

An IEML text programs the construction of a semantic network in natural languages and it computes its relations and its semantic differences with other texts.

IEML is a symbolic system

Like any other symbolic system, IEML results from the interaction of three interdependent layers of linguistic complexity: syntax, semantics and pragmatics.

IEML syntax

IEML syntax is an algebraic topology: this means that a complex network of relations (topology) is coded by an algebraic expression.

IEML Algebra

IEML algebra is based on six basic variables {E, U, A, S, B, T} and two operations {+, ×}. The multiplication builds links (departure node, arrival node, tag) and the addition creates graphs by connecting the links. The result of any algebraic operation can be used as a basis for new operations. This recursivity allows the construction of successive layers of complexity.

A computable topology

Each distinct variable of the IEML algebra corresponds to a distinct graph. Given a set of variables, their relations and their semantic differences are computable.
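The text does not say how relations and differences are computed. As a rough picture only, the sketch below treats a graph as a set of (departure, arrival, tag) triples and uses plain set operations; the Jaccard distance is a generic stand-in chosen for illustration, not IEML's own measure, and the function names and sample graphs are hypothetical.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]   # (departure, arrival, tag)
Graph = FrozenSet[Triple]


def shared_links(a: Graph, b: Graph) -> Graph:
    """Relation between two graphs, pictured as the links they have in common."""
    return a & b


def differing_links(a: Graph, b: Graph) -> Graph:
    """Difference between two graphs: the links found in only one of them."""
    return a ^ b


def jaccard_distance(a: Graph, b: Graph) -> float:
    """0.0 for identical graphs, 1.0 for graphs with no link in common."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


g1: Graph = frozenset({("U", "S", "E"), ("A", "B", "E")})
g2: Graph = frozenset({("U", "S", "E"), ("A", "T", "E")})

print(sorted(shared_links(g1, g2)))
print(sorted(differing_links(g1, g2)))
print(round(jaccard_distance(g1, g2), 3))
```

The point of the sketch is only that once each variable corresponds to an explicit graph, comparing two variables becomes an ordinary graph computation.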

IEML semantics

As it is projected onto an algebraic topology, IEML’s semantics becomes computable.

The semantic projection onto an algebraic topology

– An IEML script normalizes the notation of an algebraic expression.
– The IEML dictionary is organized as a set of paradigms, a paradigm being a semantic network of terms. Each IEML term can be translated in natural languages.
– With IEML operations {+, ×} and its recursivity, the IEML grammar allows the construction of morphemes, words, clauses, phrases, complex propositions, texts and hypertexts.

The grammatical algorithms

The grammatical algorithms embedded in IEML can compute:
– the intra-textual semantic network corresponding to an IEML text
– the translation of an IEML semantic network into any chosen natural language
– the inter-textual semantic network and the semantic differences corresponding to any set of IEML texts.

IEML pragmatics

IEML pragmatics is oriented towards self-organization and reflexive collective intelligence.

A new approach to data and social networks

When data are categorized in IEML, they self-organize into semantic networks and automatically compute their semantic relations and differences. Moreover, when communities engage in collaborative data curation using IEML, what they get in return is a simulated image of their collective intelligence process.

Modeling ideas as dynamic texts

We can model our collective intelligence as an evolving ecosystem of ideas. In this framework, an idea can be defined as the assembly of a concept, an affect, a percept (a sensory-motor image) and a social context. In a dynamic text, the concept is represented by an IEML text, the affect by credits (positive or negative), the percept by a multimedia dataset and the social context by an author (a player), a community (a semantic game) and a time-stamp.
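The four components of a dynamic text map naturally onto a small record type. The Python sketch below is a hypothetical illustration: the field names, the `DynamicText` class, the `credits_by_concept` helper and the sample values are all assumptions, and the concept strings are placeholders rather than real IEML.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DynamicText:
    concept: str              # an IEML text (placeholder string here)
    credits: int              # affect: positive or negative
    percept: Tuple[str, ...]  # multimedia dataset, here a tuple of resource names
    author: str               # social context: the player
    community: str            # social context: the semantic game
    timestamp: datetime       # social context: the time-stamp


def credits_by_concept(texts: Iterable[DynamicText]) -> Dict[str, int]:
    """Toy aggregation: total credits that a collection assigns to each concept."""
    totals: Dict[str, int] = defaultdict(int)
    for text in texts:
        totals[text.concept] += text.credits
    return dict(totals)


now = datetime(2014, 1, 1, tzinfo=timezone.utc)
ideas = [
    DynamicText("concept-1", 3, ("photo.jpg",), "alice", "game-a", now),
    DynamicText("concept-1", -1, ("clip.mp4",), "bob", "game-a", now),
    DynamicText("concept-2", 2, (), "alice", "game-b", now),
]
print(credits_by_concept(ideas))
```

Summing credits per concept is just one simple view of an ecosystem of ideas; it says nothing about the semantic relations between the concepts themselves.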

Automatic computing of dynamic hypertexts

Thanks to the IEML grammatical algorithms, any set of dynamic texts self-organizes into a dynamic hypertext that represents an ecosystem of ideas in the form of an immersive simulation. A reflexive collective intelligence can then emerge from collaborative data curation.

Mutual critique of artificial intelligence and the human sciences

I remember taking part, toward the end of the 1980s, in a Cerisy colloquium on the cognitive sciences attended by some of the great American names in the field, including proponents of the neuro-connectionist and logicist currents. Among the guests, the philosopher Hubert Dreyfus (author, notably, of What Computers Can’t Do, MIT Press, 1972) sharply criticized artificial intelligence researchers for failing to take into account the intentionality discovered by phenomenology. Real human reasoning, he reminded us, is situated, oriented toward an end, and draws its relevance from a context of interaction. The cognitive sciences, dominated by the logico-statistical current, were unable to account for the horizons of consciousness that illuminate intelligence. Dreyfus was no doubt right, but his critique did not go far enough, because phenomenology was not the only thing being ignored. Nor did artificial intelligence (AI) integrate into the cognition it claimed to model the complexity of symbolic systems and of human communication, the media that support it, or the pragmatic tensions and social relations that drive it. In this respect, we live today in a paradoxical situation: AI is enjoying impressive practical success at the very moment its theoretical failure is becoming obvious.

Practical success indeed, since the usefulness of statistical algorithms, machine learning, simulations of animal collective intelligence, neural networks and other pattern recognition systems is evident everywhere. Natural language processing has never been so popular, as shown for example by the use of Google Translate. The Web of data promoted by the World Wide Web Consortium (directed by Sir Tim Berners-Lee) uses the same type of logical rules as the expert systems of the 1980s. Finally, the social computing algorithms implemented by search engines and social media prove their effectiveness every day.

But we must acknowledge the theoretical failure of AI since, despite the multitude of algorithmic tools available, artificial intelligence still cannot exhibit a convincing model of cognition. The discipline has prudently given up simulating intelligence in its entirety. It is clear to any researcher in the human sciences with some experience of transdisciplinarity that, because of its teeming complexity, the object of the human sciences (mind, thought, intelligence, culture, society) cannot be fully accounted for by any of the computational theories of cognition currently available. This is why artificial intelligence in practice settles for providing a heterogeneous toolbox (logical rules, formal syntaxes, statistical methods, neural or socio-biological simulations…) that offers no general solution to the problem of a mathematical modeling of human cognition.

However, artificial intelligence researchers have an easy reply to their critics from the human sciences: “You claim that our algorithms fail to account for the complexity of human cognition, but you yourselves offer none to remedy the problem. You merely point to a multitude of disciplines, each more ‘complex’ than the last (philosophy, psychology, linguistics, sociology, history, geography, literature, communication…), which have no common metalanguage and have not formalized their objects! How do you expect us to find our way through this jumble?” And this challenge is just as sensible as the critique it answers.

Synthesis of artificial intelligence and the human sciences

What I learned from Hubert Dreyfus at that 1987 colloquium where I met him was not so much that phenomenology would be the key to all the problems of a scientific modeling of the mind (Husserl, the father of phenomenology, in fact thought that phenomenology, a kind of meta-science of consciousness, was impossible to mathematize and that it even represented the non-mathematizable par excellence, the other of the mathematical science of nature), but rather that artificial intelligence was wrong to look for this key only in the area lit by the streetlamp of arithmetic, logic and formal neurons… and that philosophers, hermeneuts and specialists in the complexity of meaning had to take an active part in the research rather than merely criticize. To find the key, we had to widen our gaze, to search and dig through the whole field of the human sciences, however opaque to computation it may seem at first sight. We needed a tool for processing sense, signification and semantics in general, in a computational mode. Once the immense field of semantic relations was illuminated by computation, a science of cognition worthy of the name could be born. Indeed, provided that a symbolic tool guarantees the computation of relations between signifieds, it becomes possible to compute the semantic relations between concepts, between ideas and between intelligences. Moved by these considerations, I developed the semantic theory of cognition and the IEML metalanguage: from their union comes computational semantics.

Specialists in meaning, culture and thought feel helpless before the heterogeneous toolbox of artificial intelligence: nowhere in it do they recognize anything able to handle the contextual complexity of meaning. This is why computational semantics offers them a way to handle algorithmic tools coherently, starting from the semantics of natural languages. Engineers lose their way amid the motley multitude, the artistic vagueness and the lack of conceptual interoperability of the human sciences. Remedying this problem, computational semantics gives them a grip on the abundant tools and concepts of the elusive human sciences. In short, the grand project of computational semantics is to build a bridge between software engineering and the human sciences, so that the latter can put the computational power of computer science at their service and computer science can integrate the hermeneutic finesse and contextual complexity of the human sciences. But an artificial intelligence wide open to the human sciences and able to compute the complexity of meaning would precisely no longer be the artificial intelligence we know today. As for human sciences that equipped themselves with a computable metalanguage, mobilized collective intelligence and at last mastered the algorithmic medium, they would no longer resemble the human sciences we have known since the eighteenth century: we would have crossed the threshold of a new episteme.

The designer

I grasped as early as the late 1970s that cognition was a social activity, equipped with intellectual technologies. I already had no doubt that algorithms were going to transform the world. And when I reflect on the meaning of my research activity over the past thirty years, I realize that it has always been oriented toward building algorithm-based cognitive tools.

In the late 1980s and early 1990s, designing expert systems and developing a method for knowledge engineering led me to discover the power of automated reasoning (I gave an account of this in De la programmation considérée comme un des beaux-arts, Paris, La Découverte, 1992). Expert systems are software programs that represent the knowledge of a group of specialists on a narrow subject by means of rules applied to a carefully structured database. I observed that this formalization of empirical know-how led to a transformation of the cognitive ecology of work groups, something like a local paradigm shift. I also verified in situ that rule-based systems in fact functioned as tools for communicating expertise within organizations, thus leading to a more effective collective intelligence. Finally, I experienced the limits of purely logic-based cognitive modeling: like today’s ontologies, it led only to compartmentalized micro-worlds of reasoning. The term “artificial intelligence”, which evokes machines capable of autonomous decisions, was therefore misleading.

I then devoted myself to designing a tool for the dynamic visualization of mental models (this project is explained in L’Idéographie dynamique, vers une imagination artificielle, La Découverte, Paris, 1991). This attempt allowed me to explore the semiotic complexity of cognition in general and of language in particular. I came to appreciate the power of tools for representing complex systems to augment cognition. But on that occasion I also discovered the limits of non-generative cognitive models, such as the one I had designed. To be truly useful, an intellectual augmentation tool had to be fully generative, able to simulate cognitive processes and bring forth new knowledge.

In the early 1990s I co-founded a start-up that marketed software for personal and collective knowledge management. I was involved in particular in inventing the product, and then in training and advising its users (see Les Arbres de connaissances, with Michel Authier, La Découverte, Paris, 1992). The Arbres de connaissances (“knowledge trees”) integrated a system for interactively representing the skills and knowledge of a community, together with a communication system that fostered the exchange and evaluation of knowledge. Unlike the tools of classical artificial intelligence, this one allowed all users to freely enrich the common database. What I retained from my experience in this company was the need to represent pragmatic contexts through immersive simulations, in which each selected set of data (people, knowledge, projects, etc.) reorganizes the space around it and automatically generates a singular representation of the whole: a point of view. But in this work I also met the challenge of semantic interoperability, which was to hold my attention for the following twenty-five years. Indeed, my experience as a tool builder and consultant in intellectual technologies had taught me that it was impossible to harmonize personal and collective knowledge management on a large scale without a common language. The publication of L’intelligence collective in 1994 translated into theory what I had glimpsed in my practice: new algorithm-supported tools for cognitive augmentation were going to support unprecedented forms of intellectual collaboration. But the potential of algorithms would be fully exploited only through a metalanguage gathering digitized data into the same system of semantic coordinates.

From the mid-1990s, while I devoted my free time to designing this coordinate system (which was not yet called IEML), I witnessed the gradual development of the interactive and social Web. For the first time, the Web offered a universal memory accessible regardless of the physical location of its storage media and its readers. Multimedia communication between points of the network was instantaneous. One only had to click on the address of a data collection to access it. To the designer of cognitive tools that I was, the Web appeared as an opportunity to be exploited.

The user

For nearly a quarter of a century I have taken part in many virtual communities and social media, especially those that equipped the collaborative curation of data. Thanks to the social bookmarking platforms Delicious and Diigo, I was able to experiment with pooling personal memories to form a collective memory, with the cooperative categorization of data, with folksonomies emerging from collective intelligence, and with tag clouds that show the semantic profile of a data set. By taking part in the adventure of the Twine platform created by Nova Spivack, between 2008 and 2010, I measured the strengths of collective data management centered on topics rather than on people. But I also experienced first-hand the inefficiency of Semantic Web ontologies (used by Twine, among others) in the collaborative curation of data. The success of Twitter and its ecosystem confirmed for me the power of the collective categorization of data, symbolized by the hashtag, which was eventually adopted by all social media. I quickly understood that tweets were metadata containing the identity of the author, a link to the data, a categorization by hashtag and a few words of appraisal. This structure is very promising for personal and collective knowledge management. But because Twitter is made first of all for the rapid circulation of information, its potential for a long-term collective memory is not sufficiently exploited. This is why I became interested in data curation platforms more oriented toward long-term memory, such as Bitly, Scoop.it! and Trove. On various forums I followed the development of semantic search engines, natural language processing techniques and big data analytics, without finding there the tools that would carry collective intelligence across a decisive threshold.

Finally, I observed how Google gathered the data of the Web into a single base and how the Mountain View firm exploited the collective curation of Internet users by means of its algorithms. Indeed, the search engine’s results are based on the hyperlinks we create and therefore on our involuntary collaboration. Everywhere in social media I saw collaborative management and statistical analysis of data developing, but at every step I ran into the semantic opacity that fragmented collective intelligence and limited its development.

The future algorithmic intelligence will necessarily rest on the universal hypertextual memory. But my experience of collaborative data curation confirmed the hypothesis I had developed as early as the beginning of the 1990s, even before the development of the Web. As long as semantics was not transparent to computation and interoperable, as long as a universal code had not broken down the partitions between languages and classification systems, our collective intelligence could make only limited progress.

My monitoring and experimentation fed my technical design work. During the years when I was building IEML, step by step, through trial and error, versions, reforms and fresh starts, I never lost heart. My observations confirmed to me every day that we needed a computable and interoperable semantics. I had to invent the collaborative data curation tool that would mirror our collective intelligences, still separated and fragmented. I could see developing before my eyes the human activity that would use this new tool. I therefore concentrated my efforts on designing a universal semantic platform where data curation would be automatically converted into a simulation of the curators’ collective intelligence.

My experience as a technical designer and practitioner has always preceded my theoretical syntheses. On the other hand, the design of tools had to be combined with the clearest possible knowledge of the function to be equipped. How can cognition be augmented without knowing what it is, without knowing how it works? And since, in the case that concerned me, the augmentation relied precisely on a leap in reflexivity, how could I have reflected, mapped or observed something of which I had no model? I therefore had to establish a correspondence between an interoperable tool for categorizing data and a theory of cognition. To be continued in my next book: L’intelligence algorithmique