
Pierre Lévy held a seminar on IEML over three afternoons (1 pm to 5 pm) on October 24, 25 and 26, 2022, at the Université de Montréal, in room C-8132, Pavillon Lionel-Groulx, 3150 Jean-Brillant.

To learn more about IEML, see this text in plain English, which takes about 15 minutes to read.

First session, Oct. 24, 1 pm to 5 pm

  • Video
  • General presentation of the IEML language and project
  • The new grammar and the new editor
  • The PowerPoint slides

Second session, Oct. 25, 1 pm to 5 pm

  • Video
  • Presentation of example ontologies in IEML (psychiatry and others)
  • How to design an ontology or a data model in IEML?
  • The PowerPoint slides

Third session, Oct. 26, 1 pm to 5 pm

  • Video
  • Presentation of the open-source IEML library (a large parser) in C++ by Louis van Beurden
  • How to turn IEML into a collective, collaborative open-source project?
  • The PowerPoint slides
  • The presentation by Louis van Beurden, who programmed the back end of the IEML editor, including the parser.

The issues at stake are set out in the text that follows.

The Université de Montréal

Research in the humanities and social sciences makes increasing use of databases, automatic analysis and even artificial intelligence. At the same time, research results are increasingly available online: on researchers’ blogs, on some social networks, on journal websites, and in specialized search engines such as ISIDORE. All of this makes the interoperable categorization of data and documents in the humanities and social sciences a crucial problem. The question did not arise (or was less serious) when each library, or even each country, had its own coherent classification system. But in the new digital space, the multiplicity of languages and of incompatible classification systems fragments memory.


A first level of response to this problem is provided by *standard formats* for semantic metadata, notably RDF (Resource Description Framework), proposed by the World Wide Web Consortium. Other standard formats such as JSON-LD and GraphQL should also be mentioned. But in all these cases the interoperability is only technical, at the level of file format. To address semantic interoperability (which concerns the coherence of concept architectures), *standard models* have been developed, for example schema.org for websites, CIDOC-CRM for the cultural domain, and so on. Such models exist for many fields, from finance to medicine, but (let us note) none of them unifies the humanities as a whole. Not only do several models compete within a single domain, but the models themselves are hypercomplex and relatively rigid, to the point that even specialists master only a small part of them. Moreover, these models are expressed in natural languages, most often English, with the translation and ambiguity problems that this entails.
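To make the difference between a standard format and a standard model concrete, here is a minimal sketch in Python that serializes one metadata record as JSON-LD using schema.org terms. The article title, author name and keywords are invented for illustration; only the schema.org type and property names are real.

```python
import json

# A minimal JSON-LD description of a journal article using schema.org terms.
# The title, author and keywords below are invented for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "An example article",
    "inLanguage": "fr",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "keywords": ["collective intelligence", "semantic interoperability"],
}


def to_jsonld(record: dict) -> str:
    """Serialize a metadata record as a JSON-LD string."""
    return json.dumps(record, ensure_ascii=False, indent=2)


print(to_jsonld(article))
```

The file format and the property names are shared, but the values of `keywords` are still free strings in one natural language, which is where the translation and ambiguity problems described above come back in.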


To solve the problem of semantic interoperability in the categorization of data in the humanities and social sciences, we propose to experiment with an approach that is both more flexible and more general than that of standard models: a standard documentary language capable of expressing any model or ontology and translatable into all natural languages. A brief description of IEML in French can be found here.


IEML (Information Economy MetaLanguage), developed by Pierre Lévy over several years, is an artificial language (1) with the same power of expression and translation as any natural language, and (2) whose grammar and semantics are regular and computable. IEML is the only language to have both properties. IEML can serve as a metadata system, ensuring the semantic interoperability of databases whatever the domain. Thanks to its regular nature, IEML is also intended to support the next generation of “neuro-semantic” artificial intelligence. See on this blog an article of about twenty pages that situates IEML in the general landscape of artificial intelligence. An open-source tool, the IEML editor (based on a parser written in C++), makes it possible to model complex domains in fine detail by means of knowledge graphs or ontologies. The models are generated using an original declarative programming language, and it will be possible to explore them interactively in the form of hypertexts, tables and graphs and to export them in any standard format.


The overall objective of the seminar is to bring together established and emerging leaders in research, publishing and data mining in the digital humanities to take stock of recent developments in IEML. In particular, an ontology that has already been built will be presented, along with the methodological lessons drawn from ongoing work. The three days of intensive exchanges will be held under the direction of Pierre Lévy (Associate Professor at the Université de Montréal, member of the Royal Society of Canada) and Marcello Vitali-Rosati (Canada Research Chair in Digital Writing and Full Professor of French Literature at the Université de Montréal).

Photo taken by Luc Courchesne during the session of October 25, 2022

More than 60% of the human population is connected to the Internet, most sectors of activity have gone digital, and software drives innovation. Yet the Internet’s standards and protocols were invented at a time when less than one percent of the population was connected. It is time to put data flows, the available computing power and the new possibilities of interactive communication at the service of human development, and of solving the serious problems we face. That is why I will soon launch an international project, comparable to the construction of a cyclotron or a voyage to Mars, around an augmentation of the Internet in the service of collective intelligence.

Saturn (Voyager photo)

This project has several interdependent objectives:

  • Decompartmentalize digital memory and ensure its semantic (linguistic, cultural and disciplinary) interoperability.
  • Open up indexing modes and maximize the diversity of interpretations of digital memory.
  • Make communication between machines, and between humans and machines, more fluid in order to ensure our collective mastery of the Internet of Things, smart cities, robots, autonomous vehicles, etc.
  • Establish new forms of modeling and reflexive observation of human collective intelligence on the basis of our shared memory.

IEML

The technical foundation of this project is IEML (Information Economy MetaLanguage), a semantic metadata system that I invented, notably thanks to the support of the Canadian federal government. IEML has:

  • the expressive power of a natural language,
  • the syntax of a regular language,
  • computable semantics aligned with its syntax.

IEML exports to RDF and is based on Web standards. IEML concepts are called USLs (Uniform Semantic Locators). They can be read and translated into any natural language. Semantic ontologies (sets of USLs linked by a network of relationships) are interoperable by construction. IEML establishes a virtual knowledge base that feeds both automatic reasoning and statistical calculations. In short, IEML fulfills the promise of the Semantic Web through its computable meaning and its interoperable ontologies.

For a short description of the IEML grammar, click here.

Intlekt

The URL system and the HTTP standard only become useful through a browser. In the same way, the new IEML-based semantic addressing system for the Internet requires a dedicated application, called Intlekt, whose technical project manager is Louis van Beurden. Intlekt is a collaborative and distributed platform that supports concept editing, data curation and new forms of search, data mining and data visualization.

Intlekt makes it possible to edit and publish semantic ontologies (sets of related concepts) linked to a field of practice or knowledge. These ontologies can be original or can translate existing semantic metadata such as thesauri, documentary languages, ontologies, SKOS taxonomies, folksonomies, sets of tags or hashtags, keywords, column and row headings, etc. Published semantic ontologies augment a dictionary of concepts, which can be considered an open meta-ontology.

Intlekt is also a data curation tool. It makes it possible to edit, index in IEML and publish data collections that feed a common knowledge base. Eventually, statistical algorithms could be used to automate the semantic indexing of data.

Finally, Intlekt exploits the properties of IEML to enable new forms of search, automatic reasoning and simulation of complex systems.

Specific applications can be imagined in many areas, such as:

  • the preservation of cultural heritage,
  • research in the humanities and the digital humanities,
  • education and training,
  • public health,
  • informed democratic deliberation,
  • commercial transactions,
  • smart contracts,
  • the Internet of Things,
  • etc.

What now?

Where does this project stand in the summer of 2020? After many trials spread over several years, the IEML grammar has stabilized, as has the base of about 3000 words from which any concept can be built at will. I have successfully tested the expressive possibilities of the language in several fields of the humanities and earth sciences. Nevertheless, as I write these lines, the latest state of the grammar has not yet been implemented. Moreover, obtaining a version of Intlekt that supports the semantic ontology editing, data curation and data mining functions described above will require a team of several programmers working for a year. In the coming months, the friends of IEML will work on assembling this critical mass.

Join us!

For more information, see:
INTLEKT.io

https://pierrelevyblog.com/my-research-in-a-nutshell/

and https://pierrelevyblog.com/my-research-in-a-nutshell/the-basics-of-ieml/

More than 60% of the human population is connected to the Internet, most sectors of activity have gone digital, and software drives innovation. Yet Internet standards and protocols were invented at a time when less than one percent of the population was connected. It is time to use the data flows, the available computing power and the possibilities of interactive communication for human development, and to solve the serious problems we are facing. That is why I will soon launch a major international project, comparable to the construction of a cyclotron or a voyage to Mars, aimed at an augmentation of the Internet in the service of collective intelligence.

This project has several interrelated objectives: 

  • Decompartmentalize digital memory and ensure its semantic (linguistic, cultural and disciplinary) interoperability.
  • Open up indexing modes and maximize the diversity of interpretations of the digital memory.
  • Make communication between machines, and between humans and machines, more fluid in order to ensure our collective mastery of the Internet of Things, smart cities, robots, autonomous vehicles, etc.
  • Establish new forms of modeling and reflexive observation of human collective intelligence on the basis of our common memory.

IEML

The technical foundation of this project is IEML (Information Economy MetaLanguage), a semantic metadata system that I invented with support from the Canadian federal government. IEML has:

  • the expressive power of a natural language, 
  • the syntax of a regular language, 
  • calculable semantics aligned with its syntax.

IEML is exported in RDF and is based on Web standards. IEML concepts are called USLs (Uniform Semantic Locators). They can be read and translated into any natural language. Semantic ontologies – sets of USLs linked by a network of relationships – are interoperable by design. IEML establishes a virtual knowledge base that feeds both automatic reasoning and statistical calculations. In short, IEML fulfills the promise of the Semantic Web through its computable meaning and interoperable ontologies.
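As an illustration of what exporting a small ontology to RDF can look like, the sketch below writes relations between concepts as N-Triples lines. The `https://example.org/usl/` namespace and the concept identifiers are hypothetical placeholders (the actual IEML RDF vocabulary is not described here), and the use of the SKOS `broader` property is an assumption made only to have a real predicate to point at.

```python
# Hypothetical namespace: the actual IEML RDF vocabulary is not given in the text.
BASE = "https://example.org/usl/"
# Real SKOS property, chosen here only as an example of a relation type.
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """Format one RDF statement between IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def export(relations) -> str:
    """Turn (narrower, broader) identifier pairs into N-Triples text."""
    return "\n".join(
        n_triple(BASE + child, SKOS_BROADER, BASE + parent)
        for child, parent in relations
    )


# Placeholder identifiers standing in for USLs.
relations = [("concept-b", "concept-a"), ("concept-c", "concept-a")]
print(export(relations))
```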

For a short description of the IEML grammar, click here.

Intlekt

The URL system and the HTTP standard only become useful through a browser. Similarly, the new IEML-based semantic addressing system for the Internet requires a special application, called Intlekt, whose technical project manager is Louis van Beurden. Intlekt is a collaborative and distributed platform that supports concept editing, data curation and new forms of search, data mining and data visualization.

Intlekt supports the editing and publishing of semantic ontologies (sets of linked concepts) related to a field of practice or knowledge. These ontologies can be original or can translate existing semantic metadata such as thesauri, documentary languages, ontologies, SKOS taxonomies, folksonomies, sets of tags or hashtags, keywords, column and row headings, etc. Published semantic ontologies augment a dictionary of concepts, which can be considered an open meta-ontology.

Intlekt is also a data curation tool. It enables editing, indexing in IEML and publishing data collections that feed a common knowledge base. Eventually, statistical algorithms will be used to automate the semantic indexing of data.

Finally, Intlekt exploits the properties of IEML to allow new forms of search, automatic reasoning and simulation of complex systems.

Specific applications can be imagined in many areas, such as:

  • the preservation of cultural heritage, 
  • research in the humanities (digital humanities), 
  • education and training,
  • public health, 
  • informed democratic deliberation, 
  • commercial transactions, 
  • smart contracts, 
  • the Internet of things, 
  • and so on…

What now?

Where do we stand on this project in the summer of 2020? After many tests over several years, IEML’s grammar has stabilized, as has the base of about 5000 morphemes from which any concept can be built at will. I have successfully tested the expressive possibilities of the language in several fields of the humanities and earth sciences. Nevertheless, at the time of writing, the latest state of the grammar has not yet been implemented. Moreover, obtaining a version of Intlekt that supports the semantic ontology editing, data curation and data mining functions described above will require a team of several programmers working for one year. In the coming months, the friends of IEML will be busy assembling this critical mass.

Come and join us!

For more information, see: https://pierrelevyblog.com/my-research-in-a-nutshell/ and https://pierrelevyblog.com/my-research-in-a-nutshell/the-basics-of-ieml/

IEML (the Information Economy Meta Language) has four main directions of research and development in 2019: in mathematics, data science, linguistics and software development. This blog entry reviews them successively.

1- A mathematical research program

I give here a philosophical description of the structure of IEML. The purpose of the mathematical research to come is to give a formal description and to draw from this formalization as much useful information as possible on the calculation of relationships, distances, proximities, similarities, analogies, classes and so on, as well as on the complexity of these calculations. I produced a formalization document in 2015 with the help of Andrew Roczniak, PhD, but this document has now (2019) been overtaken by the evolution of the IEML language. The Brazilian physicist Wilson Simeoni Junior has volunteered to lead this research sub-program.

IEML Topos

The “topos” is a structure that was identified by the great mathematician Alexander Grothendieck, who “is considered as the re-founder of algebraic geometry and, as such, as one of the greatest mathematicians of the 20th century” (see Wikipedia).

Without going into technical details, a topos is a bi-directional relationship between, on the one hand, an algebraic structure, usually a “category” (intuitively a group of transformations of transformation groups) and, on the other hand, a spatial structure, which is geometric or topological. 

In IEML, thanks to a normalization of the notation, each expression of the language corresponds to an algebraic variable and only one. Symmetrically, each algebraic variable corresponds to one linguistic expression and only one. 

Topologically, each variable in IEML algebra (i.e. each expression of the language) corresponds to a “point”. These points are arranged in nested, recursive scales of complexity: primitive variables, morphemes of different layers, characters, words, sentences, super-sentences and texts. From the morpheme level upward, the internal structure of each point (which comes from the function or functions that generated it) automatically determines all the semantic relationships that this point has with the other points, and these relationships are modelled as connections. There are obviously a large number of connection types, some very general (is contained in, has an intersection with, has an analogy with…), others more precise (is an instrument of, contradicts X, is logically compatible with, etc.).

The topos that matches all the expressions of the IEML language with all the semantic relationships between them is called “The Semantic Sphere”.
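The idea that connections follow automatically from the internal structure of each point can be sketched in a few lines of Python. This is a toy model under stated assumptions: points are represented as sets of placeholder morpheme labels (`m1`, `m2`, …, not real IEML morphemes), and only the two most general connection types named above are derived.

```python
from itertools import combinations

# Placeholder labels, not real IEML morphemes or characters.
characters = {
    "c1": frozenset({"m1", "m2"}),
    "c2": frozenset({"m1", "m2", "m3"}),
    "c3": frozenset({"m3", "m4"}),
}


def relations(points):
    """Derive typed connections between points from their internal structure."""
    edges = set()
    for (a, sa), (b, sb) in combinations(sorted(points.items()), 2):
        if sa < sb:  # strict subset
            edges.add((a, "is contained in", b))
        elif sb < sa:
            edges.add((b, "is contained in", a))
        elif sa & sb:  # overlap without containment
            edges.add((a, "has an intersection with", b))
    return edges


for edge in sorted(relations(characters)):
    print(edge)
```

Nobody declares these connections by hand: they are computed from the composition of the points, which is the property the text attributes to IEML expressions.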

Algebraic structure of IEML

In the case of IEML, the algebraic structure reduces to:

  • 1. Six primitive variables.
  • 2. A non-commutative multiplication with three variables (substance, attribute and mode). The IEML multiplication is isomorphic to the triplet “departure vertex, arrival vertex, edge” used to describe graphs.
  • 3. A commutative addition that creates a set of objects.
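The two operations can be mimicked with ordinary Python data structures. The sketch below is only an analogy, not the IEML implementation: the class and function names are invented, and the operands are placeholder strings rather than the six primitive variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Non-commutative multiplication: the order of the three roles matters."""
    substance: str
    attribute: str
    mode: str

    def as_edge(self):
        # Same shape as (departure vertex, arrival vertex, edge label).
        return (self.substance, self.attribute, self.mode)


def add(*terms):
    """Commutative addition: an unordered set of objects."""
    return frozenset(terms)


p = Product("A", "B", "m")
q = Product("B", "A", "m")

print(p == q)                  # swapping substance and attribute changes the product
print(add(p, q) == add(q, p))  # the order of the terms of a sum does not matter
print(p.as_edge())
```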

This algebraic structure is used to construct the following functions and levels of variables…

1. Functions using primitive variables, called “morpheme paradigms”, have as inputs morphemes at layer n and as outputs morphemes at layer n+1. Morpheme paradigms include additions, multiplications, constants and variables and are visually presented in the form of tables in which rows and columns correspond to certain constants.

2. “Character paradigms” are complex additive functions that take morphemes as inputs and characters as outputs. Character paradigms include a group of constant morphemes and several groups of variables. A character is composed of 1 to 5 morphemes arranged in IEML alphabetical order. (Characters may not include more than five morphemes for cognitive management reasons).

3. IEML characters are assembled into words (a substance character, an attribute character, a mode character) by means of a multiplicative function called a “word paradigm”. A word paradigm intersects a series of characters in substance and a series of characters in attribute. The modes are chosen from predefined auxiliary character paradigms, depending on whether the word is a noun, a verb or an auxiliary. Words express subjects, keywords or hashtags. A word can be composed of only one character.

4. Sentence-building functions assemble words by means of multiplication and addition, with the necessary constraints to obtain grammatical trees. Mode words describe the grammatical/semantic relationships between substance words (roots) and attribute words (leaves). Sentences express facts, propositions or events; they can take on different pragmatic and logical values.

5. Super-sentences are generated by means of multiplication and addition of sentences, with constraints to obtain grammatical trees. Mode sentences express relationships between substance sentences and attribute sentences. Super-sentences express hypotheses, theories or narratives.

6. A USL (Uniform Semantic Locator) or IEML text is an addition (a set) of words, sentences and super-sentences. 
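The lower levels of this hierarchy can be sketched as nested data types. This is a minimal illustration under several assumptions: the class names are invented, morphemes are placeholder strings, plain string sorting stands in for IEML alphabetical order, and sentences and super-sentences are left out.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MORPHEMES = 5  # upper bound stated in the text


@dataclass(frozen=True)
class Character:
    """1 to 5 morphemes kept in a canonical order."""
    morphemes: tuple

    @staticmethod
    def of(*morphemes: str) -> "Character":
        if not 1 <= len(morphemes) <= MAX_MORPHEMES:
            raise ValueError("a character holds 1 to 5 morphemes")
        # Assumption: plain string sorting stands in for IEML alphabetical order.
        return Character(tuple(sorted(morphemes)))


@dataclass(frozen=True)
class Word:
    """A substance character, optionally with attribute and mode characters."""
    substance: Character
    attribute: Optional[Character] = None
    mode: Optional[Character] = None


def usl(*expressions) -> frozenset:
    """A USL is an addition, i.e. an unordered set, of expressions."""
    return frozenset(expressions)


w1 = Word(Character.of("m2", "m1"))  # a word made of a single character
w2 = Word(Character.of("m1"), Character.of("m3"), Character.of("m4"))
text = usl(w1, w2)
print(w1.substance.morphemes, len(text))
```

Because characters are normalized on construction, two characters built from the same morphemes in a different order compare equal, which mirrors the one-to-one correspondence between expressions and algebraic variables described earlier.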

Topological structure of IEML: a semantic rhizome

Static

The notion of rhizome (a term borrowed from botany) was developed on a philosophical level by Deleuze and Guattari in the introduction to Mille Plateaux (Minuit 1980). In this Deleuzo-Guattarian lineage, by rhizome I mean here a complex graph whose points or “vertices” are organized into several levels of complexity (see the algebraic structure) and whose connections intersect several regular structures such as series, tree, matrix and clique. In particular, it should be noted that some structures of the IEML rhizome combine hierarchical or genealogical relationships (in trees) with transversal or horizontal relationships between “leaves” at the same level, which therefore do not respect the “hierarchical ladder”.

Dynamic

We can distinguish the abstract, or virtual, rhizomatic grid drawn by the grammar of the language (the sphere to be dug) and the actualisation of points and relationships by the users of the language (the dug sphere of chambers and galleries).  Characters, words, sentences, etc. are all chambers in the centre of a star of paths, and the generating functions establish galleries of “rhizomatic” relationships between them, as many paths for exploring the chambers and their contents. It is therefore the users, by creating their lexicons and using them to index their data, communicate and present themselves, who shape and grow the rhizome…

Depending on whether circuits are more or less used, on the quantity of data or on the strength of interactions, the rhizome undergoes – in addition to its topological transformations – various types of quantitative or metric transformations. 
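The distinction between the topology of the rhizome and its quantitative transformations can be illustrated with a small weighted graph. In this sketch the vertex names are placeholders, and counting traversals is just one assumed way of measuring how much a circuit is used.

```python
from collections import Counter

# Hierarchical (tree) edges: parent -> child. Labels are placeholders.
tree_edges = {("root", "leaf1"), ("root", "leaf2"), ("root", "leaf3")}
# Transversal edges between leaves at the same level.
transversal_edges = {("leaf1", "leaf2"), ("leaf2", "leaf3")}
all_edges = tree_edges | transversal_edges

usage = Counter()  # metric layer: how often each connection is used


def traverse(path):
    """Record one user's walk through the graph, edge by edge."""
    for edge in zip(path, path[1:]):
        if edge not in all_edges:
            raise ValueError(f"no connection {edge}")
        usage[edge] += 1


traverse(["root", "leaf1", "leaf2"])
traverse(["root", "leaf1"])
print(usage.most_common())
```

The set of edges (the topology) is unchanged by these walks; only the weights carried by the edges evolve with use.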

* The point to remember is that IEML is a language with calculable semantics because it is also an algebra (in the broad sense) and a complex topological space. 

* In the long term, IEML will be able to serve as a semantic coordinate system for the information world at large.

2- A research program in data science

The person in charge of the data science research sub-program is the software engineer Louis van Beurden (Eng. ENSIMAG, France), who also holds a master’s degree in data science and machine translation from the University of Montréal, Canada. Louis is planning to complete a PhD in computer science in order to test the hypothesis that, from a data science perspective, a semantic metadata system in IEML is more efficient than a semantic metadata system in natural language and phonetic writing. This doctoral research will make it possible to implement phases A and B of the program below and to carry out our first experiment.

Background information

The basic cycle in data science can be schematized according to the following loop:

  • 1. selection of raw data,
  • 2. pre-processing, i.e. cleaning data and metadata imposition (cataloguing and categorization) to facilitate the exploitation of the results by human users,
  • 3. statistical processing,
  • 4. visual and interactive presentation of results,
  • 5. exploitation of the results by human users (interpretation, storytelling) and feedback on steps 1, 2, 3
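Steps 1 to 4 of this loop can be sketched as a chain of small functions. The data, the categorization rule and the function names below are all invented for illustration; step 5 (interpretation and feedback) is the human part and is not modelled.

```python
from collections import Counter


def select(raw):
    """Step 1: keep only non-empty documents."""
    return [d for d in raw if d["text"].strip()]


def preprocess(docs, categorize):
    """Step 2: clean the text and attach a category as metadata."""
    return [
        {"text": d["text"].strip().lower(), "category": categorize(d["text"])}
        for d in docs
    ]


def process(docs):
    """Step 3: a trivial statistic, the number of documents per category."""
    return Counter(d["category"] for d in docs)


def present(stats):
    """Step 4: a textual stand-in for a visual presentation."""
    return [f"{category}: {count}" for category, count in sorted(stats.items())]


def categorize(text):
    """Hypothetical categorization rule used for the example."""
    return "graphs" if "graph" in text.lower() else "other"


raw = [
    {"text": " Rhizome and graphs "},
    {"text": ""},
    {"text": "Graphs of concepts"},
    {"text": "Public health data"},
]
report = present(process(preprocess(select(raw), categorize)))
print(report)
```

Everything downstream of `preprocess` depends on the categories it attaches, which is why the quality of the metadata matters so much for the results.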

Biases or poor-quality results may have several causes, but they often come from poor pre-processing. In line with the old computing adage “garbage in, garbage out”, it is the professional responsibility of data scientists to ensure the quality of the input data, and therefore not to neglect the pre-processing phase, in which this data is organized using metadata.

Two types of metadata can be distinguished: 1) semantic metadata, which describes the content of documents or datasets, and 2) ordinary metadata, which describes authors, creation dates, file types, etc. Let us call “semantic pre-processing” the imposition of semantic metadata on data.

Hypothesis

Since IEML is a univocal language and the semantic relationships between morphemes, words, sentences, etc. are mathematically computable, we assume that a semantic metadata system in IEML is more efficient than a semantic metadata system in natural language and phonetic writing. Of course, the efficiency in question is related to a particular task: search, data analysis, knowledge extraction from data, machine learning, etc.

In other words, compared to a “tokenization” of semantic metadata in phonetic writing noting a natural language, a “tokenization” of semantic metadata in IEML would ensure better processing, better presentation of results to the user and better exploitation of results. In addition, semantic metadata in IEML would allow datasets that use different languages, classification systems or ontologies to be de-compartmentalized, merged and compared.

Design of the first experiment

The ideal way to do an experiment is to consider a multi-variable system and transform only one of the system variables, all other things being equal. In our case, it is only the semantic metadata system that must vary. This will make it easy to compare the system’s performance with one (phonetic tokens) or the other (semantic tokens) of the semantic metadata systems.

  • – The dataset of our first experiment encompasses all the articles of the Sens Public scientific journal.
  • – Our ordinary metadata are the author, publication date, etc.
  • – Our semantic metadata describe the content of articles.
  •     – In phonetic tokens, using RAMEAU categories, keywords and summaries,
  •     – In IEML tokens by translating phonetic tokens.
  • – Our processes are “big data” algorithms traditionally used in natural language processing 
  •     – An algorithm for calculating the co-occurrences of keywords.
  •     – A TF-IDF (Term Frequency / Inverse Document Frequency) algorithm that works from a word / document matrix.
  •     – A clustering algorithm based on “word embeddings” of keywords in articles (documents are represented by vectors, in a space with as many dimensions as words).
  • – A user interface will offer a particular way to access the database. This interface will obviously be adapted to the user’s task (which remains to be chosen, but could be of the “data analytics” type).
  • Result 1 corresponds to the execution of the “machine task”, i.e. the establishment of a network of connections between the articles (relationships, proximities, groupings, etc.). We will have to compare:
  •     – result 1.1, based on the use of phonetic tokens, with
  •     – result 1.2, based on the use of IEML tokens.
  • Result 2 corresponds to the execution of the selected user task (data analytics, navigation, search, etc.). We will have to compare:
  •     – result 2.1, based on the use of phonetic tokens, with
  •     – result 2.2, based on the use of IEML tokens.
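Two of the processes listed above, keyword co-occurrence counting and TF-IDF, are simple enough to sketch in plain Python. The keyword lists are invented, documents are assumed to be non-empty, and the TF-IDF variant used here (relative term frequency times the natural logarithm of N/df) is one common choice among several; the clustering step is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrences(docs):
    """Count how often two keywords index the same document."""
    counts = Counter()
    for keywords in docs:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * ln(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Invented keyword lists standing in for three indexed articles.
docs = [
    ["memory", "language"],
    ["memory", "network"],
    ["memory", "language", "network"],
]
print(cooccurrences(docs).most_common())
print(tf_idf(docs)[0])
```

A keyword that appears in every document, such as `memory` here, gets a TF-IDF weight of zero: it does not help to tell the documents apart.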

Step A: First indexing of a database in IEML

Reminder: the data are the articles of the scientific journal, and the semantic metadata are the categories, keywords and summaries of the articles. From these categories, keywords and summaries, a glossary of the knowledge area covered by the journal is created, or of a sub-domain if the task turns out to be too difficult. It should be noted that in 2019 we do not yet have the software tools to create the IEML sentences and super-sentences that would allow us to express facts, propositions, theories, narratives, hypotheses, etc. Sentences and super-sentences, perhaps accessible in a year or two, will therefore have to wait for a later phase of the research.

The creation of the glossary will be the work of a project community, linked to the editors of the Sens Public journal and the Canada Research Chair in Digital Writing (led by Prof. Marcello Vitali-Rosati) at the Université de Montréal (Digital Humanities). Pierre Lévy will accompany this community and help it to identify the constants and variables of its lexicon. One of the auxiliary goals of the research is to verify whether motivated communities can appropriate IEML to categorize their data. Once we are satisfied with the IEML indexing of the article database, we will proceed to the next step.

Step B: First experimental test

  • 1. The test is designed to measure the difference between results based on phonetic tokens and results based on IEML tokens.
  • 2. All data processing operations are carried out on the data.
  • 3. The results (machine tasks and user tasks) are compared with both types of tokens.

The experiment can, if necessary, be repeated iteratively with minor modifications until satisfactory results are achieved.
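The shape of such a test, in which the same processing runs twice and only the token system changes, can be sketched as follows. Everything in this example is hypothetical: the keyword lists, the concept identifiers `C1` and `C2` (which are not IEML expressions) and the mapping between them. The outcome is built into the toy mapping, so the sketch illustrates the experimental design and says nothing about whether the hypothesis holds.

```python
def jaccard(a, b):
    """Share of tokens that two documents have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def run(docs, tokenize):
    """Machine task: similarity of every pair of documents under one token system."""
    tokens = {name: tokenize(keywords) for name, keywords in docs.items()}
    names = sorted(tokens)
    return {
        (x, y): jaccard(tokens[x], tokens[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical mapping from natural-language keywords to concept identifiers.
TO_CONCEPT = {"car": "C1", "voiture": "C1", "road": "C2", "route": "C2"}

# Same dataset for both runs; only the tokenization varies.
docs = {"a": ["car", "road"], "b": ["voiture", "route"]}

phonetic = run(docs, lambda keywords: keywords)
semantic = run(docs, lambda keywords: [TO_CONCEPT[k] for k in keywords])
print(phonetic, semantic)
```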

If the hypothesis is confirmed, we proceed to the next step.

Step C: Towards an automation of semantic pre-processing in IEML.

If the superior efficiency of IEML tokens for semantic metadata is demonstrated, then there will be a strong interest in maximizing the automation of IEML semantic pre-processing.

The algorithms used in our experiment are themselves powerful tools for data pre-processing. They can be used, according to methods yet to be developed, to partially automate semantic indexing in IEML. The “word embeddings” will make it possible to study how IEML words are correlated with the natural language lexical statistics of the articles and to detect anomalies. For example, we will check whether similar USLs (a USL is an IEML text) point to very different texts, or whether very similar texts have very different USLs.
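One possible way to look for such anomalies is to compare, for every pair of documents, the similarity of their indexes with the similarity of their texts. The sketch below makes several simplifying assumptions: USLs are reduced to sets of placeholder identifiers, text similarity is a bag-of-words cosine rather than a word-embedding distance, and the 0.5 threshold is arbitrary.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap between two sets of index terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def anomalies(docs, gap=0.5):
    """Flag pairs whose index similarity and text similarity differ by more than gap."""
    flagged = []
    for x, y in combinations(sorted(docs), 2):
        s_index = jaccard(docs[x]["usl"], docs[y]["usl"])
        s_text = cosine(docs[x]["words"], docs[y]["words"])
        if abs(s_index - s_text) > gap:
            flagged.append((x, y))
    return flagged


# Invented documents; "u1", "u2", "u3" are placeholders for index terms.
docs = {
    "d1": {"usl": ["u1", "u2"], "words": "the city and its network".split()},
    "d2": {"usl": ["u1", "u2"], "words": "a recipe for bread".split()},
    "d3": {"usl": ["u3"], "words": "the city and its network grows".split()},
}
print(anomalies(docs))
```

Here `d1` and `d2` share an index but no vocabulary, while `d1` and `d3` share almost all their vocabulary but no index term; both pairs are candidates for a human to review.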

Finally, methods will be developed to use deep learning algorithms to automatically index datasets in IEML.

Step D: Research and development perspective in Semantic Machine Learning

If step C provides the expected results, i.e. methods using AI to automate the indexing of data in IEML, then big data indexed in IEML will be available. As progress is made, semantic metadata may become increasingly similar to textual data (summaries of sections, paragraphs, sentences, etc.) until full translation into IEML is achieved, which remains a distant objective.

The data indexed in IEML could then be used to train artificial intelligence algorithms. The hypothesis that machines learn more easily when data is categorized in IEML could easily be validated by experiments of the same type as described above, by comparing the results obtained from training data indexed in IEML and the results obtained from the same data indexed in natural languages.

This last step paves the way for a better integration of statistical AI and symbolic AI (based on facts and rules, which can be expressed in IEML).

3- A research program in linguistics, humanities and social sciences

Introduction

The semiotic and linguistic development program has two interdependent components:

1. The development of the IEML metalanguage

2. The development of translation systems and bridges between IEML and other sign systems, in particular… 

  •     – natural languages,
  •     – logical formalisms,
  •     – pragmatic “language games” and games in general,
  •     – iconic languages,
  •     – artistic languages, etc.

This research and development agenda, particularly in its linguistic dimension, is important for the digital humanities. Indeed, IEML can serve as a system of semantic coordinates of the cultural universe, thus allowing the humanities to cross a threshold of scientific maturity that would bring their epistemological status closer to that of the natural sciences. Using IEML to index data and to formulate assumptions would result in….

  • (1) a de-siloing of the databases used by researchers in the social sciences and humanities, which would allow for the sharing and comparison of categorization systems and interpretive assumptions;
  • (2) an improved analysis of data;
  • (3) ultimately, as set out in the article “The Role of the Digital Humanities in the New Political Space” (http://sens-public.org/article1369.html in French), a reflective collective intelligence of the social sciences and humanities research community.

But IEML’s research program in the perspective of the digital humanities – as well as its research program in data science – requires a living and dynamic semiotic and linguistic development program, some aspects of which I will outline here.

IEML and the Meaning-Text Theory

IEML’s linguistic research program is very much based on the Meaning-Text theory developed by Igor Melchuk and his school. “The main principle of this theory is to develop formal and descriptive representations of natural languages that can serve as a reliable and convenient basis for the construction of Meaning-Text models, descriptions that can be adapted to all languages, and therefore universal.” (Excerpt translated from the Wikipedia article on Igor Melchuk.) Dictionaries developed by linguists in this field connect words according to universal “lexical functions” identified through the analysis of many languages. These lexical functions have been formally transposed into the very structure of IEML (see the IEML Glossary Creation Guide) so that the IEML dictionary can be organized by the same tools (e.g. Spiderlex) as those of the Meaning-Text Theory research network. Conversely, IEML could be used as a pivot language – or concept description language – *between* the natural language dictionaries developed by the network of researchers skilled in Meaning-Text theory.

Construction of specialized lexicons in the humanities and social sciences

A significant part of the IEML lexicon will be produced by communities having decided to use IEML to mark out their particular areas of knowledge, competence or interaction. Our research in specialized lexicon construction aims to develop the best methods to help expert communities produce IEML lexicons. One of the approaches consists in identifying the “conceptual skeleton” of a domain, namely its main constants in terms of character paradigms and word paradigms. 

The first experimentation of this type of collaborative construction of specialized lexicons by experts will be conducted by Pierre Lévy in collaboration with the editorial team of the Sens Public scientific journal and the Canada Research Chair in Digital Textualities at the University of Montréal (led by Prof. Marcello Vitali-Rosati). Based on a determination of their economic and social importance, other specialized glossaries can be constructed, for example on the theme of professional skills, e-learning resources, public health prevention, etc.

Ultimately, the “digital humanities” branch of IEML will need to collaboratively develop a conceptual lexicon of the humanities to be used for the indexing of books and articles, but also of chapters, sections and comments in documents. The same glossary should also facilitate data navigation and analysis. There is a whole program of development in digital library science here. I would particularly like to focus on the human sciences because the natural sciences have already developed a formal vocabulary on which there is consensus.

Construction of logical, pragmatic and narrative character-tools

Once we have a sentence and super-phrase editor, we plan to establish a correspondence between IEML on the one hand and propositional calculus and first-order logic on the other. This will be done by specifying special character-tools to implement logical functions. Particular attention will be paid to formalizing the definition of rules and the declaration that “facts” are true in IEML. It should be noted in passing that, in IEML, grammatical expressions represent classes, sets or categories, whereas logical individuals (proper names, numbers, etc.) or instances of classes are represented by “literals” expressed in ordinary characters (phonetic alphabets, Chinese characters, Arabic numerals, URLs, etc.).

In anticipation of practical use in communication, games, commerce, law (smart contracts), chatbots, robots, the Internet of Things, etc., we will develop a range of character-tools with illocutionary force such as “I offer”, “I buy”, “I quote”, “I give an instruction”, etc.

Finally, we will make the work of authors of super-sentences easier by developing a range of character-tools implementing “narrative functions”.

4 A software development program

A software environment for the development and public use of the IEML language

Logically, the first multi-user IEML application will be dedicated to the development of the language itself. This application is composed of the following three web modules.

  • 1. A morpheme editor that also allows you to navigate in the morphemes database, or “dictionary”.
  • 2. A character and word editor that also allows navigation in the “lexicon”.
  • 3. A navigation and reading tool in the IEML library as a whole, or “IEML database” that brings together the dictionary and lexicon, with translations, synonyms and comments in French and English for the moment.

The IEML database is a “Git” database and is currently hosted by GitHub. Indeed, a Git database makes it possible to record successive versions of the language, as well as to monitor and model its growth. It also allows large-scale collaboration among teams capable of developing specific branches of the lexicon independently and then integrating them into the main branch after discussion, as is done in the collaborative development of large software projects. As soon as a sub-lexicon is integrated into the main branch of the Git database, it becomes a “common” usable by everyone (according to the latest General Public License version).

Morpheme and word editors are actually “Git clients” that feed the IEML database. A first version of this collaborative read-write environment should be available in the fall of 2019 and then tested by real users: the editors of the Scientific Journal “Sens Public” as well as other participants in the University of Montréal’s IEML seminar.

The following versions of the IEML read/write environment should allow the editing of sentences and texts as well as literals that are logical individuals not translated into IEML, such as proper names, numbers, URLs, etc.

A social medium for collaborative knowledge management

A large number of applications using IEML can be considered, both commercial and non-commercial. Among all these applications, one of them seems to be particularly aligned with the public interest: a social medium dedicated to collaborative knowledge and skills management. This new “place of knowledge” could allow the online convergence of the missions of… 

  • – museums and libraries, 
  • – schools and universities, 
  • – companies and administrations (with regard to their knowledge creation and management dimension), 
  • – smart cities, employment agencies, civil society networks, NGOs, associations, etc.

According to its general philosophy, such a social medium should…

  • – be supported by an intrinsically distributed platform, 
  • – have the simplicity – or the economy of means – of Twitter,
  • – ensure the sovereignty of users over their data,
  • – promote collaborative processes.

The main functions performed by this social medium would be:

  • – data curation (reference and categorization of web pages, edition of resource collections), 
  • – teaching offers and learning demands,
  • – offers and demands for skills, or employment market.

IEML would serve as a common language for

  • – data categorization, 
  • – description of the knowledge and skills, 
  • – the expression of acts within the social medium (supply, demand, consent, publish, etc.)
  • – addressing users through their knowledge and skills.

Three levels of meaning would thus be formalized in this medium.

  • (1) The linguistic level in IEML – including lexical and narrative functions – formalizes what is spoken about (lexicon) and what is said (sentences and super-phrases).
  • (2) The logical – or referential – level adds to the linguistic level… 
  •     – logical functions (first order logic and propositional logic) expressed in IEML using logical character-tools,
  •     – the ability of pointing to references (literals, document URLs, datasets, etc.),
  •     – the means to express facts and rules in IEML and thus to feed inference engines.
  • (3) The pragmatic level adds illocutionary functions and users to the linguistic and logical levels.
  •     – Illocutionary functions (thanks to pragmatic character-tools) allow the expression of conventional acts and rules (such as “game” rules). 
  •     – The pragmatic level obviously requires the consideration of players or users, as well as user groups.
  •     – It should be noted that there is no formal difference between logical inference and pragmatic inference but only a difference in use, one aiming at the truth of propositions according to referred states of things, the other calculating the rights, obligations, gains, etc. of users according to their actions and the rules of the games they play.
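The claim that logical and pragmatic inference differ only in use can be illustrated with a minimal sketch. It is not the IEML inference engine: the `forward_chain` function, the tuple encoding of facts and the example predicates (`is_a`, `offers`, `accepts`, `owes`) are all assumptions made for illustration, and rules are restricted to ground facts without variables. The same procedure is run twice, once to derive a true proposition and once to derive an obligation from users' acts.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Fact = Tuple[str, ...]               # a ground fact, e.g. ("accepts", "bob", "book")
Rule = Tuple[FrozenSet[Fact], Fact]  # (premises, conclusion)


def forward_chain(facts: Iterable[Fact], rules: List[Rule]) -> Set[Fact]:
    """Naive forward chaining: apply rules until no new fact is derived."""
    known: Set[Fact] = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Logical use: the derived fact is a proposition held to be true.
logical = forward_chain(
    [("is_a", "socrates", "human")],
    [(frozenset({("is_a", "socrates", "human")}), ("is_a", "socrates", "mortal"))],
)

# Pragmatic use: the derived fact is an obligation created by users' acts.
pragmatic = forward_chain(
    [("offers", "alice", "book"), ("accepts", "bob", "book")],
    [(
        frozenset({("offers", "alice", "book"), ("accepts", "bob", "book")}),
        ("owes", "alice", "bob", "book"),
    )],
)
```

Nothing in `forward_chain` distinguishes the two runs; only the reading given to the derived facts changes.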

The semantic profiles of users and datasets will be arranged according to the three levels that have just been explained. The “place of knowledge” could be enhanced by the use of tokens or crypto-currencies to reward participation in collective intelligence. If successful, this type of medium could be generalized to other areas such as health, democratic governance, trade, etc.

I put forward in this paper a vision for a new generation of cloud-based public communication service designed to foster reflexive collective intelligence. I begin with a description of the current situation, including the huge power and social shortcomings of platforms like Google, Apple, Facebook, Amazon, Microsoft, Alibaba, Baidu, etc. Contrasting with the practice of these tech giants, I reassert the values that are direly needed at the foundation of any future global public sphere: openness, transparency and commonality. But such ethical and practical guidelines are probably not powerful enough to help us cross a new threshold in collective intelligence. Only a disruptive innovation in cognitive computing will do the trick. That’s why I introduce “deep meaning”, a new research program in artificial intelligence, based on the Information Economy MetaLanguage (IEML). I conclude this paper by evoking possible bootstrapping scenarios for the new public platform.

The rise of platforms

At the end of the 20th century, one percent of the human population was connected to the Internet. In 2017, more than half the population is connected. Most users interact on social media, search for information, and buy products and services online. But despite the ongoing success of digital communication, there is growing dissatisfaction with the big tech companies – the “Silicon Valley” – that dominate the new communication environment.

The big techs are the most valued companies in the world and the massive amount of data that they possess is considered the most precious good of our time. Silicon Valley owns the big computers: the network of physical centers where our personal and business data are stored and processed. Their income comes from their economic exploitation of our data for marketing purposes and from their sales of hardware, software or services. But they also derive considerable power from the knowledge of markets and public opinions that stems from their information control.

The big cloud companies master new computing techniques mimicking neurons when they learn a new behavior. These programs are marketed as deep learning or artificial intelligence even if they have no cognitive autonomy and need some intense training by humans before becoming useful. Despite their well known limitations, machine learning algorithms have effectively augmented the abilities of digital systems. Deep learning is now used in every economic sector. Chips specialized in deep learning are found in big data centers, smartphones, robots and autonomous vehicles. As Vladimir Putin rightly told young Russians in his speech for the first day of school in fall 2017: “Whoever becomes the leader in this sphere [of artificial intelligence] will become the ruler of the world”.

The tech giants control huge business ecosystems beyond their official legal borders and they can ruin or buy competitors. Unfortunately, the rivalry among big tech companies prevents real interoperability between cloud services, even though such interoperability would be in the interest of the general public and of many smaller businesses. As if their technical and economic powers were not enough, the big tech companies are now moving onto the turf of governments. Facebook vouches for our identity and tells our family and friends that we are safe when a terrorist attack or a natural disaster occurs. Mark Zuckerberg states that one of Facebook’s missions is to ensure that the electoral process is fair and open in democratic countries. Google Earth and Google Street View are now used by several municipal bodies and governments as their primary source of information for cadastral plans and other geographical or geospatial services. Twitter became an official global political, diplomatic and news service. Microsoft sells its digital infrastructure to public schools. The kingdom of Denmark opened an official embassy in Silicon Valley. Cryptocurrencies independent from nation states (like Bitcoin) are becoming increasingly popular. Blockchain-based smart contracts (powered by Ethereum) bypass state authentication and traditional paper bureaucracies. Some traditional functions of government are taken over by private technological ventures.

This should not come as a surprise. The practice of writing in ancient palace-temples gave birth to government as a separate entity. Alphabet and paper allowed the emergence of merchant city-states and the expansion of literate empires. The printing press, industrial economy, motorized transportation and electronic media sustained nation-states. The digital revolution will foster new forms of government. Today, we discuss political problems in a global public space taking advantage of the web and social media, and the majority of humans live in interconnected cities and metropolises. Each urban node wants to be an accelerator of collective intelligence, a smart city. We need to think about public services in a new way. Schools, universities, public health institutions, mail services, archives, public libraries and museums should take full advantage of the internet and de-silo their datasets. But we should go further. Are current platforms doing their best to enhance collective intelligence and human development? How about giving back to the general population the data produced in social media and other cloud services, instead of just monetizing it for marketing purposes? How about giving people access to the cognitive powers unleashed by a ubiquitous algorithmic medium?

Information wants to be open, transparent and common

We need a new kind of public sphere: a platform in the cloud where data and metadata would be our common good, dedicated to the recording and collaborative exploitation of memory in the service of our collective intelligence. The core values orienting the construction of this new public sphere should be openness, transparency and commonality.

First, openness has already been tried out in the scientific community, the free software movement, Creative Commons licensing, Wikipedia and many more endeavors. It has been adopted by several big industries and governments. “Open by default” will soon be the new normal. Openness is on the rise because it maximizes the improvement of goods and services, fosters trust and supports collaborative engagement. It can be applied to data formats, operating systems, abstract models, algorithms and even hardware. Openness applies also to taxonomies, ontologies, search architectures, etc. A new open public space should encourage all participants to create, comment, categorize, assess and analyze its content.

Second, transparency is the very ground for trust and the precondition of an authentic dialogue. Data and people (including the administrators of a platform) should be traceable and auditable. Transparency should be reciprocal, without distinction between the rulers and the ruled. Such transparency will ultimately be the basis for reflexive collective intelligence, allowing teams and communities of any size to observe and compare their cognitive activity.

Commonality means that people will not have to pay to get access to this new public sphere: all will be free and public property. Commonality also means transversality: de-siloing and cross-pollination. Smart communities will interconnect and recombine all kinds of useful information: open archives of libraries and museums, free academic publications, shared learning resources, knowledge management repositories, open-source intelligence datasets, news, public legal databases…

From deep learning to deep meaning

This new public platform will be based on the web and its open standards like HTTP, URL, HTML, etc. Like all current platforms, it will take advantage of distributed computing in the cloud and it will use “deep learning”: an artificial intelligence technology that employs specialized chips and algorithms that roughly mimic the learning process of neurons. Finally, to be completely up to date, the next public platform will enable blockchain-based payments, transactions, contracts and secure records.

If a public platform offers the same technologies as the big tech (cloud, deep learning, blockchain), with the sole difference of openness, transparency and commonality, it may prove insufficient to foster a swift adoption, as is demonstrated by the relative failures of Diaspora (open Facebook) and Mastodon (open Twitter). Such a project may only succeed if it comes up with some technical advantage compared to the existing commercial platforms. Moreover, this technical advantage should have appealing political and philosophical dimensions.

No one really fancies the dream of autonomous machines, especially considering the current limitations of artificial intelligence. Instead, we want an artificial intelligence designed for the augmentation of human personal and collective intellect. That’s why, in addition to the current state of the art, the new platform will integrate the brand new deep meaning technology. Deep meaning will expand the actual reach of artificial intelligence, improve the user experience of big data analytics and allow the reflexivity of personal and collective intelligence.

Language as a platform

In a nutshell, deep learning models neurons and deep meaning models language. In order to augment the human intellect, we need both! Right now deep learning is based on neural network simulation. It is enough to roughly model animal cognition (every animal species has neurons) but it is not refined enough to model human cognition. The difference between animal cognition and human cognition is the reflexive thinking that comes from language, which adds a layer of semantic addressing on top of neural connectivity. Speech production and understanding is an innate property of individual human brains. But as humanity is a social species, language is a property of human societies. Languages are conventional, shared by members of the same culture and learned by social contact. In human cognition, the categories that organize perception, action, memory and learning are expressed linguistically so they may be reflected upon and shared in conversations. A language works like the semantic addressing system of a social virtual database.

But there is a problem with natural languages (English, French, Arabic, etc.): they are irregular and do not lend themselves easily to machine understanding or machine translation. The current trend in natural language processing, an important field of artificial intelligence, is to use statistical algorithms and deep learning methods to understand and produce linguistic data. But instead of using statistics, deep meaning adopts a regular and computable metalanguage. I have designed IEML (Information Economy MetaLanguage) from the beginning to optimize semantic computing. IEML words are built from six primitive symbols and two operations: addition and multiplication. The semantic relations between IEML words follow the lines of their generative operations. The total number of words does not exceed 10,000. From its dictionary, the generative grammar of IEML allows the construction of sentences at three layers of complexity: topics are made of words, phrases (facts, events) are made of topics and super-phrases (theories, narratives) are made of phrases. The highest unit of meaning, the text, is a unique set of sentences. Deep meaning technology uses IEML as the semantic addressing system of a social database.
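To make the idea of words generated from a few primitives and two operations more concrete, here is a minimal sketch. It is not the IEML grammar: the primitive names `p1` to `p6` are placeholders, and treating addition as an unordered set and multiplication as an ordered tuple is an assumption of the sketch. It covers only the word layer, not topics, phrases or super-phrases.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Placeholder names: the text only states that there are six primitives.
PRIMITIVES = frozenset({"p1", "p2", "p3", "p4", "p5", "p6"})


@dataclass(frozen=True)
class Primitive:
    symbol: str

    def __post_init__(self) -> None:
        if self.symbol not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {self.symbol}")


@dataclass(frozen=True)
class Add:
    """Addition, assumed here to be unordered."""
    terms: FrozenSet["Expr"]


@dataclass(frozen=True)
class Mul:
    """Multiplication, assumed here to be ordered."""
    factors: Tuple["Expr", ...]


Expr = Union[Primitive, Add, Mul]


def primitives_of(expr: Expr) -> FrozenSet[str]:
    """Collect the primitive symbols an expression is generated from."""
    if isinstance(expr, Primitive):
        return frozenset({expr.symbol})
    parts = expr.terms if isinstance(expr, Add) else expr.factors
    return frozenset().union(*(primitives_of(p) for p in parts))


a, b, c = Primitive("p1"), Primitive("p2"), Primitive("p3")
word = Mul((a, Add(frozenset({b, c}))))
```

Because each expression keeps the trace of the operations that produced it, a relation between two words (for example, sharing a factor or a primitive) can be computed from their structure alone.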

Given large datasets, deep meaning allows the automatic computing of semantic relations between data, semantic analysis and semantic visualizations. This new technology fosters semantic interoperability: it decompartmentalizes tags, folksonomies, taxonomies, ontologies and languages. When online communities categorize, assess and exchange semantic data, they generate explorable ecosystems of ideas that represent their collective intelligence. Note that the vision of collective intelligence proposed here is distinct from the “wisdom of the crowd” model, which assumes independent agents and excludes dialogue and reflexivity. Just the opposite: deep meaning was designed from the beginning to nurture dialogue and reflexivity.

The main functions of the new public sphere

In the new public sphere, every netizen will act as an author, editor, artist, curator, critic, messenger, contractor and gamer. The next platform weaves five functions together: curation, creation, communication, transaction and immersion.

By curation I mean the collaborative creation, editing, analysis, synthesis, visualization, explanation and publication of datasets. People posting, liking and commenting on content on social media are already doing data curation, in a primitive, simple way. Active professionals in the fields of heritage preservation (libraries, museums), digital humanities, education, knowledge management, data-driven journalism or open-source intelligence practice data curation in a more systematic and mindful manner. The new platform will offer a consistent service of collaborative data curation empowered by a common semantic addressing system.

Augmented by deep meaning technology, our public sphere will include a semantic metadata editor applicable to any document format. It will work as a registration system for the works of the mind. Communication will be ensured by a global Twitter-like public posting system. But instead of the current hashtags that are mere sequences of characters, the new semantic tags will self-translate in all natural languages and interconnect by conceptual proximity. The blockchain layer will allow any transaction to be recorded. The platform will remunerate authors and curators in collective intelligence coins, according to the public engagement generated by their work. The new public sphere will be grounded in the internet of things, smart cities, ambient intelligence and augmented reality. People will control their environment and communicate with sensors, software agents and bots of all kinds in the same immersive semantic space. Virtual worlds will simulate the collective intelligence of teams, networks and cities.

Bootstrapping

This IEML-based platform was developed between 2002 and 2017 at the University of Ottawa. A prototype is currently in a pre-alpha version, featuring the curation functionality. An alpha version will be demonstrated in the summer of 2018. How can we bridge the gap from fundamental research to a full-scale industrial platform? Such an endeavor will be much less expensive than the conquest of space and could bring a tremendous augmentation of human collective intelligence. Even if the network effect obviously applies to the new public space, small communities of pioneers will benefit immediately from its early release. On the humanistic side, I have already mentioned museums and libraries, researchers in humanities and social science, collaborative learning networks, data-oriented journalists, knowledge management and business intelligence professionals, etc. On the engineering side, deep meaning opens a new sub-field of artificial intelligence that will enhance current techniques of big data analytics, machine learning, natural language processing, internet of things, augmented reality and other immersive interfaces. Because it is open source by design, the development of the new technology can be crowdsourced and shared easily among many different actors.

Let’s draw a distinction between the new public sphere, including its semantic coordinate system, and the commercial platforms that will give access to it. This distinction being made, we can imagine a consortium of big tech companies, universities and governments supporting the development of the global public service of the future. We may also imagine one of the big tech companies taking the lead in order to associate its name with the new platform and developing some hardware specialized in deep meaning. Another scenario is the foundation of a company that will ensure the construction and maintenance of the new platform as a free public service while sustaining itself by offering semantic services: research, consulting, design and training. In any case, a new international school must be established around a virtual dockyard where trainees and trainers progressively build and improve the semantic coordinate system and other basic models of the new platform. Students from various organizations and backgrounds will gain experience in the field of deep meaning and will disseminate the acquired knowledge back into their communities.

Radio broadcast (French-speaking Switzerland), 25 minutes, in French.

“Sémantique numérique et réseaux sociaux. Vers un service public planétaire” (Digital semantics and social networks: toward a planetary public service), 1 h, in French.

YouTube video (in English), 1 h.

 

 

FIGURE 1

I showed in a previous post the contemporary importance of collaborative data curation. Skills in this area are at the heart of the new algorithmic literacy. Figure 1 presents these skills systematically and, in doing so, puts in order the intellectual and practical know-how as well as the ethical “ways of being” (savoir-être) that support the augmentation of collective intelligence online. The star evokes the sign, the face the being and the cube the thing (on these concepts see this post). The table is organized in three rows and three columns, all interdependent. The first row spells out the foundations of algorithmic intelligence at the personal level, the second recalls the indispensable critical work on data sources and the third details the skills needed for the emergence of a collective intelligence augmented by algorithms. Personal intelligence and collective intelligence work together, and neither can do without critical intelligence! The columns evoke three complementary dimensions of cognition: reflexive consciousness, the production of meaning and memory. None of them should be taken for granted and all can be trained and improved. In each cell, the top item points to an exercise in virtualization while the bottom item indicates an actual implementation of the skill, more concrete and situated. I will now comment on the table in Figure 1 row by row.

Personal intelligence

The notion of personal intelligence should be understood here in the sense of an individual cognitive skill. But it also leans toward the sense that the word “intelligence” has in English, as in “intelligence gathering”. In this latter sense, it designates an individual’s ability to set up their own information-gathering system.

Attention management is not only about exercising concentration and the complementary art of avoiding distractions. It also includes the deliberate choice of learning priorities and the discernment of relevant information sources. Curators themselves must decide what is relevant and what is not, according to their own criteria and the priorities they have set. As for the notion of source, need it be stressed here that only individuals, groups and institutions can be so described? They alone, therefore, deserve trust or distrust. As for social media, they are in no way sources (contrary to what some journalists believe) but rather communication platforms. To claim, for example, that “Twitter is not a reliable source” makes no more sense than the idea that “the telephone is not a reliable source”.

The interpretation of data is also the responsibility of curators. Even with all the statistical algorithms and all the automatic data analysis tools (“big data analytics”) in the world, we will still need causal hypotheses, theories and categorization systems to support those theories. Statistical correlations may suggest causal hypotheses but they do not replace them. For we want not only to predict the behavior of complex phenomena, but also to understand them and to act on the basis of that understanding. Effective action requires a grasp of real causes and not merely the perception of correlations. Without the intuitions and theories derived from our personal knowledge of a domain, automatic data analysis tools will not be put to good use. Asking the data good questions is no trivial undertaking!

Finally, the collected data must be managed at the material level. We therefore need to choose the right storage tools in the “cloud” and know how to handle them. But memory must also be maintained at the conceptual level. That is why good curators are able to create, adopt and above all maintain a categorization system that will allow them to retrieve the desired information and to extract from their collections the knowledge that will be useful to them.

Critical intelligence

Critical intelligence bears essentially on the quality of sources. It first requires a work of “external” criticism. We know that there is no transcendent authority in the new communication space. If we do not want to be deceived, misled or blinded by informational blinkers, we must therefore diversify our sources as much as possible. Our window of attention must be kept wide open, which is why we will subscribe to sources adopting diverse points of view, organizing narratives and theories. This diversity will allow us to cross-check the data, to observe the subjects on which they contradict one another and those on which they confirm one another.

Evaluating sources also requires an effort to decipher identities: this is “internal” criticism. To understand the nature of a source, we must recognize its classification system, its master categories and its organizing narrative. In a sense, a source is nothing other than the narrative around which it organizes its data: its way of producing meaning.

Finalement l’intelligence critique possède une dimension « pragmatique ». Cette critique est la plus dévastatrice parce qu’elle compare le récit de la source avec ce qu’elle fait réellement. Je vise ici ce qu’elle fait en diffusant ses messages, c’est-à-dire l’effet concret de ses actes de communication sur les conversations en cours et l’état d’esprit des participants. Je vise également les contributions intellectuelles et esthétiques de la source, ses interactions économiques, politiques, militaires ou autres telles qu’elles sont rapportées par d’autres sources. Grâce à cette bonne mémoire nous pouvons noter les contradictions de la source selon les moments et les publics, les décalages entre son récit officiel et les effets pratiques de ses actions. Enfin, plus une source se montre transparente au sujet de ses propres sources d’informations, de ses références, de son agenda et de son financement et plus elle est fiable. Inversement, l’opacité éveille les soupçons.

Collective intelligence

Let me recall that the collective intelligence at issue here is not a “miracle solution” but a know-how to be cultivated, one that presupposes and in turn reinforces personal and critical intelligence.

Let us begin by defining stigmergy: it is a mode of communication in which agents coordinate and inform one another by modifying a shared environment or memory. In the algorithmic medium, communication tends to take place between peers who create, categorize, criticize, organize, read, promote and analyze data by means of algorithmic tools. This is indeed stigmergic communication because, even though people converse and speak to each other directly, the main channel of communication remains a common memory that the participants exploit and transform together. It is useful to distinguish between local and global memory. In the “local” memory of particular networks or communities, we must pay attention to singular contexts and histories. It is also advisable to take the other participants’ contributions into account, to avoid topics that are irrelevant to the group, and to avoid provocations, outbursts of aggression, etc.

As for “global” memory, we should remember that every action in the algorithmic medium reorganizes the common memory, even if only infinitesimally: reading, tagging, buying, posting, creating a hyperlink, subscribing, “liking”, etc. We create our symbolic environment collaboratively. A good human agent of collective intelligence will therefore keep in mind that his or her online actions contribute to informing other agents.

The liberty referred to in figure 1 takes the form of a dialectic between power and responsibility. Power covers our capacity to create, evaluate, organize, read and analyze data, our ability to make the common memory evolve through the distributed multitude of our actions. Responsibility is grounded in a reflective awareness of our collective power, an awareness that in turn informs the direction of our attention and the meaning we give to the exercise of our powers.

Diapositive4.jpg

FIGURE 2

Collaborative learning

Finally, collaborative learning is one of the major cognitive processes of collective intelligence and the main social benefit of data curation skills. To grasp this process, we must distinguish between tacit and explicit knowledge. Tacit knowledge covers what the members of a community have learned in particular contexts: the know-how internalized in personal reflexes through experience. Explicit knowledge, by contrast, consists of narratives, images, data, software or other documentary resources that are as clear and decontextualized as possible so that they can be widely shared.

Collaborative learning links two movements. The first consists in translating tacit knowledge into explicit knowledge to feed a common memory. In the second movement, complementary to the first, participants draw on the explicit knowledge and learning resources available in the common memory in order to adapt this knowledge to their particular context and integrate it into their everyday reflexes. Curators are potentially students or learners when they internalize explicit knowledge, and they can regard themselves as teachers when they make explicit knowledge available to others. They are therefore peers (see figure 2) working in a common field of practice. They transform their tacit knowledge into explicit knowledge as far as possible and, in return, work to translate the part of explicit knowledge they want to acquire into personal practical knowledge. I write “as far as possible” because the complete explicitation of tacit knowledge is out of reach, as Michael Polanyi has clearly shown.

In the algorithmic medium, explicit knowledge takes the form of categorized and evaluated data. The cycle that transforms tacit knowledge into explicit knowledge and vice versa takes place in social media, where it is facilitated by a civilized creative conversation: intellectual and social (or moral) skills work together!

Data curation

Just as Monsieur Jourdain spoke prose without knowing it, everyone today practices data curation (also called content curation) without knowing it. On the major social media platforms such as Facebook, Twitter, Pinterest or Instagram, but also in a multitude of more specialized online applications such as Evernote, Scoop.it or Diigo, users refer to data (texts, images, videos, music…) that they accompany with comments, classifying hashtags and various forms of ratings and emoticons. These posts accumulate in personal or community collections, appear in other users’ feeds and are reshared ad libitum, possibly with changes to the comments, hashtags and emotional appraisals. The posts themselves become data that can in turn be referenced, commented on, emotionally tagged, searched and analyzed. Social media offer us sophisticated database management tools, with algorithms for data mining, machine learning, pattern recognition and collaborative filtering that help us navigate the mass of content and the crowds of users. But feeding the database, as well as categorizing and evaluating the data, is up to us.


The word curation, first used in English to designate the activity of an exhibition curator in the world of art galleries and museums, has recently been generalized to all information-collecting activities. The word’s Latin etymology evokes medical care (the cure) and, more generally, concern. If it is true that we are entering a data-centric society, then the concern for data, the activity of collecting and organizing data for oneself and for others, becomes crucial. And since the data-centric society rests on an effervescent knowledge economy, in the broadest and most “ecological” sense of the notion of economy (see La Sphère sémantique 1, Chp. 6, on this subject), the ultimate stake of data curation is nothing other than the production and sharing of knowledge.

I will now discuss a number of spheres of activity in which mastery of collaborative data curation is beginning to establish itself as an essential skill: heritage conservation, research in the human sciences, collaborative learning, the production and dissemination of news, open-source intelligence and knowledge management.


Heritage conservation

For centuries, those in charge of archives, libraries, media libraries and museums have collected information-bearing artifacts and organized them so that their public can find and consult them. It was in this professional milieu that the distinction between data and metadata first appeared. On the data side, physical documents are placed on shelves. On the metadata side, a card file makes it possible to search for documents by author, title, subject, discipline, date, etc. The librarian makes a card, or even several cards, for each document that enters the library, and the reader searches the cards to explore the library’s contents and find out where the books he or she wants to read are shelved. Without the apparatus of metadata and the organizing principles that underlie it, it would be impossible to exploit the information contained in a library. Since the end of the 20th century, the world of archives, libraries and museums has been undergoing a great transformation. Digitization makes all information converge in the algorithmic medium, and this unification cruelly exposes the disparity and incompatibility of the classification systems in use. Moreover, the main metadata systems were designed and used in the age of print, so they do not exploit the new possibilities of automatic computation. Finally, the flows of information have grown so much that they are beyond any possibility of classical cataloguing by a small number of professionals. For some years now, museums and libraries have been digitizing their collections and putting them online, calling on crowdsourcing, that is, the collective intelligence of Internet users, to categorize the data.
This collaborative data curation blurs the distinction between curators and users while revealing the diversity of the public’s points of view and interests. In addition, a multitude of sites that draw their data from the open Web, often independent of the classical institutions of cultural heritage preservation, allow art lovers and bibliophiles to share their tastes and finds and to group together by sensibility and by interest.
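
The card file described above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the names Card and Catalogue, the fields, and the sample entries are all hypothetical and do not come from any actual library system. It shows the separation between the data (documents on shelves) and the metadata (cards filed under several entry points).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One catalogue card: metadata about a document (hypothetical fields)."""
    title: str
    author: str
    subjects: tuple
    shelf: str  # where the physical document (the data) is kept


class Catalogue:
    """A toy card file that indexes cards by author and by subject."""

    def __init__(self):
        self.by_author = defaultdict(list)
        self.by_subject = defaultdict(list)

    def add(self, card):
        # The librarian files the same card under several entry points.
        self.by_author[card.author].append(card)
        for subject in card.subjects:
            self.by_subject[subject].append(card)

    def shelves_for_subject(self, subject):
        # The reader searches the cards, not the shelves themselves.
        return [card.shelf for card in self.by_subject.get(subject, [])]


# Invented sample entries, for illustration only.
catalogue = Catalogue()
catalogue.add(Card("Sample Title A", "Author One", ("history", "art"), "A-01"))
catalogue.add(Card("Sample Title B", "Author Two", ("philosophy",), "B-04"))
catalogue.add(Card("Sample Title C", "Author One", ("history",), "C-07"))

print(catalogue.shelves_for_subject("history"))  # ['A-01', 'C-07']
```

In this sketch the reader never scans the shelves: every search goes through the metadata. Crowdsourced categorization would amount to letting many contributors add subjects to the cards, which a fixed index of this kind does not model.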

Research in the human sciences

The digitization of archives and cultural heritage, the accessibility of data and statistics compiled by governments and international institutions, the communications and transactions of Internet users collected by the major Web platforms: all these new sources offer the human sciences a raw material whose abundance defies the imagination. In addition, researchers’ blogs, collaborative platforms specialized in collecting articles (such as Academia.edu, Researchgate, Mendeley, CiteULike…) and shared databases are profoundly transforming research practices. Finally, a growing fringe of human science professionals is learning programming and the advanced use of algorithms, most often producing and sharing open source tools. Traditional scientific publishing is in crisis, since communication between researchers no longer needs printed journals. Each online platform offers its own methods for assessing publications, based on automated processing of social interactions, which calls into question the classical modes of filtering and evaluating articles. Admittedly, the problem posed by the incompatibility of platforms and categorization systems remains to be solved. There are therefore still some obstacles to overcome, but everything is in place for collaborative data curation to become the central activity of research in the human sciences… and of its evaluation.

Collaborative learning

Collaborative data curation is also emerging as an essential practice in education. In the age of the algorithmic medium, knowledge evolves quickly, almost all learning resources are available online for free and students are already immersed in social media. The old model of learning communities organized around a library or a physical repository of documents is therefore obsolete. Learning must increasingly be conceived as partially delocalized, collaborative and continuous. Society as a whole is acquiring a learning dimension. This does not imply that the classical teaching institutions, school and university, are no longer relevant; quite the contrary. It is precisely because learning draws on a practically infinite stock of resources, which no transcendent authority can any longer organize and rank a priori, that schools have an obligation to train young people in collaborative and critical learning through social media. The much-discussed digital literacy does not rest mainly on acquiring technical computing skills (which change rapidly), but rather on a socio-cognitive know-how oriented toward collaborative data curation: filtering the content relevant to a given group, categorizing it, evaluating it, consulting the data, writing short syntheses… Thus teachers use social bookmarking platforms such as Diigo to run their courses, connectivist MOOCs call on students to feed their learning resources, a multitude of hashtags related to education and learning can be found on Twitter, and Facebook groups host more and more classes…

The news

The production and dissemination of news are undergoing the same kind of mutation as those just described. On the production side, journalists are learning to exploit open databases statistically in order to draw from them the syntheses and visualizations that will feed their articles. They follow their colleagues and a host of sources on Twitter to stay up to date on the topics they cover. Moreover, news is no longer produced only by press agencies and professional journalists but also by cultural, economic, political and military actors through their websites and their agents in social media. Nor should we forget ordinary citizens, who take photos and videos with their smartphones, broadcast what they see and think on every platform and react in real time to the news disseminated by the traditional media. On the reception side, news is increasingly consumed online via Facebook, Twitter, Google News and other social platforms. Since everyone can access the sources directly (the messages issued by the actors themselves), the traditional media have lost their monopoly on information. On the subjects that interest me, I follow the experts of my choice, I listen to every side of the story and I form my own opinion without having to rely on simplifying journalistic syntheses that are inevitably dependent on a political or national agenda or master narrative. In short, both information professionals and the new critical online public assiduously practice collaborative data curation.

Open-source intelligence

The field of economic (“business intelligence”), political or military intelligence is gradually escaping the old logic of espionage. The abundance of online information sources now makes it less and less sensible to maintain staff specially assigned to gathering information on site. By contrast, linguistic, cultural and scientific skills, erudition in the human sciences, the ability to extract relevant intelligence from the flow of data, social media monitoring and collaborative know-how are becoming indispensable. Apart from the names and addresses of double agents and the details of attack plans, everything is now available on the Internet. For those who know how to search online and read between the lines, satellite images, media, academic, diplomatic and military websites, not to mention think-tank reports in PDF, make it possible to understand situations and make informed decisions. Admittedly, agents of influence, trolls, masked users and software robots try to muddy the waters, but in the long run they reveal the strategies of the puppeteers who manipulate them. In the rapidly expanding field of open source intelligence, intelligence agencies, like the swarm of their suppliers of information, analysis and synthesis, cooperate in the production, exchange and evaluation of data. Here again, collaborative content curation is the order of the day.

Knowledge management

A work team or any enterprise, whether public, private or non-profit, needs to “manage its knowledge” in order to achieve its goals. The term knowledge management began to be used around the mid-1990s, at the very moment when the Web was being born and the idea of an economy based on knowledge and innovation was beginning to take hold. One of the principal founders of this new discipline, Ikujiro Nonaka (born in 1935), set out to describe the cycle of knowledge creation in companies, emphasizing the phase in which practical know-how is made explicit. Following Nonaka, many researchers and practitioners have tried to determine the best methods for making tacit knowledge, born of experience, explicit in order to preserve it and disseminate it within organizations. The first knowledge management tools were rather rigid and centralized, like the computing of the time. Today (2016), genuine enterprise social media are being set up, in which employees can identify each other’s skills, create working groups and communities of practice, accumulate resources and share data. Regardless of the technical tools used, knowledge management is a cross-cutting dimension of any enterprise. This applied epistemology includes the conservation of knowledge and know-how, the development of skills and human resources, and the art of creating and disseminating knowledge. Indeed, when we observe contemporary practices in the enterprise social media that support knowledge management, we find that one of the main activities is precisely collaborative data curation.

There is thus a practice common to many sectors of contemporary world culture, a practice whose unity and cross-cutting character are concealed by social compartmentalization and the disparity of professional jargons. My hypothesis is that collaborative data curation is the techno-social support of collective intelligence in the age of the algorithmic medium: writing and reading… on streams of data.

To learn more about collaborative data curation skills, read the following post!

We will first take a detour through the history of knowledge and communication in order to understand the current priorities in education.

THE EVOLUTION OF KNOWLEDGE

0-Four-revol

The above slide describes the successive steps in the augmentation of symbolic manipulation. At each step in the history of symbolic manipulation, a new kind of knowledge unfolds. For the longest part of human history, knowledge was embedded only in narratives, rituals and material tools.

The first revolution was the invention of writing, with symbols endowed with the capacity for self-conservation. It led to a remarkable augmentation of social memory and to the emergence of new forms of knowledge. Ideas were reified on an external surface, which is an important condition for critical thinking. A new kind of systematic knowledge was developed: hermeneutics, astronomy, medicine, architecture (including geometry), etc.

The second revolution optimized the manipulation of symbols through inventions such as the alphabet (Phoenician, Hebrew, Greek, Roman, Arabic, Cyrillic, Korean, etc.), the Chinese rational ideographies, the Indian positional numeration system with a zero, paper and the early printing techniques of China and Korea. The literate culture based on the alphabet (or rational ideographies) developed critical thinking further and gave birth to philosophy. At this stage, scholars attempted to derive knowledge from observation and by deduction from first principles. There was a deliberate effort to reach universality, particularly in mathematics, physics and cosmology.

The third revolution was the mechanization and industrialization of the reproduction and diffusion of symbols, through the printing press, disks, movies, radio, TV, etc. This revolution supported the emergence of the modern world, with its nation states, its industries and its experimental, mathematized natural sciences. It was only in the typographic culture, from the 16th century, that the natural sciences took the shape that we currently enjoy: systematic observation or experimentation and theories based on mathematical modeling. From the decomposition of theology and philosophy emerged the contemporary humanities and social sciences. But at this stage human science was still fragmented by disciplines and incompatible theories. Moreover, its theories were rarely mathematized or testable.

We are now at the beginning of a fourth revolution, in which a ubiquitous and interconnected infosphere is filled with symbols – i.e. data – of all kinds (music, voice, images, texts, programs, etc.) that are being automatically transformed. With the democratization of big data analysis, the next generations will see the advent of a new scientific revolution… but this time it will be in the humanities and social sciences. The new human science will be based on the wealth of data produced by human communities and on growing computing power. This will lead to reflexive collective intelligence, where people will appropriate (big) data analysis and where the subjects and objects of knowledge will be the human communities themselves.

THE EVOLUTION OF EDUCATION

0-School-revol

We have seen that each revolution in symbolic manipulation brought new developments in knowledge. The same can be said of learning methods and institutions. The school was invented by the scribes. At the beginning, it was a professional training for a caste of writing specialists: scribes and priests. Pedagogy was strict and repetitive. Our current primary school is reminiscent of this first learning institution.

Emerging in the literate culture, the liberal education was aimed at broader elites than the first scribal schools. Young people were trained in reading and interpreting the « classics ». They learned how to build rational argumentation and persuasive discourses.

In modern times, education became compulsory for every citizen of the nation state. Learning became industrialized and uniform through state programs and institutions.

At the time of the algorithmic medium, knowledge is evolving very fast, almost all learning resources are available for free and we interact in social media. This is the end of the old model of learning communities organizing themselves around a library or any physical knowledge repository. Current learning should be conceived as delocalized, life-long and collaborative. The whole society will acquire a learning dimension. But that does not mean that traditional learning institutions for young people are no longer relevant. Just the opposite: young people should be prepared for collaborative learning in social media, using a practically infinite knowledge repository without any transcendent guiding authority. They will need not only technical skills (which will evolve and become obsolete very quickly) but above all moral and intellectual skills that will empower them in their life-long journeys of discovery.

DATA CURATION SKILLS AT THE CORE OF THE NEW LITERACY

0-Hypersphere

In the algorithmic medium, communication becomes a collaboration between peers to create, categorize, criticize, organize, read, promote and analyse data by means of algorithmic tools. It is a stigmergic communication because, even if people dialogue and talk to each other, the main channel of communication is the common memory itself, a memory that everybody transforms and exploits. The above slide lists some examples of these new communication practices. Data curation skills are at the core of the new algorithmic literacy.

0-Data-curation

I present in the above slide the fundamental intellectual and moral skills that every student will have to master in order to survive in the algorithmic culture. The slide is organized in three rows and three columns that work in an interdependent manner. As the reader can see, personal intelligence is not independent from collective intelligence and vice versa. Moreover, both of them need critical intelligence!

PERSONAL INTELLIGENCE

Attention management is not only about focusing or avoiding distraction. It is also about choosing what we want or need to learn and being able to select the relevant sources. We decide what is relevant or not according to our own priorities and our criteria for trust. By the way, people and institutions are the real sources to be trusted or not, not the platforms!

Interpretation. Even with the statistical tools of big data analysis, we will always need theories and causal hypotheses, and not only correlations. We want to understand something and act upon this understanding. Having intuitions and theories derived from our knowledge of a domain, we can use data analytics to test our hypotheses. Asking the right questions of the data is not trivial!

Memory management. The data that we gather must be managed at the material level: we must choose the right memory tool in the clouds. But the data must also be managed at the conceptual level: we have to create and maintain a useful categorisation system (tags, ontologies…) in order to easily retrieve and analyse the desired information.

CRITICAL INTELLIGENCE

External critique. There is no transcendent authority in the new communication space. If we don’t want to be fooled, we need to diversify our sources. This means that we will gather sources that have diverse theories and points of view. Then, we should act on this diversity by cross-examining the data and observing where they contradict and where they confirm each other.

Internal critique. In order to understand who a source is, we must identify its classification system, its categories and its narrative. In a way, the source is its narrative.

Pragmatic critique. In essence, the pragmatic critique is the most devastating because it is at this point that we compare the narrative of the source and what it is effectively doing. We can do this by checking the actions of one source as reported by other sources. We can also notice the contradictions in the source’s narratives or a discrepancy between its official narrative and the pragmatic effects of its discourses. A source cannot be trusted when it is not transparent about its references, agenda, finance, etc.

COLLECTIVE INTELLIGENCE

The collective intelligence that I am speaking about is not a miracle solution but a goal to reach. It emerges in the new algorithmic environment in interaction with personal and critical intelligence.

Stigmergic communication. Stigmergy means that people communicate by modifying a common memory. We should distinguish between the local and the global memory. In the local memory (particular communities or networks), we should pay attention to singular contexts and histories. We should also avoid ignoring others’ contributions, asking irrelevant questions, trolling, etc.

Liberty. Liberty is a dialectic of power and responsibility. Our power here is our ability to create, assess, organize, read and analyse data. Every act in the algorithmic medium re-organizes the common memory: reading, tagging, buying, posting, linking, liking, subscribing, etc. We collaboratively create our own common environment. So we need to take responsibility for our actions.

Collaborative learning. This is the main goal of collective intelligence and data curation skills in general. People add explicit knowledge to the common memory. They express what they have learnt in particular contexts (tacit knowledge) as clear and decontextualized propositions, narratives, visuals, etc. They translate into common software or other easily accessible resources (explicit) the skills and knowledge that they have internalized in their personal reflexes through their experience (tacit). Symmetrically, people try to apply whatever useful resources they have found in the common memory (explicit) and to acquire or integrate them into their reflexes (tacit).

0-Collaborative-learning

The final slide above is a visual explicitation of the collaborative learning process. Peers working in a common field of practice use their personal intelligence (PI) to transform tacit knowledge into explicit knowledge. They also work in order to translate some common explicit knowledge into their own practical knowledge. In the algorithmic medium, the explicit knowledge takes the form of a common memory: data categorized and evaluated by the community. The whole process of transforming tacit knowledge into explicit knowledge and vice versa takes place largely in social media, thanks to a civilized creative conversation. Intellectual and social (or moral) skills work together!

Ancient-Hands-Argentina

Proper citation: « The Philosophical Concept of Algorithmic Intelligence », Spanda Journal special issue on “Collective Intelligence”, V (2), December 2014, p. 17-25. The original text can be found for free online at Spanda

“Transcending the media, airborne machines will announce the voice of the many. Still indiscernible, cloaked in the mists of the future, bathing another humanity in its murmuring, we have a rendezvous with the over-language.” Collective Intelligence, 1994, p. xxviii.

Twenty years after Collective Intelligence

This paper was written in 2014, twenty years after L’intelligence collective [the original French edition of Collective Intelligence].[2] The main purpose of Collective Intelligence was to formulate a vision of a cultural and social evolution that would be capable of making the best use of the new possibilities opened up by digital communication. Long before the success of social networks on the Web,[3] I predicted the rise of “engineering the social bond.” Eight years before the founding of Wikipedia in 2001, I imagined an online “cosmopedia” structured in hypertext links. When the digital humanities and the social media had not even been named, I was calling for an epistemological and methodological transformation of the human sciences. But above all, at a time when less than one percent of the world’s population was connected,[4] I was predicting (along with a small minority of thinkers) that the Internet would become the centre of the global public space and the main medium of communication, in particular for the collaborative production and sharing of knowledge and the dissemination of news.[5] In spite of the considerable growth of interactive digital communication over the past twenty years, we are still far from the ideal described in Collective Intelligence. It seemed to me already in 1994 that the anthropological changes under way would take root and inaugurate a new phase in the human adventure only if we invented what I then called an “over-language.” How can communication readily reach across the multiplicity of dialects and cultures? How can we map the deluge of digital data, order it around our interests and extract knowledge from it? How can we master the waves, currents and depths of the software ocean? Collective Intelligence envisaged a symbolic system capable of harnessing the immense calculating power of the new medium and making it work for our benefit. 
But the over-language I foresaw in 1994 was still in the “indiscernible” period, shrouded in “the mists of the future.” Twenty years later, the curtain of mist has been partially pierced: the over-language now has a name, IEML (acronym for Information Economy MetaLanguage), a grammar and a dictionary.[6]

Reflexive collective intelligence

Collective intelligence drives human development, and human development supports the growth of collective intelligence. By improving collective intelligence, we can place ourselves in this feedback loop and orient it in the direction of a self-organizing virtuous cycle. This is the strategic intuition that has guided my research. But how can we improve collective intelligence? In 1994, the concept of digital collective intelligence was still revolutionary. In 2014, this term is commonly used by consultants, politicians, entrepreneurs, technologists, academics and educators. Crowdsourcing has become a common practice, and knowledge management is now supported by the decentralized use of social media. The interconnection of humanity through the Internet, the development of the knowledge economy, the rush to higher education and the rise of cloud computing and big data are all indicators of an increase in our cognitive power. But we have yet to cross the threshold of reflexive collective intelligence. Just as dancers can only perfect their movements by reflecting them in a mirror, just as yogis develop awareness of their inner being only through the meditative contemplation of their own mind, collective intelligence will only be able to set out on the path of purposeful learning and thus move on to a new stage in its growth by achieving reflexivity. It will therefore need to acquire a mirror that allows it to observe its own cognitive processes. A word of caution: collective intelligence does not and will not have autonomous consciousness. When I talk about reflexive collective intelligence, I mean that human individuals will have a clearer and better-shared knowledge than they have today of the collective intelligence in which they participate, a knowledge based on transparent principles and perfectible scientific methods.

The key: A complete modelling of language

But how can a mirror of collective intelligence be constructed? It is clear that the context of reflection will be the algorithmic medium or, to put it another way, the Internet, the calculating power of cloud computing, ubiquitous communication and distributed interactive mobile interfaces. Since we can only reflect collective intelligence in the algorithmic medium, we must yield to the nature of that medium and have a calculable model of our intelligence, a model that will be fed by the flows of digital data from our activities. In short, we need a mathematical (with calculable models) and empirical (based on data) science of collective intelligence. But, once again, is such a science possible? Since humanity is a species that is highly social, its intelligence is intrinsically social, or collective. If we had a mathematical and empirical science of human intelligence in general, we could no doubt derive a science of collective intelligence from it. This leads us to a major problem that has been investigated in the social sciences, the human sciences, the cognitive sciences and artificial intelligence since the twentieth century: is a mathematized science of human intelligence possible? It is language or, to put it another way, symbolic manipulation that distinguishes human cognition. We use language to categorize sensory data, to organize our memory, to think, to communicate, to carry out social actions, etc. My research has led me to the conclusion that a science of human intelligence is indeed possible, but on the condition that we solve the problem of the mathematical modelling of language. 
I am speaking here of a complete scientific modelling of language, one that would not be limited to the purely logical and syntactic aspects or to statistical correlations of corpora of texts, but would be capable of expressing semantic relationships formed between units of meaning, and doing so in an algebraic, generative mode.[7] Convinced that an algebraic model of semantics was the key to a science of intelligence, I focused my efforts on discovering such a model; the result was the invention of IEML.[8] IEML—an artificial language with calculable semantics—is the intellectual technology that will make it possible to find answers to all the above-mentioned questions. We now have a complete scientific modelling of language, including its semantic aspects. Thus, a science of human intelligence is now possible. It follows, then, that a mathematical and empirical science of collective intelligence is possible. Consequently, a reflexive collective intelligence is in turn possible. This means that the acceleration of human development is within our reach.

The scientific file: The Semantic Sphere

I have written two volumes on my project of developing the scientific framework for a reflexive collective intelligence, and I am currently writing the third. This trilogy can be read as the story of a voyage of discovery. The first volume, The Semantic Sphere 1 (2011),[9] provides the justification for my undertaking. It contains the statement of my aims, a brief intellectual autobiography and, above all, a detailed dialogue with my contemporaries and my predecessors. With a substantial bibliography,[10] that volume presents the main themes of my intellectual process, compares my thoughts with those of the philosophical and scientific tradition, engages in conversation with the research community, and finally, describes the technical, epistemological and cultural context that motivated my research. Why write more than four hundred pages to justify a program of scientific research? For one very simple reason: no one in the contemporary scientific community thought that my research program had any chance of success. What is important in computer science and artificial intelligence is logic, formal syntax, statistics and biological models. Engineers generally view social sciences such as sociology or anthropology as nothing but auxiliary disciplines limited to cosmetic functions: for example, the analysis of usage or the experience of users. In the human sciences, the situation is even more difficult. All those who have tried to mathematize language, from Leibniz to Chomsky, to mention only the greatest, have failed, achieving only partial results. Worse yet, the greatest masters, those from whom I have learned so much, from the semiologist Umberto Eco[11] to the anthropologist Lévi-Strauss,[12] have stated categorically that the mathematization of language and the human sciences is impracticable, impossible, utopian. 
The path I wanted to follow was forbidden not only by the habits of engineers and the major authorities in the human sciences but also by the nearly universal view that “meaning depends on context,”[13] a view that unscrupulously confuses mathematization with quantification, denounces on principle, in a knee-jerk reaction, the “ethnocentric bias” of any universalist approach[14] and recalls the “failure” of Esperanto.[15] I have even heard some of the most agnostic speak of the curse of Babel. It is therefore not surprising that I want to make a strong case in defending the scientific nature of my undertaking: all explorers have returned empty-handed from this voyage toward mathematical language, if they returned at all.

The metalanguage: IEML

But one cannot go on forever announcing one’s departure on a voyage: one must set forth, navigate . . . and return. The second volume of my trilogy, La grammaire d’IEML,[16] contains the very technical account of my journey from algebra to language. In it, I explain how to construct sentences and texts in IEML, with many examples. But that 150-page book also contains 52 very dense pages of algorithms and mathematics that show in detail how the internal semantic networks of that artificial language can be calculated and translated automatically into natural languages. To connect a mathematical syntax to a semantics in natural languages, I had to, almost single-handed,[17] face storms on uncharted seas, to advance across the desert with no certainty that fertile land would be found beyond the horizon, to wander for twenty years in the convoluted labyrinth of meaning. But by gradually joining sign, being and thing in turn in the sense of the virtual and actual, I finally had my Ariadne’s thread, and I made a map of the labyrinth, a complicated map of the metalanguage, that “Northwest Passage”[18] where the waters of the exact sciences and the human sciences converged. I had set my course in a direction no one considered worthy of serious exploration since the crossing was thought impossible. But, against all expectations, my journey reached its goal. The IEML Grammar is the scientific proof of this. The mathematization of language is indeed possible, since here is a mathematical metalanguage. What is it exactly? IEML is an artificial language with calculable semantics that puts no limits on the possibilities for the expression of new meanings. Given a text in IEML, algorithms reconstitute the internal grammatical and semantic network of the text, translate that network into natural languages and calculate the semantic relationships between that text and the other texts in IEML. 
The metalanguage generates a huge group of symmetric transformations between semantic networks, which can be measured and navigated at will using algorithms. The IEML Grammar demonstrates the calculability of the semantic networks and presents the algorithmic workings of the metalanguage in detail. Used as a system of semantic metadata, IEML opens the way to new methods for analyzing large masses of data. It will be able to support new forms of translinguistic hypertextual communication in social media, and will make it possible for conversation networks to observe and perfect their own collective intelligence. For researchers in the human sciences, IEML will structure an open, universal encyclopedic library of multimedia data that reorganizes itself automatically around subjects and the interests of its users.

A new frontier: Algorithmic Intelligence

Having mapped the path I discovered in La grammaire d’IEML, I will now relate what I saw at the end of my journey, on the other side of the supposedly impassable territory: the new horizons of the mind that algorithmic intelligence illuminates. Because IEML is obviously not an end in itself. It is only the necessary means for the coming great digital civilization to enable the sun of human knowledge to shine more brightly. I am talking here about a future (but not so distant) state of intelligence, a state in which capacities for reflection, creation, communication, collaboration, learning, and analysis and synthesis of data will be infinitely more powerful and better distributed than they are today. With the concept of Algorithmic Intelligence, I have completed the risky work of prediction and cultural creation I undertook with Collective Intelligence twenty years ago. The contemporary algorithmic medium is already characterized by digitization of data, automated data processing in huge industrial computing centres, interactive mobile interfaces broadly distributed among the population and ubiquitous communication. We can make this the medium of a new type of knowledge—a new episteme[19]—by adding a system of semantic metadata based on IEML. The purpose of this paper is precisely to lay the philosophical and historical groundwork for this new type of knowledge.

Philosophical genealogy of algorithmic intelligence

The three ages of reflexive knowledge

Since my project here involves a reflexive collective intelligence, I would like to place the theme of reflexive knowledge in its historical and philosophical context. As a first approximation, reflexive knowledge may be defined as knowledge knowing itself. “All men by nature desire to know,” wrote Aristotle, and this knowledge implies knowledge of the self.[20] Human beings have no doubt been speculating about the forms and sources of their own knowledge since the dawn of consciousness. But the reflexivity of knowledge took a decisive step around the middle of the first millennium BCE,[21] during the period when the Buddha, Confucius, the Hebrew prophets, Socrates and Zoroaster (in alphabetical order) lived. These teachers involved the entire human race in their investigations: they reflected consciousness from a universal perspective. This first great type of systematic research on knowledge, whether philosophical or religious, almost always involved a divine ideal, or at least a certain “relation to Heaven.” Thus we may speak of a theosophical age of reflexive knowledge. I will examine the Aristotelian lineage of this theosophical consciousness, which culminated in the concept of the agent intellect. Starting in the sixteenth century in Europe—and spreading throughout the world with the rise of modernity—there was a second age of reflection on knowledge, which maintained the universal perspective of the previous period but abandoned the reference to Heaven and confined itself to human knowledge, with its recognized limits but also its rational ideal of perfectibility. This was the second age, the scientific age, of reflexive knowledge. Here, the investigation follows two intertwined paths: one path focusing on what makes knowledge possible, the other on what limits it. In both cases, knowledge must define its transcendental subject, that is, it must discover its own determinations. 
There are many signs in 2014 indicating that in the twenty-first century—around the point where half of humanity is connected to the Internet—we will experience a third stage of reflexive knowledge. This “version 3.0” will maintain the two previous versions’ ideals of universality and scientific perfectibility but will be based on the intensive use of technology to augment and reflect systematically our collective intelligence, and therefore our capacities for personal and social learning. This is the coming technological age of reflexive knowledge with its ideal of an algorithmic intelligence. The brief history of these three modalities—theosophical, scientific and technological—of reflexive knowledge can be read as a philosophical genealogy of algorithmic intelligence.

The theosophical age and its agent intellect

A few generations earlier, Socrates might have been a priest in the circle around the Pythia; he had taken the famous maxim “Know thyself” from the Temple of Apollo at Delphi. But in the fifth century BCE in Athens, Socrates extended the Delphic injunction in an unexpected way, introducing dialectical inquiry. He asked his contemporaries: What do you think? Are you consistent? Can you justify what you are saying about courage, justice or love? Could you repeat it seriously in front of a little group of intelligent or curious citizens? He thus opened the door to a new way of knowing one’s own knowledge, a rational expansion of consciousness of self. His main disciple, Plato, followed this path of rigorous questioning of the unthinking categorization of reality, and finally discovered the world of Ideas. Ideas for Plato are intellectual forms that, unlike the phenomena they categorize, do not belong to the world of Becoming. These intelligible forms are the original essences, archetypes beyond reality, which project into phenomenal time and space all those things that seem to us to be truly real because they are tangible, but that are actually only pale copies of the Ideas. We would say today that our experience is mainly determined by our way of categorizing it. Plato taught that humanity can only know itself as an intelligent species by going back to the world of Ideas and coming into contact with what explains and motivates its own knowledge. Aristotle, who was Plato’s student and Alexander the Great’s tutor, created a grand encyclopedic synthesis that would be used as a model for eighteen centuries in a multitude of cultures. In it, he integrates Plato’s discovery of Ideas with the sum of knowledge of his time. He places at the top of his hierarchical cosmos divine thought knowing itself. 
And in his Metaphysics,[22] he defines the divinity as “thought thinking itself.” This supreme self-reflexive thought was for him the “prime mover” that inspires the eternal movement of the cosmos. In De Anima,[23] his book on psychology and the theory of knowledge, he states that, under the effect of an agent intellect separate from the body, the passive intellect of the individual receives intelligible forms, a little like the way the senses receive sensory forms. In thinking these intelligible forms, the passive intellect becomes one with its objects and, in so doing, knows itself. Starting from the enigmatic propositions of Aristotle’s theology and psychology, a whole lineage of Peripatetic and Neo-Platonic philosophers—first “pagans,” then Muslims, Jews and Christians—developed the discipline of noetics, which speculates on the divine intelligence, its relation to human intelligence and the type of reflexivity characteristic of intelligence in general.[24] According to the masters of noetics, knowledge can be conceptually divided into three aspects that, in reality, are indissociable and complementary:

  • the intellect, or the knowing subject
  • the intelligence, or the operation of the subject
  • the intelligible, or what is known (or can be known) by the subject by virtue of its operation

From a theosophical perspective, everything that happens takes place in the unity of a self-reflexive divine thought, or (in the Indian tradition) in the consciousness of an omniscient Brahman or Buddha, open to infinity. In the Aristotelian tradition, Avicenna, Maimonides and Albert the Great considered that the identity of the intellect, the intelligence and the intelligible was achieved eternally in God, in the perfect reflexivity of thought thinking itself. In contrast, it was clear to our medieval theosophists that in the case of human beings, the three aspects of knowledge were neither complete nor identical. Indeed, since the passive intellect knows itself only through the intermediary of its objects, and these objects are constantly disappearing and being replaced by others, the reflexive knowledge of a finite human being can only be partial and transitory. Ultimately, human knowledge could know itself only if it simultaneously knew, completely and enduringly, all its objects. But that, obviously, is reserved only for the divinity. I should add that the “one beyond the one” of the neo-Platonist Plotinus and the transcendent deity of the Abrahamic traditions are beyond the reach of the human mind. That is why our theosophists imagined a series of mediations between transcendence and finitude. In the middle of that series, a metaphysical interface provides communication between the unimaginable and inaccessible deity and mortal humanity dispersed in time and space, whose living members can never know—or know themselves—other than partially. At this interface, we find the agent intellect, which is separate from matter in Aristotle’s psychology. The agent intellect is not limited—in the realm of time—to sending the intelligible categories that inform the human passive intellect; it also determines—in the realm of eternity—the maximum limit of what the human race can receive of the universal and perfectly reflexive knowledge of the divine. 
That is why, according to the medieval theosophists, the best a mortal intelligence can do to approach complete reflexive knowledge is to contemplate the operation in itself of the agent intellect that emanates from above and go back to the source through it. In accordance with this regulating ideal of reflexive knowledge, living humanity is structured hierarchically, because human beings are more or less turned toward the illumination of the agent intellect. At the top, prophets and theosophists receive a bright light from the agent intellect, while at the bottom, human beings turned toward coarse material appetites receive almost nothing. The influx of intellectual forms is gradually obscured as we go down the scale of degree of openness to the world above.

The scientific age and its transcendental subject

With the European Renaissance, the use of the printing press, the construction of new observation instruments, and the development of mathematics and experimental science heralded a new era. Reflection on knowledge took a critical turn with Descartes’s introduction of radical doubt and the scientific method, in accordance with the needs of educated Europe in the seventeenth century. God was still present in the Cartesian system, but He was only there, ultimately, to guarantee the validity of the efforts of human scientific thought: “God is not a deceiver.”[25] The fact remains that Cartesian philosophy rests on the self-reflexive edge, which has now moved from the divinity to the mortal human: “I think, therefore I am.”[26] In the second half of the seventeenth century, Spinoza and Leibniz received the critical scientific rationalism developed by Descartes, but they were dissatisfied with his dualism of thought (mind) and extension (matter). They therefore attempted, each in his own way, to constitute reflexive knowledge within the framework of coherent monism. For Spinoza, nature (identified with God) is a unique and infinite substance of which thought and extension are two necessary attributes among an infinity of attributes. This strict ontological monism is counterbalanced by a pluralism of expression, because the unique substance possesses an infinity of attributes, and each attribute, an infinity of modes. The summit of human freedom according to Spinoza is the intellectual love of God, that is, the most direct and intuitive possible knowledge of the necessity that moves the nature to which we belong. For Leibniz, the world is made up of monads, metaphysical entities that are closed but are capable of an inner perception in which the whole is reflected from their singular perspective. 
The consistency of this radical pluralism is ensured by the unique, infinite divine intelligence that has considered all possible worlds in order to create the best one, which corresponds to the most complex—or the richest—of the reciprocal reflections of the monads. As for human knowledge—which is necessarily finite—its perfection coincides with the clearest possible reflection of a totality that includes it but whose unity is thought only by the divine intelligence. After Leibniz and Spinoza, the eighteenth century saw the growth of scientific research, critical thought and the educational practices of the Enlightenment, in particular in France and the British Isles. The philosophy of the Enlightenment culminated with Kant, for whom the development of knowledge was now contained within the limits of human reason, without reference to the divinity, even to envelop or guarantee its reasoning. But the ideal of reflexivity and universality remained. The issue now was to acquire a “scientific” knowledge of human intelligence, which could not be done without the representation of knowledge to itself, without a model that would describe intelligence in terms of what is universal about it. This is the purpose of Kantian transcendental philosophy. Here, human intelligence, armed with its reason alone, now faces only the phenomenal world. Human intelligence and the phenomenal world presuppose each other. Intelligence is programmed to know sensory phenomena that are necessarily immersed in space and time. As for phenomena, their main dimensions (space, time, causality, etc.) correspond to ways of perceiving and understanding that are specific to human intelligence. These are forms of the transcendental subject and not intrinsic characteristics of reality. 
Since we are confined within our cognitive possibilities, it is impossible to know what things are “in themselves.” For Kant, the summit of reflexive human knowledge is in a critical awareness of the extension and the limits of our possibility of knowing. Descartes, Spinoza, Leibniz, the English and French Enlightenment, and Kant accomplished a great deal in two centuries, and paved the way for the modern philosophy of the nineteenth and twentieth centuries. A new form of reflexive knowledge grew, spread, and fragmented into the human sciences, which mushroomed with the end of the monopoly of theosophy. As this dispersion occurred, great philosophers attempted to grasp reflexive knowledge in its unity. The reflexive knowledge of the scientific era neither suppressed nor abolished reflexive knowledge of the theosophical type, but it opened up a new domain of legitimacy of knowledge, freed of the ideal of divine knowledge. This de jure separation did not prevent de facto unions, since there was no lack of religious scholars or scholarly believers. Modern scientists could be believers or non-believers. Their position in relation to the divinity was only a matter of motivation. Believers loved science because it revealed the glory of the divinity, and non-believers loved it because it explained the world without God. But neither of them used as arguments what now belonged only to their private convictions. In the human sciences, there were systematic explorations of the determinations of human existence. And since we are thinking beings, the determinations of our existence are also those of our thought. How do the technical, historical, economic, social and political conditions in which we live form, deform and set limits on our knowledge? What are the structures of our biology, our language, our symbolic systems, our communicative interactions, our psychology and our processes of subjectivation? 
Modern thought, with its scientific and critical ideal, constantly searches for the conditions and limits imposed on it, particularly those that are as yet unknown to it, that remain in the shadows of its consciousness. It seeks to discover what determines it “behind its back.” While the transcendental subject described by Kant in his Critique of Pure Reason fixed the image a great mind had of it in the late eighteenth century, modern philosophy explores a transcendental subject that is in the process of becoming, continually being re-examined and more precisely defined by the human sciences, a subject immersed in the vagaries of cultures and history, emerging from its unconscious determinations and the techno-symbolic mechanisms that drive it. I will now broadly outline the figure of the transcendental subject of the scientific era, a figure that re-examines and at the same time transforms the three complementary aspects of the agent intellect.

  • The Aristotelian intellect becomes living intelligence. This involves the effective cognitive activities of subjects, what is experienced spontaneously in time by living, mortal human beings.
  • The intelligence becomes scientific investigation. I use this term to designate all undertakings by which the living intelligence becomes scientifically intelligible, including the technical and symbolic tools, the methods and the disciplines used in those undertakings.
  • The intelligible becomes the intelligible intelligence, which is the image of the living intelligence that is produced through scientific and critical investigation.

An evolving transcendental subject emerges from this reflexive cycle in which the living intelligence contemplates its own image in the form of a scientifically intelligible intelligence. Scientific investigation here is the internal mirror of the transcendental subjectivity, the mediation through which the living intelligence observes itself. It is obviously impossible to confuse the living intelligence and its scientifically intelligible image, any more than one can confuse the map and the territory, or the experience and its description. Nor can one confuse the mirror (scientific investigation) with the being reflected in it (the living intelligence), nor with the image that appears in the mirror (the intelligible intelligence). These three aspects together form a dynamic unit that would collapse if one of them were eliminated. While the living intelligence would continue to exist without a mirror or scientific image, it would be very much diminished. It would have lost its capacity to reflect from a universal perspective. The creative paradox of the intellectual reflexivity of the scientific age may be formulated as follows. It is clear, first of all, that the living intelligence is truly transformed by scientific investigation, since the living intelligence that knows its image through a certain scientific investigation is not the same (does not have the same experience) as the one that does not know it, or that knows another image, the result of another scientific investigation. But it is just as clear, by definition, that the living intelligence reflects itself in the intelligible image presented to it through scientific knowledge. In other words, the living intelligence is equally dependent on the scientific and critical investigation that produces the intelligible image in which it is reflected. 
When we observe our physical appearance in a mirror, the image in the mirror in no way changes our physical appearance, only the mental representation we have of it. However, the living intelligence cannot discover its intelligible image without including the reflexive process itself in its experience, and without at the same time being changed. In short, a critical science that explores the limits and determinations of the knowing subject does not only reflect knowledge—it increases it. Thus the modern transcendental subject is—by its very nature—evolutionary, participating in a dynamic of growth. In line with this evolutionary view of the scientific age, which contrasts with the fixity of the previous age, the collectivity that possesses reflexive knowledge is no longer a theosophical hierarchy oriented toward the agent intellect but a republic of letters oriented toward the augmentation of human knowledge, a scientific community that is expanding demographically and is organized into academies, learned societies and universities. While the agent intellect looked out over a cosmos emanating from eternity, in analog resonance with the human microcosm, the transcendental subject explores a universe infinitely open to scientific investigation, technical mastery and political liberation.

The technological age and its algorithmic intelligence

Reflexive knowledge has, in fact, always been informed by some technology, since it cannot be exercised without symbolic tools and thus the media that support those tools. But the next age of reflexive knowledge can properly be called technological because the technical augmentation of cognition is explicitly at the centre of its project. Technology now enters the loop of reflexive consciousness as the agent of the acceleration of its own augmentation. This last point was no doubt glimpsed by a few pre-twentieth-century philosophers, such as Condorcet in the eighteenth century, in his posthumous book of 1795, Sketch for a Historical Picture of the Progress of the Human Mind. But the truly technological dimension of reflexive knowledge began to be fully thought through only in the twentieth century, with Pierre Teilhard de Chardin, Norbert Wiener and Marshall McLuhan, to whom we should also add the modest genius Douglas Engelbart. The regulating ideal of the reflexive knowledge of the theosophical age was the agent intellect, and that of the scientific-critical age was the transcendental subject. In continuity with the two preceding periods, the reflexive knowledge of the technological age will be organized around the ideal of algorithmic intelligence, which inherits from the agent intellect its universality or, in other words, its capacity to unify humanity’s reflexive knowledge. It also inherits its power to be reflected in finite intelligences. But, in contrast with the agent intellect, instead of descending from eternity, it emerges from the multitude of human actions immersed in space and time. Like the transcendental subject, algorithmic intelligence is rational, critical, scientific, purely human, evolutionary and always in a state of learning. But the vocation of the transcendental subject was to reflexively contain the human universe. However, the human universe no longer has a recognizable face. 
The “death of man” announced by Foucault[27] should be understood in the sense of the loss of figurability of the transcendental subject. The labyrinth of philosophies, methodologies, theories and data from the human sciences has become inextricably complicated. The transcendental subject has not only been dissolved in symbolic structures or anonymous complex systems, it is also fragmented in the broken mirror of the disciplines of the human sciences. It is obvious that the technical medium of a new figure of reflexive knowledge will be the Internet, and more generally, computer science and ubiquitous communication. But how can symbol-manipulating automata be used on a large scale not only to reunify our reflexive knowledge but also to increase the clarity, precision and breadth of the teeming diversity enveloped by our knowledge? The missing link is not only technical, but also scientific. We need a science that grasps the new possibilities offered by technology in order to give collective intelligence the means to reflect itself, thus inaugurating a new form of subjectivity. As the groundwork of this new science—which I call computational semantics—IEML makes use of the self-reflexive capacity of language without excluding any of its functions, whether they be narrative, logical, pragmatic or other. Computational semantics produces a scientific image of collective intelligence: a calculated intelligence that will be able to be explored both as a simulated world and as a distributed augmented reality in physical space. Scientific change will generate a phenomenological change,[28] since ubiquitous multimedia interaction with a holographic image of collective intelligence will reorganize the human sensorium. The last, but not the least, change: social change. The community that possessed the previous figure of reflexive knowledge was a scientific community that was still distinct from society as a whole. 
But in the new figure of knowledge, reflexive collective intelligence emerges from any human group. Like the previous figures—theosophical and scientific—of reflexive knowledge, algorithmic intelligence is organized in three interdependent aspects.

  • Reflexive collective intelligence represents the living intelligence, the intellect or soul of the great future digital civilization. It may be glimpsed by deciphering the signs of its approach in contemporary reality.
  • Computational semantics holds up a technical and scientific mirror to collective intelligence, which is reflected in it. Its purpose is to augment and reflect the living intelligence of the coming civilization.
  • Calculated intelligence, finally, is none other than the scientifically knowable image of the living intelligence of digital civilization. Computational semantics constructs, maintains and cultivates this image, which is that of an ecosystem of ideas that emerges from human activity in the algorithmic medium and can be explored in sensory-motor mode.

In short, in the emergent unity of algorithmic intelligence, computational semantics calculates the cognitive simulation that augments and reflects the collective intelligence of the coming civilization.

[1] Professor at the University of Ottawa

[2] And twenty-three years after L’idéographie dynamique (Paris: La Découverte, 1991).

[3] And before the WWW itself, which would become a public phenomenon only in 1994 with the development of the first browsers such as Mosaic. At the time when the book was being written, the Web still existed only in the mind of Tim Berners-Lee.

[4] Approximately 40% in 2014 and probably more than half in 2025.

[5] I obviously do not claim to be the only “visionary” on the subject in the early 1990s. The pioneering work of Douglas Engelbart and Ted Nelson and the predictions of Howard Rheingold, Joël de Rosnay and many others should be cited.

[6] See The basics of IEML (on line at: http://wp.me/P3bDiO-9V )

[7] Beyond logic and statistics.

[8] IEML is the acronym for Information Economy MetaLanguage. See La grammaire d’IEML (On line http://wp.me/P3bDiO-9V )

[9] The Semantic Sphere 1: Computation, Cognition and Information Economy (London: ISTE, 2011; New York: Wiley, 2011).

[10] More than four hundred reference books.

[11] Umberto Eco, The Search for the Perfect Language (Oxford: Blackwell, 1995).

[12] “But more madness than genius would be required for such an enterprise”: Claude Lévi-Strauss, The Savage Mind (University of Chicago Press, 1966), p. 130.

[13] Which is obviously true, but which only defines the problem rather than forbidding the solution.

[14] But true universalism is all-inclusive, and our daily lives are structured according to a multitude of universal standards, from space-time coordinates to HTTP on the Web. I responded at length in The Semantic Sphere to the prejudices of extremist post-modernism against scientific universality.

[15] Which is still used by a large community. But the only thing that Esperanto and IEML have in common is the fact that they are artificial languages. They have neither the same form nor the same purpose, nor the same use, which invalidates criticisms of IEML based on the criticism of Esperanto.

[16] See IEML Grammar (On line http://wp.me/P3bDiO-9V ).

[17] But, fortunately, supported by the Canada Research Chairs program and by my wife, Darcia Labrosse.

[18] Michel Serres, Hermès V. Le passage du Nord-Ouest (Paris: Minuit, 1980).

[19] The concept of episteme, which is broader than the concept of paradigm, was developed in particular by Michel Foucault in The Order of Things (New York: Pantheon, 1970) and The Archaeology of Knowledge and the Discourse on Language (New York: Pantheon, 1972).

[20] At the beginning of Book A of his Metaphysics.

[21] This is the Axial Age identified by Karl Jaspers.

[22] Book Lambda, 9

[23] In particular in Book III.

[24] See, for example, Moses Maimonides, The Guide For the Perplexed, translated into English by Michael Friedländer (New York: Cosimo Classic, 2007) (original in Arabic from the twelfth century). – Averroes (Ibn Rushd), Long Commentary on the De Anima of Aristotle, translated with introduction and notes by Richard C. Taylor (New Haven: Yale University Press, 2009) (original in Arabic from the twelfth century). – Saint Thomas Aquinas, On the Unity of the Intellect Against the Averroists (original in Latin from the thirteenth century). – Herbert A. Davidson, Alfarabi, Avicenna, and Averroes, on Intellect. Their Cosmologies, Theories of the Active Intellect, and Theories of Human Intellect (New York, Oxford: Oxford University Press, 1992). – Henri Corbin, History of Islamic Philosophy, translated by Liadain and Philip Sherrard (London: Kegan Paul, 1993). – Henri Corbin, En Islam iranien: aspects spirituels et philosophiques, 2d ed. (Paris: Gallimard, 1978), 4 vol. – Alain de Libera, Métaphysique et noétique: Albert le Grand (Paris: Vrin, 2005).

[25] In Meditations on First Philosophy, “First Meditation.”

[26] Discourse on the Method, “Part IV.”

[27] At the end of The Order of Things (New York: Pantheon Books, 1970).

[28] See, for example, Stéphane Vial, L’être et l’écran (Paris: PUF, 2013).


Reciprocal critique of artificial intelligence and the human sciences

I remember taking part, toward the end of the 1980s, in a Cerisy colloquium on the cognitive sciences attended by some of the great American names in the discipline, including proponents of the neuro-connectionist and logicist currents. Among the guests, the philosopher Hubert Dreyfus (author, notably, of What Computers Can’t Do, MIT Press, 1972) sharply criticized artificial intelligence researchers for failing to take into account the intentionality discovered by phenomenology. Real human reasoning, he reminded us, is situated, oriented toward an end, and draws its relevance from a context of interaction. The cognitive sciences, dominated by the logico-statistical current, were unable to account for the horizons of consciousness that illuminate intelligence. Dreyfus was no doubt right, but his critique did not go far enough, because phenomenology was not the only thing being ignored. Artificial intelligence (AI) also failed to integrate into the cognition it claimed to model the complexity of symbolic systems and of human communication, the media that support it, and the pragmatic tensions and social relations that animate it. In this respect, we live today in a paradoxical situation: AI is enjoying impressive practical success at the very moment when its theoretical failure is becoming obvious.

Practical success, indeed, since the usefulness of statistical algorithms, machine learning, simulations of animal collective intelligence, neural networks and other pattern recognition systems is evident everywhere. Natural language processing has never been so popular, as shown for example by the use of Google Translate. The Web of data promoted by the WWW Consortium (directed by Sir Tim Berners-Lee) uses the same type of logical rules as the expert systems of the 1980s. Finally, the social computing algorithms implemented by search engines and social media demonstrate their effectiveness every day.

But we must acknowledge the theoretical failure of AI since, despite the multitude of algorithmic tools available, artificial intelligence still cannot produce a convincing model of cognition. The discipline has prudently given up on simulating intelligence in its entirety. It is clear to any researcher in the human sciences who has had some practice of transdisciplinarity that, because of its teeming complexity, the object of the human sciences (mind, thought, intelligence, culture, society) cannot be fully accounted for by any of the computational theories of cognition currently available. That is why artificial intelligence in practice settles for providing a heterogeneous toolbox (logical rules, formal syntaxes, statistical methods, neural or socio-biological simulations…) that offers no general solution to the problem of a mathematical modeling of human cognition.

However, artificial intelligence researchers can easily reply to their critics from the human sciences: “You claim that our algorithms fail to account for the complexity of human cognition, but you yourselves offer none to remedy the problem. You merely point to a multitude of disciplines, each more ‘complex’ than the last (philosophy, psychology, linguistics, sociology, history, geography, literature, communication…), which have no common metalanguage and have not formalized their objects! How do you expect us to find our way through this jumble?” And this challenge is just as sensible as the critique to which it responds.


Synthesis of artificial intelligence and the human sciences

What I learned from Hubert Dreyfus at that 1987 colloquium where I met him was not so much that phenomenology would be the key to all the problems of a scientific modeling of the mind (Husserl, the father of phenomenology, in fact thought that phenomenology, a kind of meta-science of consciousness, was impossible to mathematize and that it even represented the non-mathematizable par excellence, the other of the mathematical science of nature), but rather that artificial intelligence was wrong to look for this key only in the area lit by the streetlamp of arithmetic, logic and formal neurons… and that philosophers, hermeneuticists and specialists in the complexity of meaning should take an active part in the research rather than being content to criticize. To find the key, we had to widen our gaze, search and dig through the whole field of the human sciences, however opaque to calculation it may seem at first sight. We needed a tool for processing sense, signification, semantics in general, in a computational mode. Once the immense field of semantic relations had been illuminated by calculation, a science of cognition worthy of the name could emerge. Indeed, provided that a symbolic tool ensures the calculation of relations between signifieds, it becomes possible to calculate the semantic relations between concepts, between ideas and between intelligences. Moved by these considerations, I developed the semantic theory of cognition and the IEML metalanguage: computational semantics results from their union.

Specialists in meaning, culture and thought feel helpless when faced with the heterogeneous toolbox of artificial intelligence: nowhere in it do they recognize anything capable of handling the contextual complexity of meaning. That is why computational semantics offers them a way to handle algorithmic tools coherently, starting from the semantics of natural languages. Engineers lose their way when faced with the motley multitude, the artistic vagueness and the lack of conceptual interoperability of the human sciences. Remedying this problem, computational semantics gives them a grip on the teeming tools and concepts of the elusive human sciences. In short, the great project of computational semantics is to build a bridge between software engineering and the human sciences, so that the latter can put the computational power of computing to work for them and the former can integrate the hermeneutic finesse and contextual complexity of the human sciences. But an artificial intelligence wide open to the human sciences and capable of calculating the complexity of meaning would precisely no longer be the artificial intelligence we know today. As for human sciences equipped with a computable metalanguage, mobilizing collective intelligence and finally mastering the algorithmic medium, they would no longer resemble the human sciences we have known since the eighteenth century: we would have crossed the threshold of a new episteme.