Vassili Kandinsky: Circles in a Circle

A Scientific Language

IEML is an acronym for Information Economy MetaLanguage. IEML is the result of thirty years of fundamental research under the direction of Pierre Lévy, fourteen years of which were funded by the Canadian federal government through the Canada Research Chair in Collective Intelligence at the University of Ottawa (2002-2016). In 2020, IEML is the only language that has the following three properties:

– it has the expressive power of a natural language;

– it has the syntax of a regular language;

– its semantics is unambiguous and computable, because it is aligned with its syntax.

In other words, it is a “well-formed symbolic system”, which comprises a bijection between a set of relations between signifieds, or meanings (a language) and a set of relations between signifiers (an algebra) and which can be manipulated by a set of symmetrical and automatic operations. 

On the basis of these properties, IEML can be used as a concept coding system that solves the problem of semantic interoperability in an original way, lays the foundations for a new generation of artificial intelligence and allows collective intelligence to be reflexive. IEML complies with Web standards and can be exported in RDF. IEML expressions are called USLs (Uniform Semantic Locators). They can be read and translated into any natural language. Semantic ontologies – sets of IEML expressions linked by a network of relations – are interoperable by design. IEML provides the coordinate system of a common knowledge base that feeds both automatic reasoning and statistical calculations. In sum, IEML fulfills the promise of the Semantic Web through its computable meaning and interoperable ontologies. IEML’s grammar consists of four layers: elements, words, sentences and texts. Examples of elements and words can be found at


The semantic elements are the basic building blocks, or elementary concepts, from which all language expressions are composed. A dictionary of about 5000 elements translated into natural languages is given with IEML and shared among all its users. Semantic interoperability comes from the fact that everyone shares the same set of elements whose meanings are fixed. The dictionary is organized into tables and sub-tables related to the same theme and the elements are defined reciprocally through a network of explicit semantic relations. IEML allows the design of an unlimited variety of concepts from a limited number of elements. 

Example of a paradigm of elements in the IEML dictionary

The user does not have to worry about the rules from which the elements are constructed. However, they are regularly generated from six primitive symbols forming the “layer 0” of the language, and since the generative operation is recursive, the elements are stratified on six layers above layer 0.


Using the elements dictionary and grammar rules, users can freely model a field of knowledge or practice within IEML. These models can be original or translate existing classifications, ontologies or semantic metadata.

The basic unit of an IEML sentence is the word. A word is a pair composed of two small sets of elements: the radical and the inflection. The choice of radical elements is free, but inflection elements are selected from a closed list of elements tables corresponding to adverbs, prepositions, postpositions, articles, conjugations, declensions, modes, etc. (see “auxiliary morphemes” in

Each word or sentence corresponds to a distinct concept that can be translated, according to its author’s indications and its grammatical role, as a verb (encourage), a noun (courage), an adjective (courageous) or an adverb (bravely). 


The words are distributed on a grammatical tree composed of a root (verbal or nominal) and eight leaves corresponding to the roles of classical grammar: subject, object, complement of time, place, etc. 

The nine grammatical roles


The Root of the sentence can be a process (a verb), a substance, an essence, an affirmation of existence… 

The Initiator is the subject of a process, answering the question “who?”. It can also define the initial conditions, the prime mover, the first cause of the concept evoked by the root.

The Interactant corresponds to the object of classical grammar. It answers the question “what”. It also plays the role of medium in the relationship between the initiator and the recipient. 

The Recipient is the beneficiary (or the victim) of a process. It answers the questions “for whom, to whom, towards whom?”. 

The Time answers the question “when?”. It indicates the moment in the past, the present or the future and gives references as to anteriority, posteriority, duration, date and frequency. 

The Place answers the question “where?”. It indicates the location, spatial distribution, pace of movement, paths, spatial relationships and metaphors.

The Intention answers the question of finality, purpose, motivation: “for what?”, “to what end?”. It concerns mental orientation, direction of action, pragmatic context, emotion or feeling.

The Manner answers the questions “how?” and “how much?”. It situates the root on a range of qualities or on a scale of values. It specifies quantities, gradients, measurements and sizes. It also indicates properties, genres and styles.

The Causality answers the question “why?”. It specifies logical, material and formal determinations. It describes causes that have not been specified by the initiator, the interactant or the recipient: media, instruments, effects, consequences. It also describes the units of measurement and methods. It may also specify rules, laws, reasons, points of view, conditions and contracts.

For example: Robert (initiator) offers (root-process) a gift (interactant) to Mary (recipient) today (time) in the garden (place), to please her (intention), with a smile (manner), for her birthday (causality).
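The role structure of this example can be sketched in a few lines of Python. This is only an illustration, not IEML's actual data model: the names Role, Sentence and gloss are hypothetical, plain English strings stand in for IEML words, and the rule that the root is the only mandatory role is an assumption drawn from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # The nine grammatical roles, in the order the text lists them.
    ROOT = "root"
    INITIATOR = "initiator"
    INTERACTANT = "interactant"
    RECIPIENT = "recipient"
    TIME = "time"
    PLACE = "place"
    INTENTION = "intention"
    MANNER = "manner"
    CAUSALITY = "causality"


@dataclass
class Sentence:
    # Maps each filled role to a word (hypothetical model, strings as stand-ins).
    roles: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumption: the root is mandatory, the eight leaves are optional.
        if Role.ROOT not in self.roles:
            raise ValueError("a sentence needs a root")

    def gloss(self):
        # Return (role name, word) pairs for the filled roles, in enum order.
        return [(role.value, self.roles[role]) for role in Role if role in self.roles]


gift = Sentence({
    Role.ROOT: "offers",
    Role.INITIATOR: "Robert",
    Role.INTERACTANT: "a gift",
    Role.RECIPIENT: "Mary",
    Role.TIME: "today",
    Role.PLACE: "in the garden",
    Role.INTENTION: "to please her",
    Role.MANNER: "with a smile",
    Role.CAUSALITY: "for her birthday",
})

for role_name, word in gift.gloss():
    print(f"{role_name}: {word}")
```

A sentence reduced to its root alone, such as Sentence({Role.ROOT: "offers"}), is also accepted by this sketch.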


IEML allows the junction of several words in the same grammatical role. This can be a logical connection (and, or inclusive or exclusive), a comparison (same as, different from), an ordering (larger than, smaller than…), an antinomy (but, in spite of…), and so on.

Layers of complexity

Grammatical roles of a complex sentence

A word that plays one of the eight leaf roles at complexity layer 1 can play the role of secondary root at a complexity layer 2, and so on recursively up to layer 4.


IEML strictly speaking enables only general categories or concepts to be expressed. It is nevertheless possible to insert numbers, units of measurement, dates, geographical positions, proper names, etc. into a sentence, provided they are categorized in IEML. For example t.u.-t.u.-‘. [23] means ‘number: 23’. Individual names, numbers, etc. are called literals in IEML.



A semantic relationship is a sentence in a special format that is used to link a source node (element, word, sentence) to a target node. IEML includes a query language enabling easy programming of semantic relationships on a set of nodes. 

By design, a semantic relationship makes the following four points explicit.

1. The function that connects the source node and the target node.

2. The mathematical form of the relation: equivalence relationship, order relationship, intransitive symmetrical relationship or intransitive asymmetrical relationship.

3. The kind of context or social rule that validates the relationship: syntax, law, entertainment, science, learning, etc.

4. The content of the relationship: logical, taxonomic, mereological (whole-part relationship), temporal, spatial, quantitative, causal, or other. The relation can also concern the reading order or the anaphora.
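As an illustration of point 2, here is a minimal Python sketch that classifies a relation, given as a set of (source, target) pairs, into one of the four mathematical forms. It is not part of IEML: the function names are hypothetical, "intransitive" is read loosely as "not transitive" and "asymmetrical" as "not symmetrical", and the sample relations are invented for the example.

```python
def is_reflexive(pairs, nodes):
    # Every node is related to itself.
    return all((n, n) in pairs for n in nodes)


def is_symmetric(pairs):
    # Whenever a is related to b, b is related to a.
    return all((b, a) in pairs for a, b in pairs)


def is_antisymmetric(pairs):
    # Two distinct nodes are never related in both directions.
    return all(a == b for a, b in pairs if (b, a) in pairs)


def is_transitive(pairs):
    # Whenever a -> b and b -> d hold, a -> d holds too.
    return all((a, d) in pairs
               for a, b in pairs
               for c, d in pairs
               if b == c)


def mathematical_form(pairs, nodes):
    # Classify the relation (loose reading of the four forms, see lead-in).
    pairs = set(pairs)
    symmetric = is_symmetric(pairs)
    transitive = is_transitive(pairs)
    if symmetric and transitive and is_reflexive(pairs, nodes):
        return "equivalence"
    if transitive and is_antisymmetric(pairs):
        return "order"
    if symmetric and not transitive:
        return "intransitive symmetrical"
    if not symmetric and not transitive:
        return "intransitive asymmetrical"
    return "other"


same_as = {("a", "a"), ("b", "b"), ("a", "b"), ("b", "a")}
larger_than = {(3, 2), (2, 1), (3, 1)}
next_to = {("a", "b"), ("b", "a")}
parent_of = {("a", "b"), ("b", "c")}

print(mathematical_form(same_as, {"a", "b"}))
print(mathematical_form(larger_than, {1, 2, 3}))
print(mathematical_form(next_to, {"a", "b"}))
print(mathematical_form(parent_of, {"a", "b", "c"}))
```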

The (hyper) textual network

An IEML text is a network of semantic relationships. This network can describe linear successions, trees, matrices, cliques, cycles and complex subnetworks of all types.

An IEML text can be considered as a theory, an ontology, or a narrative that accounts for the dataset it is used to index.

We can define a USL as an ordered (normalized) set of triples of the form: (a source node, a target node, a relationship sentence). A set of such triples describes a semantic network or IEML text.

The following special cases should be noted:

– A network may contain only one sentence.

– A sentence may contain only one root to the exclusion of other grammatical roles.

– A root may contain only one word (no junction).

– A word may contain only one element.
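The definition of a USL as an ordered (normalized) set of triples can be sketched as follows. This is a hedged illustration only: the text does not say which ordering IEML uses, so plain lexicographic sorting is an assumption, the names Triple and normalize are hypothetical, and English strings stand in for IEML nodes and relationship sentences.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str    # source node (element, word or sentence)
    target: str    # target node
    relation: str  # relationship sentence linking source to target


def normalize(triples):
    # Drop duplicate triples, then put the rest in a fixed (sorted) order,
    # so that the same network always yields the same tuple.
    return tuple(sorted(set(triples)))


first = normalize([
    Triple("tree", "plant", "is a kind of"),
    Triple("leaf", "tree", "is a part of"),
])
second = normalize([
    Triple("leaf", "tree", "is a part of"),
    Triple("tree", "plant", "is a kind of"),
    Triple("leaf", "tree", "is a part of"),  # duplicate, removed by normalize
])

print(first == second)
```

With a normalization step of this kind, two descriptions of the same network become identical values whatever the order in which their triples were written down.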


In short, IEML is a language with computable semantics that can be considered from three complementary points of view: linguistics, mathematics and computer science. Linguistically, it is a philological language, i.e. it can translate any natural language. Mathematically, it is a topos, that is, an algebraic structure (a category) in isomorphic relation with a topological space (a network of semantic relations). Finally, on the computer side, it functions as the indexing system of a virtual database and as a programming language for semantic networks.

Ramon Lull

Le Livre Blanc d’IEML, le métalangage de l’économie de l’information (The IEML White Paper: the information economy metalanguage). 2019.
ABSTRACT. IEML is a language with computable semantics invented by Pierre Lévy. The “White Paper” (beta version, unfinished) explains the main principles, the grammar and the first applications of IEML. (about one hundred pages)

Être et Mémoire (Being and Memory), in the journal Sens Public, 2019
ABSTRACT. The first aim of this article is to place the object of the human sciences (culture and symbolic meaning) back in continuity with the objects of the natural sciences. I hypothesize that meaning does not appear suddenly with humanity but that different layers of coding and memory (quantum, atomic, genetic, nervous and symbolic) stack up and grow progressively more complex, the symbolic stratum being only the most recent of the “writing machines”. The second aim of the text is to define the specificity and unity of the symbolic layer, and hence the field of the human sciences. In opposition to a certain logocentric tradition, I show that symbolism, while it obviously includes language, also encompasses semiotics (such as cooking or music) in which the signifier/signified split is not as relevant as it is for languages. The third aim of this essay is to show that humanity’s cultural forms and interpretive powers evolve with its writing machines. The emergence of the digital, in particular, points toward a refinement of the human sciences that extends to the computation of semantic complexity. This attempt to redefine the human sciences in continuity with the natural sciences presupposes an ontology (or a meta-ontology, in Marcello Vitali-Rosati’s expression) for which the notions of writing and memory are central and which, breaking with the Kantian critique, accepts the full reality of natural spatiality and temporality.

Le rôle des humanités numériques dans le nouvel espace politique (The role of the digital humanities in the new political space), in the journal Sens Public, 2019
ABSTRACT. Now that more than 50% of the world’s population is connected to the Internet, the major platforms, and Facebook in particular, have acquired enormous political power. This new situation forces us to rethink the emancipation project of the Enlightenment. In this article I propose that researchers in the humanities and social sciences take up this challenge by adopting and disseminating new standards of reflexive collective intelligence. The knowledge commons, open science and the sovereignty of individuals over the data they produce enjoy unanimous support. But these indispensable principles are still insufficient. The available computing and communication power, combined with the use of IEML (a language with computable semantics), allows us to envisage making the operations that create knowledge, meaning and authority transparent. I present here the main strategic orientations for achieving these objectives. An epistemological revolution in the human sciences is within reach, and with it a new stage in the evolution of critical thought. (about fifty pages)

La Pyramide algorithmique (The Algorithmic Pyramid), in the journal Sens Public, 2017
ABSTRACT. The algorithmic medium is a communication infrastructure that increases the powers of earlier media by adding the mechanization of symbolic operations. Its emergence in the middle of the twentieth century results from a long scientific and technical history that I summarize at the beginning of the article. I then recall the major stages of its development (mainframes, the Internet and the PC, the social Web, the Cloud augmented by artificial intelligence and the blockchain) along with their sociocognitive consequences. I conclude by evoking the future developments of this medium from the perspective of a reflexive collective intelligence based on a new form of semantic computing.

Les opérateurs élémentaires de la réflexion (The elementary operators of reflection), Cahiers Sens public, 2018/1 (n° 21-22), p. 75-102. The philosophy that inspired the “primitives” of IEML.
ABSTRACT. This article attempts to reduce to a minimum the fundamental concepts needed to reflect on meaning. Two complementary concepts, virtuality and actuality, account for the dualities of action and for the great metaphysical opposition between transcendence and immanence. The actual has a spatio-temporal address: it is situated in sequential time and in three-dimensional physical space, whereas no precise spatio-temporal address can be assigned to the abstraction of the virtual. The semiotic triangle accounts for the triads of representation. The sign (1) indicates (2) a thing, an object or any referent whatsoever to (3) a being or interpretant. There is no sign except “of” something and “for” someone. Finally, one must be able to consider an absence explicitly, including a gap in knowledge, in order to ask questions and reflect. The six elementary operators of reflection (virtual, actual, sign, being, thing and void) function interdependently and run through all fields of the humanities and social sciences; this article examines in particular their relevance in semiotics, epistemology, cosmology, religion, politics and economics.


Image: Kuo Cheng Liao.

In this short blog entry I would like to answer a few questions put to me by Turkish friends (from the site Çeviri Konusmalar) about artificial intelligence and the autonomy of machines. See here on Twitter…

One of the roles of philosophy is to categorize human experience so as to reduce illusion as much as possible or, if you prefer, to find the concepts that will allow us to understand our situation and better guide our action. This often leads philosophers to contradict common opinion. Today that opinion is propagated by journalism and fiction. Journalists as well as authors of novels or TV series present robots or artificial intelligence as capable of autonomy and consciousness, either right now or in the near future. In my view this representation is false, but it works very well because it plays…

  • either on the fear of being eliminated or enslaved by machines (sensationalism or dystopian narrative),
  • or on the hope that artificial intelligence will magically help us solve all our problems or, worse, that it represents a species more advanced than humankind (as in certain advertisements or naive utopias).

In both cases, hope or fear, the mainspring is passion and emotion, not an accurate understanding of what automatic information processing is and of the role it plays in human intelligence.

To reframe this question of the autonomy of machines, I would like to answer three questions here as simply as possible:

  1. What is human intelligence?
  2. What is computing, or what are information-processing machines?
  3. Can machines become autonomous?

What is human intelligence?

First, we must recognize that humans are animals and that animals already have capacities for memory, internal representation of situations, problem solving, learning, and so on. Animals are sentient beings that feel attraction and repulsion, pleasure and pain, even empathy. The most intelligent among them are able to pass on to their offspring certain knowledge acquired through experience, to use tools, etc. Next, animal intelligence shows itself in a particularly striking way at the collective or social level and, for our purposes, notably among primates (the great apes), of which we are a part. Primates have social structures with highly differentiated social roles and elaborate collective strategies for defending themselves, feeding themselves, controlling their territory, etc. We of course share all of this animal intelligence. But we also have symbolic manipulation.

What distinguishes human intelligence from animal intelligence is first and foremost the use of language and symbolic systems. A symbolic system is a means of communication and thought whose elements, the symbols, have two aspects: a sensible aspect (visible, audible) and an invisible, abstract aspect, the general category. And the relationship between the sensible signifier (the sound) and the intelligible signified (the meaning) is conventional, decided by society. There is no reason other than convention and usage for the concept of reason, for example, to be represented in French by the two syllables of the word “raison”, and the proof is that it is said differently in other languages. All animals communicate, but only human beings speak, ask questions, acknowledge their ignorance, engage in dialogue and, above all, tell stories. The use of language gives humans not consciousness (which other animals already have) but reflexive consciousness. The capacity to reflect on concepts is given to us by the manipulation of symbols.

With this capacity for symbolic manipulation and this reflexivity come two special characteristics of humanity: technical systems and social institutions, both highly complex and in constant historical evolution.

A huge part of human intelligence is reified in the technical environment and lived through social institutions (rituals, politics, law, religion, morality, etc.). The individual part of our intelligence is marginal but essential: it is what allows us to innovate, to progress and to improve our condition.

What is computing, or the automatic processing of information?

Artificial intelligence is a “marketing” expression that in fact designates the most advanced, constantly moving zone of information-processing techniques.

When I say that human intelligence has always been artificial, I do not mean that humans are robots or machines; I mean that humans have always used technical processes to augment their intelligence, whether personal or collective. Writing gave us the means to extend our individual memory and our critical capacities. Today the Internet gives us rapid access to a quantity of information that our ancestors could never have imagined. But it is not only a matter of memory: we also have capacities for calculation, simulation of complex systems, automatic data analysis and even automatic reasoning that amplify the “purely biological” cognitive capacities of the first Homo sapiens. We have the same brain as prehistoric humans, with the same capacity to manipulate symbols and tell stories, but we also have an enormous apparatus for recording, communicating and processing symbols that they did not have and that plugs into the purely biological part of our intelligence.

Computing, the automatic processing of data, with its advanced and shifting leading edge that we call artificial intelligence, appeared in the second half of the 20th century, but it continues a centuries-long effort of cognitive augmentation that began with writing and continued with the refinement of knowledge-coding systems, positional number notation and zero, printing and the electric media…

The nerve center of the new apparatus for automatic symbol processing is found today in the huge data centers known as the “cloud”, of which our computers and smartphones are only terminals. But in this network of machines, automatic data processing operates only on the sensible form of symbols, on the signifier reduced to zeros and ones. Computers have no access to the signified, to meaning.

Since I am asked about machine learning: yes, among all the computational techniques used today by software engineers, machine learning, and deep learning, which is a special case of it, have been developing rapidly for about ten years. But we should beware of attributing to machine learning more than it can give. It consists essentially of statistical processing algorithms that take enormous masses of data as input and produce as output models for pattern recognition or action that are “learned” from the data. Now, not only does machine learning depend on algorithms that are programmed and continually debugged by humans, but its output also depends on the masses of data supplied to it as input. And it is again humans who choose the data, filter them, classify them, categorize them, organize them, interpret them, and so on. Logical approaches and statistical approaches to artificial intelligence alike condense human knowledge and human purposes into software and hardware machines. Their autonomy, if there is any, can only be local and momentary. In the medium and long term, machines can only evolve with us, and vice versa: we can only evolve with them.

The question of machine autonomy

Automatic data processing extends the whole contemporary technical system and is immersed in it. It is totally dependent on energy production, electricity distribution, the production of materials, etc. We absolutely cannot imagine the contemporary technical system without computing, but neither can we imagine computing without all this technical infrastructure.

In the same way, the technical system would quickly collapse if humans disappeared. Our technical environment is designed, built, used, maintained, repaired and interpreted by humans: it has no autonomy of any kind.

Technology *appears* autonomous to us because we project onto it the emergent effects of social interactions and socio-technical inertia that we cannot control at the individual level. We tend to reify in machines the aggregated effects of numerous human decisions and actions, and to attribute to machines a will of their own. But this is an illusion. An illusion that relieves us of our personal and collective responsibilities: “it’s the machine’s fault”.

Use a pseudo-human interface or android robots as much as you like, but this is an artifice, a stage set. The robot or the machine can always be switched off or unplugged; as for its software in the cloud, it must be debugged constantly and new versions must be downloaded periodically. For me, this idea of the autonomous machine is a form of fetishism: we give a personality to a device that is not a sentient being and that was, once again, designed, manufactured, marketed, sold, used and repaired, and that will finally be thrown in the trash in favor of a new model.

We have machines capable of automatic symbol processing. And we have had them for less than a century. On the scale of historical evolution, three or four generations is almost nothing. At the end of the 20th century, 1% of the human population had access to the Internet and machine learning was confined to scientific laboratories. Today more than 60% of the population is connected and machine learning is applied on a large scale to the data stored in the cloud. Faced with such a rapid mutation, we have a responsibility to steer, as far as possible, technical, social and cultural development. Rather than getting lost in the fantasy of the machine that takes power, for better or for worse, it seems to me far more interesting to use machines to augment human intelligence, both personal and collective. This is the direction in which we should work, because it is the only one that is useful and reasonable. And this is in fact what the main industrial players in the sector are quietly doing, even if the “singularity” attracts more attention from the crowds.

If you are aiming for the divine, or for transcendence, do not try to replace humans with a supposedly conscious machine, and do not fear such a replacement either, because it is impossible. What may be possible, on the other hand, is a state of technology and civilization in which human collective intelligence can observe itself scientifically, and deploy and cultivate its inexhaustible complexity in the digital mirror. Putting machines to work on the emergence of a reflexive collective intelligence, one step at a time…

Pas une pipe

This blog post offers a simple guide to the landscape of signification in language. We’ll begin by distinguishing the elements that construct meaning. We’ll look first at signs: how they pervade communication between living beings, and how a sign differs from a symbol. A symbol is a special kind of sign, unique to humans, that divides into a signifier (a sound, an image, etc.) and a signified (a category or a concept). We’ll see that the relationship between a signifier and a signified is conventional. A bit further on, I’ll explain the workings of language, our most powerful symbolic system. I will review in turn grammar (the recursive construction of sense units), semantics (the relations between these units) and pragmatics (the relations between speech, reference and social context). I’ll end this chapter by recalling some of the problems in the field of natural language processing (NLP).

Sign, symbol, language


Meaning involves at least three actors playing distinct roles. A sign (1) is a clue, a trace, an image, a message or a symbol (2) that means something (3) for someone.

A sign may be an entity or an event. What makes it a sign is not its intrinsic properties but the role it plays in meaning. For example, an individual can be the subject (thing) of a conversation, the interpreter of a conversation (being) or he can be a clue in an investigation (sign).

A thing, designated by a sign, is often called the object or referent, and – again – what makes it a referent is not its intrinsic properties but the role it plays in the triadic relation.

A being is often called the subject or the interpreter. It may be a human being, a group, an animal, a machine or whatever entity or process endowed with self-reference (by distinguishing self from the environment) and interpretation. The interpreter always takes the context into account when it interprets a sign. For example, a puppy (being) understands that a bite (sign) from its playful sibling is part of a game (thing) and may not be a real threat in the context.

Generally speaking, communication and signs exist for all living organisms. Cells can recognize concentrations of poison or food from afar, and plants use their flowers to draw insects and birds into their reproductive processes. Animals – organisms with brains or nervous systems – practice complex semiotic games that include camouflage, dance and mimicry. They acknowledge, interpret and emit signs constantly. Their cognition is complex: the sensorimotor cycle involves categorization, feeling, and environmental mapping. They learn from experience, solve problems and communicate, and social species manifest collective intelligence. All these cognitive properties imply the emission and interpretation of signs. When a wolf growls, there is no need for a long discourse: a clear message is sent to its adversary.


A symbol is a sign divided into two parts: the signifier and the signified. The signified (virtual) is a general category, or an abstract class, and the signifier (actual) is a tangible phenomenon that represents the signified. A signifier may be a sound, a black mark on white paper, a trace or a gesture. For example, let’s take the word “tree” as a symbol. It is made of: 1) a signifier sound voicing the word “tree”, and 2) a signified concept that means it is part of the family of perennial plants with roots, trunk, branches, and leaves. The relationship between the signifier and the signified is conventional and depends on which symbolic system the symbol belongs to (in this case, the English language). What we mean by conventional is that in most cases, there is no analogy or causal connection between the sound and the concept: for example, between the sound “crocodile” and the actual crocodile species. We use different signifiers to indicate the same signified in different languages. Furthermore, the concepts symbolized by languages depend on the environment and culture of their speakers.

The signified of the sound “tree” is ruled by the English language and not left to the choice of the interpreter. However, it is in the context of a speech act that the interlocutor understands the referent of the word: is it a syntactic tree, a palm tree, a Christmas tree…? Let’s remember this important distinction: the signified is determined by the language but the referent depends on the context.


A language is a general symbolic system that allows humans to think reflexively, ask questions, tell stories, dialogue and engage in complex social interactions. English, French, Spanish, Arabic, Russian, or Mandarin are all natural languages. Each one of us is biologically equipped to speak and recognize languages. Our linguistic ability is natural, genetic, universal and embedded in our brain. By contrast, any language (like English, French, etc.) is based on a social, conventional and cultural environment; it is multiple, evolving and hybridizing. Languages mix and change according to the transformations of demographic, technological, economic, social and political contexts.

Our natural linguistic abilities multiply our cognitive faculties. They empower us with reflexive thinking, making it easy for us to learn and remember, to plan in the long-term and to coordinate large-scale endeavors. Language is also the basis for knowledge transmission between generations. Animals can’t understand, grasp or use linguistic symbols to their full extent, only humans can. Even the best-trained animals can’t evaluate if a story is false or exaggerated. Koko the famous gorilla will never ask you for an appointment for the first Tuesday of next month, nor will it communicate to you where its grandfather was born. In animal cognition, the categories that organize perception and action are enacted by neural networks. In human cognition, these categories may become explicit once symbolized and move to the forefront of our awareness. Ideas become objects of reflection. With human language comes arithmetic, art, religion, politics, economy, and technology. Compared to other social animal species, human collective intelligence is most powerful and creative when it is supported and augmented by its linguistic abilities. Therefore, when working in artificial intelligence or cognitive computing, it would be paramount to understand and model the functioning of neurons and neurotransmitters common to all animals, as well as the structure and organization of language, unique to our species.

I will now describe briefly how we shape meaning through language. Firstly, we will review the grammatical units (words, sentences, etc.). Secondly, we will explore the semantic networks between these units. Thirdly, we will examine the pragmatic interactions between language and extralinguistic realities.

Grammatical units

A natural language is made of recursively nested units: a phoneme (an elementary sound), a word (a chain of phonemes), a syntagm (a chain of words) and a text (a chain of syntagms). A language has a finite dictionary of words and syntactic rules for the construction of texts. With its dictionary and set of syntactic rules, a language offers its users the possibility to generate – and understand – an infinity of texts.
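To make the last point concrete, here is a minimal sketch in Python of how a finite dictionary and a recursive rule can yield an unbounded set of sentences. The toy dictionary, the "that" embedding rule and the function names are assumptions made for illustration; they are not a description of any real grammar.

```python
# Toy dictionary (hypothetical): a finite stock of words.
DICTIONARY = {
    "N": ["lion", "gazelle"],
    "V": ["sees", "smells"],
}

def noun_phrases(depth):
    """Yield noun phrases with exactly `depth` embedded relative clauses."""
    for noun in DICTIONARY["N"]:
        if depth == 0:
            yield f"the {noun}"
        else:
            # Recursive rule: NP -> "the" N "that" V NP
            for verb in DICTIONARY["V"]:
                for inner in noun_phrases(depth - 1):
                    yield f"the {noun} that {verb} {inner}"

def sentences(depth):
    """Yield sentences S -> NP V NP whose object has `depth` embeddings."""
    for subject in noun_phrases(0):
        for verb in DICTIONARY["V"]:
            for obj in noun_phrases(depth):
                yield f"{subject} {verb} {obj}"

if __name__ == "__main__":
    for depth in range(3):
        print(depth, len(list(sentences(depth))))
```

Each extra level of embedding multiplies the number of sentences by four in this toy case, and nothing in the rule limits the depth, which is the sense in which four words and two rules already produce an open-ended set of texts.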


Human beings can’t pronounce or recognize several phonemes simultaneously. They can only pronounce one sound at a time. So languages have to obey the constraint of sequentiality. A speech is a chain of phonemes with an acoustic punctuation reflecting its grammatical organization.

Phonemes are sounds without signification [1], generally divided into consonants and vowels. Some languages (in Eastern and Southern Africa) also have “click” consonants, and others (such as Mandarin Chinese) use different tones on their vowels. Despite the great diversity of sounds used to pronounce human languages, the number of conventional sounds in a language is limited: the order of magnitude is between thirty and one hundred.


The first symbolic grammatical unit is the word, a signifier with a signified. By word, I mean an atomic unit of meaning. For example, “small” contains one unit of meaning. But “smallest” contains two: “small” (meaning tiny) and “est” (a superlative suffix used at the end of a word indicating the most).

Languages contain nouns depicting structures or entities, and verbs describing actions, events, and processes. Depending on the language, there are other types of words like adjectives, adverbs, prepositions or sense units that orient grammatical functions, such as gender, number, grammatical person, tense and cases.

How many words does a language hold? It depends. The largest English dictionary counts 200,000 words, Latin has 50,000 words, Chinese 30,000 characters and biblical Hebrew amounts to 6,000 words. The French classical author Jean Racine was able to evoke the whole range of human passions and emotions by using only 3,700 words in 13 plays. Most linguists think that, whatever the language, an educated, refined speaker masters about 10,000 words in his or her lifetime.


Note that a word alone cannot be true or false. Its signifier points to its signified (an abstract category) and not to a state of things. It is only when a sentence is spoken in a context describing a reality – a sentence with a referent – that it can be true or false.

A syntagm (a topic, a sentence or a super-sentence) is a sequence of words organized by grammatical relationships. When we utter a syntagm, we leave behind the abstract dictionary of a language to enter the concrete world of speech acts in contexts. We can distinguish three sub-levels of complexity in a syntagm: the topic, the sentence, and the super-sentence. Firstly, a topic is a super-word that designates a subject, a matter, an object or a process that cannot be described by just a single word, e.g., “history of linguistics”, “smartphone” or “tourism in Canada”. Different languages have diverse rules for building topics, like joining the root of a word with a grammatical case (in Latin), or agglutination of words (in German or Turkish). Secondly, by relating several topics together, a sentence brings to mind an event, an action or a fact, e.g., “I bought her a smartphone for her twentieth birthday”. A sentence can be verbal, like in the previous example, or nominal, like “the leather seat of my father’s car”. Finally, a super-sentence evokes a network of relations between facts or events, like in a theory or a narrative. The relationships between sentences can be temporal (after), spatial (behind), causal (because), logical (therefore) or contrastive (but, despite…), and so on.


The highest grammatical unit is a text: a punctuated sequence of syntagms. The signification of a text comes from combining its signifieds according to grammatical rules. The text also has a referent inferred from its temporal, spatial and social context.

In order to construct a mental model of a referent, a reader can’t help but imagine a general intention of meaning behind a text, even when it is produced by a computer program, for instance.

Semantic relationships

When we hear a speech, we are actually transforming a chain of sounds into a semantic network, and from this network, we infer a new mental model of a situation. Conversely, we are able to transform a mental model into the corresponding semantic network and then from this network, back into a sequence of phonemes. Semantics is the back and forth translation between chains of phonemes and semantic networks. Semantic networks themselves are multi-layered and can be broken down into three levels: paradigmatic, syntagmatic and textual.


Figure: Hierarchy of grammatical units and semantic relations

Paradigmatic relationships

In linguistics, a paradigm is a set of semantic relations between words of the same language. They may be etymological, taxonomical relations, oppositions or differences. These relations may be the inflectional forms of a word, like “one apple” and “two apples”. Languages may comprise paradigms to indicate verb tenses (past, present, future) or voice (active, passive). For example, the paradigm for “go” is “go, went, gone”. The notion of paradigm also indicates a set of words which cover a particular functional or thematic area. For instance, most languages include paradigms for economic actions (buy, sell, lend, repay…), or colors (red, blue, yellow…). A speaker may transform a sentence by replacing one word from a paradigm with another from the same paradigm and get a sentence that still makes sense. In the sentence “I bought a car”, you could easily replace “bought” with “sold” because “buy” and “sell” are part of the same paradigm: they have some meaning in common. But in that sentence, you can’t replace “bought” with “yellow”, for instance. Two words from the same paradigm may be opposites (if you are buying, you are not selling) but still related (buying and selling can be interchangeable).

Words can also be related when they are in taxonomic relation, like “horse” and “animal”. The English dictionary describes a horse as a particular case of animal. Some words come from ancient words (etymology) or are composed of several words: for example, the word metalanguage is built from “meta” (beyond, in ancient Greek) and “language”.

In general, the conceptual relationships between words from a dictionary may be qualified as paradigmatic.
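The substitution test for paradigms can be sketched in a few lines of Python. The two paradigms below reuse the examples from the text; the data structure and the function names (`paradigm_of`, `substitute`) are hypothetical and only illustrate the idea that a replacement is acceptable when both words belong to the same paradigm.

```python
# Hypothetical mini-dictionary: each paradigm is a named set of words.
PARADIGMS = {
    "economic_action": {"bought", "sold", "lent", "repaid"},
    "color": {"red", "blue", "yellow"},
}

def paradigm_of(word):
    """Return the name of the paradigm containing `word`, or None."""
    for name, members in PARADIGMS.items():
        if word in members:
            return name
    return None

def substitute(sentence, old, new):
    """Replace `old` with `new` only if both belong to the same paradigm."""
    source = paradigm_of(old)
    if source is None or source != paradigm_of(new):
        raise ValueError(f"{old!r} and {new!r} do not share a paradigm")
    return " ".join(new if word == old else word for word in sentence.split())

if __name__ == "__main__":
    print(substitute("I bought a car", "bought", "sold"))
    try:
        substitute("I bought a car", "bought", "yellow")
    except ValueError as error:
        print(error)
```

A real dictionary would let a word belong to several paradigms at once (inflectional, thematic, taxonomic); the single-membership lookup here is a simplification.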

Syntagmatic relationships

By contrast, syntagmatic relations describe the grammatical connections between words in the same sentence. In the two following sentences: “The gazelle smells the presence of the lion” and “The lion smells the presence of the gazelle”, the set of words is identical but the words “gazelle” and “lion” do not share the same grammatical role. Since those words are swapped in the syntagmatic structure, the sentences have distinct meanings.
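As a rough illustration, the two sentences can be encoded as mappings from grammatical roles to words. The role names used here are assumptions chosen for readability, not a standard annotation scheme.

```python
from collections import Counter

# Each sentence maps a grammatical role (hypothetical labels) to a word.
sentence_a = {"subject": "gazelle", "verb": "smells",
              "object": "presence", "complement": "lion"}
sentence_b = {"subject": "lion", "verb": "smells",
              "object": "presence", "complement": "gazelle"}

def same_words(a, b):
    """True if both sentences use the same multiset of words."""
    return Counter(a.values()) == Counter(b.values())

def same_structure(a, b):
    """True if every role is filled by the same word in both sentences."""
    return a == b

if __name__ == "__main__":
    print(same_words(sentence_a, sentence_b))
    print(same_structure(sentence_a, sentence_b))
```

The first comparison ignores roles and finds the sentences identical; the second takes roles into account and tells them apart, which is the difference the syntagmatic layer captures.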

Textual relationships

At the text level, which includes several syntagms, we find semantic relations like anaphoras and isotopies. Let’s consider the super-sentence: “If a man has talent and can’t use it, he’s failed.” (Thomas Wolfe). In this quotation “it” is an anaphora for “talent” and “he”, an anaphora for “a man”. When reading a pronoun (it, he), we resolve the anaphora when we know which noun – mentioned in a previous or following sentence – it is referring to. On the other hand, isotopies are recurrences of themes that weave the unity of a text: the identity of heroes (characters), genres (love stories or historical novels), settings, etc. The notion of isotopy also encompasses repetitions that help the listener understand a text.

Pragmatic interactions

Pragmatics weaves the triadic relation between signs (symbols, speeches or texts), beings (interpreters, people or interlocutors) and things (referents, objects, reality, extra-textual context). On the pragmatic level of communication, speeches point to – and act upon – a social context. A speech act functions as a move in a game played by its speaker. So, distinct from semantic meaning, which we analyzed in the previous section, pragmatic meaning addresses questions like: what kind of act (a piece of advice, a promise, a blame, a condemnation, etc.) is carried by a speech? Is a speech spoken in a play on a stage or in a real tribunal? The pragmatic meaning of a speech also relates to the actual effects of its utterance, effects that are not always known at the moment of the enunciation. For example: “Did I convince you? Have you kept your word?”. The sense of a speech can only be understood after its utterance, and future events can always modify it.

A speech act is highly dependent on cultural conventions, on the identity of speakers and attendees, time and place, etc. By proclaiming: “The session is open”, I am not just announcing that an official meeting is about to start, I am actually opening the session. But I have to be someone relevant or important like the president of that assembly to do so. If I am a janitor and I say: “The session is open”, the act is not performed because I don’t have any legitimacy to open the session.

If an utterance is descriptive, it’s either true or false. In other cases, if an utterance does something instead of describing a state of things, it has a pragmatic force instead of a truth value.

Resolving ambiguities

We have just reviewed the different layers of grammatical, semantic and pragmatic complexity to better understand the meaning of a text. Now, we are going to examine the ambiguities that may arise during the reading or listening of a text in a natural language.

Semantic ambiguities

How do we go from the sound of a chain of phonemes to the understanding of a text? From a sequence of sounds, we build a multi-layered (paradigmatic, syntagmatic and textual) semantic network. When weaving the paradigmatic layer, we answer questions like: “What is this word? To what paradigm does it belong? Which one of its meanings should I consider?”. Then, we connect words together by answering: “What are the syntagmatic relations between the words in that sentence?”. Finally, we comprehend the text by recognizing the anaphoras and isotopies that connect its sentences. Our understanding of a text is based on this three-layered network of sense units.

Furthermore, ambiguities or uncertainties of meaning in languages can happen on all three levels and can multiply their effects. In the case of homophony, the same sound can point to different words like in “ate” and “eight”. And sometimes, the same word may convey several distinct meanings like in “mole”: (1) a shortsighted mouse-like animal digging underground galleries, (2) an undercover spy, or (3) a pigmented spot or mark on the skin. In the case of synonymy, the same meaning can apply to distinct words like “tiny” and “small”. Amphibologies refer to syntagmatic ambiguities as in: “Mary saw a woman on the mountain with a telescope.” Who is on the mountain? Moreover, who has the telescope? Mary or the woman? On a higher level of complexity, textual relations can be even more ambiguous than paradigmatic and syntagmatic ones because rules for anaphoras and isotopies are loosely defined.
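The telescope example can be made explicit by enumerating where each prepositional phrase may attach. The sketch below is only one simple way to count the readings: it assumes that a phrase attaches to an earlier verb or noun and that attachments may not cross each other, a common simplification in parsing. The names `PP1_SITES`, `PP2_SITES` and `readings` are made up for this illustration.

```python
from itertools import product

# "Mary saw a woman on the mountain with a telescope."
# Candidate attachment sites for each prepositional phrase (assumed).
PP1_SITES = ["saw", "woman"]               # "on the mountain"
PP2_SITES = ["saw", "woman", "mountain"]   # "with a telescope"

def readings():
    """Return the (site of PP1, site of PP2) pairs that do not cross."""
    result = []
    for site1, site2 in product(PP1_SITES, PP2_SITES):
        # If "on the mountain" modifies the verb, "with a telescope" can no
        # longer reach back to "woman" without crossing that attachment.
        if site1 == "saw" and site2 == "woman":
            continue
        result.append((site1, site2))
    return result

if __name__ == "__main__":
    for site1, site2 in readings():
        print(f"on the mountain -> {site1}; with a telescope -> {site2}")
```

Under these assumptions, a single sentence with two prepositional phrases already has several distinct syntagmatic structures, before any lexical or textual ambiguity is added on top.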

Resolving semantic ambiguities in pragmatic contexts

Human beings don’t always correctly resolve all the semantic ambiguities of a speech, but when they do, it is often because they take into account the pragmatic (or extra-textual) context, which is generally implicit. It is in context that deictic symbols like “here”, “you”, “me”, “that one over there” or “next Tuesday” take their full meaning. Moreover, comparing the text at hand with the author’s corpus, genre and historical period helps to better discern its meaning. But some pragmatic aspects of a text may remain unknown. Ambiguities can stem from many causes: uncertainty about the precise referents of a speech or about the speaker’s social interactions, the ambivalence or concealment of the speaker’s intentions, and of course not knowing in advance the effects of an utterance.

Problems in natural language processing

Computer programs can’t understand or translate texts with dictionaries and grammars alone. They can’t engage in the pragmatic context of speeches like human beings do to disambiguate texts unless this context is made explicit. Understanding a text implies building and comparing complex and dynamic mental models of text and context.

On the other hand, natural language processing (a sub-discipline of artificial intelligence) compensates for the irregularity of natural languages by using a lot of statistical calculations and deep learning algorithms that have been trained on huge corpora. Depending on its training set, an algorithm can interpret a text by choosing the most probable semantic network amongst those compatible with a chain of phonemes. The results then have to be validated and improved by human reviewers.
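As a very reduced illustration of “choosing the most probable” interpretation, the following Python sketch picks a sense of “mole” from co-occurrence counts. The counts are invented for the example and stand in for what a real system would learn from a large corpus; the names `SENSE_CUES` and `disambiguate` are hypothetical, and actual systems rely on far richer statistical or neural models.

```python
from collections import Counter

# Invented co-occurrence counts: how often each context word was seen
# near each sense of "mole" in an imaginary training corpus.
SENSE_CUES = {
    "animal": Counter({"garden": 5, "tunnel": 4, "lawn": 3}),
    "spy": Counter({"agency": 6, "secret": 4, "leak": 3}),
    "skin mark": Counter({"skin": 7, "doctor": 3, "cheek": 2}),
}

def disambiguate(context_words):
    """Return the best-scoring sense and the score of every sense."""
    scores = {
        # A Counter returns 0 for words it has never seen.
        sense: sum(cues[word] for word in context_words)
        for sense, cues in SENSE_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    sentence = "the agency found a mole after the leak"
    print(disambiguate(sentence.split()))
```

The choice depends entirely on the counts: with a different training corpus the same sentence could be assigned another sense, which is one reason human validation remains necessary.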