Archives for posts with tag: Big data

IEML (the Information Economy Meta Language) has four main directions of research and development in 2019: mathematics, data science, linguistics and software development. This blog entry reviews each in turn.

1. A mathematical research program

Here I give a philosophical description of the structure of IEML; the purpose of the mathematical research to come is to provide a formal description and to draw from this formalisation as much useful information as possible about the computation of relationships, distances, proximities, similarities, analogies, classes and so on, as well as about the complexity of these computations. I had already produced a formalization document in 2015 with the help of Andrew Roczniak, PhD, but that document is now (2019) superseded by the evolution of the IEML language. The Brazilian physicist Wilson Simeoni Junior has volunteered to lead this research sub-program.

IEML Topos

The “topos” is a structure identified by the great mathematician Alexander Grothendieck, who “is considered the re-founder of algebraic geometry and, as such, one of the greatest mathematicians of the 20th century” (Wikipedia).

Without going into technical details, a topos is a two-way correspondence between, on the one hand, an algebraic structure, usually a “category” (intuitively, a group of transformations of transformation groups) and, on the other hand, a spatial structure, geometric or topological.

In IEML, thanks to a normalization of the notation, each expression of the language corresponds to one and only one algebraic variable. Symmetrically, each algebraic variable corresponds to one and only one expression of the language.

Topologically, each variable of the IEML algebra (i.e. each expression of the language) corresponds to a “point”. But these points are arranged on several nested scales of recursive complexity: primitive variables, morphemes of different layers, characters, words, sentences, super-sentences and texts. From the morpheme level upward, the internal structure of each point – which comes from the function(s) that generated it – automatically determines all the semantic relationships between that point and the other points, and these relationships are modelled as connections. There are obviously many connection types, some very general (is contained in, has an intersection with, has an analogy with…), others more precise (is an instrument of, contradicts X, is logically compatible with, etc.).

The topos that matches all the expressions of the IEML language with all the semantic relationships between those expressions is called “the Semantic Sphere”.
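To make the “points and typed connections” picture concrete, here is a minimal sketch in Python using the networkx library (a convenience assumption; any graph representation would do). Each node stands for an IEML expression and each labelled edge for a semantic relation derived from the expressions’ internal structure. The expression strings and relation labels below are invented for illustration; this is not the actual IEML implementation.

```python
# Illustrative only: expressions as nodes, typed semantic relations
# as labelled edges. The morpheme notations below are made up.
import networkx as nx

sphere = nx.MultiDiGraph()
sphere.add_edge("wa.", "wa.e.-", relation="is contained in")
sphere.add_edge("wa.e.-", "we.h.-", relation="has an analogy with")

for u, v, data in sphere.edges(data=True):
    print(f"{u} --[{data['relation']}]--> {v}")
```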

Algebraic structure of IEML

In the case of IEML, the algebraic structure reduces to:

  • 1. Six primitive variables
  • 2. A non-commutative multiplication with three operands (substance, attribute and mode). IEML multiplication is isomorphic to the triple “departure vertex, arrival vertex, edge” used to describe graphs (see the sketch after this list).
  • 3. A commutative addition that creates a set of objects.
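As a minimal sketch of these three ingredients, assuming the six primitives are written E, U, A, S, B and T (the notation of the published IEML dictionary; treat it as an assumption of the sketch), the algebra can be modelled in a few lines of Python. The point is only to show that addition commutes while multiplication does not, which is what makes the triple (substance, attribute, mode) behave like a directed labelled edge.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# The six primitive variables (assumed notation: E, U, A, S, B, T).
PRIMITIVES = ("E", "U", "A", "S", "B", "T")

@dataclass(frozen=True)
class Mul:
    """Non-commutative triple (substance, attribute, mode), isomorphic
    to (departure vertex, arrival vertex, edge) in a graph."""
    substance: "Expr"
    attribute: "Expr"
    mode: "Expr"

# Commutative addition is a set of operands: frozenset gives
# commutativity (and idempotence) for free.
Expr = Union[str, Mul, FrozenSet]

def add(*operands: Expr) -> FrozenSet:
    return frozenset(operands)

assert add("U", "A") == add("A", "U")            # addition commutes
assert Mul("U", "A", "E") != Mul("A", "U", "E")  # multiplication does not
```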

This algebraic structure is used to construct the following functions and levels of variables:

1. Functions using primitive variables, called “morpheme paradigms”, take morphemes at layer n as inputs and produce morphemes at layer n+1 as outputs. Morpheme paradigms include additions, multiplications, constants and variables, and are presented visually as tables whose rows and columns correspond to certain constants.

2. “Character paradigms” are complex additive functions that take morphemes as inputs and produce characters as outputs. Character paradigms include a group of constant morphemes and several groups of variables. A character is composed of 1 to 5 morphemes arranged in IEML alphabetical order (characters may not include more than five morphemes, for reasons of cognitive management).

3. IEML characters are assembled into words (a substance character, an attribute character, a mode character) by means of a multiplicative function called a “word paradigm”. A word paradigm crosses a series of characters in substance position with a series of characters in attribute position. The modes are chosen from predefined auxiliary character paradigms, depending on whether the word is a noun, a verb or an auxiliary. Words express subjects, keywords or hashtags. A word can consist of a single character.

4. Sentence-building functions assemble words by means of multiplication and addition, with the constraints necessary to obtain grammatical trees. Mode words describe the grammatical/semantic relationships between substance words (roots) and attribute words (leaves). Sentences express facts, propositions or events; they can take on different pragmatic and logical values.

5. Super-sentences are generated by means of multiplication and addition of sentences, with constraints to obtain grammatical trees. Mode sentences express relationships between substance sentences and attribute sentences. Super-sentences express hypotheses, theories or narratives.

6. A USL (Uniform Semantic Locator) or IEML text is an addition (a set) of words, sentences and super-sentences. 
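The nesting of levels just described can be summarized by a toy type hierarchy. This is a paraphrase of the post, not the reference implementation; the class names and the example notation are invented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union

@dataclass(frozen=True)
class Morpheme:
    script: str   # layer-n notation, e.g. "wa." (invented here)
    layer: int

@dataclass(frozen=True)
class Character:
    morphemes: Tuple[Morpheme, ...]  # kept in IEML alphabetical order
    def __post_init__(self):
        # the cognitive-management limit mentioned above
        assert 1 <= len(self.morphemes) <= 5

@dataclass(frozen=True)
class Word:
    substance: Character
    attribute: Optional[Character] = None  # a word may be one character
    mode: Optional[Character] = None

@dataclass(frozen=True)
class Sentence:
    root: Word                               # substance word (root)
    branches: Tuple[Tuple[Word, Word], ...]  # (mode, attribute) pairs

# A USL (IEML text) is an addition, i.e. a set, of units.
USL = FrozenSet[Union[Word, Sentence]]

word = Word(substance=Character((Morpheme("wa.", 1),)))
text: USL = frozenset({word})
```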

Topological structure of IEML: a semantic rhizome

Static

The notion of rhizome (a term borrowed from botany) was developed philosophically by Deleuze and Guattari in the introduction to Mille Plateaux (Minuit, 1980). In this Deleuzo-Guattarian lineage, by rhizome I mean here a complex graph whose points or “vertices” are organized into several levels of complexity (see the algebraic structure above) and whose connections cut across several regular structures such as series, trees, matrices and cliques. In particular, it should be noted that some structures of the IEML rhizome combine hierarchical or genealogical relationships (in trees) with transversal or horizontal relationships between “leaves” at the same level, relationships which therefore do not respect the “hierarchical ladder”.

Dynamic

We can distinguish between the abstract, or virtual, rhizomatic grid drawn by the grammar of the language (the sphere to be dug) and the actualisation of points and relationships by the users of the language (the dug sphere of chambers and galleries). Characters, words, sentences, etc. are all chambers at the centre of a star of paths, and the generating functions establish galleries of “rhizomatic” relationships between them: so many paths for exploring the chambers and their contents. It is therefore the users – by creating their lexicons and using them to index their data, communicate and present themselves – who shape and grow the rhizome.

Depending on whether circuits are more or less used, on the quantity of data or on the strength of interactions, the rhizome undergoes – in addition to its topological transformations – various types of quantitative or metric transformations. 

* The point to remember is that IEML is a language with calculable semantics because it is also an algebra (in the broad sense) and a complex topological space. 

* In the long term, IEML will be able to serve as a semantic coordinate system for the information world at large.

2. A research program in data science

The data science research sub-program is led by the software engineer (Eng. ENSIMAG, France) Louis van Beurden, who also holds a master’s degree in data science and machine translation from the Université de Montréal, Canada. Louis is planning to complete a PhD in computer science to test the hypothesis that, from a data science perspective, a semantic metadata system in IEML is more efficient than one in natural language and phonetic writing. This doctoral research will make it possible to implement phases A and B of the program below and to carry out our first experiment.

Background information

The basic cycle in data science can be schematized according to the following loop:

  • 1. selection of raw data,
  • 2. pre-processing, i.e. cleaning the data and imposing metadata (cataloguing and categorization) to facilitate the exploitation of the results by human users,
  • 3. statistical processing,
  • 4. visual and interactive presentation of results,
  • 5. exploitation of the results by human users (interpretation, storytelling) and feedback on steps 1, 2 and 3.
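As a toy rendering of this loop, the sketch below stubs out each of the five steps; only the control flow and the feedback from step 5 back onto the earlier steps matter, the function bodies being placeholders of my own invention.

```python
from collections import Counter

def select(source):                 # 1. selection of raw data
    return [r for r in source if r is not None]

def preprocess(records):            # 2. cleaning + metadata imposition
    cleaned = [str(r).strip().lower() for r in records]
    return cleaned, {"n": len(cleaned)}       # stand-in for cataloguing

def process(records, metadata):     # 3. statistical processing
    return Counter(records)

def present(results):               # 4. visual / interactive presentation
    for item, count in results.most_common(3):
        print(f"{item}: {count}")

def exploit(results):               # 5. human interpretation -> feedback
    return {k for k, v in results.items() if v < 2}   # items to drop

records, meta = preprocess(select(["Cat", "cat ", "dog", None, "CAT", "bird"]))
stats = process(records, meta)
present(stats)
to_drop = exploit(stats)
records = [r for r in records if r not in to_drop]    # feedback on steps 1-3
```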

Biases or poor-quality results may have several causes, but they often come from poor pre-processing. According to the old computing adage “garbage in, garbage out”, it is the professional responsibility of data scientists to ensure the quality of the input data and therefore not to neglect the pre-processing phase, where this data is organized using metadata.

Two types of metadata can be distinguished: 1) semantic metadata, which describes the content of documents or datasets, and 2) ordinary metadata, which describes authors, creation dates, file types, etc. Let us call “semantic pre-processing” the imposition of semantic metadata on data.
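A hypothetical record for a single article makes the distinction concrete (the field names are illustrative, not a schema used by the project):

```python
# Ordinary metadata describe the publication event and the file;
# semantic metadata describe what the article is about.
article = {
    "ordinary_metadata": {
        "author": "J. Doe",
        "date": "2019-03-14",
        "file_type": "application/pdf",
    },
    "semantic_metadata": {
        "keywords": ["digital humanities", "semantic interoperability"],
        "summary": "The article argues that ...",
    },
}
```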

Hypothesis

Since IEML is a univocal language and the semantic relationships between morphemes, words, sentences, etc. are mathematically computable, we assume that a semantic metadata system in IEML is more efficient than a semantic metadata system in natural language and phonetic writing. Of course, the efficiency in question is related to a particular task: search, data analysis, knowledge extraction from data, machine learning, etc.

In other words, compared to a “tokenization” of semantic metadata in the phonetic writing of a natural language, a “tokenization” of semantic metadata in IEML should ensure better processing, better presentation of results to the user and better exploitation of those results. In addition, semantic metadata in IEML would allow datasets that use different languages, classification systems or ontologies to be de-compartmentalized, merged and compared.

Design of the first experiment

The ideal way to run an experiment is to consider a multi-variable system and to change only one of the system’s variables, all other things being equal. In our case, only the semantic metadata system must vary. This makes it easy to compare the system’s performance with one semantic metadata system (phonetic tokens) or the other (IEML tokens).

  • – The dataset of our first experiment comprises all the articles of the scientific journal Sens Public.
  • – Our ordinary metadata are the author, publication date, etc.
  • – Our semantic metadata describe the content of the articles:
  •     – in phonetic tokens, using RAMEAU categories, keywords and summaries,
  •     – in IEML tokens, by translating the phonetic tokens.
  • – Our processes are the “big data” algorithms traditionally used in natural language processing (a sketch follows this list):
  •     – an algorithm for computing keyword co-occurrences,
  •     – a TF-IDF (Term Frequency / Inverse Document Frequency) algorithm that works from a word/document matrix,
  •     – a clustering algorithm based on “word embeddings” of the keywords in the articles (documents are represented by vectors in a space with as many dimensions as there are words).
  • – A user interface will offer a certain way of accessing the database. This interface will obviously be adapted to the user’s task (still to be chosen, but probably of the “data analytics” type).
  • Result 1 corresponds to the execution of the “machine task”, i.e. the establishment of a network of connections over the articles (relationships, proximities, groupings, etc.). We will compare:
  •     – result 1.1, based on the use of phonetic tokens, with
  •     – result 1.2, based on the use of IEML tokens.
  • Result 2 corresponds to the execution of the selected user task (data analytics, navigation, search, etc.). We will compare:
  •     – result 2.1, based on the use of phonetic tokens, with
  •     – result 2.2, based on the use of IEML tokens.
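The sketch below illustrates the “all other things being equal” design: one TF-IDF-plus-clustering pipeline, run twice, with only the token system changed. Everything in it is fabricated for illustration: the four “articles”, the RAMEAU-style keywords and the IEML morpheme strings; in the real experiment the corpus would be the Sens Public articles and the IEML tokens would come from the glossary built in step A.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# The same four (imaginary) articles, indexed two ways.
phonetic_tokens = [
    "semantique interoperabilite humanites",
    "apprentissage machine corpus",
    "interoperabilite corpus archives",
    "humanites numeriques archives",
]
ieml_tokens = [                      # invented IEML morphemes
    "wa.e.- s.u.t m.i.y",
    "b.a.k t.o.e n.u.s",
    "s.u.t n.u.s k.e.o",
    "m.i.y k.e.o t.o.e",
]

def cluster_quality(corpus, k=2):
    """TF-IDF vectors -> k-means labels -> silhouette score."""
    vectors = TfidfVectorizer(token_pattern=r"\S+").fit_transform(corpus)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    return silhouette_score(vectors, labels)

print("phonetic:", cluster_quality(phonetic_tokens))
print("IEML:    ", cluster_quality(ieml_tokens))
```

The comparison metric here (silhouette score) is just one possible stand-in for “better processing”; the actual machine-task and user-task metrics remain to be chosen, as noted above.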

Step A: First indexing of a database in IEML

Reminder: the data are the articles of the scientific journal; the semantic metadata are the categories, keywords and summaries of the articles. From these, a glossary of the knowledge area covered by the journal (or of a sub-domain, if the whole area proves too difficult) is created. Note that in 2019 we do not yet have the software tools to create the IEML sentences and super-sentences that express facts, propositions, theories, narratives, hypotheses, etc. Sentences and super-sentences, perhaps available in a year or two, will therefore have to wait for a later phase of the research.

The creation of the glossary will be the work of a project community linked to the editors of the journal Sens Public and to the Canada Research Chair in Digital Textualities (led by Prof. Marcello Vitali-Rosati) at the Université de Montréal. Pierre Lévy will accompany this community and help it identify the constants and variables of its lexicon. One of the auxiliary goals of the research is to verify whether motivated communities can appropriate IEML to categorize their data. Once we are satisfied with the IEML indexing of the article database, we will proceed to the next step.

Step B: First experimental test

  • 1. The test is designed to measure the difference between results based on phonetic tokens and results based on IEML tokens.
  • 2. All data processing operations are carried out on the data.
  • 3. The results (machine tasks and user tasks) are compared with both types of tokens.

If necessary, the experiment can be repeated iteratively, with minor modifications, until satisfactory results are achieved.

If the hypothesis is confirmed, we proceed to the next step.

Step C: Towards the automation of semantic pre-processing in IEML

If the superior efficiency of IEML tokens for semantic metadata is demonstrated, then there will be a strong incentive to automate IEML semantic pre-processing as much as possible.

The algorithms used in our experiment are themselves powerful tools for data pre-processing; they can be used, according to methods yet to be developed, to partially automate semantic indexing in IEML. The “word embeddings” will make it possible to study how IEML words correlate with the natural-language lexical statistics of the articles and to detect anomalies. For example, we will check whether similar USLs (a USL is an IEML text) point to very different texts, or whether very different texts have similar USLs.
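One way to operationalize that check is sketched below, under invented thresholds: flag pairs of articles whose USL vectors are close while their text vectors are far apart, or vice versa. Any embedding of the USLs and of the article text would do; the vectors here are placeholders.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def anomalies(usl_vecs, text_vecs, close=0.9, far=0.3):
    """Index pairs where USL similarity and text similarity disagree."""
    flagged = []
    for i in range(len(usl_vecs)):
        for j in range(i + 1, len(usl_vecs)):
            s_usl = cosine(usl_vecs[i], usl_vecs[j])
            s_txt = cosine(text_vecs[i], text_vecs[j])
            if (s_usl > close and s_txt < far) or (s_txt > close and s_usl < far):
                flagged.append((i, j, round(s_usl, 2), round(s_txt, 2)))
    return flagged

usl = [np.array([1.0, 0.0]), np.array([0.99, 0.1])]  # near-identical USLs...
txt = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # ...very different texts
print(anomalies(usl, txt))   # -> [(0, 1, 1.0, 0.0)]
```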

Finally, methods will be developed to use deep learning algorithms to automatically index datasets in IEML.

Step D: Research and development perspective in Semantic Machine Learning

If step C provides the expected results, i.e. methods using AI to automate the indexing of data in IEML, then big data indexed in IEML will become available. As progress is made, semantic metadata may come to resemble the textual data themselves more and more closely (summaries of sections, paragraphs, sentences, etc.), up to full translation into IEML, which remains a distant objective.

The data indexed in IEML could then be used to train artificial intelligence algorithms. The hypothesis that machines learn more easily when data is categorized in IEML could be validated by experiments of the same type as described above, comparing the results obtained from training data indexed in IEML with the results obtained from the same data indexed in natural languages.

This last step paves the way for a better integration of statistical AI and symbolic AI (based on facts and rules, which can be expressed in IEML).

3. A research program in linguistics, humanities and social sciences

Introduction

The semiotic and linguistic development program has two interdependent components:

1. The development of the IEML metalanguage

2. The development of translation systems and bridges between IEML and other sign systems, in particular… 

  •     – natural languages,
  •     – logical formalisms,
  •     – pragmatic “language games” and games in general,
  •     – iconic languages,
  •     – artistic languages, etc.

This research and development agenda, particularly in its linguistic dimension, is important for the digital humanities. Indeed, IEML can serve as a system of semantic coordinates for the cultural universe, allowing the humanities to cross a threshold of scientific maturity that would bring their epistemological status closer to that of the natural sciences. Using IEML to index data and to formulate hypotheses would result in:

  • (1) a de-siloing of the databases used by researchers in the social sciences and humanities, which would allow categorization systems and interpretive hypotheses to be shared and compared;
  • (2) improved data analysis;
  • (3) ultimately, as set out in the article “The Role of the Digital Humanities in the New Political Space” (http://sens-public.org/article1369.html, in French), a reflexive collective intelligence of the research community in the social sciences and humanities.

But IEML’s research program from the perspective of the digital humanities – as well as its research program in data science – requires a living and dynamic semiotic and linguistic development program, some aspects of which I will outline here.

IEML and the Meaning-Text Theory

IEML’s linguistic research program is largely based on the Meaning-Text theory developed by Igor Melchuk and his school. “The main principle of this theory is to develop formal and descriptive representations of natural languages that can serve as a reliable and convenient basis for the construction of Meaning-Text models, descriptions that can be adapted to all languages, and therefore universal.” (Translated from the Wikipedia article on Igor Melchuk.) Dictionaries developed by linguists in this field connect words according to universal “lexical functions” identified through the analysis of many languages. These lexical functions have been formally transposed into the very structure of IEML (see the IEML Glossary Creation Guide), so that the IEML dictionary can be organized with the same tools (e.g. Spiderlex) as those of the Meaning-Text research network. Conversely, IEML could be used as a pivot language – or concept-description language – *between* the natural-language dictionaries developed by the network of researchers trained in Meaning-Text theory.

Construction of specialized lexicons in the humanities and social sciences

A significant part of the IEML lexicon will be produced by communities that have decided to use IEML to map out their particular areas of knowledge, competence or interaction. Our research in specialized lexicon construction aims to develop the best methods for helping expert communities produce IEML lexicons. One approach consists in identifying the “conceptual skeleton” of a domain, namely its main constants in terms of character paradigms and word paradigms.

The first experiment in this type of collaborative construction of specialized lexicons by experts will be conducted by Pierre Lévy in collaboration with the editorial team of the scientific journal Sens Public and the Canada Research Chair in Digital Textualities at the Université de Montréal (led by Prof. Marcello Vitali-Rosati). Other specialized glossaries can then be constructed on the basis of their economic and social importance, for example on the themes of professional skills, e-learning resources, public-health prevention, etc.

Ultimately, the “digital humanities” branch of IEML will need to develop, collaboratively, a conceptual lexicon of the humanities to be used for indexing books and articles, but also chapters, sections and comments within documents. The same glossary should also facilitate data navigation and analysis. There is a whole program of development in digital library science here. I focus on the human sciences in particular because the natural sciences have already developed a consensual formal vocabulary.

Construction of logical, pragmatic and narrative character-tools

Once we have a sentence and super-sentence editor, we plan to establish a correspondence between IEML, on the one hand, and propositional calculus and first-order logic, on the other. This will be done by specifying special character-tools that implement logical functions. Particular attention will be paid to formalizing the definition of rules and the declaration that “facts” are true in IEML. Note in passing that, in IEML, grammatical expressions represent classes, sets or categories, whereas logical individuals (proper names, numbers, etc.), i.e. instances of classes, are represented by “literals” expressed in ordinary characters (phonetic alphabets, Chinese characters, Arabic numerals, URLs, etc.).
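To suggest what “facts and rules feeding an inference engine” could look like, here is a toy forward-chaining sketch. The triples, the “u:Montreal” literal and the single rule are all invented for illustration; in the target design, IEML expressions would play the role of the class names.

```python
# A literal (logical individual) asserted to belong to a class,
# plus one rule between classes: every city is a place.
facts = {("is_a", "u:Montreal", "city")}
rules = [(("is_a", "?x", "city"), ("is_a", "?x", "place"))]

def forward_chain(facts, rules):
    """Apply the rules to the facts until no new fact appears."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            rel, _, cls = premise              # premise binds ?x
            for f in list(facts):
                if f[0] == rel and f[2] == cls:
                    new = (conclusion[0], f[1], conclusion[2])
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

for fact in sorted(forward_chain(facts, rules)):
    print(fact)
# ('is_a', 'u:Montreal', 'city')
# ('is_a', 'u:Montreal', 'place')
```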

In anticipation of practical use in communication, games, commerce, law (smart contracts), chatbots, robots, the Internet of Things, etc., we will develop a range of character-tools with illocutionary force such as “I offer”, “I buy”, “I quote”, “I give an instruction”, etc.

Finally, we will make life easier for authors of super-sentences by developing a range of character-tools implementing “narrative functions”.

4. A software development program

A software environment for the development and public use of the IEML language

Logically, the first multi-user IEML application will be dedicated to the development of the language itself. This application is composed of the following three web modules.

  • 1. A morpheme editor that also allows navigation of the morpheme database, or “dictionary”.
  • 2. A character and word editor that also allows navigation of the “lexicon”.
  • 3. A tool for navigating and reading the IEML library as a whole, or “IEML database”, which brings together the dictionary and the lexicon, with translations, synonyms and comments (in French and English for the moment).

The IEML database is a “Git” database, currently hosted on GitHub. A Git database makes it possible to record successive versions of the language, and to monitor and model its growth. It also allows large-scale collaboration among teams capable of developing specific branches of the lexicon independently and then integrating them into the main branch after discussion, as is done in the collaborative development of large software projects. As soon as a sub-lexicon is integrated into the main branch of the Git database, it becomes a “common” usable by everyone (under the latest version of the General Public License).

Morpheme and word editors are actually “Git clients” that feed the IEML database. A first version of this collaborative read-write environment should be available in the fall of 2019 and then be tested by real users: the editors of the scientific journal Sens Public as well as other participants in the Université de Montréal’s IEML seminar.

The following versions of the IEML read/write environment should allow the editing of sentences and texts, as well as of literals (logical individuals not translated into IEML, such as proper names, numbers, URLs, etc.).

A social medium for collaborative knowledge management

A large number of applications using IEML can be considered, both commercial and non-commercial. Among all these applications, one of them seems to be particularly aligned with the public interest: a social medium dedicated to collaborative knowledge and skills management. This new “place of knowledge” could allow the online convergence of the missions of… 

  • – museums and libraries, 
  • – schools and universities, 
  • – companies and administrations (with regard to their knowledge creation and management dimension), 
  • – smart cities, employment agencies, civil society networks, NGOs, associations, etc.

According to its general philosophy, such a social medium should…

  • – be supported by an intrinsically distributed platform, 
  • – have the simplicity – or the economy of means – of Twitter,
  • – ensure the sovereignty of users over their data,
  • – promote collaborative processes.

The main functions performed by this social medium would be:

  • – data curation (referencing and categorization of web pages, editing of resource collections),
  • – teaching offers and learning demands,
  • – offers and demands for skills, or employment market.

IEML would serve as a common language for

  • – data categorization, 
  • – the description of knowledge and skills,
  • – the expression of acts within the social medium (offering, requesting, consenting, publishing, etc.),
  • – addressing users through their knowledge and skills.

Three levels of meaning would thus be formalized in this medium.

  • (1) The linguistic level in IEML – including lexical and narrative functions – formalizes what is spoken about (the lexicon) and what is said (sentences and super-sentences).
  • (2) The logical – or referential – level adds to the linguistic level…
  •     – logical functions (first order logic and propositional logic) expressed in IEML using logical character-tools,
  •     – the ability to point to references (literals, document URLs, datasets, etc.),
  •     – the means to express facts and rules in IEML and thus to feed inference engines.
  • (3) The pragmatic level adds illocutionary functions and users to the linguistic and logical levels.
  •     – Illocutionary functions (thanks to pragmatic character-tools) allow the expression of conventional acts and rules (such as “game” rules). 
  •     – The pragmatic level obviously requires the consideration of players or users, as well as user groups.
  •     – It should be noted that there is no formal difference between logical inference and pragmatic inference, only a difference in use: one aims at the truth of propositions according to the states of things referred to, the other computes the rights, obligations, gains, etc. of users according to their actions and the rules of the games they play.

The semantic profiles of users and datasets will be arranged according to the three levels that have just been explained. The “place of knowledge” could be enhanced by the use of tokens or crypto-currencies to reward participation in collective intelligence. If successful, this type of medium could be generalized to other areas such as health, democratic governance, trade, etc.

FIGURE 1

I showed in a previous post the contemporary importance of collaborative data curation. The skills in this domain are at the heart of the new algorithmic literacy. Figure 1 presents these skills systematically and, in doing so, organizes the intellectual and practical know-how, as well as the ethical attitudes, that support the augmentation of online collective intelligence. The star evokes the sign, the face being, and the cube the thing (on these concepts, see this post). The table is organized into three interdependent rows and three interdependent columns. The first row spells out the foundations of algorithmic intelligence at the personal level, the second recalls the indispensable critical work on data sources, and the third details the skills needed for the emergence of a collective intelligence augmented by algorithms. Personal intelligence and collective intelligence work together, and neither can do without critical intelligence! The columns evoke three complementary dimensions of cognition: reflexive consciousness, the production of meaning, and memory. None of them should be taken for granted, and all of them can be trained and perfected. In each cell, the upper item points to an exercise in virtualization, while the lower one indicates a current implementation of the skill, more concrete and situated. I will now comment on the table in Figure 1 row by row.

Personal intelligence

The notion of personal intelligence should be understood here in the sense of an individual cognitive skill. But it also leans toward the English sense of “intelligence” as in intelligence gathering: in this latter sense, it designates an individual’s capacity to set up his or her own intelligence system.

Attention management is not only about the exercise of concentration and the complementary art of avoiding distractions. It also includes the deliberate choice of learning priorities and the discernment of relevant information sources. Curators themselves must decide what is relevant and what is not, according to their own criteria and the priorities they have set for themselves. As for the notion of source, need it be emphasized here that only individuals, groups and institutions can be qualified as such? They alone, therefore, deserve trust or distrust. As for social media, they are in no way sources (contrary to what some journalists believe) but rather communication platforms. Claiming, for example, that “Twitter is not a reliable source” makes no more sense than the idea that “the telephone is not a reliable source”.

The interpretation of data is also the responsibility of curators. With all the statistical algorithms and all the automatic data-analysis tools (“big data analytics”) in the world, we will still need causal hypotheses, theories and categorization systems to support those theories. Statistical correlations can suggest causal hypotheses, but they do not replace them. For we want not only to predict the behaviour of complex phenomena, but also to understand them and to act on the basis of that understanding. Effective action presupposes a grasp of real causes, not merely the perception of correlations. Without the intuitions and theories derived from our personal knowledge of a domain, automatic data-analysis tools will not be put to good use. Asking good questions of data is not a trivial undertaking!

Finally, the collected data must be managed at the material level. We therefore have to choose the right cloud storage tools and know how to handle them. But memory must also be maintained at the conceptual level. This is why good curators are able to create, adopt and, above all, maintain a categorization system that will allow them to find the information they want and to extract useful knowledge from their collections.

Critical intelligence

Critical intelligence bears essentially on the quality of sources. It first requires a work of “external” criticism. We know that there is no transcendent authority in the new communication space. If we do not want to be deceived, misled, or blinded by informational blinkers, we must diversify our sources as much as possible. Our window of attention must be kept wide open, which is why we should subscribe to sources adopting diverse points of view, organizing narratives and theories. This diversity will allow us to cross-check the data and to observe the subjects on which they contradict one another and those on which they confirm one another.

Evaluating sources also demands an effort to decipher identities: this is “internal” criticism. To understand the nature of a source, we must recognize its classification system, its master categories and its organizing narrative. In a sense, a source is nothing other than the narrative around which it organizes its data: its way of producing meaning.

Finally, critical intelligence has a “pragmatic” dimension. This critique is the most devastating because it compares a source’s narrative with what it actually does. I mean here what it does by broadcasting its messages, that is, the concrete effect of its communication acts on ongoing conversations and on the participants’ state of mind. I also mean the source’s intellectual and aesthetic contributions, its economic, political, military or other interactions as reported by other sources. Thanks to this good memory, we can note the source’s contradictions across moments and audiences, and the gaps between its official narrative and the practical effects of its actions. Finally, the more transparent a source is about its own information sources, its references, its agenda and its funding, the more reliable it is. Conversely, opacity arouses suspicion.

Collective intelligence

Let me recall that the collective intelligence at issue here is not a “miracle solution” but a know-how to be cultivated, one that presupposes and in turn reinforces personal and critical intelligence.

Let us begin by defining stigmergy: it is a mode of communication in which agents coordinate and inform one another by modifying a common environment or memory. In the algorithmic medium, communication tends to take place among peers who create, categorize, criticize, organize, read, promote and analyse data by means of algorithmic tools. This is indeed stigmergic communication because, even if people converse and speak to one another directly, the main communication channel remains a common memory that the participants exploit and transform together. It is useful to distinguish between local and global memory. In the “local” memory of particular networks or communities, we must pay attention to singular contexts and histories. It is also advisable to take the contributions of other participants into account, not to raise subjects irrelevant to the group, and to avoid provocations, outbursts of aggressiveness, and so on.

As for the “global” memory, we must remember that every action in the algorithmic medium reorganizes the common memory, even infinitesimally: reading, tagging, buying, posting, creating a hyperlink, subscribing, “liking”, and so on. We create our symbolic environment collaboratively. A good human agent of collective intelligence will therefore keep in mind that his or her online actions contribute to the information of all the other agents.

The freedom at issue in Figure 1 presents itself as a dialectic between power and responsibility. Power covers our capacity to create, evaluate, organize, read and analyse data, our ability to make the common memory evolve through the distributed multitude of our actions. Responsibility is founded on a reflexive awareness of our collective power, an awareness that in turn informs the orientation of our attention and the meaning we give to the exercise of our powers.

FIGURE 2

Collaborative learning

Finally, collaborative learning is one of the major cognitive processes of collective intelligence and the main social benefit of data curation skills. To grasp this process properly, we must distinguish between tacit and explicit knowledge. Tacit knowledge covers what the members of a community have learned in particular contexts, the know-how internalized into personal reflexes through experience. Explicit knowledge, by contrast, consists of narratives, images, data, software or other documentary resources that are as clear and decontextualized as possible so that they can be widely shared.

Collaborative learning links two movements. The first consists in translating tacit knowledge into explicit knowledge to feed a common memory. In a second movement, complementary to the first, the participants exploit the explicit knowledge and learning resources available in the common memory in order to adapt this knowledge to their particular context and integrate it into their daily reflexes. Curators are potentially students or learners when they internalize explicit knowledge, and they can consider themselves teachers when they make explicit knowledge available to others. They are thus peers (see Figure 2) working in a common field of practice. They transform their tacit knowledge into explicit knowledge as much as possible and work in return to translate the part of the explicit knowledge they want to acquire into personal practical knowledge. I write “as much as possible” because the total explicitation of tacit knowledge is out of reach, as Michael Polanyi clearly showed.

In the algorithmic medium, explicit knowledge takes the form of categorized and evaluated data. The cycle of transformation of tacit knowledge into explicit knowledge, and vice versa, takes place in social media, where it is facilitated by civil, creative conversation: intellectual and social (or moral) skills work together!


Originally published by the CCCB Lab as an interview with Sandra Alvaro.

Pierre Lévy is a philosopher and a pioneer in the study of the impact of the Internet on human knowledge and culture. In Collective Intelligence. Mankind’s Emerging World in Cyberspace, published in French in 1994 (English translation in 1999), he describes a kind of collective intelligence that extends everywhere and is constantly evaluated and coordinated in real time, a collective human intelligence, augmented by new information technologies and the Internet. Since then, he has been working on a major undertaking: the creation of IEML (Information Economy Meta Language), a tool for the augmentation of collective intelligence by means of the algorithmic medium. IEML, which already has its own grammar, is a metalanguage that includes the semantic dimension, making it computable. This in turn allows a reflexive representation of collective intelligence processes.

In the book Semantic Sphere I. Computation, Cognition, and Information Economy, Pierre Lévy describes IEML as a new tool that works with the ocean of data of participatory digital memory, which is common to all humanity, and systematically turns it into knowledge. A system for encoding meaning that adds transparency, interoperability and computability to the operations that take place in digital memory.

By formalising meaning, this metalanguage adds a human dimension to the analysis and exploitation of the data deluge that is the backdrop of our lives in the digital society. And it also offers a new standard for the human sciences with the potential to accommodate maximum diversity and interoperability.

In “The Technologies of Intelligence” and “Collective Intelligence”, you argue that the Internet and related media are new intelligence technologies that augment the intellectual processes of human beings. And that they create a new space of collaboratively produced, dynamic, quantitative knowledge. What are the characteristics of this augmented collective intelligence?

The first thing to understand is that collective intelligence already exists. It is not something that has to be built. Collective intelligence exists at the level of animal societies: it exists in all animal societies, especially insect societies and mammal societies, and of course the human species is a marvellous example of collective intelligence. In addition to the means of communication used by animals, human beings also use language, technology, complex social institutions and so on, which, taken together, create culture. Bees have collective intelligence but without this cultural dimension. In addition, human beings have personal reflexive intelligence that augments the capacity of global collective intelligence. This is not true for animals but only for humans.

Now the point is to augment human collective intelligence. The main way to achieve this is by means of media and symbolic systems. Human collective intelligence is based on language and technology and we can act on these in order to augment it. The first leap forward in the augmentation of human collective intelligence was the invention of writing. Then we invented more complex, subtle and efficient media like paper, the alphabet and positional systems to represent numbers using ten numerals including zero. All of these things led to a considerable increase in collective intelligence. Then there was the invention of the printing press and electronic media. Now we are in a new stage of the augmentation of human collective intelligence: the digital or – as I call it – algorithmic stage. Our new technical structure has given us ubiquitous communication, interconnection of information, and – most importantly – automata that are able to transform symbols. With these three elements we have an extraordinary opportunity to augment human collective intelligence.

You have suggested that there are three stages in the progress of the algorithmic medium prior to the semantic sphere: the addressing of information in the memory of computers (operating systems), the addressing of computers on the Internet, and finally the Web – the addressing of all data within a global network, where all information can be considered part of an interconnected whole. This externalisation of the collective human memory and intellectual processes has increased individual autonomy and the self-organisation of human communities. How has this led to a global, hypermediated public sphere and to the democratisation of knowledge?

This democratisation of knowledge is already happening. If you have ubiquitous communication, it means that you have access to any kind of information almost for free: the best example is Wikipedia. We can also speak about blogs, social media, and the growing open data movement. When you have access to all this information, when you can participate in social networks that support collaborative learning, and when you have algorithms at your fingertips that can help you to do a lot of things, there is a genuine augmentation of collective human intelligence, an augmentation that implies the democratisation of knowledge.

What role do cultural institutions play in this democratisation of knowledge?

Cultural institutions are publishing data in an open way; they are participating in broad conversations on social media, taking advantage of the possibilities of crowdsourcing, and so on. They also have the opportunity to develop an open, bottom-up knowledge-management strategy.

A Model of Collective Intelligence in the Service of Human Development (Pierre Lévy, in The Semantic Sphere, 2011). S = sign, B = being, T = thing.

We are now in the midst of what the media have branded the ‘big data’ phenomenon. Our species is producing and storing data in volumes that surpass our powers of perception and analysis. How is this phenomenon connected to the algorithmic medium?

First let’s say that what is happening now, the availability of big flows of data, is just an actualisation of the Internet’s potential. It was always there. It is just that we now have more data and more people are able to access and analyse it. There has been a huge increase in the amount of information generated from the second half of the twentieth century to the beginning of the twenty-first century. At the beginning only a few people used the Internet, and now almost half of the human population is connected.

At first the Internet was a way to send and receive messages. We were happy because we could send messages to the whole planet and receive messages from the entire planet. But the biggest potential of the algorithmic medium is not the transmission of information: it is the automatic transformation of data (through software).

We could say that the big data available on the Internet is currently analysed, transformed and exploited by big governments, big scientific laboratories and big corporations. That’s what we call big data today. In the future there will be a democratisation of the processing of big data. It will be a new revolution. If you think about the situation of computers in the early days, only big companies, big governments and big laboratories had access to computing power. But nowadays we have the revolution of social computing and decentralized communication by means of the Internet. I look forward to the same kind of revolution regarding the processing and analysis of big data.

Communications giants like Google and Facebook are promoting the use of artificial intelligence to exploit and analyse data. This means that logic and computing tend to prevail in the way we understand reality. IEML, however, incorporates the semantic dimension. How will this new model be able to describe the way we create and transform meaning, and make it computable?

Today we have something called the “semantic web”, but it is not semantic at all! It is based on logical links between data and on algebraic models of logic. There is no model of semantics there. So in fact there is currently no model that sets out to automate the creation of semantic links in a general and universal way. IEML will enable the simulation of ecosystems of ideas based on people’s activities, and it will reflect collective intelligence. This will completely change the meaning of “big data” because we will be able to transform this data into knowledge.

We have very powerful tools at our disposal, we have enormous, almost unlimited computing power, and we have a medium where communication is ubiquitous. You can communicate everywhere, all the time, and all documents are interconnected. Now the question is: how will we use all these tools in a meaningful way to augment human collective intelligence?

This is why I have invented a language that automatically computes internal semantic relations. When you write a sentence in IEML it automatically creates the semantic network between the words in the sentence, and shows the semantic networks between the words in the dictionary. When you write a text in IEML, it creates the semantic relations between the different sentences that make up the text. Moreover, when you select a text, IEML automatically creates the semantic relations between this text and the other texts in a library. So you have a kind of automatic semantic hypertextualisation. The IEML code programs semantic networks and it can easily be manipulated by algorithms (it is a “regular language”). Plus, IEML self-translates automatically into natural languages, so that users will not be obliged to learn this code.

The most important thing is that if you categorize data in IEML it will automatically create a network of semantic relations between the data. You can have automatically-generated semantic relations inside any kind of data set. This is the point that connects IEML and Big Data.

So IEML provides a system of computable metadata that makes it possible to automate semantic relationships. Do you think it could become a new common language for human sciences and contribute to their renewal and future development?

Everyone will be able to categorise data however they want. Any discipline, any culture, any theory will be able to categorise data in its own way, to allow diversity, using a single metalanguage, to ensure interoperability. This will automatically generate ecosystems of ideas that will be navigable with all their semantic relations. You will be able to compare different ecosystems of ideas according to their data and the different ways of categorising them. You will be able to choose different perspectives and approaches. For example, the same people interpreting different sets of data, or different people interpreting the same set of data. IEML ensures the interoperability of all ecosystems of ideas. On one hand you have the greatest possibility of diversity, and on the other you have computability and semantic interoperability. I think that it will be a big improvement for the human sciences because today the human sciences can use statistics, but it is a purely quantitative method. They can also use automatic reasoning, but it is a purely logical method. But with IEML we can compute using semantic relations, and it is only through semantics (in conjunction with logic and statistics) that we can understand what is happening in the human realm. We will be able to analyse and manipulate meaning, and there lies the essence of the human sciences.

Let’s talk about the current stage of development of IEML: I know it’s early days, but can you outline some of the applications or tools that may be developed with this metalanguage?

It is still too early; perhaps the first application may be a kind of collective intelligence game in which people will work together to build the best ecosystem of ideas for their own goals.

I published The Semantic Sphere in 2011. And I finished the grammar that has all the mathematical and algorithmic dimensions six months ago. I am writing a second book entitled Algorithmic Intelligence, where I explain all these things about reflexivity and intelligence. The IEML dictionary will be published (online) in the coming months. It will be the first kernel, because the dictionary has to be augmented progressively, and not just by me. I hope other people will contribute.

This IEML interlinguistic dictionary ensures that semantic networks can be translated from one natural language to another. Could you explain how it works, and how it incorporates the complexity and pragmatics of natural languages?

The basis of IEML is a simple commutative algebra (a regular language) that makes it computable. A special coding of the algebra (called Script) allows for recursivity, self-referential processes and the programming of rhizomatic graphs. The algorithmic grammar transforms the code into fractally complex networks that represent the semantic structure of texts. The dictionary, made up of terms organized according to symmetric systems of relations (paradigms), gives content to the rhizomatic graphs and creates a kind of common coordinate system of ideas. Working together, the Script, the algorithmic grammar and the dictionary create a symmetric correspondence between individual algebraic operations and different semantic networks (expressed in natural languages). The semantic sphere brings together all possible texts in the language, translated into natural languages, including the semantic relations between all the texts. On the playing field of the semantic sphere, dialogue, intersubjectivity and pragmatic complexity arise, and open games allow free regulation of the categorisation and the evaluation of data. Ultimately, all kinds of ecosystems of ideas – representing collective cognitive processes – will be cultivated in an interoperable environment.

Schema from the START – IEML / English Dictionary by Prof. Pierre Lévy FRSC CRC, University of Ottawa, 25 August 2010 (copyright Pierre Lévy 2010, license Apache 2.0).

Since IEML automatically creates very complex graphs of semantic relations, one of the development tasks that is still pending is to transform these complex graphs into visualisations that make them usable and navigable.

How do you envisage these big graphs? Can you give us an idea of what the visualisation could look like?

The idea is to project these very complex graphs onto a 3D interactive structure. These could be spheres, for example, so you will be able to go inside the sphere corresponding to one particular idea and you will have all the other ideas of its ecosystem around you, arranged according to the different semantic relations. You will be also able to manipulate the spheres from the outside and look at them as if they were on a geographical map. And you will be able to zoom in and zoom out of fractal levels of complexity. Ecosystems of ideas will be displayed as interactive holograms in virtual reality on the Web (through tablets) and as augmented reality experienced in the 3D physical world (through Google glasses, for example).

I’m also curious about your thoughts on the social alarm generated by the Internet’s enormous capacity to retrieve data, and the potential exploitation of this data. There are social concerns about possible abuses and privacy infringement. Some big companies are starting to consider drafting codes of ethics to regulate and prevent the abuse of data. Do you think a fixed set of rules can effectively regulate the changing environment of the algorithmic medium? How can IEML contribute to improving the transparency and regulation of this medium?

IEML does not only allow transparency, it allows symmetrical transparency. Everybody participating in the semantic sphere will be transparent to others, but all the others will also be transparent to him or her. The problem with hyper-surveillance is that transparency is currently not symmetrical. What I mean is that ordinary people are transparent to big governments and big companies, but these big companies and big governments are not transparent to ordinary people. There is no symmetry. Power differences between big governments and little governments or between big companies and individuals will probably continue to exist. But we can create a new public space where this asymmetry is suspended, and where powerful players are treated exactly like ordinary players.

And to finish up: last month the CCCB Lab began a series of workshops related to the Internet Universe project, which explore the issue of education in the digital environment. As you have published numerous works on this subject, could you summarise a few key points in regard to educating ‘digital natives’ about responsibility and participation in the algorithmic medium?

People have to accept their personal and collective responsibility. Because every time we create a link, every time we “like” something, every time we create a hashtag, every time we buy a book on Amazon, and so on, we transform the relational structure of the common memory. So we have a great deal of responsibility for what happens online. Whatever is happening is the result of what all the people are doing together; the Internet is an expression of human collective intelligence.

Therefore, we also have to develop critical thinking. Everything that you find on the Internet is the expression of particular points of view, that are neither neutral nor objective, but an expression of active subjectivities. Where does the money come from? Where do the ideas come from? What is the author’s pragmatic context? And so on. The more we know the answers to these questions, the greater the transparency of the source… and the more it can be trusted. This notion of making the source of information transparent is very close to the scientific mindset. Because scientific knowledge has to be able to answer questions such as: Where did the data come from? Where does the theory come from? Where do the grants come from? Transparency is the new objectivity.



Interview with Nelesi Rodríguez, published in Spanish in the academic journal Comunicación: Estudios venezolanos de comunicación, 2nd quarter 2014, no. 166.

Collective intelligence in the digital age: A revolution just at its beginning

Pierre Lévy (P.L.) is a renowned theorist and media scholar. His ideas on collective intelligence have been essential to the comprehension of some phenomena of contemporary communication, and his research on the Information Economy Meta Language (IEML) is today one of the biggest promises of data processing and knowledge management. In this interview conducted by the team of the Comunicación (C.M.) magazine, he explained to us some of the basic points of his theory and offered an interesting reading of current topics related to communication and digital media. Nelesi Rodríguez, April 2014.

APPROACH TO THE SUBJECT MATTER

C.M: Collective intelligence can be defined as shared knowledge that exists everywhere, that is constantly measured, coordinated in real time, and that drives the effective mobilization of several skills. In this regard, it is understood that collective intelligence is not a quality exclusive to human beings. In what way is human collective intelligence different from other species’ collective intelligence?

P.L: You are totally right when you say that collective intelligence is not exclusive to the human race. We know that ants, bees, and in general all social animals have collective intelligence. They solve problems together and, as social animals, they are not able to survive alone. This is also the case with the human species: we are not able to survive alone, and we solve problems together.

But there is a big difference, and it is related to the use of language: animals are able to communicate, but they do not have language. I mean, they cannot ask questions, they cannot tell stories, they cannot have dialogues, they cannot communicate about their emotions, their fears, and so on.

So there is language, which is specific to humankind, and with language you have, of course, better communication and an enhanced collective intelligence. You also have all that comes with this linguistic ability: technology and the complexity of social institutions like law, religion, ethics, economy… All these things that animals don’t have. This ability to play with symbolic systems, to play with tools and to build complex social institutions creates a much more powerful collective intelligence for humans.

Also, I would say that there are two important features that come from human culture. The first is that human collective intelligence can improve during history, because each new generation can improve the symbolic systems, the technology, and the social institutions; so there is an evolution of human collective intelligence, and of course we are talking about a cultural evolution, not a biological evolution. And then, finally, maybe the most important feature of human collective intelligence is that each unit of the human collectivity has an ability to reflect, to think by itself. We have individual consciousness; unfortunately for them, the ants don’t. The fact that humans have individual consciousness creates, at the level of social cognition, something that is very powerful. That is the main difference between human and animal collective intelligence.

C.M: Do the writing and digital technologies also contribute to this difference?

P.L: In oral culture there was a certain kind of transmission of knowledge but, of course, when we invented writing systems we were able to accumulate much more knowledge to transmit to the next generations. With the invention of the diverse writing systems, and then their improvements (like the invention of the alphabet, the invention of paper, the printing press, and then the electronic media), human collective intelligence expanded. The ability to build libraries, to organize scientific coordination and collaboration, and the communication supported by the telephone, the radio and the television make human collective intelligence more powerful. I think the main challenge our generation and the next will have to face is to take advantage of the digital tools (the computer, the internet, the smartphones, et cetera) to discover new ways to improve our cognitive abilities: our memory, our communication, our problem-solving abilities, our abilities to coordinate and collaborate, and so on.

C.M: In an interview conducted by Howard Rheingold, you mentioned that every device and technology that has the purpose of enhancing language also enhances collective intelligence and, at the same time, has an impact on cognitive skills such as memory, collaboration and the ability to connect with one another. Taking this into account:

  • It is said that today the enhancement of cognitive abilities manifests in different ways, from fandoms and wikis to crowdsourcing projects created with the intent of finding effective treatments for serious illnesses. Do you consider that every one of these manifestations contributes in the same way towards the expansion of our collective intelligence?

P.L: Maybe the most important sector where we should put particular effort is scientific research and learning, because we are talking about knowledge; so the most important part is the creation of knowledge, the dissemination of knowledge and, more generally, collective and individual learning.

Today there is a transformation of communication in the scientific community: more and more journals are open and online, people are forming virtual teams, they communicate over the internet, they use large amounts of digital data, and they process this data with computer power. So we are already witnessing this augmentation, but we are just at the beginning of this new approach.

In the case of learning, I think it is very important that we recognize the emergence of new ways of learning online collaboratively, where people who want to learn help each other, communicate, and accumulate common memories from which they can take what is interesting to them. This collective learning is not limited to schools; it happens in all kinds of social environments. We could call this “knowledge management”, and there is an individual or personal aspect of this knowledge management that some people call “personal knowledge management”: choosing the right sources on the internet, featuring the sources, categorizing information, doing syntheses, sharing these syntheses on social media, looking for feedback, initiating a conversation, and so on. We have to realize that learning is, and always has been, an individual process at its core. Someone has to learn; you cannot learn for someone else. Helping other people to learn is teaching, but the learner is doing the real work. Then, if the learners are helping each other, you have a process of collective learning. Of course, it works better if these people are interested in the same topics or engaged in the same activities.

Collective learning augmentation is something very general, and it has increased with online communication. It also happens at the political level: there is augmented deliberation, because people can discuss easily on the internet, and there is also enhanced coordination (for public demonstrations and similar things).

  • C.M: With the passage of time, collective intelligence becomes less a human quality and more one akin to machines, and this worries more than a few people. What is your stance on this reality?

P.L: There is a process of artificialization of cognition in general that is very old; it began with writing and with books, which are already a kind of externalization or objectification of memory. A library, for instance, is something that is completely material, completely technical, and without libraries we would be much less intelligent.

We cannot be against libraries because, instead of being pure brain, they are just paper, ink, buildings and index cards. Similarly, it makes no sense to “revolt” against the computer and against the internet. It is the same kind of reasoning as with libraries: the internet is just another technology, more powerful, but it is the same idea. It is an augmentation of our cognitive ability, individual and collective, so it is absurd to be afraid of it.

But we have to distinguish very clearly between the material support and the texts. The texts come from our minds, but the text that is in my mind can be projected on paper as well as onto a computer network. What is really important here is the text.

IEML AND THE FUTURE OF COLLECTIVE INTELLIGENCE

C.M: You’ve mentioned before that what we define today as the “semantic web”, more than being based on semantic principles, is based on logical principles. According to your ideas, this represents a roadblock in making the most out of the possibilities offered by digital media. As an alternative, you proposed the IEML (Information Economy Meta Language).

  • Could you elaborate on the basic differences between the semantic web and the IEML?

P.L: The so-called “semantic web” (in fact, people now call it the “web of data”, which is a better term for it) is based on very well-known principles of artificial intelligence that were developed in the 70s and the 80s and that were adapted to the web.

Basically, you have a well-organized database, and you have rules to compute the relations between different parts of the database, and these rules are mainly logical rules. IEML works in a completely different manner: you have as much data as you want, and you categorize this data in IEML.

IEML is a language: not a computer language, but an artificial human language. So you can say “the sea”, “this person”, or anything. There are words in IEML; there are no words in the semantic web formats, which do not work like this.

In this artificial language that is IEML, each word is in semantic relation with the other words in the dictionary. So all the words are intertwined by semantic relations and are perfectly defined. When you use these words to create sentences or texts, you create new relationships between the words: grammatical relationships.

And from texts written in IEML, algorithms automatically compute the semantic relations inside sentences, from one sentence to the other, and so on. So you have a whole semantic network inside the text that is automatically computed and, even more, you can automatically compute the semantic relations between any text and any library of texts.

An IEML text automatically creates its own semantic relations with all the other texts, and these texts in IEML can automatically be translated into natural languages: Spanish, English, Portuguese, Chinese, and so on. So when you use IEML to categorize data, you automatically create semantic links between the data, with all the openness, the subtlety, and the ability to say exactly what you want that a language can offer you.

You can categorize any type of content: images, music, software, articles, websites, books, any kind of information. You can categorize all of this in IEML and, at the same time, you create links within the data because of the links that are internal to the language.
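To make the contrast with ordinary tags concrete, here is a minimal toy sketch in Python; the class, the sample relations and the file names are all invented for illustration, and nothing here is IEML’s actual notation or algorithm. It shows the principle described above: when the categories themselves belong to a typed semantic network, categorizing two pieces of data automatically puts those pieces of data in a computable semantic relation.

```python
# Hypothetical sketch: categories embedded in a typed semantic network
# induce links between the items they categorize. Illustration only,
# not IEML's actual notation or algorithms.
from collections import defaultdict

class SemanticNetwork:
    def __init__(self):
        # typed relations between concepts, e.g. ("ocean", "larger_than", "sea")
        self.relations = defaultdict(set)

    def relate(self, a, rel, b):
        self.relations[a].add((rel, b))

    def related(self, a, b):
        """Return the names of the typed relations linking concept a to concept b."""
        return [rel for (rel, target) in self.relations[a] if target == b]

# A tiny, invented concept network.
net = SemanticNetwork()
net.relate("sea", "is_a", "body_of_water")
net.relate("ocean", "is_a", "body_of_water")
net.relate("ocean", "larger_than", "sea")

# Categorizing data items with these concepts...
items = {
    "photo_042.jpg": {"sea"},
    "article_17.html": {"ocean"},
}

# ...automatically yields semantic links *between the items*, because the
# concepts used as categories are themselves interrelated.
def item_links(items, net):
    links = []
    for x, tags_x in items.items():
        for y, tags_y in items.items():
            if x >= y:  # consider each unordered pair once
                continue
            for a in tags_x:
                for b in tags_y:
                    for rel in net.related(a, b) + net.related(b, a):
                        links.append((x, rel, y))
    return links

print(item_links(items, net))
# [('article_17.html', 'larger_than', 'photo_042.jpg')]
```

With plain hashtags the relation store would be empty, so no link between the photo and the article could ever be computed; that is exactly the limitation of today’s tags discussed in the next answer.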

  • C.M: Can we consider metatags, hashtags, and Twitter lists as precedents of IEML?

P.L: Yes, exactly. I have been inspired by the fact that people are already categorizing data. They started doing this with social bookmarking sites such as del.icio.us. The act of curation today goes together with the act of categorization, of tagging. We do this very often on Twitter, and now we can do it on Facebook, on Google Plus, on YouTube, on Flickr, and so on. The thing is that these tags don’t have the ability to interconnect with other tags and to create a big, consistent semantic network. In addition, these tags are in different natural languages.

From the point of view of the user, it will be the same action, but tagging in IEML will just be more powerful.

  • C.M: What will IEML’s initial array of applications be?

P.L: I hope the main applications will be in the creation of collective intelligence games: games of categorization and evaluation of data, a sort of collective curation that will help people create a very useful memory for their collaborative learning. That, for me, would be the most interesting application, along with, of course, the creation of an inter-linguistic or trans-linguistic environment.

BIG DATA AND COLLECTIVE INTELLIGENCE

C.M: You’ve referred to big data as one of the phenomena that could take collective intelligence to a whole new level. You’ve mentioned as well that, in fact, this type of information can currently only be processed by powerful institutions (governments, corporations, etc.), and that only when the capacity to read big data is democratized will there truly be a revolution.

Would you say that IEML will have a key role in this process of democratization? If so, why?

P.L: I think that there are currently two important aspects of big data analytics. First, we have more and more data every day; we have to realize this. Second, the main producer of this immense flow of data is ourselves: we, the users of the Internet, are producing the data. So currently lots of people are trying to make sense of this data, and here you have two “avenues”:

The first avenue is the more scientific one. In the natural sciences you have a lot of data (genetic data, data coming from physics or astronomy), and there is also something relatively new: the data coming from the human sciences. This is called “digital humanities”, and it takes data from spaces like social media and tries to make sense of it from a sociological point of view, or it takes data from libraries and tries to make sense of it from a literary or historical point of view. This is one application.

The second application is in business and in administration, private or public. You have many companies that are trying to sell data-analysis services to other companies and to governments.

I would say that there are two big problems with this landscape:

The first is related to methodology: today we use mainly statistical methods and logical methods. It is very difficult to have a semantic analysis of the data because we do not have a semantic code, and let’s remember that everything we analyze is coded before we analyze it. If you code quantitatively, you get statistical analysis; if you code logically, you get logical analysis. So you need a semantic code to have a semantic analysis. We do not have it yet, but I think that IEML will be that code.
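As a hedged illustration of this point about coding (with invented data and category names, not IEML), the Python sketch below contrasts a purely quantitative coding, which only supports counting and correlation, with a minimal semantic coding, in which the categories carry explicit relations and can therefore answer a question about meaning.

```python
# Contrast between quantitative coding (enables statistical analysis)
# and semantic coding (enables semantic analysis). Invented sample data.
from collections import Counter

posts = [
    {"text": "the tide is high",   "tags": ["sea"]},
    {"text": "waves on the shore", "tags": ["sea", "beach"]},
    {"text": "desert sunrise",     "tags": ["desert"]},
]

# 1. Quantitative coding: tags are opaque tokens, so all we can do is count.
tag_counts = Counter(tag for post in posts for tag in post["tags"])
print(tag_counts.most_common())   # [('sea', 2), ('beach', 1), ('desert', 1)]

# 2. Semantic coding: each category carries an explicit relation, so we can
#    answer a question the counts cannot: "which posts are about water?"
is_a = {"sea": "water", "beach": "coast", "desert": "land"}
about_water = [post["text"] for post in posts
               if any(is_a.get(tag) == "water" for tag in post["tags"])]
print(about_water)                # ['the tide is high', 'waves on the shore']
```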

The second problem is the fact that this analysis of data is currently in the hands of very powerful or rich players: big governments, big companies. It is expensive and it is not easy to do; you need to learn how to code, you need to learn how to read statistics…

I think that with IEML, because people will be able to code data semantically, they will also be able to do semantic analysis with the help of the right user interfaces. They will be able to manipulate this semantic code in natural language; it will be open to everybody.

This famous “revolution of big data” is just at its beginning. In the coming decades there will be much more data and many more powerful tools to analyze it. And it will be democratized; the tools will be open and free.

A BRIEF READING OF THE CURRENT SITUATION IN VENEZUELA

C.M: In the interview conducted by Howard Rheingold, you defined collective intelligence as a synergy between personal and collective knowledge; as an example, you mentioned the curation process that we, as users of social media, carry out, and that in most cases serves as resource material for others. Regarding this particular issue, I’d like to analyze the following situation with you through the lens of collective intelligence:

During the last few months, Venezuela has suffered an important information blackout, a product of the government’s monopolized grasp on the majority of media outlets, the censorship efforts of State organisms, and the self-imposed censorship of the country’s last independent media outlets. In response to this blockade, Venezuelans have taken it upon themselves to stay informed by invading the digital space. In a relatively short period of time, various non-standard communication networks have been created, verified source lists have been consolidated, applications have been developed, and a sort of code of ethics has been established in order to minimize the risk of spreading false information.

Based on your theory on collective intelligence, what reading could you give of this phenomenon?

P.L: You have already given a response to this; I have nothing else to add. Of course I am against any kind of censorship. We have already seen that many authoritarian regimes do not like the internet, because it represents an augmentation of freedom of expression. Not only in Venezuela but in many countries, governments have tried to limit free expression, and politically active people who are not pro-government have tried to organize themselves through the internet. I think that the new environment created by social media (Twitter, Facebook, YouTube, the blogs, and all the apps that help people find the information they need) helps the coordination and the discussion inside all these opposition movements, and this is the current political aspect of collective intelligence.