Juan Chamero

Darwin Methodology


Darwin - Philosophical approach I


Semantic Search

Philosophical approach - from ideas to concepts - I

The world of Ideas, from Plato to Spinoza and today

Dr. Juan Chamero, Darwin Architect, Buenos Aires, 10 February 2009



Something about the deep mechanics of knowledge searching

The paradox of man as hunter/fisher of Information and Knowledge


    When people have information needs, they start a mental process, not yet well understood, that probably first inspects their own cognitive assets. As a probable outcome, “ideas” arise in their minds, perhaps similar in nature to those that arise when one is hungry or thirsty. If challenged to make those “ideas activated by information needs” explicit, they will try to do so via the oral and/or written symbols of a given language. If challenged to rate from 0 to 10 how much they “know” about those ideas, they will do that as well. Finally, no matter how much they claim to know, being human and therefore curious creatures, most of them will try to know more, and better, about their needs in order to be fully “satisfied”. Knowledge as a nutrient, however, is strongly cumulative and yet, paradoxically, accumulates within the same physical storage volume, with a significant part of it lasting for a long time. The scheme would be something like this:


Something to look for => “idea in mind” prompts => representation (exteriorization) of it in a given code


    And as this nutrient has to be fished, retrieved from a sort of World Oracle, fished out of a Web Ocean, the fisher needs a strategy, adequate tools and weapons, and an adequate bait, for example a sequence of “words” and/or logical operators belonging to a given Jargon:


(“a b”, c AND d, NOT “e f”),


meaning to look for documents that have the precise chain aSPACEb in their text corpora, where, for example, a and b are words of a Computing Jargon in American English and SPACE stands for a “written” space, as in:


“parallel computing”


a chain of two words and a single space between them, and in more detail a sequence of 18 characters as follows:


(P, A, R, A, L, L, E, L, SP, C, O, M, P, U, T, I, N, G)


where the characters, including the SP (Space), could be represented in a particular 8-bit code as a 144-bit chain. The commas carry no meaning and are included here only for visual clarity. Some search engines, like Google, interpret word chains this way when the words are within “quotation marks”, as in “a b” or “parallel computing” above.
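The character and bit counts above can be checked with a short sketch. It assumes Latin-1 as the “particular 8-bit code”, which is only one possible choice; the variable names are illustrative.

```python
# Illustrative sketch: the quoted phrase seen as a chain of characters and bits.
phrase = "parallel computing"

# Assumption: Latin-1 stands in for "a particular 8-bit code" (one byte per character).
encoded = phrase.encode("latin-1")

num_chars = len(phrase)        # 8 letters + 1 space + 9 letters
num_bits = len(encoded) * 8    # 8 bits per character

# The full bit chain, most significant bit first within each character.
bit_chain = "".join(format(byte, "08b") for byte in encoded)

print(num_chars, num_bits)     # 18 144
print(bit_chain[:8])           # bits of the first character, "p"
```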


    Usually people use a sequence of words evenly separated by spaces, generally one space, as for example: parallel computing, without enclosing it in “quotation marks” or some other equivalent character. In this case most search engines look for documents that somewhere within their text corpora have the words parallel AND/OR computing. Some search engines check the distance between the words before considering the “match” valid, and some do not. Some treat text corpora as single vectors not interrupted by punctuation marks, while others do not.
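The difference between these matching modes can be sketched as follows. This is a minimal illustration, not the behavior of any specific search engine: the function names and sample documents are hypothetical, and the tokenizer assumes the “single vector not interrupted by punctuation” reading.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """Quoted query: the words must appear adjacent and in the same order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def all_words_match(text, query):
    """Unquoted query, AND reading: every word appears somewhere."""
    words = set(tokenize(text))
    return all(w in words for w in tokenize(query))

def any_word_match(text, query):
    """Unquoted query, OR reading: at least one word appears."""
    words = set(tokenize(text))
    return any(w in words for w in tokenize(query))

def within_distance(text, first, second, max_gap):
    """Proximity control: the two words occur at most max_gap positions apart."""
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)

# Hypothetical sample documents.
doc_a = "Parallel computing splits a task across processors."
doc_b = "The computing lab runs several jobs in parallel."
doc_c = "Cloud computing is popular."

print(phrase_match(doc_a, "parallel computing"))           # True
print(phrase_match(doc_b, "parallel computing"))           # False
print(all_words_match(doc_b, "parallel computing"))        # True
print(any_word_match(doc_c, "parallel computing"))         # True
print(within_distance(doc_b, "parallel", "computing", 3))  # False
```

Document doc_b shows why the unquoted reading is looser: both words are present, but six positions apart and in reverse order.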



At present, only words are indexed in the Web Ocean, not concepts

    The whole expression above (“a b”, c AND d, NOT “e f”) could then be considered a “query” to a “Knowledge Database”, usually structured and “seen” by conventional search engines as a huge virtual two-dimensional array of “documents” – “words”, namely 20,000 million documents versus a few million distinguishable “words” per language.
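A toy version of such a documents-by-words array, and of the query evaluated against it, might look like the sketch below. It is an illustration under stated assumptions: the array is stored sparsely as a positional inverted index, the commas in the query are read as AND, and all names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [positions]}: a sparse documents x words array."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def docs_with_word(index, word):
    """Identifiers of the documents that contain the word."""
    return set(index.get(word, {}))

def docs_with_phrase(index, phrase):
    """Documents where the words of the phrase occur at consecutive positions."""
    words = phrase.lower().split()
    candidates = set.intersection(*(docs_with_word(index, w) for w in words))
    hits = set()
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            if all(start + k in index[w][doc_id] for k, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical documents, without punctuation to keep the tokenizer trivial.
docs = {
    1: "parallel computing with threads and locks",
    2: "parallel computing with threads and locks in embedded systems",
    3: "threads and locks for computing in parallel",
}
index = build_index(docs)

# ("parallel computing", threads AND locks, NOT "embedded systems"),
# assuming the commas mean AND.
result = (
    docs_with_phrase(index, "parallel computing")
    & docs_with_word(index, "threads")
    & docs_with_word(index, "locks")
) - docs_with_phrase(index, "embedded systems")

print(result)  # {1}
```

Note that the index knows only words and their positions; a phrase is reconstructed at query time and nothing is stored about what it means.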

    Among distinguishable words we may find “Common Words and Expressions” (1), and a myriad of single-word acronyms, neologisms, misspelled words and expressions that are nonetheless accepted as valid, toponyms, names of persons, either natural or legal, tools, mechanisms, methodologies, procedures, etc. However, as we are going to see, the most numerous set of concepts is missing.

    Indeed, most current search engines do not index concepts such as “parallel processing”, “quality of education”, “big-bang theory”, “attention deficit hyperactivity disorder”, or even well-known names such as “Albert Einstein” and “Isaac Newton”. What are usually indexed are single-word “equivalents” such as QOE (with at least two main meanings: Quality of Education and Quality of Experience), ADHD for the attention disorder, and possible names such as Einstein and Newton, but these encompass all possible single- and multiple-word homonyms and meanings. This omission is crucial because it impedes semantic search. But even if current search engines were able to detect and index documents by all meaningful single- and multiple-word keywords, the semantic ambiguity would still persist. Why? Because the same keyword, whether single or multiple word, may have multiple meanings depending on the subject dealt with. The single-word keyword “pond”, for instance, may be meaningfully used in more than a hundred different knowledge areas.

    In brief, conventional search engines work with huge virtual arrays that are nevertheless very limited from the semantic point of view, because they index documents only by single words and by some concatenations of words treated as single words, such as “big-bang”.


How concepts are unveiled

    Darwin, a distributed intelligent agents methodology of Knowledge Discovery, intends to go further: to unveil from existing Web documents all meaningful word chains and attach to them their semantic path, the meaning domain to which they belong, thereby transforming them de facto into concepts instead of leaving them as potential and ambiguous keywords. This rather heavy Knowledge Discovery task requires a prior semantic classification of documents in order to find the main subject of each one. These subjects are in turn hierarchically ordered along very specific “semantic paths”.
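The step from ambiguous keyword to concept can be pictured as a keyword paired with a semantic path. The sketch below is only an illustration of that pairing, not Darwin's actual data model: the class, the functions, and the two paths given for “pond” are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    keyword: str
    semantic_path: tuple  # from the discipline root down to the subject

# Hypothetical entries; the "pond" paths are invented for illustration only.
CONCEPTS = [
    Concept("pond", ("ecology", "freshwater habitats")),
    Concept("pond", ("civil engineering", "stormwater management")),
    Concept("parallel processing", ("computing", "programming")),
]

def meanings(keyword):
    """All concepts that share the same surface keyword."""
    return [c for c in CONCEPTS if c.keyword == keyword]

def disambiguate(keyword, discipline):
    """Keep only the meanings whose path starts at the given discipline root."""
    return [c for c in meanings(keyword) if c.semantic_path[0] == discipline]

print(len(meanings("pond")))            # 2: the bare keyword is ambiguous
print(disambiguate("pond", "ecology"))  # one concept once the discipline is known
```

The disambiguation only works because the discipline of the document is assumed to be known, which is why the prior semantic classification of documents is needed.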

    Common Words and Expressions number no more than a few hundred thousand, but by combining them wisely, experts in the different knowledge disciplines have created, over time, millions of concepts per language. These concepts are distinguishable within documents written by, and for, intelligent beings. One of the Darwin conjectures is that other intelligent beings, and specially configured and trained agents, may unveil them.

    Human beings, as users of Knowledge Databases, behave as almost inscrutable “black boxes”. Seen from the database side, users behave like black boxes that issue queries supposedly aimed at satisfying their “information needs”. Taken individually, it is almost impossible to guess, or to infer with a reasonable probability of success, what they are looking for. The ideas in mind described above are extremely variable and fuzzy over time, even for the same person and the same stimulus, and so are their associated queries.


What's in an idea

    To better understand the subtle differences among images, ideas, words, ideograms, keywords, concepts, meanings, and subjects, we think it is necessary to delve a little into epistemology (2), the branch of philosophy that studies the nature of knowledge and tries to answer the following questions: What is knowledge? How is knowledge acquired? What do people know? How do we know what we know? It seems that what we know about “information” is still limited to Claude Shannon's Theory of Information. Since then (1948), we humans have remained in almost the same place, except that our capacity to process information in the computing domain of [memory – processing speed] has stepped up a billion times. However, we still do not know what knowledge and intelligence are, in spite of spectacular advances in Artificial Intelligence (AI). We are absolutely convinced that humans have the most advanced intelligence in the universe within our reach, and at the same time we humans have created forms of artificial intelligence that challenge ours and in some instances beat us. We are pragmatic in accepting that we are thinking machines, that perhaps thinking is a new, advanced sense, and that perhaps intelligence is a fluid more subtle than information, or perhaps a capacity to see the immanent and often hidden order of the universe. Could the scientific challenge in this area be imagined as the sequence material mass => energy => information => intelligence?

Something about the intelligence that lies “behind” documents

    We and many colleagues use and argue about the concepts of “unveiling intelligence”, “knowledge discovery”, “retrieving intelligence”, and many audacious expressions of this sort. What do we mean by that? Let's go back and look critically at the performance of the best current search engines. They intend to offer us universal indexes of almost everything humans have documented, word by word, like having a celestial map of the whole universe, particle by particle. Is that enough? Unfortunately, no. Intuitively we dare to say that something like “intelligence” is missing. Of course, we are sure that every document has its own intelligence, the one wisely architected by its authors. If the text corpus of each document were written “mathematically”, we might argue that this “hidden” intelligence is closely related to a well-formed formula (WFF) that concatenates words and symbols. Unfortunately, these corpora are literary, written under complex but rather fuzzy rules so that these “messages” can be understood by other humans across a wide range of comprehension.

    Ten years ago we devised a set of Darwin Conjectures about how humans document: primitive rules of thumb that, if found statistically true, will enable us to envisage the cognitive core of a document, something like a meaningful abstract of it. This task resembles the old and patient task of librarians preparing the index card for each book. The Darwin Methodology does the same, but performed almost autonomously by agents. The primitive “intelligence” that Darwin retrieves has the following form:


The knowledge domain (discipline) to which the document belongs;

The main subject dealt with;

The semantic path, from the discipline root to the main subject dealt with;

The keyword profile, that is, the literary concordance of the keywords retrieved versus the keyword set of the main subject;

An abstract of the document, something like its fingerprint: a statistically weighted view of the remaining corpus obtained by eliminating non-keywords.
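The five items above can be pictured as one record per document. The sketch below is a hypothetical illustration of that record, not the Darwin implementation: it handles single-word keywords only, takes the semantic path and the keyword set of the subject as given, and uses relative frequency as a stand-in for the statistical weighting.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class DocumentProfile:
    domain: str            # knowledge domain (discipline)
    main_subject: str      # main subject dealt with
    semantic_path: tuple   # from the discipline root to the main subject
    keyword_profile: dict  # keyword -> relative weight within the document
    abstract: list         # what remains after eliminating non-keywords

def profile_document(text, semantic_path, subject_keywords):
    """Build the record, assuming the classification step is already done."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    kept = [w for w in words if w in subject_keywords]
    counts = Counter(kept)
    total = sum(counts.values())
    weights = {k: counts[k] / total for k in counts} if total else {}
    return DocumentProfile(
        domain=semantic_path[0],
        main_subject=semantic_path[-1],
        semantic_path=semantic_path,
        keyword_profile=weights,
        abstract=kept,
    )

# Hypothetical document, path, and keyword set.
text = "Threads share memory. Locks protect memory from concurrent threads."
path = ("computing", "programming", "parallel processing")
keywords = {"threads", "locks", "memory"}

profile = profile_document(text, path, keywords)
print(profile.domain, "/", profile.main_subject)
print(profile.keyword_profile)
print(profile.abstract)
```

The hard part, finding the semantic path and the keyword set for a document, is exactly what this sketch assumes as input.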


This primitive intelligence could be progressively enhanced via a fast and convergent learning process. Let's now see what we can learn from philosophers and great thinkers.



The Idea (3)


As per Plato

    For Plato, “real” things are hosted in the realms of “forms” and “ideas”. It does not matter here whether they are finite or infinite in number, or whether they exist forever or accumulate over time. What is important for our reflections is the coherence of Plato's ontology, which suggests that the images arising in our minds point to “pre-existent” ideas, most of them “old ones” and, by exception, from time to time, “new ones”. These ideas rest within our space-time reality but are by and large pre-existent in a higher-level realm of consciousness. Perhaps in Plato's terminology “idea” was similar to what we call “concept”.


As per Descartes

    For Descartes, ideas were images, but not necessarily existent “in mind”. He stands on firmer ground than Plato, saying that not all our thoughts are images of “things” and that only those images deserve the name of ideas, and maintaining that they are “innate”, in practice close to Plato's vision of a pre-existent realm. Incidentally, in Zen practice students try to think both ways, with and without images.


As per John Locke

    For John Locke, an idea is whatever results from thinking, something like saying: if I am thinking, then I am developing ideas. He dared to characterize good thinking as “good sense”, testing outcomes “down to earth”.


As per Hume

    For Hume, the idea is the outcome of a process of thinking about perceptions. Pragmatic, but not bad for our practical purposes. We know what is outside us via life experiences and, by and large, via perceptions, our own or derived from others.


As per Kant

    Let us pause here for a moment, because for Kant “idea” is opposed to “concept”, whereas in our ontology concepts are derived from ideas. Let's first try to understand what Kant meant by that opposition. A person may have an idea about something, for example, within our ontology, “parallel processing” within computing and, next, within programming. John has an idea about it and, having gone into it deeply enough, dares to define it as a “concept”. For him, his definition is a concept because, following Kant, it refers to a very specific definition that, by his criteria, should be accepted universally. But this is only John's opinion, and we may imagine that for the “same” phenomenon there will probably be hundreds of different opinions. Kant talked about “regulative ideas”, or ideals, that people tend to follow (statistically, we, not Kant, dare to say). Then, without contradicting Kant, we may argue that people, through thinking, through generating ideas, and by moving statistically around “hidden” regulative ideas, may eventually create “concepts” that have a restricted and limited life but are sound enough to create knowledge, let us say the knowledge of a given civilization at a given time. These concepts could be considered “modal” or “dominant”: acceptable as the best definitions for a knowledge realm, yet impermanent, while still enabling us to “map” the knowledge “as_it_is” at a given moment. This is very important, because what Darwin agents retrieve for us when mapping the Web as_it_is are in fact modal concepts.


As per Steiner

    Rudolf Steiner, in the line of Goethe's thinking, puts forward a very interesting idea: that perhaps thinking is the outcome of a new organ, like the eye, for “seeing” reality with a new kind of perception. As the eye perceives light of certain wavelengths and the ear perceives sounds of certain wavelengths, so the “thinking organ” perceives ideas. In our Darwin ontology we liken concepts to wavelets, behaving intuitively like semantic particles that are born suddenly within a given discipline, have a certain “lifetime”, and finally become obsolete and die.


As per Spinoza

    We shall not enter here into Spinoza's idea of ideas because of its complexity. Let us keep, as valuable for our analysis, his classification of ideas: true, fictitious, dubious, and false. The true idea is unattainable, beyond our reach; however, all ideas derive from it. Fictitious ideas are born from a fiction; they are ideas whose existence or nonexistence we “make believe”. False ideas are derived from fictitious ones and are due to errors in our reasoning. Finally, dubious ideas are those that we “see” as fuzzy, far from clear and distinct. For our purposes we work with fictitious and dubious ideas. We believe that in the Web space authoritative documents host fictitious ideas, whereas the queries of people interacting with search engine databases host dubious ideas.


Something of Imagery: from ideas to meanings



[Figure: Kant Ideas]

The figure above depicts a free interpretation of Kant's idea of “ideas” as “seen” from our Darwin K-side Ontology (see K Realm below). Perceptions trigger ideas in our mind via a process not yet known. PK, Personal Knowledge, would be the personal asset of “meanings”. Kant talks about “regulative ideas” that exist somewhere and that for him are innate (4). Then some intricate, as yet unknown process involving the interaction of perceptions, PK, and regulative ideas prompts ideas in our mind. From time to time new meanings may appear that are somehow hierarchically located within PK; among these new meanings are those that update and/or transform PK, some of them to a large extent, for example meanings that involve new visions.


Last updated Tuesday, 13 July 2010, 02:43
