Několik myšlenek o vyhledávání informací a knihovnické profesi

Some thoughts about information retrieval and our profession

Sdílení informací

Množství informací v tisku, masmédijích a na webu stále roste a přibývají nové metody jejich zveřejňování. Pro uživatele tím vzniká problém, jak udržet se vzrůstem počtu informací krok.

Vedle informací, které jsou k disposici široké veřejnosti, kolují v malých skupinách uživatelů soukromé nebo polosoukromé informace. Například původním účelem vědeckých pojednání bylo vzájemně se informovat o činnosti a jejím vztahu k práci druhých. Až teprve po zveřejnění v malé skupině se obsah pojednání dostával do vědeckých souhrnů a ještě později do učebnic pro širší veřejnost.

Specialisace vědění vedla k vytvoření „neviditelných kolegijí“ – malých vědeckých skupin, jejichž členové si posílali dopisy, kopie vydaných článků, plány činnosti a jiné informační oběžníky. Publikace pro veřejnost se pokládala za méně důležitou.

Royal Society of London, založená v roce 1660, byla prototypem těchto „neviditelných kolegijí“, kde malá vybraná skupina sdílela své zájmy. Členství v takové skupině se rovnalo schválení účastníka jako rovnocenného partnera, kdežto vyloučení přineslo nejenom pokles prestiže, ale také blokování informací o činnosti druhých.

Zapojení webu do sdílení informací přineslo změnu. Web je globální, je pro všechny. Web je vlastně to největší „neviditelné kolegium“. Zároveň však i menší skupiny si mohou pomocí různých administrativních a technických metod obsadit svůj vlastní prostor na webu a komunikovat.

Klasifikace vědění

Rané pokusy o klasifikaci vědění (Aristoteles) třídily informace podle disciplín. Mělo to výhody pro specializaci vědění a provozování těchto disciplin. Výuka na universitách byla klasifikována stejným způsobem.

Rozdělení vědění do disciplín vyvolalo reakci některých filosofů, kteří chtěli vědění sjednotit (francouzští encyklopedisté v 18. století, The Encyclopaedia of Unified Science na začátku 20. století). V druhé polovině 20. století byla zavedena výuka nových předmětů, které propojovaly discipliny, jako třeba „Area studies“.

Účelem raných disciplin bylo předávání zručností a myšlenkových postupů se zaostřením na jejich provozovatele. Rostlo množství záznamů o obsahu disciplin, což přineslo vznik bibliografických metod, kterými by záznamy mohly být organisovány (Besterman, Otlet). Jednotkou informace pro bibliografy byl dokument, který měl původce (autorská bibliografie), prostředí provenience (národní bibliografie) a obsah (předmětová bibliografie).

Jak rostly sbírky dokumentů, bylo zapotřebí je seřazovat. Některé klasifikace pokračovaly v tradici třídění podle disciplín (Dewey). Jiné se zaměřily na identifikaci malých jednotek informací, nazvaných „facets“ (J. D. Brown, Ranganathan, CRG). Hierarchické systémy byly postupně nahražovány kontrolovaným slovníkem termínů, z nichž lze vybírat popis dokumentu.

Současné informační systémy na webu kladou důraz na uživatele. Některé prvky klasifikace podle tradičních předmětů a disciplin se sice zachovaly, ale důraz se klade na malé zlomky informací, nazvané „infobits“, místo na dokumenty.

Vyhledávání informací na webu

Metody vyhledávání informací na webu se soustřeďují na uživatele spíše než na údaje.

Údajům, které pomáhají najít informace, jako třeba katalogy nebo rejstříky, se říká „metadata“. S rychlým růstem informací na webu se hledají nové formy metadat.

Na konci 90. let začal pokus zapojit uživatele do vytváření metadat. Vznikly internetové stránky, zvané „blogs“ (weblogs), na kterých byly anotované odkazy na jiné stránky. Některé blogy jsou vlastně osobní deníky jejich tvůrců.

Knihovníci se snaží ohodnotit tento systém odkazů. Na minulé konferenci o Online Information v Londýně řekl Phil Bradley, že by knihovníci měli brát blogy vážně.

Jedna sekce této konference o Online Information pojednávala o folksonomii – klasifikaci, založené na tagách zvolených uživateli. Název folksonomie vymyslel Thomas Vander Wal pro metadata vytvořená spoluprací uživatelů bez hierarchie a bez předem určených hesel.

Tagy

„Tag“ se těžko překládá do češtiny. Je to slovo, kterým uživatel poznačí webovou informaci. Tagy mu pomohou uspořádat informace tak, aby se k nim znovu mohl vrátit.

Během diskuse na konferenci o Online Information Thomas Vander Wal poukázal na to, že tagy nepomáhají každému, pro některé uživatele mohou být i zavádějící. Ve folksonomii je tag složen ze tří prvků: slovo vybrané k poznačení, předmět, ke kterému je přidáno, a osoba, která to udělala. Z toho vyplývá, že osobní náhled je částí tagového systému.

Uživatelům se líbí, že k vybrání tagů mohou použít intuici, i když intuice nese s sebou určitá omezení.

V systému tagů schází kontrola synonym, homonym, hesel sestávajících z více než jednoho slova, hesel, v kterých je mezera, a hesel, která jsou v plurálu. Tato omezení způsobují nejasnosti a nedostatek přesnosti. Některým uživatelům také schází hierarchické uspořádání hesel, na které byli zvyklí v tradičním systému.

Počítač automaticky sčítá tagy jednotlivých informací a ukáže, který tag je nejvíc používán. To zdánlivě dodává některým tagům víc váhy. Ellyssa Kroski kritizuje závislost na tagách: „Moudrost davu, mysl úlu a kolektivní inteligence se snaží dělat to, co až dosud dělali odborníci…“

Tagy nám pomohou nahlédnout do mysli uživatele a pochopit jeho potřeby. Můžeme sledovat způsob, kterým si uživatel volí jednotlivé tagy a kategorie, které vzniknou sčítáním s tagy ostatních uživatelů.

Tagy mohou přispět k budování komunity. Uživatel může najít lidi, kteří použili stejný tag a mají s ním něco společného. Folksonomie fungují nejlépe v malých skupinách lidí, kteří mají stejné zájmy nebo spolupracují. Metadata pomáhají budovat komunity.

Nevýhoda folksonomie je nedostatek ochrany před neetickými uživateli, kteří by chtěli systém pokazit. Jejich činnosti se říká „gaming“.

Ušetří tagy čas a peníze?

Regulovaný slovník hesel obsahuje hesla zvolená a schválená odborníky. Někdy se tato hesla liší od slov, která by zvolil uživatel. Při vyhledávání informací existují paralelně tři slovníky: jeden patří autorovi informace, druhý správci systému a třetí uživateli.

Uživatel přemýšlí v „přirozeném“ jazyku, který někdy neodpovídá jazyku použitému v systému. Dotaz je nutno přeložit do „dotazového“ jazyka. Během překladu může dojít k nedorozumění.

Výhodou folksonomie je, že se zakládá na slovníku uživatelů a tím umožňuje, aby uživatelé s tagy pracovali bez předchozího výcviku. Někdy se tato vlastnost mylně označuje za „demokratickou“. Kritikové ji nazývají „ad hoc“ nebo „nemožně idiosynkratickou“.

V regulovaném slovníku trvá dlouho, než se do něj přidá nové heslo. Ale nové tagy se vytvoří okamžitě. Pružnost je velkou výhodou tagů, zvláště na webu.

Vyhledávání určitého údaje v systému založeném na regulovaném slovníku, je přímé, rychlé a přesné. Při zběžném prohlížení (browsing) mají tagy tu výhodu, že odhalují prostředí plné překvapení a objevů.

Folksonomie a tagy se zdají být lacinou alternativou tradičních systémů vyhledávání informací. Zapomíná se ovšem, že zjednodušení záznamu při skladování informace může později vyžadovat více úsilí při vyhledávání. Z dlouhodobého hlediska, kdyby folksonomie nahradila všechny tradiční systémy, uživatelům bychom pak nedávali dobrou službu.

Co to vše znamená pro naši profesi?

Ellyssa Kroski nepochybuje o tom, že hierarchicky řazené klasifikace jsou vhodné pro knihovní sbírky, ale nejsou vhodné pro web. Nové metody nám umožňují porozumět uživatelům a zlepšit existující klasifikace.

Adam Mathes říká, že folksonomie není kontrolovatelná, že je chaotická a trpí problémy nejasnosti. Tagy jsou citlivé k potřebám uživatelů a jejich slovníkům. Zapojení uživatelů do vytváření metadat je důležitý vývoj, který by se měl dále prozkoumat.

Michal Čudrnák soudí, že folksonomie se může v budoucnosti stát užitečným nástrojem vyhledávání, ale je také možné, že je to jen přechodná fáze v hledání nových cest.

Můj názor je, že současný trend je nebezpečný v tom, že se neopírá o sdílený náhled, tvořící kulturní dědictví, nýbrž o proud částic informací, jimž uživatelé dali subjektivní tagy. Systém folksonomie není tak výkonný, jak se o něm tvrdí. Dává stranou dlouhou a spolehlivou tradici vyhledávání informací, kterou naše profese vždy střežila.

Some thoughts about information retrieval and our profession

Sharing information

The volume of information available in print, in the media and on the Web is increasing. Conventional publishing is being supplemented by new methods, and users have problems keeping up with the growth of information.

Apart from information which is available to the public, some private or semi-private information is circulated in small groups of users. For instance, the original purpose of scientific papers was to inform other scientists of current work and to show, by means of citations, how it related to the work of others. Only later did some of these papers find their way into review articles and eventually into textbooks for the general public.

Increasing specialization of knowledge led to the creation of “invisible colleges” – small groups of specialists working in the same field who communicated by letters, exchange of offprints, plans of work or other information circulars. Publication for the general public was considered of secondary importance.

The Royal Society, founded in London in 1660, was a prototype of such “invisible colleges”, in which information was shared by a small select group of people with common interests. Membership of the group was equivalent to approval by one’s peers, while exclusion led to a denial of status and consequently of access to information.

Use of the Web has brought a new dimension to the sharing of information. The Web is global and open to everybody. One could say it is the largest of the “invisible colleges”. At the same time, smaller groups can share their information by using various administrative and technical methods to preserve their own private space on the Web.

Classification of knowledge

Early attempts at classifying knowledge (Aristotle) grouped information by disciplines. The method was convenient for specialist learning and for practising the discipline. Tuition at universities was also organised into the same disciplines.

The breakdown of knowledge into disciplines provoked a reaction from some philosophers who wanted to unify knowledge (the French Encyclopaedists in the 18th century, The Encyclopaedia of Unified Science at the beginning of the 20th century). In the second half of the 20th century the concept of cross-disciplinary subjects was introduced into university education.

The purpose of the early disciplines was the transmission of skills and mental processes, focusing on the practitioners. The growth of information in the form of recorded knowledge created a need for bibliographical methods which would organise these records (Besterman, Otlet). For the bibliographer, the unit of information was the document. Its main characteristics were the originator (author bibliography), the environment of origin (national bibliography) and the subject matter (subject bibliography). The concept of the user was secondary and studies of users formed no part of bibliography.

The building of collections containing recorded knowledge led to the development of library classification. Some classification systems continued with the hierarchical philosophical structures related to disciplines (Dewey). Others introduced the idea of identifying small units of information, called facets (J. D. Brown, Ranganathan, CRG). In modern library classification the trend has been towards replacing a hierarchical system with a controlled vocabulary from which the best-fitting description of the document can be selected.

Today’s information systems relating to the Web emphasize the users’ needs. Although they retain some classification of information under traditional subjects and disciplines, the main emphasis is on small units of disjointed information – infobits – rather than on documents.

Finding information on the Web

Methods used for access to information are increasingly user-centered rather than data-centered.

Metadata are data about data which help users find information, for instance catalogues and indexes. With the rapidly growing volume of information on the World Wide Web, new forms of metadata are being sought to replace the traditional ones.

In the late 1990s experiments were started with metadata created by the users themselves in the form of blogs (weblogs): websites with annotated links to other sites. Some blogs are in the form of personal diaries.

Librarians are trying to assess the value of this new trend. At the last Online Information conference Phil Bradley said that librarians should take blogs seriously.

The Online Information conference also had a section dealing with folksonomy – a non-hierarchical classification created by tagging. The name was invented by Thomas Vander Wal for cooperative metadata production by users, as distinct from classification offering a predetermined set of terms.

Tagging

Tagging means assigning freely chosen keywords to pieces of information or data. The tags help users to organize their data so that they can find them at a later date.

In his contribution to the Online Information conference, Thomas Vander Wal pointed out that not all tagging is useful to all people; in fact, it can be misleading. In folksonomy there are three elements: the tag, the object being tagged, and the person doing the tagging. This suggests that a personal viewpoint is an essential part of tagging.
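The three elements can be pictured as a simple record. The following is only an illustrative sketch, not a description of any real tagging service: the `Tagging` type, the sample names and the `tags_by_person` helper are all assumptions made for the example.

```python
from typing import NamedTuple


class Tagging(NamedTuple):
    """One act of tagging: who applied which tag to which object (hypothetical type)."""
    person: str
    item: str
    tag: str


# Invented sample data: two people tag the same article differently.
records = [
    Tagging("anna", "article-42", "classification"),
    Tagging("ben", "article-42", "to-read"),
    Tagging("anna", "article-7", "blogs"),
]


def tags_by_person(records, item):
    """Group the tags given to one item by the person who chose them."""
    view = {}
    for r in records:
        if r.item == item:
            view.setdefault(r.person, []).append(r.tag)
    return view


print(tags_by_person(records, "article-42"))
```

Keeping the person in every record is what preserves the personal viewpoint: the same article is "classification" to one reader and merely "to-read" to another.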

Tagging is an intuitive process, which makes it attractive to the user, but it is inherently limited.

Tagging lacks control of synonyms, homonyms, terms consisting of multiple words, terms with spaces, and plurals. This leads to ambiguity and lack of precision. To some users, the absence of hierarchical relationships between terms is also a disadvantage.
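To make the missing control concrete, here is a minimal sketch of the clean-up a controlled vocabulary would otherwise provide. The `PREFERRED` table and the `normalize_tag` function are hypothetical and hand-made for this example; a real system would need a much larger, maintained list.

```python
import re

# Hypothetical mapping from variant forms (plurals, synonyms, abbreviations)
# to one preferred term, in the spirit of a controlled vocabulary.
PREFERRED = {
    "blogs": "blog",
    "weblog": "blog",
    "weblogs": "blog",
    "folksonomies": "folksonomy",
    "info retrieval": "information retrieval",
}


def normalize_tag(raw: str) -> str:
    """Lower-case a tag, unify spaces/underscores/hyphens, then map known variants."""
    cleaned = re.sub(r"[\s_\-]+", " ", raw.strip().lower())
    return PREFERRED.get(cleaned, cleaned)


print(normalize_tag("  WebLogs "))      # variant mapped to the preferred term
print(normalize_tag("info_retrieval"))  # multi-word tag written with an underscore
print(normalize_tag("jaguar"))          # unknown tag is passed through unchanged
```

Homonyms are not resolved by such a table: a tag like "jaguar" is returned unchanged whether the user meant the animal or the car, which is exactly the ambiguity described above.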

The frequency of tags can be counted automatically and the most used tags for each information item displayed. This seemingly adds weight to some of the tags. Ellyssa Kroski criticizes any reliance on tags for information storage and retrieval: “The wisdom of crowds, the hive mind, and the collective intelligence are doing what heretofore only expert catalogers, information architects and website authors have done.”
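The automatic counting can be sketched in a few lines. The tag list below is invented for illustration, and `most_used` is a hypothetical helper name; only the standard-library `Counter` is assumed.

```python
from collections import Counter

# Invented example: tags that different users gave to one information item.
item_tags = ["blog", "weblog", "blog", "diary", "blog", "weblog"]


def most_used(tags, n=2):
    """Return the n most frequent tags with their counts, most frequent first."""
    return Counter(tags).most_common(n)


print(most_used(item_tags))  # [('blog', 3), ('weblog', 2)]
```

The count reflects only how often a word was chosen, not whether it describes the item well, which is why the added weight is only apparent.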

Tagging helps us to understand the users’ point of view and their information needs. We can observe how users tag individual items, as well as categories being suggested by other users applying the same tags.

Tags can assist in community building. The user can find others who have tagged the same information and with whom they probably have something in common. The best use for folksonomies is in small groups of people who share the same interests or work together. Metadata can help in the formation of communities.
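As a rough sketch of how such a community might be discovered, the example below looks for other people who have tagged the same items as a given user. The `bookmarks` data and the `co_taggers` function are hypothetical; they stand in for whatever a real tagging service stores.

```python
from collections import defaultdict

# Invented (person, item, tag) records.
bookmarks = [
    ("anna", "page-1", "folksonomy"),
    ("ben", "page-1", "tagging"),
    ("carl", "page-2", "blogs"),
    ("anna", "page-2", "weblogs"),
]


def co_taggers(records, user):
    """Map every other person to the number of items they tagged in common with `user`."""
    items_by_user = defaultdict(set)
    for who, item, _tag in records:
        items_by_user[who].add(item)
    mine = items_by_user.get(user, set())
    return {
        other: len(mine & items)
        for other, items in items_by_user.items()
        if other != user and mine & items
    }


print(co_taggers(bookmarks, "anna"))  # {'ben': 1, 'carl': 1}
```

In a small group with shared interests the overlap counts are meaningful; in a very large population many overlaps would be coincidental.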

The weakness of folksonomy is its openness to unethical users who may try to interfere with the system. Such activity is referred to as “gaming”.

Does tagging save time and cost?

A controlled vocabulary contains terms selected and agreed by experts. The selected terms do not necessarily match the user’s own language. In information retrieval there are three vocabularies present: those of the author, the designer of the system, and the user.

The user thinks about his query in “natural language”, which does not always correspond to the language of the information system. The query needs to be translated into the “query language” of the system. There is a danger of mismatch in this process of translation.

The advantage of folksonomy is that it is built in the vocabulary of users. The term “democratic” has been misapplied to describe this characteristic of folksonomy. However, in spite of being described by its critics as “ad hoc” or “hopelessly idiosyncratic”, it enables users without any training or previous knowledge to participate in the system.

In a controlled vocabulary the inclusion of new terms takes time. New tags can be created and added instantly. Flexibility of tags is one of their attractive features, particularly in connection with information on the Web.

Finding specific information within the controlled vocabulary environment is direct and reliable, as it avoids some of the flaws of tags described above. However, for browsing, tags provide an environment rich in new and unexpected information.

Folksonomy, with its use of tagging, is a low-cost alternative to traditional systems of information retrieval based on controlled vocabularies. However, any simplification in indexing may require greater effort from the user in searching the stored data. In the long run, should the traditional systems be entirely replaced by folksonomy, some users would not be well served.

What does it all mean for our profession?

Ellyssa Kroski is in no doubt that hierarchical taxonomies are suitable for library collections but not feasible for the Web. The new retrieval methods offer an opportunity for learning about user behaviour and for improving the existing taxonomies.

Adam Mathes says that folksonomy’s uncontrolled nature is fundamentally chaotic and that it suffers from problems of ambiguity. On the other hand, tagging is responsive to user needs and vocabularies. Transforming the creation of metadata from a professional activity to a shared activity by users is an important development that should be explored further.

Michal Čudrnák points out that it is possible for folksonomy to become an effective tool and equally possible that it is just a passing phase in the search for new forms of information retrieval.

In my opinion, the danger of the current trend in information storage and retrieval is that it does not depend on shared points of view constituting cultural heritage, but on a flow of separate bits of information subjectively tagged by individual users. It is far less efficient than it is claimed to be. It dismisses the long and reliable tradition of information retrieval of which our profession has been a guardian.

Selected sources on the traditional approach to information retrieval:

Foskett, D. J.: Classification and indexing in the social sciences. 2nd ed. London: Butterworths, 1974. ISBN 0 408 706449;
Foskett, D. J.: Classification for a general index language: a review of recent research of the Classification Research Group. London: Library Association, 1970. ISBN 0853650322;
Foskett, D. J.: Pathways for communication: books and libraries in the information age. London: Bingley, 1984. ISBN 0851573568;
Langridge, D. W.: Classification and indexing in the humanities. London: Butterworths, 1976. ISBN 0 408 70777 1;
Langridge, D. W.: Classification: its kinds, elements, systems and applications. London: Bowker Saur, 1992. ISBN 0862916224;
Meadows, A. J.: Knowledge and communication: essays on the information chain. London: Library Association Publishing, 1991. ISBN 0851574548;
Price, D. J. de Solla: Little science, big science… and beyond. NY: Columbia U. P., 1986. ISBN 0231049579.

New directions in information retrieval:

Bradley, P.: Internet Q & A. In: Update, vol. 3, no. 1, January 2004, p. 5;
Pedley, P.: Have you thought of blogging? In: Update, vol. 3, no. 5, May 2004, pp. 32–33;
Debating the key information industry issues. In: Update, vol. 4, no. 11, November 2005, pp. 18–19;
Bradley, P.: Internet Q & A. In: Update, vol. 4, no. 12, December 2005, p. 10.

Note: Update [journal of the library association CILIP]

Folksonomy:

Watters, C. & Shepherd, M. A.: Shifting the information paradigm from data-centered to user-centered. In: Information Processing & Management, vol. 30, no. 4, 1994, pp. 455–471.

Online Information conference 2005 [discussion of folksonomy]:
www.onlineinformation.co.uk/ol05/conferenceproceedings.html;
www.personalinfocloud.com/2006/01/online_informat.html;
en.wikipedia.org/wiki/Folksonomy;
Čudrnák, M.: Folksonómie: nové prístupy k organizácii digitálneho obsahu na internete (skip.elet.sk/swift_data/source/files/presentations/students/folksonomie/folksonomie.html);
Grassroots Cooperative Categorization Of Digital Content Assets: Folksonomies, What They Are, Why They Work
(www.masternewmedia.org/2005/01/05/grassroots_cooperative_categorization_of_digital.htm);
Kroski, E.: The Hive Mind Folksonomies and User based Tagging (infotangle.blogsome.com/2005/12/07/the-hive-mind-folksonomies-and-user-based-tagging);
Mathes, A.: Cooperative Classification and Communication Through Shared Metadata. Computer Mediated Communication – LIS590CMC, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December 2004;
Tagy už nacházejí komerční využití (www.snizekweb.cz/weblog/tagy-komercni-vyuziti).

Blogs:

Garrod, P.: Weblogs: do they belong in libraries? In: Ariadne, issue 40, July 2004 (www.ariadne.ac.uk/issue40/public-libraries);
[List of blogs in libraries]. (www.libdex.com/weblogs.html);
An Annotated Bibliography on Weblogs and Blogging, with a Focus on Library/Librarian Blogs… (blog-bib.blogspot.com);
www.rebeccablood.net/essays/weblog_history.html;
www.marigold.cz/item/weblogova-doba-cili-novinar-versuswebloger/category/historie-ceskeho-internetu.

* MPhil. PgDipComp. Sylva Šimsová is a librarian of Czech origin living in London, where she works as an Information Systems Consultant at the University of North London.

Issue: 3/2006
