Software concepts
From OpenSearchServer Wiki
Generally speaking, the OSS software uses industry-standard concepts and terminology, as is proper for Open Source resources.
Those concepts are briefly defined below. Though the definitions will be chiefly useful for users without a background in computer science, we recommend that all users take a quick glance at this page, as it will help them find their way around the interface.
Main building blocks of the OSS software
- INDEX (plural: INDICES). A database. In each INDEX, the user indicates a set of web pages and/or files to be worked with. Users can define several INDICES if they work with several sets of web sites or web pages and/or several sets of files.
- CRAWLER. The job of this module is to go out on the World Wide Web, find the requested pages, and dump a raw copy of those pages in a space where the other modules can work with them. CRAWLERS are sometimes called "spiders".
- PARSER. The PARSER works with files (Word™ documents, PDF files, etc.) rather than web pages. It doesn't have the equivalent of a CRAWLER: the user has to put every file that needs parsing into the BASKET. The PARSER will read everything in the BASKET, using sets of rules appropriate to each document format.
- ANALYSERS (or ANALYZERS). The job of this module is to work on the captured data brought in by the CRAWLER and/or the PARSER. The ANALYSERS are what turn raw data into fully searchable and manipulatable OSS data.
- SCHEMA (plural: SCHEMATA or SCHEMAS). These are the semantic rules used by ANALYSERS. What the SCHEMAS do is explained in broad terms on the Search engine concepts page, and in specific terms in the documentation on The schema page.
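
To see how these building blocks fit together, here is a small Python sketch. It is not the OSS API, and every name in it (crawl, parse, build_index, search, ANALYSERS, SCHEMA) is hypothetical. It only assumes the flow described above: the CRAWLER and the PARSER bring in raw data, the ANALYSERS process it according to the SCHEMA, and the result is stored in an INDEX that can be searched.

```python
import re
from collections import defaultdict

# Hypothetical analysers: each one turns raw text into searchable tokens.
ANALYSERS = {
    "keyword": lambda text: [text.strip().lower()],
    "standard": lambda text: re.findall(r"[a-z0-9]+", text.lower()),
}

# Hypothetical schema: which analyser applies to which field.
SCHEMA = {"url": "keyword", "content": "standard"}


def crawl(web, urls):
    """Stand-in for a CRAWLER: return raw copies of the requested pages."""
    return [{"url": url, "content": web[url]} for url in urls if url in web]


def parse(basket):
    """Stand-in for a PARSER: read every file placed in the basket."""
    return [{"url": name, "content": text} for name, text in basket.items()]


def build_index(documents, schema=SCHEMA):
    """Analyse each field as the schema dictates and fill an inverted index."""
    index = defaultdict(set)  # (field, token) -> set of document ids
    for doc_id, doc in enumerate(documents):
        for field, analyser_name in schema.items():
            for token in ANALYSERS[analyser_name](doc.get(field, "")):
                index[(field, token)].add(doc_id)
    return index


def search(index, field, term):
    """Look up one term in one field; returns sorted document ids."""
    return sorted(index.get((field, term.lower()), set()))


# Toy data standing in for the web and for the basket of files.
web = {"http://example.com/a": "Open Source search engines"}
basket = {"report.txt": "Search engines parse PDF files"}

docs = crawl(web, ["http://example.com/a"]) + parse(basket)
index = build_index(docs)
print(search(index, "content", "search"))  # [0, 1]
print(search(index, "content", "pdf"))     # [1]
```

In this sketch the schema is just a mapping from field name to analyser name. The point is that the same raw text becomes searchable in different ways depending on the rules applied to it, which is the role the SCHEMA plays for the ANALYSERS.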