Product Information for Software Developers

Textweiser is a software library with an easy-to-use, thread-safe API. It has no software dependencies other than the system's standard C and thread libraries and the database in use. As a result, you can integrate Textweiser into your C/C++ software projects quickly and easily, and you can bundle it with your own software product without the hassle of shipping additional third-party libraries.

Textweiser's functionality is accessed through library functions. In addition, the library is accompanied by a set of command-line applications that simplify administration of the text classification system and provide a second interface to Textweiser's functionality.

Before first use, Textweiser has to be prepared by initializing the designated database and adding the set of categories required for the intended purpose of the classification system. Textweiser supports flat category structures as well as taxonomies. The next step is to train each category with at least ten representative documents so that Textweiser can analyze and extract each category's characteristics. Both the library functions and the command-line applications can be used for these tasks. Afterwards, the classification system is ready to determine the most probable categories of unknown documents.
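The preparation steps above can be pictured with a small, self-contained C model. This is only a sketch: it does not use Textweiser at all, and every name in it (demo_category_t, demo_first_undertrained(), demo_depth(), the DEMO_ constants) is hypothetical. It assumes a taxonomy in which each category has at most one parent, and it treats the "at least ten documents" guideline as a simple per-category counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: none of these names belong to Textweiser. */
#define DEMO_MIN_TRAINING_DOCS 10   /* "at least ten representative documents" */
#define DEMO_NO_PARENT (-1)         /* marks a root category */

typedef struct {
    const char *name;
    int parent;        /* index of the single parent, or DEMO_NO_PARENT */
    int trained_docs;  /* number of documents learned so far */
} demo_category_t;

/* Returns the index of the first category that still has too few training
 * documents, or -1 if every category has reached the minimum. */
int demo_first_undertrained(const demo_category_t *cats, size_t n)
{
    assert(cats != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cats[i].trained_docs < DEMO_MIN_TRAINING_DOCS)
            return (int)i;
    }
    return -1;
}

/* Depth of a category in the taxonomy; roots have depth 0.
 * Assumes the parent links contain no cycles. */
int demo_depth(const demo_category_t *cats, size_t n, int idx)
{
    int depth = 0;

    assert(cats != NULL && idx >= 0 && (size_t)idx < n);
    while (cats[idx].parent != DEMO_NO_PARENT) {
        idx = cats[idx].parent;
        assert(idx >= 0 && (size_t)idx < n);
        depth++;
        assert((size_t)depth < n);  /* a longer chain would mean a cycle */
    }
    return depth;
}
```

A flat category structure is simply the special case in which every entry uses DEMO_NO_PARENT. In a real project the counters would come from the training calls described in the User Manual rather than from a hand-filled array.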

Features

  • full support and automatic handling of mono-hierarchical category structures (taxonomies)
  • uses Unicode (UTF-8)
  • every library function is thread-safe
  • only a small set of training documents required
  • continuous updates of categories and supplementary training possible
  • embedded SQLite database or interface to another database
  • provides many additional functions that make administration and maintenance convenient (for example, creating and restoring backups)

A Short Introduction to Textweiser's API

Textweiser's API can be divided into five groups of functions: administration, resource handling, learning/training, classification and auxiliary functions.

[Diagrams: Administration Functions, Resource Handling Functions, Learning/Training Functions, Classification Functions, Auxiliary Functions]

Textweiser stores categories along with their data in one of the supported databases. All required connection parameters therefore have to be stored in a tw_config_t data structure, which is then passed to Textweiser.

tw_init() establishes a connection to the configured database and initializes a new Textweiser object of type tw_t, which most library functions expect as an argument. The classification, learning and administration functions can then be invoked to classify unknown documents, train existing categories with known documents, undo previous learning steps, retrieve a list of categories, or create, rename or delete categories.
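The call sequence described above (fill a configuration, initialize, then classify) might look roughly like the following sketch. It deliberately does not declare Textweiser's real functions, because their signatures are not given here. Instead it uses stand-ins with a sketch_ prefix: sketch_config_t plays the role of tw_config_t, sketch_t the role of tw_t, sketch_init() the role of tw_init() and sketch_classify() the role of tw_classify(). The db_path field, the parameter lists, the error constants and the toy keyword rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins, NOT the Textweiser API. Consult the User Manual or the man
 * pages for the real types, signatures and error codes. */
typedef enum { SKETCH_OK = 0, SKETCH_EINVAL, SKETCH_ENOTCONN } sketch_errno_t;

typedef struct {
    const char *db_path;   /* hypothetical connection parameter */
} sketch_config_t;          /* plays the role of tw_config_t */

typedef struct {
    int connected;          /* set once initialization has succeeded */
} sketch_t;                 /* plays the role of tw_t */

/* Plays the role of tw_init(): checks the configuration and "connects". */
sketch_errno_t sketch_init(sketch_t *tw, const sketch_config_t *cfg)
{
    if (tw == NULL || cfg == NULL || cfg->db_path == NULL || cfg->db_path[0] == '\0')
        return SKETCH_EINVAL;
    tw->connected = 1;
    return SKETCH_OK;
}

/* Plays the role of tw_classify(): a toy keyword rule, not a real classifier. */
sketch_errno_t sketch_classify(const sketch_t *tw, const char *text,
                               const char **category)
{
    if (tw == NULL || text == NULL || category == NULL)
        return SKETCH_EINVAL;
    if (!tw->connected)
        return SKETCH_ENOTCONN;
    *category = (strstr(text, "goal") != NULL) ? "Sports" : "Other";
    return SKETCH_OK;
}

/* The sequence from the text: fill the config, initialize, then classify. */
sketch_errno_t sketch_run(const char *db_path, const char *text,
                          const char **category)
{
    sketch_config_t cfg = { db_path };
    sketch_t tw = { 0 };
    sketch_errno_t err;

    assert(category != NULL);
    err = sketch_init(&tw, &cfg);
    if (err != SKETCH_OK)
        return err;
    return sketch_classify(&tw, text, category);
}
```

The point of the sketch is the ordering: the object is only usable after initialization has succeeded, and every later call takes it as an argument. Resource cleanup is omitted because the stand-ins allocate nothing.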

You can obtain detailed information on all library functions from either the German or English Textweiser User Manual or the man pages.

Error Handling

Textweiser provides thread-safe, easy-to-use error handling based on return values. Every library function that can fail returns a value of type tw_errno_t indicating success or failure.

Named constants are provided for every error code, so errors can conveniently be handled on a case-by-case basis. In addition, the function tw_strerror() can be used to obtain an English error description.
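The pattern described above, return codes with named constants plus a function that maps a code to a description, can be sketched in plain C as follows. The constants and functions here are hypothetical stand-ins with an ex_ prefix: ex_errno_t plays the role of tw_errno_t and ex_strerror() the role of tw_strerror(). Textweiser's actual constant names and messages are documented in its manual and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error codes, NOT Textweiser's real constants. */
typedef enum {
    EX_OK = 0,
    EX_ERR_DB,           /* database could not be reached */
    EX_ERR_NO_CATEGORY   /* requested category does not exist */
} ex_errno_t;

typedef enum { EX_ACTION_CONTINUE, EX_ACTION_RETRY, EX_ACTION_ABORT } ex_action_t;

/* Plays the role of tw_strerror(). It only returns pointers to string
 * literals and keeps no state, so concurrent calls do not interfere. */
const char *ex_strerror(ex_errno_t err)
{
    switch (err) {
    case EX_OK:              return "success";
    case EX_ERR_DB:          return "database error";
    case EX_ERR_NO_CATEGORY: return "no such category";
    }
    return "unknown error";
}

/* Case-by-case handling driven by the named constants. */
ex_action_t ex_handle(ex_errno_t err)
{
    switch (err) {
    case EX_OK:              return EX_ACTION_CONTINUE;
    case EX_ERR_DB:          return EX_ACTION_RETRY;
    case EX_ERR_NO_CATEGORY: return EX_ACTION_ABORT;
    }
    return EX_ACTION_ABORT;  /* unknown codes are treated as fatal */
}

/* Formats "context: description" into buf and returns the number of
 * characters snprintf() would have written, excluding the terminator. */
int ex_describe(const char *context, ex_errno_t err, char *buf, size_t len)
{
    assert(context != NULL && buf != NULL);
    return snprintf(buf, len, "%s: %s", context, ex_strerror(err));
}
```

Because the error state travels in the return value rather than in a shared global variable, each thread sees only the result of its own call, which is one common way to keep error handling thread-safe.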

Source Code Examples

You can find the source code of minimal example applications in both the German and English Textweiser User Manual and in the man pages of tw_classify(), tw_get_categories() and tw_parse_config().