Product Information

Textweiser text classifier

Textweiser is software that categorizes text automatically. You define the set of categories you need and train the software with representative documents for each category. Afterwards, unknown text can be assigned to these categories automatically (see also the workflow).

Structuring texts and documents into categories helps to maintain the information and knowledge they contain. Categories create a context that makes searching more precise: searching for a keyword within a set of categories reduces the number of irrelevant results and speeds up finding the right one. Categorization can also streamline processing; for example, emails can be routed automatically.

As a software library, Textweiser lets you integrate text categorization into your own products.

Features

  • Flexible usage
    Textweiser generates a list of possible categories along with their probabilities. Your application decides how to use these results: it can categorize fully automatically or merely suggest categories for manual tagging.
  • Support for flat or mono-hierarchical category structures (taxonomies)
    Depending on the application, you can use either flat structures or taxonomies.
  • Linguistic preprocessing
    Language-dependent preprocessing of the data improves the results. Support for additional languages can easily be added on request.
  • Uses Unicode
    Unicode encoding ensures support for textual data in any language.
  • Little training
    Textweiser needs little training data (about ten documents per category).
  • Efficient processing
    Optimized algorithms allow fast and efficient text categorization.
  • Easy to migrate
    The trained data is stored in a database. Textweiser provides backup and restore functionality, which makes it easy to migrate data between different databases or operating systems.
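
The "flexible usage" idea above can be sketched as follows. This is only an illustration of how an application might consume a list of categories with probabilities; it does not show Textweiser's actual interface, which is C/C++. The function names, the (category, probability) data layout, the threshold value, and the sample figures are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical classifier output: (category, probability) pairs.
# This is NOT Textweiser's actual API or data format.
Ranked = List[Tuple[str, float]]


def auto_assign(ranked: Ranked, threshold: float = 0.8) -> List[str]:
    """Automatic mode: keep every category at or above the threshold."""
    return [category for category, p in ranked if p >= threshold]


def suggest_tags(ranked: Ranked, limit: int = 3) -> List[str]:
    """Manual tagging mode: offer the most probable categories to a person."""
    by_probability = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return [category for category, _ in by_probability[:limit]]


if __name__ == "__main__":
    # Made-up result for an incoming email.
    result = [("billing", 0.91), ("support", 0.42), ("sales", 0.07)]
    print(auto_assign(result))            # categories used for automatic routing
    print(suggest_tags(result, limit=2))  # categories shown as suggestions
```

A high threshold suits unattended routing, where a wrong assignment is costly; the suggestion mode leaves the final decision to a person.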

Read more about Textweiser in the product information for developers and for decision-makers.

Supported Platforms

Operating System   Distribution/Version   Architecture
Linux              Debian Lenny (5.0)     x86, x86_64
Linux              Debian Squeeze (6.0)   x86, x86_64
Linux              Ubuntu LTS (10.04)     x86, x86_64
Linux              Red Hat Enterprise 5   x86, x86_64
FreeBSD            7                      x86
FreeBSD            8                      x86
FreeBSD            9                      x86
Windows            XP                     x86
Windows            Server 2003            x86
Windows            Server 2008            x86
Windows            7                      x86, x86_64
Windows            Server 2008 R2         x86, x86_64

If you need the software for another operating system or distribution, do not hesitate to contact us.

Supported Databases

Textweiser can be used with different databases and provides tools and/or library functions for switching from one database to another, so migration is straightforward.

Database               Version         Operating System
SQLite                 3 (included)    any
Microsoft SQL Server   2008, 2008 R2   Windows

Using Microsoft SQL Server as a backend is supported on Microsoft Windows operating systems only.

If you need support for another database, please contact us.

Interfaces

  • C/C++
  • Java (available soon)
  • Perl (available soon)

Requirements

There are very few requirements, as Textweiser depends only on the native C and thread libraries of the respective operating system.

All data is stored in a database, so Textweiser also depends on the database used. With SQLite there are no further dependencies, as this database software is already included in the Textweiser library.

All technical details are summed up in the software specification.