Man page of tw_classify(3) and tw_classify_file(3)
Index
- NAME
- SYNOPSIS
- DESCRIPTION
- PARAMETERS
- RETURN VALUE
- NOTES
- CATEGORY/PROBABILITY DATA STRUCTURE
- EXAMPLE
- SEE ALSO
- COPYRIGHT
NAME
tw_classify, tw_classify_file - classify a document
SYNOPSIS
C/C++
    #include <tw.h>

    tw_errno_t tw_classify(tw_t *tw, const char *str, short n, tw_prob_t ***probs);
    tw_errno_t tw_classify_file(tw_t *tw, const char *path, short n, tw_prob_t ***probs);
DESCRIPTION
tw_classify() and tw_classify_file() classify documents into previously learned categories.
tw_classify() processes strings while tw_classify_file() handles documents stored within the file system.
Both functions store a caller-specified number of categories along with their probabilities (category/probability pairs) in probs.
PARAMETERS
tw_classify() and tw_classify_file() share all parameters except the second:
- tw (tw_t *)
  Pointer to an initialized Textweiser object.
- n (short)
  Maximum number of category/probability pairs to store in probs.
- probs (tw_prob_t ***)
  Pointer to pointer to pointer to category/probability pairs (tw_prob_t). Declare probs as a pointer to pointer to tw_prob_t and initialize it to NULL. Pass a pointer to probs to tw_classify() or tw_classify_file(). On success, probs will contain a NULL-terminated array of the n or fewer relevant pairs in descending order of probability.
  APPLICATION CODE EXAMPLE:
      tw_prob_t **probs = NULL;
      tw_classify(&tw, string, 5, &probs);
  For a complete example, have a look at EXAMPLE below.
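The triple-pointer convention for probs can be illustrated with a minimal, self-contained sketch. It does not show how Textweiser is implemented: example_prob_t, example_fill() and example_free() are hypothetical stand-ins, and the sample categories are made up. The sketch only demonstrates how a callee can hand back a NULL-terminated array of at most n entries through a pointer to the caller's variable.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for tw_prob_t (assumption): only the two documented members. */
typedef struct {
    char *category;
    float probability;
} example_prob_t;

/* Release a NULL-terminated array produced by example_fill(). */
void example_free(example_prob_t **arr)
{
    size_t i;

    if (arr == NULL)
        return;
    for (i = 0; arr[i] != NULL; i++) {
        free(arr[i]->category);
        free(arr[i]);
    }
    free(arr);
}

/* Hypothetical producer: stores up to n sample pairs through out.
 * Returns 0 on success and -1 on failure. */
int example_fill(short n, example_prob_t ***out)
{
    static const char *names[] = { "A", "B", "C" };
    static const float values[] = { 90.0f, 7.5f, 2.5f };
    const short avail = 3;
    short count, i;
    example_prob_t **arr;

    if (out == NULL || n < 0)
        return -1;
    count = (n < avail) ? n : avail;

    /* One extra slot for the terminating NULL; calloc zero-fills. */
    arr = calloc((size_t)count + 1, sizeof *arr);
    if (arr == NULL)
        return -1;

    for (i = 0; i < count; i++) {
        arr[i] = malloc(sizeof **arr);
        if (arr[i] == NULL) {
            example_free(arr);
            return -1;
        }
        arr[i]->category = malloc(strlen(names[i]) + 1);
        if (arr[i]->category == NULL) {
            example_free(arr);
            return -1;
        }
        strcpy(arr[i]->category, names[i]);
        arr[i]->probability = values[i];
    }

    *out = arr; /* the caller's pointer now refers to the array */
    return 0;
}
```

A caller would declare `example_prob_t **probs = NULL;`, call `example_fill(5, &probs)`, and iterate until the first NULL element. Results returned by the real functions should be released with tw_free_prob_t(3), not with a helper like the one above.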
tw_classify() expects as a second parameter:
- str (const char *)
  The document's content as a string.
tw_classify_file() expects as a second parameter:
- path (const char *)
  The path to the document within the file system.
RETURN VALUE
Both tw_classify() and tw_classify_file() return an error indicator
(tw_errno_t).
A return value of TW_OK indicates success; any other value
identifies the error that occurred.
The function tw_strerror(3) can be used to obtain a natural language error message.
NOTES
- Both functions require plain-text input, which should be in a supported language; see the Textweiser User Manual for details.
- Both functions should be passed valid UTF-8 input for maximum classification speed and accuracy.
- On Windows, the category/probability pairs must be freed using tw_free_prob_t(3). On any other OS, using tw_free_prob_t(3) is recommended.
- The classification results stored in probs are sorted in descending order of probability. As a result, the category with the highest probability is placed in the first array element.
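Since valid UTF-8 input is recommended, a caller might want to check a string before passing it on. The following is a minimal sketch of such a check; example_is_valid_utf8() is a hypothetical helper, not part of the Textweiser API, and it follows the usual well-formedness rules (no overlong encodings, no surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>

/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
int example_is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        unsigned long cp;  /* decoded code point */
        unsigned long min; /* smallest code point valid for this length */
        int extra;         /* number of continuation bytes expected */
        int i;

        if (*p < 0x80) {                  /* 0xxxxxxx: ASCII */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
            cp = *p & 0x1F; extra = 1; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
            cp = *p & 0x0F; extra = 2; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
            cp = *p & 0x07; extra = 3; min = 0x10000;
        } else {
            return 0; /* stray continuation byte or invalid lead byte */
        }
        p++;

        for (i = 0; i < extra; i++, p++) {
            /* Also rejects a premature NUL, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
        }

        if (cp < min)                     /* overlong encoding */
            return 0;
        if (cp > 0x10FFFF)                /* beyond the Unicode range */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) /* UTF-16 surrogate */
            return 0;
    }
    return 1;
}
```

How the library itself treats malformed input is not specified here; a check like this only lets the application decide up front whether to convert, reject, or pass the text along unchanged.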
CATEGORY/PROBABILITY DATA STRUCTURE
The tw_prob_t data structure contains the following members:
- category (char *)
  The name of the category.
- probability (float)
  The calculated probability that the input document belongs to category.
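As a rough sketch of how a result array built from this structure can be read, the following uses a stand-in type, example_prob_t, that mirrors only the two documented members; the real definition lives in tw.h and may differ. The helper names are hypothetical. The second helper relies on the descending order described in NOTES.

```c
#include <stddef.h>

/* Stand-in for tw_prob_t (assumption): only the two documented members. */
typedef struct {
    char *category;
    float probability;
} example_prob_t;

/* Count the entries of a NULL-terminated array of pair pointers. */
size_t example_count(example_prob_t **probs)
{
    size_t n = 0;

    if (probs == NULL)
        return 0;
    while (probs[n] != NULL)
        n++;
    return n;
}

/* Return the name of the most probable category, or NULL if there are
 * no results. Assumes the array is sorted in descending order. */
const char *example_top_category(example_prob_t **probs)
{
    if (probs == NULL || probs[0] == NULL)
        return NULL;
    return probs[0]->category;
}
```

Both helpers treat a NULL array and an empty array the same way, which matches the "No results" branch a caller would typically need.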
EXAMPLE
For brevity, the following example assumes that Textweiser uses an SQLite database backend and a set of categories has already been added and trained.
C/C++
    #include <stdio.h>
    #include <stdlib.h>
    #include <tw.h>

    int main(int argc, char *argv[])
    {
        tw_errno_t rv = TW_OK;
        tw_prob_t **probs = NULL;
        const char *string = "The house prices have risen.";
        tw_t tw = TW_INITIALIZER;

        /* Initialize a Textweiser object using the SQLite
         * database backend. */
        rv = tw_init(&tw, NULL, NULL, NULL, "example.sqlt", 0);
        if (rv != TW_OK) {
            fprintf(stderr, "Failed to initialize: %s\n", tw_strerror(rv));
            return EXIT_FAILURE;
        }

        rv = tw_classify(&tw, string, 2, &probs);
        tw_free(&tw);

        if (rv == TW_OK) {
            if (probs) {
                short i = 0;
                for (i = 0; probs[i]; i++) {
                    printf("Category \"%s\" -> %.2f%%\n",
                           probs[i]->category, probs[i]->probability);
                }
                tw_free_prob_t(probs);
            } else {
                puts("No results");
            }
            return EXIT_SUCCESS;
        } else {
            fprintf(stderr, "Failed to classify: %s\n", tw_strerror(rv));
            return EXIT_FAILURE;
        }

        return EXIT_SUCCESS;
    }
Example output:
    Category "Economy & Markets" -> 100.00%
    Category "Holidays" -> 13.02%
SEE ALSO
tw-classify(1)
tw_free_prob_t(3), tw_learn(3), tw_learn_file(3), tw_free(3), tw_strerror(3)
Textweiser User Manual
http://www.lingua-systems.com/text-classifier/textweiser-library/
COPYRIGHT
Copyright (c) 2010-2011 Lingua-Systems Software GmbH


