
The difference

Traditional text mining solutions work pretty well on structured documents like news articles, patents, annual reports...

 

But they struggle to handle user-generated content (UGC) and text communications.

 

Why?

 

UGC and text communications are highly unstructured content with specific issues:

  • Many variants, typos, alterations...
  • Frequent use of slang and abbreviations
  • Low grammatical accuracy
  • High volume

 

Our TMT technology has been specifically designed to meet these requirements and to analyze large volumes of UGC and text communications (tweets, Facebook comments, forum posts, chat messages, SMS) in real time.

 

 

Capture

Unlike solutions based on simple keywords or semantics, our technology takes into account the different alterations and variants of expressions when analyzing content:

  • Use of lowercase and capital letters
  • Letter repetition (vvviiiagrrra for example)
  • Spelling variations (vi@gra, vlagra, v1@gra, v149r4)
  • Misspellings and missing letters in some cases (v|agra, v agra…)
  • Word alteration using non-alphabetic symbols (v.i.a.g.r.a, v_i°ag#r:a, v-iagra, viagr"a...)
  • Phonetic alterations
  • SMS and IM languages (Arabizi for example)
  • The various combinations of these variations
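
As an illustration only, several of the alterations listed above can be reduced to a canonical form before matching. The sketch below is not Scan & Target's implementation. The substitution table, the `normalize` and `matches` helpers, and the 0.8 similarity threshold are all assumptions chosen for the example, and phonetic alterations and SMS languages are not covered.

```python
import re
from difflib import SequenceMatcher

# Hypothetical look-alike substitutions; a real table would be far larger.
LOOKALIKES = str.maketrans({
    "@": "a", "4": "a", "1": "i", "|": "i", "!": "i",
    "3": "e", "0": "o", "9": "g", "$": "s",
})

def normalize(token: str) -> str:
    """Reduce a token to a canonical lowercase, letters-only form."""
    text = token.lower()                    # small/capital letters
    text = text.translate(LOOKALIKES)       # v1@gra -> viagra
    text = re.sub(r"[^a-z]", "", text)      # v.i.a.g.r.a -> viagra
    text = re.sub(r"(.)\1+", r"\1", text)   # vvviiiagrrra -> viagra
    return text

def matches(token: str, term: str, threshold: float = 0.8) -> bool:
    """Fuzzy comparison that tolerates a misspelled or missing letter."""
    a, b = normalize(token), normalize(term)
    return SequenceMatcher(None, a, b).ratio() >= threshold

if __name__ == "__main__":
    for variant in ["vvviiiagrrra", "v149r4", "v_i°ag#r:a", "vlagra", "v agra"]:
        print(variant, "->", normalize(variant), matches(variant, "viagra"))
```

Normalizing both the incoming token and the reference term keeps the comparison symmetric, since collapsing repeated letters would otherwise alter legitimate words that contain double letters.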

 

Understand: Smart Wordbooks

The solution is based on a smart engine that rates not just single words but the entire content as it passes through the filtering engine. Words are therefore placed in context to extract meaning and actionable intelligence for you.

The solution relies on detailed thematic thesauruses, our Smart Wordbooks. Filters are categorized (Terrorism, Drugs, Violence, etc.) so that customers can fine-tune the analysis according to their needs.
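
To make the idea of categorized filters concrete, here is a minimal sketch of how a message could be rated as a whole against a chosen set of thematic wordbooks. The `WORDBOOKS` contents, the term weights, and the `score_message` function are hypothetical and only illustrate the principle; the actual Smart Wordbooks format and scoring are not described here.

```python
import re

# Hypothetical wordbooks: category -> {term: weight}.
WORDBOOKS = {
    "drugs": {"viagra": 2.0, "pills": 1.0, "cheap": 0.5},
    "violence": {"kill": 2.0, "attack": 1.5},
}

def score_message(message: str, enabled: list[str]) -> dict[str, float]:
    """Rate the entire message against each enabled category."""
    tokens = re.findall(r"[a-z]+", message.lower())
    scores = {}
    for category in enabled:
        terms = WORDBOOKS[category]
        # Sum the weights of all matched terms, so the rating reflects
        # the whole message rather than a single keyword hit.
        scores[category] = sum(terms.get(tok, 0.0) for tok in tokens)
    return scores

if __name__ == "__main__":
    print(score_message("Cheap viagra pills here", ["drugs"]))
```

Passing only the categories a customer cares about is one simple way to model the fine-tuning described above.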

Smart Wordbooks can be developed by Scan & Target, by its Certified Partners, or directly by customers.

 

Measure

Scan & Target has developed a proprietary scoring technology tailored to short digital text content.

The scoring system is specific to each product (Moderation, Threat Detection, Sentiment Analysis...) for accurate measurement.

Thanks to a powerful and accurate conditional analysis system, our customers experience a very low rate of false positives (between 0.001% and 0.05% on average).

© Copyright 2007-2011 Scan & Target SAS. All rights reserved