Terms extraction

Terms extraction lets you build organization-specific terminology that keeps communication consistent across the organization.


Sandstone’s terms extraction solution analyzes monolingual or bilingual material and automatically extracts the most important terms. The resulting dictionary or term list is then merged with the organization’s existing terms, giving a single, unified source for all communication within the organization.


The quality of the results improves as the amount of source material grows.


Monolingual terms extraction


Monolingual terms extraction builds a list of terms from the organization’s existing documents. The most common terms are given higher priority in the results, and each term includes a translation if one already exists.
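As a rough illustration of frequency-based prioritization, the following Python sketch counts words in a text and lists the most common ones first, attaching a translation when one is already known. It is not Sandstone’s actual method: the function name rank_terms, the tiny stopword list, and the single-word notion of a "term" are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; a real extractor
# would use a much fuller, language-specific list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def rank_terms(text, known_translations=None):
    """Return (term, frequency, translation) tuples, most frequent first.

    known_translations is an optional dict of existing term translations
    (an assumed input format, not part of the original description).
    """
    known_translations = known_translations or {}
    # Match runs of letters only, including non-ASCII letters such as ä, ö, å.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, freq, known_translations.get(term, "")) for term, freq in ranked]


if __name__ == "__main__":
    sample = "The invoice is sent. The invoice and the receipt."
    for row in rank_terms(sample, {"invoice": "lasku"}):
        print(row)
```

A production extractor would also need to handle multi-word terms and inflected forms, which this sketch ignores.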


Monolingual terms can be extracted from the following file types:


  • DOCX (Microsoft Word 2007 and later)

  • DOC (Microsoft Word)

  • RTF (Rich Text Format)

  • TXT (plain text file)


Bilingual terms extraction

Bilingual terms extraction builds an organization-specific dictionary from the most frequently used terms and their translations within the organization.


Bilingual terms can be extracted from the following file types:


  • TMX (Translation Memory Exchange)

  • TTX (TradosTag XML)

  • CSV (comma-separated-value text file)


Supported languages

The following languages are supported:


  • Finnish

  • Swedish

  • English



Sandstone provides terms extraction as a service. The customer delivers the material and receives a ready-made term list or dictionary. The result is a CSV file that lists each term, its translation (when one was found), and its frequency. The file can be viewed in Microsoft Excel or any text editor.
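Because the result is a CSV file, it can also be processed programmatically. The sketch below shows one way to read such a file in Python. The exact layout of the delivered file is not specified here, so the header row, the column names term, translation, and frequency, the comma delimiter, and the sample rows are all assumptions for illustration.

```python
import csv
import io


def load_terms(stream):
    """Read term rows from a CSV text stream, most frequent first.

    Assumes a header row with the columns 'term', 'translation' and
    'frequency'; adjust the names to match the delivered file.
    """
    reader = csv.DictReader(stream)
    terms = []
    for row in reader:
        terms.append({
            "term": row["term"],
            # An empty cell means no translation was found.
            "translation": row["translation"] or None,
            "frequency": int(row["frequency"]),
        })
    terms.sort(key=lambda t: t["frequency"], reverse=True)
    return terms


if __name__ == "__main__":
    # Made-up sample data standing in for a delivered result file.
    sample = io.StringIO(
        "term,translation,frequency\n"
        "kuitti,,7\n"
        "lasku,invoice,42\n"
    )
    for entry in load_terms(sample):
        print(entry)
```

For a file on disk, the same function could be called with open(path, newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.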


Confidential material

For confidential material, Sandstone can deliver a fully configured server to be placed on the customer’s premises. This provides a secure way to work with confidential material. The server does not need to be connected to the local network, as it is fully functional on its own.