Using Multi-Language Media Support


By Invite Only: This feature is not available by default. Please contact Support to enable it.
This quickstart shows a simple way to start using the Clarify API with media in a variety of spoken languages. Following these steps, you should have search working in our supported languages within 5-10 minutes.

Configuring Your Environment

While you can use any programming language you choose, we provide a few helper libraries to get you started. In most cases, you can install them with your favorite package manager:

  • curl
Although we don't have a curl library, the command-line JSON parser 'jq' is super helpful. Download and install it to get started: http://stedolan.github.io/jq/

Loading Audio by Language

Our system automatically detects the language of your media file, so this process is no different from adding English-language media. You can do this with a single command:

  • curl
curl --data "media_url=http://media.clarify.io.s3.amazonaws.com/video/speeches/je-vous-ai-compris-1958-06-04.mp4" \
     --data "notify_url=http://example.org/sample-receiver" \
     --data "name=Je Vous ai Compris" https://api.clarify.io/v1/bundles \
     -X POST --header "Authorization: Bearer myapikey" | jq '.'
# The jq portion is optional and is only used to pretty-print the resulting JSON

Naming the bundle and providing a notify_url are both optional. We have a number of audio and video files available for processing on our Media Page.

Note: You don't have to download these files. Instead, you can pass their URLs to the create (POST) call shown above.
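
If you are scripting this step rather than working in a shell, the same request can be sketched in Python with only the standard library. The helper name build_create_bundle_request is hypothetical and not part of any Clarify library; the endpoint, form fields, and header are the ones from the curl command above. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.clarify.io/v1"


def build_create_bundle_request(api_key, media_url, name=None, notify_url=None):
    """Build (but do not send) the POST request that creates a bundle."""
    fields = {"media_url": media_url}
    # name and notify_url are optional, so include them only when given.
    if name is not None:
        fields["name"] = name
    if notify_url is not None:
        fields["notify_url"] = notify_url
    return Request(
        API_BASE + "/bundles",
        data=urlencode(fields).encode("ascii"),
        headers={"Authorization": "Bearer " + api_key},
        method="POST",
    )


request = build_create_bundle_request(
    "myapikey",
    "http://media.clarify.io.s3.amazonaws.com/video/speeches/je-vous-ai-compris-1958-06-04.mp4",
    name="Je Vous ai Compris",
)
# To send it, pass the request to urllib.request.urlopen() and parse the JSON body.
```

Leaving out notify_url simply omits that field from the form body, which matches the note that it is optional.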

After creating a bundle, you'll receive a response which looks something like this:

{
    "id":"abcde12345",
    "_class":"Ref",
    "_links":{
        "self":{
            "href":"/v1/bundles/abcde12345"
        },
        "curies":[
            {
                "href":"/docs/rels/{rel}",
                "name":"clarify",
                "templated":true
            }
        ],
        "clarify:metadata":{
            "href":"/v1/bundles/abcde12345/metadata"
        },
        "clarify:tracks":{
            "href":"/v1/bundles/abcde12345/tracks"
        },
        "clarify:insights":{
            "href":"/v1/bundles/abcde12345/insights"
        }
    }
}
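
Each later step follows one of the hrefs in _links. A minimal Python sketch of turning them into absolute URLs, assuming the response has already been parsed into a dict and that the hrefs are relative to https://api.clarify.io (the host used in the curl commands); resolve_links is a hypothetical helper name:

```python
from urllib.parse import urljoin

API_HOST = "https://api.clarify.io"


def resolve_links(resource):
    """Map each link relation in a response to an absolute URL."""
    resolved = {}
    for rel, link in resource.get("_links", {}).items():
        # "curies" holds a list of URI templates, not a followable link.
        if isinstance(link, dict) and "href" in link:
            resolved[rel] = urljoin(API_HOST, link["href"])
    return resolved


bundle = {
    "id": "abcde12345",
    "_class": "Ref",
    "_links": {
        "self": {"href": "/v1/bundles/abcde12345"},
        "curies": [
            {"href": "/docs/rels/{rel}", "name": "clarify", "templated": True}
        ],
        "clarify:metadata": {"href": "/v1/bundles/abcde12345/metadata"},
        "clarify:tracks": {"href": "/v1/bundles/abcde12345/tracks"},
        "clarify:insights": {"href": "/v1/bundles/abcde12345/insights"},
    },
}

links = resolve_links(bundle)
print(links["clarify:insights"])
```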

Getting Your Language Information

Language detection runs automatically, and you request its result the same way as other insights such as keywords. A single API call lists the insights for a bundle:

  • curl
curl https://api.clarify.io/v1/bundles/abcde12345/insights \
     --header "Authorization: Bearer myapikey" | jq '.'
# The jq portion is optional and is only used to pretty-print the resulting JSON

That returns the list of available insights, including classification:

{
    "bundle_id": "abcde12345",
    "created": "2015-03-04T05:03:04.292Z",
    "updated": "2015-05-16T20:39:37.508Z",
    "_class": "Insights",
    "_links": {
        "curies": [
            {
                "href": "/docs/insights/{rel}",
                "name": "insight",
                "templated": true
            }
        ],
        "insight:spoken_keywords": {
            "href": "/v1/bundles/abcde12345/insights/54321edcba"
        },
        "insight:classification": {
            "href": "/v1/bundles/abcde12345/insights/edcba56789"
        },
        "insight:spoken_words": {
            "href": "/v1/bundles/abcde12345/insights/12345abcde"
        },
        "insight:spoken_topics": {
            "href": "/v1/bundles/abcde12345/insights/34567abcde"
        },
        "insight:transcript_r4": {
            "href": "/v1/bundles/abcde12345/insights/98765abcde"
        },
        "parent": {
            "href": "/v1/bundles/abcde12345"
        },
        "self": {
            "href": "/v1/bundles/abcde12345/insights"
        }
    }
}

The most important part of this payload is the href of the insight:classification key. Retrieving the contents of that URI returns the dominant language detected in the audio track(s):

{
    "_class": "ClassificationInsight",
    "bundle_id": "abcde12345",
    "created": "2015-09-17T18:48:48.058Z",
    "id": "edcba56789",
    "name": "classification",
    "status": "ready",
    "updated": "2015-09-17T18:48:48.061Z",
    "_links": {
        "clarify:bundle": {
            "href": "/v1/bundles/abcde12345"
        },
        "curies": [
            {
                "href": "/docs/rels/{rel}",
                "name": "clarify",
                "templated": true
            }
        ],
        "parent": {
            "href": "/v1/bundles/abcde12345/insights"
        },
        "self": {
            "href": "/v1/bundles/abcde12345/insights/edcba56789"
        }
    },
    "track_data": [
        {
            "acoustics": [],
            "spoken_languages": ["fr"]
        }
    ]
}

From there, you can use this information for tagging, for organization, or for language-aware searches, as in the next section.
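
The two lookups above can be sketched in Python as follows, assuming both payloads have been parsed into dicts. The function names are hypothetical, and returning None when the classification link is missing is an assumption about how you might want to handle an insight that is not listed. The sample dicts are trimmed copies of the responses shown earlier.

```python
def classification_href(insights):
    """Return the classification insight's href, or None if it is not listed."""
    link = insights.get("_links", {}).get("insight:classification")
    return link["href"] if link else None


def spoken_languages(classification):
    """Collect the distinct language codes across all tracks, in first-seen order."""
    languages = []
    for track in classification.get("track_data", []):
        for code in track.get("spoken_languages", []):
            if code not in languages:
                languages.append(code)
    return languages


insights = {
    "bundle_id": "abcde12345",
    "_links": {
        "insight:classification": {
            "href": "/v1/bundles/abcde12345/insights/edcba56789"
        },
        "self": {"href": "/v1/bundles/abcde12345/insights"},
    },
}
classification = {
    "name": "classification",
    "status": "ready",
    "track_data": [{"acoustics": [], "spoken_languages": ["fr"]}],
}

href = classification_href(insights)
languages = spoken_languages(classification)
print(href, languages)
```

The resulting list of codes is what you would store as a tag, or pass as the language when searching.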

Searching Media by Language

Searching works against the bundle you created earlier, using your keywords. The only change is that you have to specify the language of the term you're seeking. This is done with a single command:

  • curl
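curl --get https://api.clarify.io/v1/search \
     --data-urlencode "query=compris" \
     --data-urlencode "language=fr" \
     --header "Authorization: Bearer myapikey"
# --get makes curl send the data values as query-string parameters of a GET request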
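
Search terms that contain spaces or accented characters need URL encoding. A minimal Python sketch of building the search URL, assuming both query and language travel as query-string parameters; build_search_url is a hypothetical helper name, not part of a Clarify library:

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.clarify.io/v1/search"


def build_search_url(query, language):
    """Return the search URL with query and language as URL-encoded parameters."""
    return SEARCH_URL + "?" + urlencode({"query": query, "language": language})


print(build_search_url("compris", "fr"))
```

The request still needs the same Authorization header as the other calls.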

Putting it All Together

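From here, you can visualize the search results with our included audio player. The player should work with minimal additional configuration, since the bulk of the logic is already in the results above. The snippets below pull the search terms, the item results, and the media URL out of a search response.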

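  • PHP
<?php

require 'vendor/autoload.php';

// Sample search term; replace it with your own.
$terms = 'compris';

$bundle = new Clarify\Bundle('my api key');
$items = $bundle->search($terms);

// JSON-encode the matched terms and per-item hits for the player.
$search_terms = json_encode($items['search_terms']);
$item_results = json_encode($items['item_results']);

// Load the tracks of the first matching bundle and take its media URL.
$audiokey = $items['_links']['items'][0]['href'];
$tracks = $bundle->tracks->load($audiokey)['tracks'];
$mediaUrl = $tracks[0]['media_url'];

  • Python
from clarify_python import clarify
import json

clarify.set_key('my api key')

# Sample search term; replace it with your own.
result = clarify.search(query='compris')

# JSON-encode the matched terms and per-item hits for the player.
search_terms = json.dumps(result['search_terms'])
item_results = json.dumps(result['item_results'])

# Follow the first matching bundle to its track list and take its media URL.
bundleref = result['_links']['items'][0]['href']
bundle = clarify.get_bundle(bundleref)
# The link relation matches the clarify:tracks key in the bundle response above.
tracksref = bundle['_links']['clarify:tracks']['href']
tracks = clarify.get_track_list(tracksref)['tracks']
mediaURL = tracks[0]['media_url']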