Ingestion
This documentation describes our ingestion pipeline, which consists of several automated, queue-based steps. Your data are stored in our databases immediately, but the audio analysis and the computation of the products can take up to several hours to become available.
Once configured, the ingestion can run daily to keep your catalogues continuously up to date. For fair usage of our services, please notify us ahead of any large ingestion.
Ingestion is the process that lets you add content to one of your catalogues, including all the related steps that make your tracks available for your licenced product.
Depending on your assets and your usage, you may use one or another way to ingest your content. Although we only require an audio file and its reference in your system (called outer_id), you may find it useful to ingest more metadata linked to your audio files. For better usability on our Portal, we usually recommend ingesting the outer_id, the title, the related artist and the ISRC alongside all your audio files. These metadata need a specific mapping for us to parse all your content during the ingestion; this is described below.
Note that you can create several catalogs inside your organization, and use a different process of ingestion for each of them at the same time.
AWS S3 Bucket
Your account includes access to a single dedicated Amazon S3 bucket, to which you deliver the assets to be analyzed:
{
"bucket_name": "msm-s3-{YOUR_ORGANIZATION_ID}-{AWS_REGION}-prd",
"bucket_region": "{AWS_REGION}",
"aws_access_key_id": "{YOUR_PRIVATE_ACCESS_KEY_ID}",
"aws_secret_access_key": "{YOUR_PRIVATE_ACCESS_KEY_SECRET}"
}
This S3 bucket is dedicated to your organization and secured for its use only. All the assets you deliver and all the assets we produce are stored in this single location.
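As a minimal sketch, assuming the credentials block above is available as JSON text, the following Python helper maps it onto the standard environment variables read by the AWS CLI (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`). The function name `aws_env_from_config` and the sample values are illustrative and not part of our service.

```python
import json


def aws_env_from_config(config_text: str) -> dict:
    """Map the bucket credentials JSON onto AWS CLI environment variables."""
    cfg = json.loads(config_text)
    return {
        "AWS_ACCESS_KEY_ID": cfg["aws_access_key_id"],
        "AWS_SECRET_ACCESS_KEY": cfg["aws_secret_access_key"],
        "AWS_DEFAULT_REGION": cfg["bucket_region"],
    }


# Placeholder values for illustration only, not real credentials.
sample = """{
  "bucket_name": "msm-s3-example-org-eu-west-1-prd",
  "bucket_region": "eu-west-1",
  "aws_access_key_id": "EXAMPLE_KEY_ID",
  "aws_secret_access_key": "EXAMPLE_KEY_SECRET"
}"""

# Print shell "export" lines that could be pasted into a terminal session.
for name, value in aws_env_from_config(sample).items():
    print(f"export {name}={value}")
```

Keeping the credentials in environment variables rather than in scripts avoids committing them to version control by accident.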
Several paths are already configured:
Path | Description | Access |
---|---|---|
csv_outputs | Contains all the exports made by Musimotion | ro |
delivery | Root delivery area | ro |
delivery/audio | Root delivery for all audio files | rw |
delivery/ddex | Root delivery for all DDEX files | rw |
delivery/mapping | Root repository for all mapping files | rw |
delivery/metadata | Root delivery for all CSV files | rw |
INFO.json | Technical configuration (JSON format) | ro |
live_analysis | Temporary storage for the Live analysis | - |
logs | Storage for all the logs produced by our services | - |
processed | Root storage area for processed files | - |
processed/audio | Main storage for processed audio files | - |
processed/features | Main storage for extracted features | - |
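Before uploading, it can help to check that a target key falls under one of the paths marked rw in the table above. The following is a small sketch of such a check; the helper name `is_writable_key` is hypothetical, and the prefix list simply mirrors the table.

```python
# Prefixes marked "rw" in the path table.
WRITABLE_PREFIXES = (
    "delivery/audio/",
    "delivery/ddex/",
    "delivery/mapping/",
    "delivery/metadata/",
)


def is_writable_key(key: str) -> bool:
    """Return True when the S3 key sits under one of the "rw" paths."""
    normalized = key.lstrip("/")
    return normalized.startswith(WRITABLE_PREFIXES)


print(is_writable_key("delivery/audio/album/track.mp3"))
print(is_writable_key("processed/audio/track.mp3"))
```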
Access through CLI
Once you have configured your environment to access the bucket, the following commands should work.
To get the region in which your bucket is located:
aws s3api get-bucket-location --bucket {YOUR_BUCKET_NAME}
{
"LocationConstraint": "eu-west-1"
}
To list the content of your bucket:
aws s3 ls s3://{YOUR_BUCKET_NAME}/
PRE csv_outputs/
PRE delivery/
PRE live_analysis/
PRE logs/
PRE processed/
831 INFO.json
36 {YOUR_ORGANIZATION_NAME}
To recursively list all the files in the delivery area:
aws s3 ls s3://{YOUR_BUCKET_NAME}/delivery --recursive
Access through Interface
If you prefer to avoid the command line, you may connect to your dedicated bucket with any software supporting the S3 protocol.
We recommend Cyberduck, in which you will have to configure the key and the path. Once connected, you can navigate the directories and upload your audio files to the right folder.
Audio files only, "free naming"
This is the simplest way to ingest content into one of your catalogs. Note that the outer_id will correspond to the filename of the related audio file.
Use-cases
- You have a set of audio files whose names do not follow a consistent format.
- You will not match our results with another database, or you will match the data based on the filenames.
- You want to have a quick look at our results through our portal.
Preparation
Simply upload all your audio files into the delivery/audio/ path:
# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder (--recursive is required to copy a directory)
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive
Note that you can create as many subfolders as you want, as long as they are located under delivery/audio/.
Read more about the S3 cp commands
Request
curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_only?delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}'
Parameter | Value Type | Description |
---|---|---|
delivery_dir_path | string | Path on S3, relative to ./delivery/audio/ |
catalog_id | string (UUID) | Musimap Internal Unique Identifier for the catalog |
delete_delivered_file | boolean (default = "True") | Whether the file is deleted from the delivery once processed |
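The same request can be prepared from Python. The sketch below only builds the request object with the standard library and does not send it; the function name `build_audio_only_request` is hypothetical, and serializing the boolean as TRUE or FALSE is an assumption taken from the placeholder in the curl example.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api-v2.musimap.io"


def build_audio_only_request(token, catalog_id, delivery_dir_path,
                             delete_delivered_file=True):
    """Build (but do not send) the POST request for an audio-only ingestion."""
    query = urlencode({
        "delivery_dir_path": delivery_dir_path,
        "catalog_id": catalog_id,
        # Assumption: the API accepts TRUE/FALSE, as in the curl placeholder.
        "delete_delivered_file": "TRUE" if delete_delivered_file else "FALSE",
    })
    return Request(
        f"{API_BASE}/ingestion/audio_only?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


# Placeholder token, catalog identifier and path for illustration only.
req = build_audio_only_request("EXAMPLE_TOKEN", "EXAMPLE-CATALOG-ID", "2024/week03")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch stays free of network calls.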
Process
Once triggered, the ingestion scans the delivery_dir_path and creates one entry for each file found. Please note that the filename becomes the outer_id, which is supposed to be unique. This means that a file can overwrite any previously uploaded file with the same name.
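Because of this overwrite behaviour, it may be worth checking a delivery for duplicate filenames before triggering the ingestion. The sketch below assumes that the outer_id is the bare filename (extension included) and that subfolders are not part of it; both points are assumptions, and the helper name `find_outer_id_collisions` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_outer_id_collisions(keys):
    """Group delivery keys by filename and keep the names used more than once."""
    by_name = defaultdict(list)
    for key in keys:
        # Assumption: only the final path component becomes the outer_id.
        by_name[PurePosixPath(key).name].append(key)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


delivered = [
    "delivery/audio/album_a/track.mp3",
    "delivery/audio/album_b/track.mp3",
    "delivery/audio/album_a/other.mp3",
]
print(find_outer_id_collisions(delivered))
```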
Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivery_dir_path may be empty at the end of this first step.
Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.
Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on, in another thread. This third step determines when the entry becomes available for your licenced product.
Audio files only, "Structured naming"
This is the simplest way to ingest formatted content in one of your catalogues.
Use-cases
- All your audio files have the same naming convention.
- You want to match our results with your infrastructure, based on a specific reference.
- You want to benefit from our service by quickly adding simple metadata to your audio files.
Preparation
Simply upload all your audio files into the delivery/audio/ path:
# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder (--recursive is required to copy a directory)
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive
Note that you can create as many subfolders as you want, as long as they are located under delivery/audio/.
Read more about the S3 cp commands
Mapping
This file describes your naming convention and allows us to parse each of your filenames to extract the right information.
For example, if your filenames follow the pattern "outer_id"-"isrc".ext, your mapping should look like:
mapping:
- separator: "-"
- column_0: "outer_id"
- column_1: "isrc"
Available values:
- outer_id
- title
- isrc
- release_date
- artist_name
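To check a naming convention locally before delivering files, the mapping can be simulated in a few lines of Python. This is only a sketch of how a separator and an ordered list of columns could be applied to a filename; it is not the parser used by our service, and the helper name `parse_filename` is hypothetical.

```python
from pathlib import PurePosixPath

# The values listed as available for a mapping.
ALLOWED_FIELDS = {"outer_id", "title", "isrc", "release_date", "artist_name"}


def parse_filename(filename, separator, columns):
    """Split a filename (without extension) into the mapped fields."""
    unknown = set(columns) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported mapping fields: {sorted(unknown)}")
    stem = PurePosixPath(filename).stem  # drops the final extension only
    parts = stem.split(separator)
    if len(parts) != len(columns):
        raise ValueError(
            f"{filename!r}: expected {len(columns)} parts, got {len(parts)}"
        )
    return dict(zip(columns, parts))


# Illustrative filename following the "outer_id"-"isrc".ext pattern.
print(parse_filename("TRK001-USRC17607839.mp3", "-", ["outer_id", "isrc"]))
```

The strict part count makes a filename whose title or identifier contains the separator fail loudly, which is usually the case worth catching before an ingestion.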
Once this file is written (and validated), you may want to store it inside your bucket:
aws s3 cp local_mapping.yaml s3://{YOUR_BUCKET_NAME}/delivery/mapping/default_mapping.yaml
Note that our Support Team will usually configure your very first ingestion, and a working example is provided as a validation.
Request
curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_mapping?mapping_filename=default_mapping.yaml&delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}'
Parameter | Value Type | Description |
---|---|---|
delivery_dir_path | string | Path on S3, relative to ./delivery/audio/ |
catalog_id | string (UUID) | Musimap Internal Unique Identifier for the catalog. |
delete_delivered_file | boolean (default = "True") | Whether the file is deleted from the delivery once processed |
mapping_filename | string | The filename of your mapping stored in ./delivery/mapping |
mapping_content | string | The YAML file containing your mapping, if not stored in S3 |
Process
Once triggered, the ingestion scans the delivery_dir_path and creates one entry for each file found. The extracted data are stored alongside the audio file in our databases. Please note that the outer_id is supposed to be unique; a new entry overwrites any previous entry with the same reference.
Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivery_dir_path may be empty at the end of this first step.
Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.
Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on, in another thread. This third step determines when the entry becomes available for your licenced product.
Audio Files & Metadata (JSON)
This is the most complete way to ingest audio files with metadata.
Use-cases
- Your audio files don't have any structured filenames.
- You want us to store all the metadata related to an audio file in order to retrieve all the information immediately.
- You want to benefit from our service through our Portal.
Preparation
S3 Storage
To have the media fetched from S3, simply upload all your audio files into the delivery/audio/ path:
# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder (--recursive is required to copy a directory)
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive
Note that you can create as many subfolders as you want, as long as they are located under delivery/audio/.
Read more about the S3 cp commands
Remote Storage
If your audio files are publicly available, you can set up the ingestion to download them. In that case, fill the primary_media field of each track with the complete URL from which the file can be downloaded.
JSON Body
Once your audio files have been uploaded, query our Web API to ingest them with the right information. A single query can carry up to 25 tracks. The query contains everything we need to retrieve the right audio file and to save it with the information you want us to store. Each track must respect the following structure:
{
  "outer_id": "string",
  "references": [
    {
      "id": "string",
      "source": "string"
    }
  ],
  "title": "string",
  "lyrics": "string",
  "isrc": "string",
  "release_date": 0,
  "albums": [
    {
      "upc": "string",
      "title": "string",
      "release_date": "string",
      "type": "Compilation",
      "references": [
        {
          "id": "string",
          "source": "string"
        }
      ],
      "track_position": "string",
      "disk_number": "string"
    }
  ],
  "artists": [
    {
      "id": "string",
      "name": "string",
      "role": "string"
    }
  ],
  "primary_media": "string",
  "customer_tags": [
    {
      "tag": "type of Rock",
      "category": "Rock"
    }
  ]
}
Note that only the outer_id and primary_media fields are required; all the others are optional.
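A quick client-side check for the two required fields can catch incomplete tracks before a query is sent. The following is a minimal sketch; the helper name `missing_required_fields` is hypothetical, and treating an empty string as missing is an assumption.

```python
REQUIRED_FIELDS = ("outer_id", "primary_media")


def missing_required_fields(track: dict) -> list:
    """Return the required fields that are absent or empty in a track."""
    # Assumption: an empty value is treated the same as a missing field.
    return [field for field in REQUIRED_FIELDS if not track.get(field)]


print(missing_required_fields({"outer_id": "TRK001", "title": "Example"}))
```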
Parameter | Value Type | Description |
---|---|---|
outer_id | string | Your unique identifier for this track |
references | Nested object | A list of official references for this track, sorted by source |
title | string | The official title for this track |
lyrics | string | The complete lyrics for this track |
isrc | string | The unique ISRC reference for this track |
release_date | integer | The date of release (YYYYMMDD) |
albums.upc | string | The unique UPC reference for this album |
albums.title | string | The title for the album |
albums.release_date | string | The date of release (YYYYMMDD) |
albums.type | string | The type of album (Official, Compilation, Single,...) |
albums.references | Nested object | A list of official references for this album, sorted by source |
albums.track_position | string | The position of the track on the related disk_number |
albums.disk_number | string | The disk on which the track appears |
artists.id | string | A specific identifier for the related artist |
artists.name | string | The name of a specific artist related to this track |
artists.role | string | A specific role you would like to attach to the artist |
primary_media | string | Remote URL or path on S3, relative to ./delivery/audio/ |
customer_tags | Nested object | A list of tags, sorted by categories |
Note that you can retrieve all this information by using the enriched response of several of our endpoints.
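The table lists the track-level release_date as an integer and albums.release_date as a string, both in YYYYMMDD form. The sketch below shows one way to produce both from Python `date` objects; the helper name `release_date_fields` is hypothetical.

```python
from datetime import date


def release_date_fields(track_date: date, album_date: date) -> dict:
    """Format release dates as YYYYMMDD: integer for the track, string for the album."""
    return {
        "release_date": int(track_date.strftime("%Y%m%d")),
        "albums": [{"release_date": album_date.strftime("%Y%m%d")}],
    }


print(release_date_fields(date(1999, 3, 5), date(1999, 3, 1)))
```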
Request
In addition to this BODY, several QUERY parameters allow you to configure the request:
Parameter | Value Type | Description |
---|---|---|
media_fetch_type | s3 or download (default: s3) | Whether the file is stored on S3 or remotely |
catalog_id | any string | Musimap Internal Unique Identifier for the catalog. |
ingestion_id | any string | A unique reference for this ingestion |
overwrite_audio_file | BOOLEAN (default: TRUE) | If the outer_id already exists, whether the audio file needs to be overwritten |
delete_delivered_file | BOOLEAN (default: TRUE) | Whether the file is deleted from the delivery once processed |
Note that the ingestion_id can be any string; it creates a sub-collection of your entries. We advise you to generate one unique ingestion_id per day or per week. If omitted, a timestamp is used.
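Since the ingestion_id can be any string, a date-based value is one simple way to follow the per-day or per-week advice. The naming scheme below (a prefix followed by the date or the ISO week) is purely illustrative and not a format required by the API.

```python
from datetime import date


def daily_ingestion_id(day: date, prefix: str = "ingest") -> str:
    """One identifier per calendar day, e.g. prefix-YYYYMMDD."""
    return f"{prefix}-{day:%Y%m%d}"


def weekly_ingestion_id(day: date, prefix: str = "ingest") -> str:
    """One identifier per ISO week, e.g. prefix-YYYY-Www."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-W{iso_week:02d}"


today = date.today()
print(daily_ingestion_id(today), weekly_ingestion_id(today))
```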
curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/json?media_fetch_type={s3|download}&catalog_id={CATALOG_ID}&ingestion_id={INGESTION_ID}&overwrite_audio_file={TRUE|FALSE}&delete_delivered_file={TRUE|FALSE}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}' \
--data-raw '[
  {
    "outer_id": "string",
    "references": [
      {
        "id": "string",
        "source": "string"
      }
    ],
    "title": "string",
    "lyrics": "string",
    "isrc": "string",
    "release_date": 0,
    "albums": [
      {
        "upc": "string",
        "title": "string",
        "release_date": "string",
        "type": "Compilation",
        "references": [
          {
            "id": "string",
            "source": "string"
          }
        ],
        "track_position": "string",
        "disk_number": "string"
      }
    ],
    "artists": [
      {
        "id": "string",
        "name": "string",
        "role": "string"
      }
    ],
    "primary_media": "string",
    "customer_tags": [
      {
        "tag": "type of Rock",
        "category": "Rock"
      }
    ]
  }
]'
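Because a single query carries at most 25 tracks, a larger delivery has to be split into several requests. The sketch below shows one possible way to batch a track list and prepare one POST request per batch with the Python standard library; it builds the requests without sending them. The helper names are hypothetical, and the overwrite_audio_file and delete_delivered_file parameters are left out so that their documented defaults apply.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api-v2.musimap.io/ingestion/json"
MAX_TRACKS_PER_QUERY = 25


def chunk_tracks(tracks, size=MAX_TRACKS_PER_QUERY):
    """Yield consecutive slices of at most `size` tracks."""
    for start in range(0, len(tracks), size):
        yield tracks[start:start + size]


def build_json_requests(tracks, token, catalog_id, ingestion_id,
                        media_fetch_type="s3"):
    """Build (but do not send) one POST request per batch of tracks."""
    query = urlencode({
        "media_fetch_type": media_fetch_type,
        "catalog_id": catalog_id,
        "ingestion_id": ingestion_id,
    })
    requests = []
    for batch in chunk_tracks(tracks):
        requests.append(Request(
            f"{API_URL}?{query}",
            data=json.dumps(batch).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        ))
    return requests


# Sixty placeholder tracks, each with only the two required fields.
tracks = [{"outer_id": f"TRK{i:03d}", "primary_media": f"album/TRK{i:03d}.mp3"}
          for i in range(60)]
batches = build_json_requests(tracks, "EXAMPLE_TOKEN", "EXAMPLE-CATALOG-ID", "ingest-1")
print(len(batches), "requests prepared")
```

Using the same ingestion_id for every batch keeps all the tracks of one delivery in the same sub-collection.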
Process
Once triggered, the ingestion parses the body of your request and creates one entry for each track found. The extracted data are stored alongside the audio file in our databases. Please note that the outer_id is supposed to be unique; a new entry overwrites any previous entry with the same reference.
Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivered files may no longer be present in the delivery area at the end of this first step.
Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.
Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on, in another thread. This third step determines when the entry becomes available for your licenced product.