Indexer

The indexer is a tool that indexes analyzed documents. It runs as a REST service.

Indexer API

Indexer configuration

Example of Configuration:

indexing.json
{
    "logger": {
        "logging-level": "{{ project.loglevel }}"
    },
    "indexing": {
        "document":{
            "remove-knowledge-graph-duplicates":true
        },
        "elasticsearch":{
            "network": {
                "host": "localhost",
                "port": 9200,
                "use_ssl": false,
                "verify_certs": false,
                "auth":{
                    "user":"admin",
                    "password":"admin",
                    "associate-environment": {
                        "user":"OPENDISTRO_USER",
                        "password":"OPENDISTRO_PASSWORD"
                    }
                },
                "associate-environment": {
                    "host":"OPENDISTRO_DNS_HOST",
                    "port":"OPENDISTRO_PORT",
                    "use_ssl":"OPENDISTRO_USE_SSL",
                    "verify_certs":"OPENDISTRO_VERIFY_CERTS"
                }
            },
            "nms-index":{
               "name":"default-nms-index",              
               "mapping-file":"{{ project.path }}/resources/indices/indices_mapping/nms_cache_index.json"
            },
            "text-index":{
                "name":"default-text-index",
                "mapping-file":"{{ project.path }}/resources/indices/indices_mapping/cache_index.json"
            },
            "relation-index":{
                "name":"default-relation-index",
                "mapping-file":"{{ project.path }}/resources/indices/indices_mapping/relation_index.json"
            }
        },
        "network": {
            "host":"0.0.0.0",
            "port":10012,
            "associate-environment": {
                "host":"INDEX_HOST",
                "port":"INDEX_PORT"
            }
        },
        "runtime":{
            "request-max-size":100000000,
            "request-buffer-queue-size":100,
            "keep-alive":true,
            "keep-alive-timeout":500,
            "graceful-shutown-timeout":15.0,
            "request-timeout":600,
            "response-timeout":600,
            "workers":1
        }
     }
}

The indexer configuration is an aggregation of the document, elasticsearch, network, and runtime configurations (in the indexing field) and the logger configuration (at top level).
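The example above contains placeholders such as {{ project.path }} and {{ project.loglevel }}. The source does not say how these are rendered, so the following is only a minimal sketch, assuming a plain text substitution before the JSON is parsed. The helper names (render_placeholders, load_indexing_config) and the /opt/tkeir path are hypothetical and are not part of tkeir.

```python
import json
import re

# Matches placeholders such as "{{ project.path }}".
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_placeholders(text, values):
    """Replace each {{ name }} with values[name]; unknown names are left untouched."""
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, text)


def load_indexing_config(raw_text, values):
    """Render placeholders, parse the JSON, and return the two top-level sections."""
    config = json.loads(render_placeholders(raw_text, values))
    return config["logger"], config["indexing"]


# A shortened version of indexing.json, kept inline so the sketch is self-contained.
raw = """
{
    "logger": {"logging-level": "{{ project.loglevel }}"},
    "indexing": {
        "elasticsearch": {
            "text-index": {
                "name": "default-text-index",
                "mapping-file": "{{ project.path }}/resources/indices/indices_mapping/cache_index.json"
            }
        }
    }
}
"""

# "/opt/tkeir" is a hypothetical installation path used only for illustration.
logger_cfg, indexing_cfg = load_indexing_config(
    raw, {"project.loglevel": "debug", "project.path": "/opt/tkeir"}
)
print(logger_cfg["logging-level"])
print(indexing_cfg["elasticsearch"]["text-index"]["mapping-file"])
```

Splitting the result into the logger section and the indexing section mirrors the layout described above: the logger sits at top level, everything else under indexing.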

Configure indexer logger

The logger is configured at the top level of the JSON, in the logger field.

Example of Configuration:

logger configuration
{
    "logger": {
        "logging-level": "debug"
    }    
}

The logger field is:

  • logging-level

It can be set to the following values:

  • debug to display debug-level and developer information
  • info to display informational messages
  • warning to display only warnings and errors
  • error to display only errors
  • critical to display only critical errors
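The five values above have the same names as the levels of Python's standard logging module. Whether the service maps them exactly this way is an assumption. As a minimal sketch under that assumption (configure_logger and the "indexer" logger name are hypothetical):

```python
import logging

# Assumed mapping from the "logging-level" values to standard logging levels.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def configure_logger(config, name="indexer"):
    """Set the level of the named logger from the top-level "logger" field."""
    level_name = config["logger"]["logging-level"].lower()
    if level_name not in LEVELS:
        raise ValueError(f"unknown logging-level: {level_name!r}")
    logger = logging.getLogger(name)
    logger.setLevel(LEVELS[level_name])
    return logger


logger = configure_logger({"logger": {"logging-level": "warning"}})
logger.info("not displayed: below the warning level")
logger.error("displayed: errors pass the warning level")
```

With this mapping, each level also lets through every more severe level, which matches the description of warning as displaying both warnings and errors.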

Configure indexer Network

Example of Configuration:

network configuration
{
    "network": {
        "host":"0.0.0.0",
        "port":8080,
        "associate-environment": {
            "host":"HOST_ENVNAME",
            "port":"PORT_ENVNAME"
        },
        "ssl":
        {
            "certificate":"path/to/certificate",
            "key":"path/to/key"
        }
    }
}

The network fields:

  • host : hostname

  • port : port of the service

  • associate-environment : names of the environment variables that, when set, replace the default values. This field is not mandatory.

    • "host" : environment variable associated with "host"
    • "port" : environment variable associated with "port"
  • ssl : SSL configuration. IN PRODUCTION IT IS MANDATORY TO USE A CERTIFICATE AND KEY THAT ARE *NOT* SELF-SIGNED

    • certificate : certificate file
    • key : key file
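The environment override described above can be sketched as follows. This is not the tkeir implementation: resolve_with_environment is a hypothetical helper, and casting the environment string to the type of the default value is an assumption made for illustration.

```python
import os


def _cast_like(raw, default):
    """Cast an environment string to the type of the default value (assumed behaviour)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(default, bool):
        return raw.strip().lower() in ("1", "true", "yes")
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw


def resolve_with_environment(section, environ=None):
    """Return the section with defaults replaced by any environment variable that is set."""
    environ = os.environ if environ is None else environ
    resolved = {k: v for k, v in section.items() if k != "associate-environment"}
    for field, env_name in section.get("associate-environment", {}).items():
        if env_name in environ:
            resolved[field] = _cast_like(environ[env_name], resolved.get(field))
    return resolved


network = {
    "host": "0.0.0.0",
    "port": 8080,
    "associate-environment": {"host": "HOST_ENVNAME", "port": "PORT_ENVNAME"},
}

# Only PORT_ENVNAME is set here, so "host" keeps its default value.
resolved = resolve_with_environment(network, {"PORT_ENVNAME": "9000"})
print(resolved)
```

Passing the environment as a dictionary keeps the sketch easy to test; calling resolve_with_environment(network) without the second argument reads from the real process environment.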

Configure indexer runtime

Example of Configuration:

runtime configuration
{
    "runtime":{
        "request-max-size":100000000,
        "request-buffer-queue-size":100,
        "keep-alive":true,
        "keep-alive-timeout":5,
        "graceful-shutown-timeout":15.0,
        "request-timeout":60,
        "response-timeout":60,
        "workers":1
    }    
}

The runtime fields:

  • request-max-size : maximum size of a request (bytes)

  • request-buffer-queue-size : size of the request streaming buffer queue

  • request-timeout : how long a request can take to arrive (sec)

  • response-timeout : how long a response can take to process (sec)

  • keep-alive : enables or disables keep-alive (true or false)

  • keep-alive-timeout : how long to hold a TCP connection open (sec)

  • graceful-shutown-timeout : how long to wait before force-closing non-idle connections (sec)

  • workers : number of workers for the service on a node

  • associate-environment : names of the environment variables that, when set, replace the default values of the fields above. This field is not mandatory.

    • request-max-size : overridden by its environment variable
    • request-buffer-queue-size : overridden by its environment variable
    • request-timeout : overridden by its environment variable
    • response-timeout : overridden by its environment variable
    • keep-alive : overridden by its environment variable
    • keep-alive-timeout : overridden by its environment variable
    • graceful-shutown-timeout : overridden by its environment variable
    • workers : overridden by its environment variable
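Before starting the service it can help to sanity-check the runtime section. The sketch below is illustrative only: check_runtime is a hypothetical helper, the expected types are inferred from the example values, and treating every field as required and positive is an assumption that the source does not confirm.

```python
# Expected types, inferred from the example configuration (an assumption).
# Field names are spelled exactly as in the example configuration.
RUNTIME_TYPES = {
    "request-max-size": int,
    "request-buffer-queue-size": int,
    "keep-alive": bool,
    "keep-alive-timeout": (int, float),
    "graceful-shutown-timeout": (int, float),
    "request-timeout": (int, float),
    "response-timeout": (int, float),
    "workers": int,
}


def check_runtime(runtime):
    """Return a list of problems; an empty list means the section looks consistent."""
    problems = []
    for field, expected in RUNTIME_TYPES.items():
        if field not in runtime:
            problems.append(f"missing field: {field}")
            continue
        value = runtime[field]
        if expected is bool:
            if not isinstance(value, bool):
                problems.append(f"{field} must be true or false")
        # bool is a subclass of int, so it is rejected explicitly for numeric fields.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field} has an unexpected type: {type(value).__name__}")
        elif value <= 0:
            problems.append(f"{field} must be positive")
    return problems


runtime = {
    "request-max-size": 100000000,
    "request-buffer-queue-size": 100,
    "keep-alive": True,
    "keep-alive-timeout": 5,
    "graceful-shutown-timeout": 15.0,
    "request-timeout": 60,
    "response-timeout": 60,
    "workers": 1,
}
print(check_runtime(runtime))
```

Returning a list of messages rather than raising on the first problem lets all issues in a configuration file be reported in one pass.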

Indexer service

To run the service, type the following from the tkeir directory:

python3 thot/index_svc.py --config=<path to indexer configuration file>

or, if you installed the tkeir wheel:

tkeir-index-svc --config=<path to indexer configuration file>

A light client can be run with the command:

python3 thot/index_client.py --config=<path to indexer configuration file> --input=<input directory> --output=<output directory>

or, if you installed the tkeir wheel:

tkeir-index-client --config=<path to indexer configuration file> --input=<input directory> --output=<output directory>

Indexer Tests

The indexer service comes with unit and functional tests.

Indexer Unit tests

The unit tests cover the Indexer classes only.

python3 -m unittest thot/tests/unittests/TestIndexingConfiguration.py
python3 -m unittest thot/tests/unittests/TestIndexing.py

Indexer Functional tests

python3 -m unittest thot/tests/functional_tests/TestIndexingSvc.py