Cluster Inference

Cluster Inference is a tool that infers cluster classes for knowledge graph entries. The tool is a REST service.

Cluster Inference API

The API documentation is also available from the running service at http://<service host>:<service port>/swagger
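
As a small illustration, the Swagger address can be assembled from the host and port of the service. The helper name swagger_url is hypothetical, and the port used in the usage line is the one from the example configuration in this document, not a guaranteed default.

```python
def swagger_url(host: str, port: int) -> str:
    """Build the Swagger URL of a running Cluster Inference service.

    Hypothetical helper: it only formats the address given in the text.
    """
    return f"http://{host}:{port}/swagger"


if __name__ == "__main__":
    # 10013 is the port of the example "network" section.
    print(swagger_url("localhost", 10013))
```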

Cluster Inference configuration

Example of Configuration:

relation.json

{
    "logger": {
        "logging-level": "{{ project.loglevel }}"
    },
    "relations": {
        "cluster":{
            "algorithm":"kmeans",
            "number-of-classes":16,
            "number-of-iterations":16,
            "seed":123456,       
            "batch-size":4096, 
            "embeddings":
            {
                "server":{
                    "host":"0.0.0.0",
                    "port":10005,
                    "associate-environment": {
                        "host":"SENT_EMBEDDING_HOST",
                        "port":"SENT_EMBEDDING_PORT"
                    },
                    "use-ssl":false,
                    "no-verify-ssl":true
                },
                "aggregate":{
                    "configuration":"{{ project.path }}/configs/embeddings.json"
                }
            }
        },
        "clustering-model":{
            "semantic-quantizer-model":"{{ project.path }}/resources/modeling/relation_names.model.pkl",
            "train-if-not-exists":true
        },
        "network": {
            "host":"0.0.0.0",
            "port":10013,
            "associate-environment": {
                "host":"CLUSTER_INFERENCE_HOST",
                "port":"CLUSTER_INFERENCE_PORT"
            }
        },
        "runtime":{
            "request-max-size":100000000,
            "request-buffer-queue-size":100,
            "keep-alive":true,
            "keep-alive-timeout":5,
            "graceful-shutown-timeout":15.0,
            "request-timeout":60,
            "response-timeout":60,
            "workers":1
        }
    }
}
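
The sketch below shows one way to read the clustering parameters from such a file with Python's standard json module. It is an illustration only: the helper name cluster_settings and the keys of the returned dictionary are hypothetical, and the embedded JSON is a trimmed copy of the example above. The "{{ ... }}" placeholders are ordinary JSON strings, so the file parses as-is; they are presumably filled in by a templating step that is not shown here.

```python
import json

# Trimmed copy of the example configuration; only the keys read below are kept.
EXAMPLE = """
{
    "relations": {
        "cluster": {
            "algorithm": "kmeans",
            "number-of-classes": 16,
            "number-of-iterations": 16,
            "seed": 123456,
            "batch-size": 4096
        },
        "clustering-model": {
            "semantic-quantizer-model": "{{ project.path }}/resources/modeling/relation_names.model.pkl",
            "train-if-not-exists": true
        }
    }
}
"""


def cluster_settings(config: dict) -> dict:
    """Collect the clustering parameters into a flat dictionary (hypothetical helper)."""
    relations = config["relations"]
    cluster = relations["cluster"]
    model = relations["clustering-model"]
    return {
        "algorithm": cluster["algorithm"],
        "classes": cluster["number-of-classes"],
        "iterations": cluster["number-of-iterations"],
        "seed": cluster["seed"],
        "batch_size": cluster["batch-size"],
        "model_path": model["semantic-quantizer-model"],
        "train_if_missing": model["train-if-not-exists"],
    }


if __name__ == "__main__":
    print(cluster_settings(json.loads(EXAMPLE)))
```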

Configure cluster inference logger

The logger is configured at the top level of the JSON file, in the logger field.

Example of Configuration:

logger configuration

{
    "logger": {
        "logging-level": "debug"
    }    
}

The logger fields are:

logging-file is the filename of the log file (note that a suffix beginning with "-" will be appended to this name)
logging-path is the path to the log file (if it does not exist, it will be created)
logging-level contains two fields:
    file for the logging level of the file
    screen for the logging level of the screen output

Both can be set to the following values:

debug for debug-level and developer information
info for informational messages
warning to display only warnings and errors
error to display only errors
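
The four level names line up with the levels of Python's standard logging module. The mapping below is a minimal sketch under the assumption that the service relies on that module; the names LEVELS and to_logging_level are hypothetical and not taken from the tkeir code.

```python
import logging

# Assumed correspondence between configuration values and logging constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def to_logging_level(name: str) -> int:
    """Translate a configured level name into a logging constant."""
    try:
        return LEVELS[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown logging level: {name!r}") from None


if __name__ == "__main__":
    # "debug" is the value used in the logger example above.
    logging.basicConfig(level=to_logging_level("debug"))
    logging.getLogger("cluster-inference").debug("logger configured")
```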

Configure cluster inference Network

Example of Configuration:

network configuration

{
    "network": {
        "host":"0.0.0.0",
        "port":8080,
        "associate-environment": {
            "host":"HOST_ENVNAME",
            "port":"PORT_ENVNAME"
        },
        "ssl":
        {
            "certificate":"path/to/certificate",
            "key":"path/to/key"
        }
    }
}

The network fields:

host : hostname
port : port of the service
associate-environment : names of the environment variables that, when set, override the default "host" and "port". This field is optional.
    "host" : name of the environment variable that overrides the host
    "port" : name of the environment variable that overrides the port
ssl : SSL configuration. IN PRODUCTION, IT IS MANDATORY TO USE A CERTIFICATE AND KEY THAT ARE *NOT* SELF-SIGNED.
    certificate : certificate file
    key : key file
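
The override rule for host and port can be sketched as follows. This is an assumption about the behaviour described above (an environment variable, when set, replaces the value from the file), not the service's actual code; the function name resolve_endpoint is hypothetical.

```python
import os


def resolve_endpoint(network: dict, environ=None) -> tuple:
    """Return (host, port), letting environment variables replace the defaults."""
    environ = os.environ if environ is None else environ
    env_names = network.get("associate-environment", {})

    def pick(field):
        # Use the environment value only if a variable name is configured and set.
        env_name = env_names.get(field)
        if env_name and env_name in environ:
            return environ[env_name]
        return network[field]

    return pick("host"), int(pick("port"))


if __name__ == "__main__":
    network = {
        "host": "0.0.0.0",
        "port": 8080,
        "associate-environment": {"host": "HOST_ENVNAME", "port": "PORT_ENVNAME"},
    }
    print(resolve_endpoint(network))
```

Passing the environment as a plain dictionary, as in resolve_endpoint(network, {"PORT_ENVNAME": "9090"}), keeps the function easy to test without touching the real process environment.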

Configure cluster inference runtime

Example of Configuration:

runtime configuration

{
    "runtime":{
        "request-max-size":100000000,
        "request-buffer-queue-size":100,
        "keep-alive":true,
        "keep-alive-timeout":5,
        "graceful-shutown-timeout":15.0,
        "request-timeout":60,
        "response-timeout":60,
        "workers":1
    }    
}

The Runtime fields:

request-max-size : how big a request may be (bytes)
request-buffer-queue-size: request streaming buffer queue size
request-timeout : how long a request can take to arrive (sec)
response-timeout : how long a response can take to process (sec)
keep-alive: enables (true) or disables (false) keep-alive connections
keep-alive-timeout: how long to hold a TCP connection open (sec)
graceful-shutdown_timeout : how long to wait to force close non-idle connection (sec)
workers : number of workers for the service on a node
associate-environment : for each of the previous fields, the name of an environment variable whose value, when set, replaces the default. This field is optional.
    request-max-size : override with environment variable
    request-buffer-queue-size: override with environment variable
    request-timeout : override with environment variable
    response-timeout : override with environment variable
    keep-alive: override with environment variable
    keep-alive-timeout: override with environment variable
    graceful-shutdown_timeout : override with environment variable
    workers : override with environment variable
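
Because environment variables are always strings, a runtime override has to be converted back to the type of the configured default (integer, float, or boolean). The sketch below illustrates that idea; it is an assumption about how such overrides could be applied, and the function names and the environment variable names in the usage lines are hypothetical.

```python
import os


def coerce(raw: str, default):
    """Convert an environment string to the type of the configured default."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(default, bool):
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return type(default)(raw)


def resolve_runtime(runtime: dict, environ=None) -> dict:
    """Return the runtime fields with environment overrides applied."""
    environ = os.environ if environ is None else environ
    env_names = runtime.get("associate-environment", {})
    resolved = {}
    for key, default in runtime.items():
        if key == "associate-environment":
            continue
        env_name = env_names.get(key)
        if env_name and env_name in environ:
            resolved[key] = coerce(environ[env_name], default)
        else:
            resolved[key] = default
    return resolved


if __name__ == "__main__":
    runtime = {
        "workers": 1,
        "keep-alive": True,
        "request-timeout": 60,
        # Hypothetical variable names, chosen for this example only.
        "associate-environment": {"workers": "EXAMPLE_WORKERS"},
    }
    print(resolve_runtime(runtime, {"EXAMPLE_WORKERS": "4"}))
```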

Cluster Inference service

To run the service, type the following command from the tkeir directory:

python3 thot/clusterinfer_svc.py --config=<path to configuration file>

or, if you installed the tkeir wheel:

tkeir-clusterinfer-svc.py --config=<path to configuration file>

A lightweight client can be run with the command:

python3 thot/clusterinfer_client.py --config=<path to configuration file> --input=<input directory> --output=<output directory>

or, if you installed the tkeir wheel:

tkeir-clusterinfer-client.py --config=<path to configuration file> --input=<input directory> --output=<output directory>
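
When the client has to be driven from another Python script, the same command line can be assembled and launched with the standard subprocess module. This is only a convenience sketch: the function names are hypothetical, the paths in the comment are placeholders, and it assumes the script is started from the tkeir directory as described above.

```python
import subprocess


def client_command(config: str, input_dir: str, output_dir: str) -> list:
    """Build the argument list for the Cluster Inference client shown above."""
    return [
        "python3",
        "thot/clusterinfer_client.py",
        f"--config={config}",
        f"--input={input_dir}",
        f"--output={output_dir}",
    ]


def run_client(config: str, input_dir: str, output_dir: str):
    """Run the client and raise CalledProcessError if it exits with an error."""
    return subprocess.run(client_command(config, input_dir, output_dir), check=True)


# Example call (placeholder paths):
# run_client("<path to configuration file>", "<input directory>", "<output directory>")
```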

Cluster Inference Tests

The Cluster Inference service comes with unit and functional tests.

Cluster Inference Unit tests

The unit tests cover the Cluster Inference classes only.

python3 -m unittest thot/tests/unittests/TestRelationClusterizerConfiguration.py
python3 -m unittest thot/tests/unittests/TestClusterInference.py

Cluster Inference Functional tests

python3 -m unittest thot/tests/functional_tests/TestClusterInferSvc.py