Relation clustering
Relation clustering groups the SVO (subject-verb-object) triples extracted during the syntactic tagging phase into classes.
Relations clustering configuration
Example of Configuration:
relations.json
{
    "logger": {
        "logging-level": "{{ project.loglevel }}"
    },
    "relations": {
        "cluster": {
            "algorithm": "kmeans",
            "number-of-classes": 16,
            "number-of-iterations": 16,
            "seed": 123456,
            "batch-size": 4096,
            "embeddings": {
                "server": {
                    "host": "0.0.0.0",
                    "port": 10005,
                    "associate-environment": {
                        "host": "SENT_EMBEDDING_HOST",
                        "port": "SENT_EMBEDDING_PORT"
                    },
                    "use-ssl": false,
                    "no-verify-ssl": true
                },
                "aggregate": {
                    "configuration": "{{ project.path }}/configs/embeddings.json"
                }
            }
        },
        "clustering-model": {
            "semantic-quantizer-model": "{{ project.path }}/resources/modeling/relation_names.model.pkl",
            "train-if-not-exists": true
        },
        "network": {
            "host": "0.0.0.0",
            "port": 10013,
            "associate-environment": {
                "host": "CLUSTER_INFERENCE_HOST",
                "port": "CLUSTER_INFERENCE_PORT"
            }
        },
        "runtime": {
            "request-max-size": 100000000,
            "request-buffer-queue-size": 100,
            "keep-alive": true,
            "keep-alive-timeout": 5,
            "graceful-shutown-timeout": 15.0,
            "request-timeout": 60,
            "response-timeout": 60,
            "workers": 1
        }
    }
}
The relation clustering configuration is an aggregation of the serialize and logger configurations (at the top level) and the clustering configuration. The clustering configuration defines access to the embedding server and the settings of the clustering algorithm:
- algorithm: clustering algorithm, one of ["kmeans", "spericalkmeans" (not yet available)]
- number-of-classes: number of cluster classes
- number-of-iterations: number of kmeans iterations
- seed: random seed for kmeans
- batch-size: mini-batch kmeans is used; the batch size is the number of vectors sent for each partial fit
- embeddings: embedding server network information (host and port) or aggregation (server-less)
  - server: server configuration
  - aggregate: path to the embedding configuration file
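To make the roles of number-of-classes, number-of-iterations, seed and batch-size more concrete, here is a minimal pure-Python sketch of mini-batch kmeans. It is not the tkeir implementation: the function names (nearest_center, partial_fit, train) are hypothetical, and the reading of number-of-iterations as the number of passes over the data is an assumption.

```python
import random


def nearest_center(centers, vector):
    """Return the index of the center closest to vector (squared Euclidean distance)."""
    best_index, best_distance = 0, float("inf")
    for index, center in enumerate(centers):
        distance = sum((c - x) ** 2 for c, x in zip(center, vector))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def partial_fit(centers, counts, batch):
    """Update the centers in place with one batch of vectors."""
    # Assign every vector of the batch before moving any center.
    assignments = [nearest_center(centers, vector) for vector in batch]
    for vector, index in zip(batch, assignments):
        counts[index] += 1
        rate = 1.0 / counts[index]  # per-center learning rate
        centers[index] = [c + rate * (x - c) for c, x in zip(centers[index], vector)]


def train(vectors, settings):
    """Train centers from a list of vectors, using the keys of the "cluster" section."""
    if settings["algorithm"] != "kmeans":
        raise ValueError("only 'kmeans' is handled in this sketch")
    rng = random.Random(settings["seed"])  # the seed makes the initialization reproducible
    number_of_classes = settings["number-of-classes"]
    centers = [list(v) for v in rng.sample(vectors, number_of_classes)]
    counts = [0] * number_of_classes
    batch_size = settings["batch-size"]
    # Assumption: one iteration is one pass over the data, cut into batches.
    for _ in range(settings["number-of-iterations"]):
        for start in range(0, len(vectors), batch_size):
            partial_fit(centers, counts, vectors[start:start + batch_size])
    return centers
```

With the same seed and the same input vectors, two calls to train return identical centers, which is the main reason to fix the seed in the configuration.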
Configure Relations clustering logger
The logger is configured at the top level of the JSON file, in the logger field.
Example of Configuration:
The logger field is:
- logging-level
It can be set to the following values:
- debug for the debug level and developer information
- info for informational messages
- warning to display only warnings and errors
- error to display only errors
- critical to display only critical errors
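As an illustration of how these values could be interpreted, the sketch below maps logging-level onto the constants of Python's standard logging module. That tkeir relies on this module is an assumption, and resolve_logging_level is a hypothetical helper, not a tkeir function.

```python
import logging

# The five values accepted by "logging-level", mapped to standard logging levels.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def resolve_logging_level(configuration):
    """Read logger.logging-level from a parsed configuration and return the numeric level."""
    name = configuration["logger"]["logging-level"]
    try:
        return LEVELS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown logging-level: {name!r}") from None


# Usage: with "warning", only warnings, errors and critical errors are displayed.
configuration = {"logger": {"logging-level": "warning"}}
logger = logging.getLogger("relation-clustering")  # hypothetical logger name
logger.setLevel(resolve_logging_level(configuration))
```

A message is emitted only when its level is at least the configured one, so each value in the list above also lets the more severe levels through.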
Relation clustering tool
To run the command, simply type the following from the tkeir directory: