
Tokenizer

The tokenizer is a tool for tokenizing the "title" and "content" fields of tkeir documents. It runs as a REST service. Tokenization depends on an annotation model created by the tool stored in tkeir/thot/tasks/tokenizer/createAnnotationResouces.py, which creates typed compound-word lists.

Tokenizer API

Tokenizer configuration

Example configuration:

tokenizer.json
{
    "logger": {
        "logging-level": "{{ project.loglevel }}"
    },
    "tokenizers": {
        "segmenters":[{
            "language":"en",       
            "resources-base-path":"{{ project.path }}/resources/modeling/tokenizer/en",
            "mwe": "tkeir_mwe.pkl",
            "normalization-rules":"tokenizer-rules.json",
            "annotation-resources-reference":"annotation-resources.json"
        }],
        "network": {
            "host":"0.0.0.0",
            "port":10001,
            "associate-environment": {
                "host":"TOKENIZER_HOST",
                "port":"TOKENIZER_PORT"
            }
        },
        "runtime":{
            "request-max-size":100000000,
            "request-buffer-queue-size":100,
            "keep-alive":true,
            "keep-alive-timeout":500,
            "graceful-shutown-timeout":15.0,
            "request-timeout":600,
            "response-timeout":600,
            "workers":1
        }
    }
}
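The "associate-environment" block pairs the "host" and "port" fields with the environment variables TOKENIZER_HOST and TOKENIZER_PORT. The sketch below reads the network section and lets those variables override the values from the file when they are set. That override behaviour is an assumption made for illustration, and resolve_network is a hypothetical helper, not part of tkeir.

```python
import json
import os

# Trimmed copy of the "network" part of tokenizer.json.
CONFIG_TEXT = """
{
    "tokenizers": {
        "network": {
            "host": "0.0.0.0",
            "port": 10001,
            "associate-environment": {
                "host": "TOKENIZER_HOST",
                "port": "TOKENIZER_PORT"
            }
        }
    }
}
"""


def resolve_network(config: dict, environ=os.environ) -> dict:
    """Return host/port, letting the associated environment variables win when set.

    Assumption: "associate-environment" maps a network field to the name of an
    environment variable that overrides it. Illustration only, not tkeir code.
    """
    network = config["tokenizers"]["network"]
    resolved = {"host": network["host"], "port": network["port"]}
    for field, env_name in network.get("associate-environment", {}).items():
        if env_name in environ:
            value = environ[env_name]
            # Ports are integers in the JSON file; environment values are strings.
            resolved[field] = int(value) if field == "port" else value
    return resolved


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    print(resolve_network(config, environ={}))
    print(resolve_network(config, environ={"TOKENIZER_PORT": "8080"}))
```

Passing an explicit environ mapping keeps the helper easy to test; by default it reads the real process environment.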

The tokenizer configuration aggregates a network configuration and a runtime configuration (in the "tokenizers" field), a serialize configuration, and a logger configuration (at top level). The "segmenters" field is an array of entries, each tied to a language, that give the paths to the Multi-Word Expression (MWE) resources:

  • language: the language handled by the segmenter
  • resources-base-path: the path to the resources (containing the files created by the tool createAnnotationResources.py)
  • mwe: the file containing the MWE entries
  • normalization-rules: the file containing the normalization rules
  • annotation-resources-reference: the reference to the annotation resource file, needed when the tokenizer is initialized
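Each segmenter entry names its resource files next to a "resources-base-path". The sketch below builds the full path of every resource file per language, assuming (not confirmed by this page) that the file names are relative to "resources-base-path". The base path /opt/tkeir stands in for the "{{ project.path }}" template value and is hypothetical, as is the resource_paths helper.

```python
import json
from pathlib import Path

# One segmenter entry, with "{{ project.path }}" replaced by a hypothetical /opt/tkeir.
SEGMENTERS_TEXT = """
[{
    "language": "en",
    "resources-base-path": "/opt/tkeir/resources/modeling/tokenizer/en",
    "mwe": "tkeir_mwe.pkl",
    "normalization-rules": "tokenizer-rules.json",
    "annotation-resources-reference": "annotation-resources.json"
}]
"""

# Fields whose values are file names inside resources-base-path (assumption).
RESOURCE_KEYS = ("mwe", "normalization-rules", "annotation-resources-reference")


def resource_paths(segmenters: list) -> dict:
    """Map each language to the full paths of its resource files."""
    paths = {}
    for entry in segmenters:
        base = Path(entry["resources-base-path"])
        paths[entry["language"]] = {key: base / entry[key] for key in RESOURCE_KEYS}
    return paths


if __name__ == "__main__":
    for language, files in resource_paths(json.loads(SEGMENTERS_TEXT)).items():
        for key, path in files.items():
            # A real loader would check path.exists() before opening the file.
            print(language, key, path)
```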

The tokenizer accepts a rule file to select parsers (not yet implemented), fix common typos and map words (for example, British English spellings to US English spellings). The normalization rule file is a simple JSON file with the following fields:

  • parsers (NOT YET IMPLEMENTED): the available parsers (for example pyvalem to parse chemistry formulas)
  • normalization/word-mapping: word mappings, given as a list of {"from", "to"} pairs
  • normalization/typos: typo fixes
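The word-mapping entries in tokenizer-rules.json are plain {"from", "to"} pairs, so normalization can be pictured as a table lookup per token. The sketch below is an illustration, not the tokenizer's own code: it assumes "typos" uses the same {"from", "to"} shape (its format is not shown in this excerpt), applies typo fixes before word mappings (a design choice made here), and matches tokens exactly and case-sensitively. The "teh" entry is a made-up example.

```python
import json

# Small excerpt in the same shape as tokenizer-rules.json.
# Assumption: "typos" entries use the same {"from": ..., "to": ...} form.
RULES_TEXT = """
{
    "normalization": {
        "word-mapping": [
            {"from": "colour", "to": "color"},
            {"from": "centre", "to": "center"}
        ],
        "typos": [
            {"from": "teh", "to": "the"}
        ]
    }
}
"""


def load_tables(rules: dict) -> tuple:
    """Build (typos, word-mapping) lookup tables from the rules JSON."""
    normalization = rules.get("normalization", {})
    typos = {e["from"]: e["to"] for e in normalization.get("typos", [])}
    mapping = {e["from"]: e["to"] for e in normalization.get("word-mapping", [])}
    return typos, mapping


def normalize(tokens, typos, mapping):
    """Fix typos first, then map spellings; unknown tokens pass through unchanged."""
    result = []
    for token in tokens:
        token = typos.get(token, token)
        result.append(mapping.get(token, token))
    return result


if __name__ == "__main__":
    typos, mapping = load_tables(json.loads(RULES_TEXT))
    print(normalize(["teh", "colour", "of", "the", "centre"], typos, mapping))
    # -> ['the', 'color', 'of', 'the', 'center']
```

Because the lookup is exact, a capitalized token such as "Colour" is left unchanged in this sketch; a production version would need an explicit casing policy.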

tokenizer-rules.json
{
    "parsers": {
        "on-document":["texsoup"],
        "on-tokens":[ 
            {"parsers":"chemparse","max-tokens-merge":50}
        ]
    },

    "normalization": {
        "word-mapping" : [
            {"from":"accessorise", "to":"accessorize"},
            {"from":"accessorised", "to":"accessorized"},
            {"from":"accessorises", "to":"accessorizes"},
            {"from":"accessorising", "to":"accessorizing"},
            {"from":"acclimatisation", "to":"acclimatization"},
            {"from":"acclimatise", "to":"acclimatize"},
            {"from":"acclimatised", "to":"acclimatized"},
            {"from":"acclimatises", "to":"acclimatizes"},
            {"from":"acclimatising", "to":"acclimatizing"},
            {"from":"accoutrements", "to":"accouterments"},
            {"from":"aeon", "to":"eon"},
            {"from":"aeons", "to":"eons"},
            {"from":"aerogramme", "to":"aerogram"},
            {"from":"aerogrammes", "to":"aerograms"},
            {"from":"aeroplane", "to":"airplane"},
            {"from":"aeroplanes", "to":"airplanes"},
            {"from":"aesthete", "to":"esthete"},
            {"from":"aesthetes", "to":"esthetes"},
            {"from":"aesthetic", "to":"esthetic"},
            {"from":"aesthetically", "to":"esthetically"},
            {"from":"aesthetics", "to":"esthetics"},
            {"from":"aetiology", "to":"etiology"},
            {"from":"ageing", "to":"aging"},
            {"from":"aggrandisement", "to":"aggrandizement"},
            {"from":"agonise", "to":"agonize"},
            {"from":"agonised", "to":"agonized"},
            {"from":"agonises", "to":"agonizes"},
            {"from":"agonising", "to":"agonizing"},
            {"from":"agonisingly", "to":"agonizingly"},
            {"from":"almanack", "to":"almanac"},
            {"from":"almanacks", "to":"almanacs"},
            {"from":"aluminium", "to":"aluminum"},
            {"from":"amortisable", "to":"amortizable"},
            {"from":"amortisation", "to":"amortization"},
            {"from":"amortisations", "to":"amortizations"},
            {"from":"amortise", "to":"amortize"},
            {"from":"amortised", "to":"amortized"},
            {"from":"amortises", "to":"amortizes"},
            {"from":"amortising", "to":"amortizing"},
            {"from":"amphitheatre", "to":"amphitheater"},
            {"from":"amphitheatres", "to":"amphitheaters"},
            {"from":"anaemia", "to":"anemia"},
            {"from":"anaemic", "to":"anemic"},
            {"from":"anaesthesia", "to":"anesthesia"},
            {"from":"anaesthetic", "to":"anesthetic"},
            {"from":"anaesthetics", "to":"anesthetics"},
            {"from":"anaesthetise", "to":"anesthetize"},
            {"from":"anaesthetised", "to":"anesthetized"},
            {"from":"anaesthetises", "to":"anesthetizes"},
            {"from":"anaesthetising", "to":"anesthetizing"},
            {"from":"anaesthetist", "to":"anesthetist"},
            {"from":"anaesthetists", "to":"anesthetists"},
            {"from":"anaesthetize", "to":"anesthetize"},
            {"from":"anaesthetized", "to":"anesthetized"},
            {"from":"anaesthetizes", "to":"anesthetizes"},
            {"from":"anaesthetizing", "to":"anesthetizing"},
            {"from":"analogue", "to":"analog"},
            {"from":"analogues", "to":"analogs"},
            {"from":"analyse", "to":"analyze"},
            {"from":"analysed", "to":"analyzed"},
            {"from":"analyses", "to":"analyzes"},
            {"from":"analysing", "to":"analyzing"},
            {"from":"anglicise", "to":"anglicize"},
            {"from":"anglicised", "to":"anglicized"},
            {"from":"anglicises", "to":"anglicizes"},
            {"from":"anglicising", "to":"anglicizing"},
            {"from":"annualised", "to":"annualized"},
            {"from":"antagonise", "to":"antagonize"},
            {"from":"antagonised", "to":"antagonized"},
            {"from":"antagonises", "to":"antagonizes"},
            {"from":"antagonising", "to":"antagonizing"},
            {"from":"apologise", "to":"apologize"},
            {"from":"apologised", "to":"apologized"},
            {"from":"apologises", "to":"apologizes"},
            {"from":"apologising", "to":"apologizing"},
            {"from":"appal", "to":"appall"},
            {"from":"appals", "to":"appalls"},
            {"from":"appetiser", "to":"appetizer"},
            {"from":"appetisers", "to":"appetizers"},
            {"from":"appetising", "to":"appetizing"},
            {"from":"appetisingly", "to":"appetizingly"},
            {"from":"arbour", "to":"arbor"},
            {"from":"arbours", "to":"arbors"},
            {"from":"archaeological", "to":"archeological"},
            {"from":"archaeologically", "to":"archeologically"},
            {"from":"archaeologist", "to":"archeologist"},
            {"from":"archaeologists", "to":"archeologists"},
            {"from":"archaeology", "to":"archeology"},
            {"from":"ardour", "to":"ardor"},
            {"from":"armour", "to":"armor"},
            {"from":"armoured", "to":"armored"},
            {"from":"armourer", "to":"armorer"},
            {"from":"armourers", "to":"armorers"},
            {"from":"armouries", "to":"armories"},
            {"from":"armoury", "to":"armory"},
            {"from":"artefact", "to":"artifact"},
            {"from":"artefacts", "to":"artifacts"},
            {"from":"authorise", "to":"authorize"},
            {"from":"authorised", "to":"authorized"},
            {"from":"authorises", "to":"authorizes"},
            {"from":"authorising", "to":"authorizing"},
            {"from":"axe", "to":"ax"},
            {"from":"backpedalled", "to":"backpedaled"},
            {"from":"backpedalling", "to":"backpedaling"},
            {"from":"bannister", "to":"banister"},
            {"from":"bannisters", "to":"banisters"},
            {"from":"baptise", "to":"baptize"},
            {"from":"baptised", "to":"baptized"},
            {"from":"baptises", "to":"baptizes"},
            {"from":"baptising", "to":"baptizing"},
            {"from":"bastardise", "to":"bastardize"},
            {"from":"bastardised", "to":"bastardized"},
            {"from":"bastardises", "to":"bastardizes"},
            {"from":"bastardising", "to":"bastardizing"},
            {"from":"battleaxe", "to":"battleax"},
            {"from":"baulk", "to":"balk"},
            {"from":"baulked", "to":"balked"},
            {"from":"baulking", "to":"balking"},
            {"from":"baulks", "to":"balks"},
            {"from":"bedevilled", "to":"bedeviled"},
            {"from":"bedevilling", "to":"bedeviling"},
            {"from":"behaviour", "to":"behavior"},
            {"from":"behavioural", "to":"behavioral"},
            {"from":"behaviourism", "to":"behaviorism"},
            {"from":"behaviourist", "to":"behaviorist"},
            {"from":"behaviourists", "to":"behaviorists"},
            {"from":"behaviours", "to":"behaviors"},
            {"from":"behove", "to":"behoove"},
            {"from":"behoved", "to":"behooved"},
            {"from":"behoves", "to":"behooves"},
            {"from":"bejewelled", "to":"bejeweled"},
            {"from":"belabour", "to":"belabor"},
            {"from":"belaboured", "to":"belabored"},
            {"from":"belabouring", "to":"belaboring"},
            {"from":"belabours", "to":"belabors"},
            {"from":"bevelled", "to":"beveled"},
            {"from":"bevvies", "to":"bevies"},
            {"from":"bevvy", "to":"bevy"},
            {"from":"biassed", "to":"biased"},
            {"from":"biassing", "to":"biasing"},
            {"from":"bingeing", "to":"binging"},
            {"from":"bougainvillaea", "to":"bougainvillea"},
            {"from":"bougainvillaeas", "to":"bougainvilleas"},
            {"from":"bowdlerise", "to":"bowdlerize"},
            {"from":"bowdlerised", "to":"bowdlerized"},
            {"from":"bowdlerises", "to":"bowdlerizes"},
            {"from":"bowdlerising", "to":"bowdlerizing"},
            {"from":"breathalyse", "to":"breathalyze"},
            {"from":"breathalysed", "to":"breathalyzed"},
            {"from":"breathalyser", "to":"breathalyzer"},
            {"from":"breathalysers", "to":"breathalyzers"},
            {"from":"breathalyses", "to":"breathalyzes"},
            {"from":"breathalysing", "to":"breathalyzing"},
            {"from":"brutalise", "to":"brutalize"},
            {"from":"brutalised", "to":"brutalized"},
            {"from":"brutalises", "to":"brutalizes"},
            {"from":"brutalising", "to":"brutalizing"},
            {"from":"buses", "to":"busses"},
            {"from":"busing", "to":"bussing"},
            {"from":"caesarean", "to":"cesarean"},
            {"from":"caesareans", "to":"cesareans"},
            {"from":"calibre", "to":"caliber"},
            {"from":"calibres", "to":"calibers"},
            {"from":"calliper", "to":"caliper"},
            {"from":"callipers", "to":"calipers"},
            {"from":"callisthenics", "to":"calisthenics"},
            {"from":"canalise", "to":"canalize"},
            {"from":"canalised", "to":"canalized"},
            {"from":"canalises", "to":"canalizes"},
            {"from":"canalising", "to":"canalizing"},
            {"from":"cancellation", "to":"cancelation"},
            {"from":"cancellations", "to":"cancelations"},
            {"from":"cancelled", "to":"canceled"},
            {"from":"cancelling", "to":"canceling"},
            {"from":"candour", "to":"candor"},
            {"from":"cannibalise", "to":"cannibalize"},
            {"from":"cannibalised", "to":"cannibalized"},
            {"from":"cannibalises", "to":"cannibalizes"},
            {"from":"cannibalising", "to":"cannibalizing"},
            {"from":"canonise", "to":"canonize"},
            {"from":"canonised", "to":"canonized"},
            {"from":"canonises", "to":"canonizes"},
            {"from":"canonising", "to":"canonizing"},
            {"from":"capitalise", "to":"capitalize"},
            {"from":"capitalised", "to":"capitalized"},
            {"from":"capitalises", "to":"capitalizes"},
            {"from":"capitalising", "to":"capitalizing"},
            {"from":"caramelise", "to":"caramelize"},
            {"from":"caramelised", "to":"caramelized"},
            {"from":"caramelises", "to":"caramelizes"},
            {"from":"caramelising", "to":"caramelizing"},
            {"from":"carbonise", "to":"carbonize"},
            {"from":"carbonised", "to":"carbonized"},
            {"from":"carbonises", "to":"carbonizes"},
            {"from":"carbonising", "to":"carbonizing"},
            {"from":"carolled", "to":"caroled"},
            {"from":"carolling", "to":"caroling"},
            {"from":"catalogue", "to":"catalog"},
            {"from":"catalogued", "to":"cataloged"},
            {"from":"catalogues", "to":"catalogs"},
            {"from":"cataloguing", "to":"cataloging"},
            {"from":"catalyse", "to":"catalyze"},
            {"from":"catalysed", "to":"catalyzed"},
            {"from":"catalyses", "to":"catalyzes"},
            {"from":"catalysing", "to":"catalyzing"},
            {"from":"categorise", "to":"categorize"},
            {"from":"categorised", "to":"categorized"},
            {"from":"categorises", "to":"categorizes"},
            {"from":"categorising", "to":"categorizing"},
            {"from":"cauterise", "to":"cauterize"},
            {"from":"cauterised", "to":"cauterized"},
            {"from":"cauterises", "to":"cauterizes"},
            {"from":"cauterising", "to":"cauterizing"},
            {"from":"cavilled", "to":"caviled"},
            {"from":"cavilling", "to":"caviling"},
            {"from":"centigramme", "to":"centigram"},
            {"from":"centigrammes", "to":"centigrams"},
            {"from":"centilitre", "to":"centiliter"},
            {"from":"centilitres", "to":"centiliters"},
            {"from":"centimetre", "to":"centimeter"},
            {"from":"centimetres", "to":"centimeters"},
            {"from":"centralise", "to":"centralize"},
            {"from":"centralised", "to":"centralized"},
            {"from":"centralises", "to":"centralizes"},
            {"from":"centralising", "to":"centralizing"},
            {"from":"centre", "to":"center"},
            {"from":"centred", "to":"centered"},
            {"from":"centrefold", "to":"centerfold"},
            {"from":"centrefolds", "to":"centerfolds"},
            {"from":"centrepiece", "to":"centerpiece"},
            {"from":"centrepieces", "to":"centerpieces"},
            {"from":"centres", "to":"centers"},
            {"from":"channelled", "to":"channeled"},
            {"from":"channelling", "to":"channeling"},
            {"from":"characterise", "to":"characterize"},
            {"from":"characterised", "to":"characterized"},
            {"from":"characterises", "to":"characterizes"},
            {"from":"characterising", "to":"characterizing"},
            {"from":"cheque", "to":"check"},
            {"from":"chequebook", "to":"checkbook"},
            {"from":"chequebooks", "to":"checkbooks"},
            {"from":"chequered", "to":"checkered"},
            {"from":"cheques", "to":"checks"},
            {"from":"chilli", "to":"chili"},
            {"from":"chimaera", "to":"chimera"},
            {"from":"chimaeras", "to":"chimeras"},
            {"from":"chiselled", "to":"chiseled"},
            {"from":"chiselling", "to":"chiseling"},
            {"from":"circularise", "to":"circularize"},
            {"from":"circularised", "to":"circularized"},
            {"from":"circularises", "to":"circularizes"},
            {"from":"circularising", "to":"circularizing"},
            {"from":"civilise", "to":"civilize"},
            {"from":"civilised", "to":"civilized"},
            {"from":"civilises", "to":"civilizes"},
            {"from":"civilising", "to":"civilizing"},
            {"from":"clamour", "to":"clamor"},
            {"from":"clamoured", "to":"clamored"},
            {"from":"clamouring", "to":"clamoring"},
            {"from":"clamours", "to":"clamors"},
            {"from":"clangour", "to":"clangor"},
            {"from":"clarinettist", "to":"clarinetist"},
            {"from":"clarinettists", "to":"clarinetists"},
            {"from":"collectivise", "to":"collectivize"},
            {"from":"collectivised", "to":"collectivized"},
            {"from":"collectivises", "to":"collectivizes"},
            {"from":"collectivising", "to":"collectivizing"},
            {"from":"colonisation", "to":"colonization"},
            {"from":"colonise", "to":"colonize"},
            {"from":"colonised", "to":"colonized"},
            {"from":"coloniser", "to":"colonizer"},
            {"from":"colonisers", "to":"colonizers"},
            {"from":"colonises", "to":"colonizes"},
            {"from":"colonising", "to":"colonizing"},
            {"from":"colour", "to":"color"},
            {"from":"colourant", "to":"colorant"},
            {"from":"colourants", "to":"colorants"},
            {"from":"coloured", "to":"colored"},
            {"from":"coloureds", "to":"coloreds"},
            {"from":"colourful", "to":"colorful"},
            {"from":"colourfully", "to":"colorfully"},
            {"from":"colouring", "to":"coloring"},
            {"from":"colourize", "to":"colorize"},
            {"from":"colourized", "to":"colorized"},
            {"from":"colourizes", "to":"colorizes"},
            {"from":"colourizing", "to":"colorizing"},
            {"from":"colourless", "to":"colorless"},
            {"from":"colours", "to":"colors"},
            {"from":"commercialise", "to":"commercialize"},
            {"from":"commercialised", "to":"commercialized"},
            {"from":"commercialises", "to":"commercializes"},
            {"from":"commercialising", "to":"commercializing"},
            {"from":"compartmentalise", "to":"compartmentalize"},
            {"from":"compartmentalised", "to":"compartmentalized"},
            {"from":"compartmentalises", "to":"compartmentalizes"},
            {"from":"compartmentalising", "to":"compartmentalizing"},
            {"from":"computerise", "to":"computerize"},
            {"from":"computerised", "to":"computerized"},
            {"from":"computerises", "to":"computerizes"},
            {"from":"computerising", "to":"computerizing"},
            {"from":"conceptualise", "to":"conceptualize"},
            {"from":"conceptualised", "to":"conceptualized"},
            {"from":"conceptualises", "to":"conceptualizes"},
            {"from":"conceptualising", "to":"conceptualizing"},
            {"from":"connexion", "to":"connection"},
            {"from":"connexions", "to":"connections"},
            {"from":"contextualise", "to":"contextualize"},
            {"from":"contextualised", "to":"contextualized"},
            {"from":"contextualises", "to":"contextualizes"},
            {"from":"contextualising", "to":"contextualizing"},
            {"from":"cosier", "to":"cozier"},
            {"from":"cosies", "to":"cozies"},
            {"from":"cosiest", "to":"coziest"},
            {"from":"cosily", "to":"cozily"},
            {"from":"cosiness", "to":"coziness"},
            {"from":"cosy", "to":"cozy"},
            {"from":"councillor", "to":"councilor"},
            {"from":"councillors", "to":"councilors"},
            {"from":"counselled", "to":"counseled"},
            {"from":"counselling", "to":"counseling"},
            {"from":"counsellor", "to":"counselor"},
            {"from":"counsellors", "to":"counselors"},
            {"from":"crenellated", "to":"crenelated"},
            {"from":"criminalise", "to":"criminalize"},
            {"from":"criminalised", "to":"criminalized"},
            {"from":"criminalises", "to":"criminalizes"},
            {"from":"criminalising", "to":"criminalizing"},
            {"from":"criticise", "to":"criticize"},
            {"from":"criticised", "to":"criticized"},
            {"from":"criticises", "to":"criticizes"},
            {"from":"criticising", "to":"criticizing"},
            {"from":"crueller", "to":"crueler"},
            {"from":"cruellest", "to":"cruelest"},
            {"from":"crystallisation", "to":"crystallization"},
            {"from":"crystallise", "to":"crystallize"},
            {"from":"crystallised", "to":"crystallized"},
            {"from":"crystallises", "to":"crystallizes"},
            {"from":"crystallising", "to":"crystallizing"},
            {"from":"cudgelled", "to":"cudgeled"},
            {"from":"cudgelling", "to":"cudgeling"},
            {"from":"customise", "to":"customize"},
            {"from":"customised", "to":"customized"},
            {"from":"customises", "to":"customizes"},
            {"from":"customising", "to":"customizing"},
            {"from":"cypher", "to":"cipher"},
            {"from":"cyphers", "to":"ciphers"},
            {"from":"decentralisation", "to":"decentralization"},
            {"from":"decentralise", "to":"decentralize"},
            {"from":"decentralised", "to":"decentralized"},
            {"from":"decentralises", "to":"decentralizes"},
            {"from":"decentralising", "to":"decentralizing"},
            {"from":"decriminalisation", "to":"decriminalization"},
            {"from":"decriminalise", "to":"decriminalize"},
            {"from":"decriminalised", "to":"decriminalized"},
            {"from":"decriminalises", "to":"decriminalizes"},
            {"from":"decriminalising", "to":"decriminalizing"},
            {"from":"defence", "to":"defense"},
            {"from":"defenceless", "to":"defenseless"},
            {"from":"defences", "to":"defenses"},
            {"from":"dehumanisation", "to":"dehumanization"},
            {"from":"dehumanise", "to":"dehumanize"},
            {"from":"dehumanised", "to":"dehumanized"},
            {"from":"dehumanises", "to":"dehumanizes"},
            {"from":"dehumanising", "to":"dehumanizing"},
            {"from":"demeanour", "to":"demeanor"},
            {"from":"demilitarisation", "to":"demilitarization"},
            {"from":"demilitarise", "to":"demilitarize"},
            {"from":"demilitarised", "to":"demilitarized"},
            {"from":"demilitarises", "to":"demilitarizes"},
            {"from":"demilitarising", "to":"demilitarizing"},
            {"from":"demobilisation", "to":"demobilization"},
            {"from":"demobilise", "to":"demobilize"},
            {"from":"demobilised", "to":"demobilized"},
            {"from":"demobilises", "to":"demobilizes"},
            {"from":"demobilising", "to":"demobilizing"},
            {"from":"democratisation", "to":"democratization"},
            {"from":"democratise", "to":"democratize"},
            {"from":"democratised", "to":"democratized"},
            {"from":"democratises", "to":"democratizes"},
            {"from":"democratising", "to":"democratizing"},
            {"from":"demonise", "to":"demonize"},
            {"from":"demonised", "to":"demonized"},
            {"from":"demonises", "to":"demonizes"},
            {"from":"demonising", "to":"demonizing"},
            {"from":"demoralisation", "to":"demoralization"},
            {"from":"demoralise", "to":"demoralize"},
            {"from":"demoralised", "to":"demoralized"},
            {"from":"demoralises", "to":"demoralizes"},
            {"from":"demoralising", "to":"demoralizing"},
            {"from":"denationalisation", "to":"denationalization"},
            {"from":"denationalise", "to":"denationalize"},
            {"from":"denationalised", "to":"denationalized"},
            {"from":"denationalises", "to":"denationalizes"},
            {"from":"denationalising", "to":"denationalizing"},
            {"from":"deodorise", "to":"deodorize"},
            {"from":"deodorised", "to":"deodorized"},
            {"from":"deodorises", "to":"deodorizes"},
            {"from":"deodorising", "to":"deodorizing"},
            {"from":"depersonalise", "to":"depersonalize"},
            {"from":"depersonalised", "to":"depersonalized"},
            {"from":"depersonalises", "to":"depersonalizes"},
            {"from":"depersonalising", "to":"depersonalizing"},
            {"from":"deputise", "to":"deputize"},
            {"from":"deputised", "to":"deputized"},
            {"from":"deputises", "to":"deputizes"},
            {"from":"deputising", "to":"deputizing"},
            {"from":"desensitisation", "to":"desensitization"},
            {"from":"desensitise", "to":"desensitize"},
            {"from":"desensitised", "to":"desensitized"},
            {"from":"desensitises", "to":"desensitizes"},
            {"from":"desensitising", "to":"desensitizing"},
            {"from":"destabilisation", "to":"destabilization"},
            {"from":"destabilise", "to":"destabilize"},
            {"from":"destabilised", "to":"destabilized"},
            {"from":"destabilises", "to":"destabilizes"},
            {"from":"destabilising", "to":"destabilizing"},
            {"from":"dialled", "to":"dialed"},
            {"from":"dialling", "to":"dialing"},
            {"from":"dialogue", "to":"dialog"},
            {"from":"dialogues", "to":"dialogs"},
            {"from":"diarrhoea", "to":"diarrhea"},
            {"from":"digitise", "to":"digitize"},
            {"from":"digitised", "to":"digitized"},
            {"from":"digitises", "to":"digitizes"},
            {"from":"digitising", "to":"digitizing"},
            {"from":"disc", "to":"disk"},
            {"from":"discolour", "to":"discolor"},
            {"from":"discoloured", "to":"discolored"},
            {"from":"discolouring", "to":"discoloring"},
            {"from":"discolours", "to":"discolors"},
            {"from":"discs", "to":"disks"},
            {"from":"disembowelled", "to":"disemboweled"},
            {"from":"disembowelling", "to":"disemboweling"},
            {"from":"disfavour", "to":"disfavor"},
            {"from":"dishevelled", "to":"disheveled"},
            {"from":"dishonour", "to":"dishonor"},
            {"from":"dishonourable", "to":"dishonorable"},
            {"from":"dishonourably", "to":"dishonorably"},
            {"from":"dishonoured", "to":"dishonored"},
            {"from":"dishonouring", "to":"dishonoring"},
            {"from":"dishonours", "to":"dishonors"},
            {"from":"disorganisation", "to":"disorganization"},
            {"from":"disorganised", "to":"disorganized"},
            {"from":"distil", "to":"distill"},
            {"from":"distils", "to":"distills"},
            {"from":"dramatisation", "to":"dramatization"},
            {"from":"dramatisations", "to":"dramatizations"},
            {"from":"dramatise", "to":"dramatize"},
            {"from":"dramatised", "to":"dramatized"},
            {"from":"dramatises", "to":"dramatizes"},
            {"from":"dramatising", "to":"dramatizing"},
            {"from":"draught", "to":"draft"},
            {"from":"draughtboard", "to":"draftboard"},
            {"from":"draughtboards", "to":"draftboards"},
            {"from":"draughtier", "to":"draftier"},
            {"from":"draughtiest", "to":"draftiest"},
            {"from":"draughts", "to":"drafts"},
            {"from":"draughtsman", "to":"draftsman"},
            {"from":"draughtsmanship", "to":"draftsmanship"},
            {"from":"draughtsmen", "to":"draftsmen"},
            {"from":"draughtswoman", "to":"draftswoman"},
            {"from":"draughtswomen", "to":"draftswomen"},
            {"from":"draughty", "to":"drafty"},
            {"from":"drivelled", "to":"driveled"},
            {"from":"drivelling", "to":"driveling"},
            {"from":"duelled", "to":"dueled"},
            {"from":"duelling", "to":"dueling"},
            {"from":"economise", "to":"economize"},
            {"from":"economised", "to":"economized"},
            {"from":"economises", "to":"economizes"},
            {"from":"economising", "to":"economizing"},
            {"from":"oedema", "to":"edema"},
            {"from":"editorialise", "to":"editorialize"},
            {"from":"editorialised", "to":"editorialized"},
            {"from":"editorialises", "to":"editorializes"},
            {"from":"editorialising", "to":"editorializing"},
            {"from":"empathise", "to":"empathize"},
            {"from":"empathised", "to":"empathized"},
            {"from":"empathises", "to":"empathizes"},
            {"from":"empathising", "to":"empathizing"},
            {"from":"emphasise", "to":"emphasize"},
            {"from":"emphasised", "to":"emphasized"},
            {"from":"emphasises", "to":"emphasizes"},
            {"from":"emphasising", "to":"emphasizing"},
            {"from":"enamelled", "to":"enameled"},
            {"from":"enamelling", "to":"enameling"},
            {"from":"enamoured", "to":"enamored"},
            {"from":"encyclopaedia", "to":"encyclopedia"},
            {"from":"encyclopaedias", "to":"encyclopedias"},
            {"from":"encyclopaedic", "to":"encyclopedic"},
            {"from":"endeavour", "to":"endeavor"},
            {"from":"endeavoured", "to":"endeavored"},
            {"from":"endeavouring", "to":"endeavoring"},
            {"from":"endeavours", "to":"endeavors"},
            {"from":"energise", "to":"energize"},
            {"from":"energised", "to":"energized"},
            {"from":"energises", "to":"energizes"},
            {"from":"energising", "to":"energizing"},
            {"from":"enrol", "to":"enroll"},
            {"from":"enrols", "to":"enrolls"},
            {"from":"enthral", "to":"enthrall"},
            {"from":"enthrals", "to":"enthralls"},
            {"from":"epaulette", "to":"epaulet"},
            {"from":"epaulettes", "to":"epaulets"},
            {"from":"epicentre", "to":"epicenter"},
            {"from":"epicentres", "to":"epicenters"},
            {"from":"epilogue", "to":"epilog"},
            {"from":"epilogues", "to":"epilogs"},
            {"from":"epitomise", "to":"epitomize"},
            {"from":"epitomised", "to":"epitomized"},
            {"from":"epitomises", "to":"epitomizes"},
            {"from":"epitomising", "to":"epitomizing"},
            {"from":"equalisation", "to":"equalization"},
            {"from":"equalise", "to":"equalize"},
            {"from":"equalised", "to":"equalized"},
            {"from":"equaliser", "to":"equalizer"},
            {"from":"equalisers", "to":"equalizers"},
            {"from":"equalises", "to":"equalizes"},
            {"from":"equalising", "to":"equalizing"},
            {"from":"eulogise", "to":"eulogize"},
            {"from":"eulogised", "to":"eulogized"},
            {"from":"eulogises", "to":"eulogizes"},
            {"from":"eulogising", "to":"eulogizing"},
            {"from":"evangelise", "to":"evangelize"},
            {"from":"evangelised", "to":"evangelized"},
            {"from":"evangelises", "to":"evangelizes"},
            {"from":"evangelising", "to":"evangelizing"},
            {"from":"exorcise", "to":"exorcize"},
            {"from":"exorcised", "to":"exorcized"},
            {"from":"exorcises", "to":"exorcizes"},
            {"from":"exorcising", "to":"exorcizing"},
            {"from":"extemporisation", "to":"extemporization"},
            {"from":"extemporise", "to":"extemporize"},
            {"from":"extemporised", "to":"extemporized"},
            {"from":"extemporises", "to":"extemporizes"},
            {"from":"extemporising", "to":"extemporizing"},
            {"from":"externalisation", "to":"externalization"},
            {"from":"externalisations", "to":"externalizations"},
            {"from":"externalise", "to":"externalize"},
            {"from":"externalised", "to":"externalized"},
            {"from":"externalises", "to":"externalizes"},
            {"from":"externalising", "to":"externalizing"},
            {"from":"factorise", "to":"factorize"},
            {"from":"factorised", "to":"factorized"},
            {"from":"factorises", "to":"factorizes"},
            {"from":"factorising", "to":"factorizing"},
            {"from":"faecal", "to":"fecal"},
            {"from":"faeces", "to":"feces"},
            {"from":"familiarisation", "to":"familiarization"},
            {"from":"familiarise", "to":"familiarize"},
            {"from":"familiarised", "to":"familiarized"},
            {"from":"familiarises", "to":"familiarizes"},
            {"from":"familiarising", "to":"familiarizing"},
            {"from":"fantasise", "to":"fantasize"},
            {"from":"fantasised", "to":"fantasized"},
            {"from":"fantasises", "to":"fantasizes"},
            {"from":"fantasising", "to":"fantasizing"},
            {"from":"favour", "to":"favor"},
            {"from":"favourable", "to":"favorable"},
            {"from":"favourably", "to":"favorably"},
            {"from":"favoured", "to":"favored"},
            {"from":"favouring", "to":"favoring"},
            {"from":"favourite", "to":"favorite"},
            {"from":"favourites", "to":"favorites"},
            {"from":"favouritism", "to":"favoritism"},
            {"from":"favours", "to":"favors"},
            {"from":"feminise", "to":"feminize"},
            {"from":"feminised", "to":"feminized"},
            {"from":"feminises", "to":"feminizes"},
            {"from":"feminising", "to":"feminizing"},
            {"from":"fertilisation", "to":"fertilization"},
            {"from":"fertilise", "to":"fertilize"},
            {"from":"fertilised", "to":"fertilized"},
            {"from":"fertiliser", "to":"fertilizer"},
            {"from":"fertilisers", "to":"fertilizers"},
            {"from":"fertilises", "to":"fertilizes"},
            {"from":"fertilising", "to":"fertilizing"},
            {"from":"fervour", "to":"fervor"},
            {"from":"fibre", "to":"fiber"},
            {"from":"fibreglass", "to":"fiberglass"},
            {"from":"fibres", "to":"fibers"},
            {"from":"fictionalisation", "to":"fictionalization"},
            {"from":"fictionalisations", "to":"fictionalizations"},
            {"from":"fictionalise", "to":"fictionalize"},
            {"from":"fictionalised", "to":"fictionalized"},
            {"from":"fictionalises", "to":"fictionalizes"},
            {"from":"fictionalising", "to":"fictionalizing"},
            {"from":"fillet", "to":"filet"},
            {"from":"filleted", "to":"fileted"},
            {"from":"filleting", "to":"fileting"},
            {"from":"fillets", "to":"filets"},
            {"from":"finalisation", "to":"finalization"},
            {"from":"finalise", "to":"finalize"},
            {"from":"finalised", "to":"finalized"},
            {"from":"finalises", "to":"finalizes"},
            {"from":"finalising", "to":"finalizing"},
            {"from":"flautist", "to":"flutist"},
            {"from":"flautists", "to":"flutists"},
            {"from":"flavour", "to":"flavor"},
            {"from":"flavoured", "to":"flavored"},
            {"from":"flavouring", "to":"flavoring"},
            {"from":"flavourings", "to":"flavorings"},
            {"from":"flavourless", "to":"flavorless"},
            {"from":"flavours", "to":"flavors"},
            {"from":"flavoursome", "to":"flavorsome"},
            {"from":"flyer / flier", "to":"flier / flyer"},
            {"from":"foetal", "to":"fetal"},
            {"from":"foetid", "to":"fetid"},
            {"from":"foetus", "to":"fetus"},
            {"from":"foetuses", "to":"fetuses"},
            {"from":"formalisation", "to":"formalization"},
            {"from":"formalise", "to":"formalize"},
            {"from":"formalised", "to":"formalized"},
            {"from":"formalises", "to":"formalizes"},
            {"from":"formalising", "to":"formalizing"},
            {"from":"fossilisation", "to":"fossilization"},
            {"from":"fossilise", "to":"fossilize"},
            {"from":"fossilised", "to":"fossilized"},
            {"from":"fossilises", "to":"fossilizes"},
            {"from":"fossilising", "to":"fossilizing"},
            {"from":"fraternisation", "to":"fraternization"},
            {"from":"fraternise", "to":"fraternize"},
            {"from":"fraternised", "to":"fraternized"},
            {"from":"fraternises", "to":"fraternizes"},
            {"from":"fraternising", "to":"fraternizing"},
            {"from":"fulfil", "to":"fulfill"},
            {"from":"fulfilment", "to":"fulfillment"},
            {"from":"fulfils", "to":"fulfills"},
            {"from":"funnelled", "to":"funneled"},
            {"from":"funnelling", "to":"funneling"},
            {"from":"galvanise", "to":"galvanize"},
            {"from":"galvanised", "to":"galvanized"},
            {"from":"galvanises", "to":"galvanizes"},
            {"from":"galvanising", "to":"galvanizing"},
            {"from":"gambolled", "to":"gamboled"},
            {"from":"gambolling", "to":"gamboling"},
            {"from":"gaol", "to":"jail"},
            {"from":"gaolbird", "to":"jailbird"},
            {"from":"gaolbirds", "to":"jailbirds"},
            {"from":"gaolbreak", "to":"jailbreak"},
            {"from":"gaolbreaks", "to":"jailbreaks"},
            {"from":"gaoled", "to":"jailed"},
            {"from":"gaoler", "to":"jailer"},
            {"from":"gaolers", "to":"jailers"},
            {"from":"gaoling", "to":"jailing"},
            {"from":"gaols", "to":"jails"},
            {"from":"gases", "to":"gasses"},
            {"from":"gauge", "to":"gage"},
            {"from":"gauged", "to":"gaged"},
            {"from":"gauges", "to":"gages"},
            {"from":"gauging", "to":"gaging"},
            {"from":"generalisation", "to":"generalization"},
            {"from":"generalisations", "to":"generalizations"},
            {"from":"generalise", "to":"generalize"},
            {"from":"generalised", "to":"generalized"},
            {"from":"generalises", "to":"generalizes"},
            {"from":"generalising", "to":"generalizing"},
            {"from":"ghettoise", "to":"ghettoize"},
            {"from":"ghettoised", "to":"ghettoized"},
            {"from":"ghettoises", "to":"ghettoizes"},
            {"from":"ghettoising", "to":"ghettoizing"},
            {"from":"gipsies", "to":"gypsies"},
            {"from":"glamorise", "to":"glamorize"},
            {"from":"glamorised", "to":"glamorized"},
            {"from":"glamorises", "to":"glamorizes"},
            {"from":"glamorising", "to":"glamorizing"},
            {"from":"glamour", "to":"glamor"},
            {"from":"globalisation", "to":"globalization"},
            {"from":"globalise", "to":"globalize"},
            {"from":"globalised", "to":"globalized"},
            {"from":"globalises", "to":"globalizes"},
            {"from":"globalising", "to":"globalizing"},
            {"from":"glueing", "to":"gluing"},
            {"from":"goitre", "to":"goiter"},
            {"from":"goitres", "to":"goiters"},
            {"from":"gonorrhoea", "to":"gonorrhea"},
            {"from":"gramme", "to":"gram"},
            {"from":"grammes", "to":"grams"},
            {"from":"gravelled", "to":"graveled"},
            {"from":"grey", "to":"gray"},
            {"from":"greyed", "to":"grayed"},
            {"from":"greying", "to":"graying"},
            {"from":"greyish", "to":"grayish"},
            {"from":"greyness", "to":"grayness"},
            {"from":"greys", "to":"grays"},
            {"from":"grovelled", "to":"groveled"},
            {"from":"grovelling", "to":"groveling"},
            {"from":"groyne", "to":"groin"},
            {"from":"groynes", "to":"groins"},
            {"from":"gruelling", "to":"grueling"},
            {"from":"gruellingly", "to":"gruelingly"},
            {"from":"gryphon", "to":"griffin"},
            {"from":"gryphons", "to":"griffins"},
            {"from":"gynaecological", "to":"gynecological"},
            {"from":"gynaecologist", "to":"gynecologist"},
            {"from":"gynaecologists", "to":"gynecologists"},
            {"from":"gynaecology", "to":"gynecology"},
            {"from":"haematological", "to":"hematological"},
            {"from":"haematologist", "to":"hematologist"},
            {"from":"haematologists", "to":"hematologists"},
            {"from":"haematology", "to":"hematology"},
            {"from":"haemoglobin", "to":"hemoglobin"},
            {"from":"haemophilia", "to":"hemophilia"},
            {"from":"haemophiliac", "to":"hemophiliac"},
            {"from":"haemophiliacs", "to":"hemophiliacs"},
            {"from":"haemorrhage", "to":"hemorrhage"},
            {"from":"haemorrhaged", "to":"hemorrhaged"},
            {"from":"haemorrhages", "to":"hemorrhages"},
            {"from":"haemorrhaging", "to":"hemorrhaging"},
            {"from":"haemorrhoids", "to":"hemorrhoids"},
            {"from":"harbour", "to":"harbor"},
            {"from":"harboured", "to":"harbored"},
            {"from":"harbouring", "to":"harboring"},
            {"from":"harbours", "to":"harbors"},
            {"from":"harmonisation", "to":"harmonization"},
            {"from":"harmonise", "to":"harmonize"},
            {"from":"harmonised", "to":"harmonized"},
            {"from":"harmonises", "to":"harmonizes"},
            {"from":"harmonising", "to":"harmonizing"},
            {"from":"homoeopath", "to":"homeopath"},
            {"from":"homoeopathic", "to":"homeopathic"},
            {"from":"homoeopaths", "to":"homeopaths"},
            {"from":"homoeopathy", "to":"homeopathy"},
            {"from":"homogenise", "to":"homogenize"},
            {"from":"homogenised", "to":"homogenized"},
            {"from":"homogenises", "to":"homogenizes"},
            {"from":"homogenising", "to":"homogenizing"},
            {"from":"honour", "to":"honor"},
            {"from":"honourable", "to":"honorable"},
            {"from":"honourably", "to":"honorably"},
            {"from":"honoured", "to":"honored"},
            {"from":"honouring", "to":"honoring"},
            {"from":"honours", "to":"honors"},
            {"from":"hospitalisation", "to":"hospitalization"},
            {"from":"hospitalise", "to":"hospitalize"},
            {"from":"hospitalised", "to":"hospitalized"},
            {"from":"hospitalises", "to":"hospitalizes"},
            {"from":"hospitalising", "to":"hospitalizing"},
            {"from":"humanise", "to":"humanize"},
            {"from":"humanised", "to":"humanized"},
            {"from":"humanises", "to":"humanizes"},
            {"from":"humanising", "to":"humanizing"},
            {"from":"humour", "to":"humor"},
            {"from":"humoured", "to":"humored"},
            {"from":"humouring", "to":"humoring"},
            {"from":"humourless", "to":"humorless"},
            {"from":"humours", "to":"humors"},
            {"from":"hybridise", "to":"hybridize"},
            {"from":"hybridised", "to":"hybridized"},
            {"from":"hybridises", "to":"hybridizes"},
            {"from":"hybridising", "to":"hybridizing"},
            {"from":"hypnotise", "to":"hypnotize"},
            {"from":"hypnotised", "to":"hypnotized"},
            {"from":"hypnotises", "to":"hypnotizes"},
            {"from":"hypnotising", "to":"hypnotizing"},
            {"from":"hypothesise", "to":"hypothesize"},
            {"from":"hypothesised", "to":"hypothesized"},
            {"from":"hypothesises", "to":"hypothesizes"},
            {"from":"hypothesising", "to":"hypothesizing"},
            {"from":"idealisation", "to":"idealization"},
            {"from":"idealise", "to":"idealize"},
            {"from":"idealised", "to":"idealized"},
            {"from":"idealises", "to":"idealizes"},
            {"from":"idealising", "to":"idealizing"},
            {"from":"idolise", "to":"idolize"},
            {"from":"idolised", "to":"idolized"},
            {"from":"idolises", "to":"idolizes"},
            {"from":"idolising", "to":"idolizing"},
            {"from":"immobilisation", "to":"immobilization"},
            {"from":"immobilise", "to":"immobilize"},
            {"from":"immobilised", "to":"immobilized"},
            {"from":"immobiliser", "to":"immobilizer"},
            {"from":"immobilisers", "to":"immobilizers"},
            {"from":"immobilises", "to":"immobilizes"},
            {"from":"immobilising", "to":"immobilizing"},
            {"from":"immortalise", "to":"immortalize"},
            {"from":"immortalised", "to":"immortalized"},
            {"from":"immortalises", "to":"immortalizes"},
            {"from":"immortalising", "to":"immortalizing"},
            {"from":"immunisation", "to":"immunization"},
            {"from":"immunise", "to":"immunize"},
            {"from":"immunised", "to":"immunized"},
            {"from":"immunises", "to":"immunizes"},
            {"from":"immunising", "to":"immunizing"},
            {"from":"impanelled", "to":"impaneled"},
            {"from":"impanelling", "to":"impaneling"},
            {"from":"imperilled", "to":"imperiled"},
            {"from":"imperilling", "to":"imperiling"},
            {"from":"individualise", "to":"individualize"},
            {"from":"individualised", "to":"individualized"},
            {"from":"individualises", "to":"individualizes"},
            {"from":"individualising", "to":"individualizing"},
            {"from":"industrialise", "to":"industrialize"},
            {"from":"industrialised", "to":"industrialized"},
            {"from":"industrialises", "to":"industrializes"},
            {"from":"industrialising", "to":"industrializing"},
            {"from":"inflexion", "to":"inflection"},
            {"from":"inflexions", "to":"inflections"},
            {"from":"initialise", "to":"initialize"},
            {"from":"initialised", "to":"initialized"},
            {"from":"initialises", "to":"initializes"},
            {"from":"initialising", "to":"initializing"},
            {"from":"initialled", "to":"initialed"},
            {"from":"initialling", "to":"initialing"},
            {"from":"instal", "to":"install"},
            {"from":"instalment", "to":"installment"},
            {"from":"instalments", "to":"installments"},
            {"from":"instals", "to":"installs"},
            {"from":"instil", "to":"instill"},
            {"from":"instils", "to":"instills"},
            {"from":"institutionalisation", "to":"institutionalization"},
            {"from":"institutionalise", "to":"institutionalize"},
            {"from":"institutionalised", "to":"institutionalized"},
            {"from":"institutionalises", "to":"institutionalizes"},
            {"from":"institutionalising", "to":"institutionalizing"},
            {"from":"intellectualise", "to":"intellectualize"},
            {"from":"intellectualised", "to":"intellectualized"},
            {"from":"intellectualises", "to":"intellectualizes"},
            {"from":"intellectualising", "to":"intellectualizing"},
            {"from":"internalisation", "to":"internalization"},
            {"from":"internalise", "to":"internalize"},
            {"from":"internalised", "to":"internalized"},
            {"from":"internalises", "to":"internalizes"},
            {"from":"internalising", "to":"internalizing"},
            {"from":"internationalisation", "to":"internationalization"},
            {"from":"internationalise", "to":"internationalize"},
            {"from":"internationalised", "to":"internationalized"},
            {"from":"internationalises", "to":"internationalizes"},
            {"from":"internationalising", "to":"internationalizing"},
            {"from":"ionisation", "to":"ionization"},
            {"from":"ionise", "to":"ionize"},
            {"from":"ionised", "to":"ionized"},
            {"from":"ioniser", "to":"ionizer"},
            {"from":"ionisers", "to":"ionizers"},
            {"from":"ionises", "to":"ionizes"},
            {"from":"ionising", "to":"ionizing"},
            {"from":"italicise", "to":"italicize"},
            {"from":"italicised", "to":"italicized"},
            {"from":"italicises", "to":"italicizes"},
            {"from":"italicising", "to":"italicizing"},
            {"from":"itemise", "to":"itemize"},
            {"from":"itemised", "to":"itemized"},
            {"from":"itemises", "to":"itemizes"},
            {"from":"itemising", "to":"itemizing"},
            {"from":"jeopardise", "to":"jeopardize"},
            {"from":"jeopardised", "to":"jeopardized"},
            {"from":"jeopardises", "to":"jeopardizes"},
            {"from":"jeopardising", "to":"jeopardizing"},
            {"from":"jewelled", "to":"jeweled"},
            {"from":"jeweller", "to":"jeweler"},
            {"from":"jewellers", "to":"jewelers"},
            {"from":"jewellery", "to":"jewelry"},
            {"from":"judgement", "to":"judgment"},
            {"from":"kilogramme", "to":"kilogram"},
            {"from":"kilogrammes", "to":"kilograms"},
            {"from":"kilometre", "to":"kilometer"},
            {"from":"kilometres", "to":"kilometers"},
            {"from":"labelled", "to":"labeled"},
            {"from":"labelling", "to":"labeling"},
            {"from":"labour", "to":"labor"},
            {"from":"laboured", "to":"labored"},
            {"from":"labourer", "to":"laborer"},
            {"from":"labourers", "to":"laborers"},
            {"from":"labouring", "to":"laboring"},
            {"from":"labours", "to":"labors"},
            {"from":"lacklustre", "to":"lackluster"},
            {"from":"legalisation", "to":"legalization"},
            {"from":"legalise", "to":"legalize"},
            {"from":"legalised", "to":"legalized"},
            {"from":"legalises", "to":"legalizes"},
            {"from":"legalising", "to":"legalizing"},
            {"from":"legitimise", "to":"legitimize"},
            {"from":"legitimised", "to":"legitimized"},
            {"from":"legitimises", "to":"legitimizes"},
            {"from":"legitimising", "to":"legitimizing"},
            {"from":"leukaemia", "to":"leukemia"},
            {"from":"levelled", "to":"leveled"},
            {"from":"leveller", "to":"leveler"},
            {"from":"levellers", "to":"levelers"},
            {"from":"levelling", "to":"leveling"},
            {"from":"libelled", "to":"libeled"},
            {"from":"libelling", "to":"libeling"},
            {"from":"libellous", "to":"libelous"},
            {"from":"liberalisation", "to":"liberalization"},
            {"from":"liberalise", "to":"liberalize"},
            {"from":"liberalised", "to":"liberalized"},
            {"from":"liberalises", "to":"liberalizes"},
            {"from":"liberalising", "to":"liberalizing"},
            {"from":"licence", "to":"license"},
            {"from":"licenced", "to":"licensed"},
            {"from":"licences", "to":"licenses"},
            {"from":"licencing", "to":"licensing"},
            {"from":"likeable", "to":"likable"},
            {"from":"lionisation", "to":"lionization"},
            {"from":"lionise", "to":"lionize"},
            {"from":"lionised", "to":"lionized"},
            {"from":"lionises", "to":"lionizes"},
            {"from":"lionising", "to":"lionizing"},
            {"from":"liquidise", "to":"liquidize"},
            {"from":"liquidised", "to":"liquidized"},
            {"from":"liquidiser", "to":"liquidizer"},
            {"from":"liquidisers", "to":"liquidizers"},
            {"from":"liquidises", "to":"liquidizes"},
            {"from":"liquidising", "to":"liquidizing"},
            {"from":"litre", "to":"liter"},
            {"from":"litres", "to":"liters"},
            {"from":"localise", "to":"localize"},
            {"from":"localised", "to":"localized"},
            {"from":"localises", "to":"localizes"},
            {"from":"localising", "to":"localizing"},
            {"from":"louvre", "to":"louver"},
            {"from":"louvred", "to":"louvered"},
            {"from":"louvres", "to":"louvers"},
            {"from":"lustre", "to":"luster"},
            {"from":"magnetise", "to":"magnetize"},
            {"from":"magnetised", "to":"magnetized"},
            {"from":"magnetises", "to":"magnetizes"},
            {"from":"magnetising", "to":"magnetizing"},
            {"from":"manoeuvrability", "to":"maneuverability"},
            {"from":"manoeuvrable", "to":"maneuverable"},
            {"from":"manoeuvre", "to":"maneuver"},
            {"from":"manoeuvred", "to":"maneuvered"},
            {"from":"manoeuvres", "to":"maneuvers"},
            {"from":"manoeuvring", "to":"maneuvering"},
            {"from":"manoeuvrings", "to":"maneuverings"},
            {"from":"marginalisation", "to":"marginalization"},
            {"from":"marginalise", "to":"marginalize"},
            {"from":"marginalised", "to":"marginalized"},
            {"from":"marginalises", "to":"marginalizes"},
            {"from":"marginalising", "to":"marginalizing"},
            {"from":"marshalled", "to":"marshaled"},
            {"from":"marshalling", "to":"marshaling"},
            {"from":"marvelled", "to":"marveled"},
            {"from":"marvelling", "to":"marveling"},
            {"from":"marvellous", "to":"marvelous"},
            {"from":"marvellously", "to":"marvelously"},
            {"from":"materialisation", "to":"materialization"},
            {"from":"materialise", "to":"materialize"},
            {"from":"materialised", "to":"materialized"},
            {"from":"materialises", "to":"materializes"},
            {"from":"materialising", "to":"materializing"},
            {"from":"maximisation", "to":"maximization"},
            {"from":"maximise", "to":"maximize"},
            {"from":"maximised", "to":"maximized"},
            {"from":"maximises", "to":"maximizes"},
            {"from":"maximising", "to":"maximizing"},
            {"from":"meagre", "to":"meager"},
            {"from":"mechanisation", "to":"mechanization"},
            {"from":"mechanise", "to":"mechanize"},
            {"from":"mechanised", "to":"mechanized"},
            {"from":"mechanises", "to":"mechanizes"},
            {"from":"mechanising", "to":"mechanizing"},
            {"from":"mediaeval", "to":"medieval"},
            {"from":"memorialise", "to":"memorialize"},
            {"from":"memorialised", "to":"memorialized"},
            {"from":"memorialises", "to":"memorializes"},
            {"from":"memorialising", "to":"memorializing"},
            {"from":"memorise", "to":"memorize"},
            {"from":"memorised", "to":"memorized"},
            {"from":"memorises", "to":"memorizes"},
            {"from":"memorising", "to":"memorizing"},
            {"from":"mesmerise", "to":"mesmerize"},
            {"from":"mesmerised", "to":"mesmerized"},
            {"from":"mesmerises", "to":"mesmerizes"},
            {"from":"mesmerising", "to":"mesmerizing"},
            {"from":"metabolise", "to":"metabolize"},
            {"from":"metabolised", "to":"metabolized"},
            {"from":"metabolises", "to":"metabolizes"},
            {"from":"metabolising", "to":"metabolizing"},
            {"from":"metre", "to":"meter"},
            {"from":"metres", "to":"meters"},
            {"from":"micrometre", "to":"micrometer"},
            {"from":"micrometres", "to":"micrometers"},
            {"from":"militarise", "to":"militarize"},
            {"from":"militarised", "to":"militarized"},
            {"from":"militarises", "to":"militarizes"},
            {"from":"militarising", "to":"militarizing"},
            {"from":"milligramme", "to":"milligram"},
            {"from":"milligrammes", "to":"milligrams"},
            {"from":"millilitre", "to":"milliliter"},
            {"from":"millilitres", "to":"milliliters"},
            {"from":"millimetre", "to":"millimeter"},
            {"from":"millimetres", "to":"millimeters"},
            {"from":"miniaturisation", "to":"miniaturization"},
            {"from":"miniaturise", "to":"miniaturize"},
            {"from":"miniaturised", "to":"miniaturized"},
            {"from":"miniaturises", "to":"miniaturizes"},
            {"from":"miniaturising", "to":"miniaturizing"},
            {"from":"minibuses", "to":"minibusses"},
            {"from":"minimise", "to":"minimize"},
            {"from":"minimised", "to":"minimized"},
            {"from":"minimises", "to":"minimizes"},
            {"from":"minimising", "to":"minimizing"},
            {"from":"misbehaviour", "to":"misbehavior"},
            {"from":"misdemeanour", "to":"misdemeanor"},
            {"from":"misdemeanours", "to":"misdemeanors"},
            {"from":"misspelt", "to":"misspelled"},
            {"from":"mitre", "to":"miter"},
            {"from":"mitres", "to":"miters"},
            {"from":"mobilisation", "to":"mobilization"},
            {"from":"mobilise", "to":"mobilize"},
            {"from":"mobilised", "to":"mobilized"},
            {"from":"mobilises", "to":"mobilizes"},
            {"from":"mobilising", "to":"mobilizing"},
            {"from":"modelled", "to":"modeled"},
            {"from":"modeller", "to":"modeler"},
            {"from":"modellers", "to":"modelers"},
            {"from":"modelling", "to":"modeling"},
            {"from":"modernise", "to":"modernize"},
            {"from":"modernised", "to":"modernized"},
            {"from":"modernises", "to":"modernizes"},
            {"from":"modernising", "to":"modernizing"},
            {"from":"moisturise", "to":"moisturize"},
            {"from":"moisturised", "to":"moisturized"},
            {"from":"moisturiser", "to":"moisturizer"},
            {"from":"moisturisers", "to":"moisturizers"},
            {"from":"moisturises", "to":"moisturizes"},
            {"from":"moisturising", "to":"moisturizing"},
            {"from":"monologue", "to":"monolog"},
            {"from":"monologues", "to":"monologs"},
            {"from":"monopolisation", "to":"monopolization"},
            {"from":"monopolise", "to":"monopolize"},
            {"from":"monopolised", "to":"monopolized"},
            {"from":"monopolises", "to":"monopolizes"},
            {"from":"monopolising", "to":"monopolizing"},
            {"from":"moralise", "to":"moralize"},
            {"from":"moralised", "to":"moralized"},
            {"from":"moralises", "to":"moralizes"},
            {"from":"moralising", "to":"moralizing"},
            {"from":"motorised", "to":"motorized"},
            {"from":"mould", "to":"mold"},
            {"from":"moulded", "to":"molded"},
            {"from":"moulder", "to":"molder"},
            {"from":"mouldered", "to":"moldered"},
            {"from":"mouldering", "to":"moldering"},
            {"from":"moulders", "to":"molders"},
            {"from":"mouldier", "to":"moldier"},
            {"from":"mouldiest", "to":"moldiest"},
            {"from":"moulding", "to":"molding"},
            {"from":"mouldings", "to":"moldings"},
            {"from":"moulds", "to":"molds"},
            {"from":"mouldy", "to":"moldy"},
            {"from":"moult", "to":"molt"},
            {"from":"moulted", "to":"molted"},
            {"from":"moulting", "to":"molting"},
            {"from":"moults", "to":"molts"},
            {"from":"moustache", "to":"mustache"},
            {"from":"moustached", "to":"mustached"},
            {"from":"moustaches", "to":"mustaches"},
            {"from":"moustachioed", "to":"mustachioed"},
            {"from":"multicoloured", "to":"multicolored"},
            {"from":"nationalisation", "to":"nationalization"},
            {"from":"nationalisations", "to":"nationalizations"},
            {"from":"nationalise", "to":"nationalize"},
            {"from":"nationalised", "to":"nationalized"},
            {"from":"nationalises", "to":"nationalizes"},
            {"from":"nationalising", "to":"nationalizing"},
            {"from":"naturalisation", "to":"naturalization"},
            {"from":"naturalise", "to":"naturalize"},
            {"from":"naturalised", "to":"naturalized"},
            {"from":"naturalises", "to":"naturalizes"},
            {"from":"naturalising", "to":"naturalizing"},
            {"from":"neighbour", "to":"neighbor"},
            {"from":"neighbourhood", "to":"neighborhood"},
            {"from":"neighbourhoods", "to":"neighborhoods"},
            {"from":"neighbouring", "to":"neighboring"},
            {"from":"neighbourliness", "to":"neighborliness"},
            {"from":"neighbourly", "to":"neighborly"},
            {"from":"neighbours", "to":"neighbors"},
            {"from":"neutralisation", "to":"neutralization"},
            {"from":"neutralise", "to":"neutralize"},
            {"from":"neutralised", "to":"neutralized"},
            {"from":"neutralises", "to":"neutralizes"},
            {"from":"neutralising", "to":"neutralizing"},
            {"from":"normalisation", "to":"normalization"},
            {"from":"normalise", "to":"normalize"},
            {"from":"normalised", "to":"normalized"},
            {"from":"normalises", "to":"normalizes"},
            {"from":"normalising", "to":"normalizing"},
            {"from":"odour", "to":"odor"},
            {"from":"odourless", "to":"odorless"},
            {"from":"odours", "to":"odors"},
            {"from":"oesophagus", "to":"esophagus"},
            {"from":"oesophaguses", "to":"esophaguses"},
            {"from":"oestrogen", "to":"estrogen"},
            {"from":"offence", "to":"offense"},
            {"from":"offences", "to":"offenses"},
            {"from":"omelette", "to":"omelet"},
            {"from":"omelettes", "to":"omelets"},
            {"from":"optimise", "to":"optimize"},
            {"from":"optimised", "to":"optimized"},
            {"from":"optimises", "to":"optimizes"},
            {"from":"optimising", "to":"optimizing"},
            {"from":"organisation", "to":"organization"},
            {"from":"organisational", "to":"organizational"},
            {"from":"organisations", "to":"organizations"},
            {"from":"organise", "to":"organize"},
            {"from":"organised", "to":"organized"},
            {"from":"organiser", "to":"organizer"},
            {"from":"organisers", "to":"organizers"},
            {"from":"organises", "to":"organizes"},
            {"from":"organising", "to":"organizing"},
            {"from":"orthopaedic", "to":"orthopedic"},
            {"from":"orthopaedics", "to":"orthopedics"},
            {"from":"ostracise", "to":"ostracize"},
            {"from":"ostracised", "to":"ostracized"},
            {"from":"ostracises", "to":"ostracizes"},
            {"from":"ostracising", "to":"ostracizing"},
            {"from":"outmanoeuvre", "to":"outmaneuver"},
            {"from":"outmanoeuvred", "to":"outmaneuvered"},
            {"from":"outmanoeuvres", "to":"outmaneuvers"},
            {"from":"outmanoeuvring", "to":"outmaneuvering"},
            {"from":"overemphasise", "to":"overemphasize"},
            {"from":"overemphasised", "to":"overemphasized"},
            {"from":"overemphasises", "to":"overemphasizes"},
            {"from":"overemphasising", "to":"overemphasizing"},
            {"from":"oxidisation", "to":"oxidization"},
            {"from":"oxidise", "to":"oxidize"},
            {"from":"oxidised", "to":"oxidized"},
            {"from":"oxidises", "to":"oxidizes"},
            {"from":"oxidising", "to":"oxidizing"},
            {"from":"paederast", "to":"pederast"},
            {"from":"paederasts", "to":"pederasts"},
            {"from":"paediatric", "to":"pediatric"},
            {"from":"paediatrician", "to":"pediatrician"},
            {"from":"paediatricians", "to":"pediatricians"},
            {"from":"paediatrics", "to":"pediatrics"},
            {"from":"paedophile", "to":"pedophile"},
            {"from":"paedophiles", "to":"pedophiles"},
            {"from":"paedophilia", "to":"pedophilia"},
            {"from":"palaeolithic", "to":"paleolithic"},
            {"from":"palaeontologist", "to":"paleontologist"},
            {"from":"palaeontologists", "to":"paleontologists"},
            {"from":"palaeontology", "to":"paleontology"},
            {"from":"panelled", "to":"paneled"},
            {"from":"panelling", "to":"paneling"},
            {"from":"panellist", "to":"panelist"},
            {"from":"panellists", "to":"panelists"},
            {"from":"paralyse", "to":"paralyze"},
            {"from":"paralysed", "to":"paralyzed"},
            {"from":"paralyses", "to":"paralyzes"},
            {"from":"paralysing", "to":"paralyzing"},
            {"from":"parcelled", "to":"parceled"},
            {"from":"parcelling", "to":"parceling"},
            {"from":"parlour", "to":"parlor"},
            {"from":"parlours", "to":"parlors"},
            {"from":"particularise", "to":"particularize"},
            {"from":"particularised", "to":"particularized"},
            {"from":"particularises", "to":"particularizes"},
            {"from":"particularising", "to":"particularizing"},
            {"from":"passivisation", "to":"passivization"},
            {"from":"passivise", "to":"passivize"},
            {"from":"passivised", "to":"passivized"},
            {"from":"passivises", "to":"passivizes"},
            {"from":"passivising", "to":"passivizing"},
            {"from":"pasteurisation", "to":"pasteurization"},
            {"from":"pasteurise", "to":"pasteurize"},
            {"from":"pasteurised", "to":"pasteurized"},
            {"from":"pasteurises", "to":"pasteurizes"},
            {"from":"pasteurising", "to":"pasteurizing"},
            {"from":"patronise", "to":"patronize"},
            {"from":"patronised", "to":"patronized"},
            {"from":"patronises", "to":"patronizes"},
            {"from":"patronising", "to":"patronizing"},
            {"from":"patronisingly", "to":"patronizingly"},
            {"from":"pedalled", "to":"pedaled"},
            {"from":"pedalling", "to":"pedaling"},
            {"from":"pedestrianisation", "to":"pedestrianization"},
            {"from":"pedestrianise", "to":"pedestrianize"},
            {"from":"pedestrianised", "to":"pedestrianized"},
            {"from":"pedestrianises", "to":"pedestrianizes"},
            {"from":"pedestrianising", "to":"pedestrianizing"},
            {"from":"penalise", "to":"penalize"},
            {"from":"penalised", "to":"penalized"},
            {"from":"penalises", "to":"penalizes"},
            {"from":"penalising", "to":"penalizing"},
            {"from":"pencilled", "to":"penciled"},
            {"from":"pencilling", "to":"penciling"},
            {"from":"personalise", "to":"personalize"},
            {"from":"personalised", "to":"personalized"},
            {"from":"personalises", "to":"personalizes"},
            {"from":"personalising", "to":"personalizing"},
            {"from":"pharmacopoeia", "to":"pharmacopeia"},
            {"from":"pharmacopoeias", "to":"pharmacopeias"},
            {"from":"philosophise", "to":"philosophize"},
            {"from":"philosophised", "to":"philosophized"},
            {"from":"philosophises", "to":"philosophizes"},
            {"from":"philosophising", "to":"philosophizing"},
            {"from":"philtre", "to":"philter"},
            {"from":"philtres", "to":"philters"},
            {"from":"phoney", "to":"phony"},
            {"from":"plagiarise", "to":"plagiarize"},
            {"from":"plagiarised", "to":"plagiarized"},
            {"from":"plagiarises", "to":"plagiarizes"},
            {"from":"plagiarising", "to":"plagiarizing"},
            {"from":"plough", "to":"plow"},
            {"from":"ploughed", "to":"plowed"},
            {"from":"ploughing", "to":"plowing"},
            {"from":"ploughman", "to":"plowman"},
            {"from":"ploughmen", "to":"plowmen"},
            {"from":"ploughs", "to":"plows"},
            {"from":"ploughshare", "to":"plowshare"},
            {"from":"ploughshares", "to":"plowshares"},
            {"from":"polarisation", "to":"polarization"},
            {"from":"polarise", "to":"polarize"},
            {"from":"polarised", "to":"polarized"},
            {"from":"polarises", "to":"polarizes"},
            {"from":"polarising", "to":"polarizing"},
            {"from":"politicisation", "to":"politicization"},
            {"from":"politicise", "to":"politicize"},
            {"from":"politicised", "to":"politicized"},
            {"from":"politicises", "to":"politicizes"},
            {"from":"politicising", "to":"politicizing"},
            {"from":"popularisation", "to":"popularization"},
            {"from":"popularise", "to":"popularize"},
            {"from":"popularised", "to":"popularized"},
            {"from":"popularises", "to":"popularizes"},
            {"from":"popularising", "to":"popularizing"},
            {"from":"pouffe", "to":"pouf"},
            {"from":"pouffes", "to":"poufs"},
            {"from":"practise", "to":"practice"},
            {"from":"practised", "to":"practiced"},
            {"from":"practises", "to":"practices"},
            {"from":"practising", "to":"practicing"},
            {"from":"praesidium", "to":"presidium"},
            {"from":"praesidiums", "to":"presidiums"},
            {"from":"pressurisation", "to":"pressurization"},
            {"from":"pressurise", "to":"pressurize"},
            {"from":"pressurised", "to":"pressurized"},
            {"from":"pressurises", "to":"pressurizes"},
            {"from":"pressurising", "to":"pressurizing"},
            {"from":"pretence", "to":"pretense"},
            {"from":"pretences", "to":"pretenses"},
            {"from":"primaeval", "to":"primeval"},
            {"from":"prioritisation", "to":"prioritization"},
            {"from":"prioritise", "to":"prioritize"},
            {"from":"prioritised", "to":"prioritized"},
            {"from":"prioritises", "to":"prioritizes"},
            {"from":"prioritising", "to":"prioritizing"},
            {"from":"privatisation", "to":"privatization"},
            {"from":"privatisations", "to":"privatizations"},
            {"from":"privatise", "to":"privatize"},
            {"from":"privatised", "to":"privatized"},
            {"from":"privatises", "to":"privatizes"},
            {"from":"privatising", "to":"privatizing"},
            {"from":"professionalisation", "to":"professionalization"},
            {"from":"professionalise", "to":"professionalize"},
            {"from":"professionalised", "to":"professionalized"},
            {"from":"professionalises", "to":"professionalizes"},
            {"from":"professionalising", "to":"professionalizing"},
            {"from":"programme", "to":"program"},
            {"from":"programmes", "to":"programs"},
            {"from":"prologue", "to":"prolog"},
            {"from":"prologues", "to":"prologs"},
            {"from":"propagandise", "to":"propagandize"},
            {"from":"propagandised", "to":"propagandized"},
            {"from":"propagandises", "to":"propagandizes"},
            {"from":"propagandising", "to":"propagandizing"},
            {"from":"proselytise", "to":"proselytize"},
            {"from":"proselytised", "to":"proselytized"},
            {"from":"proselytiser", "to":"proselytizer"},
            {"from":"proselytisers", "to":"proselytizers"},
            {"from":"proselytises", "to":"proselytizes"},
            {"from":"proselytising", "to":"proselytizing"},
            {"from":"psychoanalyse", "to":"psychoanalyze"},
            {"from":"psychoanalysed", "to":"psychoanalyzed"},
            {"from":"psychoanalyses", "to":"psychoanalyzes"},
            {"from":"psychoanalysing", "to":"psychoanalyzing"},
            {"from":"publicise", "to":"publicize"},
            {"from":"publicised", "to":"publicized"},
            {"from":"publicises", "to":"publicizes"},
            {"from":"publicising", "to":"publicizing"},
            {"from":"pulverisation", "to":"pulverization"},
            {"from":"pulverise", "to":"pulverize"},
            {"from":"pulverised", "to":"pulverized"},
            {"from":"pulverises", "to":"pulverizes"},
            {"from":"pulverising", "to":"pulverizing"},
            {"from":"pummelled", "to":"pummeled"},
            {"from":"pummelling", "to":"pummeling"},
            {"from":"pyjama", "to":"pajama"},
            {"from":"pyjamas", "to":"pajamas"},
            {"from":"pzazz", "to":"pizzazz"},
            {"from":"quarrelled", "to":"quarreled"},
            {"from":"quarrelling", "to":"quarreling"},
            {"from":"radicalise", "to":"radicalize"},
            {"from":"radicalised", "to":"radicalized"},
            {"from":"radicalises", "to":"radicalizes"},
            {"from":"radicalising", "to":"radicalizing"},
            {"from":"rancour", "to":"rancor"},
            {"from":"randomise", "to":"randomize"},
            {"from":"randomised", "to":"randomized"},
            {"from":"randomises", "to":"randomizes"},
            {"from":"randomising", "to":"randomizing"},
            {"from":"rationalisation", "to":"rationalization"},
            {"from":"rationalisations", "to":"rationalizations"},
            {"from":"rationalise", "to":"rationalize"},
            {"from":"rationalised", "to":"rationalized"},
            {"from":"rationalises", "to":"rationalizes"},
            {"from":"rationalising", "to":"rationalizing"},
            {"from":"ravelled", "to":"raveled"},
            {"from":"ravelling", "to":"raveling"},
            {"from":"realisable", "to":"realizable"},
            {"from":"realisation", "to":"realization"},
            {"from":"realisations", "to":"realizations"},
            {"from":"realise", "to":"realize"},
            {"from":"realised", "to":"realized"},
            {"from":"realises", "to":"realizes"},
            {"from":"realising", "to":"realizing"},
            {"from":"recognisable", "to":"recognizable"},
            {"from":"recognisably", "to":"recognizably"},
            {"from":"recognisance", "to":"recognizance"},
            {"from":"recognise", "to":"recognize"},
            {"from":"recognised", "to":"recognized"},
            {"from":"recognises", "to":"recognizes"},
            {"from":"recognising", "to":"recognizing"},
            {"from":"reconnoitre", "to":"reconnoiter"},
            {"from":"reconnoitred", "to":"reconnoitered"},
            {"from":"reconnoitres", "to":"reconnoiters"},
            {"from":"reconnoitring", "to":"reconnoitering"},
            {"from":"refuelled", "to":"refueled"},
            {"from":"refuelling", "to":"refueling"},
            {"from":"regularisation", "to":"regularization"},
            {"from":"regularise", "to":"regularize"},
            {"from":"regularised", "to":"regularized"},
            {"from":"regularises", "to":"regularizes"},
            {"from":"regularising", "to":"regularizing"},
            {"from":"remodelled", "to":"remodeled"},
            {"from":"remodelling", "to":"remodeling"},
            {"from":"remould", "to":"remold"},
            {"from":"remoulded", "to":"remolded"},
            {"from":"remoulding", "to":"remolding"},
            {"from":"remoulds", "to":"remolds"},
            {"from":"reorganisation", "to":"reorganization"},
            {"from":"reorganisations", "to":"reorganizations"},
            {"from":"reorganise", "to":"reorganize"},
            {"from":"reorganised", "to":"reorganized"},
            {"from":"reorganises", "to":"reorganizes"},
            {"from":"reorganising", "to":"reorganizing"},
            {"from":"revelled", "to":"reveled"},
            {"from":"reveller", "to":"reveler"},
            {"from":"revellers", "to":"revelers"},
            {"from":"revelling", "to":"reveling"},
            {"from":"revitalise", "to":"revitalize"},
            {"from":"revitalised", "to":"revitalized"},
            {"from":"revitalises", "to":"revitalizes"},
            {"from":"revitalising", "to":"revitalizing"},
            {"from":"revolutionise", "to":"revolutionize"},
            {"from":"revolutionised", "to":"revolutionized"},
            {"from":"revolutionises", "to":"revolutionizes"},
            {"from":"revolutionising", "to":"revolutionizing"},
            {"from":"rhapsodise", "to":"rhapsodize"},
            {"from":"rhapsodised", "to":"rhapsodized"},
            {"from":"rhapsodises", "to":"rhapsodizes"},
            {"from":"rhapsodising", "to":"rhapsodizing"},
            {"from":"rigour", "to":"rigor"},
            {"from":"rigours", "to":"rigors"},
            {"from":"ritualised", "to":"ritualized"},
            {"from":"rivalled", "to":"rivaled"},
            {"from":"rivalling", "to":"rivaling"},
            {"from":"romanticise", "to":"romanticize"},
            {"from":"romanticised", "to":"romanticized"},
            {"from":"romanticises", "to":"romanticizes"},
            {"from":"romanticising", "to":"romanticizing"},
            {"from":"rumour", "to":"rumor"},
            {"from":"rumoured", "to":"rumored"},
            {"from":"rumours", "to":"rumors"},
            {"from":"sabre", "to":"saber"},
            {"from":"sabres", "to":"sabers"},
            {"from":"saltpetre", "to":"saltpeter"},
            {"from":"sanitise", "to":"sanitize"},
            {"from":"sanitised", "to":"sanitized"},
            {"from":"sanitises", "to":"sanitizes"},
            {"from":"sanitising", "to":"sanitizing"},
            {"from":"satirise", "to":"satirize"},
            {"from":"satirised", "to":"satirized"},
            {"from":"satirises", "to":"satirizes"},
            {"from":"satirising", "to":"satirizing"},
            {"from":"saviour", "to":"savior"},
            {"from":"saviours", "to":"saviors"},
            {"from":"savour", "to":"savor"},
            {"from":"savoured", "to":"savored"},
            {"from":"savouries", "to":"savories"},
            {"from":"savouring", "to":"savoring"},
            {"from":"savours", "to":"savors"},
            {"from":"savoury", "to":"savory"},
            {"from":"scandalise", "to":"scandalize"},
            {"from":"scandalised", "to":"scandalized"},
            {"from":"scandalises", "to":"scandalizes"},
            {"from":"scandalising", "to":"scandalizing"},
            {"from":"sceptic", "to":"skeptic"},
            {"from":"sceptical", "to":"skeptical"},
            {"from":"sceptically", "to":"skeptically"},
            {"from":"scepticism", "to":"skepticism"},
            {"from":"sceptics", "to":"skeptics"},
            {"from":"sceptre", "to":"scepter"},
            {"from":"sceptres", "to":"scepters"},
            {"from":"scrutinise", "to":"scrutinize"},
            {"from":"scrutinised", "to":"scrutinized"},
            {"from":"scrutinises", "to":"scrutinizes"},
            {"from":"scrutinising", "to":"scrutinizing"},
            {"from":"secularisation", "to":"secularization"},
            {"from":"secularise", "to":"secularize"},
            {"from":"secularised", "to":"secularized"},
            {"from":"secularises", "to":"secularizes"},
            {"from":"secularising", "to":"secularizing"},
            {"from":"sensationalise", "to":"sensationalize"},
            {"from":"sensationalised", "to":"sensationalized"},
            {"from":"sensationalises", "to":"sensationalizes"},
            {"from":"sensationalising", "to":"sensationalizing"},
            {"from":"sensitise", "to":"sensitize"},
            {"from":"sensitised", "to":"sensitized"},
            {"from":"sensitises", "to":"sensitizes"},
            {"from":"sensitising", "to":"sensitizing"},
            {"from":"sentimentalise", "to":"sentimentalize"},
            {"from":"sentimentalised", "to":"sentimentalized"},
            {"from":"sentimentalises", "to":"sentimentalizes"},
            {"from":"sentimentalising", "to":"sentimentalizing"},
            {"from":"sepulchre", "to":"sepulcher"},
            {"from":"sepulchres", "to":"sepulchers"},
            {"from":"serialisation", "to":"serialization"},
            {"from":"serialisations", "to":"serializations"},
            {"from":"serialise", "to":"serialize"},
            {"from":"serialised", "to":"serialized"},
            {"from":"serialises", "to":"serializes"},
            {"from":"serialising", "to":"serializing"},
            {"from":"sermonise", "to":"sermonize"},
            {"from":"sermonised", "to":"sermonized"},
            {"from":"sermonises", "to":"sermonizes"},
            {"from":"sermonising", "to":"sermonizing"},
            {"from":"sheikh", "to":"sheik"},
            {"from":"shovelled", "to":"shoveled"},
            {"from":"shovelling", "to":"shoveling"},
            {"from":"shrivelled", "to":"shriveled"},
            {"from":"shrivelling", "to":"shriveling"},
            {"from":"signalise", "to":"signalize"},
            {"from":"signalised", "to":"signalized"},
            {"from":"signalises", "to":"signalizes"},
            {"from":"signalising", "to":"signalizing"},
            {"from":"signalled", "to":"signaled"},
            {"from":"signalling", "to":"signaling"},
            {"from":"smoulder", "to":"smolder"},
            {"from":"smouldered", "to":"smoldered"},
            {"from":"smouldering", "to":"smoldering"},
            {"from":"smoulders", "to":"smolders"},
            {"from":"snivelled", "to":"sniveled"},
            {"from":"snivelling", "to":"sniveling"},
            {"from":"snorkelled", "to":"snorkeled"},
            {"from":"snorkelling", "to":"snorkeling"},
            {"from":"snowplough", "to":"snowplow"},
            {"from":"snowploughs", "to":"snowplows"},
            {"from":"socialisation", "to":"socialization"},
            {"from":"socialise", "to":"socialize"},
            {"from":"socialised", "to":"socialized"},
            {"from":"socialises", "to":"socializes"},
            {"from":"socialising", "to":"socializing"},
            {"from":"sodomise", "to":"sodomize"},
            {"from":"sodomised", "to":"sodomized"},
            {"from":"sodomises", "to":"sodomizes"},
            {"from":"sodomising", "to":"sodomizing"},
            {"from":"solemnise", "to":"solemnize"},
            {"from":"solemnised", "to":"solemnized"},
            {"from":"solemnises", "to":"solemnizes"},
            {"from":"solemnising", "to":"solemnizing"},
            {"from":"sombre", "to":"somber"},
            {"from":"specialisation", "to":"specialization"},
            {"from":"specialisations", "to":"specializations"},
            {"from":"specialise", "to":"specialize"},
            {"from":"specialised", "to":"specialized"},
            {"from":"specialises", "to":"specializes"},
            {"from":"specialising", "to":"specializing"},
            {"from":"spectre", "to":"specter"},
            {"from":"spectres", "to":"specters"},
            {"from":"spiralled", "to":"spiraled"},
            {"from":"spiralling", "to":"spiraling"},
            {"from":"splendour", "to":"splendor"},
            {"from":"splendours", "to":"splendors"},
            {"from":"squirrelled", "to":"squirreled"},
            {"from":"squirrelling", "to":"squirreling"},
            {"from":"stabilisation", "to":"stabilization"},
            {"from":"stabilise", "to":"stabilize"},
            {"from":"stabilised", "to":"stabilized"},
            {"from":"stabiliser", "to":"stabilizer"},
            {"from":"stabilisers", "to":"stabilizers"},
            {"from":"stabilises", "to":"stabilizes"},
            {"from":"stabilising", "to":"stabilizing"},
            {"from":"standardisation", "to":"standardization"},
            {"from":"standardise", "to":"standardize"},
            {"from":"standardised", "to":"standardized"},
            {"from":"standardises", "to":"standardizes"},
            {"from":"standardising", "to":"standardizing"},
            {"from":"stencilled", "to":"stenciled"},
            {"from":"stencilling", "to":"stenciling"},
            {"from":"sterilisation", "to":"sterilization"},
            {"from":"sterilisations", "to":"sterilizations"},
            {"from":"sterilise", "to":"sterilize"},
            {"from":"sterilised", "to":"sterilized"},
            {"from":"steriliser", "to":"sterilizer"},
            {"from":"sterilisers", "to":"sterilizers"},
            {"from":"sterilises", "to":"sterilizes"},
            {"from":"sterilising", "to":"sterilizing"},
            {"from":"stigmatisation", "to":"stigmatization"},
            {"from":"stigmatise", "to":"stigmatize"},
            {"from":"stigmatised", "to":"stigmatized"},
            {"from":"stigmatises", "to":"stigmatizes"},
            {"from":"stigmatising", "to":"stigmatizing"},
            {"from":"storey", "to":"story"},
            {"from":"storeys", "to":"stories"},
            {"from":"subsidisation", "to":"subsidization"},
            {"from":"subsidise", "to":"subsidize"},
            {"from":"subsidised", "to":"subsidized"},
            {"from":"subsidiser", "to":"subsidizer"},
            {"from":"subsidisers", "to":"subsidizers"},
            {"from":"subsidises", "to":"subsidizes"},
            {"from":"subsidising", "to":"subsidizing"},
            {"from":"succour", "to":"succor"},
            {"from":"succoured", "to":"succored"},
            {"from":"succouring", "to":"succoring"},
            {"from":"succours", "to":"succors"},
            {"from":"sulphate", "to":"sulfate"},
            {"from":"sulphates", "to":"sulfates"},
            {"from":"sulphide", "to":"sulfide"},
            {"from":"sulphides", "to":"sulfides"},
            {"from":"sulphur", "to":"sulfur"},
            {"from":"sulphurous", "to":"sulfurous"},
            {"from":"summarise", "to":"summarize"},
            {"from":"summarised", "to":"summarized"},
            {"from":"summarises", "to":"summarizes"},
            {"from":"summarising", "to":"summarizing"},
            {"from":"swivelled", "to":"swiveled"},
            {"from":"swivelling", "to":"swiveling"},
            {"from":"symbolise", "to":"symbolize"},
            {"from":"symbolised", "to":"symbolized"},
            {"from":"symbolises", "to":"symbolizes"},
            {"from":"symbolising", "to":"symbolizing"},
            {"from":"sympathise", "to":"sympathize"},
            {"from":"sympathised", "to":"sympathized"},
            {"from":"sympathiser", "to":"sympathizer"},
            {"from":"sympathisers", "to":"sympathizers"},
            {"from":"sympathises", "to":"sympathizes"},
            {"from":"sympathising", "to":"sympathizing"},
            {"from":"synchronisation", "to":"synchronization"},
            {"from":"synchronise", "to":"synchronize"},
            {"from":"synchronised", "to":"synchronized"},
            {"from":"synchronises", "to":"synchronizes"},
            {"from":"synchronising", "to":"synchronizing"},
            {"from":"synthesise", "to":"synthesize"},
            {"from":"synthesised", "to":"synthesized"},
            {"from":"synthesiser", "to":"synthesizer"},
            {"from":"synthesisers", "to":"synthesizers"},
            {"from":"synthesises", "to":"synthesizes"},
            {"from":"synthesising", "to":"synthesizing"},
            {"from":"syphon", "to":"siphon"},
            {"from":"syphoned", "to":"siphoned"},
            {"from":"syphoning", "to":"siphoning"},
            {"from":"syphons", "to":"siphons"},
            {"from":"systematisation", "to":"systematization"},
            {"from":"systematise", "to":"systematize"},
            {"from":"systematised", "to":"systematized"},
            {"from":"systematises", "to":"systematizes"},
            {"from":"systematising", "to":"systematizing"},
            {"from":"tantalise", "to":"tantalize"},
            {"from":"tantalised", "to":"tantalized"},
            {"from":"tantalises", "to":"tantalizes"},
            {"from":"tantalising", "to":"tantalizing"},
            {"from":"tantalisingly", "to":"tantalizingly"},
            {"from":"tasselled", "to":"tasseled"},
            {"from":"technicolour", "to":"technicolor"},
            {"from":"temporise", "to":"temporize"},
            {"from":"temporised", "to":"temporized"},
            {"from":"temporises", "to":"temporizes"},
            {"from":"temporising", "to":"temporizing"},
            {"from":"tenderise", "to":"tenderize"},
            {"from":"tenderised", "to":"tenderized"},
            {"from":"tenderises", "to":"tenderizes"},
            {"from":"tenderising", "to":"tenderizing"},
            {"from":"terrorise", "to":"terrorize"},
            {"from":"terrorised", "to":"terrorized"},
            {"from":"terrorises", "to":"terrorizes"},
            {"from":"terrorising", "to":"terrorizing"},
            {"from":"theatre", "to":"theater"},
            {"from":"theatregoer", "to":"theatergoer"},
            {"from":"theatregoers", "to":"theatergoers"},
            {"from":"theatres", "to":"theaters"},
            {"from":"theorise", "to":"theorize"},
            {"from":"theorised", "to":"theorized"},
            {"from":"theorises", "to":"theorizes"},
            {"from":"theorising", "to":"theorizing"},
            {"from":"tonne", "to":"ton"},
            {"from":"tonnes", "to":"tons"},
            {"from":"towelled", "to":"toweled"},
            {"from":"towelling", "to":"toweling"},
            {"from":"toxaemia", "to":"toxemia"},
            {"from":"tranquillise", "to":"tranquilize"},
            {"from":"tranquillised", "to":"tranquilized"},
            {"from":"tranquilliser", "to":"tranquilizer"},
            {"from":"tranquillisers", "to":"tranquilizers"},
            {"from":"tranquillises", "to":"tranquilizes"},
            {"from":"tranquillising", "to":"tranquilizing"},
            {"from":"tranquillity", "to":"tranquility"},
            {"from":"tranquillize", "to":"tranquilize"},
            {"from":"tranquillized", "to":"tranquilized"},
            {"from":"tranquillizer", "to":"tranquilizer"},
            {"from":"tranquillizers", "to":"tranquilizers"},
            {"from":"tranquillizes", "to":"tranquilizes"},
            {"from":"tranquillizing", "to":"tranquilizing"},
            {"from":"transistorised", "to":"transistorized"},
            {"from":"traumatise", "to":"traumatize"},
            {"from":"traumatised", "to":"traumatized"},
            {"from":"traumatises", "to":"traumatizes"},
            {"from":"traumatising", "to":"traumatizing"},
            {"from":"travelled", "to":"traveled"},
            {"from":"traveller", "to":"traveler"},
            {"from":"travellers", "to":"travelers"},
            {"from":"travelling", "to":"traveling"},
            {"from":"travelogue", "to":"travelog"},
            {"from":"travelogues", "to":"travelogs"},
            {"from":"trialled", "to":"trialed"},
            {"from":"trialling", "to":"trialing"},
            {"from":"tricolour", "to":"tricolor"},
            {"from":"tricolours", "to":"tricolors"},
            {"from":"trivialise", "to":"trivialize"},
            {"from":"trivialised", "to":"trivialized"},
            {"from":"trivialises", "to":"trivializes"},
            {"from":"trivialising", "to":"trivializing"},
            {"from":"tumour", "to":"tumor"},
            {"from":"tumours", "to":"tumors"},
            {"from":"tunnelled", "to":"tunneled"},
            {"from":"tunnelling", "to":"tunneling"},
            {"from":"tyrannise", "to":"tyrannize"},
            {"from":"tyrannised", "to":"tyrannized"},
            {"from":"tyrannises", "to":"tyrannizes"},
            {"from":"tyrannising", "to":"tyrannizing"},
            {"from":"tyre", "to":"tire"},
            {"from":"tyres", "to":"tires"},
            {"from":"unauthorised", "to":"unauthorized"},
            {"from":"uncivilised", "to":"uncivilized"},
            {"from":"underutilised", "to":"underutilized"},
            {"from":"unequalled", "to":"unequaled"},
            {"from":"unfavourable", "to":"unfavorable"},
            {"from":"unfavourably", "to":"unfavorably"},
            {"from":"unionisation", "to":"unionization"},
            {"from":"unionise", "to":"unionize"},
            {"from":"unionised", "to":"unionized"},
            {"from":"unionises", "to":"unionizes"},
            {"from":"unionising", "to":"unionizing"},
            {"from":"unorganised", "to":"unorganized"},
            {"from":"unravelled", "to":"unraveled"},
            {"from":"unravelling", "to":"unraveling"},
            {"from":"unrecognisable", "to":"unrecognizable"},
            {"from":"unrecognised", "to":"unrecognized"},
            {"from":"unrivalled", "to":"unrivaled"},
            {"from":"unsavoury", "to":"unsavory"},
            {"from":"untrammelled", "to":"untrammeled"},
            {"from":"urbanisation", "to":"urbanization"},
            {"from":"urbanise", "to":"urbanize"},
            {"from":"urbanised", "to":"urbanized"},
            {"from":"urbanises", "to":"urbanizes"},
            {"from":"urbanising", "to":"urbanizing"},
            {"from":"utilisable", "to":"utilizable"},
            {"from":"utilisation", "to":"utilization"},
            {"from":"utilise", "to":"utilize"},
            {"from":"utilised", "to":"utilized"},
            {"from":"utilises", "to":"utilizes"},
            {"from":"utilising", "to":"utilizing"},
            {"from":"valour", "to":"valor"},
            {"from":"vandalise", "to":"vandalize"},
            {"from":"vandalised", "to":"vandalized"},
            {"from":"vandalises", "to":"vandalizes"},
            {"from":"vandalising", "to":"vandalizing"},
            {"from":"vaporisation", "to":"vaporization"},
            {"from":"vaporise", "to":"vaporize"},
            {"from":"vaporised", "to":"vaporized"},
            {"from":"vaporises", "to":"vaporizes"},
            {"from":"vaporising", "to":"vaporizing"},
            {"from":"vapour", "to":"vapor"},
            {"from":"vapours", "to":"vapors"},
            {"from":"verbalise", "to":"verbalize"},
            {"from":"verbalised", "to":"verbalized"},
            {"from":"verbalises", "to":"verbalizes"},
            {"from":"verbalising", "to":"verbalizing"},
            {"from":"victimisation", "to":"victimization"},
            {"from":"victimise", "to":"victimize"},
            {"from":"victimised", "to":"victimized"},
            {"from":"victimises", "to":"victimizes"},
            {"from":"victimising", "to":"victimizing"},
            {"from":"videodisc", "to":"videodisk"},
            {"from":"videodiscs", "to":"videodisks"},
            {"from":"vigour", "to":"vigor"},
            {"from":"visualisation", "to":"visualization"},
            {"from":"visualisations", "to":"visualizations"},
            {"from":"visualise", "to":"visualize"},
            {"from":"visualised", "to":"visualized"},
            {"from":"visualises", "to":"visualizes"},
            {"from":"visualising", "to":"visualizing"},
            {"from":"vocalisation", "to":"vocalization"},
            {"from":"vocalisations", "to":"vocalizations"},
            {"from":"vocalise", "to":"vocalize"},
            {"from":"vocalised", "to":"vocalized"},
            {"from":"vocalises", "to":"vocalizes"},
            {"from":"vocalising", "to":"vocalizing"},
            {"from":"vulcanised", "to":"vulcanized"},
            {"from":"vulgarisation", "to":"vulgarization"},
            {"from":"vulgarise", "to":"vulgarize"},
            {"from":"vulgarised", "to":"vulgarized"},
            {"from":"vulgarises", "to":"vulgarizes"},
            {"from":"vulgarising", "to":"vulgarizing"},
            {"from":"waggon", "to":"wagon"},
            {"from":"waggons", "to":"wagons"},
            {"from":"watercolour", "to":"watercolor"},
            {"from":"watercolours", "to":"watercolors"},
            {"from":"weaselled", "to":"weaseled"},
            {"from":"weaselling", "to":"weaseling"},
            {"from":"westernisation", "to":"westernization"},
            {"from":"westernise", "to":"westernize"},
            {"from":"westernised", "to":"westernized"},
            {"from":"westernises", "to":"westernizes"},
            {"from":"westernising", "to":"westernizing"},
            {"from":"womanise", "to":"womanize"},
            {"from":"womanised", "to":"womanized"},
            {"from":"womaniser", "to":"womanizer"},
            {"from":"womanisers", "to":"womanizers"},
            {"from":"womanises", "to":"womanizes"},
            {"from":"womanising", "to":"womanizing"},
            {"from":"woollen", "to":"woolen"},
            {"from":"woollens", "to":"woolens"},
            {"from":"woollies", "to":"woolies"},
            {"from":"woolly", "to":"wooly"},
            {"from":"worshipped", "to":"worshiped"},
            {"from":"worshipping", "to":"worshiping"},
            {"from":"worshipper", "to":"worshiper"},
            {"from":"yodelled", "to":"yodeled"},
            {"from":"yodelling", "to":"yodeling"},
            {"from":"yoghourt", "to":"yogurt"},
            {"from":"yoghourts", "to":"yogurts"},
            {"from":"yoghurt", "to":"yogurt"},
            {"from":"yoghurts", "to":"yogurts"}            
        ],
        "typos":[
            {"misspelling":"accomodation","correct":"accommodation"},
            {"misspelling":"acommodation","correct":"accommodation"},
            {"misspelling":"acheive","correct":"achieve"},
            {"misspelling":"accross","correct":"across"},
            {"misspelling":"adress","correct":"address"},
            {"misspelling":"agressive","correct":"aggressive"},
            {"misspelling":"alot","correct":"a lot"},
            {"misspelling":"apparantly","correct":"apparently"},
            {"misspelling":"appearence","correct":"appearance"},
            {"misspelling":"arguement","correct":"argument"},
            {"misspelling":"assasination","correct":"assassination"},
            {"misspelling":"basicly","correct":"basically"},
            {"misspelling":"beggining","correct":"beginning"},
            {"misspelling":"beleive","correct":"believe"},
            {"misspelling":"bizzare","correct":"bizarre"},
            {"misspelling":"buisness","correct":"business"},
            {"misspelling":"carribean","correct":"caribbean"},
            {"misspelling":"chauffer","correct":"chauffeur"},
            {"misspelling":"cemetary","correct":"cemetery"},
            {"misspelling":"collegue","correct":"colleague"},
            {"misspelling":"commitee","correct":"committee"},
            {"misspelling":"committment","correct":"commitment"},
            {"misspelling":"completly","correct":"completely"},
            {"misspelling":"concious","correct":"conscious"},
            {"misspelling":"copywrite","correct":"copyright"},
            {"misspelling":"curiousity","correct":"curiosity"},
            {"misspelling":"decaffinated","correct":"decaffeinated"},
            {"misspelling":"definately","correct":"definitely"},
            {"misspelling":"dependance","correct":"dependence"},
            {"misspelling":"desireable","correct":"desirable"},
            {"misspelling":"diarhea","correct":"diarrhoea"},
            {"misspelling":"dissapoint","correct":"disappoint"},
            {"misspelling":"dissapear","correct":"disappear"},
            {"misspelling":"dispell","correct":"dispel"},
            {"misspelling":"ecstacy","correct":"ecstasy"},
            {"misspelling":"embarass","correct":"embarrass"},
            {"misspelling":"enviroment","correct":"environment"},
            {"misspelling":"Farenheit","correct":"Fahrenheit"},
            {"misspelling":"febuary","correct":"february"},
            {"misspelling":"finaly","correct":"finally"},
            {"misspelling":"fluoroscent","correct":"fluorescent"},
            {"misspelling":"flouride","correct":"fluoride"},
            {"misspelling":"foriegn","correct":"foreign"},
            {"misspelling":"forteen","correct":"fourteen"},
            {"misspelling":"fourty","correct":"forty"},
            {"misspelling":"freind","correct":"friend"},
            {"misspelling":"geneology","correct":"genealogy"},
            {"misspelling":"glamourous","correct":"glamorous"},
            {"misspelling":"goverment","correct":"government"},
            {"misspelling":"grammer","correct":"grammar"},
            {"misspelling":"happend","correct":"happened"},
            {"misspelling":"hemorage","correct":"haemorrhage"},
            {"misspelling":"heros","correct":"heroes"},
            {"misspelling":"hight","correct":"height"},
            {"misspelling":"humourous","correct":"humorous"},
            {"misspelling":"hygeine","correct":"hygiene"},
            {"misspelling":"idiosyncracy","correct":"idiosyncrasy"},
            {"misspelling":"independance","correct":"independence"},
            {"misspelling":"interupt","correct":"interrupt"},
            {"misspelling":"intresting","correct":"interesting"},
            {"misspelling":"juge","correct":"judge"},
            {"misspelling":"knowlege","correct":"knowledge"},
            {"misspelling":"lazer","correct":"laser"},
            {"misspelling":"liason","correct":"liaison"},
            {"misspelling":"libary","correct":"library"},
            {"misspelling":"lightening","correct":"lightning"},
            {"misspelling":"lollypop","correct":"lollipop"},
            {"misspelling":"millenium","correct":"millennium"},
            {"misspelling":"mischievious","correct":"mischievous"},
            {"misspelling":"mispell","correct":"misspell"},
            {"misspelling":"monkies","correct":"monkeys"},
            {"misspelling":"morgage","correct":"mortgage"},
            {"misspelling":"neccessary","correct":"necessary"},
            {"misspelling":"neice","correct":"niece"},
            {"misspelling":"noone","correct":"no one"},
            {"misspelling":"noticable","correct":"noticeable"},
            {"misspelling":"occassion","correct":"occasion"},
            {"misspelling":"occured","correct":"occurred"},
            {"misspelling":"oppurtunity","correct":"opportunity"},
            {"misspelling":"paralell","correct":"parallel"},
            {"misspelling":"pasttime","correct":"pastime"},
            {"misspelling":"peice","correct":"piece"},
            {"misspelling":"persistant","correct":"persistent"},
            {"misspelling":"persue","correct":"pursue"},
            {"misspelling":"pharoah","correct":"pharaoh"},
            {"misspelling":"portugese","correct":"portuguese"},
            {"misspelling":"posession","correct":"possession"},
            {"misspelling":"potatoe","correct":"potato"},
            {"misspelling":"preceeding","correct":"preceding"},
            {"misspelling":"prefered","correct":"preferred"},
            {"misspelling":"pronounciation","correct":"pronunciation"},
            {"misspelling":"propoganda","correct":"propaganda"},
            {"misspelling":"privelige","correct":"privilege"},
            {"misspelling":"publically","correct":"publicly"},
            {"misspelling":"rasberry","correct":"raspberry"},
            {"misspelling":"recieve","correct":"receive"},
            {"misspelling":"reccomend","correct":"recommend"},
            {"misspelling":"rythm","correct":"rhythm"},
            {"misspelling":"shedule","correct":"schedule"},
            {"misspelling":"seige","correct":"siege"},
            {"misspelling":"sentance","correct":"sentence"},
            {"misspelling":"seperate","correct":"separate"},
            {"misspelling":"sieze","correct":"seize"},
            {"misspelling":"sincerly","correct":"sincerely"},
            {"misspelling":"supercede","correct":"supersede"},
            {"misspelling":"suprise","correct":"surprise"},
            {"misspelling":"tatoo","correct":"tattoo"},
            {"misspelling":"tendancy","correct":"tendency"},
            {"misspelling":"thier","correct":"their"},
            {"misspelling":"threshhold","correct":"threshold"},
            {"misspelling":"tommorrow","correct":"tomorrow"},
            {"misspelling":"truely","correct":"truly"},
            {"misspelling":"untill","correct":"until"},
            {"misspelling":"vaccuum","correct":"vacuum"},
            {"misspelling":"vegeterian","correct":"vegetarian"},
            {"misspelling":"wendesday","correct":"wednesday"},
            {"misspelling":"whereever","correct":"wherever"},
            {"misspelling":"wierd","correct":"weird"},
            {"misspelling":"writen","correct":"written"}
        ]
    }
}
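The normalization rules are plain lookup tables. One list maps British spellings to American ones (from/to entries), and the typos list maps misspellings to corrections (misspelling/correct entries). The sketch below shows one way such tables could be applied to text. It is an illustration only, not tkeir's implementation. The helper names build_lookup and normalize are hypothetical, and because the top-level keys of the rule file are not shown in this excerpt, the functions take the two entry lists directly.

```python
import re
from typing import Dict, Iterable, Mapping

# Hypothetical helpers: a sketch of word-level normalization, not the tkeir code.


def build_lookup(mappings: Iterable[Mapping[str, str]],
                 typos: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Merge {"from", "to"} mapping entries and {"misspelling", "correct"}
    typo entries into a single table keyed by the lowercased source word."""
    lookup: Dict[str, str] = {}
    for entry in mappings:
        lookup[entry["from"].lower()] = entry["to"]
    for entry in typos:
        lookup[entry["misspelling"].lower()] = entry["correct"]
    return lookup


# Only ASCII letters are matched, which is enough for English word lists.
_WORD = re.compile(r"[A-Za-z]+")


def normalize(text: str, lookup: Dict[str, str]) -> str:
    """Replace each alphabetic word found in the lookup table, restoring an
    initial capital when the original word started with one."""

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        target = lookup.get(word.lower())
        if target is None:
            return word
        if word[0].isupper():
            return target[0].upper() + target[1:]
        return target

    return _WORD.sub(replace, text)


# Usage with a few entries taken from the rule file above.
rules = build_lookup(
    [{"from": "visualise", "to": "visualize"}],
    [{"misspelling": "recieve", "correct": "receive"},
     {"misspelling": "alot", "correct": "a lot"}],
)
print(normalize("Visualise alot of data we recieve", rules))
```

Doing all replacements in a single regex pass means a replacement such as "alot" to "a lot" is never re-scanned, so rules cannot chain into each other.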

Configure tokenizer logger

The logger is configured at the top level of the JSON file, in the logger field.

Example of Configuration:

logger configuration
{
    "logger": {
        "logging-level": "debug"
    }    
}

The logger field is:

  • logging-level

It can be set to the following values:

  • debug for the debug level and developer information
  • info for informational messages
  • warning to display only warnings and errors
  • error to display only errors
  • critical to display only critical errors
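These level names match those of Python's standard logging module. Assuming the service maps them onto that module (an assumption, not confirmed by this page), the logger field could be applied as in the sketch below. configure_logger is a hypothetical helper, and falling back to "info" when the field is missing is also an assumption.

```python
import json
import logging

# Assumed mapping from the "logging-level" values to Python logging levels.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def configure_logger(config: dict) -> int:
    """Read logger.logging-level from a parsed configuration, apply it to
    the root logger and return the numeric level that was set.
    Falls back to "info" when the field is absent (assumed default)."""
    name = config.get("logger", {}).get("logging-level", "info").lower()
    if name not in LEVELS:
        raise ValueError(f"unknown logging-level: {name!r}")
    level = LEVELS[name]
    logging.getLogger().setLevel(level)
    return level


config = json.loads('{"logger": {"logging-level": "debug"}}')
configure_logger(config)
```

Rejecting unknown names with an error, rather than silently ignoring them, makes a typo in the configuration visible at startup.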

Configure tokenizer Network

Example of Configuration:

network configuration
{
    "network": {
        "host":"0.0.0.0",
        "port":8080,
        "associate-environment": {
            "host":"HOST_ENVNAME",
            "port":"PORT_ENVNAME"
        },
        "ssl":
        {
            "certificate":"path/to/certificate",
            "key":"path/to/key"
        }
    }
}

The network fields:

  • host : hostname

  • port : port of the service

  • associate-environment : names of the environment variables that, when set, replace the default host and port values. This field is not mandatory.

      • "host" : environment variable associated with "host"
      • "port" : environment variable associated with "port"

  • ssl : SSL configuration. IN PRODUCTION IT IS MANDATORY TO USE A CERTIFICATE AND A KEY THAT ARE *NOT* SELF-SIGNED.

      • certificate : certificate file

      • key : key file
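The sketch below illustrates how the host and port could be resolved, with associate-environment naming the environment variables that override the configured values, and how the ssl entry could be turned into a server-side SSL context with Python's standard ssl module. resolve_network and make_ssl_context are hypothetical helpers, not the tkeir API.

```python
import os
import ssl
from typing import Mapping, Optional, Tuple


def resolve_network(network: dict,
                    environ: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port), letting the variables named in
    associate-environment override the configured values when they are set."""
    host = network["host"]
    port = int(network["port"])
    env_names = network.get("associate-environment", {})
    host_var = env_names.get("host")
    port_var = env_names.get("port")
    if host_var and host_var in environ:
        host = environ[host_var]
    if port_var and port_var in environ:
        port = int(environ[port_var])
    return host, port


def make_ssl_context(network: dict) -> Optional[ssl.SSLContext]:
    """Build a server-side SSL context from the ssl entry, or return None
    when the entry is absent (plain HTTP)."""
    ssl_conf = network.get("ssl")
    if not ssl_conf:
        return None
    # Purpose.CLIENT_AUTH creates a context suitable for a server socket.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.load_cert_chain(certfile=ssl_conf["certificate"],
                            keyfile=ssl_conf["key"])
    return context


network = {
    "host": "0.0.0.0",
    "port": 10001,
    "associate-environment": {"host": "TOKENIZER_HOST", "port": "TOKENIZER_PORT"},
}
print(resolve_network(network, {"TOKENIZER_PORT": "9000"}))  # ('0.0.0.0', 9000)
```

Passing the environment as a parameter, instead of reading os.environ directly, keeps the override logic easy to test.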

Configure tokenizer runtime

Example of Configuration:

runtime configuration
{
    "runtime":{
        "request-max-size":100000000,
        "request-buffer-queue-size":100,
        "keep-alive":true,
        "keep-alive-timeout":5,
        "graceful-shutown-timeout":15.0,
        "request-timeout":60,
        "response-timeout":60,
        "workers":1
    }    
}

The Runtime fields:

  • request-max-size : how big a request may be (bytes)

  • request-buffer-queue-size: request streaming buffer queue size

  • request-timeout : how long a request can take to arrive (sec)

  • response-timeout : how long a response can take to process (sec)

  • keep-alive: enable (true) or disable (false) HTTP keep-alive

  • keep-alive-timeout: how long to hold a TCP connection open (sec)

  • graceful-shutown-timeout : how long to wait before forcing non-idle connections to close (sec)

  • workers : number of workers for the service on a node

  • associate-environment : maps the previous fields to environment variables whose values, when set, replace the default ones. This field is not mandatory. The following fields can be overwritten with an environment variable:

      • request-max-size
      • request-buffer-queue-size
      • request-timeout
      • response-timeout
      • keep-alive
      • keep-alive-timeout
      • graceful-shutown-timeout
      • workers
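Environment values are always strings, so an override has to be converted to the type of the configured default. The sketch below assumes that the runtime associate-environment entry maps field names to environment variable names, like its network counterpart. resolve_runtime and the variable names TOKENIZER_WORKERS and TOKENIZER_KEEP_ALIVE are hypothetical.

```python
import os
from typing import Any, Dict, Mapping


def resolve_runtime(runtime: Dict[str, Any],
                    environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return the runtime settings with environment overrides applied.
    Each override is converted to the type of the configured default."""
    resolved = {k: v for k, v in runtime.items() if k != "associate-environment"}
    for field, env_name in runtime.get("associate-environment", {}).items():
        if field not in resolved or env_name not in environ:
            continue
        raw = environ[env_name]
        default = resolved[field]
        # bool must be tested before int: bool is a subclass of int in Python.
        if isinstance(default, bool):
            resolved[field] = raw.strip().lower() in ("1", "true", "yes", "on")
        elif isinstance(default, int):
            resolved[field] = int(raw)
        elif isinstance(default, float):
            resolved[field] = float(raw)
        else:
            resolved[field] = raw
    return resolved


runtime = {
    "request-timeout": 60,
    "keep-alive": True,
    "graceful-shutown-timeout": 15.0,
    "workers": 1,
    "associate-environment": {          # hypothetical variable names
        "workers": "TOKENIZER_WORKERS",
        "keep-alive": "TOKENIZER_KEEP_ALIVE",
    },
}
print(resolve_runtime(runtime, {"TOKENIZER_WORKERS": "4",
                                "TOKENIZER_KEEP_ALIVE": "false"}))
```

Converting according to the default's type means "false" becomes the boolean False rather than a non-empty (and therefore truthy) string.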

Tokenizer service

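The tokenizer needs the annotation resources (the MWE file referenced by the mwe field of the segmenter configuration). To create them, run: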

python3 thot/tasks/tokenizer/createAnnotationResource.py --entries=/home/tkeir_svc/tkeir/configs/default/configs/annotation-resources.json --output=/home/tkeir_svc/tkeir/configs/default/resources/modeling/tokenizer/en/tkeir_mwe.pkl

To run the service, type from the tkeir directory:

python3 thot/tokenizer_svc.py --config=<path to tokenizer configuration file>

or, if you installed the tkeir wheel:

tkeir-tokenizer-svc --config=<path to tokenizer configuration file>

A lightweight client can be run with the command:

python3 thot/tokenizer_client.py --config=<path to tokenizer configuration file> --input=<input directory> --output=<output directory>

or, if you installed the tkeir wheel:

tkeir-tokenizer-client --config=<path to tokenizer configuration file> --input=<input directory> --output=<output directory>
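The from-source commands of this section can also be scripted. The sketch below rebuilds those command lines and runs them with Python's subprocess module from the tkeir directory. The helper names, and the defaults 127.0.0.1 and 10001 (the port of the example tokenizer.json) used to detect that the service is up, are assumptions for illustration. The service is started with Popen because it keeps running, while the resource creation and the client are expected to finish.

```python
import socket
import subprocess
import time
from typing import List


def create_resources_cmd(entries: str, output: str, python: str = "python3") -> List[str]:
    return [python, "thot/tasks/tokenizer/createAnnotationResource.py",
            f"--entries={entries}", f"--output={output}"]


def service_cmd(config: str, python: str = "python3") -> List[str]:
    return [python, "thot/tokenizer_svc.py", f"--config={config}"]


def client_cmd(config: str, input_dir: str, output_dir: str,
               python: str = "python3") -> List[str]:
    return [python, "thot/tokenizer_client.py", f"--config={config}",
            f"--input={input_dir}", f"--output={output_dir}"]


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> None:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout} s")
            time.sleep(0.5)


def run_pipeline(tkeir_dir: str, entries: str, mwe_output: str, config: str,
                 input_dir: str, output_dir: str,
                 host: str = "127.0.0.1", port: int = 10001) -> None:
    """Create the MWE resources, start the service, run the client,
    then stop the service."""
    subprocess.run(create_resources_cmd(entries, mwe_output), cwd=tkeir_dir, check=True)
    service = subprocess.Popen(service_cmd(config), cwd=tkeir_dir)
    try:
        wait_for_port(host, port)
        subprocess.run(client_cmd(config, input_dir, output_dir),
                       cwd=tkeir_dir, check=True)
    finally:
        service.terminate()
        service.wait()
```

Passing argument lists rather than a single shell string avoids quoting problems with paths that contain spaces.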

Tokenizer Tests

The tokenizer service comes with unit and functional tests.

Tokenizer Unit tests

The unit tests only test the tokenizer classes:

python3 -m unittest thot/tests/unittests/TestTokenizerConfiguration.py
python3 -m unittest thot/tests/unittests/TestTokenizer.py

Notes:

  • If an error is raised because of the file tkeir_mwe.pkl, this is expected. You can avoid it by first creating the resources model with createAnnotationResource.py.
  • The model data directory is mapped in the docker-compose file; please check that all the configuration files are inside this directory.

Tokenizer Functional tests

python3 -m unittest thot/tests/functional_tests/TestTokenizerSvc.py