soton_corenlppy.re.openie_lib module

Open information extraction library

soton_corenlppy.re.openie_lib.calc_inverted_index_pos_phrase_patterns(dict_pos_phrase_patterns={'ADJECTIVE_P': {'JJ', 'JJR', 'JJS'}, 'ADVERB_P': {'RB', 'RBR', 'RBS'}, 'DETERMINER_P': {'DT', 'EX', 'PDT', 'WDT'}, 'NOUN_P': {'NN', 'NNS'}, 'PREPOSITION_P': {'IN', 'IN/that', 'TO'}, 'PRONOUN_P': {'PP', 'PP$', 'WP', 'WP$'}, 'PROPER_NOUN_P': {'NP', 'NPS'}, 'VERB_AUXILLARY_P': {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'VD', 'VDD', 'VDG', 'VDN', 'VDP', 'VDZ', 'VH', 'VHD', 'VHG', 'VHN', 'VHP', 'VHZ'}, 'VERB_P': {'VV', 'VVD', 'VVG', 'VVN', 'VVP', 'VVZ'}}, dict_openie_config=None)

Generate an inverted index from a dict of POS phrase patterns, mapping each POS tag to the name of the phrase pattern that contains it.

Parameters
  • dict_pos_phrase_patterns (dict) – definition of the set of POS tags that constitute each phrase pattern, in the form { phrase_name : set( pos_tag, pos_tag … ) }. A POS tag cannot be shared between phrase patterns. The default is set up for processing output tagged by TreeTagger.

  • dict_openie_config (dict) – config object returned by openie_lib.get_openie_config().

Returns

inverted POS pattern index = { pos_tag : phrase_name }

Return type

dict
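The inversion described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name invert_pos_phrase_patterns is hypothetical, and raising ValueError on a shared POS tag is an assumption about how the "cannot be shared" rule might be enforced. The pattern subset and the tagged sentence are illustrative and use TreeTagger-style tags taken from the default above.

```python
from typing import Dict, Set


def invert_pos_phrase_patterns(patterns: Dict[str, Set[str]]) -> Dict[str, str]:
    """Hypothetical sketch: build { pos_tag : phrase_name } from
    { phrase_name : set( pos_tag, ... ) }."""
    inverted: Dict[str, str] = {}
    for phrase_name, pos_tags in patterns.items():
        for pos_tag in pos_tags:
            # Assumption: a tag already claimed by another phrase is an error,
            # since POS tags cannot be shared between phrase patterns.
            if pos_tag in inverted:
                raise ValueError(
                    f"POS tag {pos_tag!r} is shared by "
                    f"{inverted[pos_tag]!r} and {phrase_name!r}"
                )
            inverted[pos_tag] = phrase_name
    return inverted


# A subset of the default patterns shown in the signature above.
patterns = {
    "NOUN_P": {"NN", "NNS"},
    "DETERMINER_P": {"DT", "EX", "PDT", "WDT"},
    "VERB_P": {"VV", "VVD", "VVG", "VVN", "VVP", "VVZ"},
}
index = invert_pos_phrase_patterns(patterns)

# Label each token of a TreeTagger-style tagged sentence with its phrase pattern.
tagged = [("the", "DT"), ("dog", "NN"), ("barked", "VVD")]
labels = [index.get(tag) for _, tag in tagged]
print(labels)  # ['DETERMINER_P', 'NOUN_P', 'VERB_P']
```

The inverted form turns "which phrase pattern does this tag belong to?" into a single dict lookup per token, which is why the one-tag-per-pattern restriction matters: with shared tags the lookup would be ambiguous.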

soton_corenlppy.re.openie_lib.get_openie_config(**kwargs)

Return an openie config object.

Note: a config object is used instead of a global variable so that openie_lib functions can work in a multi-threaded environment.

Returns

configuration settings to be used by all openie_lib functions

Return type

dict
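The config-object design noted above can be illustrated with a small, self-contained sketch. The actual keyword arguments and keys accepted by get_openie_config are not documented here, so make_config, normalise and the lowercase key below are all hypothetical stand-ins. The point shown is only the pattern: each function reads its settings from the config passed to it, so threads using different configs never interfere through shared module-level state.

```python
import threading
from typing import Any, Dict


def make_config(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical config factory: start from defaults, override with kwargs."""
    config: Dict[str, Any] = {"lowercase": False}
    config.update(kwargs)
    return config


def normalise(text: str, config: Dict[str, Any]) -> str:
    # Behaviour depends only on the config argument, never on a global variable.
    return text.lower() if config["lowercase"] else text


results: Dict[str, str] = {}
lock = threading.Lock()


def worker(name: str, text: str, config: Dict[str, Any]) -> None:
    out = normalise(text, config)
    with lock:  # guard the shared results dict, not the config
        results[name] = out


threads = [
    threading.Thread(target=worker, args=("a", "Hello World", make_config(lowercase=True))),
    threading.Thread(target=worker, args=("b", "Hello World", make_config())),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["a"], "|", results["b"])  # hello world | Hello World
```

Had the lowercase setting lived in a module-level global, one thread changing it would silently alter the other thread's output; passing a config dict to every call, as with dict_openie_config, avoids that.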