soton_corenlppy.re.sem_map_lib module

Semantic mapping for the openie library

soton_corenlppy.re.sem_map_lib.apply_semantic_mappings_to_extractions(mapped_extracted_vars=None, list_parsed_semantic_patterns=None, this_uri=None, stemmer=None, binding_strategy='best_one', namespace_unknown_vocab='unknown_vocab', dict_openie_config=None)

apply a list of parsed semantic patterns to a set of mapped extraction variables (originating from a sentence) to generate a set of RDF productions of the form (subj pred obj). each extraction is checked against the mapping patterns: the extracted variables are bound to the mapping pattern variables, and if all variables are successfully bound, the associated production is generated. where multiple binding options share the top confidence value, either a single production is generated (best_one strategy) or all possible productions are generated (best_all strategy).

Parameters

mapped_extracted_vars (list) – list of extraction variables mapped using sem_map_lib.map_encoded_extraction_to_lexicon()
list_parsed_semantic_patterns (list) – list of parsed semantic mapping patterns from sem_map_lib.import_semantic_mapping_patterns()
this_uri (str) – optional URI value for the ‘this’ term in semantic pattern productions (default is None). this allows productions to reference an implied subject URI, for example when text describes a physical object that is implied but never explicitly mentioned.
stemmer (nltk.stem.api.StemmerI) – stemmer to use on phrases (default is None)
binding_strategy (str) – strategy for choosing variable bindings where multiple options exist = best_one|best_all. best_one generates a production from the first occurring binding with the highest confidence value (i.e. a single production is generated). best_all generates a production for every occurring binding with the highest confidence value (i.e. many productions may be generated).
namespace_unknown_vocab (str) – namespace to use for production URIs for text that matches the semantic mapping pattern conditions, but for which there are no lexicon matches. a None value will disable use of such non-lexicon text in output productions.
dict_openie_config (dict) – config object returned from soton_corenlppy.re.openie_lib.get_openie_config()

Returns

list of RDF production triples = [ (subj pred obj), … ]

Return type

list
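a minimal sketch of the best_one and best_all binding choice described above, assuming each candidate binding is a hypothetical (var_name, bound_uri, confidence_score) tuple and that a production is instantiated by substituting ‘this’ and the bound variable name. select_bindings(), instantiate() and the ex: URIs are illustrative names, not part of sem_map_lib, and only one variable is bound here for brevity.

```python
from typing import List, Tuple

# hypothetical candidate binding: (variable_name, bound_uri, confidence_score)
Binding = Tuple[str, str, float]
Triple = Tuple[str, str, str]


def select_bindings(candidates: List[Binding], binding_strategy: str = "best_one") -> List[Binding]:
    """Keep only the highest-confidence candidates, then apply the strategy."""
    if binding_strategy not in ("best_one", "best_all"):
        raise ValueError("binding_strategy must be 'best_one' or 'best_all'")
    if not candidates:
        return []
    top = max(conf for _, _, conf in candidates)
    best = [c for c in candidates if c[2] == top]
    # best_one: first top-scoring candidate in input order; best_all: every top-scoring candidate
    return best[:1] if binding_strategy == "best_one" else best


def instantiate(production: Triple, binding: Binding, this_uri: str) -> Triple:
    """Replace 'this' and the bound variable name in a (subj, pred, obj) production."""
    var_name, bound_uri, _ = binding
    return tuple(
        this_uri if term == "this" else bound_uri if term == var_name else term
        for term in production
    )


candidates = [
    ("arg2", "ex:marble", 1.0),
    ("arg2", "ex:stone", 0.5),
    ("arg2", "ex:white_marble", 1.0),
]
production = ("this", "crm:P45_consists_of", "arg2")
for strategy in ("best_one", "best_all"):
    triples = [instantiate(production, b, "ex:object_1") for b in select_bindings(candidates, strategy)]
    print(strategy, triples)
```

with these candidates best_one yields one triple (ex:marble) while best_all yields two (ex:marble and ex:white_marble), since both share the top confidence of 1.0; the 0.5 candidate is never used.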

soton_corenlppy.re.sem_map_lib.generate_uri_for_literals(literal_value=None, namespace=None, dict_openie_config=None)

generate a safe RDF TTL entry for literal values which do not have any lexicon SKOS URI

Parameters

literal_value (unicode) – literal text
namespace (unicode) – namespace for RDF node
dict_openie_config (dict) – config object returned from soton_corenlppy.re.openie_lib.get_openie_config()

Returns

TTL formatted RDF node

Return type

unicode
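the exact escaping used by generate_uri_for_literals() is not documented here. the sketch below assumes one plausible approach: lowercase the literal, collapse runs of non-word characters to ‘_’ and emit a Turtle prefixed name in the given namespace. literal_to_ttl_node() is a hypothetical helper, not the library function.

```python
import re


def literal_to_ttl_node(literal_value: str, namespace: str) -> str:
    """Hypothetical helper: build a Turtle prefixed name for free text with no lexicon URI."""
    # collapse every run of non-word characters (spaces, punctuation) into a single '_'
    local_name = re.sub(r"\W+", "_", literal_value.strip().lower()).strip("_")
    if not local_name:
        raise ValueError("literal has no characters usable in a local name")
    return "%s:%s" % (namespace, local_name)


print(literal_to_ttl_node("White marble (polished)", "unknown_vocab"))
# unknown_vocab:white_marble_polished
```

a prefixed name like this is only valid in a TTL file that also declares the namespace prefix (e.g. with an @prefix line).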

soton_corenlppy.re.sem_map_lib.import_semantic_mapping_patterns(filename_patterns=None, dict_openie_config=None)

import a set of serialized semantic mapping patterns from disk (newline delimited, one pattern per line). see sem_map_lib.parse_semantic_mapping_pattern() for the pattern format.

Parameters

filename_patterns (str) – filename of semantic patterns to import
dict_openie_config (dict) – config object returned from soton_corenlppy.re.openie_lib.get_openie_config()

Returns

list of tuple_pattern entries, as returned by sem_map_lib.parse_semantic_mapping_pattern()

Return type

list
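a sketch of newline-delimited pattern import, assuming blank lines are skipped. import_patterns() is hypothetical and takes the parser as an argument; with the real library the parser would be sem_map_lib.parse_semantic_mapping_pattern(), which also needs dict_openie_config. here a trivial split on " -> " stands in for it.

```python
import os
import tempfile
from typing import Callable, List, TypeVar

T = TypeVar("T")


def import_patterns(filename: str, parse: Callable[[str], T]) -> List[T]:
    """Read one serialized pattern per line and parse each non-blank line."""
    patterns: List[T] = []
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines carry no pattern
            patterns.append(parse(line))
    return patterns


PATTERN_LINES = [
    "{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>|"
    "<http://collection.britishmuseum.org/id/thesauri/subject>} -> {this rso:PX_object_type arg1}",
    "",
    "{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>} "
    "{rel1:phrase=carved|weaved|sculpted} {prep1:phrase=from} "
    "{arg2:schema=<http://collection.britishmuseum.org/id/thesauri/material>} "
    "-> {this crm:P45_consists_of arg2}",
]

# write a small pattern file, then import it with the stand-in parser
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("\n".join(PATTERN_LINES) + "\n")
    path = tmp.name
try:
    loaded = import_patterns(path, parse=lambda s: s.split(" -> "))
finally:
    os.remove(path)
print(loaded)
```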

soton_corenlppy.re.sem_map_lib.map_encoded_extraction_to_lexicon(extraction_vars=None, lex_phrase_index=None, lex_uri_index=None, only_best_matches=False, stemmer=None, max_gram=5, dict_openie_config=None)

for an extraction, map variable phrases to lexicon phrases. this semantically grounds the phrases extracted from the sentence in lexicon URIs. a phrase gram size is associated with each variable's semantic mapping. all potential mappings are returned, but the matches with the highest gram size are the most likely to be good mappings. the confidence score is based on the percentage of tokens in an extracted variable that match the lexicon phrase. variables without a semantic mapping are removed from the final extraction list. the returned var_phrase and matched_phrase entries are lowercased and have had their tokens stemmed.

Parameters

extraction_vars (list) – extraction variables produced by soton_corenlppy.re.comp_sem_lib.parse_encoded_extraction()
lex_phrase_index (dict) – lexicon phrase index from soton_corenlppy.lexico.lexicon_lib.import_lexicon()
lex_uri_index (dict) – lexicon uri index from soton_corenlppy.lexico.lexicon_lib.import_lexicon()
only_best_matches (bool) – only return variable matches with the highest confidence score for that variable
stemmer (nltk.stem.api.StemmerI) – stemmer to use on phrases (default is None)
max_gram (int) – maximum phrase gram size to check for matches in the lexicon. larger gram sizes mean more lexicon checks, which is slower.
dict_openie_config (dict) – config object returned from soton_corenlppy.re.openie_lib.get_openie_config()

Returns

semantically mapped sentence extractions = [ [ (var_name, var_phrase, var_gram_size, [lexicon_uri, schema_uri, matched_phrase, phrase_gram_size, confidence_score] ), … ], … ]

Return type

list
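a sketch of the n-gram lookup and confidence score described above, assuming lex_phrase_index can be treated as a plain dict from lowercased phrase to a list of lexicon URIs (the real index from lexicon_lib.import_lexicon() may be structured differently). map_phrase_to_lexicon() and the ex: URIs are hypothetical, schema URIs are omitted, and confidence is taken as the fraction of the variable's tokens covered by the matched phrase.

```python
from typing import Callable, Dict, List, Optional, Tuple

# hypothetical match record: (lexicon_uri, matched_phrase, phrase_gram_size, confidence_score)
Match = Tuple[str, str, int, float]


def map_phrase_to_lexicon(
    var_phrase: str,
    lex_phrase_index: Dict[str, List[str]],
    max_gram: int = 5,
    only_best_matches: bool = False,
    stem: Optional[Callable[[str], str]] = None,
) -> List[Match]:
    """Look up every n-gram (largest first) of a variable phrase in a phrase -> URIs index."""
    tokens = var_phrase.lower().split()
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    matches: List[Match] = []
    for n in range(min(max_gram, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            for uri in lex_phrase_index.get(phrase, []):
                # confidence = fraction of the variable's tokens covered by this match
                matches.append((uri, phrase, n, n / len(tokens)))
    if only_best_matches and matches:
        top = max(m[3] for m in matches)
        matches = [m for m in matches if m[3] == top]
    return matches


lex = {"marble": ["ex:marble"], "white marble": ["ex:white_marble"]}
print(map_phrase_to_lexicon("carved white marble", lex))
print(map_phrase_to_lexicon("carved white marble", lex, only_best_matches=True))
```

larger n-grams are tried first, so the highest gram-size matches appear first in the result. if a stemmer is used (for example nltk's PorterStemmer().stem), the lexicon phrase keys must be stemmed the same way for lookups to succeed.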

soton_corenlppy.re.sem_map_lib.parse_semantic_mapping_pattern(str_pattern=None, dict_openie_config=None)

parse a serialized semantic mapping pattern into a structure suitable for efficient mapping on demand. phrase constraints must have spaces replaced with the ‘_’ token to avoid parsing ambiguity.

example patterns:

{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>|<http://collection.britishmuseum.org/id/thesauri/subject>} -> {this rso:PX_object_type arg1}
{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>} {rel1:phrase=carved|weaved|sculpted} {prep1:phrase=from} {arg2:schema=<http://collection.britishmuseum.org/id/thesauri/material>} -> {this crm:P45_consists_of arg2}
{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>} {rel1:skos=<http://collection.britishmuseum.org/id/thesauri/script/carved>} {prep1:phrase=from} {arg2:schema=<http://collection.britishmuseum.org/id/thesauri/material>} -> {this crm:P45_consists_of arg2}

Parameters

str_pattern (unicode) – serialized semantic mapping pattern
dict_openie_config (dict) – config object returned from soton_corenlppy.re.openie_lib.get_openie_config()

Returns

tuple_pattern = ( list_conditions, tuple_production ). list_conditions = [ ( var_type, var_name, schema_uri[], phrase[], skos_uri[] ), … ]. tuple_production = [ ( subject, object, predicate ), … ].

Return type

tuple
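a regex sketch of parsing the pattern syntax shown in the examples above, assuming each {…} condition carries exactly one schema, phrase or skos constraint, that ‘_’ in a phrase constraint stands for a space, and that URIs are kept with their angle brackets. productions are stored as (subject, predicate, object) in the order they are written in the pattern; parse_pattern() is a hypothetical helper, not the library's parse_semantic_mapping_pattern().

```python
import re
from typing import List, Tuple

# a condition such as {rel1:phrase=carved|weaved}: type letters, digits, constraint key, values
CONDITION_RE = re.compile(r"\{([A-Za-z]+)(\d*):(schema|phrase|skos)=([^}]*)\}")
# a production such as {this crm:P45_consists_of arg2}
PRODUCTION_RE = re.compile(r"\{([^}]*)\}")

Condition = Tuple[str, str, List[str], List[str], List[str]]


def parse_pattern(str_pattern: str) -> Tuple[List[Condition], List[Tuple[str, ...]]]:
    """Split a pattern at '->' into (var_type, var_name, schema[], phrase[], skos[]) conditions and productions."""
    lhs, sep, rhs = str_pattern.partition("->")
    if not sep:
        raise ValueError("pattern has no '->' separator")
    conditions: List[Condition] = []
    for var_type, var_num, key, values in CONDITION_RE.findall(lhs):
        items = values.split("|")
        schema_uris = items if key == "schema" else []
        phrases = [p.replace("_", " ") for p in items] if key == "phrase" else []
        skos_uris = items if key == "skos" else []
        conditions.append((var_type, var_type + var_num, schema_uris, phrases, skos_uris))
    productions = []
    for body in PRODUCTION_RE.findall(rhs):
        terms = tuple(body.split())
        if len(terms) != 3:
            raise ValueError("production must have exactly 3 terms: %r" % body)
        productions.append(terms)
    return conditions, productions


example = (
    "{arg1:schema=<http://collection.britishmuseum.org/id/thesauri/object>} "
    "{rel1:phrase=carved|weaved|sculpted} {prep1:phrase=from} "
    "{arg2:schema=<http://collection.britishmuseum.org/id/thesauri/material>} "
    "-> {this crm:P45_consists_of arg2}"
)
conditions, productions = parse_pattern(example)
for condition in conditions:
    print(condition)
print(productions)
```

splitting at ‘->’ first keeps the condition and production braces apart, so one simple brace regex can serve each side.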
