pattern_clustering.pattern_automaton.PatternAutomaton

class PatternAutomaton(word: str, map_name_dfa: dict, make_mg: Optional[callable] = None, filtered_patterns: Optional[set] = None)

Bases: Automaton

A PatternAutomaton models a string at the pattern level using an automaton-like structure in which each vertex corresponds to a string index and each arc corresponds to an infix and its corresponding pattern.

Constructs the PatternAutomaton related to an input word according to a collection of patterns and according to a multi_grep strategy.
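
To make the vertex and arc structure concrete, here is a minimal sketch that lists the arcs such a structure could contain for a small word. It does not call the pattern_clustering API: regular expressions from Python's re module stand in for the Automaton instances, greedy regex matching stands in for the multi_grep strategy, and the helper pattern_arcs and the pattern names are hypothetical.

```python
import re

def pattern_arcs(word: str, map_name_re: dict) -> list:
    """Hypothetical helper: return (start, end, pattern_name) arcs.

    Each arc goes from string index `start` to string index `end`
    and is labeled by the pattern matching the infix word[start:end].
    """
    arcs = []
    for name, regex in map_name_re.items():
        for m in re.finditer(regex, word):
            if m.end() > m.start():  # skip empty matches
                arcs.append((m.start(), m.end(), name))
    return sorted(arcs)

word = "abc 123"
# Assumed pattern collection (regexes replace the DFAs of map_name_dfa).
patterns = {"alpha": r"[a-z]+", "int": r"[0-9]+", "spaces": r" +"}
arcs = pattern_arcs(word, patterns)
for (i, j, name) in arcs:
    print(i, j, name, repr(word[i:j]))
```

Under these assumptions the word yields three arcs: one from index 0 to 3 labeled alpha, one from 3 to 4 labeled spaces, and one from 4 to 7 labeled int.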

Parameters
  • word (str) – The input string.

  • map_name_dfa (dict) – The pattern collection, mapping each pattern name (str) to its corresponding Automaton instance. The "any" pattern is always ignored.

  • filtered_patterns (set) – A subset (possibly empty) of map_name_dfa.keys() keying the types that must be caught by multi_grep, but must not appear in the arcs of the PatternAutomaton. It may be used, for instance, to drop spaces and get a smaller PatternAutomaton, at the cost of losing the position of spaces in the original lines.
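
The effect of filtered_patterns can be pictured with the simplified sketch below. It is only an illustration under stated assumptions: matches are represented as hypothetical (start, end, pattern_name) triples, the helper drop_filtered is not part of the library, and how the real class reconnects the remaining arcs is not modeled.

```python
def drop_filtered(word: str, arcs: list, filtered_patterns: set) -> list:
    """Hypothetical helper: keep (pattern_name, infix) for unfiltered arcs only."""
    return [
        (name, word[i:j])
        for (i, j, name) in arcs
        if name not in filtered_patterns
    ]

word = "abc 123"
# Assumed matches: every pattern, including "spaces", is still detected.
arcs = [(0, 3, "alpha"), (3, 4, "spaces"), (4, 7, "int")]
kept = drop_filtered(word, arcs, {"spaces"})
print(kept)
```

The resulting sequence is shorter, but it no longer tells where, or how many, spaces occurred in the original word.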

Methods

accepts

add_edge

add_vertex

alphabet

delta

delta_word

edge

Retrieve the edge from a vertex u to vertex v.

edges

finals

get_infix

Retrieves the infix (substring) related to an edge.

get_slice

Retrieves the slice (pair of uint indices delimiting a substring) related to an edge.

has_edge

has_vertex

in_edges

initial

is_complete

is_deterministic

is_final

is_finite

is_initial

label

num_edges

num_vertices

out_edges

remove_edge

remove_vertex

set_final

set_initial

sigma

source

target

to_dot

vertices

Attributes

adjacencies

directed

edge(q: int, r: int, a: chr) → tuple

Retrieve the edge from a vertex u to vertex v.

Parameters

u – The source of the edge.

v – The target of the edge.

Returns

(e, True) if there exists a single edge from u to v, (None, False) otherwise.
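
The (edge, found) return convention is typically consumed by unpacking the pair and testing the flag before using the edge. The sketch below mimics that convention on a toy adjacency dictionary; find_edge and the graph layout are hypothetical stand-ins, not the library's implementation.

```python
def find_edge(adjacency: dict, u: int, v: int) -> tuple:
    """Hypothetical stand-in returning (edge, True) or (None, False)."""
    for (target, label) in adjacency.get(u, []):
        if target == v:
            return ((u, v, label), True)
    return (None, False)

# Toy graph: vertex -> list of (target, label) pairs.
adjacency = {0: [(3, "alpha")], 3: [(4, "spaces")]}

(e, found) = find_edge(adjacency, 0, 3)
if found:
    print("edge found:", e)

(e_missing, found_missing) = find_edge(adjacency, 0, 4)
if not found_missing:
    print("no edge from 0 to 4")
```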

get_infix(e: EdgeDescriptor) → str

Retrieves the infix (substring) related to an edge.

Parameters

e (EdgeDescriptor) – The queried edge identifier.

Returns

The infix related to an arbitrary edge of this PatternAutomaton instance.

get_slice(e: EdgeDescriptor) → tuple

Retrieves the slice (pair of uint indices delimiting a substring) related to an edge.

Parameters

e (EdgeDescriptor) – The queried edge identifier.

Returns

The slice related to an arbitrary edge of this PatternAutomaton instance.
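
The relation between a slice and its infix can be illustrated without the library. The sketch below assumes that the slice is a half-open pair of indices, as in Python slicing; this is an assumption, not something the documentation above states, and the slice value is made up for the example.

```python
word = "abc 123"

# Hypothetical slice attached to an edge, assumed half-open: [4, 7).
slice_of_edge = (4, 7)

(i, j) = slice_of_edge
infix = word[i:j]  # the substring delimited by the slice
print(infix)
```

Under this assumption, the infix of an edge is simply the input word restricted to the edge's slice.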