pattern_clustering.boost.pattern_clustering

pattern_clustering(lines: list, map_name_dfa: Optional[dict] = None, densities: Optional[list] = None, max_dist: float = 0.6, use_async: bool = True, make_mg: Optional[callable] = None) → list

Computes the pattern clustering of input lines without aggregating duplicated PAs.

Parameters
  • lines – A list(str) containing the input lines.

  • map_name_dfa – A dict{str : Automaton} mapping each pattern name to its corresponding Automaton.

  • densities – A density vector. See make_densities().

  • max_dist – The maximum distance between an element of a cluster and the cluster representative. As distances are normalized, this value should be between 0.0 and 1.0.

  • use_async – Pass True to run the computations using asynchronous calls, which speeds them up.

  • make_mg – A MultiGrepFunctor instance.
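To show how a max_dist threshold on a normalized distance bounds cluster membership, here is a minimal sketch of greedy "leader" clustering. It is an illustration under stated assumptions, not the library's implementation: it compares raw strings with a normalized Levenshtein distance, which is a stand-in for whatever distance pattern_clustering actually computes, and the helper names normalized_levenshtein and leader_clustering are hypothetical.

```python
from typing import List


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance between a and b divided by the longer length, so it lies in [0.0, 1.0].

    Stand-in distance for illustration only (assumption, not the library's metric).
    """
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def leader_clustering(lines: List[str], max_dist: float = 0.6) -> List[int]:
    """Hypothetical greedy clustering: the first line of each cluster is its representative.

    Returns a list(int) mapping each line index to its cluster identifier.
    """
    representatives: List[str] = []
    clusters: List[int] = []
    for line in lines:
        for cluster_id, rep in enumerate(representatives):
            if normalized_levenshtein(line, rep) <= max_dist:
                clusters.append(cluster_id)  # close enough to an existing representative
                break
        else:
            representatives.append(line)  # no match: this line opens a new cluster
            clusters.append(len(representatives) - 1)
    return clusters
```

Lowering max_dist makes the membership test stricter and therefore tends to produce more, smaller clusters; with max_dist = 0.0 only identical lines share a cluster in this sketch.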

Returns

A list(int) mapping each line index to its corresponding cluster identifier.
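Because the result is indexed by line, a common next step is to group the input lines by cluster. The sketch below assumes only what is documented above (clusters[i] is the cluster identifier of lines[i]); the helper name group_lines_by_cluster and the sample clusters list are hypothetical and hand-written, not actual output of pattern_clustering.

```python
from collections import defaultdict
from typing import Dict, List


def group_lines_by_cluster(lines: List[str], clusters: List[int]) -> Dict[int, List[str]]:
    """Group lines by cluster identifier, where clusters[i] is the cluster of lines[i]."""
    if len(lines) != len(clusters):
        raise ValueError("lines and clusters must have the same length")
    groups: Dict[int, List[str]] = defaultdict(list)
    for line, cluster_id in zip(lines, clusters):
        groups[cluster_id].append(line)  # preserves the original line order within a cluster
    return dict(groups)


# Hypothetical input: a hand-written clusters list, not real pattern_clustering output.
lines = ["user alice logged in", "user bob logged in", "disk full"]
clusters = [0, 0, 1]
print(group_lines_by_cluster(lines, clusters))
```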