pattern_clustering.boost.pattern_clustering_without_preprocess
- pattern_clustering_without_preprocess(lines: list, map_name_dfa: Optional[dict] = None, densities: Optional[list] = None, max_dist: float = 0.6, use_async: bool = True, make_mg: Optional[callable] = None) → list
Computes the pattern clustering of input lines without aggregating duplicated PAs.
- Parameters
lines – A list(str) gathering the input lines.
map_name_dfa – A dict{str : Automaton} mapping each pattern name to its corresponding Automaton.
densities – A density vector. See make_densities().
max_dist – The maximum distance between an element of a cluster and the cluster representative. As distances are normalized, this value should be lower than 1.0.
use_async – Pass True to run computations using asynchronous calls, which speeds them up.
make_mg – A MultiGrepFunctor instance.
- Returns
A list(int) mapping each line index to its corresponding cluster identifier.
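The returned list is positional: entry i is the cluster identifier of lines[i]. The sketch below shows one way to turn that list into per-cluster groups of lines. The helper group_lines_by_cluster is hypothetical and not part of the library, and the clusters list is hand-written for illustration rather than actual output of pattern_clustering_without_preprocess.

```python
from collections import defaultdict


def group_lines_by_cluster(lines, clusters):
    """Group lines by cluster identifier (hypothetical helper, not a library function).

    `clusters[i]` is assumed to be the cluster identifier of `lines[i]`,
    as described in the Returns field above.
    """
    if len(lines) != len(clusters):
        raise ValueError("lines and clusters must have the same length")
    groups = defaultdict(list)
    for line, cluster_id in zip(lines, clusters):
        groups[cluster_id].append(line)
    return dict(groups)


lines = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /about.html",
    "2024-01-01 ERROR disk full",
    "2024-01-02 ERROR disk full",
]

# In real use, this list would be the return value of a call such as
#   pattern_clustering_without_preprocess(lines, max_dist=0.6)
# The values below are made up for illustration only.
clusters = [0, 0, 1, 1]

groups = group_lines_by_cluster(lines, clusters)
for cluster_id, members in sorted(groups.items()):
    print(cluster_id, members)
```

Because the function does not aggregate duplicates, every input line keeps its own slot in the result, so lines and the returned list can be zipped directly as shown.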