Affinity propagation did not converge, this model will not have any cluster centers
Question:
When I try to cluster using affinity propagation, the warning below is raised and I end up with a single cluster.
...\anaconda\lib\site-packages\sklearn\cluster\_affinity_propagation.py:246: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn("Affinity propagation did not converge, this model "
Below is the code I tried.
from collections import Counter

from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def build_feature_matrix(documents, feature_type='frequency',
                         ngram_range=(1, 1), min_df=0.0, max_df=1.0):
    # Pick the vectorizer that matches the requested feature type.
    feature_type = feature_type.lower().strip()
    if feature_type == 'binary':
        vectorizer = CountVectorizer(binary=True, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'frequency':
        vectorizer = CountVectorizer(binary=False, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'tfidf':
        vectorizer = TfidfVectorizer(min_df=min_df, max_df=max_df,
                                     ngram_range=ngram_range)
    else:
        raise Exception("Wrong feature type entered. Possible values: 'binary', 'frequency', 'tfidf'")
    feature_matrix = vectorizer.fit_transform(documents).astype(float)
    return vectorizer, feature_matrix


# filtered_list_6 holds the documents to cluster.
vectorizer, feature_matrix = build_feature_matrix(filtered_list_6,
                                                  feature_type='tfidf',
                                                  min_df=0.15, max_df=0.85,
                                                  ngram_range=(1, 2))


def affinity_propagation(feature_matrix):
    # Document-by-document similarity from the sparse feature matrix.
    sim = feature_matrix * feature_matrix.T
    # toarray() gives a plain ndarray; recent scikit-learn rejects np.matrix.
    sim = sim.toarray()
    ap = AffinityPropagation()
    ap.fit(sim)
    clusters = ap.labels_
    return ap, clusters


ap_obj, clusters = affinity_propagation(feature_matrix=feature_matrix)
# Append the cluster labels to df as a new column.
df[len(df.columns)] = clusters
c = Counter(clusters)
print(c.items())
total_clusters = len(c)
print('Total Clusters:', total_clusters)
Could someone point out what I am doing wrong here?
Thanks in advance!
Answers:
I was able to eliminate the warning by changing the damping, max_iter and preference values. As a starting point, try damping = 0.9 and max_iter = 1000.
You can then adjust the preference value as needed; it changes the number of clusters the model produces.
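To illustrate the settings above, here is a minimal sketch. The data comes from make_blobs as a stand-in for your own matrix, and the preference value of -200.0 is an arbitrary number picked for illustration, so treat both as assumptions rather than recommended values.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption): 60 points around 3 centers.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)


def fit_ap(X, preference=None):
    # damping=0.9 and max_iter=1000 are the starting values suggested above.
    ap = AffinityPropagation(damping=0.9, max_iter=1000,
                             preference=preference, random_state=0)
    ap.fit(X)
    return ap


# preference=None lets scikit-learn use the median input similarity.
ap_default = fit_ap(X)
# A lower (more negative) preference usually yields fewer exemplars.
ap_low = fit_ap(X, preference=-200.0)

print('default preference:', len(ap_default.cluster_centers_indices_), 'clusters')
print('preference=-200.0:', len(ap_low.cluster_centers_indices_), 'clusters')
```

With the default euclidean affinity the similarities are negative squared distances, so a useful preference range depends on the scale of your data.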
The problem is that your dataset contains duplicated rows. If the data contains duplicated rows, the model may never converge.
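If duplicates are the cause, one way to test it is to cluster only the unique documents and then copy each label back to every original row. The sketch below assumes the input is a plain list of strings; dedupe_documents and expand_labels are hypothetical helper names, and the labels passed in at the end are made up just to show the mapping.

```python
def dedupe_documents(documents):
    # dict.fromkeys keeps the first occurrence of each document, in order.
    unique_docs = list(dict.fromkeys(documents))
    index_of = {doc: i for i, doc in enumerate(unique_docs)}
    # For every original document, the position of its unique copy.
    positions = [index_of[doc] for doc in documents]
    return unique_docs, positions


def expand_labels(unique_labels, positions):
    # Give every original document the label of its unique copy.
    return [unique_labels[p] for p in positions]


docs = ["red apple", "green pear", "red apple", "blue car", "green pear"]
unique_docs, positions = dedupe_documents(docs)
print(unique_docs)   # ['red apple', 'green pear', 'blue car']
print(positions)     # [0, 1, 0, 2, 1]

# Made-up labels for the three unique documents, for illustration only.
labels_all = expand_labels([0, 0, 1], positions)
print(labels_all)    # [0, 0, 0, 1, 0]
```

You would build the feature matrix from unique_docs, fit the model on that, and pass the resulting labels_ to expand_labels so the result still lines up with the original rows.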
According to the affinity propagation FAQ, you can try increasing the damping factor up to 0.95; if the algorithm still does not converge, try increasing the iteration limit.
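That advice can be sketched as a retry loop: raise damping step by step, then raise max_iter at the highest damping, stopping as soon as no ConvergenceWarning is emitted. The function name fit_until_converged, the specific damping and iteration schedules, and the synthetic data are all assumptions made for this example.

```python
import warnings

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.exceptions import ConvergenceWarning


def fit_until_converged(X, dampings=(0.5, 0.7, 0.9, 0.95),
                        max_iters=(200, 1000, 5000)):
    # First raise damping at the smallest iteration budget,
    # then raise the budget while keeping the highest damping.
    attempts = [(d, max_iters[0]) for d in dampings]
    attempts += [(dampings[-1], m) for m in max_iters[1:]]
    ap = None
    for damping, max_iter in attempts:
        ap = AffinityPropagation(damping=damping, max_iter=max_iter,
                                 random_state=0)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            ap.fit(X)
        # No ConvergenceWarning means this attempt converged.
        if not any(issubclass(w.category, ConvergenceWarning) for w in caught):
            return ap, True
    # Every attempt failed; return the last model so it can be inspected.
    return ap, False


# Synthetic stand-in data (assumption): three groups of 20 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in (0.0, 5.0, 10.0)])

model, converged = fit_until_converged(X)
print('converged:', converged, 'damping:', model.damping,
      'max_iter:', model.max_iter)
```

Checking for the warning rather than inspecting labels_ keeps the loop tied to the exact message from the question.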