Affinity propagation did not converge, this model will not have any cluster centers

Question:

When I try to cluster using affinity propagation, the warning below appears and the number of clusters comes out as one.

...\anaconda\lib\site-packages\sklearn\cluster\_affinity_propagation.py:246: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn("Affinity propagation did not converge, this model "

Below is the code I tried.

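from collections import Counter

from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

def build_feature_matrix(documents, feature_type='frequency',
                         ngram_range=(1, 1), min_df=0.0, max_df=1.0):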

    feature_type = feature_type.lower().strip()  
    
    if feature_type == 'binary':
        vectorizer = CountVectorizer(binary=True, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'frequency':
        vectorizer = CountVectorizer(binary=False, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'tfidf':
        vectorizer = TfidfVectorizer(min_df=min_df, max_df=max_df, 
                                     ngram_range=ngram_range)
    else:
        raise Exception("Wrong feature type entered. Possible values: 'binary', 'frequency', 'tfidf'")

    feature_matrix = vectorizer.fit_transform(documents).astype(float)
    
    return vectorizer, feature_matrix

vectorizer, feature_matrix = build_feature_matrix(filtered_list_6,
                                                  feature_type='tfidf',
                                                  min_df=0.15, max_df=0.85,
                                                  ngram_range=(1, 2))

def affinity_propagation(feature_matrix):
    
    sim = feature_matrix * feature_matrix.T
    sim = sim.todense()
    ap = AffinityPropagation()
    ap.fit(sim)
    clusters = ap.labels_          
    return ap, clusters

ap_obj, clusters = affinity_propagation(feature_matrix=feature_matrix)
df[len(df.columns)] = clusters

c = Counter(clusters)   
print(c.items())

total_clusters = len(c)
print('Total Clusters:', total_clusters)

Could someone point out what I am doing wrong here?

Thanks in advance!

Asked By: lse23

||

Answers:

I was able to eliminate the issue by changing the damping, max_iter and preference values. As a starting point, try damping = 0.9 and max_iter = 1000.

You can then adjust the preference value as needed; this changes the number of clusters the model generates.
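As a rough illustration of these settings, here is a minimal sketch. The synthetic blobs, the `count_clusters` helper and the specific preference values are assumptions made for the example, not part of the original code. A lower (more negative) preference tends to produce fewer clusters.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real feature matrix: three well-separated blobs.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)


def count_clusters(preference):
    """Fit with heavier damping and a larger iteration limit than the
    defaults (damping=0.5, max_iter=200) and return the number of exemplars."""
    ap = AffinityPropagation(damping=0.9, max_iter=1000,
                             preference=preference, random_state=0).fit(X)
    return len(ap.cluster_centers_indices_)


# Compare a low and a high preference (illustrative values only).
counts = {preference: count_clusters(preference) for preference in (-50, -1)}
for preference, n_clusters in counts.items():
    print(f"preference={preference}: {n_clusters} clusters")
```

If `preference` is left as `None`, scikit-learn uses the median of the input similarities, so it is usually worth trying a few values around that median on your own data.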

Answered By: lse23

The problem is that your dataset contains duplicated lines (rows). If you have duplicated lines, the model will never converge.
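One way to test this suggestion is to drop exact duplicates before clustering and then copy each label back to the original rows. The sketch below assumes the input is a list of text documents. The sample `documents`, the `position` lookup and the parameter values are all hypothetical.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical documents: items 0 and 2 are identical, as are items 1 and 4.
documents = ["red apple pie", "green pear tart", "red apple pie",
             "red apple cake", "green pear tart", "green pear jam"]

# Drop exact duplicates while keeping first-seen order.
unique_docs = list(dict.fromkeys(documents))
position = {doc: i for i, doc in enumerate(unique_docs)}

# Cluster only the unique documents.
features = TfidfVectorizer().fit_transform(unique_docs).toarray()
ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=0).fit(features)

# Give every original document the label of its unique representative.
labels = [int(ap.labels_[position[doc]]) for doc in documents]
print(labels)
```

Duplicated documents always receive the same label this way, because each one is looked up through its single unique representative.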

Answered By: Matias

According to the affinity propagation FAQ, you can try increasing the damping factor up to 0.95; if the algorithm still does not converge, try increasing the iteration limit.
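That escalation order can be sketched as a small retry loop: try heavier damping first, then a larger iteration limit. The `fit_until_converged` helper, the candidate values and the synthetic data are assumptions for illustration. Convergence is detected here by checking whether scikit-learn emitted a `ConvergenceWarning`.

```python
import warnings

from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.exceptions import ConvergenceWarning


def fit_until_converged(X, dampings=(0.5, 0.7, 0.9, 0.95),
                        max_iters=(200, 1000, 5000)):
    """Return the first model fitted without a ConvergenceWarning, or None.

    For each iteration limit, every damping value is tried in increasing
    order before moving on to the next, larger limit.
    """
    for max_iter in max_iters:
        for damping in dampings:
            ap = AffinityPropagation(damping=damping, max_iter=max_iter,
                                     random_state=0)
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always", ConvergenceWarning)
                ap.fit(X)
            if not any(issubclass(w.category, ConvergenceWarning) for w in caught):
                return ap
    return None


# Synthetic stand-in for the real feature matrix.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
model = fit_until_converged(X)
if model is not None:
    print("damping:", model.damping, "iterations used:", model.n_iter_)
```

Note that `damping` must stay in the range [0.5, 1.0), which is why the candidates stop at 0.95.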

Answered By: Fady Samann