Bug in construct_W.py #58

@mary-design-testing

Description

Hi there,

I'm trying to construct the W weight matrix to use with lap_score on the following simple dataset: employes-region.txt. I tried the following code, which is provided as an example in the file test_lap_score.py:

    kwargs_W = {"metric": "euclidean", "neighbor_mode": "knn", "weight_mode": "heat_kernel", "k": 5, "t": 1}
    W = construct_W.construct_W(X, **kwargs_W)

Unfortunately, it fails with the following exception at line 152 of file construct_W.py:

could not broadcast input array from shape (25) into shape (30) 

I've gone through the code, and I think the problem is that the dimensions of G are wrong. This is the piece of code involved in the exception:

            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)  #  *** 1
            idx_new = idx[:, 0:k+1]  #  *** 2
            dump_new = dump[:, 0:k+1] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) #  *** 2
            G[:, 1] = np.ravel(idx_new, order='F') # *** EXCEPTION HERE!!
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
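
For what it's worth, here is the shape bookkeeping of the excerpt above as self-contained NumPy (the random X and the sizes n_samples = 8, k = 5 are my assumptions, just to make the sketch runnable without the dataset):

```python
import numpy as np

n_samples, k, t = 8, 5, 1
rng = np.random.default_rng(0)
X = rng.standard_normal((n_samples, 4))

# squared pairwise euclidean distances, as in construct_W.py
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

dump = np.sort(D, axis=1)
idx = np.argsort(D, axis=1)

# each sample plus its k nearest neighbours -> k + 1 columns
idx_new = idx[:, 0:k + 1]          # shape (n_samples, k + 1)
dump_new = dump[:, 0:k + 1]
dump_heat_kernel = np.exp(-dump_new / (2 * t * t))

G = np.zeros((n_samples * (k + 1), 3))
G[:, 0] = np.tile(np.arange(n_samples), (k + 1, 1)).reshape(-1)
G[:, 1] = np.ravel(idx_new, order='F')
G[:, 2] = np.ravel(dump_heat_kernel, order='F')
print(G.shape)  # (48, 3)
```

With n_samples > k the slice idx[:, 0:k+1] really has k + 1 columns and every assignment into G fits. If n_samples were smaller than k + 1, the slice would be silently truncated to n_samples columns, and the assignment into G[:, 1] would fail with exactly this kind of broadcast error.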

I think that there's a problem at the line marked *** 1. Should it compute idx using dump? I mean:

            idx = np.argsort(dump, axis=1)  #  *** 1

And the other problem is at the lines marked *** 2. Shouldn't they use k as the multiplier instead of k+1? That is:

            idx_new = idx[:, 0:k]  #  *** 2
            dump_new = dump[:, 0:k] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k, 1)).reshape(-1) #  *** 2
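
As a quick size check of the *** 2 part of this proposal (this sketch only verifies that the array sizes line up with k columns, using assumed sizes n_samples = 8, k = 5; it doesn't exercise the *** 1 change):

```python
import numpy as np

n_samples, k = 8, 5

# with k columns, both sides of each assignment into G have n_samples * k entries
rows = np.tile(np.arange(n_samples), (k, 1)).reshape(-1)
idx_new = np.zeros((n_samples, k), dtype=int)   # stands in for idx[:, 0:k]
cols = np.ravel(idx_new, order='F')

print(rows.size, cols.size)  # 40 40
```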

I've fixed my local installation with this patch and run it on a large collection of 200+ datasets. It works correctly now.

I've seen that there are many other lines where a similar patch might apply, but I haven't tried other configuration options.

Thanks! Regards
