GenLouvain»Gen Louvain

Gen Louvain

This "generalized Louvain" MATLAB code for community detection allows the user to define a quality function in terms of a generalized-modularity null model framework and then follows a two-phase iterative procedure similar to the "Louvain" method, with the important distinction that the Louvain passes in the codes here work directly with the modularity matrix, not the adjacency matrix. That is, the main genlouvain.m code can be used with any quality function specified in terms of a modularity matrix; but as such it does not take advantage of any particular structure to those matrices (cf. the excellent findcommunities code).

The most recent package file is GenLouvain2.0.zip. Please see the README file there.

These codes are essentially similar to (but have been optimized to be faster than) those used in our paper "Community structure in time-dependent, multiscale, and multiplex networks", P. J. Mucha, T. Richardson, K. Macon, M. A. Porter and J.-P. Onnela, Science 328, 876-878 (2010). The definitive version of this paper is freely available via referring links on Mucha's "publications" page.

The development of these codes was supported at different times by NSF DMS-0645369 (in particular in the form of supporting Jutla's Research Experiences for Undergraduates project), NIGMS R21-GM099493 (supporting Mucha's contributions of improving such codes as a specific aim of this project), and an EPSRC CASE studentship (Jeub). Additional thanks go to our users whose experiences have helped improve this code, especially Dani Bassett, Marya Bazzi, Jesse Blocher, Mason Porter and Simi Wang.

Please also note that Vincent Traag has released his Louvain Community Detection software with a variety of options, including multislice contributions. I am very hopeful that new and efficient codes will soon be released by Giuseppe Mangioni and his collaborators. I also hear from Bruce Rogers that he is developing an R code for multislice community detection.

Example 1: Small Network

Define the NxN adjacency matrix A representing an undirected network with N small enough that you can expect a fully dense NxN B matrix to fit in memory, the following commands will run genlouvain for modularity at the default resolution (gamma = 1) and rescale modularity appropriately:

gamma = 1;
k = full(sum(A));
twom = sum(k);
B = full(A - gamma*k'*k/twom);
[S,Q] = genlouvain(B);
Q = Q/twom

To force deterministic (cf. pseudorandom) results, see the documentation (help genlouvain) about how to set the 4th input appropriately, or directly reset the random stream yourself (e.g., reset(RandStream.getGlobalStream) or reset(RandStream.getDefaultStream), depending on your version of MATLAB).

Again, the primary point of this code is if you have a different null model than that used in modularity for the standard network cases; but it nevertheless will work (albeit less efficiently) for such standard cases if you define B appropriately.

Example 1a: Larger Network

If your adjacency matrix A is large enough that you cannot expect to fit a fully dense NxN B matrix into memory, then you need to define B as a function handle such that B(i) returns the ith column of the B matrix. For modularity of an undirected network with resolution parameter gamma, replace the B call in Example 1 above with the command

B = @(i) A(:,i) - gamma*k'*k(i)/twom;

Example 2: Ordered Multislice Matrix

Define the cell array A of square symmetric NxN matrices of equal size each representing one of the T ordered, undirected network "slices". After setting values for the parameters gamma and omega, the multislice modularity with nearest-slice identity arcs of equal strength omega is calculated by

N=length(A{1});
T=length(A);
B=spalloc(N*T,N*T,N*N*T+2*N*T);
twomu=0;
for s=1:T
    k=sum(A{s});
    twom=sum(k);
    twomu=twomu+twom;
    indx=[1:N]+(s-1)*N;
    B(indx,indx)=A{s}-gamma*k'*k/twom;
end
twomu=twomu+2*omega*N*(T-1);
B = B + omega*spdiags(ones(N*T,2),[-N,N],N*T,N*T);
[S,Q] = genlouvain(B);
Q = Q/twomu
S = reshape(S,N,T);

Note that the last command conveniently reshapes the community assignment vector into an N (nodes) by T (slices) matrix so that the movement of identified nodes between communities can be easily tracked through the slices.

It is important to stress here that multislice modularity is not limited to undirected network slices, nor does it require the same N nodes to be present in all slices; rather, these restrictions placed in this example are merely for the simplicity of the resulting function calls. For that matter, it does not require that ordered slices be connected by equal-valued nearest-slice couplings; but this is a simple way to treat ordered slices in the absence of other guidance. The multislice modularity methodology is as flexible as one's ability to cast the null models in terms of a quality function. No matter how many indices that quality function might have in its easiest-to-express form, it should be recast in terms of an equivalent modularity matrix B for input into genlouvain codes.

Example 2a: Larger Networks

N=length(A{1});
T=length(A);
ii=[]; jj=[]; vv=[];
twomu=0;
for s=1:T
    indx=[1:N]'+(s-1)*N;
    [i,j,v]=find(A{s});
    ii=[ii;indx(i)]; jj=[jj;indx(j)]; vv=[vv;v];
    k=sum(A{s});
    kv=zeros(N*T,1);
    twom=sum(k);
    twomu=twomu+twom;
    kv(indx)=k/twom;
    kcell{s}=kv;
end
AA = sparse(ii,jj,vv,N*T,N*T);
clear ii jj vv
kvec = full(sum(AA));
AA = AA + omega*spdiags(ones(N*T,2),[-N,N],N*T,N*T);
twomu=twomu+2*omega*N*(T-1);
B = @(i) AA(:,i) - gamma*kcell{ceil(i/(N+eps))}*kvec(i);
[S,Q] = genlouvain(B);
Q = Q/twomu
S = reshape(S,N,T);

Example 3: Categorical Multislice Matrix

The distinction between ordered slices and categorical slices manifests in the call below by the presence of all-to-all identity arcs between slices (cf. the nearest-slice couplings in the above example):

N=length(A{1});
T=length(A);
B=spalloc(N*T,N*T,(N+T)*N*T);
twomu=0;
for s=1:T
    k=sum(A{s});
    twom=sum(k);
    twomu=twomu+twom;
    indx=[1:N]+(s-1)*N;
    B(indx,indx)=A{s}-gamma*k'*k/twom;
end
twomu=twomu+T*omega*N*(T-1);
all2all = N*[(-T+1):-1,1:(T-1)];
B = B + omega*spdiags(ones(N*T,2*T-2),all2all,N*T,N*T);
[S,Q] = genlouvain(B);
Q = Q/twomu
S = reshape(S,N,T);

Example 3a: Larger Networks

N=length(A{1});
T=length(A);
ii=[]; jj=[]; vv=[];
twomu=0;
for s=1:T
    indx=[1:N]'+(s-1)*N;
    [i,j,v]=find(A{s});
    ii=[ii;indx(i)]; jj=[jj;indx(j)]; vv=[vv;v];
    k=sum(A{s});
    kv=zeros(N*T,1);
    twom=sum(k);
    twomu=twomu+twom;
    kv(indx)=k/twom;
    kcell{s}=kv;
end
AA = sparse(ii,jj,vv,N*T,N*T);
clear ii jj vv
kvec = full(sum(AA));
all2all = N*[(-T+1):-1,1:(T-1)];
AA = AA + omega*spdiags(ones(N*T,2*T-2),all2all,N*T,N*T);
twomu=twomu+T*omega*N*(T-1);
B = @(i) sparse(AA(:,i) - gamma*kcell{ceil(i/(N+eps))}*kvec(i));
[S,Q] = genlouvain(B);
S = reshape(S,N,T);

Bipartite Networks

Code for bipartite networks is available upon request.


Related Codes

Coming later, including...

zrand.m calculates the z-Rand score and Variation of Information distance between a pair of partitions. If you use this code, please cite Amanda L. Traud, Eric D. Kelsic, Peter J. Mucha, and Mason A. Porter, "Comparing Community Structure to Characteristics in Online Collegiate Social Networks," SIAM Review 53, 526-543 (2011).

spectral23.m provides various options for spectral bipartitioning and tripartitioning to optimize modularity. If you use this code, please cite T. Richardson, P. J. Mucha, and M. A. Porter, "Spectral tripartitioning of networks," Physical Review E 80, 036111 (2009). Please also be sure that you understand that nothing about this code has been optimized for speed; rather, it was designed for proof of concept.

klnb.m is our implementation of Newman's description of Kernighan-Lin node-swapping steps for improving modularity.


License

All codes included here are provided for broad use under the same terms as a "FreeBSD License", with copyright holders and dates as indicated in individual files or directories.