I need your expertise:

I am about to implement a graph class in c++ and thinking about the right representation. The graphs are simple and undirected. Number of vertices get for now just up to 1000 but maybe higher in the future. Number of edges up to 200k and maybe higher. Each vertex got a color (int) and an id (int). Edges transport no more information than connecting to vertices.

I store the graph and just need to access if x and y are connected or not – this I need very often. After initialising i never remove or add new vertices or edges (N = Number of vertices and M=number of edges given from the start)!

The one representation which is already available to me:

An adjacency list rolled out into just one long list. Along with this representation goes an array with starting indices for each vertex. Storage O(2M) and check if edge between x and y in an average of O(n/m)

A representation I thought of:

the idea is to, instead of rolling out the adjacency list into one array, do it with the adjacency matrix. So storage O(N^2)? Yes but I want to store an edge in one bit except of one byte.(actually 2 bits symmetricallywise) Example: Say N=8, then create an vector<uint8_t> of length 8 (64 bit). Init each entry on 0. If there is an edge between vertex 3 and vertex 5, then add pow(2,5) to the entry of my vector belonging to vertex 3 and symmetrically. So there is a 1 in the entry of vertex 3 at position of vertex 5 exactly when there is an edge between 3 and 5. After inserting my graph into this data structure I think one should be able to access neighborhood in constant time by just a binary operation: Are 3 and 5 connected? Yes if v[3] ^ pow(2,5) == 0. When there are more vertices than 8, then every vertex needs to get more than one entry in the vector and I need to perform one modulo and one division operation for accessing the correct spot.

What do you think of the second solution – is it maybe already known and in use? Am I wrong by thinking about an access of O(1)? Is it to much effort for no real performance improvement?

The reason for loading both representations in one big list is due to cache improvements I was told.

I am happy to get some feedback on this idea. I might be way off – pls be kind in that case 😀

## Answer

A 1000×1000 matrix with 200,000 edges will be quite sparse. Since the graph is undirected, the edges in the matrix will be written twice:

VerticeA -> VerticeB and VerticeB -> VerticeA

You will end up filling up 40% of the matrix, the rest will be empty.

## Edges

The best approach I can think of here is to use a **2D vector of booleans**:

std::vector<std::vector<bool>> matrix(1000, std::vector<bool>(1000, false));

The lookup will take O(1) time and `std::vector<bool>`

saves space by using a single bit for each boolean value. You will end up using 1Mbit or ~128kB (125 kB) of memory.

The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.

This will allow you to check for an edge like this:

if( matrix[3][5] ) { // vertice 3 and 5 are connected } else { // vertice 3 and 5 are not connected }

## Vertices

If the id values of the vertices form a continuous range of ints (e.g. 0,1,2,3,…,999) then you could store the color information in a `std::vector<int>`

which has O(1) access time:

std::vector<int> colors(1000);

This would use up memory equal to:

1000 * sizeof(int) = 4000 B ~ 4 kB (3.9 kB)

On the other hand, if the id values don’t form a continuous range of ints it might be a better idea to use a `std::unordered_map<int, int>`

which will on average give you O(1) lookup time.

std::unordered_map<int, int> map;

So e.g. to store and look up the color of vertice 4:

map[4] = 5; // assign color 5 to vertice 4 std::cout << map[4]; // prints 5

The amount of memory used by `std::unordered_map<int, int>`

will be:

1000 * 2 * sizeof(int) = 8000 B ~ 8 kB (7.81 kB)

All together, for **edges**:

Type | Memory | Access time |
---|---|---|

`std::vector<std::vector<bool>>` |
125 kB | O(1) |

and for **vertices**:

Type | Memory | Access time |
---|---|---|

`std::vector<int>` |
3.9 kB | O(1) |

`std::unordered_map<int, int>` |
7.8 kB | O(1) on avg. |