Size of the largest connected component in a grid in Python












3














Given an R x C grid of 1s and 0s (or True and False values), I need a function that can find the size of the largest connected component of 1s. For example, for the following grid,



grid = [[0, 1, 0, 1],
[1, 1, 1, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]]


The answer is 5.





Here is my implementation:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
seen[i][j] = True
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j] and not seen[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1] and not seen[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j] and not seen[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1] and not seen[i][j+1]:
result += traverse_component(i, j+1)
return result

seen = [[False] * ncols for _ in range(nrows)]

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j] and not seen[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size


Feel free to use the following code to generate random grids to test the function,



from random import randint

N = 20
grid = [[randint(0,1) for _ in range(N)] for _ in range(N)]




Problem: My implementation runs too slow (by about a factor of 3). Since I wrote this as a naive approach by myself, I am guessing there are clever optimizations that can be made.



Context: This is for solving the Gridception problem from Round 2 of Google Codejam 2018. My goal is to solve the problem in Python 3. As a result, there is a hard constraint of using only the Python 3 standard library.



I have figured out that this particular portion of the full solution is my performance bottleneck and thus, my solution fails to clear the Large Input due to being too slow.



Thank you so much!










share|improve this question









New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 1




    This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
    – Mathias Ettinger
    7 hours ago
















3














Given an R x C grid of 1s and 0s (or True and False values), I need a function that can find the size of the largest connected component of 1s. For example, for the following grid,



grid = [[0, 1, 0, 1],
[1, 1, 1, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]]


The answer is 5.





Here is my implementation:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
seen[i][j] = True
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j] and not seen[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1] and not seen[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j] and not seen[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1] and not seen[i][j+1]:
result += traverse_component(i, j+1)
return result

seen = [[False] * ncols for _ in range(nrows)]

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j] and not seen[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size


Feel free to use the following code to generate random grids to test the function,



from random import randint

N = 20
grid = [[randint(0,1) for _ in range(N)] for _ in range(N)]




Problem: My implementation runs too slow (by about a factor of 3). Since I wrote this as a naive approach by myself, I am guessing there are clever optimizations that can be made.



Context: This is for solving the Gridception problem from Round 2 of Google Codejam 2018. My goal is to solve the problem in Python 3. As a result, there is a hard constraint of using only the Python 3 standard library.



I have figured out that this particular portion of the full solution is my performance bottleneck and thus, my solution fails to clear the Large Input due to being too slow.



Thank you so much!










share|improve this question









New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 1




    This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
    – Mathias Ettinger
    7 hours ago














3












3








3


2





Given an R x C grid of 1s and 0s (or True and False values), I need a function that can find the size of the largest connected component of 1s. For example, for the following grid,



grid = [[0, 1, 0, 1],
[1, 1, 1, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]]


The answer is 5.





Here is my implementation:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
seen[i][j] = True
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j] and not seen[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1] and not seen[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j] and not seen[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1] and not seen[i][j+1]:
result += traverse_component(i, j+1)
return result

seen = [[False] * ncols for _ in range(nrows)]

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j] and not seen[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size


Feel free to use the following code to generate random grids to test the function,



from random import randint

N = 20
grid = [[randint(0,1) for _ in range(N)] for _ in range(N)]




Problem: My implementation runs too slow (by about a factor of 3). Since I wrote this as a naive approach by myself, I am guessing there are clever optimizations that can be made.



Context: This is for solving the Gridception problem from Round 2 of Google Codejam 2018. My goal is to solve the problem in Python 3. As a result, there is a hard constraint of using only the Python 3 standard library.



I have figured out that this particular portion of the full solution is my performance bottleneck and thus, my solution fails to clear the Large Input due to being too slow.



Thank you so much!










share|improve this question









New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











Given an R x C grid of 1s and 0s (or True and False values), I need a function that can find the size of the largest connected component of 1s. For example, for the following grid,



grid = [[0, 1, 0, 1],
[1, 1, 1, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]]


The answer is 5.





Here is my implementation:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
seen[i][j] = True
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j] and not seen[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1] and not seen[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j] and not seen[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1] and not seen[i][j+1]:
result += traverse_component(i, j+1)
return result

seen = [[False] * ncols for _ in range(nrows)]

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j] and not seen[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size


Feel free to use the following code to generate random grids to test the function,



from random import randint

N = 20
grid = [[randint(0,1) for _ in range(N)] for _ in range(N)]




Problem: My implementation runs too slow (by about a factor of 3). Since I wrote this as a naive approach by myself, I am guessing there are clever optimizations that can be made.



Context: This is for solving the Gridception problem from Round 2 of Google Codejam 2018. My goal is to solve the problem in Python 3. As a result, there is a hard constraint of using only the Python 3 standard library.



I have figured out that this particular portion of the full solution is my performance bottleneck and thus, my solution fails to clear the Large Input due to being too slow.



Thank you so much!







python performance algorithm python-3.x programming-challenge






share|improve this question









New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 9 hours ago









Josay

25.5k13987




25.5k13987






New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 11 hours ago









XYZT

1162




1162




New contributor




XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






XYZT is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 1




    This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
    – Mathias Ettinger
    7 hours ago














  • 1




    This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
    – Mathias Ettinger
    7 hours ago








1




1




This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
– Mathias Ettinger
7 hours ago




This question is very similar to counting the connected components. Maybe the approach using a set can help speed it up a little.
– Mathias Ettinger
7 hours ago










2 Answers
2






active

oldest

votes


















3














In programming challenges, proper performances improvement usually come from a smarter algorithm. Unfortunately, I have no algorithm better than the one you have implemented.



I have found a single trick to shave off some time.



Remove all the logic around seen



In all places where you access elements from grid and seen, we have basically: if grid[pos] and not seen[pos].



An idea could be to update grid in place to remove seen elements from it. From an engineering point of view, it is not very nice: I would not expect an function computing the size of the biggest connected components to update the provided input. For a programming challenge, we can probably accept such a thing...



We'd get:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
grid[i][j] = False
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1]:
result += traverse_component(i, j+1)
return result

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size




Another idea in order to do the same type of things without changing grid could be to store "positive" elements in a set. This also remove the need to check for edge cases of the grid. The great thing is that we can populate that set with less array accesses. This is still pretty hackish:



import itertools

def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(pos):
"""Returns no. of unseen elements connected to (i,j)."""
elements.remove(pos)
i, j = pos
result = 1

# Check all four neighbours
for new_pos in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
if new_pos in elements:
result += traverse_component(new_pos)
return result

# Tracks size of largest connected component found
elements = set()

for i, line in enumerate(grid):
for j, cell in enumerate(line):
if cell:
elements.add((i, j))

return max(traverse_component(pos) for pos in set(elements) if pos in elements)





share|improve this answer























  • Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
    – Graham
    9 hours ago








  • 1




    @Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
    – Josay
    9 hours ago








  • 1




    I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
    – XYZT
    4 hours ago










  • @Josay Is there a particular reason you import itertools in the last code snippet?
    – XYZT
    2 hours ago



















1














Are you absolutely sure this is the bottleneck? Looking at the linked problem and the solution analysis, I'm not sure if this component is even needed to solve the given problem. It's possible your overall algorithm is inefficient in some way, but obviously I couldn't really tell unless I saw the whole program that this is part of.



@Josay's already given a good improvement, but in the grand scheme of things, this doesn't really shave off that much measurable time for larger grids. The original solution was a pretty good algorithm for solving the problem of largest connected subsections.



General comments



Having three lines here is unnecessary:



temp = traverse_component(i, j)
if temp > component_size:
component_size = temp


because of one nice Python built-in, max:



component_size = max(component_size, traverse_component(i,j))




component_size could be named more descriptively as largest_size.






share|improve this answer





















  • The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
    – XYZT
    4 hours ago










  • @XYZT How do you connect the four quadrants to measure the overall component size?
    – Graham
    3 hours ago










  • I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
    – XYZT
    2 hours ago











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






XYZT is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210641%2fsize-of-the-largest-connected-component-in-a-grid-in-python%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














In programming challenges, proper performances improvement usually come from a smarter algorithm. Unfortunately, I have no algorithm better than the one you have implemented.



I have found a single trick to shave off some time.



Remove all the logic around seen



In all places where you access elements from grid and seen, we have basically: if grid[pos] and not seen[pos].



An idea could be to update grid in place to remove seen elements from it. From an engineering point of view, it is not very nice: I would not expect an function computing the size of the biggest connected components to update the provided input. For a programming challenge, we can probably accept such a thing...



We'd get:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
grid[i][j] = False
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1]:
result += traverse_component(i, j+1)
return result

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size




Another idea in order to do the same type of things without changing grid could be to store "positive" elements in a set. This also remove the need to check for edge cases of the grid. The great thing is that we can populate that set with less array accesses. This is still pretty hackish:



import itertools

def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(pos):
"""Returns no. of unseen elements connected to (i,j)."""
elements.remove(pos)
i, j = pos
result = 1

# Check all four neighbours
for new_pos in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
if new_pos in elements:
result += traverse_component(new_pos)
return result

# Tracks size of largest connected component found
elements = set()

for i, line in enumerate(grid):
for j, cell in enumerate(line):
if cell:
elements.add((i, j))

return max(traverse_component(pos) for pos in set(elements) if pos in elements)





share|improve this answer























  • Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
    – Graham
    9 hours ago








  • 1




    @Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
    – Josay
    9 hours ago








  • 1




    I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
    – XYZT
    4 hours ago










  • @Josay Is there a particular reason you import itertools in the last code snippet?
    – XYZT
    2 hours ago
















3














In programming challenges, proper performances improvement usually come from a smarter algorithm. Unfortunately, I have no algorithm better than the one you have implemented.



I have found a single trick to shave off some time.



Remove all the logic around seen



In all places where you access elements from grid and seen, we have basically: if grid[pos] and not seen[pos].



An idea could be to update grid in place to remove seen elements from it. From an engineering point of view, it is not very nice: I would not expect an function computing the size of the biggest connected components to update the provided input. For a programming challenge, we can probably accept such a thing...



We'd get:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
grid[i][j] = False
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1]:
result += traverse_component(i, j+1)
return result

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size




Another idea in order to do the same type of things without changing grid could be to store "positive" elements in a set. This also remove the need to check for edge cases of the grid. The great thing is that we can populate that set with less array accesses. This is still pretty hackish:



import itertools

def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(pos):
"""Returns no. of unseen elements connected to (i,j)."""
elements.remove(pos)
i, j = pos
result = 1

# Check all four neighbours
for new_pos in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
if new_pos in elements:
result += traverse_component(new_pos)
return result

# Tracks size of largest connected component found
elements = set()

for i, line in enumerate(grid):
for j, cell in enumerate(line):
if cell:
elements.add((i, j))

return max(traverse_component(pos) for pos in set(elements) if pos in elements)





share|improve this answer























  • Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
    – Graham
    9 hours ago








  • 1




    @Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
    – Josay
    9 hours ago








  • 1




    I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
    – XYZT
    4 hours ago










  • @Josay Is there a particular reason you import itertools in the last code snippet?
    – XYZT
    2 hours ago














3












3








3






In programming challenges, proper performances improvement usually come from a smarter algorithm. Unfortunately, I have no algorithm better than the one you have implemented.



I have found a single trick to shave off some time.



Remove all the logic around seen



In all places where you access elements from grid and seen, we have basically: if grid[pos] and not seen[pos].



An idea could be to update grid in place to remove seen elements from it. From an engineering point of view, it is not very nice: I would not expect an function computing the size of the biggest connected components to update the provided input. For a programming challenge, we can probably accept such a thing...



We'd get:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
grid[i][j] = False
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1]:
result += traverse_component(i, j+1)
return result

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size




Another idea in order to do the same type of things without changing grid could be to store "positive" elements in a set. This also remove the need to check for edge cases of the grid. The great thing is that we can populate that set with less array accesses. This is still pretty hackish:



import itertools

def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(pos):
"""Returns no. of unseen elements connected to (i,j)."""
elements.remove(pos)
i, j = pos
result = 1

# Check all four neighbours
for new_pos in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
if new_pos in elements:
result += traverse_component(new_pos)
return result

# Tracks size of largest connected component found
elements = set()

for i, line in enumerate(grid):
for j, cell in enumerate(line):
if cell:
elements.add((i, j))

return max(traverse_component(pos) for pos in set(elements) if pos in elements)





share|improve this answer














In programming challenges, proper performances improvement usually come from a smarter algorithm. Unfortunately, I have no algorithm better than the one you have implemented.



I have found a single trick to shave off some time.



Remove all the logic around seen



In all places where you access elements from grid and seen, we have basically: if grid[pos] and not seen[pos].



An idea could be to update grid in place to remove seen elements from it. From an engineering point of view, it is not very nice: I would not expect an function computing the size of the biggest connected components to update the provided input. For a programming challenge, we can probably accept such a thing...



We'd get:



def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(i, j):
"""Returns no. of unseen elements connected to (i,j)."""
grid[i][j] = False
result = 1

# Check all four neighbours
if i > 0 and grid[i-1][j]:
result += traverse_component(i-1, j)
if j > 0 and grid[i][j-1]:
result += traverse_component(i, j-1)
if i < len(grid)-1 and grid[i+1][j]:
result += traverse_component(i+1, j)
if j < len(grid[0])-1 and grid[i][j+1]:
result += traverse_component(i, j+1)
return result

# Tracks size of largest connected component found
component_size = 0

for i in range(nrows):
for j in range(ncols):
if grid[i][j]:
temp = traverse_component(i, j)
if temp > component_size:
component_size = temp

return component_size




Another idea in order to do the same type of things without changing grid could be to store "positive" elements in a set. This also remove the need to check for edge cases of the grid. The great thing is that we can populate that set with less array accesses. This is still pretty hackish:



import itertools

def largest_connected_component(nrows, ncols, grid):
"""Find largest connected component of 1s on a grid."""

def traverse_component(pos):
"""Returns no. of unseen elements connected to (i,j)."""
elements.remove(pos)
i, j = pos
result = 1

# Check all four neighbours
for new_pos in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
if new_pos in elements:
result += traverse_component(new_pos)
return result

# Tracks size of largest connected component found
elements = set()

for i, line in enumerate(grid):
for j, cell in enumerate(line):
if cell:
elements.add((i, j))

return max(traverse_component(pos) for pos in set(elements) if pos in elements)






share|improve this answer














share|improve this answer



share|improve this answer








edited 9 hours ago

























answered 9 hours ago









Josay

25.5k13987




25.5k13987












  • Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
    – Graham
    9 hours ago








  • 1




    @Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
    – Josay
    9 hours ago








  • 1




    I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
    – XYZT
    4 hours ago










  • @Josay Is there a particular reason you import itertools in the last code snippet?
    – XYZT
    2 hours ago


















  • Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
    – Graham
    9 hours ago








  • 1




    @Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
    – Josay
    9 hours ago








  • 1




    I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
    – XYZT
    4 hours ago










  • @Josay Is there a particular reason you import itertools in the last code snippet?
    – XYZT
    2 hours ago
















Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
– Graham
9 hours ago






Clever! Regarding the thought about a smarter algorithm, I think there may be a smarter algorithm for the overall program, given that this is only one part of a larger program. Speaking of which, you should also note that the list will need to be copied when passed to the function if the list is going to be used later, because of the way lists references work in Python.
– Graham
9 hours ago






1




1




@Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
– Josay
9 hours ago






@Graham yes, that's the whole issue with updating the structure in place. I tried to add a call to copy.deepcopy but we (obviously) lose all the benefits we had from not defining a seen matrix. Also I've updated my answer to add a quick alternative but I didn't get a chance to perform full benchmarks before I left my computer. We'll see next year! Have a good evening!
– Josay
9 hours ago






1




1




I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
– XYZT
4 hours ago




I can assure you, given the rest of my code, that none of the mutable variables here are used again, so they can be sacrificed in place if needed!
– XYZT
4 hours ago












@Josay Is there a particular reason you import itertools in the last code snippet?
– XYZT
2 hours ago




@Josay Is there a particular reason you import itertools in the last code snippet?
– XYZT
2 hours ago













1














Are you absolutely sure this is the bottleneck? Looking at the linked problem and the solution analysis, I'm not sure if this component is even needed to solve the given problem. It's possible your overall algorithm is inefficient in some way, but obviously I couldn't really tell unless I saw the whole program that this is part of.



@Josay's already given a good improvement, but in the grand scheme of things, this doesn't really shave off that much measurable time for larger grids. The original solution was a pretty good algorithm for solving the problem of largest connected subsections.



General comments



Having three lines here is unnecessary:



temp = traverse_component(i, j)
if temp > component_size:
component_size = temp


because of one nice Python built-in, max:



component_size = max(component_size, traverse_component(i,j))




component_size could be named more descriptively as largest_size.






share|improve this answer





















  • The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
    – XYZT
    4 hours ago










  • @XYZT How do you connect the four quadrants to measure the overall component size?
    – Graham
    3 hours ago










  • I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
    – XYZT
    2 hours ago
















1














Are you absolutely sure this is the bottleneck? Looking at the linked problem and the solution analysis, I'm not sure if this component is even needed to solve the given problem. It's possible your overall algorithm is inefficient in some way, but obviously I couldn't really tell unless I saw the whole program that this is part of.



@Josay's already given a good improvement, but in the grand scheme of things, this doesn't really shave off that much measurable time for larger grids. The original solution was a pretty good algorithm for solving the problem of largest connected subsections.



General comments



Having three lines here is unnecessary:



temp = traverse_component(i, j)
if temp > component_size:
component_size = temp


because of one nice Python built-in, max:



component_size = max(component_size, traverse_component(i,j))




component_size could be named more descriptively as largest_size.






share|improve this answer





















  • The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
    – XYZT
    4 hours ago










  • @XYZT How do you connect the four quadrants to measure the overall component size?
    – Graham
    3 hours ago










  • I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
    – XYZT
    2 hours ago














1












1








1






Are you absolutely sure this is the bottleneck? Looking at the linked problem and the solution analysis, I'm not sure if this component is even needed to solve the given problem. It's possible your overall algorithm is inefficient in some way, but obviously I couldn't really tell unless I saw the whole program that this is part of.



@Josay's already given a good improvement, but in the grand scheme of things, this doesn't really shave off that much measurable time for larger grids. The original solution was a pretty good algorithm for solving the problem of largest connected subsections.



General comments



Having three lines here is unnecessary:



temp = traverse_component(i, j)
if temp > component_size:
component_size = temp


because of one nice Python built-in, max:



component_size = max(component_size, traverse_component(i,j))




component_size could be named more descriptively as largest_size.






share|improve this answer












Are you absolutely sure this is the bottleneck? Looking at the linked problem and the solution analysis, I'm not sure if this component is even needed to solve the given problem. It's possible your overall algorithm is inefficient in some way, but obviously I couldn't really tell unless I saw the whole program that this is part of.



@Josay's already given a good improvement, but in the grand scheme of things, this doesn't really shave off that much measurable time for larger grids. The original solution was a pretty good algorithm for solving the problem of largest connected subsections.



General comments



Having three lines here is unnecessary:



temp = traverse_component(i, j)
if temp > component_size:
component_size = temp


because of one nice Python built-in, max:



component_size = max(component_size, traverse_component(i,j))




component_size could be named more descriptively as largest_size.







share|improve this answer












share|improve this answer



share|improve this answer










answered 9 hours ago









Graham

861113




861113












  • The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
    – XYZT
    4 hours ago










  • @XYZT How do you connect the four quadrants to measure the overall component size?
    – Graham
    3 hours ago










  • I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
    – XYZT
    2 hours ago


















  • The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
    – XYZT
    4 hours ago










  • @XYZT How do you connect the four quadrants to measure the overall component size?
    – Graham
    3 hours ago










  • I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
    – XYZT
    2 hours ago
















The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
– XYZT
4 hours ago




The analysis for the problem says, "For each quadrant center and combination of colors, we want to get the largest connected component where each cell in this connected component has the same color as the color assigned to the quadrant it belongs to." Is this not what I am trying to do? Or did I misunderstand?
– XYZT
4 hours ago












@XYZT How do you connect the four quadrants to measure the overall component size?
– Graham
3 hours ago




@XYZT How do you connect the four quadrants to measure the overall component size?
– Graham
3 hours ago












I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
– XYZT
2 hours ago




I hope it's appropriate to link my whole code here: gist.github.com/theXYZT/f13ac490e842552a2f470afac1001b7b
– XYZT
2 hours ago










XYZT is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















XYZT is a new contributor. Be nice, and check out our Code of Conduct.













XYZT is a new contributor. Be nice, and check out our Code of Conduct.












XYZT is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210641%2fsize-of-the-largest-connected-component-in-a-grid-in-python%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Understanding the information contained in the Deep Space Network XML data?

Ross-on-Wye

Eastern Orthodox Church