I don't think there are many significant optimizations with regards to reducing the search tree. It took me long enough to get behind it, but the "solution" (not saying there aren't other ways) to part 2 is to not calculate anything more than once. Instead put partial solutions in a dict indexed by the current state and use that cached value if you need it again.
It seems like you are actually constructing all rows with replaced ?
. This won't be viable for part 2, your memory usage will explode. I have a recursive function that calls itself twice whenever a ?
is encountered, once assuming it's a .
, and once a #
.