Topic: https://russell.ballestrini.net/exploring-battleship-board-generation/

russell 1y, 91d ago [edited]

Nobody seems to have a definitive answer for how many unique non-overlapping Battleship configurations exist.

There were 10 dupes per 1M boards sampled, measured with sort -u | wc -l.

Scaled to 30B, the total set could statistically contain 300,000 dupes or more.

With overlaps allowed, 74B is my best guess; without overlaps, about 30B. If this is wildly off, let me know.

When I combined two 1M samples (two million lines total), there were 49 dupes:

cat battleship_boards.json battleship_boards.json.orig | sort -u | wc -l
1999951

>>> 2000000 - 1999951
49
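A quick birthday-problem calculation ties these duplicate counts back to the size of the total set. This is only a rough sketch, and it assumes the generator samples boards uniformly at random:

```python
# Birthday-problem estimate: among n uniform samples drawn from N distinct
# boards, the expected number of duplicate pairs is roughly n^2 / (2N).
# Inverting, an observed duplicate count k suggests N ≈ n^2 / (2k).
def estimate_total_boards(num_samples, num_dupes):
    return num_samples**2 // (2 * num_dupes)

print(estimate_total_boards(1_000_000, 10))  # ~50 billion
print(estimate_total_boards(2_000_000, 49))  # ~41 billion
```

Both figures land in the same ballpark as the 30B-74B guesses above.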

russell 1y, 91d ago [edited]

Benchmark: Ad Hoc vs Precomputed Boards for Battleship Solvers

Ad Hoc Computation:

The place_ships function generates boards randomly. Its time complexity is O(n), where n is the number of grid cells (100 in this case). However, because ship placement is random, some placements must be retried when they collide, which can increase the effective running time.

For each scale:

100 boards: 100 * O(n) = O(100n)
1000 boards: 1000 * O(n) = O(1000n)
10000 boards: 10000 * O(n) = O(10000n)
100000 boards: 100000 * O(n) = O(100000n)
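The place_ships implementation isn't shown in this comment; a minimal sketch of the rejection-sampling idea it describes, assuming the standard 10x10 grid and fleet lengths 5, 4, 3, 3, 2, might look like:

```python
import random

GRID = 10
SHIPS = [5, 4, 3, 3, 2]  # carrier, battleship, cruiser, submarine, destroyer

def place_ships(grid=GRID, ships=SHIPS):
    # Rejection sampling: pick a random position/orientation for each ship
    # and retry until it fits without overlapping previously placed ships.
    board = [[0] * grid for _ in range(grid)]
    for length in ships:
        while True:
            horizontal = random.choice([True, False])
            if horizontal:
                row = random.randrange(grid)
                col = random.randrange(grid - length + 1)
                cells = [(row, col + i) for i in range(length)]
            else:
                row = random.randrange(grid - length + 1)
                col = random.randrange(grid)
                cells = [(row + i, col) for i in range(length)]
            if all(board[r][c] == 0 for r, c in cells):
                for r, c in cells:
                    board[r][c] = length
                break
    return board
```

The retry loop is why the effective cost per board can exceed a single O(n) pass: late ships occasionally collide with earlier ones and must be redrawn.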

Given the O(n) complexity of Battleship puzzles, we conducted a benchmark to compare ad hoc board generation with loading precomputed data. Here's the code and results:

import time
import random
import itertools

#import json
import orjson as json

from battleship_boards import place_ships


def load_random_subset(file_path, num_records):
    with open(file_path, "r") as f:
        total_lines = sum(1 for _ in f)
        f.seek(0)

        # Skip a random prefix so the subset starts at a random offset.
        start_line = random.randint(0, total_lines - num_records)
        for _ in itertools.islice(f, start_line):
            pass

        boards = []
        for line in f:
            try:
                boards.append(json.loads(line.strip()))
                if len(boards) == num_records:
                    break
            except json.JSONDecodeError:
                continue

        return boards


def generate_boards_ad_hoc(num_boards):
    return [place_ships() for _ in range(num_boards)]


def benchmark(scale):
    print(f"\nBenchmarking for {scale} boards:")

    start_time = time.time()
    ad_hoc_boards = generate_boards_ad_hoc(scale)
    ad_hoc_time = time.time() - start_time

    start_time = time.time()
    random_subset_boards = load_random_subset("battleship_boards_merge.json", scale)
    random_subset_time = time.time() - start_time

    print(f"\nAd hoc generation time: {ad_hoc_time:.2f} seconds")
    print(f"Random subset loading time: {random_subset_time:.2f} seconds")
    print(f"Efficiency gain: {ad_hoc_time / random_subset_time:.2f}x\n")


# Run the benchmark
for scale in [100, 1000, 10000, 100000, 1000000]:
    benchmark(scale)

Results:

(base) [fox@blanka battleship-solvers]$ python battleship_boards_benchmark.py

Benchmarking for 100 boards:

Ad hoc generation time: 0.00 seconds
Random subset loading time: 0.92 seconds
Efficiency gain: 0.00x


Benchmarking for 1000 boards:

Ad hoc generation time: 0.02 seconds
Random subset loading time: 0.92 seconds
Efficiency gain: 0.02x


Benchmarking for 10000 boards:

Ad hoc generation time: 0.21 seconds
Random subset loading time: 0.99 seconds
Efficiency gain: 0.21x


Benchmarking for 100000 boards:

Ad hoc generation time: 2.11 seconds
Random subset loading time: 1.66 seconds
Efficiency gain: 1.27x


Benchmarking for 1000000 boards:

Ad hoc generation time: 22.23 seconds
Random subset loading time: 1.95 seconds
Efficiency gain: 11.40x

Key findings:

  • Ad hoc generation is faster for small to medium scales (up to 10,000 boards).
  • Precomputed data pulls ahead at larger scales (100,000+ boards), reaching an ~11x advantage at 1 million.

These results align with our initial Big O complexity analysis, confirming the efficiency of ad hoc generation for typical puzzle sizes.

Interestingly, 1 million boards represent only about 1/30,000th of the approximate total number of possible Battleship configurations.


russell 1y, 89d ago

Still not sure, but here are some approximations using math/Python.

🚢 Exploring Battleship configurations revealed surprising results! Initially, ChatGPT guessed 49 trillion configurations for the no-touch rule. However, our exploration showed a stark contrast.

First, we attempted a recursive simulation to calculate configurations, but it timed out due to the complexity of enforcing the no-touch rule, which requires checking every possible placement and adjusting the grid dynamically. This approach was inefficient and impractical.

Next, we shifted to a combinatorial approach, calculating maximum placements for each ship and applying a buffer to account for the no-touch rule. This method was more efficient, avoiding the recursive complexity by estimating blocked space without checking every configuration.

The results were eye-opening: 77.4 billion configurations when ships can touch, but only 5.3 million when they cannot. The no-touch rule drastically reduces possibilities by blocking more grid space, highlighting how small rule changes can have huge impacts.

Our findings show that the initial guess was off by orders of magnitude for the no-touch rule, emphasizing the importance of methodical analysis in complex systems.

https://chatgpt.com/share/66e70579-6460-8000-9d09-a668caf98ddc

GM!

Does anyone know how many Battleship combinations there are?

https://russell.ballestrini.net/exploring-battleship-board-generation/

Does only God know the number? 🦊🐺


russell 1y, 88d ago

If 77.4B is the accurate number, we should more than double all our estimates for the full dataset.


russell 1y, 53d ago

Why should I care?

You should care because this webpage provides a detailed explanation and a step-by-step guide on how to efficiently generate and manage large datasets for the popular game "Battleship" using Python. The techniques and strategies discussed can be useful for various applications, such as:

  1. Exploring the vast solution space of complex problems: The multiprocessing approach used in this experiment can be adapted to solve other complex computational problems, allowing for efficient exploration of solution spaces that would otherwise be impractical to tackle.

  2. Data analysis and machine learning: The generated dataset can be used for various data analysis tasks or as training data for machine learning models, potentially leading to new insights or improvements in performance.

  3. Problem-solving and optimization: By understanding the process and techniques used in this experiment, you can apply similar strategies to optimize your own computational tasks and improve overall performance.

  4. AI-assisted problem-solving: The collaboration between the author and an AI assistant highlights the potential of AI in guiding and enhancing problem-solving processes, which can be beneficial for anyone looking to leverage AI in their own projects.

Overall, understanding the content of this webpage can provide valuable insights and techniques that can be applied to various fields, making it worth your time to explore and learn from it.
