sc2_datasets.utils.json_utils

Functions

get_json_offsets(→ list[int])

Retrieves or creates the list of byte offsets for each JSON object in a large JSON file.

index_json_objects(→ list[int])

Indexes the starting byte offset of each JSON object.

get_object_at_index(→ dict)

Retrieves the complete JSON object at the specified index by seeking to its pre-calculated byte offset.

merge_json_files(→ pathlib.Path)

Merges all JSON files in the input directory into a single JSON file.

json_to_line(→ None)

Loads a JSON replay file, adds additional information, and writes it as a single line.

dataset_to_single_json(→ pathlib.Path)

Iterates over an input SC2Dataset and writes all replays into a single JSON file.

Module Contents

get_json_offsets(json_filepath: pathlib.Path, offsets_filepath: pathlib.Path | None) → list[int]

Retrieves or creates the list of byte offsets for each JSON object in a large JSON file.

Parameters:
  • json_filepath (Path) – Specifies the path to the JSON file.

  • offsets_filepath (Path | None) – Specifies the path to the offsets file. If None, offsets are not loaded/saved to disk.

Returns:

Returns the list of byte offsets for each JSON object.

Return type:

list[int]
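A minimal sketch of the load-or-create behavior, assuming the offsets file is stored as a plain JSON list of integers (the actual on-disk format is not documented here). The names `get_json_offsets_sketch` and `_index_offsets` are hypothetical, and the indexer assumes one object per line, as described for index_json_objects.

```python
from __future__ import annotations

import json
from pathlib import Path


def _index_offsets(json_filepath: Path) -> list[int]:
    # Hypothetical helper: byte offset of every line that opens a JSON object.
    offsets: list[int] = []
    position = 0
    with json_filepath.open("rb") as f:
        for line in f:
            if line.lstrip().startswith(b"{"):
                offsets.append(position)
            position += len(line)  # binary mode, so len() is an exact byte count
    return offsets


def get_json_offsets_sketch(json_filepath: Path, offsets_filepath: Path | None) -> list[int]:
    # No cache path: index in memory and never touch the disk.
    if offsets_filepath is None:
        return _index_offsets(json_filepath)
    # Cache hit: reuse offsets saved by an earlier run (assumed JSON list format).
    if offsets_filepath.exists():
        return json.loads(offsets_filepath.read_text(encoding="utf-8"))
    # Cache miss: index once, then persist the result for the next call.
    offsets = _index_offsets(json_filepath)
    offsets_filepath.write_text(json.dumps(offsets), encoding="utf-8")
    return offsets
```

One consequence of this sketch's design: a cached offsets file is trusted as-is, so it goes stale if the JSON file is rewritten without deleting the cache.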

index_json_objects(json_filepath: pathlib.Path) → list[int]

Indexes the starting byte offset of each JSON object.

ASSUMES the file is a single pretty-printed array with one object per line:
  • Line 1: [
  • Lines 2 to N-1: one object followed by a comma (e.g., {…},)
  • Line N: the last object (no comma)
  • Last line: ]

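Parameters:
  • json_filepath (Path) – Specifies the path to the JSON file.

Returns:

Returns the list of starting byte offsets, one per JSON object.

Return type:

list[int]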
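Under the line-per-object layout above, indexing reduces to a single pass over the file in binary mode, recording the running byte count at every line that opens an object. A minimal sketch; `index_json_objects_sketch` is a hypothetical name, and detecting object lines by a leading `{` is an assumption about how the real function decides which lines to record.

```python
from pathlib import Path


def index_json_objects_sketch(json_filepath: Path) -> list[int]:
    offsets: list[int] = []
    position = 0  # byte offset of the start of the current line
    # Binary mode so that len(line) counts bytes, not decoded characters.
    with json_filepath.open("rb") as f:
        for line in f:
            # Skip the "[" and "]" lines; record every line that starts an object.
            if line.lstrip().startswith(b"{"):
                offsets.append(position)
            position += len(line)
    return offsets
```

For a file containing `[`, `{"a": 1},`, `{"b": 2}`, `]` on four lines, the first object starts after the 2-byte `[\n` line and the second after a further 10 bytes.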

get_object_at_index(file_handle: io.BufferedReader, offsets: list[int], index: int) → dict

Retrieves the complete JSON object at the specified index by seeking to the pre-calculated line offset, reading one line, and stripping the trailing comma. Assumes each JSON object is on its own line.

Parameters:
  • file_handle (BufferedReader) – Open binary-mode handle to the JSON file the offsets were computed for.

  • offsets (list[int]) – List of byte offsets for each JSON object.

  • index (int) – Index of the JSON object to retrieve.

Returns:

The JSON object parsed into a Python dictionary.

Return type:

dict

Raises:

Exception – If reading the line at the specified offset fails.
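Reading one object back is then a seek plus a single readline. The sketch below uses the hypothetical name `get_object_at_index_sketch`, assumes UTF-8 content, and signals a failed read with `OSError`, which is a subclass of the documented `Exception`.

```python
import io
import json


def get_object_at_index_sketch(file_handle: io.BufferedReader, offsets: list[int], index: int) -> dict:
    # Jump straight to the first byte of the requested object's line.
    file_handle.seek(offsets[index])
    line = file_handle.readline()
    if not line:
        # Seeking past the end of the file yields an empty read.
        raise OSError(f"Could not read a line at offset {offsets[index]}")
    # Drop surrounding whitespace (including the newline), then the array comma.
    text = line.decode("utf-8").strip().rstrip(",")
    return json.loads(text)
```

Because only one line is read per lookup, memory use stays proportional to a single object rather than to the whole file.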

merge_json_files(input_dir: pathlib.Path, output_filepath: pathlib.Path) → pathlib.Path

Merges all JSON files in the input directory into a single JSON file. Adds additional information fields to each JSON object.

Parameters:
  • input_dir (Path) – Input directory containing JSON files to be merged.

  • output_filepath (Path) – Output filepath where the merged JSON file will be written.

Returns:

Returns the output filepath where the merged JSON file was written.

Return type:

Path
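A minimal sketch of a merge that produces the one-object-per-line array layout assumed by index_json_objects. The function name and the `additional_info` key are hypothetical; the real function's extra fields are not specified here. Files are processed in sorted order so that the output is deterministic.

```python
import json
from pathlib import Path


def merge_json_files_sketch(input_dir: Path, output_filepath: Path) -> Path:
    with output_filepath.open("w", encoding="utf-8") as out:
        out.write("[\n")
        for i, path in enumerate(sorted(input_dir.glob("*.json"))):
            obj = json.loads(path.read_text(encoding="utf-8"))
            # Hypothetical extra field; the real field names may differ.
            obj["additional_info"] = {"filename": path.name}
            # Separator goes before every object but the first, so only the
            # last object ends up without a trailing comma.
            if i > 0:
                out.write(",\n")
            out.write(json.dumps(obj))  # no indent, so one object per line
        out.write("\n]\n")
    return output_filepath
```

If the output file lives inside `input_dir` with a `.json` suffix, this sketch would pick it up as an input, so it is safest to write the merged file elsewhere.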

json_to_line(json_replay_path: pathlib.Path, output_file: io.BufferedReader, first_entry: bool, replaypack_name: str | None = None, replaypack_url: str | None = None, filename: str | None = None, old_file_path: pathlib.Path | None = None, sort_keys: bool = False) → None

Loads a JSON replay file, adds additional information, and writes it as a single line. By default the additional information is set to None for compatibility with older replays and with other datasets that may not provide this information.

Parameters:
  • json_replay_path (Path) – Specifies a path to the JSON file.

  • output_file (BufferedReader) – Specifies the output file handle where the single-line JSON will be written.

  • first_entry (bool) – Whether this is the first entry in the output file.

  • replaypack_name (str | None, optional) – Replaypack name to be placed in additional information, by default None

  • replaypack_url (str | None, optional) – Replaypack URL to be placed in additional information, by default None

  • filename (str | None, optional) – Filename from which the JSON came originally, by default None

  • old_file_path (Path | None, optional) – Path to the file before it was processed, by default None

  • sort_keys (bool, optional) – Whether to sort the JSON keys, by default False
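A sketch of how `first_entry` and the optional metadata could fit together, assuming entries are separated by writing `,\n` before every entry except the first. The name `json_to_line_sketch` and the `additional_info` key with its sub-keys are hypothetical; `typing.BinaryIO` is used for the handle type because the handle is written to.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import BinaryIO


def json_to_line_sketch(
    json_replay_path: Path,
    output_file: BinaryIO,
    first_entry: bool,
    replaypack_name: str | None = None,
    replaypack_url: str | None = None,
    filename: str | None = None,
    old_file_path: Path | None = None,
    sort_keys: bool = False,
) -> None:
    replay = json.loads(json_replay_path.read_text(encoding="utf-8"))
    # Hypothetical key layout; missing metadata stays None (serialized as null).
    replay["additional_info"] = {
        "replaypack_name": replaypack_name,
        "replaypack_url": replaypack_url,
        "filename": filename,
        "old_file_path": str(old_file_path) if old_file_path is not None else None,
    }
    # Every entry after the first is preceded by ",\n", which leaves the
    # previous line ending in a comma.
    if not first_entry:
        output_file.write(b",\n")
    # json.dumps without indent keeps the whole object on one line.
    output_file.write(json.dumps(replay, sort_keys=sort_keys).encode("utf-8"))
```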

dataset_to_single_json(dataset: sc2_datasets.torch.datasets.sc2_dataset.SC2Dataset, output_filepath: pathlib.Path, sort_keys: bool = False) → pathlib.Path

Iterates over an input SC2Dataset and writes all replays into a single JSON file.

Parameters:
  • dataset (SC2Dataset) – Specifies the input dataset that will be processed.

  • output_filepath (Path) – Specifies the output filepath where the single JSON file will be written.

  • sort_keys (bool, optional) – Whether to sort the keys in the output JSON file, by default False.

Returns:

Returns the output filepath where the single JSON file was written.

Return type:

Path
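A minimal sketch of the single-file export, assuming (hypothetically) that iterating the dataset yields already-parsed replay dictionaries; what SC2Dataset actually yields is not documented in this section, so a plain `Iterable[dict]` stands in for it. The output follows the same layout that index_json_objects and get_object_at_index assume, so the written file can be indexed and read back by position. The name `dataset_to_single_json_sketch` is hypothetical.

```python
from __future__ import annotations

import json
from collections.abc import Iterable
from pathlib import Path


def dataset_to_single_json_sketch(
    dataset: Iterable[dict], output_filepath: Path, sort_keys: bool = False
) -> Path:
    with output_filepath.open("w", encoding="utf-8") as out:
        out.write("[\n")
        for i, replay in enumerate(dataset):
            # Comma-newline before every replay except the first keeps one
            # object per line and no trailing comma after the last one.
            if i > 0:
                out.write(",\n")
            out.write(json.dumps(replay, sort_keys=sort_keys))
        out.write("\n]\n")
    return output_filepath
```

Streaming replays one at a time keeps memory bounded by a single replay, which matters when the dataset is too large to hold as one Python list.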