sc2_datasets.utils.json_utils
Functions
- get_json_offsets – Retrieves or creates the list of byte offsets for each JSON object in a large JSON file.
- index_json_objects – Indexes the starting byte offset of each JSON object.
- get_object_at_index – Retrieves the complete JSON object at the specified index by seeking to its pre-calculated line offset.
- merge_json_files – Merges all JSON files in the input directory into a single JSON file.
- json_to_line – Loads a JSON replay file, adds additional information, and writes it as a single line.
- dataset_to_single_json – Iterates over an input SC2Dataset and writes all replays into a single JSON file.
Module Contents
- get_json_offsets(json_filepath: pathlib.Path, offsets_filepath: pathlib.Path | None) -> list[int]
Retrieves or creates the list of byte offsets for each JSON object in a large JSON file.
- Parameters:
json_filepath (Path) – Specifies the path to the JSON file.
offsets_filepath (Path | None) – Specifies the path to the offsets file. If None, offsets are not loaded/saved to disk.
- Returns:
Returns the list of byte offsets for each JSON object.
- Return type:
list[int]
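A minimal sketch of the load-or-compute pattern described above, not the library's implementation. It assumes the offsets file stores the index as a JSON list of integers (the actual on-disk format is not documented here) and that the data file follows the one-object-per-line layout described for index_json_objects. The name sketch_get_json_offsets is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def sketch_get_json_offsets(json_filepath: Path, offsets_filepath: Optional[Path]) -> list[int]:
    # 1. Reuse a previously saved index if one exists.
    #    Assumption: the cache is a JSON list of ints; the real format may differ.
    if offsets_filepath is not None and offsets_filepath.exists():
        return json.loads(offsets_filepath.read_text())

    # 2. Otherwise scan the data file once, recording where each object line starts.
    offsets: list[int] = []
    position = 0
    with json_filepath.open("rb") as f:  # binary mode, so positions are byte offsets
        for line in f:
            if line.strip() not in (b"[", b"]", b""):
                offsets.append(position)
            position += len(line)

    # 3. Persist the index so later calls can skip the scan.
    if offsets_filepath is not None:
        offsets_filepath.write_text(json.dumps(offsets))
    return offsets
```

Caching matters because the scan is linear in file size, while loading a saved list of integers is cheap even for very large replay files.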
- index_json_objects(json_filepath: pathlib.Path) -> list[int]
Indexes the starting byte offset of each JSON object.
Assumes the file is a single pretty-printed JSON array with the following layout:
  - Line 1: [
  - Lines 2 to N-1: one object per line, each followed by a comma (e.g., {…},)
  - Line N: the last object, with no trailing comma
  - Last line: ]
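- Parameters:
json_filepath (Path) – Specifies the path to the JSON file.
- Returns:
Returns the list of starting byte offsets, one per JSON object.
- Return type:
list[int]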
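The layout above turns indexing into a single linear scan. The sketch below illustrates that idea under the stated layout assumption; sketch_index_json_objects is a hypothetical name, not the library's code. It reads in binary mode so that positions are byte offsets rather than character counts, and it records where every line other than the [ and ] lines begins.

```python
from pathlib import Path


def sketch_index_json_objects(json_filepath: Path) -> list[int]:
    """Return the starting byte offset of each object line in a
    pretty-printed JSON array with one object per line."""
    offsets: list[int] = []
    position = 0  # byte offset of the line about to be read
    with json_filepath.open("rb") as f:  # binary mode: no newline translation
        for line in f:
            stripped = line.strip()
            # Skip the opening "[" and closing "]" lines (and any blank lines).
            if stripped and stripped not in (b"[", b"]"):
                offsets.append(position)
            position += len(line)  # len() of a bytes line is its size in bytes
    return offsets
```

Because the file is opened in binary mode, files with Windows line endings (\r\n) are indexed correctly as well: the extra \r byte is counted in len(line).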
- get_object_at_index(file_handle: io.BufferedReader, offsets: list[int], index: int) -> dict
Retrieves the complete JSON object at the specified index by seeking to the pre-calculated line offset, reading one line, and stripping the trailing comma. Assumes each JSON object is on its own line.
- Parameters:
file_handle (io.BufferedReader) – Open binary-mode handle to the JSON file being read.
offsets (list[int]) – List of byte offsets for each JSON object.
index (int) – Index of the JSON object to retrieve.
- Returns:
The JSON object parsed into a Python dictionary.
- Return type:
dict
- Raises:
Exception – If reading the line at the specified offset fails.
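Once the offsets are known, random access is a seek plus a single readline, regardless of how large the file is. This is a minimal sketch assuming UTF-8 content and the one-object-per-line layout. sketch_get_object_at_index is a hypothetical name, and it raises RuntimeError (a subclass of Exception) where the documented function raises Exception.

```python
import io
import json


def sketch_get_object_at_index(file_handle: io.BufferedReader, offsets: list[int], index: int) -> dict:
    """Random-access read of one object from a one-object-per-line JSON array."""
    try:
        file_handle.seek(offsets[index])  # jump straight to the object's first byte
        line = file_handle.readline().decode("utf-8").strip()
        # Every object except the last one ends with a comma; drop it before parsing.
        if line.endswith(","):
            line = line[:-1]
        return json.loads(line)
    except (OSError, ValueError) as exc:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        raise RuntimeError(f"Failed to read JSON object at index {index}") from exc
```

The same open handle can be reused for many lookups, since each call seeks to an absolute position first.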
- merge_json_files(input_dir: pathlib.Path, output_filepath: pathlib.Path) -> pathlib.Path
Merges all JSON files in the input directory into a single JSON file. Adds additional information fields to each JSON object.
- Parameters:
input_dir (Path) – Input directory containing JSON files to be merged.
output_filepath (Path) – Output filepath where the merged JSON file will be written.
- Returns:
Returns the output filepath where the merged JSON file was written.
- Return type:
Path
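One way to produce a merged file in the layout index_json_objects expects is sketched below. It is an illustration, not the library's implementation: it merges *.json files in sorted name order and adds a single hypothetical filename field, because the exact fields the library adds are not listed here. Keep output_filepath outside input_dir so the merged file is not picked up on a later run.

```python
import json
from pathlib import Path


def sketch_merge_json_files(input_dir: Path, output_filepath: Path) -> Path:
    json_paths = sorted(input_dir.glob("*.json"))  # sorted for a deterministic order
    with output_filepath.open("w", encoding="utf-8") as out:
        out.write("[\n")
        for i, path in enumerate(json_paths):
            obj = json.loads(path.read_text(encoding="utf-8"))
            # Hypothetical extra field; the library's actual field names may differ.
            obj["filename"] = path.name
            if i > 0:
                out.write(",\n")  # ends the previous object's line with a comma
            out.write(json.dumps(obj))  # no indent, so the object stays on one line
        out.write("\n]\n")
    return output_filepath
```

Writing the separator before each entry rather than after it means the last object never receives a trailing comma, so the result is valid JSON and matches the layout line for line.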
- json_to_line(json_replay_path: pathlib.Path, output_file: io.BufferedReader, first_entry: bool, replaypack_name: str | None = None, replaypack_url: str | None = None, filename: str | None = None, old_file_path: pathlib.Path | None = None, sort_keys: bool = False) -> None
Loads a JSON replay file, adds additional information, and writes it as a single line. By default, all additional information fields are set to None, for compatibility with older replays and with other datasets that may not provide this information.
- Parameters:
json_replay_path (Path) – Specifies a path to the JSON file.
output_file (BufferedReader) – Specifies the output file handle where the single-line JSON will be written. Although the signature annotates it as BufferedReader, the handle must be writable.
first_entry (bool) – Whether this is the first entry in the output file.
replaypack_name (str | None, optional) – Replaypack name to be placed in the additional information, by default None
replaypack_url (str | None, optional) – Replaypack URL to be placed in the additional information, by default None
filename (str | None, optional) – Original filename from which the JSON was produced, by default None
old_file_path (Path | None, optional) – Path to the file before it was processed, by default None
sort_keys (bool, optional) – Whether to sort the JSON keys, by default False
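A sketch of the per-entry write, assuming a text-mode output handle and a hypothetical additional_info key that holds the extra fields (the real key names are not documented here). It also assumes first_entry decides whether to write ",\n" first, which ends the previous object's line with the comma the array layout requires. A caller would still write the opening [ and closing ] lines.

```python
import json
from pathlib import Path
from typing import IO, Optional


def sketch_json_to_line(
    json_replay_path: Path,
    output_file: IO[str],
    first_entry: bool,
    replaypack_name: Optional[str] = None,
    replaypack_url: Optional[str] = None,
    filename: Optional[str] = None,
    old_file_path: Optional[Path] = None,
    sort_keys: bool = False,
) -> None:
    replay = json.loads(json_replay_path.read_text(encoding="utf-8"))

    # Hypothetical key name; every field defaults to None for older replays.
    replay["additional_info"] = {
        "replaypack_name": replaypack_name,
        "replaypack_url": replaypack_url,
        "filename": filename,
        # Path objects are not JSON-serializable, so store the path as a string.
        "old_file_path": str(old_file_path) if old_file_path is not None else None,
    }

    # Assumption: entries after the first terminate the previous line with a comma.
    if not first_entry:
        output_file.write(",\n")
    # Without `indent`, json.dumps emits no raw newlines, so the object stays on one line.
    output_file.write(json.dumps(replay, sort_keys=sort_keys))
```

Pretty-printed input is fine: it is parsed and then re-serialized compactly, so the one-object-per-line property holds regardless of the source formatting.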
- dataset_to_single_json(dataset: sc2_datasets.torch.datasets.sc2_dataset.SC2Dataset, output_filepath: pathlib.Path, sort_keys: bool = False) -> pathlib.Path
Iterates over an input SC2Dataset and writes all replays into a single JSON file.
- Parameters:
dataset (SC2Dataset) – Specifies the input dataset that will be processed.
output_filepath (Path) – Specifies the output filepath where the single JSON file will be written.
sort_keys (bool, optional) – Whether to sort the keys in the output JSON file, by default False.
- Returns:
Returns the output filepath where the single JSON file was written.
- Return type:
Path