sc2_datasets.utils.json_utils
=============================

.. py:module:: sc2_datasets.utils.json_utils


Functions
---------

.. autoapisummary::

   sc2_datasets.utils.json_utils.get_json_offsets
   sc2_datasets.utils.json_utils.index_json_objects
   sc2_datasets.utils.json_utils.get_object_at_index
   sc2_datasets.utils.json_utils.merge_json_files
   sc2_datasets.utils.json_utils.json_to_line
   sc2_datasets.utils.json_utils.dataset_to_single_json


Module Contents
---------------

.. py:function:: get_json_offsets(json_filepath: pathlib.Path, offsets_filepath: pathlib.Path | None) -> list[int]

   Retrieves or creates the list of byte offsets for each JSON object in a large JSON file.

   :param json_filepath: Specifies the path to the JSON file.
   :type json_filepath: Path
   :param offsets_filepath: Specifies the path to the offsets file. If None, offsets are neither loaded from nor saved to disk.
   :type offsets_filepath: Path | None

   :returns: Returns the list of byte offsets for each JSON object.
   :rtype: list[int]


.. py:function:: index_json_objects(json_filepath: pathlib.Path) -> list[int]

   Indexes the starting byte offset of each JSON object.

   ASSUMES the file is a single JSON array with exactly one object per line:

   - Line 1: ``[``
   - Lines 2 to N-1: an object followed by a comma (e.g., ``{...},``)
   - Line N: the last object (no trailing comma)
   - Last line: ``]``

   :param json_filepath: Specifies the path to the JSON file.
   :type json_filepath: Path

   :returns: Returns the list of starting byte offsets, one per JSON object.
   :rtype: list[int]


.. py:function:: get_object_at_index(file_handle: io.BufferedReader, offsets: list[int], index: int) -> dict

   Retrieves the complete JSON object at the specified index by seeking to its pre-calculated line offset, reading one line, and stripping the trailing comma. Assumes each JSON object is on its own line.

   :param file_handle: Open binary file handle for the JSON file from which ``offsets`` were computed.
   :type file_handle: io.BufferedReader
   :param offsets: List of byte offsets for each JSON object.
   :type offsets: list[int]
   :param index: Index of the JSON object to retrieve.
   :type index: int

   :returns: The JSON object parsed into a Python dictionary.
   :rtype: dict

   :raises Exception: If reading the line at the specified offset fails.


.. py:function:: merge_json_files(input_dir: pathlib.Path, output_filepath: pathlib.Path) -> pathlib.Path

   Merges all JSON files in the input directory into a single JSON file, adding additional information fields to each JSON object.

   :param input_dir: Input directory containing the JSON files to be merged.
   :type input_dir: Path
   :param output_filepath: Output filepath where the merged JSON file will be written.
   :type output_filepath: Path

   :returns: Returns the output filepath where the merged JSON file was written.
   :rtype: Path


.. py:function:: json_to_line(json_replay_path: pathlib.Path, output_file: io.BufferedReader, first_entry: bool, replaypack_name: str | None = None, replaypack_url: str | None = None, filename: str | None = None, old_file_path: pathlib.Path | None = None, sort_keys: bool = False) -> None

   Loads a JSON replay file, adds additional information, and writes it to the output file as a single line.
   By default, the additional information is set to None for compatibility with older replays and with other datasets that may not provide this information.

   :param json_replay_path: Specifies the path to the JSON file.
   :type json_replay_path: Path
   :param output_file: Specifies the output file handle to which the single-line JSON will be written.
   :type output_file: BufferedReader
   :param first_entry: Whether this is the first entry in the output file.
   :type first_entry: bool
   :param replaypack_name: Replaypack name to be placed in the additional information, by default None
   :type replaypack_name: str | None, optional
   :param replaypack_url: Replaypack URL to be placed in the additional information, by default None
   :type replaypack_url: str | None, optional
   :param filename: Filename of the file from which the JSON originally came, by default None
   :type filename: str | None, optional
   :param old_file_path: Path to the file before it was processed, by default None
   :type old_file_path: Path | None, optional
   :param sort_keys: Whether to sort the JSON keys, by default False
   :type sort_keys: bool, optional


.. py:function:: dataset_to_single_json(dataset: sc2_datasets.torch.datasets.sc2_dataset.SC2Dataset, output_filepath: pathlib.Path, sort_keys: bool = False) -> pathlib.Path

   Iterates over an input SC2Dataset and writes all replays into a single JSON file.

   :param dataset: Specifies the input dataset that will be processed.
   :type dataset: SC2Dataset
   :param output_filepath: Specifies the output filepath where the single JSON file will be written.
   :type output_filepath: Path
   :param sort_keys: Whether to sort the keys in the output JSON file, by default False.
   :type sort_keys: bool, optional

   :returns: Returns the output filepath where the single JSON file was written.
   :rtype: Path
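
The byte-offset scheme documented for ``index_json_objects`` and ``get_object_at_index`` can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: ``index_offsets`` and ``read_object`` are hypothetical helpers, and the sketch assumes the one-object-per-line array layout documented for ``index_json_objects``. It records ``tell()`` before reading each line, skips the ``[`` and ``]`` lines, and later seeks back to a stored offset to parse a single object without loading the whole file.

.. code-block:: python

   import io
   import json
   import tempfile
   from pathlib import Path


   def index_offsets(json_filepath: Path) -> list[int]:
       """Hypothetical helper: byte offset of every object line in a one-object-per-line array."""
       offsets: list[int] = []
       with json_filepath.open("rb") as f:
           while True:
               position = f.tell()  # offset of the line about to be read
               line = f.readline()
               if not line:  # end of file
                   break
               # Skip the opening "[" and closing "]" lines (and any blank lines).
               if line.strip() in (b"[", b"]", b""):
                   continue
               offsets.append(position)
       return offsets


   def read_object(file_handle: io.BufferedReader, offsets: list[int], index: int) -> dict:
       """Hypothetical helper: seek to a stored offset, read one line, and parse it."""
       file_handle.seek(offsets[index])
       line = file_handle.readline().strip()
       if line.endswith(b","):  # every object except the last carries a trailing comma
           line = line[:-1]
       return json.loads(line)


   # Usage: build a tiny file in the assumed layout and fetch the last object.
   with tempfile.TemporaryDirectory() as tmp:
       path = Path(tmp) / "replays.json"
       path.write_bytes(b'[\n{"id": 0},\n{"id": 1},\n{"id": 2}\n]\n')
       offsets = index_offsets(path)
       with path.open("rb") as fh:
           last = read_object(fh, offsets, 2)

   print(offsets)  # [2, 13, 24]
   print(last)  # {'id': 2}

Because only the offsets list is kept in memory, random access costs one ``seek`` and one ``readline`` per object, which is what makes the layout practical for very large files.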
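
The exact role of ``first_entry`` in ``json_to_line`` is not spelled out in its docstring. One plausible reading, sketched here purely as an assumption, is that it decides whether a comma separator must precede the new line, which produces exactly the one-object-per-line array that ``index_json_objects`` expects. ``write_entry`` and ``write_array`` are hypothetical helpers, and the sketch omits the replaypack metadata that ``json_to_line`` adds.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path
   from typing import Iterable, TextIO


   def write_entry(output_file: TextIO, obj: dict, first_entry: bool, sort_keys: bool = False) -> None:
       """Hypothetical helper: append one object as a single line."""
       # Every entry except the first closes the previous line with ",\n",
       # so the final object is left without a trailing comma.
       if not first_entry:
           output_file.write(",\n")
       # json.dumps without indent emits one line; newlines inside strings are escaped.
       output_file.write(json.dumps(obj, sort_keys=sort_keys))


   def write_array(objects: Iterable[dict], output_filepath: Path, sort_keys: bool = False) -> Path:
       """Hypothetical helper: write objects as a one-object-per-line JSON array."""
       with output_filepath.open("w", encoding="utf-8", newline="\n") as out:
           out.write("[\n")
           for i, obj in enumerate(objects):
               write_entry(out, obj, first_entry=(i == 0), sort_keys=sort_keys)
           out.write("\n]\n")
       return output_filepath


   with tempfile.TemporaryDirectory() as tmp:
       merged = write_array([{"b": 1, "a": 2}, {"id": 7}], Path(tmp) / "merged.json", sort_keys=True)
       text = merged.read_text(encoding="utf-8")

   print(text.splitlines())  # ['[', '{"a": 2, "b": 1},', '{"id": 7}', ']']

Writing the separator before each entry (rather than after it) avoids having to know in advance which object is the last one, so the writer can stream from a directory or dataset of unknown length.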
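
``get_json_offsets`` either retrieves existing offsets or creates them, and skips the disk entirely when ``offsets_filepath`` is None. A minimal sketch of that load-or-compute pattern follows. The on-disk format (a JSON list of integers) is an assumption, not documented above, and ``cached_offsets`` and ``fake_indexer`` are hypothetical names; the indexing pass is injected as a callable so the example runs on its own.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path
   from typing import Callable, Optional


   def cached_offsets(
       json_filepath: Path,
       offsets_filepath: Optional[Path],
       indexer: Callable[[Path], list[int]],
   ) -> list[int]:
       """Hypothetical helper: reuse offsets saved on disk, otherwise compute them."""
       if offsets_filepath is not None and offsets_filepath.exists():
           # Assumed cache format: a JSON list of integers.
           return json.loads(offsets_filepath.read_text(encoding="utf-8"))
       offsets = indexer(json_filepath)
       if offsets_filepath is not None:
           offsets_filepath.write_text(json.dumps(offsets), encoding="utf-8")
       return offsets


   calls: list[Path] = []


   def fake_indexer(path: Path) -> list[int]:
       # Stand-in for a real indexing pass; records each call.
       calls.append(path)
       return [2, 13, 24]


   with tempfile.TemporaryDirectory() as tmp:
       data = Path(tmp) / "replays.json"
       cache = Path(tmp) / "replays.offsets.json"
       first = cached_offsets(data, cache, fake_indexer)  # computes and saves
       second = cached_offsets(data, cache, fake_indexer)  # loaded from the cache
       uncached = cached_offsets(data, None, fake_indexer)  # never touches disk

   print(first == second == uncached, len(calls))  # True 2

Note that this sketch never invalidates the cache: if the JSON file is regenerated, a stale offsets file would be reused, so a more careful version might compare file sizes or modification times before trusting it.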