sc2_datasets.validators.multiprocess_validator
Functions
- validate_integrity_mp: Exposes logic for multiprocess validation of the replays.
- validate_integrity_persist_mp: Exposes the logic for validating replays using multiple processes.
Module Contents
- validate_integrity_mp(list_of_replays: list[pathlib.Path], n_workers: int) -> tuple[set[pathlib.Path], set[pathlib.Path]]
Exposes logic for multiprocess validation of the replays. Checks whether each replay can be parsed with SC2ReplayData, spreading the work across multiple processes.
- Parameters:
list_of_replays (list[Path]) – Specifies a list of paths to replays that should be checked by the validator.
n_workers (int) – Specifies the number of workers (processes) that will be used for validating replays. Must be a positive int.
- Returns:
Returns a tuple that contains (all validated replays, files to be skipped).
- Return type:
tuple[set[Path], set[Path]]
- Raises:
AssertionError – If n_workers is not a positive integer.
Examples
Correct Usage Examples:
Validators can be used to check whether a file is valid before loading it for a modeling task. The first two examples below pass one correct file and one incorrect file. The result is a tuple of two sets: the first set contains the correctly validated files, and the second set contains the files that should be skipped in modeling tasks.
Example using fewer workers than replays:
>>> from sc2_datasets.validators.multiprocess_validator import validate_integrity_mp
>>> validated_replays = validate_integrity_mp(
...     list_of_replays=[
...         "./test/test_files/single_replay/test_replay.json",
...         "./test/test_files/single_replay/test_bit_flip_example.json"],
...     n_workers=1)
>>> assert len(validated_replays[0]) == 1
>>> assert len(validated_replays[1]) == 1
Example using more workers than replays:
>>> validated_replays = validate_integrity_mp(
...     list_of_replays=[
...         "./test/test_files/single_replay/test_replay.json",
...         "./test/test_files/single_replay/test_bit_flip_example.json"],
...     n_workers=8)
>>> assert len(validated_replays[0]) == 1
>>> assert len(validated_replays[1]) == 1
Example passing an empty list to the validation function:
>>> validated_replays = validate_integrity_mp(
...     list_of_replays=[],
...     n_workers=8)
>>> assert len(validated_replays[0]) == 0
>>> assert len(validated_replays[1]) == 0
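The returned tuple is usually unpacked before further processing. The following is a minimal sketch of that step, not part of the library: `select_usable` is a hypothetical helper, and the `result` tuple is a hand-written stand-in for the value `validate_integrity_mp` would return. It assumes the sets contain `Path` objects, as the documented return type suggests.

```python
from pathlib import Path


def select_usable(
    replays: list[Path],
    validation_result: tuple[set[Path], set[Path]],
) -> list[Path]:
    """Keep replays that were validated and are not marked to be skipped.

    Hypothetical helper for illustration; it is not part of sc2_datasets.
    """
    validated, skipped = validation_result
    # Preserve the original ordering of the input list.
    return [p for p in replays if p in validated and p not in skipped]


# Stand-in for the tuple returned by validate_integrity_mp (assumed values).
good = Path("test_replay.json")
bad = Path("test_bit_flip_example.json")
result = ({good}, {bad})

usable = select_usable([good, bad], result)
```

Filtering the original list, rather than iterating over the returned set, keeps the replays in a deterministic order for later modeling steps.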
- validate_integrity_persist_mp(list_of_replays: list[pathlib.Path], n_workers: int, validation_file_path: pathlib.Path = Path('validator_file.json')) -> set[str]
Exposes the logic for validating replays using multiple processes. This function uses a validation file that persists the files which were previously checked.
- Parameters:
list_of_replays (list[Path]) – Specifies the list of filepaths to replays that are supposed to be validated.
n_workers (int) – Specifies the number of workers that will be used to validate the files.
validation_file_path (Path, optional) – Specifies the path to the validation file, which is read to obtain the files that should be included and the files that should be skipped. Defaults to Path("validator_file.json").
- Returns:
Returns a set of files that should be skipped in further processing.
- Return type:
set[str]
Examples
Persistent validators save the validation information to a specified filepath. Only the files that ought to be skipped are returned as a set from this function.
>>> from pathlib import Path
>>> from sc2_datasets.validators.multiprocess_validator import validate_integrity_persist_mp
>>> replays_to_skip = validate_integrity_persist_mp(
...     list_of_replays=[
...         "test/test_files/single_replay/test_replay.json",
...         "test/test_files/single_replay/test_bit_flip_example.json"],
...     n_workers=1,
...     validation_file_path=Path("validator_file.json"))
>>> assert len(replays_to_skip) == 1
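Because the function returns strings while the input is a list of paths, a conversion is needed when filtering. The sketch below shows one way this might look. `drop_skipped` is a hypothetical helper, the skip set is written by hand rather than produced by the library, and the assumption that each entry equals `str(path)` of the corresponding input should be checked against the actual contents of the validation file.

```python
from pathlib import Path


def drop_skipped(replays: list[Path], replays_to_skip: set[str]) -> list[Path]:
    """Remove replays whose string form appears in the skip set.

    Hypothetical helper; assumes skip entries match str(path) exactly.
    """
    return [p for p in replays if str(p) not in replays_to_skip]


replays = [Path("replays/a.json"), Path("replays/b.json")]
# Hand-written stand-in for the set returned by validate_integrity_persist_mp.
replays_to_skip = {str(Path("replays/b.json"))}

remaining = drop_skipped(replays, replays_to_skip)
```

If the validation file stores absolute or normalized paths, the comparison key would need to be adjusted accordingly.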