vllm.v1.executor.uniproc_executor ¶
ExecutorWithExternalLauncher ¶
Bases: UniProcExecutor
An executor that relies on an external launcher (such as torchrun) to launch engines, designed for offline inference with tensor parallelism.
See https://github.com/vllm-project/vllm/issues/11400 for the motivation, and examples/offline_inference/torchrun_example.py for a usage example.
The key idea: although inference is tensor-parallel, each executor creates only one worker. Users launch multiple engines with a torchrun-compatible launcher, and all of these engines work together to process the same prompts. When scheduling is deterministic, every engine generates the same outputs, so the engines never need to synchronize state with each other.
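The determinism invariant above can be illustrated with a toy sketch (not vLLM code; `deterministic_step` and `run_engine` are hypothetical stand-ins for a deterministic schedule plus forward pass): several independently launched "engines" that see the same prompts produce identical outputs, so no cross-engine synchronization is required.

```python
# Toy illustration (not vLLM code): N independently launched "engines"
# process the same prompts with a deterministic step function.
import hashlib


def deterministic_step(prompt: str) -> str:
    # Stand-in for a deterministic schedule + forward pass.
    return hashlib.sha256(prompt.encode()).hexdigest()[:8]


def run_engine(rank: int, prompts: list[str]) -> list[str]:
    # Each rank runs its own engine on the *same* prompts; the rank
    # never influences the result, mirroring the design's assumption.
    return [deterministic_step(p) for p in prompts]


prompts = ["hello", "world"]
outputs = [run_engine(rank, prompts) for rank in range(4)]
# All engines agree without exchanging any state.
assert all(o == outputs[0] for o in outputs)
```

If scheduling were nondeterministic, the engines could diverge, which is why this executor is restricted to deterministic scheduling.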
Source code in vllm/v1/executor/uniproc_executor.py
_distributed_args ¶
Source code in vllm/v1/executor/uniproc_executor.py
_init_executor ¶
Initialize the worker and load the model.
Source code in vllm/v1/executor/uniproc_executor.py
determine_available_memory ¶
Source code in vllm/v1/executor/uniproc_executor.py
UniProcExecutor ¶
Bases: Executor
Source code in vllm/v1/executor/uniproc_executor.py

_distributed_args ¶
Return (distributed_init_method, rank, local_rank).
Source code in vllm/v1/executor/uniproc_executor.py
_init_executor ¶
Initialize the worker and load the model.
Source code in vllm/v1/executor/uniproc_executor.py
check_health ¶
collective_rpc ¶
collective_rpc(
method: str | Callable,
timeout: float | None = None,
args: tuple = (),
kwargs: dict | None = None,
non_block: bool = False,
single_value: bool = False,
) -> Any | list[Any] | Future[Any | list[Any]]
Source code in vllm/v1/executor/uniproc_executor.py
execute_model ¶
execute_model(
scheduler_output: SchedulerOutput,
non_block: bool = False,
) -> (
ModelRunnerOutput
| None
| Future[ModelRunnerOutput | None]
)
Source code in vllm/v1/executor/uniproc_executor.py
reinitialize_distributed ¶
reinitialize_distributed(
reconfig_request: ReconfigureDistributedRequest,
) -> None
Source code in vllm/v1/executor/uniproc_executor.py
sample_tokens ¶
sample_tokens(
grammar_output: GrammarOutput | None,
non_block: bool = False,
) -> (
ModelRunnerOutput
| None
| Future[ModelRunnerOutput | None]
)
Source code in vllm/v1/executor/uniproc_executor.py
shutdown ¶
take_draft_token_ids ¶
take_draft_token_ids() -> DraftTokenIds | None