Reference for ultralytics/data/build.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/data/build.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.data.build.InfiniteDataLoader
InfiniteDataLoader(*args: Any, **kwargs: Any)
Bases: DataLoader
Dataloader that reuses workers for infinite iteration.
This dataloader extends the PyTorch DataLoader so that its worker processes are reused instead of being recreated, which improves efficiency for training loops that iterate over the dataset many times (for example, across epochs).
Attributes:
Name | Type | Description |
---|---|---|
batch_sampler | _RepeatSampler | A sampler that repeats indefinitely. |
iterator | Iterator | The iterator from the parent DataLoader. |
Methods:
Name | Description |
---|---|
__len__ | Return the length of the batch sampler's sampler. |
__iter__ | Create an iterator that yields indefinitely from the underlying iterator. |
__del__ | Ensure workers are properly terminated. |
reset | Reset the iterator, useful when modifying dataset settings during training. |
Examples:
Create an infinite dataloader for training
>>> dataset = YOLODataset(...)
>>> dataloader = InfiniteDataLoader(dataset, batch_size=16, shuffle=True)
>>> for batch in dataloader:  # Infinite iteration
...     train_step(batch)
Source code in ultralytics/data/build.py (lines 54-58)
__del__
__del__()
Ensure that workers are properly terminated when the dataloader is deleted.
Source code in ultralytics/data/build.py (lines 69-79)
__iter__
__iter__() -> Iterator
Create an iterator that yields indefinitely from the underlying iterator.
Source code in ultralytics/data/build.py (lines 64-67)
__len__
__len__() -> int
Return the length of the batch sampler's sampler.
Source code in ultralytics/data/build.py (lines 60-62)
reset
reset()
Reset the iterator to allow modifications to the dataset during training.
Source code in ultralytics/data/build.py (lines 81-83)
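The worker-reuse pattern behind InfiniteDataLoader can be sketched without PyTorch. This is a simplified illustration, not the library's implementation: `RepeatingBatches` and `ReusableLoader` are hypothetical names, batches come from a plain list instead of worker processes, and it assumes that one pass over the loader yields `len(self)` batches drawn from a single, long-lived iterator that is rebuilt only by `reset`.

```python
from typing import Iterator, List, Sequence


class RepeatingBatches:
    """Hypothetical stand-in for a repeating batch sampler."""

    def __init__(self, sampler: Sequence[List[int]]):
        self.sampler = sampler  # the finite sequence of batches being repeated

    def __iter__(self) -> Iterator[List[int]]:
        while True:
            yield from iter(self.sampler)


class ReusableLoader:
    """Hypothetical sketch: every epoch draws from one long-lived iterator."""

    def __init__(self, batches: Sequence[List[int]]):
        self.batch_sampler = RepeatingBatches(batches)
        self.iterators_created = 0  # counts how often the (expensive) iterator is rebuilt
        self.iterator = self._make_iterator()

    def _make_iterator(self) -> Iterator[List[int]]:
        # In a real multi-worker DataLoader, creating the iterator is what starts the workers.
        self.iterators_created += 1
        return iter(self.batch_sampler)

    def __len__(self) -> int:
        # Mirrors the documented behaviour: the length of the batch sampler's sampler.
        return len(self.batch_sampler.sampler)

    def __iter__(self) -> Iterator[List[int]]:
        # Assumption: one pass over the loader yields len(self) batches, i.e. one epoch.
        for _ in range(len(self)):
            yield next(self.iterator)

    def reset(self) -> None:
        # Rebuild the iterator, e.g. after dataset settings change mid-training.
        self.iterator = self._make_iterator()


loader = ReusableLoader([[0, 1], [2, 3], [4]])
for epoch in range(2):
    print(epoch, list(loader))  # each epoch: [[0, 1], [2, 3], [4]]
print(loader.iterators_created)  # 1: the iterator was built once and reused
```

Because the repeating iterator never runs out, stopping each pass after `len(self)` batches is what keeps epoch boundaries aligned in this sketch.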
ultralytics.data.build._RepeatSampler
_RepeatSampler(sampler: Any)
Sampler that repeats forever for infinite iteration.
This sampler wraps another sampler and yields its contents indefinitely, allowing for infinite iteration over a dataset without recreating the sampler.
Attributes:
Name | Type | Description |
---|---|---|
sampler | sampler | The sampler to repeat. |
Source code in ultralytics/data/build.py (lines 97-99)
__iter__
__iter__() -> Iterator
Iterate over the sampler indefinitely, yielding its contents.
Source code in ultralytics/data/build.py (lines 101-104)
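The repeat-forever behaviour of `_RepeatSampler.__iter__` can be illustrated with a minimal stand-in. This sketch assumes the wrapped sampler is any re-iterable object whose `__iter__` starts a fresh pass; `RepeatSampler` and `ShuffledIndices` are hypothetical names. Re-calling `iter(self.sampler)` on each pass differs from `itertools.cycle`, which replays a cached copy of the first pass, so a shuffling sampler here produces a new order every time.

```python
import itertools
import random
from typing import Any, Iterator


class ShuffledIndices:
    """Hypothetical sampler: yields 0..n-1 in a new random order on every pass."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[int]:
        order = list(range(self.n))
        self.rng.shuffle(order)
        return iter(order)


class RepeatSampler:
    """Hypothetical stand-in that wraps a sampler and repeats it forever."""

    def __init__(self, sampler: Any):
        self.sampler = sampler

    def __iter__(self) -> Iterator[Any]:
        while True:
            # Start a fresh pass over the wrapped sampler each time it is exhausted.
            yield from iter(self.sampler)


# The repeated stream is infinite, so take a finite slice: three passes of four indices.
stream = list(itertools.islice(RepeatSampler(ShuffledIndices(4)), 12))
passes = [stream[i:i + 4] for i in range(0, 12, 4)]
print(passes)  # each pass is a permutation of [0, 1, 2, 3]
```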
ultralytics.data.build.seed_worker
seed_worker(worker_id: int)
Set dataloader worker seed for reproducibility across worker processes.
Source code in ultralytics/data/build.py (lines 107-111)
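The PyTorch reproducibility notes recommend a `worker_init_fn` that reads `torch.initial_seed()` inside each worker (PyTorch sets it to a base seed plus the worker id), reduces it modulo 2**32, and uses the result to seed NumPy and Python's `random`. The torch-free sketch below assumes that recipe; `seed_from_initial` is a hypothetical helper that takes the per-worker seed as an argument instead of calling `torch.initial_seed()`, and it seeds only Python's `random` module.

```python
import random


def seed_from_initial(initial_seed: int) -> int:
    """Hypothetical helper: fold a (possibly 64-bit) seed into 32 bits and seed `random`."""
    worker_seed = initial_seed % 2**32  # NumPy's legacy seeding accepts only 0 <= seed < 2**32
    random.seed(worker_seed)
    return worker_seed


def draws(initial_seed: int, k: int = 3) -> list:
    """Seed as a worker would, then take k random draws."""
    seed_from_initial(initial_seed)
    return [random.random() for _ in range(k)]


base_seed = 2**40 + 123  # stands in for the base seed PyTorch picks for the workers
worker_0 = draws(base_seed + 0)
worker_1 = draws(base_seed + 1)
print(worker_0 == draws(base_seed + 0))  # True: same seed, same stream
```

Seeding every library that a dataset's augmentations touch is the point of the recipe: otherwise forked workers can inherit identical NumPy or `random` state and apply the same "random" augmentations.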
ultralytics.data.build.build_yolo_dataset
build_yolo_dataset(
cfg,
img_path,
batch,
data,
mode="train",
rect=False,
stride=32,
multi_modal=False,
)
Build and return a YOLO dataset based on configuration parameters.
Source code in ultralytics/data/build.py (lines 114-133)
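How the `mode`, `rect`, `stride`, and `multi_modal` arguments might translate into dataset settings can be sketched as a plain function. Everything here is an assumption for illustration: `yolo_dataset_kwargs` is hypothetical, `cfg` is a `SimpleNamespace` stand-in with only `imgsz` and `rect`, augmentation is assumed to be enabled only for `mode="train"`, `rect` is assumed to be OR-ed with a config flag, and the multi-modal dataset label is a placeholder rather than a real class name.

```python
from types import SimpleNamespace
from typing import Any, Dict


def yolo_dataset_kwargs(cfg: Any, img_path: str, batch: int, mode: str = "train",
                        rect: bool = False, stride: int = 32, multi_modal: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: map build_yolo_dataset-style arguments to dataset settings."""
    return {
        # Assumption: a multi-modal flag switches to a different dataset class (placeholder label).
        "dataset": "multi-modal YOLO dataset" if multi_modal else "YOLODataset",
        "img_path": img_path,
        "imgsz": cfg.imgsz,
        "batch_size": batch,
        # Assumption: augmentation is applied only when building the training split.
        "augment": mode == "train",
        # Assumption: rectangular batching is on if either the config or the caller asks for it.
        "rect": cfg.rect or rect,
        "stride": int(stride),
    }


cfg = SimpleNamespace(imgsz=640, rect=False)  # stand-in for the real training config
train_kwargs = yolo_dataset_kwargs(cfg, "images/train", batch=16, mode="train")
val_kwargs = yolo_dataset_kwargs(cfg, "images/val", batch=32, mode="val", rect=True)
print(train_kwargs["augment"], val_kwargs["augment"], val_kwargs["rect"])  # True False True
```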
ultralytics.data.build.build_grounding
build_grounding(
cfg, img_path, json_file, batch, mode="train", rect=False, stride=32
)
Build and return a GroundingDataset based on configuration parameters.
Source code in ultralytics/data/build.py (lines 136-154)
ultralytics.data.build.build_dataloader
build_dataloader(
dataset, batch: int, workers: int, shuffle: bool = True, rank: int = -1
)
Create and return an InfiniteDataLoader or DataLoader for training or validation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Dataset | Dataset to load data from. | required |
batch | int | Batch size for the dataloader. | required |
workers | int | Number of worker processes for loading data. | required |
shuffle | bool | Whether to shuffle the dataset. | True |
rank | int | Process rank in distributed training; -1 for non-distributed (single-GPU or CPU) training. | -1 |
Returns:
Type | Description |
---|---|
InfiniteDataLoader | A dataloader that can be used for training or validation. |
Examples:
Create a dataloader for training
>>> dataset = YOLODataset(...)
>>> dataloader = build_dataloader(dataset, batch=16, workers=4, shuffle=True)
Source code in ultralytics/data/build.py (lines 157-192)
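Before constructing the loader, a builder like this has to reconcile the requested batch size and worker count with the available hardware and pick a sampler for distributed runs. The sketch below illustrates those decisions in plain Python under stated assumptions: `loader_settings` is hypothetical, the batch is assumed to be capped at the dataset length, workers are assumed to be capped at the CPU count shared among devices, and `rank != -1` is assumed to select a distributed sampler. One constraint is real PyTorch behaviour: `DataLoader` raises an error if `shuffle=True` is combined with a custom `sampler`, so shuffling moves into the sampler.

```python
import os
from typing import Dict, Optional, Union


def loader_settings(dataset_len: int, batch: int, workers: int, shuffle: bool = True,
                    rank: int = -1, n_devices: int = 1,
                    cpu_count: Optional[int] = None) -> Dict[str, Union[int, bool, str, None]]:
    """Hypothetical helper: resolve batch size, worker count, and sampling strategy."""
    cpu_count = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    batch = min(batch, dataset_len)  # assumption: never ask for more samples than exist
    num_workers = min(cpu_count // max(n_devices, 1), workers)  # assumption: share CPUs across devices
    sampler = None if rank == -1 else "distributed"  # stand-in label for a DistributedSampler
    return {
        "batch_size": batch,
        "num_workers": num_workers,
        "sampler": sampler,
        # DataLoader forbids shuffle=True together with a sampler; the sampler shuffles instead.
        "shuffle": shuffle and sampler is None,
    }


print(loader_settings(dataset_len=10, batch=16, workers=8, cpu_count=4))
# {'batch_size': 10, 'num_workers': 4, 'sampler': None, 'shuffle': True}
print(loader_settings(dataset_len=1000, batch=16, workers=8, rank=0, n_devices=2, cpu_count=16))
# {'batch_size': 16, 'num_workers': 8, 'sampler': 'distributed', 'shuffle': False}
```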
ultralytics.data.build.check_source
check_source(source)
Check the type of input source and return corresponding flag values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str \| int \| Path \| list \| tuple \| ndarray \| Image \| Tensor | The input source to check. | required |
Returns:
Name | Type | Description |
---|---|---|
source | str \| int \| Path \| list \| tuple \| ndarray \| Image \| Tensor | The processed source. |
webcam | bool | Whether the source is a webcam. |
screenshot | bool | Whether the source is a screenshot. |
from_img | bool | Whether the source is an image or list of images. |
in_memory | bool | Whether the source is an in-memory object. |
tensor | bool | Whether the source is a torch.Tensor. |
Examples:
Check a file path source
>>> source, webcam, screenshot, from_img, in_memory, tensor = check_source("image.jpg")
Check a webcam source
>>> source, webcam, screenshot, from_img, in_memory, tensor = check_source(0)
Source code in ultralytics/data/build.py (lines 195-239)
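The flag logic can be illustrated for the simple source types. This is a simplified, hypothetical sketch (`classify_source` is not a library function): it handles only `str`, `int`, `Path`, `list`, and `tuple` inputs, uses a small assumed set of file extensions, treats a numeric string (a local camera index) or a URL that does not end in a known file extension as a webcam source, and treats the literal string `"screen"` as a screenshot source. The in-memory, NumPy, PIL, and tensor cases listed in the tables above are omitted.

```python
from pathlib import Path
from typing import Any, Tuple

# Assumption: a small illustrative subset of file extensions, not the library's full list.
IMAGE_EXTS = {"jpg", "jpeg", "png", "bmp"}
VIDEO_EXTS = {"mp4", "avi", "mov"}
URL_PREFIXES = ("http://", "https://", "rtsp://", "rtmp://", "tcp://")


def classify_source(source: Any) -> Tuple[Any, bool, bool, bool, bool, bool]:
    """Hypothetical sketch returning (source, webcam, screenshot, from_img, in_memory, tensor)."""
    webcam = screenshot = from_img = in_memory = tensor = False
    if isinstance(source, (str, int, Path)):
        source = str(source)  # e.g. camera index 0 becomes "0"
        lower = source.lower()
        is_url = lower.startswith(URL_PREFIXES)
        is_file = lower.rpartition(".")[-1] in (IMAGE_EXTS | VIDEO_EXTS)
        webcam = source.isnumeric() or (is_url and not is_file)
        screenshot = lower == "screen"
    elif isinstance(source, (list, tuple)):
        from_img = True  # a batch of images
    else:
        raise TypeError(f"Unsupported source type in this sketch: {type(source).__name__}")
    return source, webcam, screenshot, from_img, in_memory, tensor


print(classify_source(0))  # ('0', True, False, False, False, False)
print(classify_source("image.jpg"))  # ('image.jpg', False, False, False, False, False)
print(classify_source("rtsp://example.com/stream")[1])  # True
```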
ultralytics.data.build.load_inference_source
load_inference_source(
source=None,
batch: int = 1,
vid_stride: int = 1,
buffer: bool = False,
channels: int = 3,
)
Load an inference source for object detection and apply necessary transformations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str \| Path \| Tensor \| Image \| ndarray | The input source for inference. | None |
batch | int | Batch size for dataloaders. | 1 |
vid_stride | int | The frame interval for video sources. | 1 |
buffer | bool | Whether stream frames will be buffered. | False |
channels | int | The number of input channels for the model. | 3 |
Returns:
Type | Description |
---|---|
Dataset | A dataset object for the specified input source with attached source_type attribute. |
Examples:
Load an image source for inference
>>> dataset = load_inference_source("image.jpg", batch=1)
Load a video stream source
>>> dataset = load_inference_source("rtsp://example.com/stream", vid_stride=2)
Source code in ultralytics/data/build.py (lines 242-283)
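Given flags like those returned by `check_source`, choosing a loader reduces to an ordered dispatch. The sketch below is hypothetical: `pick_loader` returns descriptive labels rather than real loader classes, and its precedence order (tensor, in-memory dataset, stream, screenshot, in-memory images, then files on disk) is an assumption. `strided_frames` shows one reading of `vid_stride` as a frame interval: keep every `vid_stride`-th frame, starting with the first.

```python
from typing import List, Sequence


def pick_loader(webcam: bool, screenshot: bool, from_img: bool, in_memory: bool, tensor: bool) -> str:
    """Hypothetical dispatch from check_source-style flags to a kind of loader."""
    if tensor:
        return "tensor batch"
    if in_memory:
        return "existing in-memory dataset"
    if webcam:
        return "stream reader"  # where vid_stride and buffer would matter
    if screenshot:
        return "screen capture"
    if from_img:
        return "in-memory images"
    return "image/video files"  # where batch and vid_stride would matter


def strided_frames(frames: Sequence[int], vid_stride: int = 1) -> List[int]:
    """Keep every vid_stride-th frame, starting with the first (assumed semantics)."""
    if vid_stride < 1:
        raise ValueError("vid_stride must be >= 1")
    return list(frames[::vid_stride])


print(pick_loader(webcam=True, screenshot=False, from_img=False, in_memory=False, tensor=False))  # stream reader
print(strided_frames(range(10), vid_stride=3))  # [0, 3, 6, 9]
```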