Parallel Domain SDK¶
Introduction¶
The Parallel Domain SDK (or short: PD SDK) allows the community to access Parallel Domain’s synthetic data as Python objects.
The PD SDK can also decode different data formats into its common Python object representation (more public dataset formats will be supported in the future):
Currently, local file system and s3 buckets are supported as dataset locations for decoding. See AnyPath documentation for our file system abstraction.
PD SDK is designed to serve the following use cases:
load data in ML data pipelines from local or cloud storage directly into memory.
encode data into different dataset formats. Currently, it’s possible to convert into DGP and DGPv1 format.
generate data in PD’s Data Lab.
Example: Load and visualize data¶
To run show_sensor_frame
you need to install the visualization
dependencies with one of the methods described in Installation.
Next, you can decode a dataset located at a given local or S3 path using the decode_dataset
method.
To quickly access all sensor frames in a dataset, use the sensor_frame_pipeline
method for that dataset. This method yields all the sensor frames in order, so the outer loop iterates through scenes, followed by frames, and finally sensors within each frame.
from paralleldomain.decoding.helper import decode_dataset
from paralleldomain.model.annotation import AnnotationTypes
from paralleldomain.visualization.model_visualization import show_frame
pd_dataset = decode_dataset(dataset_path="s3://bucket/with/dgp/dataset", dataset_format="dgp")
for sensor_frame, frame, scene in pd_dataset.sensor_frame_pipeline():
show_frame(frame=frame, annotations_to_show=[AnnotationTypes.BoundingBoxes2D])
Or
from paralleldomain.decoding.helper import decode_dataset
from paralleldomain.visualization.model_visualization import show_dataset
pd_dataset = decode_dataset(dataset_path="s3://bucket/with/dgp/dataset", dataset_format="dgp")
show_dataset(pd_dataset=pd_dataset)
CLI Visualization¶
You can also use the cli to visualize datasets stored locally or on s3. To do so, you need to install the visualization
dependencies with one of the methods described in Installation.
a stored dataset can be visualized like this:
pd visualize s3://bucket/with/dgp/dataset --dataset_format dgp
if you only want to show Depth annotations of the first scene you can run:
pd visualize s3://bucket/with/dgp/dataset -f dgp --annotations Depth --scene_names scene_000000
For more information on the cli run:
pd visualize -h
For more examples make sure to check out our Documentation.
Documentation¶
Tutorials¶
There are several tutorials available covering common use cases. Those can be found under Documentation -> Tutorials. In case you are missing an important tutorial, feel free to request it via a GitHub Issue or create a PR, in case you have written one already yourself.
API Reference¶
Public classes / methods / properties are annotated with Docstrings. The compiled API Reference can be found under Documentation -> API Reference
Installation¶
Supported Python Versions: 3.8, 3.9, 3.10, 3.11
Choose one of the following installation methods for the Python package from GitHub based on your use case:
Quick Installation¶
For users who just want to use the library without editing it, this method is recommended. It quickly installs the package from GitHub, allowing you to access its functionalities without modifying the source code. Run the following command:
pip install "paralleldomain @ git+https://github.com/parallel-domain/pd-sdk.git@main#egg=paralleldomain"
Developer Setup¶
This method is suitable for developers who need to modify the source code, contribute to the project, or parallelize the build process. With this setup, you can make changes to the library, test new features, and experiment with the codebase. Follow these steps:
# Clone latest PD SDK release
git clone https://github.com/parallel-domain/pd-sdk.git
# Change directory
cd pd-sdk
# Optional: Parallelize build process for dependencies using gcc, e.g., `opencv-python-headless`
export MAKEFLAGS="-j$(nproc)"
# Install PD SDK from local clone
pip install .
This method allows you to directly work on the library’s source code, which is not possible with the quick installation method.
Install Extras¶
These optional extras can be installed to enhance the functionality of PD SDK based on your specific needs:
data_lab
: This extra includes dependencies for Data Lab, PD’s synthetic data generation platform (includesvisualization
).pip install -e data_lab
visualization
: Install this extra to includeopencv
with GUI components, helpful to visualize data.pip install -e opencv
dev
: The development extra contains dependencies for developers, such as testing tools, pre-commit hooks, and other utilities that assist in maintaining code quality and ensuring a smooth development process. This is recommended for users who plan to contribute to the project or work extensively with the source code.pip install -e dev
Testing¶
Before running pytest
you need to make sure to have its package installed. To do this, add the dev
install extra as described in Install Extras.
Go to the root folder of your pd-sdk repo and run:
pytest test_paralleldomain
If you’d like to run tests for Data Lab, make sure that your PD_CLIENT_CREDENTIALS_PATH_ENV
, PD_CLIENT_STEP_API_KEY_ENV
and PD_CLIENT_ORG_ENV
environment variables are set, and you have the data_lab
install extra set up.
Otherwise, those tests will be skipped. You can find more details on how to set those in Data Lab Quickstart