The AnyPath Object

PD SDK’s AnyPath objects are a layer built on top of Python’s pathlib.Path as well as S3Path. As a result, AnyPath accepts both local file system paths and S3 addresses. It implements most of the common pathlib.Path methods and adapts them to also work with S3 addresses where applicable.

Whenever you reference files or directories in PD SDK, you should use an AnyPath instance to ensure correct behaviour. Some PD SDK methods also accept addresses as str; in those cases, the conversion to AnyPath is handled internally for you.
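AnyPath's internals are not shown in this tutorial. As a rough illustration of the general idea (one entry point that accepts both str and path objects and picks a backend from the address), here is a minimal standard-library sketch. The helper describe_address is hypothetical and not part of PD SDK; treating every address without an s3:// scheme as a local path is an assumption made for this sketch.

```python
from pathlib import Path
from typing import Optional, Tuple, Union
from urllib.parse import urlparse


def describe_address(address: Union[str, Path]) -> Tuple[str, Optional[str], str]:
    """Hypothetical helper: classify an address as S3 or local filesystem."""
    text = str(address)  # accept both str and path objects
    parsed = urlparse(text)
    if parsed.scheme == "s3":
        # For s3://bucket/key, the bucket is the netloc and the key is the path
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    # Assumption: anything without an s3:// scheme is a local filesystem path
    return ("local", None, str(Path(text)))


print(describe_address("s3://my-bucket/testset_dgp/"))  # ('s3', 'my-bucket', 'testset_dgp/')
print(describe_address("testset_dgp"))                  # ('local', None, 'testset_dgp')
```

Normalizing the input to str first means callers can pass either type, which mirrors the convenience described above for PD SDK methods that accept str addresses.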

Simply import AnyPath as follows:

[1]:
from paralleldomain.utilities.any_path import AnyPath

Instantiate AnyPath for different addresses

For local filesystem references, we can use either an absolute or a relative path. For S3, we pass the full s3:// address.

[2]:
absolute_path = "/home/nisseknudsen/Data/testset_dgp"
absolute_anypath = AnyPath(path=absolute_path)

relative_path = "testset_dgp"
relative_anypath = AnyPath(path=relative_path)

s3_path = "s3://pd-sdk-c6b4d2ea-0301-46c9-8b63-ef20c0d014e9/testset_dgp/"
s3_anypath = AnyPath(path=s3_path)

print(absolute_anypath)
print(relative_anypath)
print(s3_anypath)
/home/nisseknudsen/Data/testset_dgp
testset_dgp
s3://pd-sdk-c6b4d2ea-0301-46c9-8b63-ef20c0d014e9/testset_dgp
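The distinction between absolute and relative paths can be illustrated with plain pathlib (this sketch does not use AnyPath, and the /home/user path is a placeholder). Pure paths only manipulate strings and never touch the filesystem. A trailing slash is also dropped on construction, similar to how the S3 address above was printed without its trailing slash.

```python
from pathlib import PurePosixPath

# Placeholder paths; pure paths never access the filesystem
absolute = PurePosixPath("/home/user/Data/testset_dgp")
relative = PurePosixPath("testset_dgp")

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False

# A trailing slash is removed when the path is constructed
print(PurePosixPath("testset_dgp/"))  # testset_dgp
```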

Next, let’s see which files are present in each location.

Absolute + S3 Paths

For the absolute and S3 paths, we can simply use the .iterdir() method to iterate over the directory contents:

[3]:
content_absolute = []
for fp_abs in absolute_anypath.iterdir():
    content_absolute.append(fp_abs)

content_s3 = []
for fp_s3 in s3_anypath.iterdir():
    content_s3.append(fp_s3)

We collect the contents of each AnyPath object in a list and can now print them out.
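As a side note, the loop-and-append pattern used above is equivalent to calling list() on the iterator. Here is a minimal sketch with plain pathlib, using a throwaway temporary directory that stands in for the dataset folder. In pathlib, the order of entries yielded by iterdir() is arbitrary, so the sketch sorts them for a stable result; whether AnyPath guarantees any particular order on S3 is not covered here.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate a layout similar to the dataset contents
    (root / "pd-sdk_test_set").mkdir()
    (root / "scene_dataset.json").write_text("{}")

    # iterdir() yields entries in arbitrary order, so sort for a stable result
    content = sorted(root.iterdir())
    names = [p.name for p in content]

print(names)  # ['pd-sdk_test_set', 'scene_dataset.json']
```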

[4]:
print("Content Absolute Path:")
print(*content_absolute, sep="\n")

print("\nContent S3 Path:")
print(*content_s3, sep="\n")
Content Absolute Path:
/home/nisseknudsen/Data/testset_dgp/pd-sdk_test_set
/home/nisseknudsen/Data/testset_dgp/scene_dataset.json

Content S3 Path:
s3://pd-sdk-c6b4d2ea-0301-46c9-8b63-ef20c0d014e9/testset_dgp/pd-sdk_test_set
s3://pd-sdk-c6b4d2ea-0301-46c9-8b63-ef20c0d014e9/testset_dgp/scene_dataset.json

As it turns out, both directories have the same content, once on the local filesystem and once in an S3 bucket. We can also observe that the returned items are themselves AnyPath objects.

[5]:
assert isinstance(content_absolute[0], AnyPath)
print(f"Type: {type(content_absolute[0])}")
Type: <class 'paralleldomain.utilities.any_path.AnyPath'>
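pathlib behaves the same way: iterdir() yields full path objects rather than plain strings, so path methods can be chained on each entry directly. The sketch below uses plain pathlib; which of these particular methods (.name, .suffix, .is_file()) AnyPath exposes is not verified here, so check the API reference before relying on them.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scene_dataset.json").write_text("{}")
    entry = next(Path(tmp).iterdir())
    # Items yielded by iterdir() are path objects, so path methods work on them
    is_path = isinstance(entry, Path)
    info = (entry.name, entry.suffix, entry.is_file())

print(is_path, info)  # True ('scene_dataset.json', '.json', True)
```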

Relative Paths

For relative paths, we need to consider the current working directory (cwd) of our Python environment.

[6]:
import os

print(os.getcwd())
/home/nisseknudsen/Development/nisse_laptop_linux

As we can see, the cwd is currently not the expected parent directory (/home/nisseknudsen/Data). Consequently, calling .iterdir() now raises a FileNotFoundError, because the cwd contains no sub-directory with that name.

[7]:
try:
    for fp_rel in relative_anypath.iterdir():
        print(fp_rel)
except FileNotFoundError:
    print(f"Nice try!\nUnfortunately, {os.getcwd()}/{relative_anypath} does not exist.")
Nice try!
Unfortunately, /home/nisseknudsen/Development/nisse_laptop_linux/testset_dgp does not exist.
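The same failure can be reproduced with plain pathlib: a relative path is only interpreted against a base directory when it is actually used. In this minimal sketch, an empty temporary directory stands in for a cwd that does not contain the dataset.

```python
import tempfile
from pathlib import Path

relative = Path("testset_dgp")
print(Path.cwd() / relative)  # the location the relative path refers to right now

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)             # stands in for a cwd that lacks the dataset
    candidate = base / relative  # what the relative path refers to from there
    try:
        list(candidate.iterdir())
        found = True
    except FileNotFoundError:
        found = False

print(found)  # False
```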

To fix this, we can either provide an absolute path as described above, or change the cwd of our Python environment appropriately. As a start, let’s convert the relative AnyPath object to an absolute one by joining it with its parent directory.

[8]:
parent_path = "/home/nisseknudsen/Data"
parent_anypath = AnyPath(parent_path)

absolute_concatenated_path = parent_anypath / relative_anypath

As you can see, the / operator (__truediv__) works with AnyPath the same way as with pathlib.Path objects. Now we can check that the contents of the absolute path equal the contents of the concatenated path. Since .iterdir() returns AnyPath objects, we cannot compare them directly and instead compare their string representations.

[9]:
content_relative = []
for fp_concat in absolute_concatenated_path.iterdir():
    content_relative.append(fp_concat)

# Cast each AnyPath to str and build lists for comparison.
list(map(str, content_absolute)) == list(map(str, content_relative))
[9]:
True
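For reference, here is how the / operator behaves in plain pathlib, including one pitfall worth knowing: if the right-hand side is absolute, the left-hand side is discarded. This sketch uses pathlib only with placeholder paths; whether AnyPath reproduces the absolute-right-hand-side behaviour for S3 addresses is an assumption not verified here.

```python
from pathlib import PurePosixPath

parent = PurePosixPath("/home/user/Data")  # placeholder parent directory
relative = PurePosixPath("testset_dgp")

# The / operator (__truediv__) joins path segments
joined = parent / relative
print(joined)  # /home/user/Data/testset_dgp

# Joining with an absolute right-hand side discards the left-hand side
print(parent / PurePosixPath("/tmp/other"))  # /tmp/other

# Comparing string forms, as done above for AnyPath objects
listing_a = [joined / "scene_dataset.json"]
listing_b = [PurePosixPath("/home/user/Data/testset_dgp/scene_dataset.json")]
print(list(map(str, listing_a)) == list(map(str, listing_b)))  # True
```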

Next, let’s change the cwd of the Python environment.

Note: Changing the cwd can have side effects on other packages that rely on os.getcwd(). Please handle with care.

[10]:
os.chdir(parent_path)
print(list(relative_anypath.iterdir()))
[testset_dgp/pd-sdk_test_set, testset_dgp/scene_dataset.json]

The print statement now lists the expected files using only the relative_anypath object, without having to prepend any absolute path information first. Note that the returned entries are relative paths as well.
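Given the side effects mentioned above, one way to limit the impact of os.chdir() is to scope it with a context manager that always restores the previous cwd. The helper working_directory below is a hypothetical name for this sketch; on Python 3.11 and later, the standard library's contextlib.chdir provides the same behaviour.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Hypothetical helper: temporarily switch the cwd and always restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "testset_dgp").mkdir()
    (Path(tmp) / "testset_dgp" / "scene_dataset.json").write_text("{}")
    before = os.getcwd()
    with working_directory(tmp):
        # Inside the block, the relative path resolves against tmp
        names = [p.name for p in Path("testset_dgp").iterdir()]
    restored = os.getcwd() == before

print(names, restored)  # ['scene_dataset.json'] True
```

Restoring the cwd before the temporary directory is removed also avoids trying to delete the directory the process is currently in.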

File Access

In the .iterdir() examples above, we received new AnyPath objects, some of which point to files we may want to access. If we know the target, we can also construct a file reference directly through AnyPath.

[11]:
scene_file = absolute_anypath / "scene_dataset.json"
assert scene_file.exists()
print(f"{scene_file} found!")
/home/nisseknudsen/Data/testset_dgp/scene_dataset.json found!

File buffers are accessed through the instance method .open(). The API Reference docs describe all available parameters in detail; most importantly, .open() accepts a mode that selects read/write and text/binary access. In this case, we want to load the scene_dataset.json file and deserialize it into a dict.

[12]:
import json

with scene_file.open(mode="r") as fp:
    scene_dict = json.load(fp)

print(scene_dict.keys())
dict_keys(['metadata', 'scene_splits'])
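The mode string controls whether the file object yields str (text mode) or bytes (binary mode). This sketch uses plain pathlib, whose open() follows the built-in open() modes; that AnyPath.open() accepts the same mode strings is consistent with the "r" and "w" used in this tutorial, but the API reference is the authority on the full list. Since Python 3.6, json.load also accepts binary file objects containing UTF-8 encoded JSON.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scene_file = Path(tmp) / "scene_dataset.json"
    scene_file.write_text('{"metadata": {}, "scene_splits": {}}')

    # Text mode: the file object yields str
    with scene_file.open(mode="r") as fp:
        from_text = json.load(fp)

    # Binary mode: the file object yields bytes, which json also accepts
    with scene_file.open(mode="rb") as fp:
        from_bytes = json.load(fp)

print(from_text == from_bytes, list(from_text))  # True ['metadata', 'scene_splits']
```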

The printed keys are correct. Let’s add another key and save the contents to a new file.

[13]:
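import tempfile

fd, out_file = tempfile.mkstemp()
os.close(fd)  # mkstemp() also returns an open file descriptor; close it since we reopen the file by path
out_file = AnyPath(out_file)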

print(out_file)

scene_dict["metadata"]["foo"] = "bar"

with out_file.open("w") as fp:
    json.dump(scene_dict, fp, indent=2)
/tmp/tmpcfudqj12
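Keep in mind that tempfile.mkstemp() returns a tuple of an open OS-level file descriptor and the path, and that the created file is not deleted automatically. A minimal pathlib sketch of handling both: release the descriptor when the file will be reopened by path, and remove the file once it is no longer needed.

```python
import json
import os
import tempfile
from pathlib import Path

# mkstemp() creates the file and returns an open OS-level descriptor plus the path
fd, name = tempfile.mkstemp(suffix=".json")
os.close(fd)  # the file is reopened by path below, so release the descriptor now

out_file = Path(name)
try:
    with out_file.open("w") as fp:
        json.dump({"metadata": {"foo": "bar"}}, fp, indent=2)
    written = out_file.stat().st_size > 0
    print(written)  # True
finally:
    out_file.unlink()  # mkstemp() files persist until removed explicitly
```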

Last but not least, let’s check that the file was written correctly by reading it back with Python’s built-in open() and printing its contents.

[14]:
with open(str(out_file), "r") as fp:
    print(fp.read())

{
  "metadata": {
    "name": "DefaultDatasetName",
    "version": "",
    "creation_date": "2021-06-22T15:16:21.317Z",
    "creator": "",
    "description": "",
    "origin": "INTERNAL",
    "available_annotation_types": [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      10,
      7,
      8,
      9
    ],
    "foo": "bar"
  },
  "scene_splits": {
    "0": {
      "filenames": [
        "pd-sdk_test_set/scene_b16cbd4723f626cf87b96daab6b0efda68ca0454.json"
      ]
    }
  }
}
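Printing the file is fine for a quick visual check. For an automated check, a sketch of the alternative is to reload the JSON and compare it structurally with the dict that was written. The sketch uses plain pathlib and a reduced placeholder dict instead of the full scene_dataset.json contents.

```python
import json
import tempfile
from pathlib import Path

# Reduced placeholder for the scene dictionary loaded earlier
scene_dict = {"metadata": {"name": "DefaultDatasetName"}, "scene_splits": {}}
scene_dict["metadata"]["foo"] = "bar"

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "scene_dataset_modified.json"
    with out_file.open("w") as fp:
        json.dump(scene_dict, fp, indent=2)
    # Read the file back and compare structurally instead of inspecting the printout
    with out_file.open("r") as fp:
        reloaded = json.load(fp)

print(reloaded == scene_dict, reloaded["metadata"]["foo"])  # True bar
```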