core

A set of functions and classes used across this package and reusable in other packages.

Validation tools



is_type

 is_type (obj:Any, obj_type:type, raise_error:bool=False)

Validate that obj is of type obj_type. When raise_error is True, raise an error if it is not.

Name         Type  Default  Details
obj          Any            object whose type to validate
obj_type     type           expected type for obj
raise_error  bool  False    when True, raise a ValueError if obj is not of the right type
Returns      bool           True when obj is of the right type, False otherwise
is_type(obj='this is a string', obj_type=str)
True
is_type(obj=np.ones(shape=(2,2)), obj_type=np.ndarray)
True
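
The behaviour described above can be sketched with isinstance. This is only an illustration under assumptions, not the package's actual implementation: the name is_type_sketch and the error message are hypothetical, and the ValueError follows the parameter table.

```python
from typing import Any


def is_type_sketch(obj: Any, obj_type: type, raise_error: bool = False) -> bool:
    """Hypothetical re-implementation of the behaviour documented for is_type."""
    if isinstance(obj, obj_type):
        return True
    if raise_error:
        # The parameter table documents a ValueError for a type mismatch
        raise ValueError(f"Expected {obj_type.__name__}, got {type(obj).__name__}")
    return False


print(is_type_sketch('this is a string', str))  # True
print(is_type_sketch(42, str))                  # False
```

With raise_error=True, the second call would raise instead of returning False, which is convenient for guarding function arguments.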


validate_path

 validate_path (path:str|pathlib.Path, path_type:str='file',
                raise_error:bool=False)

Validate that path is a Path or str and points to a real file or directory

Name         Type        Default  Details
path         str | Path           path to validate
path_type    str         file     type of the target path: 'file', 'dir' or 'any'
raise_error  bool        False    when True, raise a ValueError if path is not a valid path
Returns      bool                 True when path is a valid path, False otherwise
path_file = Path('../data/img/IMG_001_512px.jpg')
validate_path(path_file)
True
validate_path(path_file, path_type='any')
True
path_dir = Path('../data')
validate_path(path_dir, path_type='dir')
True
validate_path(path_dir, path_type='any')
True
path_error = Path('../data/img/IIIMG_001_512px.jpg')
validate_path(path_error)
False
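
A minimal sketch of the checks described above, built on pathlib only. The name validate_path_sketch is hypothetical, and treating 'any' as "file or directory" is an assumption based on the examples, not the package's confirmed logic.

```python
from pathlib import Path


def validate_path_sketch(path, path_type: str = 'file', raise_error: bool = False) -> bool:
    """Hypothetical re-implementation of the behaviour documented for validate_path."""
    checks = {
        'file': lambda p: p.is_file(),
        'dir': lambda p: p.is_dir(),
        'any': lambda p: p.is_file() or p.is_dir(),  # assumption: file or directory
    }
    if path_type not in checks:
        raise ValueError(f"path_type must be one of {sorted(checks)}, got {path_type!r}")
    # Only str and Path are accepted; anything else is reported as invalid
    valid = isinstance(path, (str, Path)) and checks[path_type](Path(path))
    if not valid and raise_error:
        raise ValueError(f"{path!r} is not a valid path of type {path_type!r}")
    return valid


print(validate_path_sketch(Path.cwd(), path_type='dir'))   # True
print(validate_path_sketch(Path.cwd(), path_type='file'))  # False
```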


safe_path

 safe_path (path:str|pathlib.Path)

Return a Path object when given a valid path as a string or a Path, raise error otherwise

Name     Type        Details
path     str | Path  path to validate
Returns  Path        validated path returned as a pathlib.Path
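
As an illustration, the "return a Path or raise" contract could look like the sketch below. The name safe_path_sketch is hypothetical, and both the existence check and the choice of exception types are assumptions, since the description above does not specify them.

```python
from pathlib import Path


def safe_path_sketch(path) -> Path:
    """Hypothetical version of the documented behaviour: return a Path or raise."""
    if not isinstance(path, (str, Path)):
        # Assumption: a wrong argument type is reported as a TypeError
        raise TypeError(f"path must be a str or a Path, got {type(path).__name__}")
    path = Path(path)
    if not path.exists():
        # Assumption: "valid" means the path exists on disk
        raise ValueError(f"{path} does not point to an existing file or directory")
    return path


print(safe_path_sketch('.'))  # .
```

Returning a Path in all successful cases lets the calling code use pathlib methods without checking whether it received a str or a Path.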

Access key files and directories



get_config_value

 get_config_value (section:str, key:str,
                   path_to_config_file:pathlib.Path|str=None)

Return the value stored under key in the given section of the configuration file (configparser format)

When no path_to_config_file is provided, the function will try to find the file in: the system’s home, the parent directory of the current directory, and the Google drive directory mounted to the Colab environment.

Name                 Type        Default  Details
section              str                  section in the configparser cfg file
key                  str                  key in the selected section
path_to_config_file  Path | str  None     path to the cfg file
Returns              Any                  the value corresponding to section>key>value

By default (path_to_config_file is None), it is assumed that the configuration file is located in the private-accross-accounts directory on Google Drive. If not, a path to the file (Path or str) must be provided.

The configuration file is expected to follow the format described in the documentation of the standard configparser module:

    [DEFAULT]
    key = value

    [section_name]
    key = value

    [another_section_name]
    key = value
path2cfg = Path('../config-sample.cfg').resolve()
assert path2cfg.is_file(), f"{path2cfg} is not a file"

with open(path2cfg, 'r') as fp:
    print(fp.read())
[azure]
azure-api-key= dummy_api_key_for_azure

[kaggle]
kaggle_username = not_my_real_kaggle_name
kaggle_key = dummy_api_key_for_kaggle

[wandb]
api_key = dummy_api_key_for_wandb
value = get_config_value(section='azure', key='azure-api-key', path_to_config_file=path2cfg)
assert value == 'dummy_api_key_for_azure'
Using config file at /home/vtec/projects/ec-packages/ecutilities/config-sample.cfg
value = get_config_value(section='kaggle', key='kaggle_username', path_to_config_file=path2cfg)
assert value == 'not_my_real_kaggle_name'
Using config file at /home/vtec/projects/ec-packages/ecutilities/config-sample.cfg
value = get_config_value(section='wandb', key='api_key', path_to_config_file=path2cfg)
assert value == 'dummy_api_key_for_wandb'
Using config file at /home/vtec/projects/ec-packages/ecutilities/config-sample.cfg
value = get_config_value(section='dummy', key='dummy-user-id')
assert value.startswith('dummy-userID-from')
Using config file at /home/vtec/config-api-keys.cfg
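
Reading a section/key pair from such a file can be sketched with the standard configparser module. The helper read_config_value below is hypothetical and does not reproduce the default-location search described above; it only covers the case where a path is passed explicitly.

```python
import configparser
import tempfile
from pathlib import Path


def read_config_value(section: str, key: str, path_to_config_file) -> str:
    """Hypothetical helper: read one value from a configparser-style file."""
    path = Path(path_to_config_file)
    if not path.is_file():
        raise ValueError(f"{path} is not a file")
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser[section][key]  # KeyError if the section or key is missing


# Write a throwaway file mirroring part of the sample config shown above
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / 'config-sample.cfg'
    cfg.write_text("[kaggle]\nkaggle_username = not_my_real_kaggle_name\n")
    value = read_config_value('kaggle', 'kaggle_username', cfg)

print(value)  # not_my_real_kaggle_name
```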

Setup utilities



CurrentMachine

 CurrentMachine (*args, **kwargs)

Callable class representing information on the current machine. When called, an instance returns a dict of all its attributes:

  • os
  • home: path to the home directory
  • is_local, is_colab, is_kaggle
  • p2config: path to the config file
  • package_root: path to the root directory of the package
machine = CurrentMachine()
machine()
{'os': 'linux',
 'home': Path('/home/vtec'),
 'is_local': True,
 'is_colab': False,
 'is_kaggle': False,
 'p2config': Path('/home/vtec/.ecutilities/ecutilities.cfg'),
 'package_root': Path('/home/vtec/projects/ec-packages/ecutilities')}
machine.is_local, machine.is_colab, machine.is_kaggle
(False, False, False)

This machine is not registered as a local machine, but it is also not running in the cloud. We should register it as a local machine with register_as_local.



CurrentMachine.register_as_local

 CurrentMachine.register_as_local ()

Update the configuration file to register the machine as a local machine

Use this method to register the current machine as a local machine. It only needs to be called once per machine. Do not use it on cloud VMs.

machine.register_as_local()
machine.is_local, machine.is_colab, machine.is_kaggle
(True, False, False)

Technical Note:

The configuration file is located at a standard location, which varies depending on the OS:

  • Windows:
    • home is C:\Users\username
    • application data in C:\Users\username\AppData\Local\... or C:\Users\username\AppData\Roaming\... (see StackExchange)
    • applications can also store their data in a dedicated directory under C:\Users\username, like C:\Users\username\.conda\...
  • Linux:
    • home is /home/username
    • application data in a file or dedicated directory under /home/username/, such as:
      • file in home directory, e.g. .gitconfig
      • file in an application dedicated directory, e.g. /home/username/.conda/...

ecutilities places the configuration file in a dedicated directory in the home directory:

  • C:\Users\username\.ecutilities\ecutilities.cfg
  • /home/username/.ecutilities/ecutilities.cfg

Retrieve the OS:

sys.platform
win32           with Windows
linux           with Linux
darwin          with macOS

Accessing the correct path depending on the OS:

Path.home().absolute()
WindowsPath('C:/Users/username') with Windows
Path('/home/username')           with Linux
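
The two lookups above can be combined to build the configuration file location in an OS-independent way. This is a sketch under assumptions: the helper names describe_os and default_config_path are hypothetical, and only the .ecutilities/ecutilities.cfg location comes from the note above.

```python
import sys
from pathlib import Path


def describe_os(platform: str = sys.platform) -> str:
    """Map a sys.platform value to a short OS name (hypothetical helper)."""
    if platform.startswith('win'):
        return 'windows'
    if platform.startswith('linux'):
        return 'linux'
    if platform == 'darwin':
        return 'macos'
    return platform  # unknown platforms are returned unchanged


def default_config_path(home=None) -> Path:
    """Build <home>/.ecutilities/ecutilities.cfg, the location described above."""
    home = Path.home() if home is None else Path(home)
    return home / '.ecutilities' / 'ecutilities.cfg'


print(describe_os())
print(default_config_path())
```

Because Path.home() already resolves the home directory for the current OS, the same code yields the Windows and the Linux locations without any branching on sys.platform.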


ProjectFileSystem

 ProjectFileSystem (*args, **kwargs)

Class representing the project file system and its key subfolders (data, nbs, src)

Sets the paths to key directories, depending on whether the code is running locally or in the cloud. Gives access to the paths of these key folders and to information about the environment.

pfs = ProjectFileSystem()
pfs()
{'os': 'linux',
 'home': Path('/home/vtec'),
 'is_local': True,
 'is_colab': False,
 'is_kaggle': False,
 'p2config': Path('/home/vtec/.ecutilities/ecutilities.cfg'),
 'package_root': Path('/home/vtec/projects/ec-packages/ecutilities')}


ProjectFileSystem.create_project_file_system

 ProjectFileSystem.create_project_file_system (p2project_root,
                                               overwrite=False)

Create a standard project file system with the following structure:

    project_root
        |--- data   all data files
        |--- nbs    all notebooks for work and experiments
        |--- src    all scripts and code
Name            Type  Default  Details
p2project_root                 path to project root, where all subfolders will be located
overwrite       bool  False    overwrite current folders if they exist when True (not implemented yet)
pfs.create_project_file_system(Path('/home/vtec/projects/ec-packages/ecutilities'))
/home/vtec/projects/ec-packages/ecutilities/data
/home/vtec/projects/ec-packages/ecutilities/nbs
/home/vtec/projects/ec-packages/ecutilities/src
Created project file system in /home/vtec/projects/ec-packages/ecutilities
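
Creating this structure amounts to a few mkdir calls. The sketch below is an illustration, not the method's actual code: create_project_dirs is a hypothetical name, and using exist_ok=True to leave existing folders untouched is an assumption consistent with overwrite not being implemented yet.

```python
import tempfile
from pathlib import Path


def create_project_dirs(p2project_root, subfolders=('data', 'nbs', 'src')):
    """Hypothetical sketch: create the standard subfolders under the project root."""
    root = Path(p2project_root)
    created = []
    for name in subfolders:
        folder = root / name
        # exist_ok=True leaves existing folders and their content untouched
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Try it in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project_dirs(tmp):
        print(folder.name, folder.is_dir())
```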

File structure exploration



files_in_tree

 files_in_tree (path:str|pathlib.Path, pattern:str|None=None)

List files in a directory and its subdirectories, and print the tree starting from the parent directory

Name     Type        Default  Details
path     str | Path           path to the directory to scan
pattern  str | None  None     pattern (glob style) to match in file name to filter the content
p2dir = Path('').resolve()
print(p2dir, '\n')

files = files_in_tree(p2dir)
print(f"List of {len(files)} files when unfiltered")
/home/vtec/projects/ec-packages/ecutilities/nbs-dev 

ecutilities
  |--nbs-dev
  |    |--0_02_plotting.ipynb (0)
  |    |--2_01_image_utils.ipynb (1)
  |    |--1_01_eda_stats_utils.ipynb (2)
  |    |--0_01_ipython.ipynb (3)
  |    |--0_00_core.ipynb (4)
  |    |--.last_checked (5)
  |    |--sidebar.yml (6)
  |    |--1_02_ml.ipynb (7)
  |    |--index.ipynb (8)
  |    |--nbdev.yml (9)
  |    |--9_01_dev_utils.ipynb (10)
  |    |--styles.css (11)
  |    |--_quarto.yml (12)
  |    |--.ipynb_checkpoints
  |    |    |--0_02_plotting-checkpoint.ipynb (13)
  |    |    |--9_01_dev_utils-checkpoint.ipynb (14)
  |    |    |--0_01_ipython-checkpoint.ipynb (15)
  |    |    |--0_00_core-checkpoint.ipynb (16)
  |    |    |--1_01_eda_stats_utils-checkpoint.ipynb (17)
  |    |    |--index-checkpoint.ipynb (18)
  |    |    |--2_01_image_utils-checkpoint.ipynb (19)
  |    |    |--1_02_ml-checkpoint.ipynb (20)
List of 21 files when unfiltered

Use pattern to filter the paths to return (using glob syntax)

files = files_in_tree(p2dir, pattern='ipynb')
print(f"List of {len(files)} files when filtered")
ecutilities
  |--nbs-dev
  |    |--0_02_plotting.ipynb (0)
  |    |--2_01_image_utils.ipynb (1)
  |    |--1_01_eda_stats_utils.ipynb (2)
  |    |--0_01_ipython.ipynb (3)
  |    |--0_00_core.ipynb (4)
  |    |--1_02_ml.ipynb (5)
  |    |--index.ipynb (6)
  |    |--9_01_dev_utils.ipynb (7)
  |    |--.ipynb_checkpoints
  |    |    |--0_02_plotting-checkpoint.ipynb (8)
  |    |    |--9_01_dev_utils-checkpoint.ipynb (9)
  |    |    |--0_01_ipython-checkpoint.ipynb (10)
  |    |    |--0_00_core-checkpoint.ipynb (11)
  |    |    |--1_01_eda_stats_utils-checkpoint.ipynb (12)
  |    |    |--index-checkpoint.ipynb (13)
  |    |    |--2_01_image_utils-checkpoint.ipynb (14)
  |    |    |--1_02_ml-checkpoint.ipynb (15)
List of 16 files when filtered
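
The filtering shown above can be approximated with Path.rglob. This sketch leaves out the tree printing, and list_files is a hypothetical name; wrapping the pattern in wildcards is an assumption inferred from the example, where pattern='ipynb' keeps every file whose name contains ipynb.

```python
import tempfile
from pathlib import Path


def list_files(path, pattern=None):
    """Hypothetical sketch: recursively list files, optionally filtered by name."""
    # Assumption: the pattern is wrapped in wildcards, so 'ipynb' matches any
    # file whose name contains 'ipynb'
    glob_pattern = '*' if pattern is None else f'*{pattern}*'
    return sorted(p for p in Path(path).rglob(glob_pattern) if p.is_file())


# Build a small throwaway tree to scan
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for name in ('index.ipynb', 'styles.css', 'sub/draft.ipynb'):
        (root / name).write_text('')
    print(len(list_files(root)))           # 3
    print(len(list_files(root, 'ipynb')))  # 2
```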


path_to_parent_dir

 path_to_parent_dir (pattern:str, path:str|pathlib.Path|None=None)

Climb directory tree up to a directory starting with pattern, and return its path.

  • When no directory is found in the tree starting with pattern, return the current directory path.

  • It is possible to pass a path as starting path to climb from.

Name     Type               Default  Details
pattern  str                         pattern to identify the parent directory
path     str | Path | None  None     optional path from which to start looking for the parent directory
Returns  Path                        path of the parent directory
p2dir = path_to_parent_dir('nbs')
assert 'nbs-dev' in p2dir.parts and 'nbs' not in p2dir.parts
p2dir
Path('/home/vtec/projects/ec-packages/ecutilities/nbs-dev')
p2dir = path_to_parent_dir('nbs', Path('../nbs/sandbox.ipynb').resolve())
assert 'nbs' in p2dir.parts and 'nbs-dev' not in p2dir.parts
p2dir
Path('/home/vtec/projects/ec-packages/ecutilities/nbs')
p2dir = path_to_parent_dir('not-in-tree').resolve()
assert p2dir == Path().absolute()
p2project_root = path_to_parent_dir('ecutilities')
assert 'ecutilities' in p2project_root.parts and 'nbs' not in p2project_root.parts
p2project_root
Path('/home/vtec/projects/ec-packages/ecutilities')
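
The climbing logic can be sketched with Path.parents. This is an illustration under assumptions, not the package's code: climb_to_parent is a hypothetical name, and checking the starting path itself before its parents is an assumption consistent with the first example above.

```python
from pathlib import Path


def climb_to_parent(pattern: str, path=None) -> Path:
    """Hypothetical sketch: first directory at or above path whose name starts with pattern."""
    start = Path.cwd() if path is None else Path(path).absolute()
    # Check the starting path first, then each of its ancestors, nearest first
    for candidate in (start, *start.parents):
        if candidate.name.startswith(pattern):
            return candidate
    return Path.cwd()  # fallback described above: the current directory


found = climb_to_parent('nbs', '/home/username/projects/demo/nbs-dev/drafts')
print(found.name)  # nbs-dev
```

Because the match is a prefix test, 'nbs' stops at nbs-dev as well as nbs, which is why the examples above assert on the directory names explicitly.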