core

Set of functions and classes used across this package and usable by other packages

List of fastcore objects that are loaded and can be used once eccore is imported

from basics:

from foundation:

from Utility functions:

from Meta:

Data structures

Classes to handle data structures more easily

import json

with open('data-dev/jsondict-test.json', 'r') as fp:
    d = json.load(fp)
d

d.items()
dict_items([('b', 2), ('c', 3), ('d', 4)])


JsonDict

 JsonDict (p2json:str|pathlib.Path, dictionary:Optional[dict]=None)

*Dictionary whose current value is mirrored in a JSON file and which can be initialized from a JSON file

JsonDict requires a path to a JSON file at creation. An optional dict can be passed as an argument.

Behavior at creation:

  • JsonDict(p2json, dict) creates a JsonDict with the key-value pairs from dict, mirrored in p2json
  • JsonDict(p2json) creates a JsonDict with an empty dictionary and loads the JSON content if the file exists

Once created, JsonDict instances behave exactly like a dictionary*

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| p2json | str \| pathlib.Path |  | path to the JSON file to mirror with the dictionary |
| dictionary | Optional[dict] | None | optional dictionary to initialize the JsonDict |
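The source of JsonDict is not shown on this page. As a minimal sketch of how such mirroring can work, assuming a dict subclass that rewrites the whole JSON file after every item assignment or deletion (MirroredDict and _save are hypothetical names, not eccore's API):

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Union


class MirroredDict(dict):
    """Hypothetical sketch of a dict mirrored to a JSON file (not eccore's JsonDict)."""

    def __init__(self, p2json: Union[str, Path], dictionary: Optional[dict] = None):
        super().__init__()
        self.p2json = Path(p2json)
        if dictionary is not None:
            # Start from the given dict and write it to the file right away
            super().update(dictionary)
            self._save()
        elif self.p2json.is_file():
            # No dict given: load the existing file content
            with open(self.p2json, 'r') as fp:
                super().update(json.load(fp))

    def _save(self) -> None:
        # Rewrite the whole file with the current content
        with open(self.p2json, 'w') as fp:
            json.dump(dict(self), fp, indent=4)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._save()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._save()


with tempfile.TemporaryDirectory() as tmp:
    p2json = Path(tmp) / 'mirror-test.json'
    md = MirroredDict(p2json, {'a': 1, 'b': 2})
    md['c'] = 3
    del md['a']
    on_disk = json.loads(p2json.read_text())  # {'b': 2, 'c': 3}
    reloaded = dict(MirroredDict(p2json))     # same content, read back from the file
```

In this sketch only item assignment and deletion trigger a save: dict methods such as update(), pop() or clear() do not call an overridden __setitem__ or __delitem__, so a complete implementation would need to override them as well.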

Create a new dictionary mirrored to a JSON file:

d = {'a': 1, 'b': 2, 'c': 3}
p2json = Path('data-dev/jsondict-test.json')
jsond = JsonDict(p2json, d)
jsond
{'a': 1, 'b': 2, 'c': 3}
dict mirrored in /home/vtec/projects/ec-packages/eccore/nbs-dev/data-dev/jsondict-test.json

Once created, the JsonDict instance behaves exactly like a dictionary, with the added benefit that any change to the dictionary is automatically saved to the JSON file.

jsond['a'], jsond['b'], jsond['c']
(1, 2, 3)
for k, v in jsond.items():
    print(f"key: {k}; value: {v}")
key: a; value: 1
key: b; value: 2
key: c; value: 3

Adding or removing a value works the same way as for a normal dictionary, but the JSON file is updated automatically.

jsond['d'] = 4
jsond
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict mirrored in /home/vtec/projects/ec-packages/eccore/nbs-dev/data-dev/jsondict-test.json
with open(p2json, 'r') as fp:
    print(fp.read())
{
    "a": 1,
    "b": 2,
    "c": 3,
    "d": 4
}
del jsond['a']
jsond
{'b': 2, 'c': 3, 'd': 4}
dict mirrored in /home/vtec/projects/ec-packages/eccore/nbs-dev/data-dev/jsondict-test.json
with open(p2json, 'r') as fp:
    print(fp.read())
{
    "b": 2,
    "c": 3,
    "d": 4
}

Validation functions



is_type

 is_type (obj:Any, obj_type:type, raise_error:bool=False)

Validate that obj is of type obj_type. When raise_error is True, raise an error if it is not.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| obj | Any |  | object whose type to validate |
| obj_type | type |  | expected type for obj |
| raise_error | bool | False | when True, raise a ValueError if obj is not of the right type |
| **Returns** | **bool** |  | **True when obj is of the right type, False otherwise** |

is_type(obj='this is a string', obj_type=str)
True
is_type(obj=np.ones(shape=(2,2)), obj_type=np.ndarray)
True
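A minimal sketch of the documented behaviour, assuming isinstance is used for the check (so subclasses are accepted, e.g. True passes as an int); check_type is a hypothetical name, not eccore's implementation:

```python
from typing import Any


def check_type(obj: Any, obj_type: type, raise_error: bool = False) -> bool:
    """Hypothetical sketch of the behaviour documented for is_type."""
    # isinstance also accepts subclasses, e.g. bool values pass as int
    is_valid = isinstance(obj, obj_type)
    if not is_valid and raise_error:
        raise ValueError(f"{obj!r} is of type {type(obj).__name__}, expected {obj_type.__name__}")
    return is_valid


print(check_type('this is a string', str))  # True
print(check_type(42, str))                  # False

try:
    check_type(42, str, raise_error=True)
except ValueError as err:
    error_message = str(err)
```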

Path validation

Functions to ensure paths are properly formatted and point to a real file or directory.



validate_path

 validate_path (path:str|pathlib.Path, path_type:str='file',
                raise_error:bool=False)

Validate that path is a Path or str and points to a real file or directory

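|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| path | str \| pathlib.Path |  | path to validate |
| path_type | str | file | type of the target path: 'file', 'dir' or 'any' |
| raise_error | bool | False | when True, raise a ValueError if path is not a valid path of type path_type |
| **Returns** | **bool** |  | **True when path is a valid path, False otherwise** |
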
path_file = Path('data-dev/jsondict-test.json')
validate_path(path_file)
True
validate_path(path_file, path_type='any')
True
path_dir = Path('../data')
validate_path(path_dir, path_type='dir')
True
validate_path(path_dir, path_type='any')
True
path_error = Path('../data/img/IIIMG_001_512px.jpg')
validate_path(path_error)
False
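The checks described for validate_path map naturally onto pathlib's is_file, is_dir and exists. A hedged sketch, with check_path as a hypothetical stand-in for validate_path and the error message wording assumed:

```python
import tempfile
from pathlib import Path
from typing import Any


def check_path(path: Any, path_type: str = 'file', raise_error: bool = False) -> bool:
    """Hypothetical sketch of the checks described for validate_path."""
    if path_type not in ('file', 'dir', 'any'):
        raise ValueError(f"path_type must be 'file', 'dir' or 'any', not {path_type!r}")
    if isinstance(path, (str, Path)):
        p = Path(path)
        # Pick the pathlib check matching the expected target type
        checks = {'file': p.is_file, 'dir': p.is_dir, 'any': p.exists}
        is_valid = checks[path_type]()
    else:
        is_valid = False  # neither a str nor a Path
    if not is_valid and raise_error:
        raise ValueError(f"{path!r} is not a valid {path_type} path")
    return is_valid


with tempfile.TemporaryDirectory() as tmp:
    p2file = Path(tmp) / 'example.txt'
    p2file.write_text('hello')
    results = {
        'file as file': check_path(p2file),
        'dir as file': check_path(tmp),
        'dir as dir': check_path(tmp, path_type='dir'),
        'file as any': check_path(p2file, path_type='any'),
        'missing file': check_path(Path(tmp) / 'missing.jpg'),
        'not a path': check_path(42),
    }
print(results)
```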


safe_path

 safe_path (path:str|pathlib.Path)

*Return a Path object when given a valid path as a str or a Path, raise an error otherwise

Note: This function does not check whether the file or directory exists.*

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| path | str \| pathlib.Path | path to validate |
| **Returns** | **Path** | **validated path returned as a pathlib.Path** |
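A minimal sketch of such a conversion, assuming that only non-empty str values and Path objects are accepted; to_safe_path is a hypothetical name. As the note says, it does not check that the target exists:

```python
from pathlib import Path
from typing import Any


def to_safe_path(path: Any) -> Path:
    """Hypothetical sketch: accept a str or a Path, reject anything else.

    Existence on disk is deliberately NOT checked.
    """
    if isinstance(path, Path):
        return path
    if isinstance(path, str) and path.strip():
        return Path(path)
    raise ValueError(f"{path!r} is not a valid path (expected a non-empty str or a Path)")


p = to_safe_path('data-dev/does-not-need-to-exist.json')
try:
    to_safe_path(3.14)
except ValueError as err:
    error_message = str(err)
```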

Access key files and directories



get_config_value

 get_config_value (section:str, key:str,
                   path_to_config_file:Union[pathlib.Path,str,NoneType]=None)

*Returns the value corresponding to the key-value pair in the configuration file (configparser format)

When no path_to_config_file is provided, the function will try to find the file in: the system’s home, the parent directory of the current directory, and the Google drive directory mounted to the Colab environment.*

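|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| section | str |  | section in the configparser cfg file |
| key | str |  | key in the selected section |
| path_to_config_file | Union[pathlib.Path, str, NoneType] | None | path to the cfg file |
| **Returns** | **Any** |  | **the value corresponding to section>key>value** |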

By default (path_to_config_file is None), the configuration file is assumed to be located in one of:

  • the local package config directory (home/.eccore/)
  • the working directory
  • the folder above the working directory
  • the private-accross-accounts directory on Google Drive.

File names are expected to be either config-api-keys.cfg or config-sample.cfg.

Otherwise, a path to the file (Path or str) must be provided.
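The search for a default config file can be sketched as a loop over candidate directories and file names. This is an assumption about how it could work, not eccore's code: find_config_file and the priority order (directories first, then config-api-keys.cfg before config-sample.cfg) are hypothetical, and the Google Drive location is left out:

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence, Union

# File names listed above, tried in this order (the order is an assumption)
CFG_NAMES = ('config-api-keys.cfg', 'config-sample.cfg')


def find_config_file(search_dirs: Sequence[Union[str, Path]],
                     names: Sequence[str] = CFG_NAMES) -> Optional[Path]:
    """Hypothetical sketch: return the first existing config file, or None."""
    for directory in search_dirs:
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None


# Real use would pass e.g. [Path.home() / '.eccore', Path.cwd(), Path.cwd().parent]
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    (Path(second) / 'config-sample.cfg').write_text('[github]\ngit_name = someone\n')
    found_name = find_config_file([first, second]).name
    missing = find_config_file([first])
    (Path(second) / 'config-api-keys.cfg').write_text('[github]\ngit_name = someone\n')
    priority_name = find_config_file([first, second]).name
```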

The configuration file is expected to follow the format used by the standard library module configparser (see the configparser documentation):

    [DEFAULT]
    key = value

    [section_name]
    key = value

    [another_section]
    key = value
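Reading a value from such a file only takes the standard library. A short sketch, where read_cfg_value is a hypothetical helper and the [DEFAULT] entry is made up for illustration:

```python
import configparser
import tempfile
from pathlib import Path

CFG_TEXT = """\
[DEFAULT]
timeout = 30

[github]
git_name = not_my_real_github_name

[kaggle]
kaggle_username = not_my_real_kaggle_name
"""


def read_cfg_value(p2cfg: Path, section: str, key: str) -> str:
    """Hypothetical helper: read one value from a configparser-style file."""
    config = configparser.ConfigParser()
    config.read(p2cfg)            # note: read() silently ignores missing files
    return config[section][key]   # KeyError if the section or key is missing


with tempfile.TemporaryDirectory() as tmp:
    p2cfg = Path(tmp) / 'config-sample.cfg'
    p2cfg.write_text(CFG_TEXT)
    git_name = read_cfg_value(p2cfg, 'github', 'git_name')
    # Keys from [DEFAULT] are visible in every other section
    timeout = read_cfg_value(p2cfg, 'kaggle', 'timeout')
```

configparser returns values as strings (timeout is '30', not 30). With the default strict=True, a section name repeated within one file raises configparser.DuplicateSectionError.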
get_config_value(section="github", key="git_name")
found /home/vtec/projects/ec-packages/eccore/config-api-keys.cfg
Using config file at /home/vtec/projects/ec-packages/eccore/config-api-keys.cfg
'Etienne Charlier'
path2cfg = Path('../config-sample.cfg').resolve()
assert path2cfg.is_file(), f"{path2cfg} is not a file"
print(path2cfg.absolute())

with open(path2cfg, 'r') as fp:
    print(fp.read())
/home/vtec/projects/ec-packages/eccore/config-sample.cfg
[azure]
azure-api-key= dummy_api_key_for_azure

[github]
git_name = not_my_real_github_name
git_email = not_my_real_git_email
github_username = not_my_real_git_username

[kaggle]
kaggle_username = not_my_real_kaggle_name
kaggle_key = dummy_api_key_for_kaggle

[wandb]
api_key = dummy_api_key_for_wandb
value = get_config_value(section='azure', key='azure-api-key', path_to_config_file=path2cfg)
assert value == 'dummy_api_key_for_azure'
Using config file at /home/vtec/projects/ec-packages/eccore/config-sample.cfg
value = get_config_value(section='kaggle', key='kaggle_username', path_to_config_file=path2cfg)
assert value == 'not_my_real_kaggle_name'
Using config file at /home/vtec/projects/ec-packages/eccore/config-sample.cfg
value = get_config_value(section='wandb', key='api_key', path_to_config_file=path2cfg)
assert value == 'dummy_api_key_for_wandb'
Using config file at /home/vtec/projects/ec-packages/eccore/config-sample.cfg
value = get_config_value(section='dummy', key='dummy-user-id')
assert value.startswith('dummy-userID-from')
found /home/vtec/projects/ec-packages/eccore/config-api-keys.cfg
Using config file at /home/vtec/projects/ec-packages/eccore/config-api-keys.cfg

Setup utilities



CurrentMachine

 CurrentMachine (*args, **kwargs)

*Callable class representing the current machine. When called, the instance returns a dict with all its attributes:

  • os: the operating system running on the machine
  • home: path to home on the machine
  • is_local, is_colab, is_kaggle: whether the machine is running locally, on Google Colab or on Kaggle
  • p2config: path to the config file
  • package_root: path to the package root directory

CurrentMachine is a singleton class.*
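One common way to make a Python class a singleton is to override __new__ so that every call returns the same instance. A hedged sketch, where MachineInfo and the Colab/Kaggle detection heuristics (the google.colab module being loaded, the KAGGLE_KERNEL_RUN_TYPE environment variable) are assumptions rather than what CurrentMachine actually does:

```python
import os
import sys
from pathlib import Path


class MachineInfo:
    """Hypothetical singleton sketch; not eccore's CurrentMachine."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the instance once, then always hand back the same object
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        # __init__ runs on every MachineInfo() call, so only set attributes once
        if self._initialized:
            return
        self.os = sys.platform
        self.home = Path.home()
        # Assumed detection heuristics, not necessarily the ones eccore uses
        self.is_colab = 'google.colab' in sys.modules
        self.is_kaggle = 'KAGGLE_KERNEL_RUN_TYPE' in os.environ
        # A real implementation would read this flag from a config file
        self.is_local = False
        self._initialized = True

    def __call__(self) -> dict:
        # Return the public attributes as a dict
        return {k: v for k, v in vars(self).items() if not k.startswith('_')}


m1, m2 = MachineInfo(), MachineInfo()
print(m1 is m2)  # True
print(m1())
```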

machine = CurrentMachine()
machine()
{'os': 'linux',
 'home': Path('/home/vtec'),
 'is_local': True,
 'is_colab': False,
 'is_kaggle': False,
 'p2config': Path('/home/vtec/.ecutilities/ecutilities.cfg'),
 'package_root': Path('/home/vtec/projects/ec-packages/eccore')}
machine.is_local, machine.is_colab, machine.is_kaggle
(False, False, False)

This machine is not registered as a local machine, but it is also not running in the cloud. It should be registered as a local machine with register_as_local.



CurrentMachine.register_as_local

 CurrentMachine.register_as_local ()

Update the configuration file to register the machine as local machine

Use this method to register the current machine as a local machine. It only needs to be called once per machine. Do not use it on cloud VMs.

machine.register_as_local()
machine.is_local, machine.is_colab, machine.is_kaggle
(True, False, False)
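How the registration is stored is not described here. As an illustration only, assuming the machine is identified by its host name and recorded in a hypothetical [machines] section of a configparser file (register_host_as_local and host_is_local are hypothetical names):

```python
import configparser
import socket
import tempfile
from pathlib import Path


def register_host_as_local(p2config: Path, section: str = 'machines') -> None:
    """Hypothetical sketch: record the current host name as a local machine."""
    config = configparser.ConfigParser()
    config.read(p2config)  # a missing file simply gives an empty config
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, socket.gethostname(), 'local')
    p2config.parent.mkdir(parents=True, exist_ok=True)
    with open(p2config, 'w') as fp:
        config.write(fp)


def host_is_local(p2config: Path, section: str = 'machines') -> bool:
    """Hypothetical check: True when the current host name is registered as local."""
    config = configparser.ConfigParser()
    config.read(p2config)
    return config.get(section, socket.gethostname(), fallback='') == 'local'


with tempfile.TemporaryDirectory() as tmp:
    p2config = Path(tmp) / '.ecutilities' / 'ecutilities.cfg'
    before = host_is_local(p2config)
    register_host_as_local(p2config)
    after = host_is_local(p2config)
print(before, after)  # False True
```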

Technical Note:

The configuration file is located at a standard location, which varies depending on the OS:

  • Windows:
    • home is C:\Users\username
    • application data in C:\Users\username\AppData\Local\... or C:\Users\username\AppData\Roaming\... (see StackExchange)
    • applications can also store data in a dedicated directory under C:\Users\username, e.g. C:\Users\username\.conda\...
  • Linux:
    • home is /home/username
    • application data in a file or a dedicated directory under /home/username/, such as:
      • file in home directory, e.g. .gitconfig
      • file in an application dedicated directory, e.g. /home/username/.conda/...

ecutilities places the configuration file in a dedicated directory in the home directory:

  • C:\Users\username\.ecutilities\ecutilities.cfg
  • /home/username/.ecutilities/ecutilities.cfg

Retrieve the OS:

sys.platform
win32           with Windows
linux           with Linux
darwin          with macOS

Accessing the correct path depending on the OS:

Path().home().absolute()
WindowsPath('C:/Users/username') with Windows
Path('/home/username')           with Linux
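pathlib takes care of the separator differences listed above, so a single expression can build the configuration path on every OS. A small sketch, assuming the ~/.ecutilities/ecutilities.cfg layout; config_file_path and os_name are hypothetical helpers:

```python
import sys
from pathlib import Path


def config_file_path(app_name: str = 'ecutilities') -> Path:
    """Config file in a hidden directory of the user's home, on every OS."""
    # pathlib uses the right separator: '\\' on Windows, '/' on Linux and macOS
    return Path.home() / f'.{app_name}' / f'{app_name}.cfg'


def os_name(platform_id: str = sys.platform) -> str:
    """Map sys.platform values to a readable OS name."""
    if platform_id.startswith('win'):
        return 'Windows'
    if platform_id.startswith('linux'):
        return 'Linux'
    if platform_id == 'darwin':
        return 'macOS'
    return 'other'


print(os_name(), config_file_path())
```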


ProjectFileSystem

 ProjectFileSystem (*args, **kwargs)

*Class representing the project file system and key subfolders (data, nbs, src)

Set paths to key directories according to whether the code is running locally or in the cloud. Give access to the paths of these key folders and to information about the environment.*

pfs = ProjectFileSystem()
pfs()
{'os': 'linux',
 'home': Path('/home/vtec'),
 'is_local': True,
 'is_colab': False,
 'is_kaggle': False,
 'p2config': Path('/home/vtec/.ecutilities/ecutilities.cfg'),
 'package_root': Path('/home/vtec/projects/ec-packages/eccore')}


ProjectFileSystem.create_project_file_system

 ProjectFileSystem.create_project_file_system (p2project_root,
                                               overwrite=False)

*Create a standard project file system with the following structure:*

    project_root
        |--- data   all data files
        |--- nbs    all notebooks for work and experiments
        |--- src    all scripts and code

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| p2project_root |  |  | path to project root, where all subfolders will be located |
| overwrite | bool | False | overwrite current folders if they exist when True (not implemented yet) |
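Creating such a structure amounts to one Path.mkdir call per subfolder. A minimal sketch, assuming existing folders are simply left as they are (consistent with overwrite not being implemented); create_project_dirs is a hypothetical name:

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union

SUBFOLDERS = ('data', 'nbs', 'src')


def create_project_dirs(p2project_root: Union[str, Path],
                        subfolders: Sequence[str] = SUBFOLDERS) -> List[Path]:
    """Hypothetical sketch: create the standard subfolders under the project root."""
    root = Path(p2project_root)
    created = []
    for name in subfolders:
        p2sub = root / name
        # exist_ok=True leaves existing folders (and their content) untouched
        p2sub.mkdir(parents=True, exist_ok=True)
        created.append(p2sub)
    return created


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / 'my-project'
    dirs = create_project_dirs(project_root)
    create_project_dirs(project_root)  # a second call does not raise
    all_exist = all(d.is_dir() for d in dirs)
    names = [d.name for d in dirs]
```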


pfs.create_project_file_system(Path('/home/vtec/projects/ec-packages/eccore'))
/home/vtec/projects/ec-packages/eccore/data
/home/vtec/projects/ec-packages/eccore/nbs
/home/vtec/projects/ec-packages/eccore/src
Created project file system in /home/vtec/projects/ec-packages/eccore

Logging setup and functions



setup_logging

 setup_logging (logfile:pathlib.Path|None=None)

Set up logging to the console, and also to a file if logfile is not None
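Sending INFO messages to a file only and WARNING messages to both file and console is typically done by giving each handler its own level. A hedged sketch, not eccore's setup_logging: it uses a named logger instead of the root logger so it can run in isolation, and the format string is chosen to resemble the log lines shown on this page; configure_logging is a hypothetical name:

```python
import io
import logging
import tempfile
from pathlib import Path
from typing import Optional, TextIO


def configure_logging(logfile: Optional[Path] = None, name: str = 'sketch',
                      stream: Optional[TextIO] = None) -> logging.Logger:
    """Hypothetical sketch: INFO and above to the file, WARNING and above to the console."""
    fmt = logging.Formatter('%(asctime)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also send records to the root logger
    for handler in list(logger.handlers):  # avoid duplicate handlers when re-run
        logger.removeHandler(handler)
        handler.close()
    console = logging.StreamHandler(stream)  # stream=None means sys.stderr
    console.setLevel(logging.WARNING)
    console.setFormatter(fmt)
    logger.addHandler(console)
    if logfile is not None:
        file_handler = logging.FileHandler(logfile)
        file_handler.setLevel(logging.INFO)
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    return logger


console_buffer = io.StringIO()
with tempfile.TemporaryDirectory() as tmp:
    p2log = Path(tmp) / 'dev.log'
    log = configure_logging(p2log, stream=console_buffer)
    log.info('info message: file only')
    log.warning('warning message: file and console')
    for handler in list(log.handlers):  # release the log file before cleanup
        log.removeHandler(handler)
        handler.close()
    log_text = p2log.read_text()
console_text = console_buffer.getvalue()
```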



logthis

 logthis (*args)

Logs all elements passed as arguments



monitor_fn

 monitor_fn (fn)

Logs when the decorated function is entered and when it is exited

After setting up the logging, it is easy to create log entries:

p2log = pfs.package_root / 'nbs-dev/data-dev/dev.log'
setup_logging(p2log)
Logging to console and to /home/vtec/projects/ec-packages/eccore/nbs-dev/data-dev/dev.log.
Logging setup finished
logging.info('Logging manually as info, only shows in logfile')
logging.warning('Logging manually as warning, shows in logfile and console')

logthis('Using log function, as info, only shows in the log file')
2025-04-27 20:45:42: Logging manually as warning, shows in logfile and console

See the log file content:

if p2log.exists():
    print('Log file content:')
    with open(p2log, 'r') as f:
        print(''.join(f.readlines()))
Log file content:
2025-04-27 20:45:42: Logging manually as info, only shows in logfile
2025-04-27 20:45:42: Logging manually as warning, shows in logfile and console
2025-04-27 20:45:42: Using log function, as info, only shows in the log file

Create a function decorated with @monitor_fn to monitor function calls, i.e. to log when the function is entered and exited.

@monitor_fn
def a_function(a,b):
    """Test functions to add two numbers"""
    return a + b

print(f"function output is {a_function(1,2)}")
print()
function output is 3
if p2log.exists():
    print('Log file content:')
    with open(p2log, 'r') as f:
        print(''.join(f.readlines()))

    p2log.unlink()
Log file content:
2025-04-27 20:45:42: Logging manually as info, only shows in logfile
2025-04-27 20:45:42: Logging manually as warning, shows in logfile and console
2025-04-27 20:45:42: Using log function, as info, only shows in the log file
2025-04-27 20:46:59: Entering `a_function`
2025-04-27 20:46:59: Exiting  `a_function`
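A decorator of this kind is usually built with functools.wraps. A minimal sketch, assuming monitor_fn logs at INFO level; monitor, add and the in-memory ListHandler (used only to capture the messages) are hypothetical:

```python
import functools
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list (used here only to show the output)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger('monitor-sketch')
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo messages out of the root logger
collector = ListHandler()
log.addHandler(collector)


def monitor(fn):
    """Hypothetical decorator logging when fn is entered and exited."""
    @functools.wraps(fn)  # keep fn's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        log.info('Entering `%s`', fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            # finally: the exit is logged even when fn raises
            log.info('Exiting  `%s`', fn.__name__)
    return wrapper


@monitor
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(1, 2)
print(result, collector.messages)
```

The try/finally block logs the exit even when the function raises; whether monitor_fn does the same is not stated here.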

File structure exploration



files_in_tree

 files_in_tree (path:str|pathlib.Path, pattern:str|None=None)

List files in a directory and its subdirectories, and print the tree starting from the parent directory

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| path | str \| pathlib.Path |  | path to the directory to scan |
| pattern | str \| None | None | pattern (glob style) to match in the file name to filter the content |

p2dir = Path('').resolve()
print(p2dir, '\n')

files = files_in_tree(p2dir)
print(f"List of {len(files)} files when unfiltered")
/home/vtec/projects/ec-packages/eccore/nbs-dev 

eccore
  |--nbs-dev
  |    |--0_02_plotting.ipynb (0)
  |    |--0_01_ipython.ipynb (1)
  |    |--0_00_core.ipynb (2)
  |    |--.last_checked (3)
  |    |--sidebar.yml (4)
  |    |--index.ipynb (5)
  |    |--nbdev.yml (6)
  |    |--9_01_dev_utils.ipynb (7)
  |    |--styles.css (8)
  |    |--_quarto.yml (9)
  |    |--data-dev
  |    |    |--jsondict-test.json (10)
  |    |    |--ten-blobs-6-cols-clusters.npy (11)
  |    |    |--ten-blobs-6-cols-y.npy (12)
  |    |    |--ten-blobs-6-cols-X.npy (13)
List of 14 files when unfiltered

Use pattern to filter the returned paths (glob syntax):

files = files_in_tree(p2dir, pattern='ipynb')
print(f"List of {len(files)} files when filtered")
eccore
  |--nbs-dev
  |    |--0_02_plotting.ipynb (0)
  |    |--0_01_ipython.ipynb (1)
  |    |--0_00_core.ipynb (2)
  |    |--index.ipynb (3)
  |    |--9_01_dev_utils.ipynb (4)
  |    |--data-dev
List of 5 files when filtered
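The file listing part of files_in_tree can be sketched with Path.rglob and fnmatch. Since pattern='ipynb' matches .ipynb files above, this sketch wraps the pattern in * before matching; that, the sorting, and the name list_files are assumptions, and the tree printing is left out:

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def list_files(path: Union[str, Path], pattern: Optional[str] = None) -> List[Path]:
    """Hypothetical sketch: list files in path and all its subdirectories."""
    files = []
    for p in sorted(Path(path).rglob('*')):
        if not p.is_file():
            continue
        # Wrapping the pattern in '*' lets 'ipynb' match 'index.ipynb' (assumption)
        if pattern is None or fnmatch.fnmatch(p.name, f'*{pattern}*'):
            files.append(p)
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for rel in ('a.ipynb', 'b.txt', 'sub/c.ipynb'):
        (root / rel).write_text('')
    all_names = [p.name for p in list_files(root)]
    nb_names = [p.name for p in list_files(root, pattern='ipynb')]
print(all_names, nb_names)
```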


path_to_parent_dir

 path_to_parent_dir (pattern:str, path:str|pathlib.Path|None=None)

*Climb directory tree up to a directory starting with pattern, and return its path.

  • When no directory is found in the tree starting with pattern, return the current directory path.

  • A path can be passed as the starting point to climb from.*

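|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| pattern | str |  | pattern to identify the parent directory |
| path | str \| pathlib.Path \| None | None | optional path from which to search for the parent directory |
| **Returns** | **Path** |  | **path of the parent directory** |
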
p2dir = path_to_parent_dir('nbs')
assert 'nbs-dev' in p2dir.parts and 'nbs' not in p2dir.parts
p2dir
Path('/home/vtec/projects/ec-packages/eccore/nbs-dev')
# p2dir = path_to_parent_dir('nbs', Path('../nbs/sandbox.ipynb').resolve())
# assert 'nbs' in p2dir.parts and 'nbs-dev' not in p2dir.parts
# p2dir
p2dir = path_to_parent_dir('not-in-tree').resolve()
assert p2dir == Path().absolute()
p2project_root = path_to_parent_dir('eccore')
assert 'eccore' in p2project_root.parts and 'nbs' not in p2project_root.parts
p2project_root
Path('/home/vtec/projects/ec-packages/eccore')
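A minimal sketch of the climbing logic, assuming the starting directory itself is checked first (as in the nbs-dev example above) and using the documented fallback to the current directory; find_parent_dir is a hypothetical name:

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def find_parent_dir(pattern: str, path: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical sketch: closest directory (start included) whose name starts with pattern."""
    start = Path(path).resolve() if path is not None else Path.cwd()
    for candidate in (start, *start.parents):
        if candidate.name.startswith(pattern):
            return candidate
    # Documented fallback: no match in the tree, return the current directory
    return Path.cwd()


with tempfile.TemporaryDirectory() as tmp:
    p2start = Path(tmp) / 'eccore' / 'nbs-dev' / 'data-dev'
    p2start.mkdir(parents=True)
    found_nbs = find_parent_dir('nbs', p2start)
    found_root = find_parent_dir('eccore', p2start)
    expected_nbs = (Path(tmp) / 'eccore' / 'nbs-dev').resolve()
    expected_root = (Path(tmp) / 'eccore').resolve()
    fallback = find_parent_dir('zz-not-in-tree', p2start)
```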