login_nb('01_wandb')
Logging in from notebook: /home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
wandb: Currently logged in as: vtecftyw. Use `wandb login --relogin` to force relogin
Once set up, WandB tracks datasets, models, training runs and evaluation runs across several experiments. The original documentation is here.
Key concepts we use in this package:
- Run (status, losses and other metadata logged during training or evaluation experiments).
- Artifact (datasets, code (incl. notebooks), models, …).
with desired parameters and metadataRun
login_nb (nb_file:str|pathlib.Path)
First step to set up WandB from a notebook. Logs in and logs the passed notebook as the source of code.
Name | Type | Details |
nb_file | str \| pathlib.Path | name of the notebook (str) or path to the notebook (Path) |
To allow WandB to store the code used for the session, the name or path of the notebook must be passed as the nb_file argument.
login_nb raises an error in the following cases:
- If nb_file is not passed, the function raises a TypeError.
- If nb_file is not a string or a Path, the function raises a TypeError.
- If no file nb_file exists, the function raises a ValueError.
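The checks listed above could be expressed roughly as follows. This is only a sketch of the described behaviour, not the package's code: the helper name check_nb_file is hypothetical, and it assumes nb_file is already a full path (the real function also accepts a bare notebook name). Calling it with no argument raises a TypeError simply because Python enforces required positional arguments.

```python
from pathlib import Path
import tempfile

def check_nb_file(nb_file):
    """Hypothetical helper mirroring the checks described for login_nb."""
    # Wrong type -> TypeError
    if not isinstance(nb_file, (str, Path)):
        raise TypeError(f"nb_file must be a str or a Path, not {type(nb_file).__name__}")
    path = Path(nb_file)
    # No such file -> ValueError
    if not path.is_file():
        raise ValueError(f"no notebook file found at {path}")
    return path

# Usage with a throwaway file standing in for a notebook
with tempfile.TemporaryDirectory() as tmp:
    nb = Path(tmp) / "01_wandb.ipynb"
    nb.write_text("{}")
    checked = check_nb_file(str(nb))
    print(checked.name)
```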
WandbRun (entity:str='', project:str='', run_name:str='', job_type:str='', notes:str='', logs_dir:str|pathlib.Path|None=None, testing:bool=False)
Manages a WandB run and all logged actions performed while run is active. Close run with .finish()
Name | Type | Default | Details |
entity | str | | user or organization under which the run will be logged. Default: metagenomics_sh |
project | str | | name of the WandB project under which the run will be logged |
run_name | str | | unique name for the run |
job_type | str | | e.g.: load_datasets, train_exp, … |
notes | str | | any text description or additional information to store with the run |
logs_dir | str \| pathlib.Path \| None | None | default is project_root/wandb-logs if None, or uses the passed Path |
testing | bool | False | (optional) If True, will not create a run on WandB. Use for local testing |
WandbRun lets you define a set of metadata associated with the run, such as entity, project, run_name, job_type and additional notes.
Create a WandbRun instance called wandb_run:
wandb_run = WandbRun(
wandb: Currently logged in as: vtecftyw (metagenomics_sh). Use `wandb login --relogin` to force relogin
WandbRun instantiation raises an error in the following cases:
- If one of entity, project, run_name or job_type is not passed, a ValueError is raised.
- If one of entity, project, run_name, job_type or notes is not a string, a TypeError is raised.
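The two rules above can be sketched as a small validation function. The name check_run_metadata is hypothetical, "not passed" is interpreted as "left at its empty-string default", and the order of the checks (types first, then required values) is an assumption.

```python
def check_run_metadata(entity='', project='', run_name='', job_type='', notes=''):
    """Hypothetical validation mirroring the rules described for WandbRun."""
    fields = {'entity': entity, 'project': project, 'run_name': run_name,
              'job_type': job_type, 'notes': notes}
    # Every metadata field must be a string
    for name, value in fields.items():
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a str, not {type(value).__name__}")
    # notes is optional; the other four must be non-empty
    missing = [n for n in ('entity', 'project', 'run_name', 'job_type') if not fields[n]]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    return fields

# 'demo_project' and 'demo_run' are placeholder values
meta = check_run_metadata(entity='metagenomics_sh', project='demo_project',
                          run_name='demo_run', job_type='load_datasets')
print(meta['run_name'])
```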
WandbRun.upload_dataset (ds_path:str, ds_name:str, ds_type:str, ds_descr:str, ds_metadata:dict, load_type:str='file', wait_completion:bool=False)
Load a dataset from a file as WandB artifact, with associated information and metadata
Name | Type | Default | Details |
ds_path | str | | path to the file or directory to load as dataset artifact |
ds_name | str | | name for the dataset |
ds_type | str | | type of dataset: e.g. raw_data, processed_data, … |
ds_descr | str | | short description of the dataset |
ds_metadata | dict | | keys/values for metadata on the dataset, e.g. nb_samples, … |
load_type | str | file | file to load a single file, dir to load all files in a directory |
wait_completion | bool | False | when True, wait for completion of the logging before returning the artifact |
p2ds_dir = Path('data_dev/ncbi/refsequences/cov/single_1seq_150bp')
assert p2ds_dir.is_dir()
ds_dirname = str(p2ds_dir.absolute())
ds_name = 'cov_reads_single_1_sequence_150bp'
ds_type = 'sim_reads'
ds_descr = 'Simulated single reads of one cov sequence fq and aln files'
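ds_metadata = {
    'nb_sequences': 1,
    'sim_type': 'single',
    'read_length': 150,
    'fold': 100,
}
# Call arguments reconstructed from the variables above (assumption);
# load_type='dir' is inferred from the "Adding directory to artifact" log line
atx_multi_files = wandb_run.upload_dataset(
    ds_path=ds_dirname,
    ds_name=ds_name,
    ds_type=ds_type,
    ds_descr=ds_descr,
    ds_metadata=ds_metadata,
    load_type='dir',
)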
wandb: Adding directory to artifact (/home/vtec/projects/bio/metagentools/nbs-dev/data_dev/ncbi/refsequences/cov/single_1seq_150bp)... Done. 0.2s
Dataset cov_reads_single_1_sequence_150bp is being logged as artifact ...
upload_dataset raises an error in the following cases:
- ds_path is a file and load_type is dir
- ds_path is a directory and load_type is file
- load_type has a value other than file or dir
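A minimal sketch of the consistency check described above. The helper name check_load_type is hypothetical, and the text does not say which exception type is raised, so ValueError is an assumption here.

```python
from pathlib import Path
import tempfile

def check_load_type(ds_path, load_type='file'):
    """Hypothetical check that ds_path and load_type agree."""
    path = Path(ds_path)
    if load_type not in ('file', 'dir'):
        raise ValueError(f"load_type must be 'file' or 'dir', not {load_type!r}")
    if load_type == 'file' and path.is_dir():
        raise ValueError(f"{path} is a directory but load_type is 'file'")
    if load_type == 'dir' and path.is_file():
        raise ValueError(f"{path} is a file but load_type is 'dir'")
    return path

# Usage with a temporary directory holding one placeholder file
with tempfile.TemporaryDirectory() as tmp:
    ds_dir = Path(tmp)
    ds_file = ds_dir / "reads.fq"
    ds_file.write_text("")
    dir_ok = check_load_type(ds_dir, load_type='dir').name == ds_dir.name
    file_ok = check_load_type(ds_file, load_type='file').name == "reads.fq"
    print(dir_ok, file_ok)
```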
entity_projects (entity:str)
Returns all projects under ‘entity’, as an iterable collection
Name | Type | Details |
entity | str | name of the entity from which the projects will be retrieved |
Returns | Projects | Projects iterator |
entity_projects queries WandB to retrieve all the projects and returns them as an iterable object.
Each element in the iterator is a wandb.Project object. Each Project object has the following attributes:
- _attrs: dict of attributes associated with the project (id, name, entityName, createdAt). These attributes can be accessed directly as object.id, …
- entity: name of the entity
- name: project name
- path: as a list [entity, name]
- url: the url to the project workspace (‘https://wandb.ai/entity/project/workspace’)
projs = entity_projects(entity='vtecftyw')
for p in projs:
    print(' name: ', p.name)
    print(' entity ', p.entity)
    print(' path: ', p.path)
    print(' url: ', p.url)
    print(' id: ', p.id)
    print(' created:', p.createdAt)
    print(' _attrs: ', p._attrs)
name: pytorch-intro
entity vtecftyw
path: ['vtecftyw', 'pytorch-intro']
url: https://wandb.ai/vtecftyw/pytorch-intro/workspace
id: UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3
created: 2024-12-12T09:04:33Z
_attrs: {'id': 'UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3', 'name': 'pytorch-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T09:04:33Z', 'isBenchmark': False}
name: basic-intro
entity vtecftyw
path: ['vtecftyw', 'basic-intro']
url: https://wandb.ai/vtecftyw/basic-intro/workspace
id: UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==
created: 2024-12-12T08:54:34Z
_attrs: {'id': 'UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==', 'name': 'basic-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T08:54:34Z', 'isBenchmark': False}
name: tut_artifacts
entity vtecftyw
path: ['vtecftyw', 'tut_artifacts']
url: https://wandb.ai/vtecftyw/tut_artifacts/workspace
id: UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3
created: 2022-09-30T04:39:35Z
_attrs: {'id': 'UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3', 'name': 'tut_artifacts', 'entityName': 'vtecftyw', 'createdAt': '2022-09-30T04:39:35Z', 'isBenchmark': False}
name: metagenomics
entity vtecftyw
path: ['vtecftyw', 'metagenomics']
url: https://wandb.ai/vtecftyw/metagenomics/workspace
id: UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=
created: 2022-09-09T10:39:00Z
_attrs: {'id': 'UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=', 'name': 'metagenomics', 'entityName': 'vtecftyw', 'createdAt': '2022-09-09T10:39:00Z', 'isBenchmark': False}
name: wand-hello-world-fastai
entity vtecftyw
path: ['vtecftyw', 'wand-hello-world-fastai']
url: https://wandb.ai/vtecftyw/wand-hello-world-fastai/workspace
id: UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==
created: 2022-06-14T15:45:17Z
_attrs: {'id': 'UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==', 'name': 'wand-hello-world-fastai', 'entityName': 'vtecftyw', 'createdAt': '2022-06-14T15:45:17Z', 'isBenchmark': False}
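Two details of the output above can be checked with plain Python. The url follows from path, and the id values look like base64 strings that decode to a readable identifier. The second point is an observation from this output only, not documented WandB behaviour, and both helper names are hypothetical.

```python
import base64

def workspace_url(path):
    """Build the workspace URL from a [entity, name] path list."""
    entity, name = path
    return f"https://wandb.ai/{entity}/{name}/workspace"

def decode_project_id(project_id):
    """Base64-decode a project id as printed in the output above."""
    return base64.b64decode(project_id).decode("utf-8")

print(workspace_url(['vtecftyw', 'pytorch-intro']))
# https://wandb.ai/vtecftyw/pytorch-intro/workspace
print(decode_project_id('UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3'))
# Project:v1:pytorch-intro:vtecftyw
```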
get_project (entity:str, project_name:str)
Returns project object defined by entity and project name
Name | Type | Details |
entity | str | name of the entity from which the project will be retrieved |
project_name | str | name of the project to retrieve |
Returns | Project | Project object |
p = get_project('vtecftyw', 'tut_artifacts')
print(p.entity,'\n', p.name,'\n', p.path,'\n', p.url)
<class 'wandb.apis.public.projects.Project'>
['vtecftyw', 'tut_artifacts']
print_entity_project_list (entity)
Print the name and url of all projects in entity
List of projects under entity <vtecftyw>
0. pytorch-intro (url: https://wandb.ai/vtecftyw/pytorch-intro/workspace)
1. basic-intro (url: https://wandb.ai/vtecftyw/basic-intro/workspace)
2. tut_artifacts (url: https://wandb.ai/vtecftyw/tut_artifacts/workspace)
3. metagenomics (url: https://wandb.ai/vtecftyw/metagenomics/workspace)
4. wand-hello-world-fastai (url: https://wandb.ai/vtecftyw/wand-hello-world-fastai/workspace)
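A listing like the one above only needs the name and url attributes of each project. The sketch below uses SimpleNamespace objects as stand-ins for wandb Project objects; the function name format_project_list is hypothetical and the exact spacing of the real output is not reproduced.

```python
from types import SimpleNamespace

def format_project_list(entity, projects):
    """Return the lines of a numbered project listing."""
    lines = [f"List of projects under entity <{entity}>"]
    for i, p in enumerate(projects):
        lines.append(f"{i}. {p.name} (url: {p.url})")
    return lines

# Stand-ins for two of the projects shown above
projects = [
    SimpleNamespace(name='pytorch-intro', url='https://wandb.ai/vtecftyw/pytorch-intro/workspace'),
    SimpleNamespace(name='basic-intro', url='https://wandb.ai/vtecftyw/basic-intro/workspace'),
]
lines = format_project_list('vtecftyw', projects)
print("\n".join(lines))
```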
project_artifacts (entity:str, project_name:str, by_alias:str='latest', by_type:str=None, by_version:str=None)
Returns all artifacts in project, w/ key info, filtered by alias, types and version + list of artifact types
Name | Type | Default | Details |
entity | str | | name of the entity from which to retrieve the artifacts |
project_name | str | | name of the project from which to retrieve the artifacts |
by_alias | str | latest | name of the alias to filter by |
by_type | str | None | name of the artifact type to filter by (optional) |
by_version | str | None | version to filter by (optional) |
Returns | Tuple | | df w/ all artifacts and related info; list of artifact types in the project |
atx_df, atx_type_list = project_artifacts(
['code', 'cov_sequences', 'sim_reads', 'job']
index | atx_name | atx_version | atx_type | atx_aliases | file_count | created | updated | atx_id |
0 | source-coding-with-nbdev-_home_vtec_projects_b... | v0 | code | latest | 1 | 2025-02-01T09:46:40Z | 2025-02-01T09:46:43Z | QXJ0aWZhY3Q6MTQ4MjA0MzE4Ng== |
1 | cov_one_sequence:v0 | v0 | cov_sequences | latest | 1 | 2025-02-01T09:50:13Z | 2025-02-01T09:50:15Z | QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ== |
2 | cov_reads_single_1_sequence_150bp:v0 | v0 | sim_reads | latest | 2 | 2025-02-01T09:52:45Z | 2025-02-01T10:02:12Z | QXJ0aWZhY3Q6MTQ4MjA0OTY3MA== |
The list of artifacts can be filtered, for instance, by artifact type
atx_df, atx_type_list = project_artifacts(
index | atx_name | atx_version | atx_type | atx_aliases | file_count | created | updated | atx_id |
0 | cov_one_sequence:v0 | v0 | cov_sequences | latest | 1 | 2025-02-01T09:50:13Z | 2025-02-01T09:50:15Z | QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ== |
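The filtering by alias, type and version can be pictured on plain records. This is a hypothetical stand-in, not the package's implementation: the real function returns a DataFrame, and here atx_aliases is assumed to be a list of strings and the type list is built only from the records passed in.

```python
def filter_artifacts(records, by_alias='latest', by_type=None, by_version=None):
    """Hypothetical stand-in for the filtering done by project_artifacts."""
    # Artifact types are collected before filtering, in first-seen order
    types = list(dict.fromkeys(r['atx_type'] for r in records))
    selected = [
        r for r in records
        if (by_alias is None or by_alias in r['atx_aliases'])
        and (by_type is None or r['atx_type'] == by_type)
        and (by_version is None or r['atx_version'] == by_version)
    ]
    return selected, types

# Two of the artifacts listed above, reduced to the columns used for filtering
records = [
    {'atx_name': 'cov_one_sequence:v0', 'atx_version': 'v0',
     'atx_type': 'cov_sequences', 'atx_aliases': ['latest']},
    {'atx_name': 'cov_reads_single_1_sequence_150bp:v0', 'atx_version': 'v0',
     'atx_type': 'sim_reads', 'atx_aliases': ['latest']},
]
selected, types = filter_artifacts(records, by_type='cov_sequences')
print([r['atx_name'] for r in selected])
print(types)
```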
run_name_exists (run_name:str, entity:str, project_name:str)
Check whether a run with name run_name already exists in entity/project_name
Name | Type | Details |
run_name | str | name of the run to check |
entity | str | name of the entity from which to retrieve the artifacts |
project_name | str | name of the project from which to retrieve the artifacts |
Returns | bool | True if a run exists with the name run_name, False otherwise |
unique_run_name (name_seed:str)
Create a unique run name by adding a timestamp to the passed seed
Name | Type | Details |
name_seed | str | Run name to which a timestamp will be added |
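One way to build such a name is to append a formatted timestamp to the seed. The sketch below is an assumption about how this could work: the separator, the timestamp format and the optional now parameter (added so the result is reproducible) are not taken from the package.

```python
from datetime import datetime

def unique_run_name(name_seed, now=None):
    """Hypothetical sketch: append a timestamp to the seed."""
    now = now or datetime.now()
    return f"{name_seed}-{now.strftime('%Y%m%d-%H%M%S')}"

fixed = unique_run_name('train_exp', now=datetime(2025, 2, 1, 9, 46, 40))
print(fixed)
# train_exp-20250201-094640
```

With a resolution of one second, two names created within the same second from the same seed would collide, which is one reason a check such as run_name_exists remains useful.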
Resolve problem with nbdev_export() for this notebook
When using nbdev.nbdev_export() in this notebook, the exported code seems to be old code. In particular, the cell importing the dependencies is exported as:
# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2
# Imports all dependencies
import configparser
import numpy as np
import psutil
import os
The hint is in the first line:
# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2
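It shows that the notebook used for exporting is not /nbs-dev/01_wandb.ipynb as it should be. This is because the WandB package creates a local directory /nbs-dev/wandb/ where it keeps local logs and artifacts, including a copy of the logged notebook (here under run-20221123_121523-2z5ycjrb/tmp/code/). nbdev_export() finds this copy as well and exports from it.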
The solution is to move the directory where WandB stores local logs outside nbs-dev, which can be done with the dir argument of wandb.init().
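As a rough illustration, the keyword arguments for wandb.init() can be assembled so that dir points to a folder next to nbs-dev rather than inside it. The helper name wandb_init_kwargs and the example values are hypothetical; the default folder name follows the project_root/wandb-logs convention described for logs_dir.

```python
from pathlib import Path

def wandb_init_kwargs(project_root, entity, project, run_name, job_type,
                      notes='', logs_dir=None):
    """Assemble keyword arguments for wandb.init with logs kept outside nbs-dev."""
    root = Path(project_root)
    # Default log location: <project_root>/wandb-logs, a sibling of nbs-dev
    log_path = Path(logs_dir) if logs_dir is not None else root / 'wandb-logs'
    return dict(entity=entity, project=project, name=run_name,
                job_type=job_type, notes=notes, dir=str(log_path))

kwargs = wandb_init_kwargs(project_root='metagentools', entity='metagenomics_sh',
                           project='demo_project', run_name='demo_run',
                           job_type='load_datasets')
print(kwargs['dir'])
# With the wandb package installed and a valid login, the run would then be
# started with: run = wandb.init(**kwargs)
```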
Illustrating by reproducing the functions from nbdev and a few dependencies
import os
from pathlib import Path
from fastcore.xtras import globtastic
from nbdev.config import get_config

# from nbdev.doclinks.py
# line 105
def nbglob(path=None, skip_folder_re = '^[_.]', file_glob='*.ipynb', skip_file_re='^[_.]', key='nbs_path', as_path=False, **kwargs):
    "Find all files in a directory matching an extension given a config key."
    path = Path(path or get_config()[key])
    recursive = get_config().recursive  # assumed from nbdev's source; `recursive` is otherwise undefined here
    res = globtastic(path, file_glob=file_glob, skip_folder_re=skip_folder_re,
                     skip_file_re=skip_file_re, recursive=recursive, **kwargs)
    return res.map(Path) if as_path else res

# line 131 MODIFIED
def modified_nbdev_export(
    path:str=None, # Path or filename
    **kwargs):
    "Export notebooks in `path` to Python modules"
    if os.environ.get('IN_TEST',0): return
    files = nbglob(path=path, as_path=True, **kwargs).sorted('name')
    # for f in files: nb_export(f)
    for f in files: print(f)  # print instead of exporting, to see which notebooks are picked up
    # add_init(get_config().lib_path)
    # _build_modidx()
Before the change:
After the change
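The effect can also be reproduced without nbdev or WandB. The sketch below builds a throwaway folder layout like the one described in this section and uses pathlib's rglob as a stand-in for the recursive notebook search; it is an illustration of the mechanism, not nbdev's code.

```python
from pathlib import Path
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nbs = Path(tmp) / 'nbs-dev'
    real_nb = nbs / '01_wandb.ipynb'
    # Copy of the notebook that WandB keeps with its local run logs
    stale_nb = nbs / 'wandb' / 'run-20221123_121523-2z5ycjrb' / 'tmp' / 'code' / '01_wandb.ipynb'
    stale_nb.parent.mkdir(parents=True)
    real_nb.write_text('{}')
    stale_nb.write_text('{}')

    # Before: a recursive search under nbs-dev finds both notebooks
    found_before = sorted(p.relative_to(nbs).as_posix() for p in nbs.rglob('*.ipynb'))

    # After: the WandB folder is moved outside nbs-dev
    shutil.move(str(nbs / 'wandb'), str(Path(tmp) / 'wandb-logs'))
    found_after = sorted(p.relative_to(nbs).as_posix() for p in nbs.rglob('*.ipynb'))

print(found_before)
print(found_after)
```

Both files share the name 01_wandb.ipynb, so sorting by name does not separate them, and whichever copy is exported last determines the content of the generated module.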