wandb

Tracking project experiments with WandB

Once set up, WandB tracks datasets, models, training runs and evaluation runs across several experiments. The original documentation is here.

Key concepts we use in this package:

Steps:

  • Log in to WandB
  • Initialize a Run with desired parameters and metadata
  • Perform operations to be tracked (e.g. train model, load dataset as artifact, …)
  • Finish the Run


login_nb

 login_nb (nb_file:str|pathlib.Path)

First step to set up WandB from a notebook. Logs in and registers the passed notebook as the source of the code.

         Type                Details
nb_file  str | pathlib.Path  name of the notebook (str) or path to the notebook (Path)

To allow WandB to store the code used for the session, the name or path of the notebook must be passed as argument nb_file.

Example:

login_nb('01_wandb')
Logging in from notebook: /home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
wandb: Currently logged in as: vtecftyw. Use `wandb login --relogin` to force relogin

login_nb raises an error in the following cases:

  • If nb_file is not passed, the function raises a TypeError

  • If nb_file is not a string or a Path, the function raises a TypeError

  • If no file nb_file exists, the function raises a ValueError
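
The three cases above can be sketched with the standard library. This is only an illustration under stated assumptions, not the package's implementation: the helper name check_nb_file, the nbs_dir parameter and the rule that a bare name is completed with an .ipynb suffix are all hypothetical.

```python
from pathlib import Path
import tempfile


def check_nb_file(nb_file, nbs_dir="."):
    """Hypothetical helper: validate nb_file along the lines of the cases listed above."""
    # Calling check_nb_file() with no argument already raises TypeError (Python does this itself).
    # Anything that is neither a str nor a Path is rejected with TypeError.
    if not isinstance(nb_file, (str, Path)):
        raise TypeError(f"nb_file must be a str or a Path, not {type(nb_file).__name__}")
    if isinstance(nb_file, str):
        # Assumption: a bare name such as '01_wandb' means '<nbs_dir>/01_wandb.ipynb'
        name = nb_file if nb_file.endswith(".ipynb") else f"{nb_file}.ipynb"
        path = Path(nbs_dir) / name
    else:
        path = nb_file
    # The notebook must exist on disk, otherwise ValueError
    if not path.is_file():
        raise ValueError(f"no notebook found at {path}")
    return path.resolve()


# Usage with a throwaway directory standing in for the notebook folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "01_wandb.ipynb").write_text("{}")
    found_name = check_nb_file("01_wandb", nbs_dir=tmp).name
    try:
        check_nb_file("missing", nbs_dir=tmp)
        missing_error = None
    except ValueError as err:
        missing_error = type(err).__name__
```

Checking the type before touching the file system keeps the two error classes apart: a wrong type is reported as TypeError even if a file with a matching name happens to exist.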



WandbRun

 WandbRun (entity:str='', project:str='', run_name:str='',
           job_type:str='', notes:str='',
           logs_dir:str|pathlib.Path|None=None, testing:bool=False)

Manages a WandB run and all logged actions performed while the run is active. Close the run with .finish()

          Type                       Default  Details
entity    str                                 user or organization under which the run will be logged. Default: metagenomics_sh
project   str                                 name of the WandB project under which the run will be logged
run_name  str                                 unique name for the run
job_type  str                                 e.g.: load_datasets, train_exp, …
notes     str                                 any text description or additional information to store with the run
logs_dir  str | pathlib.Path | None  None     default is project_root/wandb-logs if None, or uses the passed Path
testing   bool                       False    (optional) If True, will not create a run on WandB. Use for local testing

Create a Run instance

WandbRun lets you define a set of metadata associated with the run, such as entity, project, run_name, job_type and additional notes.

Example:

  • set the parameters
entity = 'metagenomics_sh'
project = 'coding-with-nbdev'
run_name = 'nbdev-test'
job_type = "code_testing"
notes = 'any other information of interest for the future'
  • create a WandbRun instance called wandb_run
wandb_run = WandbRun(
    entity=entity, 
    project=project, 
    run_name=run_name, 
    job_type=job_type, 
    notes=notes
    )
wandb: Currently logged in as: vtecftyw (metagenomics_sh). Use `wandb login --relogin` to force relogin
wandb version 0.19.5 is available! To upgrade, please run: $ pip install wandb --upgrade
Tracking run with wandb version 0.16.6
Run data is saved locally in /home/vtec/projects/bio/metagentools/wandb-logs/wandb/run-20250201_174624-e4wz0f2e
Syncing run nbdev-test to Weights & Biases (docs)

WandbRun instantiation raises an error in the following cases:

  • If one of entity, project, run_name or job_type is not passed, the function raises a ValueError

  • If one of entity, project, run_name, job_type or notes is not a string, the function raises a TypeError
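
The two rules above can be sketched as a small validation function. This is a hedged illustration, not the actual WandbRun code: the helper name check_run_args is hypothetical, and it assumes that an empty string (the default in the signature) counts as "not passed" and that types are checked before required values.

```python
def check_run_args(entity="", project="", run_name="", job_type="", notes=""):
    """Hypothetical helper mirroring the two rules listed above."""
    args = {"entity": entity, "project": project, "run_name": run_name,
            "job_type": job_type, "notes": notes}
    # Every argument must be a string, otherwise TypeError
    for name, value in args.items():
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a str, not {type(value).__name__}")
    # All arguments except notes are required.
    # Assumption: the empty-string default means the argument was not passed.
    missing = [name for name in ("entity", "project", "run_name", "job_type") if not args[name]]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    return args


# Usage with the values from the example above; notes is optional
ok = check_run_args(entity="metagenomics_sh", project="coding-with-nbdev",
                    run_name="nbdev-test", job_type="code_testing")
```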



WandbRun.upload_dataset

 WandbRun.upload_dataset (ds_path:str, ds_name:str, ds_type:str,
                          ds_descr:str, ds_metadata:dict,
                          load_type:str='file',
                          wait_completion:bool=False)

Load a dataset from a file or a directory as a WandB artifact, with associated information and metadata

                 Type  Default  Details
ds_path          str            path to the file or directory to load as dataset artifact
ds_name          str            name for the dataset
ds_type          str            type of dataset: e.g. raw_data, processed_data, …
ds_descr         str            short description of the dataset
ds_metadata      dict           keys/values for metadata on the dataset, e.g. nb_samples, …
load_type        str   file     file to load a single file, dir to load all files in a directory
wait_completion  bool  False    when True, wait for the logging to complete before returning the artifact

Load a dataset from a single file

p2ds = Path('data_dev/ncbi/refsequences/cov/cov_virus_sequence_one.fa')
assert p2ds.is_file()

ds_fname = str(p2ds.absolute())
ds_name = 'cov_one_sequence'
ds_type = 'cov_sequences'
ds_descr = 'one covid sequence fasta file'

ds_metadata = {
    'nb_sequences': 1,
    'file type': 'fasta',
}
atx_one_file = wandb_run.upload_dataset(
    ds_path=ds_fname,
    ds_name=ds_name,
    ds_type=ds_type,
    ds_descr=ds_descr,
    ds_metadata=ds_metadata,
    load_type='file',
)
Dataset cov_one_sequence is being logged as artifact ...

Load a dataset with several files from a directory.

p2ds_dir = Path('data_dev/ncbi/refsequences/cov/single_1seq_150bp')
assert p2ds_dir.is_dir()

ds_dirname = str(p2ds_dir.absolute())
ds_name = 'cov_reads_single_1_sequence_150bp'
ds_type = 'sim_reads'
ds_descr = 'Simulated single reads of one cov sequence fq and aln files'

ds_metadata = {
    'nb_sequences': 1,
    'sim_type': 'single',
    'read_length': 150,
    'fold': 100,
}
atx_multi_files = wandb_run.upload_dataset(
    ds_path=ds_dirname,
    ds_name=ds_name,
    ds_type=ds_type,
    ds_descr=ds_descr,
    ds_metadata=ds_metadata,
    load_type='dir',
)
wandb: Adding directory to artifact (/home/vtec/projects/bio/metagentools/nbs-dev/data_dev/ncbi/refsequences/cov/single_1seq_150bp)... Done. 0.2s
Dataset cov_reads_single_1_sequence_150bp is being logged as artifact ...

WandbRun.upload_dataset raises an error in the following cases:

  • ds_path is a file and load_type is dir

  • ds_path is a directory and load_type is file

  • load_type has a value other than file or dir
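
The consistency check between ds_path and load_type can be sketched as below. This is an illustration only: the helper name check_load_type is hypothetical, and since the text does not say which exception upload_dataset raises, ValueError is an assumption made for the example.

```python
from pathlib import Path
import tempfile


def check_load_type(ds_path, load_type="file"):
    """Hypothetical helper: verify that ds_path is consistent with load_type."""
    # Only 'file' and 'dir' are accepted
    if load_type not in ("file", "dir"):
        raise ValueError(f"load_type must be 'file' or 'dir', not {load_type!r}")
    path = Path(ds_path)
    # A directory (or a missing path) passed with load_type='file'
    if load_type == "file" and not path.is_file():
        raise ValueError(f"load_type is 'file' but {path} is not a file")
    # A file (or a missing path) passed with load_type='dir'
    if load_type == "dir" and not path.is_dir():
        raise ValueError(f"load_type is 'dir' but {path} is not a directory")
    return path


# Usage with a throwaway file; the file name is made up for the example
with tempfile.TemporaryDirectory() as tmp:
    one_file = Path(tmp) / "sequence.fa"
    one_file.write_text(">seq\nACGT\n")
    file_ok = check_load_type(one_file, load_type="file").name
    try:
        check_load_type(one_file, load_type="dir")  # a file passed with load_type='dir'
        mismatch = None
    except ValueError as err:
        mismatch = str(err)
```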

Close a WandB run

wandb_run.finish()
View run nbdev-test at: https://wandb.ai/metagenomics_sh/coding-with-nbdev/runs/e4wz0f2e
View project at: https://wandb.ai/metagenomics_sh/coding-with-nbdev
Synced 7 W&B file(s), 0 media file(s), 6 artifact file(s) and 0 other file(s)
Find logs at: /home/vtec/projects/bio/metagentools/wandb-logs/wandb/run-20250201_174624-e4wz0f2e/logs


entity_projects

 entity_projects (entity:str)

Returns all projects under ‘entity’, as an iterable collection

         Type      Details
entity   str       name of the entity from which the projects will be retrieved
Returns  Projects  Projects iterator

entity_projects queries WandB for all the projects under the entity and returns them as an iterable object.

Each element in the iterator is a wandb.Project object. Each Project object has the following attributes:

  • _attrs: dict of attributes associated with the project (id, name, entityName, createdAt). These attributes can also be accessed directly as object.id, …
  • entity
  • name: project name
  • path: as a list [entity, name]
  • url: the url to the project workspace (‘https://wandb.ai/entity/project/workspace’)
projs = entity_projects(entity='vtecftyw')

for p in projs:
    print(f"{p.name}:")
    print('  name:   ', p.name)
    print('  entity  ', p.entity)
    print('  path:   ', p.path)
    print()
    print('  url:    ', p.url)
    print('  id:     ', p.id)
    print('  created:', p.createdAt)
    print('  _attrs: ', p._attrs)
    print()
pytorch-intro:
  name:    pytorch-intro
  entity   vtecftyw
  path:    ['vtecftyw', 'pytorch-intro']

  url:     https://wandb.ai/vtecftyw/pytorch-intro/workspace
  id:      UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3
  created: 2024-12-12T09:04:33Z
  _attrs:  {'id': 'UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3', 'name': 'pytorch-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T09:04:33Z', 'isBenchmark': False}

basic-intro:
  name:    basic-intro
  entity   vtecftyw
  path:    ['vtecftyw', 'basic-intro']

  url:     https://wandb.ai/vtecftyw/basic-intro/workspace
  id:      UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==
  created: 2024-12-12T08:54:34Z
  _attrs:  {'id': 'UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==', 'name': 'basic-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T08:54:34Z', 'isBenchmark': False}

tut_artifacts:
  name:    tut_artifacts
  entity   vtecftyw
  path:    ['vtecftyw', 'tut_artifacts']

  url:     https://wandb.ai/vtecftyw/tut_artifacts/workspace
  id:      UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3
  created: 2022-09-30T04:39:35Z
  _attrs:  {'id': 'UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3', 'name': 'tut_artifacts', 'entityName': 'vtecftyw', 'createdAt': '2022-09-30T04:39:35Z', 'isBenchmark': False}

metagenomics:
  name:    metagenomics
  entity   vtecftyw
  path:    ['vtecftyw', 'metagenomics']

  url:     https://wandb.ai/vtecftyw/metagenomics/workspace
  id:      UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=
  created: 2022-09-09T10:39:00Z
  _attrs:  {'id': 'UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=', 'name': 'metagenomics', 'entityName': 'vtecftyw', 'createdAt': '2022-09-09T10:39:00Z', 'isBenchmark': False}

wand-hello-world-fastai:
  name:    wand-hello-world-fastai
  entity   vtecftyw
  path:    ['vtecftyw', 'wand-hello-world-fastai']

  url:     https://wandb.ai/vtecftyw/wand-hello-world-fastai/workspace
  id:      UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==
  created: 2022-06-14T15:45:17Z
  _attrs:  {'id': 'UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==', 'name': 'wand-hello-world-fastai', 'entityName': 'vtecftyw', 'createdAt': '2022-06-14T15:45:17Z', 'isBenchmark': False}


get_project

 get_project (entity:str, project_name:str)

Returns project object defined by entity and project name

              Type     Details
entity        str      name of the entity from which the project will be retrieved
project_name  str      name of the project to retrieve
Returns       Project  Project object
p = get_project('vtecftyw', 'tut_artifacts')

print(type(p))

print(p.entity,'\n', p.name,'\n', p.path,'\n', p.url)
<class 'wandb.apis.public.projects.Project'>
vtecftyw 
 tut_artifacts 
 ['vtecftyw', 'tut_artifacts'] 
 https://wandb.ai/vtecftyw/tut_artifacts/workspace


project_artifacts

 project_artifacts (entity:str, project_name:str, by_alias:str='latest',
                    by_type:str=None, by_version:str=None)

Returns all artifacts in a project with their key information, filtered by alias, type and version, plus the list of artifact types

              Type   Default  Details
entity        str             name of the entity from which to retrieve the artifacts
project_name  str             name of the project from which to retrieve the artifacts
by_alias      str    latest   name of the alias to filter by
by_type       str    None     name of the artifact type to filter by (optional)
by_version    str    None     version to filter by (optional)
Returns       Tuple           DataFrame with all artifacts and related info; list of artifact types in the project

project_artifacts returns:

  • a DataFrame including all the artifacts available under the project (entity/project_name)
  • a list of all artifact types in the project
atx_df, atx_type_list = project_artifacts(
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )

atx_type_list
['code', 'cov_sequences', 'sim_reads', 'job']
atx_df
atx_name atx_version atx_type atx_aliases file_count created updated atx_id
0 source-coding-with-nbdev-_home_vtec_projects_b... v0 code latest 1 2025-02-01T09:46:40Z 2025-02-01T09:46:43Z QXJ0aWZhY3Q6MTQ4MjA0MzE4Ng==
1 cov_one_sequence:v0 v0 cov_sequences latest 1 2025-02-01T09:50:13Z 2025-02-01T09:50:15Z QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ==
2 cov_reads_single_1_sequence_150bp:v0 v0 sim_reads latest 2 2025-02-01T09:52:45Z 2025-02-01T10:02:12Z QXJ0aWZhY3Q6MTQ4MjA0OTY3MA==

The list of artifacts can be filtered, for instance, by artifact type

atx_df, atx_type_list = project_artifacts(
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev',
    by_type='cov_sequences'
    )

atx_df
atx_name atx_version atx_type atx_aliases file_count created updated atx_id
0 cov_one_sequence:v0 v0 cov_sequences latest 1 2025-02-01T09:50:13Z 2025-02-01T09:50:15Z QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ==


run_name_exists

 run_name_exists (run_name:str, entity:str, project_name:str)

Check whether a run with name run_name already exists in entity/project_name

              Type  Details
run_name      str   name of the run to check
entity        str   name of the entity in which to look for the run
project_name  str   name of the project in which to look for the run
Returns       bool  True if a run exists with the name run_name, False otherwise
run_name_exists(
    run_name='nbdev-test', 
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )
True
run_name_exists(
    run_name='train_1M', 
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )
False


unique_run_name

 unique_run_name (name_seed:str)

Create a unique run name by adding a timestamp to the passed seed

           Type  Details
name_seed  str   Run name to which a timestamp will be added
unique_run_name('this_is_a_run_name')
'this_is_a_run_name-250201-1816'
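
The output above suggests a timestamp of the form YYMMDD-HHMM. A minimal sketch of that idea follows. The function name timestamped_name and its now parameter are hypothetical, and the format string is inferred from the example output, not taken from the package source.

```python
from datetime import datetime


def timestamped_name(name_seed, now=None):
    """Hypothetical stand-in for unique_run_name: append a YYMMDD-HHMM timestamp."""
    # The optional 'now' argument only exists to make the example reproducible
    now = now or datetime.now()
    return f"{name_seed}-{now.strftime('%y%m%d-%H%M')}"


example = timestamped_name("this_is_a_run_name", now=datetime(2025, 2, 1, 18, 16))
print(example)  # this_is_a_run_name-250201-1816
```

If the format really stops at minutes, two names created from the same seed within the same minute would be identical, so checking with run_name_exists before starting a run remains useful.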

Technical Notes for development with nbdev

Resolve problem with nbdev_export() for this notebook

When using nbdev.nbdev_export() in this notebook, the exported code appears to be outdated. In particular, the dependency import cell is exported as:

# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2
# Imports all dependencies

import configparser
import numpy as np
import psutil
import os

The hint is in the first line:

# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2

It shows that the notebook used for exporting is not /nbs-dev/01_wandb.ipynb as it should be. This is because the WandB package creates a local directory /nbs-dev/wandb/ where it keeps local logs and artifacts.
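
The effect can be reproduced with the standard library alone. The sketch below uses pathlib's rglob as a stand-in for nbdev's recursive notebook search, and the run folder name run-0000 is made up: it only shows that a recursive search from nbs-dev also finds the notebook copies kept under wandb/.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nbs = Path(tmp) / "nbs-dev"
    # Hypothetical run folder, mimicking the layout wandb/run-*/tmp/code/
    copy_dir = nbs / "wandb" / "run-0000" / "tmp" / "code"
    copy_dir.mkdir(parents=True)
    (nbs / "01_wandb.ipynb").write_text("{}")        # the real notebook
    (copy_dir / "01_wandb.ipynb").write_text("{}")   # the copy saved with the run

    # A recursive search picks up both the notebook and its stale copy
    found = sorted(p.relative_to(nbs).as_posix() for p in nbs.rglob("*.ipynb"))
    stray = [p for p in found if p.startswith("wandb/")]
    kept = [p for p in found if not p.startswith("wandb/")]

print(found)
print(stray)
```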

The solution is to move the directory where WandB stores local logs outside nbs-dev, which can be done with the dir argument of wandb.init()

Illustrating by reproducing the functions from nbdev and a few dependencies

import os
from pathlib import Path

from nbdev.config import get_config
from fastcore.xtras import globtastic
from fastcore.meta import delegates
# from nbdev/doclinks.py

# line 105
@delegates(globtastic)
def nbglob(path=None, skip_folder_re = '^[_.]', file_glob='*.ipynb', skip_file_re='^[_.]', key='nbs_path', as_path=False, **kwargs):
    "Find all files in a directory matching an extension given a config key."
    path = Path(path or get_config()[key])
    recursive=get_config().recursive
    res = globtastic(path, file_glob=file_glob, skip_folder_re=skip_folder_re,
                     skip_file_re=skip_file_re, recursive=recursive, **kwargs)
    return res.map(Path) if as_path else res
# line 131 MODIFIED
def modified_nbdev_export(
    path:str=None, # Path or filename
    **kwargs):
    "Export notebooks in `path` to Python modules"
    if os.environ.get('IN_TEST',0): return
    files = nbglob(path=path, as_path=True, **kwargs).sorted('name')
#     for f in files: nb_export(f)
    for f in files: print(f)
#     add_init(get_config().lib_path)
#     _build_modidx()

Before the change:

modified_nbdev_export()
/home/vtec/projects/bio/metagentools/nbs-dev/00_core.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221122_182641-1eafsab9/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221122_180513-1vgzoryt/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/index.ipynb

After the change:

modified_nbdev_export()
/home/vtec/projects/bio/metagentools/nbs-dev/00_core.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/02_art.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_bio.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_architecture.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_data.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_utils.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/index.ipynb