wandb

Tracking project experiments with WandB

Once set up, WandB tracks datasets, models, training runs and evaluation runs across several experiments. The original documentation is here.

Key concepts we use in this package:

Run (status, losses and other metadata logged during training or evaluation experiments).
Artifact (datasets, code (incl. notebooks), models, …).
- Artifacts can refer to a single file or to a directory with multiple files

Steps:

Log in to wandb:
- May require an API key, which is available at https://wandb.ai/authorize. You must be logged in to WandB to access the API key.
Initialize a Run with desired parameters and metadata
Perform operations to be tracked (e.g. train model, load dataset as artifact, …)
Finish the Run
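
The four steps above map onto a handful of calls in the plain `wandb` API. The sketch below is only an illustration under stated assumptions, not the implementation used by this package: the helper name `track_file` and the `"dataset"` artifact type are hypothetical, and the `wandb` module is passed in as `wb` so that the flow can also be exercised with a stand-in object.

```python
from pathlib import Path


def track_file(wb, file_path, entity, project, run_name, job_type):
    """Log one file as an artifact inside a short-lived run.

    `wb` is expected to be the imported `wandb` module (hypothetical helper).
    """
    wb.login()  # step 1: log in, may ask for the API key
    run = wb.init(  # step 2: initialize the run with its metadata
        entity=entity, project=project, name=run_name, job_type=job_type
    )
    try:
        # step 3: the operation to track, here logging a file as an artifact
        artifact = wb.Artifact(name=Path(file_path).stem, type="dataset")
        artifact.add_file(str(file_path))
        run.log_artifact(artifact)
    finally:
        run.finish()  # step 4: always close the run, even after an error
    return artifact
```

With the real library this would be called as `track_file(wandb, ...)` after `import wandb`. The `try`/`finally` keeps a failed upload from leaving the run open.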

login_nb

 login_nb (nb_file:str|pathlib.Path)

First step to set up WandB from a notebook. Logs in and records the passed notebook as the source of code.

	Type	Details
nb_file	str | pathlib.Path	name of the notebook (str) or path to the notebook (Path)

To allow WandB to store the code used for the session, the name or path of the notebook must be passed as argument nb_file.

Example:

login_nb('01_wandb')

Logging in from notebook: /home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb

wandb: Currently logged in as: vtecftyw. Use `wandb login --relogin` to force relogin

login_nb raises an error in the following cases:

If nb_file is not passed, the function raises a TypeError
If nb_file is not a string or a Path, the function raises a TypeError
If no file matching nb_file exists, the function raises a ValueError
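
The checks listed above can be sketched with the standard library alone. This is a hypothetical stand-in, not the code of `login_nb`: the helper name `check_nb_file`, the `nb_dir` parameter and the rule that a string is a notebook name to which `.ipynb` is appended are all assumptions. The first case (argument not passed) needs no code, since Python itself raises a `TypeError` for a missing positional argument.

```python
from pathlib import Path


def check_nb_file(nb_file, nb_dir="."):
    """Validate `nb_file` and return the resolved notebook path.

    Assumption: a string is a notebook name looked up in `nb_dir`
    (the `.ipynb` suffix is added when missing); a Path is used as given.
    """
    if not isinstance(nb_file, (str, Path)):
        raise TypeError(f"nb_file must be str or Path, got {type(nb_file).__name__}")
    if isinstance(nb_file, Path):
        path = nb_file
    else:
        name = nb_file if nb_file.endswith(".ipynb") else f"{nb_file}.ipynb"
        path = Path(nb_dir) / name
    if not path.is_file():
        raise ValueError(f"notebook not found: {path}")
    return path.resolve()
```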

WandbRun

 WandbRun (entity:str='', project:str='', run_name:str='',
           job_type:str='', notes:str='',
           logs_dir:str|pathlib.Path|None=None, testing:bool=False)

Manages a WandB run and all logged actions performed while run is active. Close run with .finish()

	Type	Default	Details
entity	str		user or organization under which the run will be logged. Default: `metagenomics_sh`
project	str		name of the WandB project under which the run will be logged
run_name	str		unique name for the run
job_type	str		e.g.: `load_datasets`, `train_exp`, …
notes	str		any text description or additional information to store with the run
logs_dir	str | pathlib.Path | None	None	default is project_root/wandb-logs if None, or uses the passed Path
testing	bool	False	(optional) If True, will not create a run on WandB. Use for local testing

Create a Run instance

WandbRun lets you define a set of metadata associated with the run, such as entity, project, name, job_type and additional notes.

Example:

Set the parameters:

entity = 'metagenomics_sh'
project = 'coding-with-nbdev'
run_name = 'nbdev-test'
job_type = "code_testing"
notes = 'any other information of interest for the future'

Create a WandbRun instance called wandb_run:

wandb_run = WandbRun(
    entity=entity, 
    project=project, 
    run_name=run_name, 
    job_type=job_type, 
    notes=notes
    )

wandb: Currently logged in as: vtecftyw (metagenomics_sh). Use `wandb login --relogin` to force relogin

wandb version 0.19.5 is available! To upgrade, please run: $ pip install wandb --upgrade

Tracking run with wandb version 0.16.6

Run data is saved locally in /home/vtec/projects/bio/metagentools/wandb-logs/wandb/run-20250201_174624-e4wz0f2e

Syncing run nbdev-test to Weights & Biases (docs)

View project at https://wandb.ai/metagenomics_sh/coding-with-nbdev

View run at https://wandb.ai/metagenomics_sh/coding-with-nbdev/runs/e4wz0f2e

WandbRun instantiation raises an error in the following cases:

If one of entity, project, run_name or job_type is not passed, the function raises a ValueError
If one of entity, project, run_name, job_type or notes is not a string, the function raises a TypeError
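
A minimal sketch of these two rules, using only the standard library. The helper name `check_run_params` is hypothetical, and so is the reading that "not passed" means "left at the empty-string default shown in the signature".

```python
def check_run_params(entity="", project="", run_name="", job_type="", notes=""):
    """Return the parameters as a dict after validating them."""
    params = dict(entity=entity, project=project, run_name=run_name,
                  job_type=job_type, notes=notes)
    # Every parameter, including the optional notes, must be a string.
    for key, value in params.items():
        if not isinstance(value, str):
            raise TypeError(f"{key} must be a string, got {type(value).__name__}")
    # All parameters except notes are required (empty string = not passed).
    missing = [k for k in ("entity", "project", "run_name", "job_type") if not params[k]]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    return params
```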

WandbRun.upload_dataset

 WandbRun.upload_dataset (ds_path:str, ds_name:str, ds_type:str,
                          ds_descr:str, ds_metadata:dict,
                          load_type:str='file',
                          wait_completion:bool=False)

Load a dataset from a file or a directory as a WandB artifact, with associated information and metadata

	Type	Default	Details
ds_path	str		path to the file or directory to load as dataset artifact
ds_name	str		name for the dataset
ds_type	str		type of dataset: e.g. raw_data, processed_data, …
ds_descr	str		short description of the dataset
ds_metadata	dict		keys/values for metadata on the dataset, eg. nb_samples, …
load_type	str	file	`file` to load a single file, `dir` to load all files in a directory
wait_completion	bool	False	when True, wait for the logging to complete before returning the artifact

Load a dataset from a single file

from pathlib import Path

p2ds = Path('data_dev/ncbi/refsequences/cov/cov_virus_sequence_one.fa')
assert p2ds.is_file()

ds_fname = str(p2ds.absolute())
ds_name = 'cov_one_sequence'
ds_type = 'cov_sequences'
ds_descr = 'one covid sequence fasta file'

ds_metadata = {
    'nb_sequences': 1,
    'file type': 'fasta',
}

atx_one_file = wandb_run.upload_dataset(
    ds_path=ds_fname,
    ds_name=ds_name,
    ds_type=ds_type,
    ds_descr=ds_descr,
    ds_metadata=ds_metadata,
    load_type='file',
)

Dataset cov_one_sequence is being logged as artifact ...

Load a dataset with several files from a directory.

p2ds_dir = Path('data_dev/ncbi/refsequences/cov/single_1seq_150bp')
assert p2ds_dir.is_dir()

ds_dirname = str(p2ds_dir.absolute())
ds_name = 'cov_reads_single_1_sequence_150bp'
ds_type = 'sim_reads'
ds_descr = 'Simulated single reads of one cov sequence fq and aln files'

ds_metadata = {
    'nb_sequences': 1,
    'sim_type': 'single',
    'read_length': 150,
    'fold': 100,
}

atx_multi_files = wandb_run.upload_dataset(
    ds_path=ds_dirname,
    ds_name=ds_name,
    ds_type=ds_type,
    ds_descr=ds_descr,
    ds_metadata=ds_metadata,
    load_type='dir',
)

wandb: Adding directory to artifact (/home/vtec/projects/bio/metagentools/nbs-dev/data_dev/ncbi/refsequences/cov/single_1seq_150bp)... Done. 0.2s

Dataset cov_reads_single_1_sequence_150bp is being logged as artifact ...

WandbRun.upload_dataset raises an error in the following cases:

ds_path is a file and load_type is dir
ds_path is a directory and load_type is file
load_type has a value other than file or dir
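
The consistency check between `ds_path` and `load_type` can be sketched as follows. The helper name `check_load_type` is hypothetical, and the exception type (`ValueError`) is an assumption, since the error raised by `upload_dataset` is not specified above.

```python
from pathlib import Path


def check_load_type(ds_path, load_type="file"):
    """Return `ds_path` as a Path if it is consistent with `load_type`."""
    path = Path(ds_path)
    if load_type not in ("file", "dir"):
        raise ValueError(f"load_type must be 'file' or 'dir', got {load_type!r}")
    if load_type == "file" and not path.is_file():
        raise ValueError(f"load_type is 'file' but {path} is not a file")
    if load_type == "dir" and not path.is_dir():
        raise ValueError(f"load_type is 'dir' but {path} is not a directory")
    return path
```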

Close a WandB run

wandb_run.finish()

View run nbdev-test at: https://wandb.ai/metagenomics_sh/coding-with-nbdev/runs/e4wz0f2e
View project at: https://wandb.ai/metagenomics_sh/coding-with-nbdev
Synced 7 W&B file(s), 0 media file(s), 6 artifact file(s) and 0 other file(s)

Find logs at: /home/vtec/projects/bio/metagentools/wandb-logs/wandb/run-20250201_174624-e4wz0f2e/logs

entity_projects

 entity_projects (entity:str)

Returns all projects under ‘entity’, as an iterable collection

	Type	Details
entity	str	name of the entity from which the projects will be retrieved
Returns	Projects	Projects iterator

entity_projects queries WandB for all the projects of an entity and returns them as an iterable object.

Each element in the iterator is a wandb.Project object. Each Project object has the following attributes:

_attrs: dict of attributes associated with the project (id, name, entityName, createdAt). These attributes can be accessed directly as object.id, …
entity: name of the entity that owns the project
name: project name
path: as a list [entity, name]
url: the url to the project workspace (‘https://wandb.ai/entity/project/workspace’)

projs = entity_projects(entity='vtecftyw')

for p in projs:
    print(f"{p.name}:")
    print('  name:   ', p.name)
    print('  entity  ', p.entity)
    print('  path:   ', p.path)
    print()
    print('  url:    ', p.url)
    print('  id:     ', p.id)
    print('  created:', p.createdAt)
    print('  _attrs: ', p._attrs)
    print()

pytorch-intro:
  name:    pytorch-intro
  entity   vtecftyw
  path:    ['vtecftyw', 'pytorch-intro']

  url:     https://wandb.ai/vtecftyw/pytorch-intro/workspace
  id:      UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3
  created: 2024-12-12T09:04:33Z
  _attrs:  {'id': 'UHJvamVjdDp2MTpweXRvcmNoLWludHJvOnZ0ZWNmdHl3', 'name': 'pytorch-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T09:04:33Z', 'isBenchmark': False}

basic-intro:
  name:    basic-intro
  entity   vtecftyw
  path:    ['vtecftyw', 'basic-intro']

  url:     https://wandb.ai/vtecftyw/basic-intro/workspace
  id:      UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==
  created: 2024-12-12T08:54:34Z
  _attrs:  {'id': 'UHJvamVjdDp2MTpiYXNpYy1pbnRybzp2dGVjZnR5dw==', 'name': 'basic-intro', 'entityName': 'vtecftyw', 'createdAt': '2024-12-12T08:54:34Z', 'isBenchmark': False}

tut_artifacts:
  name:    tut_artifacts
  entity   vtecftyw
  path:    ['vtecftyw', 'tut_artifacts']

  url:     https://wandb.ai/vtecftyw/tut_artifacts/workspace
  id:      UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3
  created: 2022-09-30T04:39:35Z
  _attrs:  {'id': 'UHJvamVjdDp2MTp0dXRfYXJ0aWZhY3RzOnZ0ZWNmdHl3', 'name': 'tut_artifacts', 'entityName': 'vtecftyw', 'createdAt': '2022-09-30T04:39:35Z', 'isBenchmark': False}

metagenomics:
  name:    metagenomics
  entity   vtecftyw
  path:    ['vtecftyw', 'metagenomics']

  url:     https://wandb.ai/vtecftyw/metagenomics/workspace
  id:      UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=
  created: 2022-09-09T10:39:00Z
  _attrs:  {'id': 'UHJvamVjdDp2MTptZXRhZ2Vub21pY3M6dnRlY2Z0eXc=', 'name': 'metagenomics', 'entityName': 'vtecftyw', 'createdAt': '2022-09-09T10:39:00Z', 'isBenchmark': False}

wand-hello-world-fastai:
  name:    wand-hello-world-fastai
  entity   vtecftyw
  path:    ['vtecftyw', 'wand-hello-world-fastai']

  url:     https://wandb.ai/vtecftyw/wand-hello-world-fastai/workspace
  id:      UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==
  created: 2022-06-14T15:45:17Z
  _attrs:  {'id': 'UHJvamVjdDp2MTp3YW5kLWhlbGxvLXdvcmxkLWZhc3RhaTp2dGVjZnR5dw==', 'name': 'wand-hello-world-fastai', 'entityName': 'vtecftyw', 'createdAt': '2022-06-14T15:45:17Z', 'isBenchmark': False}

get_project

 get_project (entity:str, project_name:str)

Returns project object defined by entity and project name

	Type	Details
entity	str	name of the entity from which the project will be retrieved
project_name	str	name of the project to retrieve
Returns	Project	Project object

p = get_project('vtecftyw', 'tut_artifacts')

print(type(p))

print(p.entity,'\n', p.name,'\n', p.path,'\n', p.url)

<class 'wandb.apis.public.projects.Project'>
vtecftyw 
 tut_artifacts 
 ['vtecftyw', 'tut_artifacts'] 
 https://wandb.ai/vtecftyw/tut_artifacts/workspace
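
For orientation, helpers such as `entity_projects` and `get_project` can be built on the public `wandb.Api` client, which offers `projects(entity=...)` and `project(name=..., entity=...)`. The sketch below is an assumption about how such wrappers might look, not the code of this package: the `_sketch` function names and the optional `api` parameter are hypothetical. `wandb` is imported lazily so that a stand-in client can be passed in instead.

```python
def entity_projects_sketch(entity, api=None):
    """Return the iterable of projects under `entity`."""
    if api is None:
        import wandb  # lazy import: only needed when no client is supplied
        api = wandb.Api()
    return api.projects(entity=entity)


def get_project_sketch(entity, project_name, api=None):
    """Return the single project `entity/project_name`."""
    if api is None:
        import wandb
        api = wandb.Api()
    return api.project(name=project_name, entity=entity)
```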

print_entity_project_list

 print_entity_project_list (entity)

Print the name and url of all projects in entity

print_entity_project_list('vtecftyw')

List of projects under entity <vtecftyw>
  0. pytorch-intro                  (url: https://wandb.ai/vtecftyw/pytorch-intro/workspace)
  1. basic-intro                    (url: https://wandb.ai/vtecftyw/basic-intro/workspace)
  2. tut_artifacts                  (url: https://wandb.ai/vtecftyw/tut_artifacts/workspace)
  3. metagenomics                   (url: https://wandb.ai/vtecftyw/metagenomics/workspace)
  4. wand-hello-world-fastai        (url: https://wandb.ai/vtecftyw/wand-hello-world-fastai/workspace)
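
The aligned layout of this listing can be obtained with a left-justified format specifier. The sketch below is a hypothetical reconstruction: the function name `format_project_list` and the column width of 30 are assumptions inferred from the output above, and the project data is passed in as plain `(name, url)` pairs rather than fetched from WandB.

```python
def format_project_list(entity, projects):
    """Build the listing text from an iterable of (name, url) pairs."""
    lines = [f"List of projects under entity <{entity}>"]
    for idx, (name, url) in enumerate(projects):
        # `<30` pads the name with spaces so that the urls line up
        lines.append(f"  {idx}. {name:<30} (url: {url})")
    return "\n".join(lines)


print(format_project_list(
    "my-entity",
    [("demo", "https://wandb.ai/my-entity/demo/workspace")],
))
```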

project_artifacts

 project_artifacts (entity:str, project_name:str, by_alias:str='latest',
                    by_type:str=None, by_version:str=None)

Returns all artifacts in the project with key information, filtered by alias, type and version, plus the list of artifact types

	Type	Default	Details
entity	str		name of the entity from which to retrieve the artifacts
project_name	str		name of the project from which to retrieve the artifacts
by_alias	str	latest	name of the alias to filter by
by_type	str	None	name of the artifact type to filter by (optional)
by_version	str	None	version to filter by (optional)
Returns	Tuple		df w/ all artifacts and related info; list of artifact types in the project

project_artifacts returns:

a DataFrame including all the artifacts available under the project (entity/project_name)
a list of all artifact types in the project

atx_df, atx_type_list = project_artifacts(
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )

atx_type_list

['code', 'cov_sequences', 'sim_reads', 'job']

atx_df

	atx_name	atx_version	atx_type	atx_aliases	file_count	created	updated	atx_id
0	source-coding-with-nbdev-_home_vtec_projects_b...	v0	code	latest	1	2025-02-01T09:46:40Z	2025-02-01T09:46:43Z	QXJ0aWZhY3Q6MTQ4MjA0MzE4Ng==
1	cov_one_sequence:v0	v0	cov_sequences	latest	1	2025-02-01T09:50:13Z	2025-02-01T09:50:15Z	QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ==
2	cov_reads_single_1_sequence_150bp:v0	v0	sim_reads	latest	2	2025-02-01T09:52:45Z	2025-02-01T10:02:12Z	QXJ0aWZhY3Q6MTQ4MjA0OTY3MA==

The list of artifacts can be filtered, for instance, by artifact type

atx_df, atx_type_list = project_artifacts(
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev',
    by_type='cov_sequences'
    )

atx_df

	atx_name	atx_version	atx_type	atx_aliases	file_count	created	updated	atx_id
0	cov_one_sequence:v0	v0	cov_sequences	latest	1	2025-02-01T09:50:13Z	2025-02-01T09:50:15Z	QXJ0aWZhY3Q6MTQ4MjA0NjkyNQ==
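
To make the filter semantics concrete, the sketch below applies the same kind of filtering to plain dictionaries that mirror a few columns of the table above. It is a stand-in without pandas or WandB, and it rests on assumptions: the function name `filter_artifacts` is hypothetical, each filter is an exact match that is skipped when set to `None`, filters combine with a logical AND, and each row carries a single alias stored as a string.

```python
def filter_artifacts(records, by_alias="latest", by_type=None, by_version=None):
    """Keep the records matching every filter that is not None."""
    criteria = {
        "atx_aliases": by_alias,
        "atx_type": by_type,
        "atx_version": by_version,
    }
    active = {col: val for col, val in criteria.items() if val is not None}
    return [r for r in records if all(r[col] == val for col, val in active.items())]


records = [
    {"atx_name": "cov_one_sequence:v0", "atx_version": "v0",
     "atx_type": "cov_sequences", "atx_aliases": "latest"},
    {"atx_name": "cov_reads_single_1_sequence_150bp:v0", "atx_version": "v0",
     "atx_type": "sim_reads", "atx_aliases": "latest"},
]
selected = filter_artifacts(records, by_type="cov_sequences")
print([r["atx_name"] for r in selected])  # ['cov_one_sequence:v0']
```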

run_name_exists

 run_name_exists (run_name:str, entity:str, project_name:str)

Check whether a run with name run_name already exists in entity/project_name

	Type	Details
run_name	str	name of the run to check
entity	str	name of the entity in which to look for the run
project_name	str	name of the project in which to look for the run
Returns	bool	True if a run exists with the name run_name, False otherwise

run_name_exists(
    run_name='nbdev-test', 
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )

True

run_name_exists(
    run_name='train_1M', 
    entity='metagenomics_sh', 
    project_name='coding-with-nbdev'
    )

False
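
One way to implement such a check is to list the runs of the project through the public `wandb.Api` client and compare their names. The sketch below is an assumption about the approach, not the code of `run_name_exists`: the function names and the optional `api` parameter are hypothetical, and `wandb` is imported lazily so that a stand-in client can be used.

```python
def name_in_runs(run_name, runs):
    """True if any run object in `runs` has the display name `run_name`."""
    return any(run.name == run_name for run in runs)


def run_name_exists_sketch(run_name, entity, project_name, api=None):
    """Check `entity/project_name` for a run called `run_name`."""
    if api is None:
        import wandb  # lazy import: only needed when no client is supplied
        api = wandb.Api()
    return name_in_runs(run_name, api.runs(f"{entity}/{project_name}"))
```

Scanning every run is simple but grows with the size of the project; for large projects a server-side filter passed to `api.runs` may be preferable.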

unique_run_name

 unique_run_name (name_seed:str)

Create a unique run name by adding a timestamp to the passed seed

	Type	Details
name_seed	str	Run name to which a timestamp will be added

unique_run_name('this_is_a_run_name')

'this_is_a_run_name-250201-1816'
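
The sample output suggests a `yymmdd-HHMM` timestamp suffix. A minimal sketch under that assumption follows; the function name `unique_run_name_sketch` and the optional `now` parameter (added to make the result reproducible) are hypothetical.

```python
from datetime import datetime


def unique_run_name_sketch(name_seed, now=None):
    """Append a yymmdd-HHMM timestamp to `name_seed`."""
    now = now or datetime.now()
    return f"{name_seed}-{now:%y%m%d-%H%M}"


print(unique_run_name_sketch("this_is_a_run_name", datetime(2025, 2, 1, 18, 16)))
# this_is_a_run_name-250201-1816
```

With a resolution of one minute, two names created from the same seed within the same minute are identical, so a check with run_name_exists remains useful before starting a run.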

Technical Notes for development with `nbdev`

Resolve problem with nbdev_export() for this notebook

When using nbdev.nbdev_export() in this notebook, the exported code appears to be outdated. In particular, the dependency import cell is exported as:

# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2
# Imports all dependencies

import configparser
import numpy as np
import psutil
import os

The hint is in the first line:

# %% ../nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb 2

It shows that the notebook used for exporting is not /nbs-dev/01_wandb.ipynb as it should be. This is because the WandB package creates a local directory /nbs-dev/wandb/ where it keeps local logs and artifacts.
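
The effect can be reproduced without nbdev or WandB. The sketch below builds a throwaway directory tree that mimics this layout (the `run-demo` folder name is made up) and collects notebooks recursively with `pathlib`, which stands in here for a recursive notebook search. The stale copy kept under `wandb/` is picked up next to the real notebook.

```python
import tempfile
from pathlib import Path


def find_notebooks(root):
    """Recursively list notebooks under `root`, as sorted relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.ipynb"))


with tempfile.TemporaryDirectory() as tmp:
    nbs = Path(tmp) / "nbs-dev"
    copy_dir = nbs / "wandb" / "run-demo" / "tmp" / "code"
    copy_dir.mkdir(parents=True)
    (nbs / "01_wandb.ipynb").touch()       # the notebook being developed
    (copy_dir / "01_wandb.ipynb").touch()  # copy saved with the local run logs
    found = find_notebooks(nbs)

print(found)
# ['01_wandb.ipynb', 'wandb/run-demo/tmp/code/01_wandb.ipynb']
```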

The solution is to move the directory where WandB stores its local logs outside nbs-dev, which can be done with the dir argument of wandb.init(). WandbRun exposes this location through its logs_dir parameter.
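
A minimal sketch of that fix, assuming the default location `project_root/wandb-logs` described for `logs_dir` above. The helper name `wandb_logs_dir` is hypothetical; the call to `wandb.init` is shown only as a comment because it needs the library and a WandB account.

```python
from pathlib import Path


def wandb_logs_dir(project_root, logs_dir=None):
    """Return (and create) the directory for WandB local logs.

    Falls back to `project_root/wandb-logs` when `logs_dir` is None,
    which keeps the logs out of the notebook directory.
    """
    path = Path(logs_dir) if logs_dir is not None else Path(project_root) / "wandb-logs"
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage with the real library (not executed here):
# import wandb
# run = wandb.init(project="coding-with-nbdev", dir=wandb_logs_dir("/path/to/project"))
```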

To illustrate, the relevant nbdev function and a few dependencies are reproduced below, with the export step replaced by a print statement:

import os
from pathlib import Path

from nbdev.config import get_config
from fastcore.xtras import globtastic
from fastcore.meta import delegates

# from nbdev.doclinks.py

# line 105
@delegates(globtastic)
def nbglob(path=None, skip_folder_re = '^[_.]', file_glob='*.ipynb', skip_file_re='^[_.]', key='nbs_path', as_path=False, **kwargs):
    "Find all files in a directory matching an extension given a config key."
    path = Path(path or get_config()[key])
    recursive=get_config().recursive
    res = globtastic(path, file_glob=file_glob, skip_folder_re=skip_folder_re,
                     skip_file_re=skip_file_re, recursive=recursive, **kwargs)
    return res.map(Path) if as_path else res

# line 131 MODIFIED
def modified_nbdev_export(
    path:str=None, # Path or filename
    **kwargs):
    "Export notebooks in `path` to Python modules"
    if os.environ.get('IN_TEST',0): return
    files = nbglob(path=path, as_path=True, **kwargs).sorted('name')
#     for f in files: nb_export(f)
    for f in files: print(f)
#     add_init(get_config().lib_path)
#     _build_modidx()

Before the change:

modified_nbdev_export()

/home/vtec/projects/bio/metagentools/nbs-dev/00_core.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221122_182641-1eafsab9/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221122_180513-1vgzoryt/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/wandb/run-20221123_121523-2z5ycjrb/tmp/code/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/index.ipynb

After the change

modified_nbdev_export()

/home/vtec/projects/bio/metagentools/nbs-dev/00_core.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/01_wandb.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/02_art.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_bio.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_architecture.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_data.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/03_cnn_virus_utils.ipynb
/home/vtec/projects/bio/metagentools/nbs-dev/index.ipynb
