DVC: version control your datasets and ML experiments

Seminar 2

12:00, 14/11/2020

Version control systems (VCS) such as Git are an established software engineering practice, but using them in machine learning (ML) projects is challenging. Artifacts produced by ML pipelines, such as datasets, pre-processed data, and trained models, are often large. Once generated, they have to be stored on disk, since regenerating them over and over is expensive. Unfortunately, traditional VCSs are not designed to handle such large artifacts, and going without version control makes results hard to reproduce reliably.

DVC (Data Version Control) not only version-controls large artifacts but also keeps track of the commands that are run to produce them. It detects changes made to the input data and knows which steps in the pipeline have to be rerun to keep the final result up to date. By adopting DVC, the machine learning community can take a big step towards reproducible research.
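
The idea of detecting changed inputs and rerunning only the affected steps can be illustrated with a small sketch. This is not DVC's implementation or API; it is a toy model that assumes each stage lists its dependencies and outputs, that inputs are compared by content hash, and that stages are given in execution order. The names `Stage`, `content_hash`, `stages_to_rerun`, and all file names are hypothetical and chosen only for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint file contents; equal hashes are treated as 'unchanged'."""
    return hashlib.sha256(data).hexdigest()


class Stage:
    """One pipeline step: the command that was run, what it reads, what it writes."""

    def __init__(self, name, cmd, deps, outs):
        self.name = name
        self.cmd = cmd    # recorded so the step can be reproduced later
        self.deps = deps  # paths the command reads
        self.outs = outs  # paths the command writes


def stages_to_rerun(stages, recorded, current):
    """Return names of stages whose inputs changed, directly or via upstream outputs.

    `recorded` and `current` map input paths to content hashes.
    `stages` is assumed to be listed in execution (topological) order.
    """
    changed = {path for path in current if recorded.get(path) != current[path]}
    rerun = []
    for stage in stages:
        if any(dep in changed for dep in stage.deps):
            rerun.append(stage.name)
            # Outputs of a rerun stage are stale for everything downstream.
            changed.update(stage.outs)
    return rerun


# Hypothetical three-step pipeline.
PIPELINE = [
    Stage("prepare", "python prepare.py", deps=["raw.csv"], outs=["clean.csv"]),
    Stage("train", "python train.py", deps=["clean.csv", "params.yaml"], outs=["model.pkl"]),
    Stage("evaluate", "python evaluate.py", deps=["model.pkl"], outs=["metrics.json"]),
]

# Hashes stored at the last run versus hashes of the files as they are now.
recorded = {
    "raw.csv": content_hash(b"a,b\n1,2\n"),
    "params.yaml": content_hash(b"lr: 0.1\n"),
}
current = dict(recorded)
current["params.yaml"] = content_hash(b"lr: 0.01\n")  # only the parameters changed

print(stages_to_rerun(PIPELINE, recorded, current))  # ['train', 'evaluate']
```

In this sketch, changing only the parameters file leaves the data preparation step untouched, while training and evaluation are marked for rerunning; changing the raw data would mark all three.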

Speaker

Hlib Babii
