Dataset and Network Introspection ToolKit (DNIKit)


We introduce the Data and Network Introspection toolkit (DNIKit), an open-source Python framework for analyzing machine learning models and datasets. DNIKit contains a collection of algorithms that all operate on intermediate network responses, showing how the network perceives data at each stage of computation.

With DNIKit, you can:

  • create a comprehensive dataset analysis report
  • find dataset samples that are near duplicates of each other
  • discover rare data samples, annotation errors, or model biases
  • compress networks by removing highly correlated neurons
  • detect inactive units in a model
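To make the idea of working on intermediate network responses concrete, here is a minimal sketch of two of these analyses in plain Python. It is not DNIKit's API or its actual algorithms: the function names, the tolerance, the cosine-distance threshold, and the toy `responses` matrix (one row per sample, one column per unit) are all assumptions chosen for illustration.

```python
import math


def inactive_units(responses, tol=1e-8):
    """Return indices of units whose response is ~0 for every sample."""
    n_units = len(responses[0])
    return [
        j for j in range(n_units)
        if all(abs(row[j]) <= tol for row in responses)
    ]


def cosine_distance(a, b):
    """Cosine distance between two response vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero response as distant from everything
    return 1.0 - dot / (norm_a * norm_b)


def near_duplicates(responses, threshold=0.01):
    """Return index pairs of samples whose responses are almost identical."""
    pairs = []
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            if cosine_distance(responses[i], responses[j]) < threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical responses from one layer: 3 samples x 4 units.
responses = [
    [0.9, 0.0, 0.10, 0.0],
    [0.9, 0.0, 0.11, 0.0],
    [0.0, 0.0, 0.80, 0.3],
]

print(inactive_units(responses))   # units that never fire on this data
print(near_duplicates(responses))  # sample pairs that look alike to the network
```

The pairwise loop is quadratic in the number of samples, so a real dataset would need an approximate nearest-neighbor index or a dimensionality reduction step first.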

To visualize certain analyses, DNIKit also works with Symphony, a research platform for creating interactive data science components that we originally published at ACM CHI 2022. Now open-sourced, Symphony components enable multiple stakeholders in cross-functional AI/ML teams to explore, visualize, and share analyses for AI/ML. Symphony supports a variety of data types and models, and can be used across platforms ranging from Jupyter notebooks to standalone web-based dashboards. Symphony also has specific components for visualizing the results of DNIKit analyses, such as dataset familiarity and duplicate detection.

We use Symphony together with DNIKit for interactive, visual dataset analysis, most notably the Dataset Report.


