TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial…

Follow publication

Data wrangling of Excel file produced by Wandb.ai

--

It’s best to keep acting real. — Author: Liam Neeson

Nutshell: Deleting Excel column containing a specific word in the header using VBA.

Wandb.ai provides a central dashboard to keep track of your hyperparameters, system metrics, and predictions so you can compare models live and share your findings. This allows researchers to plot charts out of the logged data; Simultaneously, it also plots the default values logged over a period of time.

Wandb Dashboard (Image courtesy: https://wandb.ai/site)

Motivation

As lots of information gets logged into the Wandb, it becomes easier to utilise all those information to log meaningful charts.

A problem arises when someone wishes to download those charts' data into an Excel file for more processing. As lovely as the data on those charts look, underneath supporting data come out to be huge in Excel.

Last night I was working on a nice nifty sci-fi chart which was great, but then I realised I needed to do more processing on that data. For that, I downloaded the chart data as Excel and found a massive number of data fields, roughly around 256 attributes; And more than half of them were not needed. So I searched for filter functionality in wandb, but it wasn't present at the time of writing: (Hopefully, it will come out sooner but after my post).

One fix one can think of is to do Ctrl +F and find your column of interest and move them to a new sheet, but I took the opportunity to learn a handy macro that is worth sharing with the DataScience community.

Resolution

Contribution: How to remove unnecessary columns in Excel if their name contains ('somethingThatIDontKnowYet')?

Steps to resolve the issue:

  1. Find the pattern in your header column in your Excel file, which is not required for further processing. For example, I wanted to remove the columns with suffixes as _steps, _MAX, and _MIN.
Exported wandb Excel file

2. Once you have identified the 'text' in your columns that you wish to delete, next is to fire some VBA commands to get it removed.

For Windows 10 users, it's Ctrl + F11. This will open a new Macro sheet for you. Copy all your data from the wandb sheet to the Macro sheet. Then right-click on any cell of the macro sheet and select the Run menu item.

Image on right is a Macro sheet that should appear once you press (Ctrl+ F11). Image on left shows the menu item that you can expect after right-clicking on any sheet cell.

3. Give a name to your macro. I called it DeleteSpecificCell (which should be a DeleteColumnsContaining; the brain got its funny ways of doing things 😐). A VBA editor will open once you click on create, as shown in the right-hand side image underneath.

In your VBA editor, you will be seeing two lines initially:

Sub DeleteSpecificCell()    <<We will be writing code in here>>End Sub

Inside the code block, write:

Set MR = Range("A1:EI1")
For Each cell In MR
If InStrB(1, cell.Value, "step", vbBinaryCompare) <> 0 Then cell.EntireColumn.Delete
Next

Code block explanation:

# Underneath line represents the selection of columns in the workheet which will be considered while running macro 
Set MR = Range("A1:EI1")
For Each cell In MR
# Underneath line tells macro to delete entire column where header contains "step" replace this with your identified text.
If InStrB(1, cell.Value, "step", vbBinaryCompare) <> 0 Then cell.EntireColumn.Delete
Next
Image on right-hand side shows the screen which comes once you click on Run in the menu while following the previous step. Image on left-hand side shows the VAB editor, which opens up once you provide a name for your macro and click Create button.

3. Voila!! The recipe is cooked. Press F5 once you fill in the data and go back to the Macro sheet to see the magic. I was able to reduce 256 columns to 65 columns for further processing.

This way, Wandb customers processed their data happily ever after.

Happy Coding!!

Reference:

https://www.extendoffice.com/documents/excel/3086-excel-delete-columns-based-on-header.html#:~:text=Right%20click%20the%20column%20header,columns%20are%20deleted%20at%20once.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

TDS Archive
TDS Archive

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Praveen Joshi
Praveen Joshi

Written by Praveen Joshi

Director of Technology @ Speire | AI and ML consultant | Casual NLP Lecturer @ Munster Technological University | ADAPT Researcher

Responses (1)

Write a response