Theory & Practice of Data Cleaning

January 14, 2018

This is a topic that I find very interesting; however, I find the industry's best practices counterintuitive and in opposition to this very idea. In the near future I plan to attend multiple online classes on or around the subject, though I believe very little of it will be put to use professionally. Instead, this is more for my own systems and business model, since I'm limited by my hardware and the services I've created. Data backups need to be precise, without question.

An example of this very issue: on a project some time ago I built an application where clients could upload an image to update their profile. Simple enough; however, the system had no way of knowing whether it was the same image, nor did it do any kind of file system cleanup for images no longer attached to a profile. This part was left up to another developer, and the server's hard-drive space wasn't a concern for that developer. The previous images in the system were unnecessary information and weren't kept as any kind of history. They were lost to the user, and the developer had no way to check which images were in use without a comparison against the database. The old images weren't important, but if they had been, they would have been almost useless without some effort to find the one you were looking for.
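A minimal sketch of the two checks that were missing, written in Python purely for illustration. It assumes images sit as plain files in a single upload directory and that the database can hand back the set of file names still attached to profiles. The names `file_digest`, `is_duplicate_upload`, and `find_orphans` are hypothetical and not taken from the original project.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Optional


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large images are never loaded whole into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_duplicate_upload(new_file: Path, current_file: Optional[Path]) -> bool:
    """True if the uploaded file is byte-for-byte the current profile image."""
    if current_file is None or not current_file.exists():
        return False
    return file_digest(new_file) == file_digest(current_file)


def find_orphans(upload_dir: Path, referenced: Iterable[str]) -> List[Path]:
    """List files in upload_dir whose names are not referenced.

    `referenced` stands in for the result of a database query returning
    the file names still attached to a profile (an assumption here).
    """
    in_use = set(referenced)
    return sorted(
        p for p in upload_dir.iterdir()
        if p.is_file() and p.name not in in_use
    )
```

The orphan list could be reviewed, archived, or deleted on a schedule. Comparing hashes only catches exact copies, so a re-encoded or resized version of the same picture would still count as a new image.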

So, that being said, I was annoyed to the core, and this is just one of many examples I've encountered. Unused data is useless data without some kind of processing, which means more work, and I'm not about that shit. Work smarter, not harder; it's issues like this that bite a developer in the ass somewhere down the road. Either way, I'll leave this article here to remind me to do some more research. If you have any notes, feel free to comment below.

Anthony's Blog
