The idea of organic and synthetic data came up again today, and I thought of a new metaphor to explain it…
When I started taking digital photos back in 2002 (nearly 20 years ago, sheesh!), the cameras I used would record a fair bit of metadata about each image when it was taken: the date and time, the camera type, the focal length of the lens, the shutter speed and so on. All of this data was generated organically.
I didn’t need to think about any of that; it just happened automatically.
Those early digital cameras, though, didn’t have a GPS chip in them. They had no idea where in the world they were, so there was no way for me to record the location of my photos automatically.
I could have manually edited the metadata to add location information. I could have run a GPS location recorder with its clock in sync with my camera’s, then matched the recorded positions to the photos and written them into the metadata later. But to be frank, it wasn’t valuable enough for me to bother. I had no motivation to create this synthetic data, so I didn’t.
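For the curious, the time-sync idea could look roughly like the sketch below. It is only an illustration, not something I ever built: it assumes the GPS recorder produces a time-sorted list of (time, latitude, longitude) fixes and that the camera clock agrees with it. The names nearest_fix, track and max_gap are made up for this example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_fix(track, taken_at, max_gap=timedelta(minutes=5)):
    """Return the (lat, lon) of the GPS fix closest in time to a photo.

    track    -- hypothetical list of (datetime, lat, lon), sorted by time
    taken_at -- the photo's timestamp from its metadata
    max_gap  -- assumed tolerance; beyond this we refuse to guess
    """
    times = [when for when, _, _ in track]
    i = bisect_left(times, taken_at)
    # Only the fixes just before and just after the photo can be nearest.
    candidates = track[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    when, lat, lon = min(candidates, key=lambda fix: abs(fix[0] - taken_at))
    if abs(when - taken_at) > max_gap:
        return None  # no fix close enough in time, so leave the photo untagged
    return (lat, lon)


# Made-up track log with two fixes ten minutes apart.
track = [
    (datetime(2002, 6, 1, 10, 0), 48.8584, 2.2945),
    (datetime(2002, 6, 1, 10, 10), 48.8606, 2.3376),
]
location = nearest_fix(track, datetime(2002, 6, 1, 10, 2))
```

Even in this tidy form, someone still has to carry the recorder, keep the clocks aligned and run the matching afterwards, which is exactly the effort I had no motivation to spend.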
These days, all of my photos from my smartphone are automatically encoded with location data unless I choose not to. As a result, Google Photos is now able to provide me with search by map across the many thousands of photos I’ve taken in the past 19 years.
This is nice. It gives me another way to explore my photos. It’s still not enough to make me want to hand-classify old photos, though. If the data is generated organically, great. But I have no motivation to go back and add it to the old stuff.
And indeed Google Photos now has ways to attempt to derive location by using pattern recognition on the image itself. If it is a photo of a flower, it probably can’t tell me where the image is from. But if it is a picture of the Eiffel Tower, then Google Photos will be able to suggest that the photo was taken in Paris.
The moral of this tale? Well, if you are looking at data in an organisation, you should keep in mind whether the data is organic (generated automatically as part of the process of doing things) or synthetic (manually entered by someone).
If the data is synthetic, you should then ask what motivation the person generating that data has to a) input it in the first place and b) ensure it is accurate and timely. If you draw a blank on those questions, you should question the quality of the data, and also start to think about whether there are alternative ways to reach the same outcome by being clever with the organic data.