Lately, I have been experimenting with Create ML, Vision and AVFoundation in Xcode. I'm trying to make an app that recognizes animals from live capture.
The app works fine if the animal being analyzed is close to the camera. However, if it isn't, the app behaves erratically and gives wrong results. I think this is caused by the dataset I used to train the machine learning model, since it only contains pictures of animals taken at close range.
Anyway, I want to make it possible to recognize distant animals, because sometimes it is impossible to get near an animal without scaring it. A simple way of doing this could be to use a dataset that also contains pictures of distant animals, but then I'm afraid that the model will mainly learn from the background instead of the animal.
What do you think would be the best approach for this? Should I somehow crop the frames, so that only the animal is captured? Or is there another, more efficient way?
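To make the cropping idea concrete, this is roughly what I have in mind, as an untested sketch rather than working code from my app. It assumes iOS 13 or later, that `model` wraps my Create ML classifier, and that Vision's objectness-based saliency request is good enough to locate the animal (it is not animal-specific, so it may pick up other objects). `paddedRegionOfInterest` and `classifyCroppedFrame` are names I made up for illustration.

```swift
import CoreGraphics
import CoreVideo
import Vision

/// Grows a normalized bounding box by `padding` (a fraction of its own size)
/// on each side, then clamps it to the unit square so it stays a valid
/// normalized region. Hypothetical helper, not part of Vision.
func paddedRegionOfInterest(for box: CGRect, padding: CGFloat = 0.25) -> CGRect {
    let unit = CGRect(x: 0, y: 0, width: 1, height: 1)
    let expanded = box.insetBy(dx: -box.width * padding, dy: -box.height * padding)
    let clamped = expanded.intersection(unit)
    // Fall back to the full frame if nothing usable is left.
    return (clamped.isNull || clamped.isEmpty) ? unit : clamped
}

/// Two-stage classification of a single camera frame:
/// 1. objectness-based saliency proposes where a subject is,
/// 2. the classifier runs only on that (padded) region.
/// `model` is assumed to wrap the Create ML image classifier.
func classifyCroppedFrame(_ pixelBuffer: CVPixelBuffer,
                          model: VNCoreMLModel) throws -> VNClassificationObservation? {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

    // Stage 1: take the first salient object Vision reports, if any.
    let saliencyRequest = VNGenerateObjectnessBasedSaliencyImageRequest()
    try handler.perform([saliencyRequest])
    let saliency = saliencyRequest.results?.first as? VNSaliencyImageObservation
    let box = saliency?.salientObjects?.first?.boundingBox

    // Stage 2: classify only that region. Both the bounding box and
    // regionOfInterest use normalized coordinates with a lower-left origin.
    let classifyRequest = VNCoreMLRequest(model: model)
    classifyRequest.imageCropAndScaleOption = .scaleFit
    if let box = box {
        classifyRequest.regionOfInterest = paddedRegionOfInterest(for: box)
    }
    try handler.perform([classifyRequest])
    return (classifyRequest.results as? [VNClassificationObservation])?.first
}
```

If no salient object is found, the sketch simply classifies the whole frame, which is what my app does today. The padding is there so the crop keeps a little context around the animal instead of cutting it off at the edges.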
Edit: I'm using an image classifier model.