SOLVED: Voice Gender Recognition

@TheTwoNotes

Nov '21

I’m working on a vocal singing app that needs some level of gender and age recognition in order to attempt to improve suggestions of songs to sing.

Does anyone know if there is an Apple SDK or third-party SDK (Git repo or otherwise) that can take a voice sample and attempt to predict the gender and/or age of the speaker? My preference is to do it all on-device, both to allow offline use and to protect the user's information. We have a dedicated server that could be used if on-device is not an option.

We do already have a voice interpreter for key phrases, but the gender portion is (understandably) more of a challenge for a variety of reasons.

As always, thank you in advance to the great community of folks who help each other on this platform.

@roosterboy HWS+

Nov '21

Keeping in mind that I know nothing about speech recognition...

You could take a look at Apple's Speech framework, particularly the SFSpeechRecognitionResult class. Its speechRecognitionMetadata property exposes some voiceAnalytics that may be of use to you:

// Voice analytics corresponding to a segment of recorded audio
@available(iOS 13, *)
open class SFVoiceAnalytics : NSObject, NSCopying, NSSecureCoding {

    // Jitter measures vocal stability and is measured as an absolute difference between consecutive periods, divided by the average period. It is expressed as a percentage
    @NSCopying open var jitter: SFAcousticFeature { get }

    // Shimmer measures vocal stability and is measured in decibels
    @NSCopying open var shimmer: SFAcousticFeature { get }

    // Pitch measures the highness and lowness of tone and is measured in logarithm of normalized pitch estimates
    @NSCopying open var pitch: SFAcousticFeature { get }

    // Voicing measures the probability of whether a frame is voiced or not and is measured as a probability
    @NSCopying open var voicing: SFAcousticFeature { get }
}

(that comes from the header files in Xcode)
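
For anyone who wants to try this, here is a rough sketch of pulling those values out of a recognition result. It assumes speech recognition permission has already been granted (SFSpeechRecognizer.requestAuthorization plus the NSSpeechRecognitionUsageDescription key in Info.plist) and that the deployment target supports speechRecognitionMetadata. The names meanVoicedPitch and analyzePitch, and the 0.5 voicing threshold, are made up for illustration. Note that pitch is reported as a log of normalized estimates rather than in Hz, so any cutoff you apply would have to be calibrated against your own recordings.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical helper: averages the pitch of frames whose voicing
/// probability is at least `threshold`. Returns nil if no frame qualifies.
func meanVoicedPitch(pitch: [Double], voicing: [Double], threshold: Double = 0.5) -> Double? {
    let voiced = zip(pitch, voicing).filter { $0.1 >= threshold }.map { $0.0 }
    guard !voiced.isEmpty else { return nil }
    return voiced.reduce(0, +) / Double(voiced.count)
}

#if canImport(Speech)
/// Hypothetical wrapper: transcribes the audio file at `url` and hands back
/// the mean voiced pitch, or nil if recognition fails or no analytics exist.
@available(iOS 14.5, macOS 11.3, *)
func analyzePitch(of url: URL, completion: @escaping (Double?) -> Void) {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        completion(nil)
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.shouldReportPartialResults = false
    // Stay on-device when the current locale and hardware allow it.
    if recognizer.supportsOnDeviceRecognition {
        request.requiresOnDeviceRecognition = true
    }
    // A real app would probably store the recognizer and the returned task.
    _ = recognizer.recognitionTask(with: request) { result, error in
        guard let result = result, result.isFinal else {
            if error != nil { completion(nil) }
            return
        }
        guard let analytics = result.speechRecognitionMetadata?.voiceAnalytics else {
            completion(nil)
            return
        }
        completion(meanVoicedPitch(
            pitch: analytics.pitch.acousticFeatureValuePerFrame,
            voicing: analytics.voicing.acousticFeatureValuePerFrame))
    }
}
#endif
```

Filtering on voicing first is just one way to keep silent or noisy frames from dragging the average around; I haven't verified how well it holds up on singing as opposed to speech.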

Another option might be to use some form of ML. Maybe you could find a data set that would work to train up a model you can then use to analyze incoming speech/song and figure out what you need from that?

I would think that voice gender recognition would be fraught with edge cases and tricky bits, so good luck!

@TheTwoNotes

Dec '21

Turns out that Apple's Create ML and Core ML seem to do the trick. You use the sound classifier in Create ML, put together a collection of sample male and female recordings as well as other non-target recordings (dogs barking, crowd noise, etc.), and train your model on them. The resulting Core ML model can then run on-device.

The machine learning stuff is pretty cool.
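
In case it helps someone else, running a trained sound classifier over a recording might look roughly like the sketch below, using the SoundAnalysis framework. This is not the exact code from the app. The names WindowCollector, classifyRecording and dominantLabel are invented for the example, and the labels ("male", "female", "background") are assumed to be whatever folder names were used when training in Create ML.

```swift
import Foundation
#if canImport(SoundAnalysis)
import CoreML
import SoundAnalysis
#endif

/// Hypothetical helper: returns the label with the highest summed confidence
/// across all analysis windows, or nil if there were no windows.
func dominantLabel(_ windows: [(label: String, confidence: Double)]) -> String? {
    var totals: [String: Double] = [:]
    for window in windows {
        totals[window.label, default: 0] += window.confidence
    }
    return totals.max { $0.value < $1.value }?.key
}

#if canImport(SoundAnalysis)
/// Collects the top classification of every window the analyzer reports.
@available(iOS 13.0, macOS 10.15, *)
final class WindowCollector: NSObject, SNResultsObserving {
    private(set) var windows: [(label: String, confidence: Double)] = []

    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Classifications arrive sorted with the most confident first.
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        windows.append((label: top.identifier, confidence: top.confidence))
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Analysis failed: \(error)")
    }
}

/// Runs `model` (assumed to be a Create ML sound classifier) over the
/// audio file at `url` and returns the dominant label, if any.
@available(iOS 13.0, macOS 10.15, *)
func classifyRecording(at url: URL, with model: MLModel) throws -> String? {
    let request = try SNClassifySoundRequest(mlModel: model)
    let analyzer = try SNAudioFileAnalyzer(url: url)
    let collector = WindowCollector()
    try analyzer.add(request, withObserver: collector)
    analyzer.analyze() // blocks until the whole file has been processed
    return dominantLabel(collector.windows)
}
#endif
```

Summing confidence per label is only one way to settle on a single answer for a whole recording; a majority vote over windows, or ignoring the background class, may work better depending on the training data.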
