SOLVED: Voice Gender Recognition

I’m working on a singing app that needs some level of gender and age recognition so it can suggest better songs to sing.

Does anyone know if there is an Apple SDK or 3rd party SDK (Git repo or otherwise) that can take a voice sample and attempt to predict the gender and/or age of the speaker? My preference is to do it all on-device, both to allow offline use and to protect the users' information. We have a dedicated server that could be used if on-device is not an option.

We do already have a voice interpreter for key phrases, but the gender portion is (understandably) more of a challenge for a variety of reasons.

As always, thank you in advance to the great community of folks who help each other on this platform.


Keeping in mind that I know nothing about speech recognition...

You could take a look at Apple's Speech framework, particularly the SFSpeechRecognitionResult class. Its speechRecognitionMetadata property exposes some voiceAnalytics that may be of use to you:

// Voice analytics corresponding to a segment of recorded audio
@available(iOS 13, *)
open class SFVoiceAnalytics : NSObject, NSCopying, NSSecureCoding {

    // Jitter measures vocal stability and is measured as an absolute difference between consecutive periods, divided by the average period. It is expressed as a percentage
    @NSCopying open var jitter: SFAcousticFeature { get }

    // Shimmer measures vocal stability and is measured in decibels
    @NSCopying open var shimmer: SFAcousticFeature { get }

    // Pitch measures the highness and lowness of tone and is measured in logarithm of normalized pitch estimates
    @NSCopying open var pitch: SFAcousticFeature { get }

    // Voicing measures the probability of whether a frame is voiced or not and is measured as a probability
    @NSCopying open var voicing: SFAcousticFeature { get }
}

(that comes from the header files in Xcode)
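As a rough sketch of how those analytics might be read (not something I have tuned for singing): each SFAcousticFeature carries an acousticFeatureValuePerFrame array, so you could average the pitch over the frames that look voiced. The meanVoicedPitch helper and the 0.5 voicing threshold are my own assumptions, and per the header comment the pitch values are log-normalized estimates rather than Hz, so any cut-off between voice types would have to be found by experiment.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical helper: averages the pitch of frames whose voicing
/// probability is at least `threshold` (0.5 is an arbitrary assumption).
/// Returns nil when no frame qualifies.
func meanVoicedPitch(pitch: [Double], voicing: [Double], threshold: Double = 0.5) -> Double? {
    // Pair each pitch value with its voicing probability and keep voiced frames only.
    let voiced = zip(pitch, voicing).filter { $0.1 >= threshold }.map { $0.0 }
    guard !voiced.isEmpty else { return nil }
    return voiced.reduce(0, +) / Double(voiced.count)
}

#if canImport(Speech)
/// Reads the analytics from a recognition result, if the recognizer supplied any.
@available(iOS 14.5, macOS 11.3, *)
func meanVoicedPitch(in result: SFSpeechRecognitionResult) -> Double? {
    guard let analytics = result.speechRecognitionMetadata?.voiceAnalytics else { return nil }
    return meanVoicedPitch(pitch: analytics.pitch.acousticFeatureValuePerFrame,
                           voicing: analytics.voicing.acousticFeatureValuePerFrame)
}
#endif
```

You would call the second function from the result handler of a recognition task once result.isFinal is true. Pitch alone is a crude signal, so treat the number as one input among several rather than an answer.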

Another option might be to use some form of ML. Maybe you could find a data set that would work to train up a model you can then use to analyze incoming speech/song and figure out what you need from that?

I would think that voice gender recognition would be fraught with edge cases and tricky bits, so good luck!


Turns out that Apple's Core ML seems to do the trick. You use Create ML's sound classifier: create a collection of sample male and female recordings, plus other non-target recordings (dogs barking, crowd noise, etc.), and use Create ML to train your model.

The machine learning stuff is pretty cool.
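For anyone finding this later, here is a minimal sketch of how a model trained that way could be run on a recording with the SoundAnalysis framework. The WindowCollector class, the dominantLabel helper, the 0.6 confidence floor, and the label names are all assumptions for illustration; the labels you get back are whatever folder names you used when training in Create ML.

```swift
import Foundation
#if canImport(SoundAnalysis) && (os(iOS) || os(macOS))
import CoreML
import SoundAnalysis
#endif

/// Hypothetical helper: averages each label's confidence across all analysis
/// windows and returns the best label, or nil if its average confidence is
/// below `minimumConfidence` (0.6 is an arbitrary assumption).
func dominantLabel(in windows: [[String: Double]], minimumConfidence: Double = 0.6) -> String? {
    guard !windows.isEmpty else { return nil }
    var totals: [String: Double] = [:]
    for window in windows {
        for (label, confidence) in window {
            totals[label, default: 0] += confidence
        }
    }
    guard let best = totals.max(by: { $0.value < $1.value }) else { return nil }
    return best.value / Double(windows.count) >= minimumConfidence ? best.key : nil
}

#if canImport(SoundAnalysis) && (os(iOS) || os(macOS))
/// Collects one [label: confidence] dictionary per analysis window.
@available(iOS 13.0, macOS 10.15, *)
final class WindowCollector: NSObject, SNResultsObserving {
    private(set) var windows: [[String: Double]] = []

    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult else { return }
        var window: [String: Double] = [:]
        for classification in result.classifications {
            window[classification.identifier] = classification.confidence
        }
        windows.append(window)
    }
}

/// Runs a Create ML sound classifier over an audio file.
/// `model` is assumed to be the MLModel from your own trained classifier.
@available(iOS 13.0, macOS 10.15, *)
func classifyVoice(at audioURL: URL, using model: MLModel) throws -> String? {
    let request = try SNClassifySoundRequest(mlModel: model)
    let analyzer = try SNAudioFileAnalyzer(url: audioURL)
    let collector = WindowCollector()
    try analyzer.add(request, withObserver: collector)
    analyzer.analyze() // synchronous: returns once the whole file has been processed
    return dominantLabel(in: collector.windows)
}
#endif
```

Averaging across windows rather than trusting a single window is just one way to smooth out noisy stretches. Returning nil when nothing clears the floor lets the app fall back to asking the user instead of guessing.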

